Adds OpenCV-based phone-photo handling that runs before the standard preprocessing pipeline for IMAGE source kinds (PDF renders are flat by construction and skip this stage).

Pipeline additions in src/ocr_sprint/pipeline/document_detect.py:
- _find_document_quad: Canny + dilate + contour search, picks the largest convex 4-point polygon above a configurable area threshold; fails gracefully and returns None when no usable quad is found.
- _four_point_warp: orders corners (TL/TR/BR/BL via sum/diff trick) and runs cv2.getPerspectiveTransform + warpPerspective.
- _remove_shadow: per-channel background-division (dilate + median blur + 255 - absdiff + normalize) for uneven phone-shot lighting.
- detect_and_correct: top-level entrypoint with graceful fallback to the original image when detection fails.

Wired into the synchronous orchestrator: only enabled for IMAGE sources, skipped for PDF.

New settings:
- preprocess_detect_document (default: true)
- preprocess_remove_shadow (default: true)
- preprocess_min_quad_area_fraction (default: 0.20)

Tests: 9 new unit tests covering corner ordering, quad detection on synthetic skewed documents, perspective warp output sanity, shadow removal shape preservation, full-pipeline behavior, and graceful fallback when detection fails. 70 tests total, all green.

ML-based dewarping (DewarpNet) and DocTR detector are deferred to a future phase per the roadmap; the existing API is structured so they can be added as alternative backends behind DocumentDetectConfig.

Co-authored-by: adrian kuman firmansah <adriancuman@gmail.com>
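The corner-ordering trick and the area threshold mentioned above can be illustrated without OpenCV. This is a minimal pure-Python sketch, not the code in document_detect.py: the names order_corners, quad_area_fraction, and accept_quad are hypothetical, and it assumes the threshold is compared inclusively against the quad area divided by the full image area.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y grows downward


def order_corners(points: Sequence[Point]) -> List[Point]:
    """Order four corners as TL, TR, BR, BL using the sum/diff trick."""
    if len(points) != 4:
        raise ValueError("expected exactly four points")
    # x + y is smallest at the top-left corner and largest at the bottom-right.
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    # y - x is smallest at the top-right corner and largest at the bottom-left.
    top_right = min(points, key=lambda p: p[1] - p[0])
    bottom_left = max(points, key=lambda p: p[1] - p[0])
    return [top_left, top_right, bottom_right, bottom_left]


def quad_area_fraction(quad: Sequence[Point], width: int, height: int) -> float:
    """Shoelace area of the quad divided by the image area."""
    ordered = order_corners(quad)
    twice_area = 0.0
    for i in range(4):
        x1, y1 = ordered[i]
        x2, y2 = ordered[(i + 1) % 4]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0 / (width * height)


def accept_quad(
    quad: Sequence[Point],
    width: int,
    height: int,
    min_fraction: float = 0.20,  # mirrors preprocess_min_quad_area_fraction
) -> Optional[List[Point]]:
    """Return the ordered quad, or None when it covers too little of the image."""
    if quad_area_fraction(quad, width, height) < min_fraction:
        return None  # assumed fallback signal: caller keeps the original image
    return order_corners(quad)
```

Returning None rather than raising matches the graceful-fallback behavior the commit describes, where a failed detection leaves the original image untouched. The sum/diff trick can pick the same point twice for strongly rotated quads, so a real implementation may need an extra guard.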
49 lines
1.4 KiB
Plaintext
# ==== App ====
APP_ENV=local # local | dev | staging | prod
APP_HOST=0.0.0.0
APP_PORT=8000
APP_LOG_LEVEL=INFO

# ==== Storage (Phase 1: local filesystem) ====
STORAGE_LOCAL_DIR=./storage

# ==== OCR ====
OCR_LANG=latin # PaddleOCR lang code; "latin" works well for Bahasa Indonesia
OCR_USE_GPU=false # set true if running on a GPU host
OCR_DET_MODEL_DIR= # leave empty to use PaddleOCR defaults
OCR_REC_MODEL_DIR=
OCR_CLS_MODEL_DIR=
OCR_MAX_IMAGE_SIDE=2200 # downscale longest side before OCR

# ==== Preprocessing ====
PREPROCESS_TARGET_DPI=300
PREPROCESS_DENOISE=true
PREPROCESS_DESKEW=true
PREPROCESS_ADAPTIVE_THRESHOLD=false # turn on for low-quality phone photos

# ==== Document detection (Phase 2, IMAGE sources only) ====
PREPROCESS_DETECT_DOCUMENT=true
PREPROCESS_REMOVE_SHADOW=true
PREPROCESS_MIN_QUAD_AREA_FRACTION=0.20

# ==== Confidence / routing (Phase 5) ====
CONFIDENCE_AUTO_APPROVE=0.95
CONFIDENCE_NEEDS_REVIEW=0.85

# ==== LLM (Phase 5, optional) ====
LLM_ENABLED=false
LLM_PROVIDER=ollama
LLM_MODEL=qwen2.5:1.5b # CPU-friendly default
LLM_BASE_URL=http://localhost:11434
LLM_TIMEOUT_S=60

# ==== Async pipeline (Phase 4, optional) ====
QUEUE_ENABLED=false
REDIS_URL=redis://localhost:6379/0
DATABASE_URL=postgresql+psycopg://ocr:ocr@localhost:5432/ocr_sprint
MINIO_ENDPOINT=localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_BUCKET=ocr-sprint
MINIO_SECURE=false
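
As an illustration of how the three document-detection variables might be read into typed values, here is a standard-library sketch. It is not the project's settings code: DocumentDetectSettings, from_env, and _parse_bool are hypothetical names, the accepted boolean spellings are an assumption, and the range check on the area fraction is an added safeguard rather than documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def _parse_bool(raw: str) -> bool:
    """Parse a boolean flag; the accepted spellings are an assumption."""
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


@dataclass(frozen=True)
class DocumentDetectSettings:
    """Hypothetical typed view of the document-detection variables."""

    detect_document: bool = True
    remove_shadow: bool = True
    min_quad_area_fraction: float = 0.20

    @classmethod
    def from_env(
        cls, env: Optional[Mapping[str, str]] = None
    ) -> "DocumentDetectSettings":
        # Fall back to the process environment when no mapping is passed in.
        source = os.environ if env is None else env
        fraction = float(source.get("PREPROCESS_MIN_QUAD_AREA_FRACTION", "0.20"))
        # Assumed sanity check: a fraction of the image area must lie in (0, 1].
        if not 0.0 < fraction <= 1.0:
            raise ValueError(f"fraction out of range: {fraction}")
        return cls(
            detect_document=_parse_bool(
                source.get("PREPROCESS_DETECT_DOCUMENT", "true")
            ),
            remove_shadow=_parse_bool(
                source.get("PREPROCESS_REMOVE_SHADOW", "true")
            ),
            min_quad_area_fraction=fraction,
        )
```

Passing a plain dict instead of relying on os.environ keeps the loader easy to unit test, for example `DocumentDetectSettings.from_env({"PREPROCESS_REMOVE_SHADOW": "false"})`.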