OCR-SPRIN-SERVICE/.env.example
Devin AI d0e1835cc1 Phase 2: document detection + perspective correction + shadow removal
Adds OpenCV-based phone-photo handling that runs before the standard
preprocessing pipeline for IMAGE source kinds (PDF renders are flat by
construction and skip this stage).

Pipeline additions in src/ocr_sprint/pipeline/document_detect.py:
- _find_document_quad: Canny + dilate + contour search, picks the
  largest convex 4-point polygon above a configurable area threshold;
  fails gracefully and returns None when no usable quad is found.
- _four_point_warp: orders corners (TL/TR/BR/BL via sum/diff trick)
  and runs cv2.getPerspectiveTransform + warpPerspective.
- _remove_shadow: per-channel background-division (dilate + median
  blur + 255 - absdiff + normalize) for uneven phone-shot lighting.
- detect_and_correct: top-level entrypoint with graceful fallback
  to the original image when detection fails.
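
The sum/diff corner-ordering trick mentioned for _four_point_warp can be sketched in plain Python. This is an illustration only, not the code in document_detect.py: the name `order_corners` is hypothetical, and it assumes image coordinates where y grows downward.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def order_corners(pts: List[Point]) -> List[Point]:
    """Return four points as [TL, TR, BR, BL] (y grows downward).

    Hypothetical helper illustrating the sum/diff trick.
    """
    if len(pts) != 4:
        raise ValueError("expected exactly 4 points")
    # x + y is smallest at the top-left corner and largest at the bottom-right.
    tl = min(pts, key=lambda p: p[0] + p[1])
    br = max(pts, key=lambda p: p[0] + p[1])
    # y - x is smallest at the top-right corner and largest at the bottom-left.
    tr = min(pts, key=lambda p: p[1] - p[0])
    bl = max(pts, key=lambda p: p[1] - p[0])
    return [tl, tr, br, bl]


if __name__ == "__main__":
    # Corners of a mildly skewed page, given in arbitrary order.
    print(order_corners([(90, 10), (12, 95), (8, 5), (100, 110)]))
```

The trick is cheap but can mislabel corners when the quad is rotated close to 45 degrees, since two corners then have nearly equal sums or differences.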

Wired into the synchronous orchestrator: only enabled for IMAGE
sources, skipped for PDF. New settings:
- preprocess_detect_document (default: true)
- preprocess_remove_shadow (default: true)
- preprocess_min_quad_area_fraction (default: 0.20)
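
To make the area threshold concrete, here is a minimal sketch of how a candidate quad could be checked against preprocess_min_quad_area_fraction. It assumes the fraction is quad area divided by full image area; the function names are hypothetical and the real check may differ.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(pts: Sequence[Point]) -> float:
    """Shoelace formula; vertices must be listed in order around the polygon."""
    total = 0.0
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def quad_is_large_enough(
    quad: Sequence[Point],
    image_width: int,
    image_height: int,
    min_area_fraction: float = 0.20,  # default taken from the setting above
) -> bool:
    """Hypothetical check: keep the quad only if it covers enough of the image."""
    image_area = float(image_width * image_height)
    if image_area <= 0:
        return False
    return polygon_area(quad) / image_area >= min_area_fraction
```

With the default of 0.20, a quad covering 60% of a 1000x800 photo passes, while a 100x100 blob in the corner (about 1%) is rejected and the caller would fall back to the original image.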

Tests: 9 new unit tests covering corner ordering, quad detection on
synthetic skewed documents, perspective warp output sanity, shadow
removal shape preservation, full-pipeline behavior, and graceful
fallback when detection fails. 70 tests total, all green.

ML-based dewarping (DewarpNet) and DocTR detector are deferred to a
future phase per the roadmap; the existing API is structured so they
can be added as alternative backends behind DocumentDetectConfig.
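
As a rough illustration of the "alternative backends" idea combined with the graceful fallback, one possible shape is a small registry keyed by a backend name. Everything here is hypothetical (`DetectConfigSketch`, `register_backend`, `detect_with_fallback`); it is not the actual DocumentDetectConfig API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical stand-in: a detector takes an image-like object and returns
# a corrected image, or None when it cannot find a document.
Detector = Callable[[object], Optional[object]]

_BACKENDS: Dict[str, Detector] = {}


def register_backend(name: str, detector: Detector) -> None:
    """Make a detector selectable by name."""
    _BACKENDS[name] = detector


@dataclass(frozen=True)
class DetectConfigSketch:
    backend: str = "opencv"  # assumed name for the classical CV backend
    enabled: bool = True


def detect_with_fallback(image: object, config: DetectConfigSketch) -> object:
    """Run the configured backend; return the input unchanged on failure."""
    if not config.enabled:
        return image
    detector = _BACKENDS.get(config.backend)
    if detector is None:
        raise KeyError(f"unknown backend: {config.backend!r}")
    corrected = detector(image)
    # Graceful fallback: a None result means "no usable document found".
    return image if corrected is None else corrected
```

Under this sketch, an ML-based detector would only need to be registered under a new name; callers keep using the same entrypoint.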

Co-authored-by: adrian kuman firmansah <adriancuman@gmail.com>
2026-04-25 15:06:58 +00:00


# ==== App ====
APP_ENV=local # local | dev | staging | prod
APP_HOST=0.0.0.0
APP_PORT=8000
APP_LOG_LEVEL=INFO
# ==== Storage (Phase 1: local filesystem) ====
STORAGE_LOCAL_DIR=./storage
# ==== OCR ====
OCR_LANG=latin # PaddleOCR lang code; "latin" works well for Bahasa Indonesia
OCR_USE_GPU=false # set true if running on a GPU host
# leave the model dirs empty to use PaddleOCR defaults
OCR_DET_MODEL_DIR=
OCR_REC_MODEL_DIR=
OCR_CLS_MODEL_DIR=
OCR_MAX_IMAGE_SIDE=2200 # downscale longest side before OCR
# ==== Preprocessing ====
PREPROCESS_TARGET_DPI=300
PREPROCESS_DENOISE=true
PREPROCESS_DESKEW=true
PREPROCESS_ADAPTIVE_THRESHOLD=false # turn on for low-quality phone photos
# ==== Document detection (Phase 2, IMAGE sources only) ====
PREPROCESS_DETECT_DOCUMENT=true
PREPROCESS_REMOVE_SHADOW=true
PREPROCESS_MIN_QUAD_AREA_FRACTION=0.20
# ==== Confidence / routing (Phase 5) ====
CONFIDENCE_AUTO_APPROVE=0.95
CONFIDENCE_NEEDS_REVIEW=0.85
# ==== LLM (Phase 5, optional) ====
LLM_ENABLED=false
LLM_PROVIDER=ollama
LLM_MODEL=qwen2.5:1.5b # CPU-friendly default
LLM_BASE_URL=http://localhost:11434
LLM_TIMEOUT_S=60
# ==== Async pipeline (Phase 4, optional) ====
QUEUE_ENABLED=false
REDIS_URL=redis://localhost:6379/0
DATABASE_URL=postgresql+psycopg://ocr:ocr@localhost:5432/ocr_sprint
MINIO_ENDPOINT=localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_BUCKET=ocr-sprint
MINIO_SECURE=false
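
For reference, a minimal sketch of reading the three Phase 2 variables with the same defaults as this file. The service's real settings loader is not shown here, so `Phase2Settings`, `parse_bool`, and `load_phase2_settings` are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def parse_bool(raw: str) -> bool:
    """Parse common truthy/falsy spellings; reject anything else."""
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


@dataclass(frozen=True)
class Phase2Settings:
    detect_document: bool
    remove_shadow: bool
    min_quad_area_fraction: float


def load_phase2_settings(env: Optional[Mapping[str, str]] = None) -> Phase2Settings:
    """Read the Phase 2 variables, using the defaults from .env.example."""
    env = os.environ if env is None else env
    fraction = float(env.get("PREPROCESS_MIN_QUAD_AREA_FRACTION", "0.20"))
    if not 0.0 < fraction <= 1.0:
        raise ValueError("PREPROCESS_MIN_QUAD_AREA_FRACTION must be in (0, 1]")
    return Phase2Settings(
        detect_document=parse_bool(env.get("PREPROCESS_DETECT_DOCUMENT", "true")),
        remove_shadow=parse_bool(env.get("PREPROCESS_REMOVE_SHADOW", "true")),
        min_quad_area_fraction=fraction,
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test, e.g. `load_phase2_settings({"PREPROCESS_REMOVE_SHADOW": "false"})`.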