Adds OpenCV-based phone-photo handling that runs before the standard
preprocessing pipeline for IMAGE source kinds (PDF renders are flat by
construction and skip this stage).
Pipeline additions in src/ocr_sprint/pipeline/document_detect.py:
- _find_document_quad: Canny + dilate + contour search, picks the
largest convex 4-point polygon above a configurable area threshold;
fails gracefully and returns None when no usable quad is found.
- _four_point_warp: orders corners (TL/TR/BR/BL via sum/diff trick)
and runs cv2.getPerspectiveTransform + warpPerspective.
- _remove_shadow: per-channel background-division (dilate + median
blur + 255 - absdiff + normalize) for uneven phone-shot lighting.
- detect_and_correct: top-level entrypoint with graceful fallback
to the original image when detection fails.
Wired into the synchronous orchestrator: only enabled for IMAGE
sources, skipped for PDF. New settings:
- preprocess_detect_document (default: true)
- preprocess_remove_shadow (default: true)
- preprocess_min_quad_area_fraction (default: 0.20)
Tests: 9 new unit tests covering corner ordering, quad detection on
synthetic skewed documents, perspective warp output sanity, shadow
removal shape preservation, full-pipeline behavior, and graceful
fallback when detection fails. 70 tests total, all green.
ML-based dewarping (DewarpNet) and DocTR detector are deferred to a
future phase per the roadmap; the existing API is structured so they
can be added as alternative backends behind DocumentDetectConfig.
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: adrian kuman firmansah <adriancuman@gmail.com>
Adds OpenCV-based phone-photo handling that runs before the standard
preprocessing pipeline for IMAGE source kinds (PDF renders are flat by
construction and skip this stage).
Pipeline additions in src/ocr_sprint/pipeline/document_detect.py:
- _find_document_quad: Canny + dilate + contour search, picks the
largest convex 4-point polygon above a configurable area threshold;
fails gracefully and returns None when no usable quad is found.
- _four_point_warp: orders corners (TL/TR/BR/BL via sum/diff trick)
and runs cv2.getPerspectiveTransform + warpPerspective.
- _remove_shadow: per-channel background-division (dilate + median
blur + 255 - absdiff + normalize) for uneven phone-shot lighting.
- detect_and_correct: top-level entrypoint with graceful fallback
to the original image when detection fails.
Wired into the synchronous orchestrator: only enabled for IMAGE
sources, skipped for PDF. New settings:
- preprocess_detect_document (default: true)
- preprocess_remove_shadow (default: true)
- preprocess_min_quad_area_fraction (default: 0.20)
Tests: 9 new unit tests covering corner ordering, quad detection on
synthetic skewed documents, perspective warp output sanity, shadow
removal shape preservation, full-pipeline behavior, and graceful
fallback when detection fails. 70 tests total, all green.
ML-based dewarping (DewarpNet) and DocTR detector are deferred to a
future phase per the roadmap; the existing API is structured so they
can be added as alternative backends behind DocumentDetectConfig.
Co-authored-by: adrian kuman firmansah <adriancuman@gmail.com>