OCR-SPRIN-SERVICE

Files

Devin AI 6003d96a94 Phase 7: ground-truth export (JSONL + stats) + CLI tool

- GET /api/v1/ground-truth/export: streaming JSONL (filters:
  approved_only, since, until, has_corrections, limit)
- GET /api/v1/ground-truth/stats: total / approved / corrections
  counts + top-N most-corrected field paths
- python -m ocr_sprint.tools.export_ground_truth: operator CLI with
  the same filters + optional --print-stats
- Each ground-truth sample reconstructs the pipeline's original output
  by replaying job_corrections in reverse
- docs/ground-truth-format.md: schema + fine-tuning guidance
- 17 new tests (service replay, endpoint filters, CLI)
- 201 total tests passing, ruff / mypy --strict clean

Co-Authored-By: adrian kuman firmansah <adriancuman@gmail.com>

2026-04-25 20:24:40 +00:00
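
The reverse replay mentioned in the commit message could look roughly like the sketch below. It is an illustration only, not the code in this commit: the `Correction` fields (`field_path`, `old_value`, `new_value`), the dotted-path convention, and the helper names are all assumptions, and the real `job_corrections` schema may differ. The idea is that the stored result already contains every correction, so undoing them newest-first restores what the pipeline originally produced.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Correction:
    # Hypothetical shape of one job_corrections row; real columns may differ.
    field_path: str  # dotted path, e.g. "header.invoice_no" (assumed convention)
    old_value: Any   # value before this correction was applied
    new_value: Any   # value after this correction was applied


def set_path(doc: dict[str, Any], path: str, value: Any) -> None:
    """Set a dotted path inside nested dicts, creating parents as needed."""
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def reconstruct_original(
    corrected: dict[str, Any], corrections: list[Correction]
) -> dict[str, Any]:
    """Undo corrections newest-first to recover the pre-correction output.

    `corrections` is assumed to be ordered oldest-first.
    """
    original = copy.deepcopy(corrected)  # never mutate the stored result
    for correction in reversed(corrections):
        set_path(original, correction.field_path, correction.old_value)
    return original


# Made-up sample data: the same field was corrected twice.
corrected = {"header": {"invoice_no": "INV-0042", "total": "150.00"}}
history = [
    Correction("header.invoice_no", "1NV-0042", "INV-0O42"),
    Correction("header.invoice_no", "INV-0O42", "INV-0042"),
    Correction("header.total", "15O.00", "150.00"),
]
original = reconstruct_original(corrected, history)
```

Order matters when a field is corrected more than once: walking the history newest-first means the oldest `old_value` is written last, so `header.invoice_no` ends up as the first recorded pre-correction value rather than an intermediate one.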

architecture.md

Phase 1 MVP: synchronous OCR + regex header extraction

2026-04-25 14:58:50 +00:00

ground-truth-format.md

Phase 7: ground-truth export (JSONL + stats) + CLI tool

2026-04-25 20:24:40 +00:00