Phase 1 MVP: synchronous OCR + regex header extraction
Implements the foundation of the OCR Sprint service:

- FastAPI app with /api/v1/health and /api/v1/documents (sync upload)
- Pydantic v2 schemas for documents, extraction result, personnel
- Pipeline: PDF/image ingest (PyMuPDF), preprocessing (resize, deskew, denoise, optional adaptive threshold), PaddleOCR wrapper, regex-based header extraction (nomor sprint, tanggal, satuan, perihal, dasar), signatory NRP, master-pangkat validation, confidence scoring + routing.
- Tests: 61 unit tests covering regex rules, validators, preprocess, ingest, confidence, and API contract (PaddleOCR mocked).
- Tooling: pyproject (setuptools), ruff, mypy strict, pytest, pre-commit, Dockerfile, docker-compose, Makefile.
- Docs: README + docs/architecture.md (full hybrid stack rationale and 6-phase roadmap).

Co-authored-by: adrian kuman firmansah <adriancuman@gmail.com>
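To illustrate the kind of regex-based header extraction listed above, here is a minimal sketch for the tanggal (date) field. It is not the code from this commit: the function name `extract_tanggal`, the `MONTHS` table, and the assumed "day month-name year" layout with Indonesian month names are all hypothetical, chosen only for illustration.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical lookup table: Indonesian month names mapped to month numbers.
MONTHS = {
    "januari": 1, "februari": 2, "maret": 3, "april": 4,
    "mei": 5, "juni": 6, "juli": 7, "agustus": 8,
    "september": 9, "oktober": 10, "november": 11, "desember": 12,
}

# Assumed layout: "<day> <month name> <4-digit year>", e.g. "5 Maret 2024".
DATE_RE = re.compile(
    r"\b(\d{1,2})\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)


def extract_tanggal(text: str) -> Optional[date]:
    """Return the first date found in OCR text, or None if none parses."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    day, month_name, year = match.groups()
    try:
        return date(int(year), MONTHS[month_name.lower()], int(day))
    except ValueError:
        # The pattern matched but the calendar date is invalid
        # (for example day 31 in a 28/29-day month).
        return None
```

Returning `None` instead of raising keeps a failed field separate from a failed document, which would let a later confidence step decide how to route the result.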
23
docker-compose.yml
Normal file
@@ -0,0 +1,23 @@
# Phase 1 MVP compose: API only.
# Phase 4 will add redis, postgres, minio, and worker services.
services:
  api:
    build:
      context: .
      dockerfile: Dockerfile
    image: ocr-sprint-service:dev
    container_name: ocr-sprint-api
    ports:
      - "8000:8000"
    environment:
      APP_ENV: local
      APP_LOG_LEVEL: INFO
      OCR_USE_GPU: "false"
      STORAGE_LOCAL_DIR: /app/storage
    volumes:
      - ./storage:/app/storage
      - paddle-models:/home/app/.paddleocr
    restart: unless-stopped

volumes:
  paddle-models:
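The `environment` block above passes every value to the container as a string, including `OCR_USE_GPU: "false"`. The sketch below shows one way the service side could read these four variables. The `Settings` dataclass and `load_settings` function are hypothetical names, not taken from this commit, and the defaults simply mirror the compose values.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Strings treated as true; anything else (including "false") is false.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container for the variables set in compose."""
    app_env: str
    log_level: str
    ocr_use_gpu: bool
    storage_local_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping (defaults mirror compose)."""
    return Settings(
        app_env=env.get("APP_ENV", "local"),
        log_level=env.get("APP_LOG_LEVEL", "INFO").upper(),
        # bool("false") is True in Python, so compare the text explicitly.
        ocr_use_gpu=env.get("OCR_USE_GPU", "false").strip().lower() in _TRUTHY,
        storage_local_dir=env.get("STORAGE_LOCAL_DIR", "/app/storage"),
    )
```

Taking the mapping as a parameter makes the parser testable without touching the real process environment. The explicit string comparison matters because any non-empty string, including `"false"`, is truthy in Python.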