OCR-SPRIN-SERVICE/.env.example
Devin AI ca0c0a0428 Phase 1 MVP: synchronous OCR + regex header extraction
Implements the foundation of the OCR Sprint service:
- FastAPI app with /api/v1/health and /api/v1/documents (sync upload)
- Pydantic v2 schemas for documents, extraction result, personnel
- Pipeline: PDF/image ingest (PyMuPDF), preprocessing (resize, deskew,
  denoise, optional adaptive threshold), PaddleOCR wrapper, regex-based
  header extraction (nomor sprint, tanggal, satuan, perihal, dasar),
  signatory NRP, master-pangkat validation, confidence scoring + routing.
- Tests: 61 unit tests covering regex rules, validators, preprocess,
  ingest, confidence, and API contract (PaddleOCR mocked).
- Tooling: pyproject (setuptools), ruff, mypy strict, pytest, pre-commit,
  Dockerfile, docker-compose, Makefile.
- Docs: README + docs/architecture.md (full hybrid stack rationale and
  6-phase roadmap).

Co-authored-by: adrian kuman firmansah <adriancuman@gmail.com>
2026-04-25 14:58:50 +00:00

# ==== App ====
APP_ENV=local # local | dev | staging | prod
APP_HOST=0.0.0.0
APP_PORT=8000
APP_LOG_LEVEL=INFO

# ==== Storage (Phase 1: local filesystem) ====
STORAGE_LOCAL_DIR=./storage

# ==== OCR ====
OCR_LANG=latin # PaddleOCR lang code; "latin" works well for Bahasa Indonesia
OCR_USE_GPU=false # set true if running on a GPU host
OCR_DET_MODEL_DIR= # leave empty to use PaddleOCR defaults
OCR_REC_MODEL_DIR=
OCR_CLS_MODEL_DIR=
OCR_MAX_IMAGE_SIDE=2200 # downscale longest side before OCR

# ==== Preprocessing ====
PREPROCESS_TARGET_DPI=300
PREPROCESS_DENOISE=true
PREPROCESS_DESKEW=true
PREPROCESS_ADAPTIVE_THRESHOLD=false # turn on for low-quality phone photos

# ==== Confidence / routing (Phase 5) ====
CONFIDENCE_AUTO_APPROVE=0.95
CONFIDENCE_NEEDS_REVIEW=0.85

# ==== LLM (Phase 5, optional) ====
LLM_ENABLED=false
LLM_PROVIDER=ollama
LLM_MODEL=qwen2.5:1.5b # CPU-friendly default
LLM_BASE_URL=http://localhost:11434
LLM_TIMEOUT_S=60

# ==== Async pipeline (Phase 4, optional) ====
QUEUE_ENABLED=false
REDIS_URL=redis://localhost:6379/0
DATABASE_URL=postgresql+psycopg://ocr:ocr@localhost:5432/ocr_sprint
MINIO_ENDPOINT=localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_BUCKET=ocr-sprint
MINIO_SECURE=false
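
The file above relies on inline `#` comments and on two confidence thresholds. How the service itself reads and applies them is not shown here, so the following is only a minimal stdlib sketch of one plausible interpretation. The helper names `parse_env_text` and `route`, the bucket labels, and the assumption that scores below `CONFIDENCE_NEEDS_REVIEW` fall into a third "manual" bucket are all hypothetical, not taken from the repository.

```python
from __future__ import annotations


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from .env-style text (hypothetical helper).

    Skips blank lines and full-line comments, and strips inline comments
    that start with a space followed by '#'. Quoted values are not handled.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Prefix a space so "KEY= # comment" also yields an empty value.
        value = (" " + value).split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def route(score: float, auto_approve: float, needs_review: float) -> str:
    """Map a confidence score to a bucket (labels are assumptions)."""
    if score >= auto_approve:
        return "auto_approve"
    if score >= needs_review:
        return "needs_review"
    # Assumed third bucket for anything below the review threshold.
    return "manual"


SAMPLE = """\
APP_ENV=local # local | dev | staging | prod
OCR_DET_MODEL_DIR= # leave empty to use PaddleOCR defaults
CONFIDENCE_AUTO_APPROVE=0.95
CONFIDENCE_NEEDS_REVIEW=0.85
"""

env = parse_env_text(SAMPLE)
auto = float(env["CONFIDENCE_AUTO_APPROVE"])
review = float(env["CONFIDENCE_NEEDS_REVIEW"])

for score in (0.97, 0.90, 0.60):
    print(score, route(score, auto, review))
```

Not every tool that consumes env files strips inline comments the same way, so if this file is fed to something other than the application's own loader, moving the comments onto their own lines is the safer choice.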