Implements the foundation of the OCR Sprint service:

- FastAPI app with /api/v1/health and /api/v1/documents (sync upload)
- Pydantic v2 schemas for documents, extraction result, personnel
- Pipeline: PDF/image ingest (PyMuPDF), preprocessing (resize, deskew, denoise, optional adaptive threshold), PaddleOCR wrapper, regex-based header extraction (nomor sprint, tanggal, satuan, perihal, dasar), signatory NRP, master-pangkat validation, confidence scoring + routing.
- Tests: 61 unit tests covering regex rules, validators, preprocess, ingest, confidence, and API contract (PaddleOCR mocked).
- Tooling: pyproject (setuptools), ruff, mypy strict, pytest, pre-commit, Dockerfile, docker-compose, Makefile.
- Docs: README + docs/architecture.md (full hybrid stack rationale and 6-phase roadmap).

Co-authored-by: adrian kuman firmansah <adriancuman@gmail.com>
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
dist/
*.egg-info/
*.egg
.pytest_cache/
.mypy_cache/
.ruff_cache/
.coverage
.coverage.*
htmlcov/
coverage.xml
.tox/
.nox/

# Virtual environments
.venv/
venv/
env/
ENV/

# IDE
.idea/
.vscode/
*.swp
*.swo
.DS_Store

# Environment / secrets
.env
.env.*
!.env.example

# Local data & artifacts
samples/*.pdf
samples/*.PDF
samples/*.jpg
samples/*.JPG
samples/*.jpeg
samples/*.png
samples/*.PNG
samples/*.tif
samples/*.tiff
!samples/README.md
data/local/
storage/
*.db
*.sqlite
*.sqlite3

# OCR / model caches
.paddleocr/
# (~/.paddleocr/ lives outside the repository; gitignore does not expand "~", so no pattern is needed)
models/downloaded/

# Logs
logs/
*.log

# Docker
.docker/

# Misc
*.bak
*.tmp