OCR-SPRIN-SERVICE/.gitignore
Devin AI ca0c0a0428 Phase 1 MVP: synchronous OCR + regex header extraction
Implements the foundation of the OCR Sprint service:
- FastAPI app with /api/v1/health and /api/v1/documents (sync upload)
- Pydantic v2 schemas for documents, extraction result, personnel
- Pipeline: PDF/image ingest (PyMuPDF), preprocessing (resize, deskew,
  denoise, optional adaptive threshold), PaddleOCR wrapper, regex-based
  header extraction (nomor sprint = sprint number, tanggal = date,
  satuan = unit, perihal = subject, dasar = basis), signatory NRP
  (Nomor Registrasi Pokok, the personnel registration number),
  master-pangkat (pangkat = rank) validation, confidence scoring + routing.
- Tests: 61 unit tests covering regex rules, validators, preprocess,
  ingest, confidence, and API contract (PaddleOCR mocked).
- Tooling: pyproject (setuptools), ruff, mypy strict, pytest, pre-commit,
  Dockerfile, docker-compose, Makefile.
- Docs: README + docs/architecture.md (full hybrid stack rationale and
  6-phase roadmap).

Co-authored-by: adrian kuman firmansah <adriancuman@gmail.com>
2026-04-25 14:58:50 +00:00
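
The regex-based header extraction and confidence routing described in the commit message could look roughly like the sketch below. This is a minimal illustration, not the repository's code: the two patterns, the `Field` dataclass, the binary confidence values, the 0.8 threshold, and the `auto_accept` / `manual_review` route names are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns; the real rules in the repository are not shown here.
PATTERNS = {
    "nomor_sprint": re.compile(
        r"Nomor\s*:\s*(Sprin/\d+/[IVXLC]+/\d{4})", re.IGNORECASE
    ),
    "tanggal": re.compile(
        r"Tanggal\s*:\s*(\d{1,2}\s+\w+\s+\d{4})", re.IGNORECASE
    ),
}


@dataclass
class Field:
    name: str
    value: Optional[str]
    confidence: float


def extract_headers(text: str) -> list[Field]:
    """Run every pattern over the OCR text and record what was found."""
    fields = []
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        # Assumed scoring: 1.0 when the pattern matched, 0.0 otherwise.
        fields.append(
            Field(name, match.group(1) if match else None, 1.0 if match else 0.0)
        )
    return fields


def route(fields: list[Field], threshold: float = 0.8) -> str:
    """Average the field confidences and pick a (hypothetical) route name."""
    score = sum(f.confidence for f in fields) / len(fields)
    return "auto_accept" if score >= threshold else "manual_review"
```

A real scorer would presumably also weigh the OCR engine's own per-line confidence; the binary score here only keeps the sketch short.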


# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
dist/
*.egg-info/
*.egg
.pytest_cache/
.mypy_cache/
.ruff_cache/
.coverage
.coverage.*
htmlcov/
coverage.xml
.tox/
.nox/
# Virtual environments
.venv/
venv/
env/
ENV/
# IDE
.idea/
.vscode/
*.swp
*.swo
.DS_Store
# Environment / secrets
.env
.env.*
!.env.example
# Local data & artifacts
samples/*.pdf
samples/*.PDF
samples/*.jpg
samples/*.JPG
samples/*.jpeg
samples/*.JPEG
samples/*.png
samples/*.PNG
samples/*.tif
samples/*.TIF
samples/*.tiff
samples/*.TIFF
!samples/README.md
data/local/
storage/
*.db
*.sqlite
*.sqlite3
# OCR / model caches
.paddleocr/
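# The per-user cache in $HOME/.paddleocr is outside the repository and needs
# no rule here; git does not expand "~" in ignore patterns.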
models/downloaded/
# Logs
logs/
*.log
# Docker
.docker/
# Misc
*.bak
*.tmp
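
The environment rules in this file (`.env`, `.env.*`, `!.env.example`) rely on the last matching pattern winning, with `!` re-including a path. The sketch below models only that ordering with Python's `fnmatch`, applied to bare file names. It is a deliberate simplification and not git's actual matcher: directory patterns, `/` anchoring, and `**` are ignored, and `is_ignored` and `ENV_RULES` are names made up for the example.

```python
from fnmatch import fnmatchcase


def is_ignored(name: str, patterns: list[str]) -> bool:
    """Apply patterns in order; the last one that matches decides."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatchcase(name, glob):
            # A "!" pattern re-includes the file, any other pattern ignores it.
            ignored = not negated
    return ignored


# The three environment rules from the file above, in the same order.
ENV_RULES = [".env", ".env.*", "!.env.example"]

if __name__ == "__main__":
    for candidate in (".env", ".env.local", ".env.example", "README.md"):
        print(candidate, is_ignored(candidate, ENV_RULES))
```

In a real checkout, `git check-ignore -v <path>` reports which rule, if any, applies to a given path.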