Implements the foundation of the OCR Sprint service: - FastAPI app with /api/v1/health and /api/v1/documents (sync upload) - Pydantic v2 schemas for documents, extraction result, personnel - Pipeline: PDF/image ingest (PyMuPDF), preprocessing (resize, deskew, denoise, optional adaptive threshold), PaddleOCR wrapper, regex-based header extraction (nomor sprint, tanggal, satuan, perihal, dasar), signatory NRP, master-pangkat validation, confidence scoring + routing. - Tests: 61 unit tests covering regex rules, validators, preprocess, ingest, confidence, and API contract (PaddleOCR mocked). - Tooling: pyproject (setuptools), ruff, mypy strict, pytest, pre-commit, Dockerfile, docker-compose, Makefile. - Docs: README + docs/architecture.md (full hybrid stack rationale and 6-phase roadmap). Co-authored-by: adrian kuman firmansah <adriancuman@gmail.com>
20 lines
478 B
YAML
20 lines
478 B
YAML
repos:
|
|
- repo: https://github.com/pre-commit/pre-commit-hooks
|
|
rev: v4.6.0
|
|
hooks:
|
|
- id: trailing-whitespace
|
|
- id: end-of-file-fixer
|
|
- id: check-yaml
|
|
- id: check-toml
|
|
- id: check-added-large-files
|
|
args: ["--maxkb=1024"]
|
|
- id: check-merge-conflict
|
|
- id: detect-private-key
|
|
|
|
- repo: https://github.com/astral-sh/ruff-pre-commit
|
|
rev: v0.6.9
|
|
hooks:
|
|
- id: ruff
|
|
args: ["--fix"]
|
|
- id: ruff-format
|