Phase 1 MVP: synchronous OCR + regex header extraction
Implements the foundation of the OCR Sprint service:

- FastAPI app with /api/v1/health and /api/v1/documents (sync upload)
- Pydantic v2 schemas for documents, extraction result, personnel
- Pipeline: PDF/image ingest (PyMuPDF), preprocessing (resize, deskew, denoise, optional adaptive threshold), PaddleOCR wrapper, regex-based header extraction of nomor sprint (order number), tanggal (date), satuan (unit), perihal (subject) and dasar (legal basis), signatory NRP (personnel registration number), master-pangkat (rank master list) validation, confidence scoring + routing.
- Tests: 61 unit tests covering regex rules, validators, preprocess, ingest, confidence, and API contract (PaddleOCR mocked).
- Tooling: pyproject (setuptools), ruff, mypy strict, pytest, pre-commit, Dockerfile, docker-compose, Makefile.
- Docs: README + docs/architecture.md (full hybrid stack rationale and 6-phase roadmap).

Co-authored-by: adrian kuman firmansah <adriancuman@gmail.com>
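The commit lists confidence scoring, master-pangkat validation and routing as pipeline stages but does not spell out the rules. Below is a minimal sketch of how such a stage could work. It assumes that each extracted header field carries an OCR confidence in [0, 1]. The score is the mean over the five header fields, a missing field counts as 0, and a signatory rank outside the master list halves the score. `FieldResult`, `score_document`, `route`, the 0.85/0.5 thresholds, the penalty factor and the rank subset are all hypothetical, not the service's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative subset only (hypothetical); the real master-pangkat list is loaded by the service.
MASTER_PANGKAT: frozenset[str] = frozenset(
    {"IPDA", "IPTU", "AKP", "KOMPOL", "AKBP", "KOMBES POL"}
)

# Header fields named in the commit message.
REQUIRED_FIELDS: tuple[str, ...] = ("nomor_sprint", "tanggal", "satuan", "perihal", "dasar")


@dataclass(frozen=True)
class FieldResult:
    value: str | None
    ocr_confidence: float  # assumed: mean OCR score (0..1) of the lines the value came from


def score_document(fields: dict[str, FieldResult], pangkat: str | None) -> float:
    """Mean per-field confidence; a missing field scores 0, an unknown rank halves it."""
    per_field = []
    for name in REQUIRED_FIELDS:
        result = fields.get(name)
        has_value = result is not None and bool(result.value)
        per_field.append(result.ocr_confidence if result is not None and has_value else 0.0)
    score = sum(per_field) / len(REQUIRED_FIELDS)
    if pangkat is None or pangkat.strip().upper() not in MASTER_PANGKAT:
        score *= 0.5  # hypothetical penalty for failing master-pangkat validation
    return score


def route(score: float, accept_at: float = 0.85, review_at: float = 0.5) -> str:
    """Map a document score to a hypothetical routing decision."""
    if score >= accept_at:
        return "auto_accept"
    if score >= review_at:
        return "manual_review"
    return "reject"
```

For example, five fields at 0.95 with pangkat `AKBP` route to `auto_accept`. Dropping `dasar` lowers the score to 0.76, which routes to `manual_review`. A multiplicative rank penalty is only one simple choice. It keeps the penalty proportional to how confident the rest of the header already is.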
tests/conftest.py (new file, 43 lines)
@@ -0,0 +1,43 @@
"""Shared pytest fixtures."""

from __future__ import annotations

import numpy as np
import numpy.typing as npt
import pytest


@pytest.fixture
def blank_bgr_image() -> npt.NDArray[np.uint8]:
    """White BGR image, 600 high x 800 wide, uint8; for preprocessing smoke tests."""
    return np.full((600, 800, 3), 255, dtype=np.uint8)


@pytest.fixture
def sample_sprint_text() -> str:
    """Realistic-but-synthetic OCR text for regex extractor tests."""
    return (
        "KEPOLISIAN NEGARA REPUBLIK INDONESIA\n"
        "DAERAH JAWA BARAT\n"
        "RESOR BANDUNG\n"
        "\n"
        "SURAT PERINTAH\n"
        "Nomor : Sprin/123/IV/2025/Reskrim\n"
        "\n"
        "DASAR :\n"
        "1. Undang-Undang Nomor 2 Tahun 2002 tentang Kepolisian Negara Republik Indonesia.\n"
        "2. Peraturan Kapolri Nomor 6 Tahun 2017 tentang Susunan Organisasi.\n"
        "3. Laporan Polisi Nomor LP/123/IV/2025/Reskrim tanggal 20 April 2025.\n"
        "\n"
        "DIPERINTAHKAN :\n"
        "Kepada : 1. Nama anggota tersebut di bawah ini.\n"
        "\n"
        "Untuk : Melaksanakan penyelidikan tindak pidana.\n"
        "\n"
        "PERIHAL : Pelaksanaan penyelidikan kasus pencurian.\n"
        "\n"
        "Bandung, 21 April 2025\n"
        "KEPALA KEPOLISIAN RESOR BANDUNG\n"
        "\n"
        "Drs. BUDI SANTOSO\n"
        "AKBP NRP 12345678\n"
    )