OCR-SPRIN-SERVICE/tests/conftest.py
devin-ai-integration[bot] 2112023b6e Phase 4: async pipeline (Celery+Redis), Postgres job state, local-fs blob storage, API-key auth, Prometheus metrics (#3)
* Phase 4: async pipeline (Celery+Redis), Postgres job state, local-fs blob storage, API-key auth, Prometheus metrics

Co-Authored-By: adrian kuman firmansah <adriancuman@gmail.com>

* Phase 4: fix sync-mode rollback orphaning blobs + use is_relative_to for path-escape check

Devin Review on PR #3 found two real bugs:

1. Sync path mark_failed was rolled back by the request-scoped session.
   When the pipeline raised an exception in ?sync=true mode, _run_inline
   modified the FastAPI session and re-raised; get_session caught the
   exception, called session.rollback(), and wiped both the create() and
   the mark_failed() writes. The blob was already on disk, so it was
   permanently orphaned with no DB record. Fix: commit the pending row
   immediately after create(), and run all subsequent state transitions in
   independent session_scope blocks (matching the worker task pattern).

2. _resolve used str.startswith for path-escape detection, which lets a
   sibling directory whose name begins with the storage root pass (e.g.
   /app/blobs_evil vs /app/blobs). Switched to Path.is_relative_to.

Added regression tests for both.
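
To illustrate bug 2, here is a minimal sketch of why a string-prefix check accepts a sibling directory while `is_relative_to` rejects it. The helper names and example paths are hypothetical, not the project's `_resolve` code, and `PurePosixPath` is used only so the example never touches the filesystem.

```python
from pathlib import PurePosixPath


def inside_by_prefix(base: PurePosixPath, candidate: PurePosixPath) -> bool:
    """Buggy check: compares raw strings, so '/app/blobs_evil' matches '/app/blobs'."""
    return str(candidate).startswith(str(base))


def inside_by_relative_to(base: PurePosixPath, candidate: PurePosixPath) -> bool:
    """Safer check: compares whole path components (Python 3.9+)."""
    return candidate.is_relative_to(base)


base = PurePosixPath("/app/blobs")
sibling = PurePosixPath("/app/blobs_evil/secret.bin")  # hypothetical attacker path
inside = PurePosixPath("/app/blobs/ab/cd.bin")         # hypothetical legitimate blob

prefix_accepts_sibling = inside_by_prefix(base, sibling)
relative_accepts_sibling = inside_by_relative_to(base, sibling)
relative_accepts_inside = inside_by_relative_to(base, inside)
```

Note that `is_relative_to` is purely lexical, so it only makes a sound escape check once both paths have been normalized (for example with `Path.resolve()`); otherwise a `..` segment would still pass.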

Co-Authored-By: adrian kuman firmansah <adriancuman@gmail.com>

* Phase 4: honor queue_enabled setting + resolve base_dir for path comparisons

Two more bugs found by Devin Review:

3. queue_enabled was declared in config and documented in .env.example but
   never read by the route. A fresh dev install with QUEUE_ENABLED=false
   (the default) would still enqueue, then fail with a Redis connection
   error. Fixed by making the ?sync= query param default to None and
   resolving to (not queue_enabled) inside the route. Tests now set
   QUEUE_ENABLED=true so the async flow stays exercised, and a new test
   verifies the inline fallback when the queue is disabled.

4. LocalFsBlobStorage stored base_dir as-is. _resolve resolved its
   candidate paths, so the empty-dir cleanup loop in delete() compared a
   resolved candidate against an unresolved base_dir and broke on the
   first iteration (no cleanup ever happened). Fixed by resolving base_dir
   once in __init__ so every path comparison is apples-to-apples.
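
The resolution rule in fix 3 can be sketched as a tri-state flag: an explicit `?sync=` value wins, and an omitted one falls back to the opposite of `queue_enabled`. The function name `resolve_sync` is hypothetical; the real route presumably reads `queue_enabled` from its settings object rather than taking it as an argument.

```python
from typing import Optional


def resolve_sync(sync: Optional[bool], queue_enabled: bool) -> bool:
    """Decide whether to run the pipeline inline.

    sync is None when the caller did not pass ?sync= at all; in that case
    the job runs inline exactly when the queue is disabled.
    """
    if sync is None:
        return not queue_enabled
    return sync


# Queue disabled and no explicit flag: run inline instead of enqueueing.
default_without_queue = resolve_sync(None, queue_enabled=False)
# Queue enabled and no explicit flag: enqueue.
default_with_queue = resolve_sync(None, queue_enabled=True)
# An explicit flag always wins over the setting.
explicit_sync = resolve_sync(True, queue_enabled=True)
```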

Co-Authored-By: adrian kuman firmansah <adriancuman@gmail.com>

* Phase 4: derive ocr_jobs_total from DB so worker writes are visible at /metrics

Devin Review correctly noted the Counter-based JOBS_TOTAL would never
increment in production because the worker runs in a separate process from
the API and the registry is process-local. Replaced JOBS_TOTAL with a
custom Collector that issues SELECT status, COUNT(*) FROM jobs GROUP BY
status on every /metrics scrape. Result: the metric stays accurate
regardless of which process wrote the row.
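
As a rough sketch of the idea, the per-status counts can be read straight from the table on each scrape. This example uses the standard-library `sqlite3` module with a hypothetical two-column `jobs` schema and made-up status values; the actual collector would wrap a query like this in a Prometheus custom collector's `collect()` method.

```python
import sqlite3


def job_counts_by_status(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {status: row count}, computed from the DB at call time."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM jobs GROUP BY status"
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; the real jobs table has more columns.
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO jobs (status) VALUES (?)",
    [("succeeded",), ("succeeded",), ("failed",)],
)

counts = job_counts_by_status(conn)
```

Because the numbers come from the rows themselves, it does not matter whether the API process or a worker process inserted them.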

Also corrected the metrics.py docstring (the old comment claimed the
counter was 'incremented by the worker', which was the bug).

Removed the JOBS_TOTAL.inc() calls from the sync route — the DB collector
covers both paths now. JOB_PROCESSING_SECONDS stays as an API-process
histogram with an updated docstring noting its scope; cross-process
latency belongs to derived dashboards over jobs.created_at/updated_at.

Added regression test test_metrics_jobs_total_reflects_worker_writes.

Co-Authored-By: adrian kuman firmansah <adriancuman@gmail.com>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: adrian kuman firmansah <adriancuman@gmail.com>
2026-04-25 16:50:51 +00:00

88 lines
3.1 KiB
Python

"""Shared pytest fixtures."""
from __future__ import annotations

import os
from collections.abc import Iterator
from pathlib import Path

import numpy as np
import pytest


@pytest.fixture(autouse=True)
def _isolated_runtime(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Iterator[None]:
    """Per-test sqlite + blob storage so tests don't share state.

    Setting these env vars before ``Settings`` is first read in the test gives
    each test its own DB file and blob root. We also clear the lru_cache on
    ``get_settings``, the engine, and the sessionmaker so the fresh paths take
    effect even if a previous test already loaded settings.
    """
    db_path = tmp_path / "test.sqlite"
    blob_dir = tmp_path / "blobs"
    monkeypatch.setenv("DATABASE_URL", f"sqlite:///{db_path}")
    monkeypatch.setenv("BLOB_STORAGE_DIR", str(blob_dir))
    monkeypatch.setenv("STORAGE_LOCAL_DIR", str(tmp_path / "storage"))
    monkeypatch.setenv("API_KEYS", "")
    # The async API path is exercised by the test suite, so default it on
    # here. Production keeps ``QUEUE_ENABLED=false`` so the route falls back
    # to the inline pipeline when no Redis is configured.
    monkeypatch.setenv("QUEUE_ENABLED", "true")
    # Force Celery to run tasks inline so we don't need a broker.
    monkeypatch.setenv("CELERY_TASK_ALWAYS_EAGER", "true")

    # Imported here, after the env vars are set, so settings pick them up.
    from ocr_sprint.config import get_settings
    from ocr_sprint.db.base import reset_engine_cache
    from ocr_sprint.worker.celery_app import celery_app

    get_settings.cache_clear()
    reset_engine_cache()
    # `celery_app` is built once at import-time, so flip the eager flag on the
    # already-instantiated instance for this test.
    celery_app.conf.task_always_eager = True
    celery_app.conf.task_eager_propagates = True

    yield

    get_settings.cache_clear()
    reset_engine_cache()
    os.environ.pop("CELERY_TASK_ALWAYS_EAGER", None)


@pytest.fixture
def blank_bgr_image() -> np.ndarray:
    """A white BGR image (uint8), 600 rows by 800 columns, for preprocessing smoke tests."""
    return np.full((600, 800, 3), 255, dtype=np.uint8)


@pytest.fixture
def sample_sprint_text() -> str:
    """Realistic-but-synthetic OCR text for regex extractor tests."""
    return (
        "KEPOLISIAN NEGARA REPUBLIK INDONESIA\n"
        "DAERAH JAWA BARAT\n"
        "RESOR BANDUNG\n"
        "\n"
        "SURAT PERINTAH\n"
        "Nomor : Sprin/123/IV/2025/Reskrim\n"
        "\n"
        "DASAR :\n"
        "1. Undang-Undang Nomor 2 Tahun 2002 tentang Kepolisian Negara Republik Indonesia.\n"
        "2. Peraturan Kapolri Nomor 6 Tahun 2017 tentang Susunan Organisasi.\n"
        "3. Laporan Polisi Nomor LP/123/IV/2025/Reskrim tanggal 20 April 2025.\n"
        "\n"
        "DIPERINTAHKAN :\n"
        "Kepada : 1. Nama anggota tersebut di bawah ini.\n"
        "\n"
        "Untuk : Melaksanakan penyelidikan tindak pidana.\n"
        "\n"
        "PERIHAL : Pelaksanaan penyelidikan kasus pencurian.\n"
        "\n"
        "Bandung, 21 April 2025\n"
        "KEPALA KEPOLISIAN RESOR BANDUNG\n"
        "\n"
        "Drs. BUDI SANTOSO\n"
        "AKBP NRP 12345678\n"
    )