OCR Sprint Service Deployment Quickstart

A guide to deploying OCR Sprint Service to a production server for processing Polri sprint letter (surat sprint) documents.

Server Prerequisites

Minimum Specifications

  • OS: Linux (Ubuntu 20.04+ / Debian 11+ / RHEL 8+)
  • CPU: 4 cores (8 cores recommended for high throughput)
  • RAM: 8 GB minimum (16 GB recommended)
  • Storage: 50 GB free space
    • ~3 GB for the PaddleOCR models
    • ~1.5 GB for Python dependencies
    • The remainder for document blob storage
  • Network: port 8000 open for API access
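
The CPU, RAM, and disk minimums above can be checked before installing anything. The following is a minimal sketch, not part of the repository: the function names, the 0.5 GB RAM tolerance, and the choice of `/` as the path to measure are all assumptions made for illustration.

```python
import os
import shutil

# Thresholds taken from the minimum specifications listed above.
MIN_CPU_CORES = 4
MIN_RAM_GB = 8
MIN_FREE_DISK_GB = 50
# Assumption: a host sold as "8 GB" reports slightly less usable memory.
RAM_TOLERANCE_GB = 0.5


def evaluate(cores: int, ram_gb: float, free_disk_gb: float) -> list[str]:
    """Return a list of problems; an empty list means the host passes."""
    problems = []
    if cores < MIN_CPU_CORES:
        problems.append(f"CPU: {cores} cores, {MIN_CPU_CORES} required")
    if ram_gb < MIN_RAM_GB - RAM_TOLERANCE_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB, {MIN_RAM_GB} GB required")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"Disk: {free_disk_gb:.1f} GB free, {MIN_FREE_DISK_GB} GB required")
    return problems


def total_ram_gb() -> float:
    """Physical memory in GiB; these sysconf names are available on Linux."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


if __name__ == "__main__":
    # Assumption: "/" is the filesystem that will hold images and blobs.
    free_gb = shutil.disk_usage("/").free / 1024**3
    issues = evaluate(os.cpu_count() or 0, total_ram_gb(), free_gb)
    print("\n".join(issues) if issues else "Host meets the minimum specifications")
```

If Docker data or blob storage lives on a separate mount, pass that mount point to `shutil.disk_usage` instead of `/`.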

Software Requirements

  • Docker 24.0+ dan Docker Compose v2
  • Git
  • (Optional) Nginx/Caddy for reverse proxy + SSL

1. Clone Repository

# Log in to the server as a non-root user with sudo access
ssh user@your-server.com

# Clone repository
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service

2. Configure the Environment

# Copy the environment template
cp .env.example .env

# Edit the production configuration
nano .env

Key settings for production:

# ==== App ====
APP_ENV=prod
APP_LOG_LEVEL=INFO

# ==== Storage ====
STORAGE_LOCAL_DIR=/app/storage
BLOB_STORAGE_DIR=/app/storage/blobs
BLOB_MAX_UPLOAD_MB=25

# ==== OCR ====
OCR_LANG=latin
OCR_USE_GPU=false              # set to true if the server has an NVIDIA GPU
OCR_MAX_IMAGE_SIDE=2200

# ==== Preprocessing ====
PREPROCESS_TARGET_DPI=300
PREPROCESS_DENOISE=true
PREPROCESS_DESKEW=true
PREPROCESS_DETECT_DOCUMENT=true
PREPROCESS_REMOVE_SHADOW=true

# ==== Table Extraction ====
TABLES_ENABLED=true

# ==== Async Pipeline ====
QUEUE_ENABLED=true
REDIS_URL=redis://redis:6379/0
CELERY_TASK_DEFAULT_QUEUE=ocr_sprint

# ==== Database ====
DATABASE_URL=postgresql+psycopg://ocr:ocr@postgres:5432/ocr_sprint
DATABASE_ECHO=false

# ==== Auth (REQUIRED for production!) ====
API_KEYS=your-secret-key-1,your-secret-key-2
API_KEY_HEADER=X-API-Key
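
Placeholder API keys and the default `ocr:ocr` database credentials are easy to leave in place by accident. The sketch below shows one way to flag them before starting the stack. It is illustrative only: the parser is a simplification of real dotenv loaders (no quoting or multi-line support), and the function names are hypothetical rather than part of the service.

```python
PLACEHOLDER_PREFIX = "your-secret-key"   # placeholder used in the example above
DEFAULT_DB_CREDENTIALS = "ocr:ocr@"      # default credentials in the example above


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments, and inline comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing " # comment" such as the one after OCR_USE_GPU.
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def production_warnings(env: dict[str, str]) -> list[str]:
    """Return warnings for settings that look unsafe for production."""
    warnings = []
    keys = [k.strip() for k in env.get("API_KEYS", "").split(",") if k.strip()]
    if not keys:
        warnings.append("API_KEYS is empty")
    elif any(k.startswith(PLACEHOLDER_PREFIX) for k in keys):
        warnings.append("API_KEYS still contains a placeholder value")
    if DEFAULT_DB_CREDENTIALS in env.get("DATABASE_URL", ""):
        warnings.append("DATABASE_URL uses the default ocr:ocr credentials")
    if env.get("APP_ENV") != "prod":
        warnings.append("APP_ENV is not set to prod")
    return warnings


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        for warning in production_warnings(parse_env(handle.read())):
            print(f"WARNING: {warning}")
```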

Generate secure API keys:

# Generate a random API key
openssl rand -hex 32
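
Where `openssl` is not available, Python's standard `secrets` module produces keys of the same shape (32 random bytes as 64 hex characters). This is a minimal sketch; the helper name `generate_api_keys` is an assumption for illustration.

```python
import secrets


def generate_api_keys(count: int = 2) -> str:
    """Return `count` random 64-hex-character keys joined with commas."""
    return ",".join(secrets.token_hex(32) for _ in range(count))


if __name__ == "__main__":
    # Prints a line that can be pasted into .env as-is.
    print(f"API_KEYS={generate_api_keys()}")
```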

3. Build and Start Services

# Build Docker images
docker compose build

# Start all services (API, worker, Redis, Postgres)
docker compose up -d

# Check the logs to confirm everything is running
docker compose logs -f api worker

Running services:

  • api: FastAPI server on port 8000
  • worker: Celery worker for async processing
  • redis: message broker for the job queue
  • postgres: database for job state

4. Verify the Deployment

# Health check
curl http://localhost:8000/api/v1/health

# Expected response:
# {"status":"ok","version":"0.1.0"}

# Test the OCR endpoint (sync mode, for testing only)
# The URL is quoted so the shell does not treat "?" as a glob character
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
  -H "X-API-Key: your-secret-key-1" \
  -F "file=@samples/pdf/example.pdf" \
  | jq
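
Right after `docker compose up -d` the API may need some time before it answers, so a single `curl` can fail even on a healthy deployment. The sketch below polls the health endpoint shown above until it reports `"status":"ok"`. The retry count, delay, and function names are assumptions chosen for illustration.

```python
import json
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8000/api/v1/health"


def is_healthy(body: bytes) -> bool:
    """True when the body is a JSON object whose status field is "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def wait_for_health(url: str = HEALTH_URL, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll the health endpoint until it is healthy or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200 and is_healthy(response.read()):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not accepting connections yet
        time.sleep(delay)
    return False


if __name__ == "__main__":
    print("healthy" if wait_for_health() else "service did not become healthy")
```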

5. Set Up a Reverse Proxy (Nginx)

Install Nginx:

sudo apt update
sudo apt install nginx certbot python3-certbot-nginx

Nginx configuration (/etc/nginx/sites-available/ocr-sprint):

upstream ocr_api {
    server localhost:8000;
}

server {
    listen 80;
    server_name ocr.yourdomain.com;

    client_max_body_size 30M;  # Adjust to match BLOB_MAX_UPLOAD_MB

    location / {
        proxy_pass http://ocr_api;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        
        # Timeouts for large documents
        proxy_read_timeout 300s;
        proxy_connect_timeout 75s;
    }

    location /metrics {
        # Restrict metrics endpoint
        allow 10.0.0.0/8;  # Internal network only
        deny all;
        proxy_pass http://ocr_api;
    }
}

Enable the site and set up SSL:

# Enable site
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

# Set up SSL with Let's Encrypt
sudo certbot --nginx -d ocr.yourdomain.com

Manual Deployment (Without Docker)

1. Install System Dependencies

# Ubuntu/Debian
sudo apt update
sudo apt install -y \
    python3.11 python3.11-venv python3-pip \
    libgl1 libglib2.0-0 libsm6 libxext6 libxrender1 \
    libgomp1 libmagic1 \
    redis-server postgresql-14

# Start services
sudo systemctl enable --now redis-server postgresql
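
Before continuing, it can help to confirm that Redis and Postgres are actually accepting connections. The sketch below tests the two ports with a plain TCP connect. It assumes the default ports that appear in the connection URLs elsewhere in this guide (6379 and 5432); the helper name `port_open` is hypothetical.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True when a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: both services listen on their default ports on localhost.
    for name, port in (("redis", 6379), ("postgres", 5432)):
        state = "listening" if port_open("localhost", port) else "NOT reachable"
        print(f"{name} on localhost:{port}: {state}")
```

A successful connect only shows that something is listening on the port; it does not verify credentials or that the database exists.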

2. Setup Database

# Create the database and user
sudo -u postgres psql << EOF
CREATE USER ocr WITH PASSWORD 'your-secure-password';
CREATE DATABASE ocr_sprint OWNER ocr;
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
EOF

3. Install Application

# Clone repository
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service

# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -e ".[ocr]"

# Copy and edit .env
cp .env.example .env
nano .env

Update the connection settings in .env:

DATABASE_URL=postgresql+psycopg://ocr:your-secure-password@localhost:5432/ocr_sprint
REDIS_URL=redis://localhost:6379/0
QUEUE_ENABLED=true
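
A password containing characters such as `@`, `/`, or `:` has to be percent-encoded before it goes into `DATABASE_URL`, otherwise the URL is parsed incorrectly. The following sketch builds the URL with the standard library. The helper name and its defaults are assumptions that mirror the example values above.

```python
from urllib.parse import quote


def build_database_url(
    user: str,
    password: str,
    host: str = "localhost",
    port: int = 5432,
    database: str = "ocr_sprint",
) -> str:
    """Build a postgresql+psycopg URL with percent-encoded credentials."""
    # safe="" also encodes "/" so it cannot be mistaken for a path separator.
    return (
        f"postgresql+psycopg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    print(f"DATABASE_URL={build_database_url('ocr', 'your-secure-password')}")
```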

4. Run Database Migrations

alembic upgrade head

5. Setup Systemd Services

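API service (/etc/systemd/system/ocr-sprint-api.service). The unit files in this section assume that the repository lives at /opt/ocr-sprint-service and that a system user named ocr exists; adjust User and WorkingDirectory if your setup differs: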

[Unit]
Description=OCR Sprint API
After=network.target postgresql.service redis.service

[Service]
Type=simple
User=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin"
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000 --workers 4
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Worker Service (/etc/systemd/system/ocr-sprint-worker.service):

[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis.service

[Service]
Type=simple
User=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin"
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery -A ocr_sprint.worker.celery_app worker -l info --concurrency=2
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Enable and start the services:

sudo systemctl daemon-reload
sudo systemctl enable --now ocr-sprint-api ocr-sprint-worker
sudo systemctl status ocr-sprint-api ocr-sprint-worker

Monitoring and Maintenance

Monitoring Logs

# Docker deployment
docker compose logs -f api worker

# Manual deployment
sudo journalctl -u ocr-sprint-api -f
sudo journalctl -u ocr-sprint-worker -f

Prometheus Metrics

Metrics are available at the /metrics endpoint:

curl http://localhost:8000/metrics

Key metrics:

  • ocr_documents_total: total documents processed
  • ocr_processing_duration_seconds: processing duration
  • ocr_confidence_score: distribution of confidence scores
  • celery_task_*: Celery worker metrics
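
For a quick check without a Prometheus server, the text returned by `/metrics` can be parsed directly. The sketch below reads sample lines of the form `name{labels} value` and sums one metric across its label sets. The sample text and its `status` label are invented for illustration; the real label names depend on how the service defines its metrics.

```python
# Illustrative sample only; the "status" label is a hypothetical example.
SAMPLE = """\
# HELP ocr_documents_total Total documents processed
# TYPE ocr_documents_total counter
ocr_documents_total{status="success"} 40
ocr_documents_total{status="failed"} 2
"""


def parse_metrics(text: str) -> dict[str, float]:
    """Map each sample line (name plus optional labels) to its value."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            split_at = line.rindex("}") + 1
            name, fields = line[:split_at], line[split_at:].split()
        else:
            name, *fields = line.split()
        if fields:
            # fields[0] is the value; an optional timestamp may follow it.
            samples[name] = float(fields[0])
    return samples


def total(samples: dict[str, float], metric: str) -> float:
    """Sum a metric across all of its label combinations."""
    return sum(
        value
        for name, value in samples.items()
        if name == metric or name.startswith(metric + "{")
    )


if __name__ == "__main__":
    print(total(parse_metrics(SAMPLE), "ocr_documents_total"))
```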

Backup Database

# Docker deployment (-T disables TTY allocation so the redirected dump stays clean)
docker compose exec -T postgres pg_dump -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql

# Manual deployment
pg_dump -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql
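
Daily dumps named `backup_YYYYMMDD.sql` accumulate quickly, so some retention policy is usually needed. The sketch below lists which dumps are older than a retention window based on the date in the file name. The 14-day window and the function name are assumptions for illustration, and the script only prints candidates rather than deleting anything.

```python
import re
from datetime import date, datetime, timedelta
from pathlib import Path

# Matches the file names produced by the pg_dump commands above.
BACKUP_PATTERN = re.compile(r"^backup_(\d{8})\.sql$")


def expired_backups(names: list[str], today: date, keep_days: int = 14) -> list[str]:
    """Return backup file names dated earlier than today minus keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        match = BACKUP_PATTERN.match(name)
        if not match:
            continue  # not a backup file
        try:
            taken = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that do not form a valid date
        if taken < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    candidates = [path.name for path in Path(".").iterdir() if path.is_file()]
    for name in expired_backups(candidates, date.today()):
        print(f"older than retention window: {name}")
```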

Update Service

# Docker deployment
cd ocr-sprint-service
git pull
docker compose build
docker compose up -d

# Manual deployment
cd ocr-sprint-service
git pull
source .venv/bin/activate
pip install -e ".[ocr]"
alembic upgrade head
sudo systemctl restart ocr-sprint-api ocr-sprint-worker

Troubleshooting

Service does not start

# Check the logs
docker compose logs api worker

# Check the health endpoint
curl http://localhost:8000/api/v1/health

PaddleOCR model download fails

# Download the models manually into the volume
docker compose exec api python -c "from paddleocr import PaddleOCR; PaddleOCR(use_angle_cls=True, lang='latin')"

Worker does not process jobs

# Check the Redis connection
docker compose exec worker redis-cli -h redis ping

# Check Celery worker status
docker compose exec worker celery -A ocr_sprint.worker.celery_app inspect active

Database migration error

# Check the current revision
docker compose exec api alembic current

# Apply all pending migrations
docker compose exec api alembic upgrade head

Out of memory

# Reduce worker concurrency in docker-compose.yml
# Set --concurrency=1 (the default), or add a memory limit
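
A rough way to pick a concurrency value is to divide the memory left after other services by the memory one worker process uses. The sketch below does that arithmetic. Every input is an assumption to be measured on the actual host (for example with `docker stats`): this guide does not state per-process memory use, and the 2 GB reserve for the API, Redis, and Postgres is only a placeholder.

```python
def max_worker_concurrency(
    total_ram_gb: float,
    per_process_gb: float,
    reserved_gb: float = 2.0,
) -> int:
    """Estimate how many worker processes fit in memory (at least 1)."""
    if per_process_gb <= 0:
        raise ValueError("per_process_gb must be positive")
    usable_gb = total_ram_gb - reserved_gb
    return max(1, int(usable_gb // per_process_gb))


if __name__ == "__main__":
    # Hypothetical figures: an 8 GB host where one worker process peaks at 3 GB.
    print(max_worker_concurrency(total_ram_gb=8, per_process_gb=3.0))
```

The result is an upper bound, not a target; CPU count and document size also limit useful concurrency.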

Security Checklist

  • API_KEYS set to strong random values
  • Firewall configured (only ports 80/443 open)
  • SSL/TLS enabled via Nginx + Let's Encrypt
  • Database password changed from the default
  • /metrics endpoint restricted to the internal network
  • Regular backups of the database and blob storage
  • Log rotation configured
  • OS security updates enabled

Performance Tuning

For high throughput:

  1. Increase worker concurrency:

    # docker-compose.yml
    command: ["celery", "-A", "ocr_sprint.worker.celery_app", "worker", "-l", "info", "--concurrency=4"]
    
  2. Scale workers horizontally:

    docker compose up -d --scale worker=3
    
  3. Enable GPU (if available):

    # .env
    OCR_USE_GPU=true
    
  4. Tune Postgres:

    -- Raise the connection limit and the shared memory cache
    -- (both settings take effect only after a Postgres restart)
    ALTER SYSTEM SET max_connections = 200;
    ALTER SYSTEM SET shared_buffers = '2GB';
    

Support

For questions or issues, contact the development team or open an issue in the repository.