
OCR Sprint Service Deployment Guide

This document is a step-by-step guide to deploying ocr-sprint-service to a production server. It reflects the actual state of the code as of April 2026 (Phase 14 complete).


Table of Contents

  1. Architecture Overview
  2. Server Prerequisites
  3. Option A: Docker Compose (Recommended)
  4. Option B: Manual (Without Docker)
  5. Production Environment Configuration
  6. Reverse Proxy & SSL (Nginx)
  7. Firewall
  8. Deployment Verification
  9. Monitoring & Maintenance
  10. Troubleshooting
  11. Security Checklist

1. Architecture Overview

┌──────────┐     ┌──────────────┐     ┌───────┐
│  Client  │────▶│  Nginx (SSL) │────▶│  API  │──▶ PaddleOCR
└──────────┘     └──────────────┘     │ :8000 │      Pipeline
                                      └───┬───┘
                                          │ async job
                                    ┌─────▼──────┐
                                    │   Redis    │
                                    │   :6379    │
                                    └─────┬──────┘
                                    ┌─────▼──────┐
                                    │   Worker   │──▶ PaddleOCR
                                    │  (Celery)  │      Pipeline
                                    └─────┬──────┘
                                    ┌─────▼──────┐
                                    │ PostgreSQL │
                                    │   :5432    │
                                    └────────────┘

Four services must be running:

| Service | Function |
|---|---|
| API (FastAPI + Uvicorn) | Accepts document uploads and serves OCR results |
| Worker (Celery) | Processes OCR jobs asynchronously in the background |
| Redis | Message broker for the job queue |
| PostgreSQL | Stores job state and extraction results |

Blob storage uses the local filesystem (S3/MinIO is not supported yet).
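A plain TCP connect test is a quick first check that the networked services in the diagram are listening. The sketch below assumes everything runs on localhost with the default ports from the diagram (API 8000, Redis 6379, PostgreSQL 5432). The SERVICES mapping and the function names are hypothetical. The Celery worker exposes no port, so check it with docker compose ps or systemctl status instead. In Option A, Redis and PostgreSQL may not be published to the host at all. An open port only shows that something is listening, not that the service is healthy.

```python
"""TCP reachability check for the networked services (sketch).

Assumption: every service runs on localhost with the default port from the
architecture diagram. The worker has no listening port, so it is not listed.
"""
import socket

# Hypothetical mapping of service name -> (host, port); adjust for your setup.
SERVICES = {
    "api": ("127.0.0.1", 8000),
    "redis": ("127.0.0.1", 6379),
    "postgres": ("127.0.0.1", 5432),
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def check_services(services: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Map each service name to whether its port accepts TCP connections."""
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, up in check_services(SERVICES).items():
        print(f"{name:<10} {'UP' if up else 'DOWN'}")
```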


2. Server Prerequisites

Minimum Specifications

| Resource | Minimum | Recommended |
|---|---|---|
| OS | Ubuntu 20.04+ / Debian 11+ | Ubuntu 22.04+ |
| CPU | 4 cores | 8 cores |
| RAM | 8 GB | 16 GB |
| Storage | 50 GB free | 100 GB free |
| Python | 3.10-3.12 | 3.11 or 3.12 |
| Network | Port 8000 (internal) + ports 80/443 (Nginx) | Same |

Disk Requirements

  • ~1.5 GB: PaddlePaddle wheels
  • ~200 MB: PaddleOCR model downloads (fetched automatically on first run)
  • The rest: blob storage for uploaded documents
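You can check the minimums above before installing anything. This sketch checks the Python version range, the CPU count and free disk space. The thresholds are copied from the table, and the function names are hypothetical. RAM is left out because reading it portably needs platform-specific code. Pass the path of the filesystem that will hold blob storage, because that is the part that grows over time.

```python
"""Preflight check against the minimum server spec (sketch).

Thresholds are copied from the table in this section. RAM is not checked
here because reading it portably needs platform-specific code.
"""
import os
import shutil
import sys

MIN_CPUS = 4
MIN_FREE_DISK_GB = 50
PYTHON_RANGE = ((3, 10), (3, 12))  # inclusive (major, minor) bounds


def python_ok(version: tuple[int, int], supported=PYTHON_RANGE) -> bool:
    """True if (major, minor) lies within the inclusive supported range."""
    low, high = supported
    return low <= version <= high


def free_disk_gb(path: str = "/") -> float:
    """Free space on the filesystem containing path, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def preflight(path: str = "/") -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    version = tuple(sys.version_info[:2])
    if not python_ok(version):
        problems.append(f"Python {version[0]}.{version[1]} is outside 3.10-3.12")
    cpus = os.cpu_count() or 0  # cpu_count() may return None
    if cpus < MIN_CPUS:
        problems.append(f"only {cpus} CPU cores (minimum {MIN_CPUS})")
    free = free_disk_gb(path)
    if free < MIN_FREE_DISK_GB:
        problems.append(f"only {free:.1f} GB free on {path} (minimum {MIN_FREE_DISK_GB} GB)")
    return problems


if __name__ == "__main__":
    issues = preflight(sys.argv[1] if len(sys.argv) > 1 else "/")
    print("OK" if not issues else "\n".join(issues))
```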

Required Software

  • Docker Engine with the Compose plugin (docker compose): Option A
  • Python 3.10-3.12 + PostgreSQL + Redis: Option B
  • Git: both options
  • Nginx (optional): reverse proxy + SSL
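A similar sketch can confirm that the tools listed above are on PATH. The binary names are assumptions. psql and redis-cli stand in for locally installed PostgreSQL and Redis. shutil.which only finds the docker binary, so it cannot confirm that the Compose plugin behind docker compose is installed.

```python
"""Check that the command-line tools for each option are on PATH (sketch).

Binary names are assumptions: psql and redis-cli stand in for locally
installed PostgreSQL and Redis, and finding docker does not prove that
the Compose plugin (docker compose) is installed.
"""
import shutil
import sys

REQUIRED = {
    "A": ["docker", "git"],
    "B": ["python3", "psql", "redis-cli", "git"],
}
# nginx usually lives in /usr/sbin, which may not be on PATH for non-root users.
OPTIONAL = ["nginx"]


def missing_tools(names: list[str]) -> list[str]:
    """Return the names that shutil.which() cannot find on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    option = sys.argv[1].upper() if len(sys.argv) > 1 else "A"
    missing = missing_tools(REQUIRED[option])
    print("missing required tools:", ", ".join(missing) if missing else "none")
    for name in missing_tools(OPTIONAL):
        print(f"optional tool not found: {name}")
```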

3. Option A: Docker Compose (Recommended)

This is the fastest option. All services (API, Worker, Redis, Postgres) run in containers.

3.1 Login & Clone

ssh user@your-server.com

git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service

3.2 Configure .env

cp .env.example .env
nano .env

See Section 5 for production configuration details.

Important

For Docker Compose, do not change DATABASE_URL and REDIS_URL. docker-compose.yml already overrides both through the environment variables it sets on each container.

3.3 Build & Start

# Build the images (~5-10 minutes the first time)
docker compose build

# Start all services
docker compose up -d

# Follow the logs
docker compose logs -f api worker

The api container automatically runs alembic upgrade head before it starts the server (see its command in docker-compose.yml).

3.4 First-Run Model Download

The first request triggers a download of the PaddleOCR models (~200 MB) into the paddle-models Docker volume. Wait for the download to finish before testing.

# Watch the download progress in the logs
docker compose logs -f api

3.5 Verify

curl http://localhost:8000/api/v1/health
# Expected: {"status":"ok","version":"0.1.0"}
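The first start can take a while, so a small polling loop is easier than retrying curl by hand. This sketch assumes the health endpoint and response shown above. The function names and the 10-minute default timeout are arbitrary choices. It only confirms that the API answers. Whether the models have finished downloading is still best checked in the logs.

```python
"""Wait until the API health endpoint reports ok (sketch).

Assumes the endpoint and response shown above:
GET /api/v1/health -> {"status": "ok", ...}
"""
import json
import time
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8000/api/v1/health"


def http_probe(url: str = HEALTH_URL, timeout: float = 5.0) -> dict:
    """GET url and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_for_health(
    probe: Callable[[], dict] = http_probe,
    timeout: float = 600.0,
    interval: float = 5.0,
) -> bool:
    """Call probe until it returns status "ok"; give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if probe().get("status") == "ok":
                return True
        except (OSError, ValueError):
            # OSError covers refused connections, timeouts and HTTP errors
            # (urllib.error.URLError subclasses OSError); ValueError covers
            # a body that is not valid JSON.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    print("healthy" if wait_for_health() else "gave up waiting")
```

The probe is a parameter so the loop can be tested without a running server.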

3.6 Update the Service (After Code Changes)

cd ocr-sprint-service
git pull
docker compose build
docker compose up -d

4. Option B: Manual (Without Docker)

Use this option for servers where Python, PostgreSQL and Redis run directly on the host. The steps below install them if they are missing.

4.1 Install System Libraries

sudo apt update && sudo apt upgrade -y

# Libraries for OpenCV & PaddleOCR
sudo apt install -y \
    python3.11 python3.11-venv python3.11-dev \
    libgl1 libglib2.0-0 libsm6 libxext6 libxrender1 \
    libgomp1 libmagic1 \
    build-essential git curl

# Install Redis & PostgreSQL (if not already installed)
sudo apt install -y redis-server postgresql postgresql-contrib
sudo systemctl enable --now redis-server postgresql

Note

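If the server already has Python 3.12, use python3.12 in all the commands that follow. On older releases such as Ubuntu 20.04 or Debian 11, python3.11 may not be available from the default package repositories, so you may need to install it from another source.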

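4.2 Set Up the Database

Replace ganti-password-kuat with your own strong password, both here and in DATABASE_URL in step 4.5.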

sudo -u postgres psql
CREATE USER ocr WITH PASSWORD 'ganti-password-kuat';
CREATE DATABASE ocr_sprint OWNER ocr;
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
\c ocr_sprint
GRANT ALL ON SCHEMA public TO ocr;
\q

4.3 Create Application User & Directory

sudo useradd -m -s /bin/bash ocr
sudo mkdir -p /opt/ocr-sprint-service
sudo chown ocr:ocr /opt/ocr-sprint-service

4.4 Clone & Install

sudo su - ocr
cd /opt
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service

# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install dependencies + OCR runtime (~1.5 GB download)
pip install --upgrade pip setuptools wheel
pip install -e ".[ocr]"

# Verify
python -c "import paddleocr; print('PaddleOCR OK')"
python -c "import fastapi; print('FastAPI OK')"
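To check many packages at once, importlib.util.find_spec can confirm that each one is installed without importing it. This is hypothetical helper tooling, not part of the project. The trade-off is that find_spec does not execute the package. A missing system library such as libGL only shows up with a real import, like the two commands above.

```python
"""Report which runtime modules cannot be found (sketch).

The module list mirrors the two import checks above; extend it as needed.
"""
import importlib.util

MODULES = ["paddleocr", "fastapi"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that find_spec() cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(MODULES)
    print("all modules found" if not missing else "missing: " + ", ".join(missing))
```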

4.5 Configure .env

cp .env.example .env
nano .env

Settings you must change for a manual deployment:

APP_ENV=prod
DATABASE_URL=postgresql+psycopg://ocr:ganti-password-kuat@localhost:5432/ocr_sprint
REDIS_URL=redis://localhost:6379/0
QUEUE_ENABLED=true
API_KEYS=your-generated-api-key
STORAGE_LOCAL_DIR=/opt/ocr-sprint-service/storage
BLOB_STORAGE_DIR=/opt/ocr-sprint-service/storage/blobs

Create the storage directories:

mkdir -p /opt/ocr-sprint-service/storage/blobs
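If the database password contains characters such as @, :, / or #, it must be percent-encoded inside DATABASE_URL, or the URL will be parsed incorrectly. This sketch uses a hypothetical helper whose defaults come from the .env example above. It encodes the user name and password with urllib.parse.quote. Use the raw password in CREATE USER and the encoded form only in the URL.

```python
"""Build DATABASE_URL with a percent-encoded password (sketch).

Defaults mirror the .env example above; the helper name is hypothetical.
"""
from urllib.parse import quote


def build_database_url(
    user: str,
    password: str,
    host: str = "localhost",
    port: int = 5432,
    database: str = "ocr_sprint",
    driver: str = "postgresql+psycopg",
) -> str:
    """Return driver://user:password@host:port/database with credentials escaped."""
    # safe="" makes quote() encode "/" too, so every reserved character is escaped.
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"{driver}://{user_enc}:{password_enc}@{host}:{port}/{database}"


if __name__ == "__main__":
    import getpass

    print("DATABASE_URL=" + build_database_url("ocr", getpass.getpass("DB password: ")))
```

The project's Alembic env.py may pass this URL to config.set_main_option (an assumption about the project). If it does, any % in the URL must be doubled to %%, because Alembic reads its config with configparser interpolation.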

4.6 Run Database Migrations

source .venv/bin/activate
alembic upgrade head
alembic current  # verify

4.7 Manual Test

uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
# In another terminal: curl http://localhost:8000/api/v1/health
# Press Ctrl+C to stop

4.8 Setup Systemd Services

API service: /etc/systemd/system/ocr-sprint-api.service

[Unit]
Description=OCR Sprint API Service
After=network.target postgresql.service redis-server.service

[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn \
    ocr_sprint.main:app \
    --host 0.0.0.0 --port 8000 --workers 4 --log-level info
Restart=always
RestartSec=10
LimitNOFILE=65536
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target

Worker service: /etc/systemd/system/ocr-sprint-worker.service

[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis-server.service

[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery \
    -A ocr_sprint.worker.celery_app worker \
    --loglevel=info --concurrency=2 --max-tasks-per-child=100
Restart=always
RestartSec=10
LimitNOFILE=65536
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target

Enable & Start:

# First exit the ocr user's shell
exit

sudo systemctl daemon-reload
sudo systemctl enable --now ocr-sprint-api ocr-sprint-worker
sudo systemctl status ocr-sprint-api ocr-sprint-worker

4.9 Update Service (Manual)

sudo su - ocr
cd /opt/ocr-sprint-service
git pull
source .venv/bin/activate
pip install -e ".[ocr]"
alembic upgrade head
exit

sudo systemctl restart ocr-sprint-api ocr-sprint-worker

5. Production Environment Configuration

These .env settings must be changed from their defaults for production:

| Variable | Default | Production | Notes |
|---|---|---|---|
| APP_ENV | local | prod | Environment mode |
| API_KEYS | (empty) | key1,key2 | REQUIRED! Authentication is disabled when empty |
| QUEUE_ENABLED | false | true | Enables asynchronous processing |
| DATABASE_URL | sqlite:///... | postgresql+psycopg://... | Docker: overridden automatically |
| REDIS_URL | redis://localhost:6379/0 | Adjust as needed | Docker: overridden automatically |
| OCR_USE_GPU | false | true if a GPU is available | GPU mode requires the NVIDIA driver |
| TABLES_ENABLED | true | true | Personnel table extraction |

Generate API Key:

openssl rand -hex 32
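On a machine without openssl, Python's secrets module produces an equivalent key. secrets.token_hex(32) returns 32 random bytes as 64 lowercase hex characters, the same shape as openssl rand -hex 32. The helper name below is hypothetical.

```python
"""Python equivalent of openssl rand -hex 32 (sketch)."""
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return num_bytes of CSPRNG output as lowercase hex (2 characters per byte)."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Two keys, comma-separated, matching the key1,key2 format of API_KEYS.
    print("API_KEYS=" + ",".join(generate_api_key() for _ in range(2)))
```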

Warning

Never deploy to production without setting API_KEYS. If it is empty, every endpoint is open without authentication.
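The warning above is easy to check automatically before a deploy. This sketch reads .env with a deliberately simple KEY=VALUE parser that does not handle export, multi-line values or variable expansion. It then checks the settings from the table in this section. The function names and the --docker flag are hypothetical. With --docker, the DATABASE_URL check is skipped because docker-compose.yml overrides that value.

```python
"""Check a .env file against the production settings table (sketch).

The parser is deliberately simple: KEY=VALUE lines, # comments and blank
lines skipped, surrounding quotes stripped. It does not support export,
multi-line values, or variable expansion.
"""
import sys
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines into a dict."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def production_problems(env: dict[str, str], docker: bool = False) -> list[str]:
    """Return a list of problems; empty means the checked keys look production-ready."""
    problems = []
    if env.get("APP_ENV") != "prod":
        problems.append("APP_ENV should be prod")
    if not [k for k in env.get("API_KEYS", "").split(",") if k.strip()]:
        problems.append("API_KEYS is empty: all endpoints would be unauthenticated")
    if env.get("QUEUE_ENABLED", "false").lower() != "true":
        problems.append("QUEUE_ENABLED should be true")
    # Under Docker Compose, docker-compose.yml overrides DATABASE_URL, so skip it.
    # An unset value is treated as the SQLite default from the table.
    if not docker and env.get("DATABASE_URL", "sqlite").startswith("sqlite"):
        problems.append("DATABASE_URL still points at SQLite")
    return problems


if __name__ == "__main__":
    args = [a for a in sys.argv[1:] if a != "--docker"]
    env = parse_env(Path(args[0] if args else ".env").read_text())
    issues = production_problems(env, docker="--docker" in sys.argv)
    print("looks OK" if not issues else "\n".join(issues))
```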


6. Reverse Proxy & SSL (Nginx)

Install

sudo apt install -y nginx certbot python3-certbot-nginx

Configuration: /etc/nginx/sites-available/ocr-sprint

upstream ocr_api {
    server 127.0.0.1:8000;
    keepalive 32;
}

server {
    listen 80;
    server_name ocr.yourdomain.com;

    client_max_body_size 30M;

    proxy_connect_timeout 300s;
    proxy_read_timeout 300s;

    location / {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Connection "";  # lets the upstream keepalive pool reuse connections
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    location /metrics {
        allow 127.0.0.1;
        allow 10.0.0.0/8;
        deny all;
        proxy_pass http://ocr_api;
    }
}

Enable & SSL

sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

# SSL
sudo certbot --nginx -d ocr.yourdomain.com
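Nginx rejects uploads larger than client_max_body_size with HTTP 413 before they reach the API. Nginx size suffixes are binary (k = 1024, m = 1024 * 1024), so 30M is 31,457,280 bytes. The limit applies to the whole request body, including multipart overhead. This client-side sketch uses hypothetical helper names and an assumed 64 KiB margin for that overhead.

```python
"""Client-side check against the Nginx client_max_body_size limit (sketch).

Nginx sizes are bytes, or k/K and m/M suffixes meaning 1024 and 1024*1024.
MULTIPART_MARGIN is an assumed allowance for multipart headers and
boundaries, not an exact figure.
"""
import os

NGINX_UNITS = {"": 1, "k": 1024, "m": 1024 * 1024}
MULTIPART_MARGIN = 64 * 1024  # assumption: generous headroom for form overhead


def parse_nginx_size(value: str) -> int:
    """Convert an Nginx size such as '30M', '512k' or '1024' to bytes."""
    value = value.strip().lower()
    suffix = value[-1] if value and value[-1] in NGINX_UNITS else ""
    number = value[: len(value) - len(suffix)]
    return int(number) * NGINX_UNITS[suffix]


def fits_upload_limit(path: str, limit: str = "30M") -> bool:
    """True if the file plus the assumed multipart margin stays within limit."""
    return os.path.getsize(path) + MULTIPART_MARGIN <= parse_nginx_size(limit)


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        verdict = "ok" if fits_upload_limit(name) else "too large for 30M (expect HTTP 413)"
        print(f"{name}: {verdict}")
```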

7. Firewall

sudo ufw allow 22/tcp    # SSH: IMPORTANT!
sudo ufw allow 80/tcp    # HTTP
sudo ufw allow 443/tcp   # HTTPS
sudo ufw enable
sudo ufw status

Caution

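Make sure SSH (port 22) is allowed before you enable the firewall, or you will lock yourself out of the server.

With Option A, ports published by Docker are handled by Docker's own iptables rules and are not filtered by ufw. If docker-compose.yml publishes port 8000, bind it to the loopback interface (for example "127.0.0.1:8000:8000") so the API is only reachable through Nginx.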


8. Deployment Verification

Health Check

curl http://localhost:8000/api/v1/health
# {"status":"ok","version":"0.1.0"}

Test OCR (Sync)

curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
  -H "X-API-Key: your-api-key" \
  -F "file=@/path/to/test.pdf" | jq

Test OCR (Async — Production Flow)

# Submit job
curl -X POST http://localhost:8000/api/v1/documents \
  -H "X-API-Key: your-api-key" \
  -F "file=@document.pdf" | jq
# → {"job_id":"8f2a...","status":"pending",...}

# Poll result
curl -H "X-API-Key: your-api-key" \
  http://localhost:8000/api/v1/documents/8f2a... | jq
# → {"status":"completed","confidence":0.93,"data":{...}}
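Scripts usually wrap the polling step in a loop. This sketch assumes the endpoint, X-API-Key header and status values shown in the curl examples above. Treating "failed" as a terminal status is an assumption, as are the function names and timeouts. Only the polling half is shown. Submission can stay with curl or any HTTP client that supports multipart.

```python
"""Poll an async OCR job until it finishes (sketch).

Endpoint, header and the "pending"/"completed" values come from the curl
examples above. The "failed" status is an assumption.
"""
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000/api/v1"
TERMINAL = {"completed", "failed"}  # "failed" is assumed, not documented here


def fetch_job(job_id: str, api_key: str, base_url: str = BASE_URL) -> dict:
    """GET /documents/{job_id} with the X-API-Key header and decode the JSON."""
    req = urllib.request.Request(
        f"{base_url}/documents/{job_id}", headers={"X-API-Key": api_key}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def poll_job(
    fetch: Callable[[], dict], timeout: float = 300.0, interval: float = 2.0
) -> dict:
    """Call fetch until the job reaches a terminal status; raise on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        if job.get("status") in TERMINAL:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout}s")
        time.sleep(interval)


if __name__ == "__main__":
    import sys

    job_id, api_key = sys.argv[1], sys.argv[2]
    result = poll_job(lambda: fetch_job(job_id, api_key))
    print(result.get("status"), result.get("confidence"))
```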

Check That All Services Are Running

# Docker
docker compose ps

# Manual
sudo systemctl status ocr-sprint-api ocr-sprint-worker postgresql redis-server nginx

9. Monitoring & Maintenance

Logs

# Docker
docker compose logs -f api worker

# Manual (systemd)
sudo journalctl -u ocr-sprint-api -f
sudo journalctl -u ocr-sprint-worker -f

Prometheus Metrics

curl http://localhost:8000/metrics

Key metrics: ocr_documents_total, ocr_processing_duration_seconds, ocr_confidence_score.
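For a quick look without a Prometheus server, you can read the text returned by /metrics directly. This sketch is a minimal reader, not a full parser of the exposition format. It skips comment lines, ignores timestamps and sums each series across its labels. It assumes ocr_processing_duration_seconds is a histogram or summary exposing _sum and _count series; this guide does not state the metric types. Summed _bucket values are meaningless, so the average is taken as _sum divided by _count.

```python
"""Minimal reader for the Prometheus text format served at /metrics (sketch).

Not a full parser: comment lines are skipped, timestamps are ignored, and
samples are summed per series name across all label combinations.
"""
import urllib.request
from collections import defaultdict
from typing import Optional


def parse_metrics(text: str) -> dict[str, float]:
    """Return {series_name: sum of its sample values}."""
    totals: dict[str, float] = defaultdict(float)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or # HELP / # TYPE comment
        if "{" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :]
        else:
            name, _, rest = line.partition(" ")
        totals[name] += float(rest.split()[0])  # first field is the value
    return dict(totals)


def average_duration(
    metrics: dict[str, float], base: str = "ocr_processing_duration_seconds"
) -> Optional[float]:
    """Mean observation from the _sum and _count series; None if no data."""
    count = metrics.get(f"{base}_count", 0.0)
    return metrics.get(f"{base}_sum", 0.0) / count if count else None


if __name__ == "__main__":
    with urllib.request.urlopen("http://localhost:8000/metrics", timeout=10) as resp:
        metrics = parse_metrics(resp.read().decode())
    print("ocr_documents_total:", metrics.get("ocr_documents_total"))
    print("mean processing time (s):", average_duration(metrics))
```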

Database Backup

# Docker (-T disables TTY allocation, which would otherwise add carriage returns to the redirected dump)
docker compose exec -T postgres pg_dump -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql

# Manual
pg_dump -U ocr -h localhost ocr_sprint | gzip > backup_$(date +%Y%m%d).sql.gz

Automated Backup (Cron)

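Script /opt/ocr-sprint-service/backup.sh:

#!/bin/bash
set -euo pipefail  # stop before pruning if pg_dump fails
BACKUP_DIR="/opt/ocr-sprint-service/backups"
mkdir -p "$BACKUP_DIR"
pg_dump -U ocr -h localhost ocr_sprint | gzip > "$BACKUP_DIR/db_$(date +%Y%m%d_%H%M%S).sql.gz"
find "$BACKUP_DIR" -name "db_*.sql.gz" -mtime +7 -delete

Cron cannot answer a password prompt, so give the ocr user a ~/.pgpass file (mode 600) with a line such as localhost:5432:ocr_sprint:ocr:ganti-password-kuat. Then make the script executable and schedule it. Note that crontab -u ocr - replaces the ocr user's entire crontab.

chmod +x /opt/ocr-sprint-service/backup.sh
# The job runs as ocr, so that user must be able to write the log file
sudo touch /var/log/ocr-backup.log
sudo chown ocr:ocr /var/log/ocr-backup.log
# Cron: daily at 2 AM
echo "0 2 * * * /opt/ocr-sprint-service/backup.sh >> /var/log/ocr-backup.log 2>&1" | sudo crontab -u ocr -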
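The retention rule in backup.sh relies on find -mtime +7. find discards fractional days when it computes a file's age, so only files at least 8 full days old are deleted. This hypothetical Python equivalent applies the same rule to db_*.sql.gz files. It can help when testing retention or on a host without GNU find.

```python
"""Python equivalent of the retention step in backup.sh (sketch).

Mirrors find DIR -name "db_*.sql.gz" -mtime +7: age is counted in whole
24-hour periods (fractions discarded) and must be greater than keep_days.
"""
import time
from pathlib import Path
from typing import Optional

DAY = 24 * 60 * 60


def expired_backups(directory: str, keep_days: int = 7, now: Optional[float] = None) -> list[Path]:
    """List db_*.sql.gz files that find -mtime +keep_days would match."""
    now = time.time() if now is None else now
    expired = []
    for path in sorted(Path(directory).glob("db_*.sql.gz")):
        age_days = int((now - path.stat().st_mtime) // DAY)
        if age_days > keep_days:
            expired.append(path)
    return expired


def prune(directory: str, keep_days: int = 7) -> list[Path]:
    """Delete the expired backups and return the paths that were removed."""
    removed = expired_backups(directory, keep_days)
    for path in removed:
        path.unlink()
    return removed


if __name__ == "__main__":
    for path in prune("/opt/ocr-sprint-service/backups"):
        print("deleted", path)
```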

10. Troubleshooting

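| Problem | Diagnosis | Fix |
|---|---|---|
| Service does not start | journalctl -u ocr-sprint-api -n 100 | Check file permissions, .env and the error log |
| PaddleOCR model download fails | Timeouts in the logs | Pre-download the models manually: python -c "from paddleocr import PaddleOCR; PaddleOCR(lang='latin')" |
| Worker does not process jobs | redis-cli ping does not return PONG | Make sure Redis is running; check REDIS_URL and QUEUE_ENABLED=true |
| Database migration error | alembic current and alembic history | Fix the cause and rerun alembic upgrade head. Use alembic stamp <revision> only when the schema is known to already match that revision, because stamp records a revision without running any migration |
| Port 8000 already in use | sudo ss -tlnp 'sport = :8000' | Stop the process holding the port, or move the API to another port and update the Nginx upstream |
| Out of memory | OOM killer messages in the logs (dmesg or journalctl -k) | Reduce --concurrency on the worker, or add RAM |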
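In a slim container or on a host without redis-cli, you can run the same check over a raw socket, because Redis accepts the inline command PING and answers +PONG. This sketch assumes Redis without a password or TLS. An authenticated instance answers with a -NOAUTH error, which is reported here as False. The helper names are hypothetical, and the host and port come from a REDIS_URL-style string.

```python
"""Minimal redis-cli ping substitute over a raw socket (sketch).

Sends the inline command PING and expects the simple-string reply +PONG.
Assumption: no AUTH and no TLS.
"""
import socket
from urllib.parse import urlsplit


def redis_ping(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """True if the server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError:
        return False  # refused, timed out, or unreachable
    return reply.startswith(b"+PONG")


def redis_ping_url(url: str = "redis://localhost:6379/0") -> bool:
    """Ping the host and port taken from a REDIS_URL-style string."""
    parts = urlsplit(url)
    return redis_ping(parts.hostname or "localhost", parts.port or 6379)


if __name__ == "__main__":
    print("PONG" if redis_ping_url() else "no PONG")
```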

11. Security Checklist

  • API_KEYS set to a random key (openssl rand -hex 32)
  • Database password changed from the default
  • Firewall active (only ports 22, 80 and 443 open)
  • SSL/TLS enabled via Nginx + Let's Encrypt
  • /metrics endpoint restricted to the internal network
  • Automatic database backups via cron
  • OS security updates enabled (unattended-upgrades)
  • APP_ENV=prod (not local)

Quick Reference: Everyday Commands

# === Docker ===
docker compose up -d          # Start
docker compose down            # Stop
docker compose logs -f api     # Logs
docker compose build && docker compose up -d  # Update

# === Manual ===
sudo systemctl restart ocr-sprint-api ocr-sprint-worker  # Restart
sudo journalctl -u ocr-sprint-api -f                     # Logs
curl http://localhost:8000/api/v1/health                  # Health check