OCR Sprint Service Deployment Guide
This document is a step-by-step guide for deploying ocr-sprint-service to a production server. It reflects the state of the codebase as of April 2026 (Phases 1–4 complete).
Table of Contents
- Architecture Overview
- Server Prerequisites
- Option A: Docker Compose (Recommended)
- Option B: Manual (Without Docker)
- Production Environment Configuration
- Reverse Proxy & SSL (Nginx)
- Firewall
- Verifying the Deployment
- Monitoring & Maintenance
- Troubleshooting
- Security Checklist
1. Architecture Overview
┌──────────┐     ┌──────────────┐     ┌───────┐
│  Client  │────▶│ Nginx (SSL)  │────▶│  API  │──▶ PaddleOCR
└──────────┘     └──────────────┘     │ :8000 │    Pipeline
                                      └───┬───┘
                                          │ async job
                                    ┌─────▼─────┐
                                    │   Redis   │
                                    │   :6379   │
                                    └─────┬─────┘
                                    ┌─────▼──────┐
                                    │   Worker   │──▶ PaddleOCR
                                    │  (Celery)  │    Pipeline
                                    └─────┬──────┘
                                    ┌─────▼──────┐
                                    │ PostgreSQL │
                                    │   :5432    │
                                    └────────────┘
Four services must be running:
| Service | Role |
|---|---|
| API (FastAPI + Uvicorn) | Accepts document uploads and serves OCR results |
| Worker (Celery) | Runs OCR asynchronously in the background |
| Redis | Message broker for the job queue |
| PostgreSQL | Stores job state and extraction results |
Blob storage uses the local filesystem (S3/MinIO is not supported yet).
2. Server Prerequisites
Minimum Specifications
| Resource | Minimum | Recommended |
|---|---|---|
| OS | Ubuntu 20.04+ / Debian 11+ | Ubuntu 22.04+ |
| CPU | 4 cores | 8 cores |
| RAM | 8 GB | 16 GB |
| Storage | 50 GB free | 100 GB free |
| Python | 3.10–3.12 | 3.11 or 3.12 |
| Network | Port 8000 (internal) | + Port 80/443 (Nginx) |
Disk Requirements
- ~1.5 GB: PaddlePaddle wheels
- ~200 MB: PaddleOCR model downloads (fetched automatically on first run)
- The remainder: blob storage for uploaded documents
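The storage minimum from the table can be checked before installing anything. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and it treats the 50 GB figure as GiB, which is an assumption.

```python
import shutil

GIB = 1024 ** 3


def free_gib(path: str = "/") -> float:
    """Return the free space on the filesystem that contains `path`, in GiB."""
    return shutil.disk_usage(path).free / GIB


def meets_disk_minimum(path: str = "/", minimum_gib: float = 50.0) -> bool:
    """True when the filesystem holding `path` has at least `minimum_gib` free."""
    return free_gib(path) >= minimum_gib
```

For example, `meets_disk_minimum("/opt", 50.0)` checks the filesystem that would hold a manual install under /opt.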
Required Software
- Docker Compose: for Option A
- Python 3.10–3.12 + PostgreSQL + Redis: for Option B
- Git: both options
- Nginx (optional): reverse proxy + SSL
3. Option A: Docker Compose (Recommended)
The fastest route. All services (API, Worker, Redis, Postgres) run in containers.
3.1 Log In & Clone
ssh user@your-server.com
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
3.2 Configure .env
cp .env.example .env
nano .env
See Section 5 for production configuration details.
Important
With Docker Compose, do not change DATABASE_URL or REDIS_URL: docker-compose.yml already overrides both through environment variables on each container.
3.3 Build & Start
# Build the images (~5–10 minutes the first time)
docker compose build
# Start all services
docker compose up -d
# Check the logs
docker compose logs -f api worker
The api container automatically runs alembic upgrade head before starting the server (see its command in docker-compose.yml).
3.4 First-Run Model Download
The first request triggers the PaddleOCR model download (~200 MB) into the paddle-models Docker volume. Wait for it to finish before testing.
# Watch the download in the logs
docker compose logs -f api
3.5 Verification
curl http://localhost:8000/api/v1/health
# Expected: {"status":"ok","version":"0.1.0"}
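Right after `docker compose up -d` the API may not accept connections yet, so a single curl can fail spuriously. The sketch below retries the health check until the endpoint reports `"status": "ok"`. It uses only the standard library; the function names, retry count, and delay are illustrative assumptions. It only confirms that the API answers, not that the model download has finished.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET `url` and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_health(
    url: str = "http://localhost:8000/api/v1/health",
    attempts: int = 30,
    delay_s: float = 2.0,
    fetch: Callable[[str], dict] = fetch_json,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health endpoint until it reports status "ok" or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch(url).get("status") == "ok":
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, HTTP error, or a non-JSON body: not ready yet.
            pass
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without a running server; in normal use `wait_for_health()` with the defaults is enough.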
3.6 Updating the Service (After Code Changes)
cd ocr-sprint-service
git pull
docker compose build
docker compose up -d
4. Option B: Manual (Without Docker)
For servers that already have Python, PostgreSQL, and Redis installed.
4.1 Install System Libraries
sudo apt update && sudo apt upgrade -y
# Libraries for OpenCV & PaddleOCR
sudo apt install -y \
    python3.11 python3.11-venv python3.11-dev \
    libgl1 libglib2.0-0 libsm6 libxext6 libxrender1 \
    libgomp1 libmagic1 \
    build-essential git curl
# Install Redis & PostgreSQL (if not already installed)
sudo apt install -y redis-server postgresql postgresql-contrib
sudo systemctl enable --now redis-server postgresql
Note
If the server already has Python 3.12, use python3.12 in all subsequent commands.
4.2 Set Up the Database
sudo -u postgres psql
-- Replace 'ganti-password-kuat' with a strong password of your own
CREATE USER ocr WITH PASSWORD 'ganti-password-kuat';
CREATE DATABASE ocr_sprint OWNER ocr;
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
\c ocr_sprint
GRANT ALL ON SCHEMA public TO ocr;
\q
4.3 Create Application User & Directory
sudo useradd -m -s /bin/bash ocr
sudo mkdir -p /opt/ocr-sprint-service
sudo chown ocr:ocr /opt/ocr-sprint-service
4.4 Clone & Install
sudo su - ocr
cd /opt
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate
# Install dependencies + OCR runtime (~1.5 GB download)
pip install --upgrade pip setuptools wheel
pip install -e ".[ocr]"
# Verify
python -c "import paddleocr; print('PaddleOCR OK')"
python -c "import fastapi; print('FastAPI OK')"
4.5 Configure .env
cp .env.example .env
nano .env
These values must be changed for a manual deployment (the password in DATABASE_URL must match the one set in step 4.2):
APP_ENV=prod
DATABASE_URL=postgresql+psycopg://ocr:ganti-password-kuat@localhost:5432/ocr_sprint
REDIS_URL=redis://localhost:6379/0
QUEUE_ENABLED=true
API_KEYS=your-generated-api-key
STORAGE_LOCAL_DIR=/opt/ocr-sprint-service/storage
BLOB_STORAGE_DIR=/opt/ocr-sprint-service/storage/blobs
# Create storage directories
mkdir -p /opt/ocr-sprint-service/storage/blobs
4.6 Run Database Migrations
source .venv/bin/activate
alembic upgrade head
alembic current # verify
4.7 Manual Test
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
# In another terminal: curl http://localhost:8000/api/v1/health
# Press Ctrl+C to stop
4.8 Setup Systemd Services
API Service — /etc/systemd/system/ocr-sprint-api.service:
[Unit]
Description=OCR Sprint API Service
After=network.target postgresql.service redis-server.service
[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn \
ocr_sprint.main:app \
--host 0.0.0.0 --port 8000 --workers 4 --log-level info
Restart=always
RestartSec=10
LimitNOFILE=65536
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target
Worker Service — /etc/systemd/system/ocr-sprint-worker.service:
[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis-server.service
[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery \
-A ocr_sprint.worker.celery_app worker \
--loglevel=info --concurrency=2 --max-tasks-per-child=100
Restart=always
RestartSec=10
LimitNOFILE=65536
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target
Enable & Start:
# Exit the ocr user session first
exit
sudo systemctl daemon-reload
sudo systemctl enable --now ocr-sprint-api ocr-sprint-worker
sudo systemctl status ocr-sprint-api ocr-sprint-worker
4.9 Update Service (Manual)
sudo su - ocr
cd /opt/ocr-sprint-service
git pull
source .venv/bin/activate
pip install -e ".[ocr]"
alembic upgrade head
exit
sudo systemctl restart ocr-sprint-api ocr-sprint-worker
5. Production Environment Configuration
The following .env settings should be reviewed for production; most must be changed from their defaults:
| Variable | Default | Production | Notes |
|---|---|---|---|
| APP_ENV | local | prod | Environment mode |
| API_KEYS | (empty) | key1,key2 | REQUIRED. Authentication is disabled when empty |
| QUEUE_ENABLED | false | true | Enables async processing |
| DATABASE_URL | sqlite:///... | postgresql+psycopg://... | Docker: overridden automatically |
| REDIS_URL | redis://localhost:6379/0 | Adjust as needed | Docker: overridden automatically |
| OCR_USE_GPU | false | true if a GPU is available | GPU mode requires the NVIDIA driver |
| TABLES_ENABLED | true | true | Personnel table extraction |
Generate API Key:
openssl rand -hex 32
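Where openssl is not available, the Python standard library can produce a key of the same shape. This is a sketch, and the helper names are illustrative: `secrets.token_hex(32)` returns 64 hexadecimal characters, matching `openssl rand -hex 32`, and the second helper joins several keys into the comma-separated form shown for API_KEYS in the table above.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a random key as 2 * num_bytes hexadecimal characters."""
    return secrets.token_hex(num_bytes)


def format_api_keys(keys: list[str]) -> str:
    """Join keys into the comma-separated value used for API_KEYS."""
    return ",".join(keys)
```

For example, `format_api_keys([generate_api_key(), generate_api_key()])` yields a value that can be pasted after `API_KEYS=` in .env.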
Warning
Never deploy to production with API_KEYS unset. When it is empty, every endpoint is open without authentication.
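A small pre-flight check can catch an empty API_KEYS before the service starts. The sketch below is an assumption-laden illustration, not part of the service: it uses a deliberately simple KEY=VALUE parser (the application's own settings loader may treat quoting differently) and flags the production requirements listed in this section. Pass `check_database=False` for Docker Compose, where DATABASE_URL is overridden by docker-compose.yml.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def production_problems(env: dict[str, str], check_database: bool = True) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if env.get("APP_ENV") != "prod":
        problems.append("APP_ENV should be prod")
    if not env.get("API_KEYS", "").strip():
        problems.append("API_KEYS is empty, so authentication is disabled")
    if env.get("QUEUE_ENABLED", "").lower() != "true":
        problems.append("QUEUE_ENABLED should be true")
    # A missing DATABASE_URL is treated as the SQLite default from the table.
    if check_database and env.get("DATABASE_URL", "sqlite").startswith("sqlite"):
        problems.append("DATABASE_URL still points at SQLite")
    return problems
```

A typical use is `production_problems(parse_env(open(".env").read()))`, printing each returned line and aborting the deployment if the list is not empty.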
6. Reverse Proxy & SSL (Nginx)
Install
sudo apt install -y nginx certbot python3-certbot-nginx
Configuration: /etc/nginx/sites-available/ocr-sprint
upstream ocr_api {
    server 127.0.0.1:8000;
    keepalive 32;
}
server {
    listen 80;
    server_name ocr.yourdomain.com;
    client_max_body_size 30M;
    proxy_connect_timeout 300s;
    proxy_read_timeout 300s;
    location / {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        # Clear the Connection header so the upstream keepalive pool is actually used
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
    location /metrics {
        allow 127.0.0.1;
        allow 10.0.0.0/8;
        deny all;
        proxy_pass http://ocr_api;
    }
}
Enable & SSL
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
# SSL
sudo certbot --nginx -d ocr.yourdomain.com
7. Firewall
sudo ufw allow 22/tcp # SSH (important!)
sudo ufw allow 80/tcp # HTTP
sudo ufw allow 443/tcp # HTTPS
sudo ufw enable
sudo ufw status
Caution
Make sure SSH (port 22) is allowed before enabling the firewall, otherwise you can lock yourself out of the server.
8. Verifying the Deployment
Health Check
curl http://localhost:8000/api/v1/health
# {"status":"ok","version":"0.1.0"}
Test OCR (Sync)
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
-H "X-API-Key: your-api-key" \
-F "file=@/path/to/test.pdf" | jq
Test OCR (Async — Production Flow)
# Submit job
curl -X POST http://localhost:8000/api/v1/documents \
-H "X-API-Key: your-api-key" \
-F "file=@document.pdf" | jq
# → {"job_id":"8f2a...","status":"pending",...}
# Poll result
curl -H "X-API-Key: your-api-key" \
http://localhost:8000/api/v1/documents/8f2a... | jq
# → {"status":"completed","confidence":0.93,"data":{...}}
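The poll step above can be wrapped in a small client so that a smoke test waits for the job instead of repeating curl by hand. The sketch below uses only the standard library and covers polling only, not the multipart upload. The endpoint path, header name, and the `pending` and `completed` statuses come from the commands above; the `failed` status and all function names are assumptions.

```python
import json
import time
import urllib.request
from typing import Callable

# "completed" appears in the guide; "failed" is an assumed name for an error state.
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_job(base_url: str, job_id: str, api_key: str, timeout: float = 10.0) -> dict:
    """GET /api/v1/documents/<job_id> with the X-API-Key header and decode the JSON."""
    request = urllib.request.Request(
        f"{base_url}/api/v1/documents/{job_id}",
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict],
    interval_s: float = 2.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll `fetch(job_id)` until the job reaches a terminal status."""
    for _ in range(max_polls):
        job = fetch(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} still not finished after {max_polls} polls")
```

In use, the fetch function is bound to the server and key, for example `wait_for_job(job_id, lambda jid: fetch_job("http://localhost:8000", jid, "your-api-key"))`. Any status outside the terminal set keeps the loop polling, so an intermediate state would not end it early.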
Check That All Services Are Running
# Docker
docker compose ps
# Manual
sudo systemctl status ocr-sprint-api ocr-sprint-worker postgresql redis-server nginx
9. Monitoring & Maintenance
Logs
# Docker
docker compose logs -f api worker
# Manual (systemd)
sudo journalctl -u ocr-sprint-api -f
sudo journalctl -u ocr-sprint-worker -f
Prometheus Metrics
curl http://localhost:8000/metrics
Key metrics: ocr_documents_total, ocr_processing_duration_seconds, ocr_confidence_score.
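To pull just these metrics out of the /metrics output, a few lines of Python are enough. The sketch below parses the Prometheus text format under two simplifying assumptions: samples carry no trailing timestamp, and label values contain no spaces. A name also matches its suffixed series (for example `_sum` and `_count`), since the metric types used by the service are not stated here.

```python
def metric_samples(text: str, name: str) -> list[tuple[str, float]]:
    """Return (series, value) pairs for `name` and for series named `name_<suffix>`."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        base = series.split("{", 1)[0]  # metric name without its label set
        if base == name or base.startswith(name + "_"):
            samples.append((series, float(value)))
    return samples
```

Feeding it the body returned by `curl http://localhost:8000/metrics` and the name `ocr_documents_total` returns one entry per labeled series of that counter.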
Backup Database
# Docker
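# -T disables TTY allocation so the redirected dump is not altered (and the command also works from cron)
docker compose exec -T postgres pg_dump -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql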
# Manual
pg_dump -U ocr -h localhost ocr_sprint | gzip > backup_$(date +%Y%m%d).sql.gz
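Automated Backup (Cron)
Save the following as /opt/ocr-sprint-service/backup.sh:
#!/bin/bash
# pg_dump runs non-interactively here, so the password for user ocr must come from ~/.pgpass (or PGPASSWORD)
set -euo pipefail
BACKUP_DIR="/opt/ocr-sprint-service/backups"
mkdir -p "$BACKUP_DIR"
pg_dump -U ocr -h localhost ocr_sprint | gzip > "$BACKUP_DIR/db_$(date +%Y%m%d_%H%M%S).sql.gz"
# Delete backups older than 7 days
find "$BACKUP_DIR" -name "db_*.sql.gz" -mtime +7 -delete
Then make it executable and schedule it:
chmod +x /opt/ocr-sprint-service/backup.sh
# The ocr user cannot create files in /var/log, so create the log file first
sudo touch /var/log/ocr-backup.log && sudo chown ocr:ocr /var/log/ocr-backup.log
# Cron: daily at 2 AM (appends to the ocr user's existing crontab instead of replacing it)
(sudo crontab -u ocr -l 2>/dev/null; echo "0 2 * * * /opt/ocr-sprint-service/backup.sh >> /var/log/ocr-backup.log 2>&1") | sudo crontab -u ocr -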
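Before trusting the retention rule, it can help to list what it would delete. The sketch below mirrors the `find -name "db_*.sql.gz" -mtime +7` selection in Python without deleting anything. As in GNU find, the age is counted in whole days with the fraction discarded, so `+7` only matches files that are at least 8 full days old. The function name is illustrative.

```python
import time
from pathlib import Path
from typing import Optional

SECONDS_PER_DAY = 86400


def expired_backups(
    backup_dir: str, max_age_days: int = 7, now: Optional[float] = None
) -> list[Path]:
    """List db_*.sql.gz files whose age in whole days exceeds `max_age_days`."""
    now = time.time() if now is None else now
    expired = []
    for path in Path(backup_dir).glob("db_*.sql.gz"):
        age_days = int((now - path.stat().st_mtime) // SECONDS_PER_DAY)
        if age_days > max_age_days:
            expired.append(path)
    return sorted(expired)
```

Running `expired_backups("/opt/ocr-sprint-service/backups")` as a dry run shows which files the next cron run would remove.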
10. Troubleshooting
| Problem | Diagnosis | Solution |
|---|---|---|
| Service does not start | journalctl -u ocr-sprint-api -n 100 | Check permissions, .env, and the errors in the log |
| PaddleOCR model download fails | Timeout in the logs | python -c "from paddleocr import PaddleOCR; PaddleOCR(lang='latin')" |
| Worker does not process jobs | redis-cli ping does not return PONG | Make sure Redis is running, then check REDIS_URL |
| Database migration error | alembic current | alembic stamp head, then alembic upgrade head (stamp only records the revision without running migrations, so use it only when the schema already matches) |
| Port 8000 already in use | `ss -tlnp \| grep 8000` | Stop the process holding the port, or run the API on another port |
| Out of memory | OOM killer entries in the logs | Lower --concurrency on the worker, or add RAM |
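Several rows above come down to whether something is listening on a given port: 8000 for the API, 6379 for Redis, 5432 for PostgreSQL. The following sketch performs that check with a plain TCP connect from the standard library. The function name is illustrative, and a successful connect only shows that a listener exists, not which process owns it.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```

Before starting the API, `port_in_use(8000)` should be false; once every service is up, it should be true for 8000, 6379, and 5432.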
11. Security Checklist
- API_KEYS is set to a random key (openssl rand -hex 32)
- Database password changed from the default
- Firewall enabled (only ports 22, 80, and 443 open)
- SSL/TLS enabled via Nginx + Let's Encrypt
- /metrics endpoint restricted to the internal network
- Automated database backups via cron
- OS security updates enabled (unattended-upgrades)
- APP_ENV=prod (not local)
Quick Reference: Everyday Commands
# === Docker ===
docker compose up -d # Start
docker compose down # Stop
docker compose logs -f api # Logs
docker compose build && docker compose up -d # Update
# === Manual ===
sudo systemctl restart ocr-sprint-api ocr-sprint-worker # Restart
sudo journalctl -u ocr-sprint-api -f # Logs
curl http://localhost:8000/api/v1/health # Health check