diff --git a/docs/DEPLOYMENT-GUIDE.md b/docs/DEPLOYMENT-GUIDE.md
new file mode 100644
index 0000000..06faa67
--- /dev/null
+++ b/docs/DEPLOYMENT-GUIDE.md
@@ -0,0 +1,571 @@
+# OCR Sprint Service Deployment Guide
+
+> This document is a step-by-step guide for deploying **ocr-sprint-service** to a production server. It reflects the actual state of the codebase as of April 2026 (Phases 1–4 complete).
+
+---
+
+## Table of Contents
+
+1. [Architecture Overview](#1-architecture-overview)
+2. [Server Prerequisites](#2-server-prerequisites)
+3. [Option A: Docker Compose (Recommended)](#3-option-a-docker-compose-recommended)
+4. [Option B: Manual (Without Docker)](#4-option-b-manual-without-docker)
+5. [Production Environment Configuration](#5-production-environment-configuration)
+6. [Reverse Proxy & SSL (Nginx)](#6-reverse-proxy--ssl-nginx)
+7. [Firewall](#7-firewall)
+8. [Deployment Verification](#8-deployment-verification)
+9. [Monitoring & Maintenance](#9-monitoring--maintenance)
+10. [Troubleshooting](#10-troubleshooting)
+11. [Security Checklist](#11-security-checklist)
+
+---
+
+## 1. Architecture Overview
+
+```
+┌──────────┐     ┌──────────────┐     ┌───────┐
+│  Client  │────▶│ Nginx (SSL)  │────▶│  API  │──▶ PaddleOCR
+└──────────┘     └──────────────┘     │ :8000 │    Pipeline
+                                      └───┬───┘
+                                          │ async job
+                                    ┌─────▼─────┐
+                                    │   Redis   │
+                                    │   :6379   │
+                                    └─────┬─────┘
+                                    ┌─────▼──────┐
+                                    │   Worker   │──▶ PaddleOCR
+                                    │  (Celery)  │    Pipeline
+                                    └─────┬──────┘
+                                    ┌─────▼──────┐
+                                    │ PostgreSQL │
+                                    │   :5432    │
+                                    └────────────┘
+```
+
+**4 services** must be running:
+
+| Service | Role |
+|---------|------|
+| **API** (FastAPI + Uvicorn) | Accepts document uploads, serves OCR results |
+| **Worker** (Celery) | Asynchronous OCR processing in the background |
+| **Redis** | Message broker for the job queue |
+| **PostgreSQL** | Stores job state and extraction results |
+
+Blob storage uses the **local filesystem** (S3/MinIO is not supported yet).
+
+---
+
+## 2. Server Prerequisites
+
+### Minimum Specifications
+
+| Resource | Minimum | Recommended |
+|----------|---------|-------------|
+| OS | Ubuntu 20.04+ / Debian 11+ | Ubuntu 22.04+ |
+| CPU | 4 cores | 8 cores |
+| RAM | 8 GB | 16 GB |
+| Storage | 50 GB free | 100 GB free |
+| Python | 3.10–3.12 | 3.11 or 3.12 |
+| Network | Port 8000 (internal) | + Port 80/443 (Nginx) |
+
+### Disk Requirements
+
+- ~1.5 GB: PaddlePaddle wheels
+- ~200 MB: PaddleOCR model downloads (automatic on first run)
+- The remainder: blob storage for uploaded documents
+
+### Required Software
+
+- **Docker Compose**: for Option A
+- **Python 3.10–3.12 + PostgreSQL + Redis**: for Option B
+- **Git**: both options
+- **Nginx** (optional): reverse proxy + SSL
+
+---
+
+## 3. Option A: Docker Compose (Recommended)
+
+> The fastest route. All services (API, Worker, Redis, Postgres) run in containers.
+
+### 3.1 Log In & Clone
+
+```bash
+ssh user@your-server.com
+
+git clone https://github.com/Adriankf59/ocr-sprint-service.git
+cd ocr-sprint-service
+```
+
+### 3.2 Configure .env
+
+```bash
+cp .env.example .env
+nano .env
+```
+
+See [Section 5](#5-production-environment-configuration) for the production configuration details.
+
+> [!IMPORTANT]
+> For Docker Compose, **do not change** `DATABASE_URL` and `REDIS_URL`. They are already overridden by `docker-compose.yml` through the environment variables of each container.
+
+### 3.3 Build & Start
+
+```bash
+# Build the image (~5–10 minutes the first time)
+docker compose build
+
+# Start all services
+docker compose up -d
+
+# Check the logs
+docker compose logs -f api worker
+```
+
+The `api` container automatically runs `alembic upgrade head` before starting the server (see `command` in `docker-compose.yml`).
+
+### 3.4 First-Run Model Download
+
+The first request triggers a download of the PaddleOCR models (~200 MB) into the `paddle-models` Docker volume. Wait for it to finish before testing.
+
+```bash
+# Monitor the download in the logs
+docker compose logs -f api
+```
+
+### 3.5 Verification
+
+```bash
+curl http://localhost:8000/api/v1/health
+# Expected: {"status":"ok","version":"0.1.0"}
+```
+
+### 3.6 Update the Service (After Code Changes)
+
+```bash
+cd ocr-sprint-service
+git pull
+docker compose build
+docker compose up -d
+```
+
+---
+
+## 4. Option B: Manual (Without Docker)
+
+> For servers that already have Python, PostgreSQL, and Redis installed.
+
+### 4.1 Install System Libraries
+
+```bash
+sudo apt update && sudo apt upgrade -y
+
+# Libraries for OpenCV & PaddleOCR
+sudo apt install -y \
+    python3.11 python3.11-venv python3.11-dev \
+    libgl1 libglib2.0-0 libsm6 libxext6 libxrender1 \
+    libgomp1 libmagic1 \
+    build-essential git curl
+
+# Install Redis & PostgreSQL (if not already present)
+sudo apt install -y redis-server postgresql postgresql-contrib
+sudo systemctl enable --now redis-server postgresql
+```
+
+> [!NOTE]
+> If the server already has Python 3.12, use `python3.12` in every command that follows.
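
A quick preflight before continuing can catch a wrong interpreter or a missing system library early. The sketch below is not part of the repository: `python_supported`, `missing_packages`, and the package-to-library mapping are illustrative assumptions, and `ctypes.util.find_library` may miss libraries on unusual setups, so treat an empty result as a hint rather than a guarantee.

```python
import sys
from ctypes.util import find_library

# apt package -> shared-library short name.
# This mapping is an assumption of the sketch, not taken from the project.
REQUIRED_LIBS = {
    "libgl1": "GL",
    "libglib2.0-0": "glib-2.0",
    "libsm6": "SM",
    "libxext6": "Xext",
    "libxrender1": "Xrender",
    "libgomp1": "gomp",
    "libmagic1": "magic",
}


def python_supported(version=sys.version_info):
    """True when the interpreter falls in the 3.10-3.12 range the guide lists."""
    return (3, 10) <= tuple(version[:2]) <= (3, 12)


def missing_packages(required=REQUIRED_LIBS, finder=find_library):
    """Return the apt packages whose shared library the dynamic linker cannot find."""
    return sorted(pkg for pkg, lib in required.items() if finder(lib) is None)
```

Run it with the interpreter that will back the virtual environment; on a fully prepared host `missing_packages()` returns an empty list.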
+
+### 4.2 Set Up the Database
+
+```bash
+sudo -u postgres psql
+```
+
+```sql
+CREATE USER ocr WITH PASSWORD 'change-to-a-strong-password';
+CREATE DATABASE ocr_sprint OWNER ocr;
+GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
+\c ocr_sprint
+GRANT ALL ON SCHEMA public TO ocr;
+\q
+```
+
+### 4.3 Create Application User & Directory
+
+```bash
+sudo useradd -m -s /bin/bash ocr
+sudo mkdir -p /opt/ocr-sprint-service
+sudo chown ocr:ocr /opt/ocr-sprint-service
+```
+
+### 4.4 Clone & Install
+
+```bash
+sudo su - ocr
+cd /opt
+git clone https://github.com/Adriankf59/ocr-sprint-service.git
+cd ocr-sprint-service
+
+# Create virtual environment
+python3.11 -m venv .venv
+source .venv/bin/activate
+
+# Install dependencies + OCR runtime (~1.5 GB download)
+pip install --upgrade pip setuptools wheel
+pip install -e ".[ocr]"
+
+# Verify
+python -c "import paddleocr; print('PaddleOCR OK')"
+python -c "import fastapi; print('FastAPI OK')"
+```
+
+### 4.5 Configure .env
+
+```bash
+cp .env.example .env
+nano .env
+```
+
+**Must be changed for a manual deployment:**
+
+```bash
+APP_ENV=prod
+DATABASE_URL=postgresql+psycopg://ocr:change-to-a-strong-password@localhost:5432/ocr_sprint
+REDIS_URL=redis://localhost:6379/0
+QUEUE_ENABLED=true
+API_KEYS=your-generated-api-key
+STORAGE_LOCAL_DIR=/opt/ocr-sprint-service/storage
+BLOB_STORAGE_DIR=/opt/ocr-sprint-service/storage/blobs
+```
+
+```bash
+# Create storage directories
+mkdir -p /opt/ocr-sprint-service/storage/blobs
+```
+
+### 4.6 Run Database Migrations
+
+```bash
+source .venv/bin/activate
+alembic upgrade head
+alembic current  # verify
+```
+
+### 4.7 Manual Test
+
+```bash
+uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
+# In another terminal: curl http://localhost:8000/api/v1/health
+# Ctrl+C to stop
+```
+
+### 4.8 Set Up Systemd Services
+
+**API Service** (`/etc/systemd/system/ocr-sprint-api.service`):
+
+```ini
+[Unit]
+Description=OCR Sprint API Service
+After=network.target postgresql.service redis-server.service
+
+[Service]
+Type=simple
+User=ocr
+Group=ocr
+WorkingDirectory=/opt/ocr-sprint-service
+Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
+EnvironmentFile=/opt/ocr-sprint-service/.env
+ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn \
+    ocr_sprint.main:app \
+    --host 0.0.0.0 --port 8000 --workers 4 --log-level info
+Restart=always
+RestartSec=10
+LimitNOFILE=65536
+NoNewPrivileges=true
+
+[Install]
+WantedBy=multi-user.target
+```
+
+**Worker Service** (`/etc/systemd/system/ocr-sprint-worker.service`):
+
+```ini
+[Unit]
+Description=OCR Sprint Celery Worker
+After=network.target postgresql.service redis-server.service
+
+[Service]
+Type=simple
+User=ocr
+Group=ocr
+WorkingDirectory=/opt/ocr-sprint-service
+Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
+EnvironmentFile=/opt/ocr-sprint-service/.env
+ExecStart=/opt/ocr-sprint-service/.venv/bin/celery \
+    -A ocr_sprint.worker.celery_app worker \
+    --loglevel=info --concurrency=2 --max-tasks-per-child=100
+Restart=always
+RestartSec=10
+LimitNOFILE=65536
+NoNewPrivileges=true
+
+[Install]
+WantedBy=multi-user.target
+```
+
+**Enable & Start:**
+
+```bash
+# Leave the ocr user's shell first
+exit
+
+sudo systemctl daemon-reload
+sudo systemctl enable --now ocr-sprint-api ocr-sprint-worker
+sudo systemctl status ocr-sprint-api ocr-sprint-worker
+```
+
+### 4.9 Update the Service (Manual)
+
+```bash
+sudo su - ocr
+cd /opt/ocr-sprint-service
+git pull
+source .venv/bin/activate
+pip install -e ".[ocr]"
+alembic upgrade head
+exit
+
+sudo systemctl restart ocr-sprint-api ocr-sprint-worker
+```
+
+---
+
+## 5. Production Environment Configuration
+
+The following `.env` settings **must be reviewed and changed** from their defaults for production:
+
+| Variable | Default | Production | Notes |
+|----------|---------|------------|-------|
+| `APP_ENV` | `local` | `prod` | Environment mode |
+| `API_KEYS` | *(empty)* | `key1,key2` | **REQUIRED!** Auth is disabled if empty |
+| `QUEUE_ENABLED` | `false` | `true` | Enables async processing |
+| `DATABASE_URL` | `sqlite:///...` | `postgresql+psycopg://...` | Docker: overridden automatically |
+| `REDIS_URL` | `redis://localhost:6379/0` | Adjust as needed | Docker: overridden automatically |
+| `OCR_USE_GPU` | `false` | `true` if a GPU is available | GPU mode requires the NVIDIA driver |
+| `TABLES_ENABLED` | `true` | `true` | Personnel table extraction |
+
+**Generate an API key:**
+
+```bash
+openssl rand -hex 32
+```
+
+> [!WARNING]
+> Never deploy to production with `API_KEYS` left empty. If it is empty, every endpoint is open without authentication.
+
+---
+
+## 6. Reverse Proxy & SSL (Nginx)
+
+### Install
+
+```bash
+sudo apt install -y nginx certbot python3-certbot-nginx
+```
+
+### Configuration (`/etc/nginx/sites-available/ocr-sprint`)
+
+```nginx
+upstream ocr_api {
+    server 127.0.0.1:8000;
+    keepalive 32;
+}
+
+server {
+    listen 80;
+    server_name ocr.yourdomain.com;
+
+    client_max_body_size 30M;
+
+    proxy_connect_timeout 300s;
+    proxy_read_timeout 300s;
+
+    location / {
+        proxy_pass http://ocr_api;
+        proxy_http_version 1.1;
+        # Clear the Connection header so upstream keepalive connections are reused
+        proxy_set_header Connection "";
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+    }
+
+    location /metrics {
+        allow 127.0.0.1;
+        allow 10.0.0.0/8;
+        deny all;
+        proxy_pass http://ocr_api;
+    }
+}
+```
+
+### Enable & SSL
+
+```bash
+sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
+sudo nginx -t
+sudo systemctl reload nginx
+
+# SSL
+sudo certbot --nginx -d ocr.yourdomain.com
+```
+
+---
+
+## 7. Firewall
+
+```bash
+sudo ufw allow 22/tcp    # SSH (important!)
+sudo ufw allow 80/tcp    # HTTP
+sudo ufw allow 443/tcp   # HTTPS
+sudo ufw enable
+sudo ufw status
+```
+
+> [!CAUTION]
+> Make sure SSH (port 22) is allowed **before** enabling the firewall, so you do not lock yourself out of the server.
+
+---
+
+## 8. Deployment Verification
+
+### Health Check
+
+```bash
+curl http://localhost:8000/api/v1/health
+# {"status":"ok","version":"0.1.0"}
+```
+
+### Test OCR (Sync)
+
+```bash
+curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
+  -H "X-API-Key: your-api-key" \
+  -F "file=@/path/to/test.pdf" | jq
+```
+
+### Test OCR (Async, Production Flow)
+
+```bash
+# Submit job
+curl -X POST http://localhost:8000/api/v1/documents \
+  -H "X-API-Key: your-api-key" \
+  -F "file=@document.pdf" | jq
+# → {"job_id":"8f2a...","status":"pending",...}
+
+# Poll result
+curl -H "X-API-Key: your-api-key" \
+  http://localhost:8000/api/v1/documents/8f2a... | jq
+# → {"status":"completed","confidence":0.93,"data":{...}}
+```
+
+### Check That All Services Are Running
+
+```bash
+# Docker
+docker compose ps
+
+# Manual
+sudo systemctl status ocr-sprint-api ocr-sprint-worker postgresql redis-server nginx
+```
+
+---
+
+## 9. Monitoring & Maintenance
+
+### Logs
+
+```bash
+# Docker
+docker compose logs -f api worker
+
+# Manual (systemd)
+sudo journalctl -u ocr-sprint-api -f
+sudo journalctl -u ocr-sprint-worker -f
+```
+
+### Prometheus Metrics
+
+```bash
+curl http://localhost:8000/metrics
+```
+
+Key metrics: `ocr_documents_total`, `ocr_processing_duration_seconds`, `ocr_confidence_score`.
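
For a quick spot check without a Prometheus server, the `/metrics` output can be filtered down to those three metric families. The sketch below is illustrative only: `filter_metrics` and `fetch_metrics` are hypothetical helpers, and the parsing assumes the plain text exposition format with no trailing timestamps on sample lines.

```python
from urllib.request import urlopen

# Metric families named in this guide. Histogram or summary families may also
# expose _bucket, _sum and _count samples, which the prefix match below keeps.
WATCHED = (
    "ocr_documents_total",
    "ocr_processing_duration_seconds",
    "ocr_confidence_score",
)


def filter_metrics(exposition_text, families=WATCHED):
    """Return {sample name incl. labels: value} for the watched families."""
    samples = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        if not line.startswith(tuple(families)):
            continue
        # Assumption: the value is the last space-separated field (no timestamp).
        name, _, value = line.rpartition(" ")
        try:
            samples[name] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return samples


def fetch_metrics(url="http://localhost:8000/metrics"):
    """Download the metrics page and filter it; the URL is the one used above."""
    with urlopen(url, timeout=10) as resp:
        return filter_metrics(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` on the server returns a dictionary that can be printed or compared between two points in time, for example to confirm that `ocr_documents_total` increases after a test upload.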
+
+### Back Up the Database
+
+```bash
+# Docker (-T disables TTY allocation so the redirected dump stays clean)
+docker compose exec -T postgres pg_dump -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql
+
+# Manual
+pg_dump -U ocr -h localhost ocr_sprint | gzip > backup_$(date +%Y%m%d).sql.gz
+```
+
+### Automated Backup (Cron)
+
+```bash
+#!/bin/bash
+# /opt/ocr-sprint-service/backup.sh
+# Non-interactive runs need the database password, e.g. in the ocr user's ~/.pgpass
+BACKUP_DIR="/opt/ocr-sprint-service/backups"
+mkdir -p "$BACKUP_DIR"
+pg_dump -U ocr -h localhost ocr_sprint | gzip > "$BACKUP_DIR/db_$(date +%Y%m%d_%H%M%S).sql.gz"
+find "$BACKUP_DIR" -name "db_*.sql.gz" -mtime +7 -delete
+```
+
+```bash
+chmod +x /opt/ocr-sprint-service/backup.sh
+# Cron: daily at 2 AM (this replaces any existing crontab of the ocr user;
+# the log lives in a directory the ocr user can write to)
+echo "0 2 * * * /opt/ocr-sprint-service/backup.sh >> /opt/ocr-sprint-service/backup.log 2>&1" | sudo crontab -u ocr -
+```
+
+---
+
+## 10. Troubleshooting
+
+| Problem | Diagnosis | Solution |
+|---------|-----------|----------|
+| Service does not start | `journalctl -u ocr-sprint-api -n 100` | Check permissions, `.env`, and the error log |
+| PaddleOCR model download fails | Timeout in the logs | Retry manually: `python -c "from paddleocr import PaddleOCR; PaddleOCR(lang='latin')"` |
+| Worker does not process jobs | `redis-cli ping` does not return PONG | Make sure Redis is running, check `REDIS_URL` |
+| Database migration error | `alembic current` | If the schema is already current and only the version record is wrong, `alembic stamp head`, then `alembic upgrade head` |
+| Port 8000 already in use | `ss -tlnp \| grep 8000` | Kill the old process or change the port in `.env` |
+| Out of memory | OOM killer in the logs | Lower `--concurrency` on the worker, or add RAM |
+
+---
+
+## 11. Security Checklist
+
+- [ ] `API_KEYS` is set to a random key (`openssl rand -hex 32`)
+- [ ] Database password changed from the default
+- [ ] Firewall active (only ports 22, 80, 443 open)
+- [ ] SSL/TLS active via Nginx + Let's Encrypt
+- [ ] `/metrics` endpoint restricted to the internal network
+- [ ] Automated database backup via cron
+- [ ] OS security updates enabled (`unattended-upgrades`)
+- [ ] `APP_ENV=prod` (not `local`)
+
+---
+
+## Quick Reference: Everyday Commands
+
+```bash
+# === Docker ===
+docker compose up -d                                      # Start
+docker compose down                                       # Stop
+docker compose logs -f api                                # Logs
+docker compose build && docker compose up -d              # Update
+
+# === Manual ===
+sudo systemctl restart ocr-sprint-api ocr-sprint-worker   # Restart
+sudo journalctl -u ocr-sprint-api -f                      # Logs
+curl http://localhost:8000/api/v1/health                  # Health check
+```
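
The health check above can also be scripted, for example to wait until the API answers after a restart or an update. This is a minimal sketch rather than project code: `fetch_json` and `wait_for_health` are hypothetical helpers, and the only thing taken from the guide is the `/api/v1/health` endpoint and its `{"status":"ok",...}` response.

```python
import json
import time
from urllib.error import URLError
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_health(url, attempts=30, delay=2.0, fetch=fetch_json, sleep=time.sleep):
    """Poll the health endpoint until it reports status "ok".

    Returns the number of the attempt that succeeded, or None if every
    attempt failed. `fetch` and `sleep` are injectable to ease testing.
    """
    for attempt in range(1, attempts + 1):
        try:
            if fetch(url).get("status") == "ok":
                return attempt
        except (URLError, OSError, ValueError):
            pass  # service not up yet, or the body was not valid JSON
        if attempt < attempts:
            sleep(delay)
    return None
```

A deployment script could call `wait_for_health("http://localhost:8000/api/v1/health")` right after `docker compose up -d` or `systemctl restart` and abort when it returns `None`.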