docs: add comprehensive deployment guide for docker and manual setups

Adriankf59
2026-04-27 10:06:38 +07:00
parent 6d793758ff
commit b8a1198e93

docs/DEPLOYMENT-GUIDE.md

@@ -0,0 +1,571 @@
# OCR Sprint Service Deployment Guide
> This document is a step-by-step guide to deploying **ocr-sprint-service** to a production server. It reflects the actual state of the codebase as of April 2026 (Phase 14 complete).
---
## Table of Contents
1. [Architecture Overview](#1-architecture-overview)
2. [Server Prerequisites](#2-server-prerequisites)
3. [Option A: Docker Compose (Recommended)](#3-option-a-docker-compose-recommended)
4. [Option B: Manual (Without Docker)](#4-option-b-manual-without-docker)
5. [Production Environment Configuration](#5-production-environment-configuration)
6. [Reverse Proxy & SSL (Nginx)](#6-reverse-proxy--ssl-nginx)
7. [Firewall](#7-firewall)
8. [Deployment Verification](#8-deployment-verification)
9. [Monitoring & Maintenance](#9-monitoring--maintenance)
10. [Troubleshooting](#10-troubleshooting)
11. [Security Checklist](#11-security-checklist)
---
## 1. Architecture Overview
```
┌──────────┐     ┌──────────────┐     ┌───────┐
│  Client  │────▶│ Nginx (SSL)  │────▶│  API  │──▶ PaddleOCR
└──────────┘     └──────────────┘     │ :8000 │    Pipeline
                                      └───┬───┘
                                          │ async job
                                    ┌─────▼─────┐
                                    │   Redis   │
                                    │   :6379   │
                                    └─────┬─────┘
                                    ┌─────▼──────┐
                                    │   Worker   │──▶ PaddleOCR
                                    │  (Celery)  │    Pipeline
                                    └─────┬──────┘
                                    ┌─────▼──────┐
                                    │ PostgreSQL │
                                    │   :5432    │
                                    └────────────┘
```
**4 services** must be running:
| Service | Role |
|---------|------|
| **API** (FastAPI + Uvicorn) | Accepts document uploads and serves OCR results |
| **Worker** (Celery) | Runs OCR asynchronously in the background |
| **Redis** | Message broker for the job queue |
| **PostgreSQL** | Stores job state and extraction results |
Blob storage uses the **local filesystem** (S3/MinIO is not supported yet).
---
## 2. Server Prerequisites
### Minimum Specifications
| Resource | Minimum | Recommended |
|----------|---------|-------------|
| OS | Ubuntu 20.04+ / Debian 11+ | Ubuntu 22.04+ |
| CPU | 4 cores | 8 cores |
| RAM | 8 GB | 16 GB |
| Storage | 50 GB free | 100 GB free |
| Python | 3.10-3.12 | 3.11 or 3.12 |
| Network | Port 8000 (internal) | + Port 80/443 (Nginx) |
### Disk Requirements
- ~1.5 GB: PaddlePaddle wheels
- ~200 MB: PaddleOCR model downloads (automatic on first run)
- The rest: blob storage for uploaded documents
### Required Software
- **Docker Compose**: for Option A
- **Python 3.10-3.12 + PostgreSQL + Redis**: for Option B
- **Git**: both options
- **Nginx** (optional): reverse proxy + SSL
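
The version and disk figures above can be checked on the target host before installing anything. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and the thresholds are simply copied from the tables in this section.

```python
import shutil
import sys

# Thresholds copied from the tables above; adjust them if your requirements differ.
MIN_PYTHON = (3, 10)
MAX_PYTHON = (3, 12)
MIN_FREE_GB = 50


def python_supported(version: tuple) -> bool:
    """Return True if (major, minor) falls inside the supported range."""
    return MIN_PYTHON <= tuple(version) <= MAX_PYTHON


def free_disk_gb(path: str = "/") -> float:
    """Free space on the filesystem that contains `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    print(f"Python {major}.{minor} supported: {python_supported((major, minor))}")
    print(f"Free disk: {free_disk_gb('/'):.1f} GiB (minimum {MIN_FREE_GB})")
```

Run it with the interpreter you intend to use for the service, for example `python3.11 check_prereqs.py` (the file name is arbitrary).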
---
## 3. Option A: Docker Compose (Recommended)
> The fastest route. All services (API, Worker, Redis, Postgres) run in containers.
### 3.1 Login & Clone
```bash
ssh user@your-server.com
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
```
### 3.2 Configure .env
```bash
cp .env.example .env
nano .env
```
See [Section 5](#5-production-environment-configuration) for production configuration details.
> [!IMPORTANT]
> For Docker Compose, **do not change** `DATABASE_URL` and `REDIS_URL`: `docker-compose.yml` already overrides them through each container's environment variables.
### 3.3 Build & Start
```bash
# Build the image (~5-10 minutes the first time)
docker compose build
# Start all services
docker compose up -d
# Check the logs
docker compose logs -f api worker
```
The `api` container automatically runs `alembic upgrade head` before starting the server (see `command` in `docker-compose.yml`).
### 3.4 First-Run Model Download
The first request triggers the PaddleOCR model download (~200 MB) into the `paddle-models` Docker volume. Wait for it to finish before testing.
```bash
# Monitor the download in the logs
docker compose logs -f api
```
### 3.5 Verification
```bash
curl http://localhost:8000/api/v1/health
# Expected: {"status":"ok","version":"0.1.0"}
```
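
Right after `docker compose up -d` the API may need a few seconds before it answers. A small polling loop can replace repeated manual `curl` calls. The sketch below assumes only the health URL and the `{"status":"ok"}` response shown above; the function names, attempt count, and delay are illustrative choices.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8000/api/v1/health"  # endpoint from the step above


def fetch_health(url: str = HEALTH_URL) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 30,
    delay: float = 2.0,
) -> bool:
    """Poll until the service reports status "ok" or the attempts run out."""
    for _ in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # not up yet: connection refused, timeout, or non-JSON body
        time.sleep(delay)
    return False


if __name__ == "__main__":
    print("healthy" if wait_until_healthy() else "service did not become healthy")
```

Passing `fetch` as a parameter keeps the loop testable without a running server. Note that a healthy response does not necessarily mean the PaddleOCR models have finished downloading.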
### 3.6 Update the Service (After Code Changes)
```bash
cd ocr-sprint-service
git pull
docker compose build
docker compose up -d
```
---
## 4. Option B: Manual (Without Docker)
> For servers that already have Python, PostgreSQL, and Redis installed.
### 4.1 Install System Libraries
```bash
sudo apt update && sudo apt upgrade -y
# Libraries for OpenCV & PaddleOCR
sudo apt install -y \
    python3.11 python3.11-venv python3.11-dev \
    libgl1 libglib2.0-0 libsm6 libxext6 libxrender1 \
    libgomp1 libmagic1 \
    build-essential git curl
# Install Redis & PostgreSQL (if not already installed)
sudo apt install -y redis-server postgresql postgresql-contrib
sudo systemctl enable --now redis-server postgresql
```
> [!NOTE]
> If the server already has Python 3.12, use `python3.12` in all subsequent commands.
### 4.2 Set Up the Database
```bash
sudo -u postgres psql
```
```sql
-- 'ganti-password-kuat' is a placeholder ("change to a strong password"); use your own value
CREATE USER ocr WITH PASSWORD 'ganti-password-kuat';
CREATE DATABASE ocr_sprint OWNER ocr;
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
\c ocr_sprint
GRANT ALL ON SCHEMA public TO ocr;
\q
```
### 4.3 Create Application User & Directory
```bash
sudo useradd -m -s /bin/bash ocr
sudo mkdir -p /opt/ocr-sprint-service
sudo chown ocr:ocr /opt/ocr-sprint-service
```
### 4.4 Clone & Install
```bash
sudo su - ocr
cd /opt
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate
# Install dependencies + OCR runtime (~1.5 GB download)
pip install --upgrade pip setuptools wheel
pip install -e ".[ocr]"
# Verify
python -c "import paddleocr; print('PaddleOCR OK')"
python -c "import fastapi; print('FastAPI OK')"
```
### 4.5 Configure .env
```bash
cp .env.example .env
nano .env
```
**Must be changed for a manual deployment:**
```bash
APP_ENV=prod
# Replace the placeholder 'ganti-password-kuat' with the password chosen in step 4.2
DATABASE_URL=postgresql+psycopg://ocr:ganti-password-kuat@localhost:5432/ocr_sprint
REDIS_URL=redis://localhost:6379/0
QUEUE_ENABLED=true
API_KEYS=your-generated-api-key
STORAGE_LOCAL_DIR=/opt/ocr-sprint-service/storage
BLOB_STORAGE_DIR=/opt/ocr-sprint-service/storage/blobs
```
```bash
# Create storage directories
mkdir -p /opt/ocr-sprint-service/storage/blobs
```
### 4.6 Run Database Migrations
```bash
source .venv/bin/activate
alembic upgrade head
alembic current # verify
```
### 4.7 Manual Test
```bash
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
# In another terminal: curl http://localhost:8000/api/v1/health
# Press Ctrl+C to stop
```
### 4.8 Setup Systemd Services
**API Service** (`/etc/systemd/system/ocr-sprint-api.service`):
```ini
[Unit]
Description=OCR Sprint API Service
After=network.target postgresql.service redis-server.service
[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn \
ocr_sprint.main:app \
--host 0.0.0.0 --port 8000 --workers 4 --log-level info
Restart=always
RestartSec=10
LimitNOFILE=65536
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target
```
**Worker Service** (`/etc/systemd/system/ocr-sprint-worker.service`):
```ini
[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis-server.service
[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery \
-A ocr_sprint.worker.celery_app worker \
--loglevel=info --concurrency=2 --max-tasks-per-child=100
Restart=always
RestartSec=10
LimitNOFILE=65536
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target
```
**Enable & Start:**
```bash
# Leave the ocr user's shell first
exit
sudo systemctl daemon-reload
sudo systemctl enable --now ocr-sprint-api ocr-sprint-worker
sudo systemctl status ocr-sprint-api ocr-sprint-worker
```
### 4.9 Update Service (Manual)
```bash
sudo su - ocr
cd /opt/ocr-sprint-service
git pull
source .venv/bin/activate
pip install -e ".[ocr]"
alembic upgrade head
exit
sudo systemctl restart ocr-sprint-api ocr-sprint-worker
```
---
## 5. Production Environment Configuration
The following `.env` settings **must be reviewed** against their defaults for production:
| Variable | Default | Production | Notes |
|----------|---------|------------|-------|
| `APP_ENV` | `local` | `prod` | Environment mode |
| `API_KEYS` | *(empty)* | `key1,key2` | **REQUIRED!** Auth is disabled when empty |
| `QUEUE_ENABLED` | `false` | `true` | Enables async processing |
| `DATABASE_URL` | `sqlite:///...` | `postgresql+psycopg://...` | Docker: overridden automatically |
| `REDIS_URL` | `redis://localhost:6379/0` | Adjust as needed | Docker: overridden automatically |
| `OCR_USE_GPU` | `false` | `true` if a GPU is available | GPU mode requires the NVIDIA driver |
| `TABLES_ENABLED` | `true` | `true` | Personnel table extraction |
**Generate API Key:**
```bash
openssl rand -hex 32
```
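
Where `openssl` is not installed, Python's standard `secrets` module produces an equivalent 64-character hex key. This is a sketch of an alternative, not the project's own tooling, and `generate_api_key` is a hypothetical helper name.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a random hex string; 32 bytes yields 64 hex characters."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Comma-join several keys to match the API_KEYS format shown in the table.
    print(",".join(generate_api_key() for _ in range(2)))
```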
> [!WARNING]
> Never deploy to production with `API_KEYS` unset. When it is empty, every endpoint is open without authentication.
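
A pre-flight check can catch an unsafe `.env` before the service starts. The sketch below is an illustration only: it parses plain `KEY=VALUE` lines (not the full dotenv syntax) and applies the rules from the table above. The function names are hypothetical. For Docker Compose the `DATABASE_URL` check can be ignored, since that value is overridden per container.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def production_problems(env: dict) -> list:
    """Return a list of human-readable problems; empty means the checks passed."""
    problems = []
    if env.get("APP_ENV") != "prod":
        problems.append("APP_ENV should be 'prod'")
    if not env.get("API_KEYS", "").strip():
        problems.append("API_KEYS is empty: authentication would be disabled")
    if env.get("QUEUE_ENABLED", "false").lower() != "true":
        problems.append("QUEUE_ENABLED should be 'true'")
    if env.get("DATABASE_URL", "sqlite").startswith("sqlite"):
        problems.append("DATABASE_URL still points at SQLite")
    return problems


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as fh:
        for problem in production_problems(parse_env(fh.read())):
            print("WARNING:", problem)
```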
---
## 6. Reverse Proxy & SSL (Nginx)
### Install
```bash
sudo apt install -y nginx certbot python3-certbot-nginx
```
### Configuration: `/etc/nginx/sites-available/ocr-sprint`
```nginx
upstream ocr_api {
    server 127.0.0.1:8000;
    keepalive 32;
}

server {
    listen 80;
    server_name ocr.yourdomain.com;
    client_max_body_size 30M;
    proxy_connect_timeout 300s;
    proxy_read_timeout 300s;

    location / {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        # Clear the Connection header so upstream keepalive connections are reused
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Restrict Prometheus metrics to localhost and the internal network
    location /metrics {
        allow 127.0.0.1;
        allow 10.0.0.0/8;
        deny all;
        proxy_pass http://ocr_api;
    }
}
```
### Enable & SSL
```bash
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
# SSL
sudo certbot --nginx -d ocr.yourdomain.com
```
---
## 7. Firewall
```bash
sudo ufw allow 22/tcp # SSH (IMPORTANT!)
sudo ufw allow 80/tcp # HTTP
sudo ufw allow 443/tcp # HTTPS
sudo ufw enable
sudo ufw status
```
> [!CAUTION]
> Make sure SSH (port 22) is allowed **before** enabling the firewall, so you do not lock yourself out of the server.
---
## 8. Deployment Verification
### Health Check
```bash
curl http://localhost:8000/api/v1/health
# {"status":"ok","version":"0.1.0"}
```
### Test OCR (Sync)
```bash
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
-H "X-API-Key: your-api-key" \
-F "file=@/path/to/test.pdf" | jq
```
### Test OCR (Async — Production Flow)
```bash
# Submit job
curl -X POST http://localhost:8000/api/v1/documents \
-H "X-API-Key: your-api-key" \
-F "file=@document.pdf" | jq
# → {"job_id":"8f2a...","status":"pending",...}
# Poll result
curl -H "X-API-Key: your-api-key" \
http://localhost:8000/api/v1/documents/8f2a... | jq
# → {"status":"completed","confidence":0.93,"data":{...}}
```
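
The submit-then-poll flow above can be scripted. The sketch below covers only the polling half and relies on what this guide shows: the `X-API-Key` header, the `/api/v1/documents/<job_id>` URL, and the `status` values `pending` and `completed`. Any other status names (for example a failure state) are not documented here, so `done_statuses` is left as a parameter to extend. The function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000/api/v1/documents"


def fetch_job(job_id: str, api_key: str) -> dict:
    """GET the job resource using the X-API-Key header."""
    req = urllib.request.Request(f"{BASE_URL}/{job_id}", headers={"X-API-Key": api_key})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def wait_for_job(
    fetch: Callable[[], dict],
    done_statuses: tuple = ("completed",),  # assumption: add failure states if the API has them
    attempts: int = 60,
    delay: float = 2.0,
) -> dict:
    """Poll until the job reaches one of done_statuses; raise TimeoutError otherwise."""
    for _ in range(attempts):
        job = fetch()
        if job.get("status") in done_statuses:
            return job
        time.sleep(delay)
    raise TimeoutError(f"job not finished after {attempts} attempts")
```

Typical use would be `wait_for_job(lambda: fetch_job(job_id, api_key))`, with `job_id` taken from the submit response.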
### Check That All Services Are Running
```bash
# Docker
docker compose ps
# Manual
sudo systemctl status ocr-sprint-api ocr-sprint-worker postgresql redis-server nginx
```
---
## 9. Monitoring & Maintenance
### Logs
```bash
# Docker
docker compose logs -f api worker
# Manual (systemd)
sudo journalctl -u ocr-sprint-api -f
sudo journalctl -u ocr-sprint-worker -f
```
### Prometheus Metrics
```bash
curl http://localhost:8000/metrics
```
Key metrics: `ocr_documents_total`, `ocr_processing_duration_seconds`, `ocr_confidence_score`.
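
To pull just the OCR metrics out of the `/metrics` response, the Prometheus text format can be filtered with a few lines of Python. This is a rough sketch rather than a full parser: it assumes one sample per line and ignores timestamps. The metric types and labels the service actually exposes are not specified in this guide.

```python
import urllib.request


def parse_metrics(text: str, prefix: str = "ocr_") -> dict:
    """Map each sample whose name starts with `prefix` to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        if "}" in line:
            # Labelled sample: name{labels} value [timestamp]
            series, _, rest = line.rpartition("}")
            series += "}"
        else:
            # Unlabelled sample: name value [timestamp]
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields or not series.startswith(prefix):
            continue
        try:
            samples[series] = float(fields[0])
        except ValueError:
            continue  # malformed value, ignore the line
    return samples


if __name__ == "__main__":
    with urllib.request.urlopen("http://localhost:8000/metrics", timeout=5) as resp:
        for name, value in parse_metrics(resp.read().decode()).items():
            print(name, value)
```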
### Backup Database
```bash
# Docker
docker compose exec postgres pg_dump -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql
# Manual
pg_dump -U ocr -h localhost ocr_sprint | gzip > backup_$(date +%Y%m%d).sql.gz
```
### Automated Backup (Cron)
```bash
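#!/bin/bash
# Save as /opt/ocr-sprint-service/backup.sh
# pg_dump runs non-interactively here, so the password must come from ~/.pgpass or PGPASSWORD.
set -euo pipefail
BACKUP_DIR="/opt/ocr-sprint-service/backups"
mkdir -p "$BACKUP_DIR"
pg_dump -U ocr -h localhost ocr_sprint | gzip > "$BACKUP_DIR/db_$(date +%Y%m%d_%H%M%S).sql.gz"
# Delete backups older than 7 days
find "$BACKUP_DIR" -name "db_*.sql.gz" -mtime +7 -delete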
```
```bash
chmod +x /opt/ocr-sprint-service/backup.sh
# Cron: daily at 2 AM
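# Note: this replaces the ocr user's existing crontab. The log path must be writable by ocr.
echo "0 2 * * * /opt/ocr-sprint-service/backup.sh >> /opt/ocr-sprint-service/backup.log 2>&1" | sudo crontab -u ocr -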
```
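
Before trusting the `find ... -mtime +7 -delete` retention rule, it can help to preview which files it would remove. The sketch below mirrors that rule in Python under one stated assumption about `find`: file age is counted in whole 24-hour periods, so `+7` matches files at least 8 full days old. `expired_backups` is a hypothetical helper and only lists files; it deletes nothing.

```python
import time
from pathlib import Path
from typing import Optional


def expired_backups(backup_dir, max_age_days: int = 7, now: Optional[float] = None) -> list:
    """List db_*.sql.gz files whose age in whole days exceeds max_age_days."""
    now = time.time() if now is None else now
    expired = []
    for path in Path(backup_dir).glob("db_*.sql.gz"):
        age_days = int((now - path.stat().st_mtime) // 86400)  # fractional days dropped
        if age_days > max_age_days:
            expired.append(path)
    return sorted(expired)


if __name__ == "__main__":
    for path in expired_backups("/opt/ocr-sprint-service/backups"):
        print("would delete:", path)
```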
---
## 10. Troubleshooting
| Problem | Diagnosis | Solution |
|---------|-----------|----------|
| Service does not start | `journalctl -u ocr-sprint-api -n 100` | Check permissions, `.env`, and the error log |
| PaddleOCR model download fails | Timeout in the logs | `python -c "from paddleocr import PaddleOCR; PaddleOCR(lang='latin')"` |
| Worker does not process jobs | `redis-cli ping` does not return PONG | Make sure Redis is running; check `REDIS_URL` |
| Database migration error | `alembic current` | `alembic stamp head`, then `alembic upgrade head` (stamp only when the schema already matches the latest migration, because it marks migrations as applied without running them) |
| Port 8000 already in use | `ss -tlnp \| grep 8000` | Kill the old process or change the port in `.env` |
| Out of memory | OOM killer in the logs | Lower `--concurrency` on the worker, or add RAM |
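
Several rows above come down to whether a service is listening on its port. A quick TCP probe covers the three ports from the architecture diagram. This is a sketch with hypothetical names, and it checks reachability only, not service health. Under Docker Compose, Redis and PostgreSQL may not be published on the host, in which case a closed port here is expected.

```python
import socket

# Ports taken from the architecture diagram in section 1.
SERVICE_PORTS = {"api": 8000, "redis": 6379, "postgresql": 5432}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for name, port in SERVICE_PORTS.items():
        state = "open" if port_open("127.0.0.1", port) else "closed"
        print(f"{name:<12} :{port} {state}")
```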
---
## 11. Security Checklist
- [ ] `API_KEYS` set to random keys (`openssl rand -hex 32`)
- [ ] Database password changed from the default
- [ ] Firewall active (only ports 22, 80, 443 open)
- [ ] SSL/TLS active via Nginx + Let's Encrypt
- [ ] `/metrics` endpoint restricted to the internal network
- [ ] Automated database backups via cron
- [ ] OS security updates enabled (`unattended-upgrades`)
- [ ] `APP_ENV=prod` (not `local`)
---
## Quick Reference: Everyday Commands
```bash
# === Docker ===
docker compose up -d # Start
docker compose down # Stop
docker compose logs -f api # Logs
docker compose build && docker compose up -d # Update
# === Manual ===
sudo systemctl restart ocr-sprint-api ocr-sprint-worker # Restart
sudo journalctl -u ocr-sprint-api -f # Logs
curl http://localhost:8000/api/v1/health # Health check
```