docs: add comprehensive deployment guide for docker and manual setups

docs/DEPLOYMENT-GUIDE.md (new file, 571 lines)

# OCR Sprint Service Deployment Guide

> This document is a step-by-step guide for deploying **ocr-sprint-service** to a production server. It reflects the actual state of the codebase as of April 2026 (Phases 1–4 complete).

---

## Contents

1. [Architecture Overview](#1-architecture-overview)
2. [Server Prerequisites](#2-server-prerequisites)
3. [Option A: Docker Compose (Recommended)](#3-option-a-docker-compose-recommended)
4. [Option B: Manual (Without Docker)](#4-option-b-manual-without-docker)
5. [Production Environment Configuration](#5-production-environment-configuration)
6. [Reverse Proxy & SSL (Nginx)](#6-reverse-proxy--ssl-nginx)
7. [Firewall](#7-firewall)
8. [Deployment Verification](#8-deployment-verification)
9. [Monitoring & Maintenance](#9-monitoring--maintenance)
10. [Troubleshooting](#10-troubleshooting)
11. [Security Checklist](#11-security-checklist)

---

## 1. Architecture Overview

```
┌──────────┐     ┌──────────────┐     ┌───────┐
│  Client  │────▶│ Nginx (SSL)  │────▶│  API  │──▶ PaddleOCR
└──────────┘     └──────────────┘     │ :8000 │    Pipeline
                                      └───┬───┘
                                          │ async job
                                    ┌─────▼─────┐
                                    │   Redis   │
                                    │   :6379   │
                                    └─────┬─────┘
                                    ┌─────▼──────┐
                                    │   Worker   │──▶ PaddleOCR
                                    │  (Celery)  │    Pipeline
                                    └─────┬──────┘
                                    ┌─────▼──────┐
                                    │ PostgreSQL │
                                    │   :5432    │
                                    └────────────┘
```

**4 services** must be running:

| Service | Role |
|---------|------|
| **API** (FastAPI + Uvicorn) | Accepts document uploads and serves OCR results |
| **Worker** (Celery) | Runs OCR processing asynchronously in the background |
| **Redis** | Message broker for the job queue |
| **PostgreSQL** | Stores job state and extraction results |

Blob storage uses the **local filesystem** (S3/MinIO is not supported yet).

---

## 2. Server Prerequisites

### Minimum Specifications

| Resource | Minimum | Recommended |
|----------|---------|-------------|
| OS | Ubuntu 20.04+ / Debian 11+ | Ubuntu 22.04+ |
| CPU | 4 cores | 8 cores |
| RAM | 8 GB | 16 GB |
| Storage | 50 GB free | 100 GB free |
| Python | 3.10–3.12 | 3.11 or 3.12 |
| Network | Port 8000 (internal) | + Ports 80/443 (Nginx) |

### Disk Requirements

- ~1.5 GB: PaddlePaddle wheels
- ~200 MB: PaddleOCR model downloads (automatic on first run)
- The remainder: blob storage for uploaded documents

### Required Software

- **Docker with Docker Compose** (the `docker compose` command): for Option A
- **Python 3.10–3.12 + PostgreSQL + Redis**: for Option B
- **Git**: both options
- **Nginx** (optional): reverse proxy + SSL

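
The minimums in the table above can be checked on a candidate server before installing anything. The following is only a sketch: the function names `check_minimums` and `gather` and the 10% RAM tolerance are assumptions made for illustration, not part of the repository, and `gather` relies on `os.sysconf`, so it is Linux-oriented.

```python
import os
import shutil
import sys

# Minimums copied from the "Minimum Specifications" table.
MIN_CPU_CORES = 4
MIN_RAM_GB = 8
MIN_FREE_DISK_GB = 50
SUPPORTED_PYTHON = ((3, 10), (3, 12))  # inclusive (major, minor) range
RAM_TOLERANCE = 0.9  # assumption: hosts report slightly less than nominal RAM


def check_minimums(cpu_cores, ram_gb, free_disk_gb, python_version):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU: {cpu_cores} cores, need at least {MIN_CPU_CORES}")
    if ram_gb < MIN_RAM_GB * RAM_TOLERANCE:
        problems.append(f"RAM: {ram_gb:.1f} GB, need about {MIN_RAM_GB} GB")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"Disk: {free_disk_gb:.0f} GB free, need {MIN_FREE_DISK_GB} GB")
    low, high = SUPPORTED_PYTHON
    if not (low <= tuple(python_version[:2]) <= high):
        problems.append(f"Python {python_version[0]}.{python_version[1]} is outside 3.10–3.12")
    return problems


def gather(path="/"):
    """Collect this host's values (uses os.sysconf, so Linux-oriented)."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return {
        "cpu_cores": os.cpu_count() or 0,
        "ram_gb": ram_bytes / 1024**3,
        "free_disk_gb": shutil.disk_usage(path).free / 1024**3,
        "python_version": sys.version_info,
    }
```

Calling `check_minimums(**gather())` on the target host returns the list of unmet requirements, if any.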
---
## 3. Option A: Docker Compose (Recommended)

> The fastest route. All services (API, Worker, Redis, Postgres) run in containers.

### 3.1 Log In & Clone

```bash
ssh user@your-server.com

git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
```

### 3.2 Configure .env

```bash
cp .env.example .env
nano .env
```

See [Section 5](#5-production-environment-configuration) for the production settings.

> [!IMPORTANT]
> With Docker Compose, **do not change** `DATABASE_URL` and `REDIS_URL`: `docker-compose.yml` already overrides them through each container's environment variables.

### 3.3 Build & Start

```bash
# Build the image (~5–10 minutes the first time)
docker compose build

# Start all services
docker compose up -d

# Check the logs
docker compose logs -f api worker
```

The `api` container automatically runs `alembic upgrade head` before starting the server (see `command` in `docker-compose.yml`).

### 3.4 First-Run Model Download

The first request triggers a download of the PaddleOCR models (~200 MB) into the `paddle-models` Docker volume. Wait for it to finish before testing.

```bash
# Watch the download in the logs
docker compose logs -f api
```

### 3.5 Verify

```bash
curl http://localhost:8000/api/v1/health
# Expected: {"status":"ok","version":"0.1.0"}
```

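
Right after `docker compose up -d` the API may not answer yet, because the container runs migrations before the server starts. A small polling helper can replace repeated manual `curl` calls. This is a sketch only: `fetch_health` and `wait_until_healthy` are hypothetical names, and the retry count and delay are arbitrary defaults.

```python
import json
import time
import urllib.request


def fetch_health(url, timeout=5.0):
    """GET the health endpoint and return the decoded JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def wait_until_healthy(url, attempts=30, delay=2.0,
                       fetch=fetch_health, sleep=time.sleep):
    """Poll `url` until it reports status "ok" or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            if fetch(url).get("status") == "ok":
                return True
        except (OSError, ValueError):
            # OSError covers refused connections and timeouts;
            # ValueError covers a body that is not valid JSON.
            pass
        if attempt < attempts:
            sleep(delay)
    return False
```

For example, `wait_until_healthy("http://localhost:8000/api/v1/health")` returns `True` once the endpoint responds with `"status": "ok"`.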
### 3.6 Update the Service (After Code Changes)

```bash
cd ocr-sprint-service
git pull
docker compose build
docker compose up -d
```

---

## 4. Option B: Manual (Without Docker)

> For servers that already have Python, PostgreSQL, and Redis installed.

### 4.1 Install System Libraries

```bash
sudo apt update && sudo apt upgrade -y

# Libraries for OpenCV & PaddleOCR
sudo apt install -y \
    python3.11 python3.11-venv python3.11-dev \
    libgl1 libglib2.0-0 libsm6 libxext6 libxrender1 \
    libgomp1 libmagic1 \
    build-essential git curl

# Install Redis & PostgreSQL (if not already present)
sudo apt install -y redis-server postgresql postgresql-contrib
sudo systemctl enable --now redis-server postgresql
```

> [!NOTE]
> If the server already has Python 3.12, use `python3.12` in all subsequent commands.

### 4.2 Set Up the Database

```bash
sudo -u postgres psql
```

```sql
-- 'ganti-password-kuat' is a placeholder: replace it with a strong password
CREATE USER ocr WITH PASSWORD 'ganti-password-kuat';
CREATE DATABASE ocr_sprint OWNER ocr;
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
\c ocr_sprint
GRANT ALL ON SCHEMA public TO ocr;
\q
```

### 4.3 Create the Application User & Directory

```bash
sudo useradd -m -s /bin/bash ocr
sudo mkdir -p /opt/ocr-sprint-service
sudo chown ocr:ocr /opt/ocr-sprint-service
```

### 4.4 Clone & Install

```bash
sudo su - ocr
cd /opt
# Clones into the empty directory created in step 4.3
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service

# Create a virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install dependencies + the OCR runtime (~1.5 GB download)
pip install --upgrade pip setuptools wheel
pip install -e ".[ocr]"

# Verify
python -c "import paddleocr; print('PaddleOCR OK')"
python -c "import fastapi; print('FastAPI OK')"
```

### 4.5 Configure .env

```bash
cp .env.example .env
nano .env
```

**Must be changed for a manual deployment:**

```bash
APP_ENV=prod
# ganti-password-kuat is a placeholder: use the password set in step 4.2
DATABASE_URL=postgresql+psycopg://ocr:ganti-password-kuat@localhost:5432/ocr_sprint
REDIS_URL=redis://localhost:6379/0
QUEUE_ENABLED=true
API_KEYS=your-generated-api-key
STORAGE_LOCAL_DIR=/opt/ocr-sprint-service/storage
BLOB_STORAGE_DIR=/opt/ocr-sprint-service/storage/blobs
```

```bash
# Create the storage directories
mkdir -p /opt/ocr-sprint-service/storage/blobs
```

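
A strong database password often contains characters such as `@`, `/`, `:` or `#`, which have their own meaning inside a URL. The `postgresql+psycopg://` scheme suggests the URL is parsed by SQLAlchemy (an assumption; the guide does not say so), and in that case the password has to be percent-encoded. A minimal sketch, with `build_database_url` as a hypothetical helper name:

```python
from urllib.parse import quote


def build_database_url(user, password, host="localhost", port=5432,
                       database="ocr_sprint"):
    """Assemble a postgresql+psycopg URL, percent-encoding user and password."""
    return (
        f"postgresql+psycopg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Example password with URL-significant characters (illustrative only).
print(build_database_url("ocr", "p@ss/w:rd#1"))
# prints postgresql+psycopg://ocr:p%40ss%2Fw%3Ard%231@localhost:5432/ocr_sprint
```

The printed value can be pasted into `DATABASE_URL`. A password made only of letters, digits, `-`, `_`, `.` and `~` passes through unchanged.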
### 4.6 Run Database Migrations

```bash
source .venv/bin/activate
alembic upgrade head
alembic current   # verify
```

### 4.7 Manual Test

```bash
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
# In another terminal: curl http://localhost:8000/api/v1/health
# Ctrl+C to stop
```

### 4.8 Set Up Systemd Services

**API Service**: `/etc/systemd/system/ocr-sprint-api.service`

```ini
[Unit]
Description=OCR Sprint API Service
After=network.target postgresql.service redis-server.service

[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn \
    ocr_sprint.main:app \
    --host 0.0.0.0 --port 8000 --workers 4 --log-level info
Restart=always
RestartSec=10
LimitNOFILE=65536
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target
```

**Worker Service**: `/etc/systemd/system/ocr-sprint-worker.service`

```ini
[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis-server.service

[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery \
    -A ocr_sprint.worker.celery_app worker \
    --loglevel=info --concurrency=2 --max-tasks-per-child=100
Restart=always
RestartSec=10
LimitNOFILE=65536
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target
```

**Enable & Start:**

```bash
# Leave the ocr user's shell first
exit

sudo systemctl daemon-reload
sudo systemctl enable --now ocr-sprint-api ocr-sprint-worker
sudo systemctl status ocr-sprint-api ocr-sprint-worker
```

### 4.9 Update the Service (Manual)

```bash
sudo su - ocr
cd /opt/ocr-sprint-service
git pull
source .venv/bin/activate
pip install -e ".[ocr]"
alembic upgrade head
exit

sudo systemctl restart ocr-sprint-api ocr-sprint-worker
```

---

## 5. Production Environment Configuration

These `.env` settings **must be changed** from their defaults for production:

| Variable | Default | Production | Notes |
|----------|---------|------------|-------|
| `APP_ENV` | `local` | `prod` | Environment mode |
| `API_KEYS` | *(empty)* | `key1,key2` | **REQUIRED.** Authentication is disabled when empty |
| `QUEUE_ENABLED` | `false` | `true` | Enables async processing |
| `DATABASE_URL` | `sqlite:///...` | `postgresql+psycopg://...` | Docker: overridden automatically |
| `REDIS_URL` | `redis://localhost:6379/0` | Adjust as needed | Docker: overridden automatically |
| `OCR_USE_GPU` | `false` | `true` if a GPU is available | GPU mode requires the NVIDIA driver |
| `TABLES_ENABLED` | `true` | `true` | Personnel table extraction |

**Generate an API key:**

```bash
openssl rand -hex 32
```

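
The settings in the table lend themselves to a mechanical check before the services are started. The sketch below is not part of the repository: `parse_env` and `production_problems` are hypothetical helpers, the parser only handles plain `KEY=VALUE` lines, and the defaults it assumes are the ones listed in the table.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def production_problems(env, uses_docker_compose=False):
    """Return the settings that still look like non-production defaults."""
    problems = []
    if env.get("APP_ENV", "local") != "prod":
        problems.append("APP_ENV should be 'prod'")
    keys = [key for key in env.get("API_KEYS", "").split(",") if key.strip()]
    if not keys:
        problems.append("API_KEYS is empty: authentication would be disabled")
    if env.get("QUEUE_ENABLED", "false").lower() != "true":
        problems.append("QUEUE_ENABLED should be 'true'")
    # Docker Compose overrides DATABASE_URL, so only check it for manual setups.
    if not uses_docker_compose and env.get("DATABASE_URL", "sqlite://").startswith("sqlite"):
        problems.append("DATABASE_URL still points at SQLite")
    return problems
```

A possible invocation is `production_problems(parse_env(open(".env").read()))`; an empty list means none of the checked settings is still at its default.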
> [!WARNING]
> Never deploy to production with `API_KEYS` unset. When it is empty, every endpoint is open without authentication.

---

## 6. Reverse Proxy & SSL (Nginx)

### Install

```bash
sudo apt install -y nginx certbot python3-certbot-nginx
```

### Configuration: `/etc/nginx/sites-available/ocr-sprint`

```nginx
upstream ocr_api {
    server 127.0.0.1:8000;
    keepalive 32;
}

server {
    listen 80;
    server_name ocr.yourdomain.com;

    client_max_body_size 30M;

    proxy_connect_timeout 300s;
    proxy_read_timeout 300s;

    location / {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Connection "";  # needed for upstream keepalive to take effect
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    location /metrics {
        allow 127.0.0.1;
        allow 10.0.0.0/8;
        deny all;
        proxy_pass http://ocr_api;
    }
}
```

### Enable & SSL

```bash
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

# SSL
sudo certbot --nginx -d ocr.yourdomain.com
```

---

## 7. Firewall

```bash
sudo ufw allow 22/tcp    # SSH (important!)
sudo ufw allow 80/tcp    # HTTP
sudo ufw allow 443/tcp   # HTTPS
sudo ufw enable
sudo ufw status
```

> [!CAUTION]
> Make sure SSH (port 22) is allowed **before** enabling the firewall, so you do not lock yourself out of the server.

---

## 8. Deployment Verification

### Health Check

```bash
curl http://localhost:8000/api/v1/health
# {"status":"ok","version":"0.1.0"}
```

### Test OCR (Sync)

```bash
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
    -H "X-API-Key: your-api-key" \
    -F "file=@/path/to/test.pdf" | jq
```

### Test OCR (Async, Production Flow)

```bash
# Submit a job
curl -X POST http://localhost:8000/api/v1/documents \
    -H "X-API-Key: your-api-key" \
    -F "file=@document.pdf" | jq
# → {"job_id":"8f2a...","status":"pending",...}

# Poll for the result
curl -H "X-API-Key: your-api-key" \
    http://localhost:8000/api/v1/documents/8f2a... | jq
# → {"status":"completed","confidence":0.93,"data":{...}}
```

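
The submit-then-poll flow above can be automated with a small loop. This is a sketch under assumptions: `poll_job` and `http_fetcher` are hypothetical names, the guide only shows the statuses `pending` and `completed`, and `failed` is an assumed name for a terminal error status.

```python
import json
import time
import urllib.request

TERMINAL_STATUSES = {"completed", "failed"}  # "failed" is an assumed name


def http_fetcher(base_url, api_key):
    """Build a callable that GETs /api/v1/documents/{job_id} as a dict."""
    def fetch(job_id):
        request = urllib.request.Request(
            f"{base_url}/api/v1/documents/{job_id}",
            headers={"X-API-Key": api_key},
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            return json.load(response)
    return fetch


def poll_job(fetch_status, job_id, interval=2.0, timeout=300.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status(job_id) until a terminal status or the timeout."""
    deadline = clock() + timeout
    while True:
        result = fetch_status(job_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        sleep(interval)
```

With a real job ID this would be called as `poll_job(http_fetcher("http://localhost:8000", "your-api-key"), job_id)`.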
### Check That All Services Are Running

```bash
# Docker
docker compose ps

# Manual
sudo systemctl status ocr-sprint-api ocr-sprint-worker postgresql redis-server nginx
```

---

## 9. Monitoring & Maintenance

### Logs

```bash
# Docker
docker compose logs -f api worker

# Manual (systemd)
sudo journalctl -u ocr-sprint-api -f
sudo journalctl -u ocr-sprint-worker -f
```

### Prometheus Metrics

```bash
curl http://localhost:8000/metrics
```

Key metrics: `ocr_documents_total`, `ocr_processing_duration_seconds`, `ocr_confidence_score`.

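
A quick way to confirm that the three key metrics are actually exported is to scan the `/metrics` output for their sample names. The sketch below assumes the standard Prometheus text exposition format; `sample_names` and `missing_metrics` are hypothetical helpers, and the metric types (counter, histogram, and so on) are not stated in this guide, so suffixed names such as `_bucket` or `_sum` are accepted as matches.

```python
EXPECTED_METRICS = (
    "ocr_documents_total",
    "ocr_processing_duration_seconds",
    "ocr_confidence_score",
)


def sample_names(exposition_text):
    """Return the set of sample names found in Prometheus text output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like: name{labels} value   or   name value
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def missing_metrics(exposition_text, expected=EXPECTED_METRICS):
    """List expected metrics that have no sample (suffixed names count)."""
    names = sample_names(exposition_text)
    return [
        metric for metric in expected
        if not any(name == metric or name.startswith(metric + "_") for name in names)
    ]
```

Passing the body returned by `curl http://localhost:8000/metrics` to `missing_metrics` yields an empty list when all three metrics are present.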
### Back Up the Database

```bash
# Docker (-T disables the pseudo-TTY so the dump can be redirected cleanly)
docker compose exec -T postgres pg_dump -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql

# Manual
pg_dump -U ocr -h localhost ocr_sprint | gzip > backup_$(date +%Y%m%d).sql.gz
```

### Automated Backup (Cron)

```bash
#!/bin/bash
# /opt/ocr-sprint-service/backup.sh
set -euo pipefail
BACKUP_DIR="/opt/ocr-sprint-service/backups"
mkdir -p "$BACKUP_DIR"
# Under cron pg_dump cannot prompt, so the password must come from e.g. ~/.pgpass
pg_dump -U ocr -h localhost ocr_sprint | gzip > "$BACKUP_DIR/db_$(date +%Y%m%d_%H%M%S).sql.gz"
find "$BACKUP_DIR" -name "db_*.sql.gz" -mtime +7 -delete
```

```bash
chmod +x /opt/ocr-sprint-service/backup.sh
# The ocr user cannot create files in /var/log, so prepare the log file first
sudo touch /var/log/ocr-backup.log && sudo chown ocr:ocr /var/log/ocr-backup.log
# Cron: daily at 2 AM (this replaces any existing crontab of user ocr)
echo "0 2 * * * /opt/ocr-sprint-service/backup.sh >> /var/log/ocr-backup.log 2>&1" | sudo crontab -u ocr -
```

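
To preview which files a retention rule would remove before trusting `-delete`, the selection can be reproduced without deleting anything. This is an illustrative sketch: `expired_backups` is a hypothetical helper, and it uses a plain seven-day cutoff, whereas `find -mtime +7` discards fractional days and so only matches files at least eight days old.

```python
import time
from pathlib import Path


def expired_backups(backup_dir, keep_days=7, now=None):
    """Return db_*.sql.gz files modified more than keep_days ago, oldest first."""
    now = time.time() if now is None else now
    cutoff = now - keep_days * 86400  # seconds in keep_days
    candidates = Path(backup_dir).glob("db_*.sql.gz")
    old = [path for path in candidates if path.stat().st_mtime < cutoff]
    return sorted(old, key=lambda path: path.stat().st_mtime)


# Usage: list what would be removed, without deleting anything.
# for path in expired_backups("/opt/ocr-sprint-service/backups"):
#     print(path)
```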
---
## 10. Troubleshooting

| Problem | Diagnosis | Fix |
|---------|-----------|-----|
| Service does not start | `journalctl -u ocr-sprint-api -n 100` | Check permissions, `.env`, and the error log |
| PaddleOCR model download fails | Timeout in the logs | Retry manually: `python -c "from paddleocr import PaddleOCR; PaddleOCR(lang='latin')"` |
| Worker does not process jobs | `redis-cli ping` returns something other than PONG | Make sure Redis is running; check `REDIS_URL` |
| Database migration error | `alembic current` | `alembic stamp head`, then `alembic upgrade head`. Use `stamp` only when the schema already matches the latest revision, because it records the revision without running any migration |
| Port 8000 already in use | `ss -tlnp \| grep 8000` | Kill the old process or change the port in `.env` |
| Out of memory | OOM killer in the logs | Lower `--concurrency` on the worker, or add RAM |

---

## 11. Security Checklist

- [ ] `API_KEYS` set to a random key (`openssl rand -hex 32`)
- [ ] Database password changed from the default
- [ ] Firewall active (only ports 22, 80, 443 open)
- [ ] SSL/TLS enabled via Nginx + Let's Encrypt
- [ ] `/metrics` endpoint restricted to the internal network
- [ ] Automatic database backups via cron
- [ ] OS security updates enabled (`unattended-upgrades`)
- [ ] `APP_ENV=prod` (not `local`)

---

## Quick Reference: Everyday Commands

```bash
# === Docker ===
docker compose up -d                             # Start
docker compose down                              # Stop
docker compose logs -f api                       # Logs
docker compose build && docker compose up -d     # Update

# === Manual ===
sudo systemctl restart ocr-sprint-api ocr-sprint-worker   # Restart
sudo journalctl -u ocr-sprint-api -f                      # Logs
curl http://localhost:8000/api/v1/health                  # Health check
```