# OCR Sprint Service Deployment Guide
> This document is a step-by-step guide for deploying **ocr-sprint-service** to a production server. It reflects the state of the codebase as of April 2026 (Phase 14 complete).
---
## Table of Contents
1. [Architecture Overview](#1-architecture-overview)
2. [Server Prerequisites](#2-server-prerequisites)
3. [Option A: Docker Compose (Recommended)](#3-option-a-docker-compose-recommended)
4. [Option B: Manual (Without Docker)](#4-option-b-manual-without-docker)
5. [Production Environment Configuration](#5-production-environment-configuration)
6. [Reverse Proxy & SSL (Nginx)](#6-reverse-proxy--ssl-nginx)
7. [Firewall](#7-firewall)
8. [Deployment Verification](#8-deployment-verification)
9. [Monitoring & Maintenance](#9-monitoring--maintenance)
10. [Troubleshooting](#10-troubleshooting)
11. [Security Checklist](#11-security-checklist)
---
## 1. Architecture Overview
```
┌──────────┐     ┌──────────────┐     ┌───────┐
│  Client  │────▶│ Nginx (SSL)  │────▶│  API  │──▶ PaddleOCR
└──────────┘     └──────────────┘     │ :8000 │    Pipeline
                                      └───┬───┘
                                          │ async job
                                    ┌─────▼─────┐
                                    │   Redis   │
                                    │   :6379   │
                                    └─────┬─────┘
                                    ┌─────▼──────┐
                                    │   Worker   │──▶ PaddleOCR
                                    │  (Celery)  │    Pipeline
                                    └─────┬──────┘
                                    ┌─────▼──────┐
                                    │ PostgreSQL │
                                    │   :5432    │
                                    └────────────┘
```
**4 services** must be running:
| Service | Role |
|---------|------|
| **API** (FastAPI + Uvicorn) | Accepts document uploads and serves OCR results |
| **Worker** (Celery) | Runs OCR processing asynchronously in the background |
| **Redis** | Message broker for the job queue |
| **PostgreSQL** | Stores job state and extraction results |
Blob storage uses the **local filesystem** (S3/MinIO is not supported yet).
---
## 2. Server Prerequisites
### Minimum Specifications
| Resource | Minimum | Recommended |
|----------|---------|-------------|
| OS | Ubuntu 20.04+ / Debian 11+ | Ubuntu 22.04+ |
| CPU | 4 cores | 8 cores |
| RAM | 8 GB | 16 GB |
| Storage | 50 GB free | 100 GB free |
| Python | 3.10 to 3.12 | 3.11 or 3.12 |
| Network | Port 8000 (internal) | + Port 80/443 (Nginx) |
### Disk Requirements
- ~1.5 GB: PaddlePaddle wheels
- ~200 MB: PaddleOCR model downloads (fetched automatically on first run)
- Remainder: blob storage for uploaded documents
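
The minimums above can be checked before anything is installed. The following is a minimal sketch, assuming a Linux host with `nproc`, `/proc/meminfo`, and POSIX `df`; the function names are illustrative and not part of the repository.

```bash
#!/usr/bin/env bash
# Preflight sketch: compare this host against the "Minimum" column above.
# All function names here are illustrative helpers, not project code.

# at_least ACTUAL REQUIRED -> succeeds when ACTUAL >= REQUIRED (integers)
at_least() {
    [ "$1" -ge "$2" ]
}

# Free space in whole GB on the filesystem that holds PATH (default: /)
free_disk_gb() {
    df -Pk "${1:-/}" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Total RAM rounded to the nearest GB, read from /proc/meminfo (Linux only)
total_ram_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 + 0.5 }' /proc/meminfo
}

# Prints one line per unmet minimum and returns non-zero if any check fails
preflight() {
    local status=0
    at_least "$(nproc)" 4 || { echo "CPU: fewer than 4 cores"; status=1; }
    at_least "$(total_ram_gb)" 8 || { echo "RAM: less than 8 GB"; status=1; }
    at_least "$(free_disk_gb /)" 50 || { echo "Disk: less than 50 GB free on /"; status=1; }
    return "$status"
}
```

Running `preflight && echo "Host meets the minimum specification"` prints a line for each unmet requirement. If blob storage will live on a separate mount, pass that mount point to `free_disk_gb` instead of `/`.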
### Required Software
- **Docker with the Compose plugin** (`docker compose`): for Option A
- **Python 3.10 to 3.12 + PostgreSQL + Redis**: for Option B
- **Git**: both options
- **Nginx** (optional): reverse proxy + SSL
---
## 3. Option A: Docker Compose (Recommended)
> The fastest route. All services (API, Worker, Redis, Postgres) run in containers.
### 3.1 Log In & Clone
```bash
ssh user@your-server.com
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
```
### 3.2 Configure .env
```bash
cp .env.example .env
nano .env
```
See [Section 5](#5-production-environment-configuration) for production configuration details.
> [!IMPORTANT]
> With Docker Compose, **do not change** `DATABASE_URL` or `REDIS_URL`: `docker-compose.yml` already overrides both through each container's environment variables.
### 3.3 Build & Start
```bash
# Build the image (~5-10 minutes the first time)
docker compose build
# Start all services
docker compose up -d
# Follow the logs
docker compose logs -f api worker
```
The `api` container automatically runs `alembic upgrade head` before starting the server (see `command` in `docker-compose.yml`).
### 3.4 First-Run Model Download
The first request triggers a download of the PaddleOCR models (~200 MB) into the `paddle-models` Docker volume. Wait for it to finish before testing.
```bash
# Watch the download progress in the logs
docker compose logs -f api
```
### 3.5 Verification
```bash
curl http://localhost:8000/api/v1/health
# Expected: {"status":"ok","version":"0.1.0"}
```
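
Because the containers need some time to start and run migrations, a deploy script may prefer to wait for the health endpoint instead of probing it once. A minimal sketch, assuming `curl` is installed; `retry` and `health_ok` are illustrative helper names, and the 60 x 5 s budget is an arbitrary choice.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        # Sleep only between attempts, not after the last one
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# health_ok URL -> succeeds unless the request fails or returns HTTP 4xx/5xx
health_ok() {
    curl -fsS --max-time 5 "$1" > /dev/null 2>&1
}

# Example: wait up to 60 x 5 s for the API to come up
# retry 60 5 health_ok http://localhost:8000/api/v1/health && echo "API is healthy"
```

The command to retry is passed as arguments, so the same helper can wrap other checks such as `redis-cli ping`.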
### 3.6 Updating the Service (After Code Changes)
```bash
cd ocr-sprint-service
git pull
docker compose build
docker compose up -d
```
---
## 4. Option B: Manual (Without Docker)
> For servers that already have Python, PostgreSQL, and Redis installed.
### 4.1 Install System Libraries
```bash
sudo apt update && sudo apt upgrade -y
# Libraries for OpenCV & PaddleOCR
sudo apt install -y \
    python3.11 python3.11-venv python3.11-dev \
    libgl1 libglib2.0-0 libsm6 libxext6 libxrender1 \
    libgomp1 libmagic1 \
    build-essential git curl
# Install Redis & PostgreSQL (if not already installed)
sudo apt install -y redis-server postgresql postgresql-contrib
sudo systemctl enable --now redis-server postgresql
```
> [!NOTE]
> If the server already has Python 3.12, use `python3.12` in all subsequent commands.
### 4.2 Set Up the Database
```bash
sudo -u postgres psql
```
```sql
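-- 'ganti-password-kuat' is a placeholder meaning "change to a strong password"; use your own value here and in DATABASE_URL
CREATE USER ocr WITH PASSWORD 'ganti-password-kuat';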
CREATE DATABASE ocr_sprint OWNER ocr;
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
\c ocr_sprint
GRANT ALL ON SCHEMA public TO ocr;
\q
```
### 4.3 Create Application User & Directory
```bash
sudo useradd -m -s /bin/bash ocr
sudo mkdir -p /opt/ocr-sprint-service
sudo chown ocr:ocr /opt/ocr-sprint-service
```
### 4.4 Clone & Install
```bash
sudo su - ocr
cd /opt
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate
# Install dependencies + OCR runtime (~1.5 GB download)
pip install --upgrade pip setuptools wheel
pip install -e ".[ocr]"
# Verify
python -c "import paddleocr; print('PaddleOCR OK')"
python -c "import fastapi; print('FastAPI OK')"
```
### 4.5 Configure .env
```bash
cp .env.example .env
nano .env
```
**Must be changed for a manual deployment:**
```bash
APP_ENV=prod
DATABASE_URL=postgresql+psycopg://ocr:ganti-password-kuat@localhost:5432/ocr_sprint
REDIS_URL=redis://localhost:6379/0
QUEUE_ENABLED=true
API_KEYS=your-generated-api-key
STORAGE_LOCAL_DIR=/opt/ocr-sprint-service/storage
BLOB_STORAGE_DIR=/opt/ocr-sprint-service/storage/blobs
```
```bash
# Create storage directories
mkdir -p /opt/ocr-sprint-service/storage/blobs
```
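
`DATABASE_URL` uses the SQLAlchemy form `postgresql+psycopg://`, which `psql` does not accept as a connection URI. To test the same credentials from the shell before running migrations, the driver suffix can be stripped first. A minimal sketch; `to_libpq_url` is a hypothetical helper, and it assumes `.env` holds a plain, unquoted `DATABASE_URL=` line.

```bash
# to_libpq_url URL -> prints URL with any "+driver" suffix removed from the scheme,
# e.g. postgresql+psycopg://... becomes postgresql://...
to_libpq_url() {
    printf '%s\n' "$1" | sed -E 's#^(postgres(ql)?)\+[A-Za-z0-9_]+://#\1://#'
}

# Example (run from /opt/ocr-sprint-service after editing .env):
# url="$(grep '^DATABASE_URL=' .env | cut -d= -f2-)"
# psql "$(to_libpq_url "$url")" -c 'SELECT 1;'
```

If the `SELECT 1` succeeds, the user, password, host, and database name in `.env` match what was created in step 4.2.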
### 4.6 Run Database Migrations
```bash
source .venv/bin/activate
alembic upgrade head
alembic current # verify
```
### 4.7 Manual Test
```bash
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
# In another terminal: curl http://localhost:8000/api/v1/health
# Press Ctrl+C to stop
```
### 4.8 Set Up Systemd Services
**API Service** (`/etc/systemd/system/ocr-sprint-api.service`):
```ini
[Unit]
Description=OCR Sprint API Service
After=network.target postgresql.service redis-server.service
[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn \
ocr_sprint.main:app \
--host 0.0.0.0 --port 8000 --workers 4 --log-level info
Restart=always
RestartSec=10
LimitNOFILE=65536
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target
```
**Worker Service** (`/etc/systemd/system/ocr-sprint-worker.service`):
```ini
[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis-server.service
[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery \
-A ocr_sprint.worker.celery_app worker \
--loglevel=info --concurrency=2 --max-tasks-per-child=100
Restart=always
RestartSec=10
LimitNOFILE=65536
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target
```
**Enable & Start:**
```bash
# Leave the ocr user's shell first
exit
sudo systemctl daemon-reload
sudo systemctl enable --now ocr-sprint-api ocr-sprint-worker
sudo systemctl status ocr-sprint-api ocr-sprint-worker
```
### 4.9 Update Service (Manual)
```bash
sudo su - ocr
cd /opt/ocr-sprint-service
git pull
source .venv/bin/activate
pip install -e ".[ocr]"
alembic upgrade head
exit
sudo systemctl restart ocr-sprint-api ocr-sprint-worker
```
---
## 5. Production Environment Configuration
The following `.env` settings **must be reviewed**, and in most cases changed from their defaults, for production:
| Variable | Default | Production | Notes |
|----------|---------|------------|-------|
| `APP_ENV` | `local` | `prod` | Environment mode |
| `API_KEYS` | *(empty)* | `key1,key2` | **Required.** Authentication is disabled when empty |
| `QUEUE_ENABLED` | `false` | `true` | Enables async processing |
| `DATABASE_URL` | `sqlite:///...` | `postgresql+psycopg://...` | Docker: overridden automatically |
| `REDIS_URL` | `redis://localhost:6379/0` | Adjust as needed | Docker: overridden automatically |
| `OCR_USE_GPU` | `false` | `true` if a GPU is available | GPU mode requires the NVIDIA driver |
| `TABLES_ENABLED` | `true` | `true` | Personnel table extraction |
**Generate API Key:**
```bash
openssl rand -hex 32
```
> [!WARNING]
> Never deploy to production with `API_KEYS` unset. When it is empty, every endpoint is open without authentication.
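
The settings that matter most can be checked mechanically before the service is started. A minimal sketch, assuming `.env` contains plain `KEY=value` lines without quotes or `export` prefixes; `env_value` and `check_prod_env` are illustrative names, not part of the project.

```bash
# env_value KEY FILE -> prints the value of the last KEY=... line in FILE
env_value() {
    grep -E "^$1=" "$2" | tail -n 1 | cut -d= -f2-
}

# check_prod_env FILE -> succeeds only when the production-critical settings are in place
check_prod_env() {
    local file="$1" status=0
    [ -n "$(env_value API_KEYS "$file")" ] \
        || { echo "API_KEYS is empty: authentication would be disabled"; status=1; }
    [ "$(env_value APP_ENV "$file")" = "prod" ] \
        || { echo "APP_ENV is not 'prod'"; status=1; }
    [ "$(env_value QUEUE_ENABLED "$file")" = "true" ] \
        || { echo "QUEUE_ENABLED is not 'true'"; status=1; }
    return "$status"
}

# Example:
# check_prod_env /opt/ocr-sprint-service/.env || echo "Fix .env before starting the service"
```

The check only looks at the file, so under Docker Compose it does not see values that `docker-compose.yml` overrides per container.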
---
## 6. Reverse Proxy & SSL (Nginx)
### Install
```bash
sudo apt install -y nginx certbot python3-certbot-nginx
```
### Configuration: `/etc/nginx/sites-available/ocr-sprint`
```nginx
upstream ocr_api {
    server 127.0.0.1:8000;
    keepalive 32;
}
server {
    listen 80;
    server_name ocr.yourdomain.com;
    # Maximum upload size accepted by the proxy
    client_max_body_size 30M;
    proxy_connect_timeout 300s;
    # Allow long-running synchronous OCR requests
    proxy_read_timeout 300s;
    location / {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        # Clear the Connection header so the upstream keepalive pool is actually used
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
    # Expose metrics to localhost and the internal network only
    location /metrics {
        allow 127.0.0.1;
        allow 10.0.0.0/8;
        deny all;
        proxy_pass http://ocr_api;
    }
}
```
### Enable & SSL
```bash
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
# SSL
sudo certbot --nginx -d ocr.yourdomain.com
```
---
## 7. Firewall
```bash
sudo ufw allow 22/tcp # SSH: allow this first
sudo ufw allow 80/tcp # HTTP
sudo ufw allow 443/tcp # HTTPS
sudo ufw enable
sudo ufw status
```
> [!CAUTION]
> Make sure SSH (port 22) is allowed **before** enabling the firewall, so you do not lock yourself out of the server.
---
## 8. Deployment Verification
### Health Check
```bash
curl http://localhost:8000/api/v1/health
# {"status":"ok","version":"0.1.0"}
```
### Test OCR (Sync)
```bash
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
-H "X-API-Key: your-api-key" \
-F "file=@/path/to/test.pdf" | jq
```
### Test OCR (Async — Production Flow)
```bash
# Submit job
curl -X POST http://localhost:8000/api/v1/documents \
-H "X-API-Key: your-api-key" \
-F "file=@document.pdf" | jq
# → {"job_id":"8f2a...","status":"pending",...}
# Poll result
curl -H "X-API-Key: your-api-key" \
http://localhost:8000/api/v1/documents/8f2a... | jq
# → {"status":"completed","confidence":0.93,"data":{...}}
```
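
The submit-then-poll flow above can be wrapped in a small helper for smoke tests. A minimal sketch, assuming `curl` and `jq` are installed and that the response fields are `job_id` and `status` as in the sample output; the `failed` status, the `API_URL`/`API_KEY` variable names, and the 2-second interval are assumptions, not confirmed by the API.

```bash
# is_terminal STATUS -> succeeds when polling should stop.
# "completed" appears in the sample response above; "failed" is an assumed error status.
is_terminal() {
    case "$1" in
        completed|failed) return 0 ;;
        *) return 1 ;;
    esac
}

# submit_and_wait FILE [MAX_POLLS] -> uploads FILE, then polls every 2 s.
# Reads API_KEY (and optionally API_URL) from the environment.
submit_and_wait() {
    local base="${API_URL:-http://localhost:8000}" max="${2:-150}" n=0 job_id status
    job_id="$(curl -fsS -X POST "$base/api/v1/documents" \
        -H "X-API-Key: $API_KEY" -F "file=@$1" | jq -r '.job_id // empty')"
    [ -n "$job_id" ] || { echo "upload failed" >&2; return 1; }
    while [ "$n" -lt "$max" ]; do
        status="$(curl -fsS -H "X-API-Key: $API_KEY" \
            "$base/api/v1/documents/$job_id" | jq -r '.status // empty')"
        echo "job $job_id: ${status:-unknown}"
        if is_terminal "$status"; then
            # Succeed only when the job actually completed
            [ "$status" = "completed" ]
            return
        fi
        n=$((n + 1))
        sleep 2
    done
    echo "timed out waiting for job $job_id" >&2
    return 1
}

# Example:
# API_KEY=your-api-key submit_and_wait document.pdf
```

The poll limit keeps the helper from hanging forever if the worker is down and jobs stay in `pending`.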
### Check That All Services Are Running
```bash
# Docker
docker compose ps
# Manual
sudo systemctl status ocr-sprint-api ocr-sprint-worker postgresql redis-server nginx
```
---
## 9. Monitoring & Maintenance
### Logs
```bash
# Docker
docker compose logs -f api worker
# Manual (systemd)
sudo journalctl -u ocr-sprint-api -f
sudo journalctl -u ocr-sprint-worker -f
```
### Prometheus Metrics
```bash
curl http://localhost:8000/metrics
```
Key metrics: `ocr_documents_total`, `ocr_processing_duration_seconds`, `ocr_confidence_score`.
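
For a quick look without a Prometheus server, the text output of `/metrics` can be summarized with `awk`. A minimal sketch, assuming the endpoint emits the standard Prometheus text format without trailing timestamps; `metric_total` is an illustrative helper, and the actual label names depend on how the service defines its metrics.

```bash
# metric_total NAME -> reads Prometheus text format on stdin and prints the
# sum of all samples whose metric name is exactly NAME (across all label sets).
metric_total() {
    awk -v name="$1" '
        /^#/ { next }                       # skip HELP/TYPE comment lines
        {
            metric = $1
            sub(/[{].*/, "", metric)        # drop the {label="..."} part
            if (metric == name) total += $NF
        }
        END { print total + 0 }
    '
}

# Example:
# curl -s http://localhost:8000/metrics | metric_total ocr_documents_total
```

Comparing the total before and after a test upload is a simple way to confirm that documents are being counted.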
### Backup Database
```bash
# Docker
docker compose exec postgres pg_dump -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql
# Manual
pg_dump -U ocr -h localhost ocr_sprint | gzip > backup_$(date +%Y%m%d).sql.gz
```
### Automated Backup (Cron)
```bash
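#!/bin/bash
# Save as /opt/ocr-sprint-service/backup.sh
# cron runs this non-interactively, so the password must come from ~/.pgpass or PGPASSWORD
BACKUP_DIR="/opt/ocr-sprint-service/backups"
mkdir -p "$BACKUP_DIR"
pg_dump -U ocr -h localhost ocr_sprint | gzip > "$BACKUP_DIR/db_$(date +%Y%m%d_%H%M%S).sql.gz"
# Delete backups older than 7 days
find "$BACKUP_DIR" -name "db_*.sql.gz" -mtime +7 -delete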
```bash
chmod +x /opt/ocr-sprint-service/backup.sh
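# Cron: daily at 2 AM. Note: this replaces any existing crontab of the ocr user.
# The log goes to a path the ocr user can write to (it cannot create files in /var/log).
echo "0 2 * * * /opt/ocr-sprint-service/backup.sh >> /opt/ocr-sprint-service/backup.log 2>&1" | sudo crontab -u ocr -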
```
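
A backup is only useful if it can be read back, so it is worth checking the newest archive from time to time. A minimal sketch, assuming the `db_*.sql.gz` naming used by the backup script above; `latest_backup` and `verify_backup` are illustrative helpers.

```bash
# latest_backup DIR -> prints the newest db_*.sql.gz in DIR.
# The timestamp in the file name sorts chronologically, so a plain sort is enough.
latest_backup() {
    ls -1 "$1"/db_*.sql.gz 2>/dev/null | sort | tail -n 1
}

# verify_backup FILE -> succeeds when FILE is a non-empty, intact gzip archive
verify_backup() {
    [ -s "$1" ] && gzip -t "$1" 2>/dev/null
}

# Example check:
# verify_backup "$(latest_backup /opt/ocr-sprint-service/backups)" || echo "Latest backup is missing or corrupt"
#
# Example restore into an empty database:
# gunzip -c "$(latest_backup /opt/ocr-sprint-service/backups)" | psql -U ocr -h localhost ocr_sprint
```

`gzip -t` only confirms that the archive is intact; a periodic test restore into a scratch database is the stronger check.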
---
## 10. Troubleshooting
| Problem | Diagnosis | Fix |
|---------|-----------|-----|
| Service does not start | `journalctl -u ocr-sprint-api -n 100` | Check permissions, `.env`, and the error log |
| PaddleOCR model download fails | Timeout in the logs | Trigger the download manually: `python -c "from paddleocr import PaddleOCR; PaddleOCR(lang='latin')"` |
| Worker does not process jobs | `redis-cli ping` does not return `PONG` | Make sure Redis is running; check `REDIS_URL` |
| Database migration error | `alembic current` | `alembic stamp head`, then `alembic upgrade head`. `stamp` only rewrites the recorded revision without running migrations, so use it only when the schema is already up to date |
| Port 8000 already in use | `ss -tlnp \| grep 8000` | Kill the old process or change the port in `.env` |
| Out of memory | OOM killer entries in the logs | Lower the worker's `--concurrency`, or add RAM |
---
## 11. Security Checklist
- [ ] `API_KEYS` set to a random key (`openssl rand -hex 32`)
- [ ] Database password changed from the default
- [ ] Firewall enabled (only ports 22, 80, 443 open)
- [ ] SSL/TLS enabled via Nginx + Let's Encrypt
- [ ] `/metrics` endpoint restricted to the internal network
- [ ] Automated database backup via cron
- [ ] OS security updates enabled (`unattended-upgrades`)
- [ ] `APP_ENV=prod` (not `local`)
---
## Quick Reference: Everyday Commands
```bash
# === Docker ===
docker compose up -d # Start
docker compose down # Stop
docker compose logs -f api # Logs
docker compose build && docker compose up -d # Update
# === Manual ===
sudo systemctl restart ocr-sprint-api ocr-sprint-worker # Restart
sudo journalctl -u ocr-sprint-api -f # Logs
curl http://localhost:8000/api/v1/health # Health check
```