# OCR Sprint Service Deployment Guide

> This document is a step-by-step guide for deploying **ocr-sprint-service** to a production server. It reflects the actual state of the codebase as of April 2026 (Phases 1–4 complete).

---

## Table of Contents

1. [Architecture Overview](#1-architecture-overview)
2. [Server Prerequisites](#2-server-prerequisites)
3. [Option A: Docker Compose (Recommended)](#3-option-a-docker-compose-recommended)
4. [Option B: Manual (Without Docker)](#4-option-b-manual-without-docker)
5. [Production Environment Configuration](#5-production-environment-configuration)
6. [Reverse Proxy & SSL (Nginx)](#6-reverse-proxy--ssl-nginx)
7. [Firewall](#7-firewall)
8. [Deployment Verification](#8-deployment-verification)
9. [Monitoring & Maintenance](#9-monitoring--maintenance)
10. [Troubleshooting](#10-troubleshooting)
11. [Security Checklist](#11-security-checklist)

---

## 1. Architecture Overview

```
┌──────────┐     ┌──────────────┐     ┌───────┐
│  Client  │────▶│  Nginx (SSL) │────▶│  API  │──▶ PaddleOCR
└──────────┘     └──────────────┘     │ :8000 │    Pipeline
                                      └───┬───┘
                                          │ async job
                                    ┌─────▼─────┐
                                    │   Redis   │
                                    │   :6379   │
                                    └─────┬─────┘
                                    ┌─────▼──────┐
                                    │   Worker   │──▶ PaddleOCR
                                    │  (Celery)  │    Pipeline
                                    └─────┬──────┘
                                    ┌─────▼──────┐
                                    │ PostgreSQL │
                                    │   :5432    │
                                    └────────────┘
```

**4 services** must be running:

| Service | Role |
|---------|------|
| **API** (FastAPI + Uvicorn) | Accepts document uploads, serves OCR results |
| **Worker** (Celery) | Asynchronous OCR processing in the background |
| **Redis** | Message broker for the job queue |
| **PostgreSQL** | Stores job state and extraction results |

Blob storage uses the **local filesystem** (S3/MinIO is not supported yet).

---

## 2. Server Prerequisites

### Minimum Specifications

| Resource | Minimum | Recommended |
|----------|---------|-------------|
| OS | Ubuntu 20.04+ / Debian 11+ | Ubuntu 22.04+ |
| CPU | 4 cores | 8 cores |
| RAM | 8 GB | 16 GB |
| Storage | 50 GB free | 100 GB free |
| Python | 3.10–3.12 | 3.11 or 3.12 |
| Network | Port 8000 (internal) | + Port 80/443 (Nginx) |

### Disk Requirements

- ~1.5 GB: PaddlePaddle wheels
- ~200 MB: PaddleOCR model downloads (automatic on first run)
- The remainder: blob storage for uploaded documents

### Required Software

- **Docker Compose**: for Option A
- **Python 3.10–3.12 + PostgreSQL + Redis**: for Option B
- **Git**: both options
- **Nginx** (optional): reverse proxy + SSL
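
The minimum values from the specification table can be checked on the target host before installing anything. The sketch below is not part of the repository: the function names are hypothetical, RAM is left out because the Python standard library has no portable way to read it, and the Python version check only matters for Option B.

```python
import os
import shutil
import sys

# Thresholds copied from the "Minimum" column of the specification table.
MIN_CPU_CORES = 4
MIN_FREE_DISK_GB = 50
SUPPORTED_PYTHON = ((3, 10), (3, 12))  # inclusive (major, minor) range


def preflight(cpu_cores, free_disk_gb, python_version):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU: {cpu_cores} cores, need at least {MIN_CPU_CORES}")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"Disk: {free_disk_gb:.0f} GB free, need at least {MIN_FREE_DISK_GB}")
    low, high = SUPPORTED_PYTHON
    if not (low <= tuple(python_version[:2]) <= high):
        problems.append(f"Python: {python_version[0]}.{python_version[1]} is outside 3.10-3.12")
    return problems


def preflight_this_host(path="/"):
    """Collect the values for the current machine and run the checks."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return preflight(os.cpu_count() or 0, free_gb, sys.version_info)
```

Calling `preflight_this_host("/opt")` would check the filesystem that holds the blob storage in a manual deployment; pass whichever mount point actually receives the uploads.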

---

## 3. Option A: Docker Compose (Recommended)

> The fastest approach. All services (API, Worker, Redis, Postgres) run in containers.

### 3.1 Log In & Clone

```bash
ssh user@your-server.com

git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
```

### 3.2 Configure .env

```bash
cp .env.example .env
nano .env
```

See [Section 5](#5-production-environment-configuration) for production configuration details.

> [!IMPORTANT]
> For Docker Compose, **do not change** `DATABASE_URL` and `REDIS_URL`. They are already overridden by `docker-compose.yml` through the environment variables of each container.

### 3.3 Build & Start

```bash
# Build the image (~5–10 minutes the first time)
docker compose build

# Start all services
docker compose up -d

# Check the logs
docker compose logs -f api worker
```

The `api` container automatically runs `alembic upgrade head` before starting the server (see `command` in `docker-compose.yml`).

### 3.4 First-Run Model Download

The first request triggers a download of the PaddleOCR models (~200 MB) into the `paddle-models` Docker volume. Wait for it to finish before testing.

```bash
# Monitor the download in the logs
docker compose logs -f api
```

### 3.5 Verification

```bash
curl http://localhost:8000/api/v1/health
# Expected: {"status":"ok","version":"0.1.0"}
```
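
Because the containers need some time to run migrations and start, a single `curl` right after `docker compose up -d` may fail. A deploy script could instead poll the health endpoint until it reports `"status": "ok"`. This is a minimal sketch using only the standard library; the function names, attempt count, and delay are assumptions, not part of the service.

```python
import http.client
import json
import time
import urllib.error
import urllib.request


def fetch_health(url, timeout=5.0):
    """Return the decoded JSON body of the health endpoint, or None on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None


def wait_until_healthy(url, attempts=30, delay=2.0, fetch=fetch_health, sleep=time.sleep):
    """Poll until the body reports status "ok"; return True on success, False otherwise."""
    for attempt in range(attempts):
        body = fetch(url)
        if isinstance(body, dict) and body.get("status") == "ok":
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no sleep after the final attempt
    return False
```

For example, `wait_until_healthy("http://localhost:8000/api/v1/health")` waits up to roughly a minute with the defaults. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running server.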

### 3.6 Update the Service (After Code Changes)

```bash
cd ocr-sprint-service
git pull
docker compose build
docker compose up -d
```

---

## 4. Option B: Manual (Without Docker)

> For servers that already have Python, PostgreSQL, and Redis installed.

### 4.1 Install System Libraries

```bash
sudo apt update && sudo apt upgrade -y

# Libraries for OpenCV & PaddleOCR
sudo apt install -y \
    python3.11 python3.11-venv python3.11-dev \
    libgl1 libglib2.0-0 libsm6 libxext6 libxrender1 \
    libgomp1 libmagic1 \
    build-essential git curl

# Install Redis & PostgreSQL (if not already installed)
sudo apt install -y redis-server postgresql postgresql-contrib
sudo systemctl enable --now redis-server postgresql
```

> [!NOTE]
> If the server already has Python 3.12, use `python3.12` in all subsequent commands.

### 4.2 Set Up the Database

```bash
sudo -u postgres psql
```

```sql
CREATE USER ocr WITH PASSWORD 'change-me-strong-password';
CREATE DATABASE ocr_sprint OWNER ocr;
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
\c ocr_sprint
GRANT ALL ON SCHEMA public TO ocr;
\q
```

### 4.3 Create the Application User & Directory

```bash
sudo useradd -m -s /bin/bash ocr
sudo mkdir -p /opt/ocr-sprint-service
sudo chown ocr:ocr /opt/ocr-sprint-service
```

### 4.4 Clone & Install

```bash
sudo su - ocr
cd /opt
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service

# Create the virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install dependencies + OCR runtime (~1.5 GB download)
pip install --upgrade pip setuptools wheel
pip install -e ".[ocr]"

# Verify
python -c "import paddleocr; print('PaddleOCR OK')"
python -c "import fastapi; print('FastAPI OK')"
```

### 4.5 Configure .env

```bash
cp .env.example .env
nano .env
```

**Must be changed for manual deployment:**

```bash
APP_ENV=prod
DATABASE_URL=postgresql+psycopg://ocr:change-me-strong-password@localhost:5432/ocr_sprint
REDIS_URL=redis://localhost:6379/0
QUEUE_ENABLED=true
API_KEYS=your-generated-api-key
STORAGE_LOCAL_DIR=/opt/ocr-sprint-service/storage
BLOB_STORAGE_DIR=/opt/ocr-sprint-service/storage/blobs
```

```bash
# Create the storage directories
mkdir -p /opt/ocr-sprint-service/storage/blobs
```
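
A strong database password often contains characters such as `@`, `/`, `:` or `#`, which have a special meaning inside a URL and must be percent-encoded before they go into `DATABASE_URL`. The helper below is a hypothetical sketch (the function name is not from the repository) that builds the URL in the `postgresql+psycopg://` form used above.

```python
from urllib.parse import quote


def build_database_url(user, password, host="localhost", port=5432, database="ocr_sprint"):
    """Build the connection URL, percent-encoding the user name and password."""
    # safe="" also encodes "/" so that no reserved character is left unescaped.
    return (
        f"postgresql+psycopg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Example with a made-up password containing reserved characters.
print(build_database_url("ocr", "p@ss/w:rd#1"))
# → postgresql+psycopg://ocr:p%40ss%2Fw%3Ard%231@localhost:5432/ocr_sprint
```

The printed value can be pasted into `.env` as-is. Whether the application decodes the password exactly this way depends on how it parses the URL, so it is worth confirming with `alembic current` after editing.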

### 4.6 Run Database Migrations

```bash
source .venv/bin/activate
alembic upgrade head
alembic current  # verify
```

### 4.7 Manual Test

```bash
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
# In another terminal: curl http://localhost:8000/api/v1/health
# Ctrl+C to stop
```

### 4.8 Set Up Systemd Services

**API Service**: `/etc/systemd/system/ocr-sprint-api.service`

```ini
[Unit]
Description=OCR Sprint API Service
After=network.target postgresql.service redis-server.service

[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn \
    ocr_sprint.main:app \
    --host 0.0.0.0 --port 8000 --workers 4 --log-level info
Restart=always
RestartSec=10
LimitNOFILE=65536
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target
```

**Worker Service**: `/etc/systemd/system/ocr-sprint-worker.service`

```ini
[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis-server.service

[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery \
    -A ocr_sprint.worker.celery_app worker \
    --loglevel=info --concurrency=2 --max-tasks-per-child=100
Restart=always
RestartSec=10
LimitNOFILE=65536
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target
```

**Enable & Start:**

```bash
# Leave the ocr user session first
exit

sudo systemctl daemon-reload
sudo systemctl enable --now ocr-sprint-api ocr-sprint-worker
sudo systemctl status ocr-sprint-api ocr-sprint-worker
```

### 4.9 Update the Service (Manual)

```bash
sudo su - ocr
cd /opt/ocr-sprint-service
git pull
source .venv/bin/activate
pip install -e ".[ocr]"
alembic upgrade head
exit

sudo systemctl restart ocr-sprint-api ocr-sprint-worker
```

---

## 5. Production Environment Configuration

The following `.env` settings **must be changed** from their defaults for production:

| Variable | Default | Production | Notes |
|----------|---------|------------|-------|
| `APP_ENV` | `local` | `prod` | Environment mode |
| `API_KEYS` | *(empty)* | `key1,key2` | **REQUIRED!** Authentication is disabled when empty |
| `QUEUE_ENABLED` | `false` | `true` | Enables async processing |
| `DATABASE_URL` | `sqlite:///...` | `postgresql+psycopg://...` | Docker: overridden automatically |
| `REDIS_URL` | `redis://localhost:6379/0` | Adjust as needed | Docker: overridden automatically |
| `OCR_USE_GPU` | `false` | `true` if a GPU is available | GPU mode requires the NVIDIA driver |
| `TABLES_ENABLED` | `true` | `true` | Personnel table extraction |

**Generate an API Key:**

```bash
openssl rand -hex 32
```
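
If `openssl` is not available, the same kind of key (32 random bytes rendered as 64 hexadecimal characters) can be produced with Python's `secrets` module. The helper names below are hypothetical; the comma-separated format follows the `key1,key2` example in the table.

```python
import secrets


def generate_api_keys(count=1, num_bytes=32):
    """Return `count` random keys; each has 2 * num_bytes hexadecimal characters."""
    return [secrets.token_hex(num_bytes) for _ in range(count)]


def format_api_keys(keys):
    """Join keys in the comma-separated form expected by API_KEYS."""
    return ",".join(keys)


# Prints a ready-to-paste .env line with two freshly generated keys.
print(f"API_KEYS={format_api_keys(generate_api_keys(count=2))}")
```

Issuing one key per client, rather than sharing a single key, makes it possible to revoke access for one client without affecting the others.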

> [!WARNING]
> Never deploy to production without setting `API_KEYS`. When it is empty, every endpoint is open without authentication.
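
The rules in this section can be turned into an automated check that runs before the service is started. The following is a rough sketch under stated assumptions: the parser handles only plain `KEY=VALUE` lines, the function names are hypothetical, and the SQLite check applies to manual deployments only, since Docker Compose overrides `DATABASE_URL`.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_production_env(values):
    """Return a list of problems based on the settings table; empty means OK."""
    problems = []
    if values.get("APP_ENV") != "prod":
        problems.append("APP_ENV should be 'prod'")
    keys = [k for k in values.get("API_KEYS", "").split(",") if k.strip()]
    if not keys:
        problems.append("API_KEYS is empty, so authentication would be disabled")
    if values.get("QUEUE_ENABLED", "").lower() != "true":
        problems.append("QUEUE_ENABLED should be 'true'")
    if values.get("DATABASE_URL", "").startswith("sqlite"):
        problems.append("DATABASE_URL still points at SQLite")
    return problems
```

A deploy script could read the file with `parse_env(open(".env").read())` and refuse to continue when `check_production_env` returns a non-empty list.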

---

## 6. Reverse Proxy & SSL (Nginx)

### Install

```bash
sudo apt install -y nginx certbot python3-certbot-nginx
```

### Configuration: `/etc/nginx/sites-available/ocr-sprint`

```nginx
upstream ocr_api {
    server 127.0.0.1:8000;
    keepalive 32;
}

server {
    listen 80;
    server_name ocr.yourdomain.com;

    client_max_body_size 30M;

    proxy_connect_timeout 300s;
    proxy_read_timeout 300s;

    location / {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        # Clear the Connection header so the upstream keepalive pool is actually used.
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    location /metrics {
        allow 127.0.0.1;
        allow 10.0.0.0/8;
        deny all;
        proxy_pass http://ocr_api;
    }
}
```

### Enable & SSL

```bash
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

# SSL
sudo certbot --nginx -d ocr.yourdomain.com
```

---

## 7. Firewall

```bash
sudo ufw allow 22/tcp    # SSH (important!)
sudo ufw allow 80/tcp    # HTTP
sudo ufw allow 443/tcp   # HTTPS
sudo ufw enable
sudo ufw status
```

> [!CAUTION]
> Make sure SSH (port 22) is allowed **before** enabling the firewall, so that you do not lock yourself out of the server.

---

## 8. Deployment Verification

### Health Check

```bash
curl http://localhost:8000/api/v1/health
# {"status":"ok","version":"0.1.0"}
```

### Test OCR (Sync)

```bash
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
    -H "X-API-Key: your-api-key" \
    -F "file=@/path/to/test.pdf" | jq
```

### Test OCR (Async, Production Flow)

```bash
# Submit the job
curl -X POST http://localhost:8000/api/v1/documents \
    -H "X-API-Key: your-api-key" \
    -F "file=@document.pdf" | jq
# → {"job_id":"8f2a...","status":"pending",...}

# Poll for the result
curl -H "X-API-Key: your-api-key" \
    http://localhost:8000/api/v1/documents/8f2a... | jq
# → {"status":"completed","confidence":0.93,"data":{...}}
```
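
In the async flow, a client has to repeat the second request until the job leaves the `pending` state. The sketch below shows one way to do that with the standard library. The endpoint path, the `X-API-Key` header, and the `status` field come from the examples above; the `failed` status, the polling interval, and the function names are assumptions.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"
# "completed" appears in the sample response; "failed" is an assumed terminal state.
TERMINAL_STATUSES = {"completed", "failed"}


def get_job(job_id, api_key, base_url=BASE_URL):
    """Fetch the current job document from GET /documents/{job_id}."""
    request = urllib.request.Request(
        f"{base_url}/documents/{job_id}",
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def poll_job(job_id, api_key, interval=2.0, max_polls=150, fetch=get_job, sleep=time.sleep):
    """Poll until the job reaches a terminal status or the poll budget runs out."""
    for poll in range(max_polls):
        job = fetch(job_id, api_key)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if poll < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

With the defaults this waits about five minutes, which matches the 300-second proxy timeouts in the Nginx configuration; adjust both numbers to the document sizes you expect.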

### Check That All Services Are Running

```bash
# Docker
docker compose ps

# Manual
sudo systemctl status ocr-sprint-api ocr-sprint-worker postgresql redis-server nginx
```

---

## 9. Monitoring & Maintenance

### Logs

```bash
# Docker
docker compose logs -f api worker

# Manual (systemd)
sudo journalctl -u ocr-sprint-api -f
sudo journalctl -u ocr-sprint-worker -f
```

### Prometheus Metrics

```bash
curl http://localhost:8000/metrics
```

Key metrics: `ocr_documents_total`, `ocr_processing_duration_seconds`, `ocr_confidence_score`.
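
Without a Prometheus server, the `/metrics` output can still be inspected from a script, because the exposition format is plain text with one sample per line. The parser below is a simplified sketch: it assumes samples carry no trailing timestamp, and the `status` label in the sample text is made up for illustration, since the real label names are not documented here.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition into {sample_name_with_labels: float}."""
    samples = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        # The value is taken as the last space-separated field.
        name, _, value = line.rpartition(" ")
        try:
            samples[name.strip()] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return samples


def total_for(samples, metric_name):
    """Sum every sample of one metric across all of its label combinations."""
    return sum(
        value for name, value in samples.items()
        if name == metric_name or name.startswith(metric_name + "{")
    )


# Hypothetical sample; the label names are illustrative only.
SAMPLE = """\
# TYPE ocr_documents_total counter
ocr_documents_total{status="completed"} 40
ocr_documents_total{status="failed"} 2
ocr_confidence_score 0.93
"""
print(total_for(parse_metrics(SAMPLE), "ocr_documents_total"))  # → 42.0
```

For anything beyond a quick check, a real Prometheus scrape with alerting rules is the more robust choice.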

### Back Up the Database

```bash
# Docker (-T disables TTY allocation so the redirected dump is not altered)
docker compose exec -T postgres pg_dump -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql

# Manual
pg_dump -U ocr -h localhost ocr_sprint | gzip > backup_$(date +%Y%m%d).sql.gz
```

### Automated Backup (Cron)

```bash
#!/bin/bash
# Save as /opt/ocr-sprint-service/backup.sh
# pg_dump needs a non-interactive password source, e.g. ~/.pgpass for user ocr.
set -euo pipefail

BACKUP_DIR="/opt/ocr-sprint-service/backups"
mkdir -p "$BACKUP_DIR"
pg_dump -U ocr -h localhost ocr_sprint | gzip > "$BACKUP_DIR/db_$(date +%Y%m%d_%H%M%S).sql.gz"
find "$BACKUP_DIR" -name "db_*.sql.gz" -mtime +7 -delete
```

```bash
chmod +x /opt/ocr-sprint-service/backup.sh
# Cron: daily at 2 AM. This replaces any existing crontab of user ocr.
# The log lives under the backup directory because user ocr cannot write to /var/log.
echo "0 2 * * * /opt/ocr-sprint-service/backup.sh >> /opt/ocr-sprint-service/backups/backup.log 2>&1" | sudo crontab -u ocr -
```
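
A cron job can fail silently, so it helps to verify now and then that the newest backup is recent and can actually be decompressed. The following sketch assumes the `db_*.sql.gz` naming from the script above; the 26-hour threshold (one day plus some slack) and the function names are assumptions.

```python
import gzip
import time
import zlib
from pathlib import Path


def newest_backup(backup_dir, pattern="db_*.sql.gz"):
    """Return the most recently modified backup file, or None if there is none."""
    files = sorted(Path(backup_dir).glob(pattern), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None


def check_backup(path, max_age_hours=26, now=None):
    """Return a list of problems for one backup file; empty means it looks usable."""
    problems = []
    now = time.time() if now is None else now
    age_hours = (now - path.stat().st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"{path.name} is {age_hours:.0f} hours old")
    try:
        total = 0
        with gzip.open(path, "rb") as handle:
            # Read to the end so truncation and checksum errors surface.
            while chunk := handle.read(1024 * 1024):
                total += len(chunk)
        if total == 0:
            problems.append(f"{path.name} decompresses to zero bytes")
    except (OSError, EOFError, zlib.error) as exc:
        problems.append(f"{path.name} is not a readable gzip file: {exc}")
    return problems
```

This only proves that the archive is intact, not that the dump restores cleanly; a periodic test restore into a scratch database remains the stronger check.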

---

## 10. Troubleshooting

| Problem | Diagnosis | Solution |
|---------|-----------|----------|
| Service does not start | `journalctl -u ocr-sprint-api -n 100` | Check permissions, `.env`, and the error log |
| PaddleOCR model download fails | Timeout in the logs | `python -c "from paddleocr import PaddleOCR; PaddleOCR(lang='latin')"` |
| Worker does not process jobs | `redis-cli ping` does not return PONG | Make sure Redis is running, check `REDIS_URL` |
| Database migration error | `alembic current` | `alembic stamp head` then `alembic upgrade head`. `stamp` only records the revision without running migrations, so use it only when the schema is already up to date |
| Port 8000 already in use | `ss -tlnp \| grep 8000` | Kill the old process or change the port in `.env` |
| Out of memory | OOM killer in the logs | Reduce the worker's `--concurrency`, or add RAM |
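
Several rows in the table come down to whether something is listening on a given port (8000 for the API, 6379 for Redis, 5432 for PostgreSQL). Where `ss` or `redis-cli` is not installed, a quick check is possible from Python. This is a minimal sketch with a hypothetical function name; it only tells you that a TCP listener exists, not which process owns it.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


# Example: report the three ports used by this deployment.
# for port in (8000, 6379, 5432):
#     print(port, "in use" if port_in_use(port) else "free")
```

A port reported as in use before the API has been started points at a stale process; a Redis or PostgreSQL port reported as free points at a service that is not running.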

---

## 11. Security Checklist

- [ ] `API_KEYS` is set to a random key (`openssl rand -hex 32`)
- [ ] Database password changed from the default
- [ ] Firewall enabled (only ports 22, 80, 443 open)
- [ ] SSL/TLS enabled via Nginx + Let's Encrypt
- [ ] `/metrics` endpoint restricted to the internal network
- [ ] Automated database backup via cron
- [ ] OS security updates enabled (`unattended-upgrades`)
- [ ] `APP_ENV=prod` (not `local`)

---

## Quick Reference: Everyday Commands

```bash
# === Docker ===
docker compose up -d                              # Start
docker compose down                               # Stop
docker compose logs -f api                        # Logs
docker compose build && docker compose up -d      # Update

# === Manual ===
sudo systemctl restart ocr-sprint-api ocr-sprint-worker   # Restart
sudo journalctl -u ocr-sprint-api -f                      # Logs
curl http://localhost:8000/api/v1/health                  # Health check
```