OCR Sprint Service Deployment Quickstart
A guide to deploying OCR Sprint Service on a production server to process Polri (Indonesian National Police) surat sprint (order letter) documents.
Server Prerequisites
Minimum Specifications
- OS: Linux (Ubuntu 20.04+ / Debian 11+ / RHEL 8+)
- CPU: 4 cores (8 cores recommended for high throughput)
- RAM: 8 GB minimum (16 GB recommended)
- Storage: 50 GB free space
  - ~3 GB for PaddleOCR models
  - ~1.5 GB for Python dependencies
  - the rest for document blob storage
- Network: port 8000 open for API access
Software Requirements
- Docker 24.0+ and Docker Compose v2
- Git
- (Optional) Nginx or Caddy as a reverse proxy with SSL
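Before going further, the Docker requirements above can be checked with a short bash sketch. version_ge and preflight are hypothetical helper names (not part of the repository); the sketch assumes the current user can reach the Docker daemon and reads the two version numbers with docker version --format and docker compose version --short.

```shell
# Return 0 if dotted version $1 >= $2 (e.g. 24.0.7 >= 24.0).
# A leading "v" and non-numeric suffixes such as "-ce" are ignored.
version_ge() {
  local -a a b
  local i x y
  IFS=. read -r -a a <<< "${1#v}"
  IFS=. read -r -a b <<< "${2#v}"
  for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
    x=${a[i]:-0}; x=${x%%[!0-9]*}; x=$((10#${x:-0}))   # missing parts count as 0
    y=${b[i]:-0}; y=${y%%[!0-9]*}; y=$((10#${y:-0}))
    if ((x > y)); then return 0; fi
    if ((x < y)); then return 1; fi
  done
  return 0
}

# Check Docker >= 24.0 and the Compose v2 plugin; print a summary on success.
preflight() {
  local docker_v compose_v
  docker_v=$(docker version --format '{{.Server.Version}}' 2>/dev/null) ||
    { echo "Docker daemon not reachable" >&2; return 1; }
  compose_v=$(docker compose version --short 2>/dev/null) ||
    { echo "Docker Compose v2 plugin not found" >&2; return 1; }
  version_ge "$docker_v" 24.0 || { echo "Docker $docker_v is older than 24.0" >&2; return 1; }
  version_ge "$compose_v" 2.0 || { echo "Compose $compose_v is older than v2" >&2; return 1; }
  echo "OK: Docker $docker_v, Compose ${compose_v#v}, $(nproc) CPUs"
}
# Usage: preflight
```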
Deployment with Docker Compose (Recommended)
1. Clone the Repository
# Log in to the server as a non-root user with sudo access
ssh user@your-server.com
# Clone repository
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
2. Configure the Environment
# Copy the environment template
cp .env.example .env
# Edit the production settings
nano .env
Important production settings:
# ==== App ====
APP_ENV=prod
APP_LOG_LEVEL=INFO
# ==== Storage ====
STORAGE_LOCAL_DIR=/app/storage
BLOB_STORAGE_DIR=/app/storage/blobs
BLOB_MAX_UPLOAD_MB=25
# ==== OCR ====
OCR_LANG=latin
OCR_USE_GPU=false # set to true if the server has an NVIDIA GPU
OCR_MAX_IMAGE_SIDE=2200
# ==== Preprocessing ====
PREPROCESS_TARGET_DPI=300
PREPROCESS_DENOISE=true
PREPROCESS_DESKEW=true
PREPROCESS_DETECT_DOCUMENT=true
PREPROCESS_REMOVE_SHADOW=true
# ==== Table Extraction ====
TABLES_ENABLED=true
# ==== Async Pipeline ====
QUEUE_ENABLED=true
REDIS_URL=redis://redis:6379/0
CELERY_TASK_DEFAULT_QUEUE=ocr_sprint
# ==== Database ====
DATABASE_URL=postgresql+psycopg://ocr:ocr@postgres:5432/ocr_sprint
DATABASE_ECHO=false
# ==== Auth (REQUIRED in production!) ====
API_KEYS=your-secret-key-1,your-secret-key-2
API_KEY_HEADER=X-API-Key
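Before starting the stack, the production values above can be sanity-checked. The bash sketch below is hypothetical (env_get and check_env are not part of the repository): it only understands simple KEY=VALUE lines, treats a space followed by # as the start of an inline comment, and encodes rules this guide states elsewhere, namely APP_ENV=prod, no placeholder API keys, and a changed database password. The 32-character minimum key length is an arbitrary threshold; keys from openssl rand -hex 32 are 64 characters long.

```shell
# Print the value of the last KEY=... line in an env file, dropping a trailing " # comment".
env_get() {
  grep -E "^$2=" "$1" | tail -n 1 | cut -d= -f2- | sed 's/[[:space:]]\{1,\}#.*$//'
}

# List problems on stderr and return non-zero if the env file looks unsafe for production.
check_env() {
  local f=${1:-.env} errors=0 key
  local -a keys=()
  [ "$(env_get "$f" APP_ENV)" = prod ] || { echo "APP_ENV is not prod" >&2; errors=$((errors + 1)); }

  IFS=, read -r -a keys <<< "$(env_get "$f" API_KEYS)"
  [ "${#keys[@]}" -gt 0 ] || { echo "API_KEYS is empty" >&2; errors=$((errors + 1)); }
  for key in "${keys[@]}"; do
    case $key in
      your-secret-key*) echo "API_KEYS still contains a placeholder" >&2; errors=$((errors + 1)) ;;
      *) [ "${#key}" -ge 32 ] || { echo "an API key is shorter than 32 characters" >&2; errors=$((errors + 1)); } ;;
    esac
  done

  case $(env_get "$f" DATABASE_URL) in
    *://ocr:ocr@*) echo "DATABASE_URL still uses the default ocr:ocr credentials" >&2; errors=$((errors + 1)) ;;
  esac

  if [ "$(env_get "$f" QUEUE_ENABLED)" = true ] && [ -z "$(env_get "$f" REDIS_URL)" ]; then
    echo "QUEUE_ENABLED=true but REDIS_URL is empty" >&2; errors=$((errors + 1))
  fi
  [ "$errors" -eq 0 ]
}
# Usage: check_env .env && echo "env looks OK"
```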
Generate secure API keys:
# Generate random API key
openssl rand -hex 32
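To generate several keys at once and write them into .env, something like the following bash sketch can be used. gen_api_keys and set_api_keys are hypothetical helpers; keys come from openssl rand -hex 32 as above, with od reading /dev/urandom as a fallback when openssl is not installed, and sed -i.bak leaves the previous file as .env.bak.

```shell
# Print N comma-separated random keys (32 bytes = 64 hex characters each), default N=2.
gen_api_keys() {
  local n=${1:-2} i key
  local -a keys=()
  for ((i = 0; i < n; i++)); do
    key=$(openssl rand -hex 32 2>/dev/null) || key=$(od -An -N32 -tx1 /dev/urandom | tr -d ' \n')
    keys+=("$key")
  done
  (IFS=,; echo "${keys[*]}")   # join with commas, the format API_KEYS uses
}

# Replace the API_KEYS line of an env file in place, keeping a .bak copy.
set_api_keys() {
  sed -i.bak "s|^API_KEYS=.*|API_KEYS=$2|" "$1"
}
# Usage: set_api_keys .env "$(gen_api_keys 2)"
```

Since API_KEYS takes a comma-separated list, adding a new key next to an old one should make it possible to rotate keys without cutting off existing clients.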
3. Build and Start the Services
# Build the Docker images
docker compose build
# Start all services (API, worker, Redis, Postgres)
docker compose up -d
# Follow the logs to make sure everything is running
docker compose logs -f api worker
Running services:
- api: FastAPI server on port 8000
- worker: Celery worker for async processing
- redis: message broker for the job queue
- postgres: database for job state
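To confirm that the four services listed above are actually running (not merely created), the running service names reported by Compose can be compared against the expected list. A minimal bash sketch; missing_services is a hypothetical helper, and docker compose ps --status running --services is the Compose v2 command that lists running services one per line.

```shell
# Read running service names (one per line) on stdin and print each expected service
# that is missing. Returns 1 if anything is missing.
missing_services() {
  local running svc status=0
  running=$(cat)
  for svc in "$@"; do
    if ! grep -qxF -- "$svc" <<< "$running"; then
      echo "$svc"
      status=1
    fi
  done
  return "$status"
}
# Usage:
#   docker compose ps --status running --services | missing_services api worker redis postgres \
#     && echo "all services running"
```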
4. Verify the Deployment
# Health check
curl http://localhost:8000/api/v1/health
# Expected response:
# {"status":"ok","version":"0.1.0"}
# Test the OCR endpoint (sync mode, for testing)
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
-H "X-API-Key: your-secret-key-1" \
-F "file=@samples/pdf/example.pdf" \
| jq
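Right after docker compose up -d the API may still be starting (for example while OCR models are loaded or downloaded), so the health check can fail for a while before it succeeds. A retry loop avoids guessing how long to wait. retry is a hypothetical bash helper; curl -f makes curl exit non-zero on HTTP error responses, so a 5xx counts as a failed attempt.

```shell
# Run a command up to ATTEMPTS times, sleeping DELAY seconds between failures.
# Usage: retry ATTEMPTS DELAY command [args...]
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    if ((i < attempts)); then
      echo "attempt $i/$attempts failed, retrying in ${delay}s" >&2
      sleep "$delay"
    fi
  done
  return 1
}
# Wait up to roughly 2.5 minutes for the API to report healthy:
#   retry 30 5 curl -fsS http://localhost:8000/api/v1/health
```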
5. Set Up a Reverse Proxy (Nginx)
Install Nginx:
sudo apt update
sudo apt install nginx certbot python3-certbot-nginx
Nginx configuration (/etc/nginx/sites-available/ocr-sprint):
upstream ocr_api {
    server localhost:8000;
}
server {
    listen 80;
    server_name ocr.yourdomain.com;
    client_max_body_size 30M;  # Adjust to BLOB_MAX_UPLOAD_MB, with some headroom
    location / {
        proxy_pass http://ocr_api;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Longer timeouts for large documents
        proxy_read_timeout 300s;
        proxy_connect_timeout 75s;
    }
    location /metrics {
        # Restrict the metrics endpoint
        allow 10.0.0.0/8;  # Internal network only
        deny all;
        proxy_pass http://ocr_api;
    }
}
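Because client_max_body_size has to stay above the application's own upload limit (multipart encoding adds a little overhead), the two values can drift apart when one is changed. A bash sketch that compares them; check_body_size is hypothetical, only understands an Nginx value written in megabytes (such as 30M), and suggests the same +5 MB headroom this guide uses (25 vs 30).

```shell
# Compare BLOB_MAX_UPLOAD_MB in an env file with client_max_body_size (in M) in an Nginx config.
check_body_size() {
  local env_file=$1 nginx_conf=$2 app_mb nginx_mb
  app_mb=$(grep -E '^BLOB_MAX_UPLOAD_MB=' "$env_file" | tail -n 1 | cut -d= -f2 | awk '{print $1}')
  nginx_mb=$(grep -Eo 'client_max_body_size[[:space:]]+[0-9]+[mM]' "$nginx_conf" | head -n 1 |
    awk '{print $2}' | tr -d 'mM')
  if [ -z "$app_mb" ] || [ -z "$nginx_mb" ]; then
    echo "could not read both limits" >&2
    return 2
  fi
  if [ "$nginx_mb" -le "$app_mb" ]; then
    echo "client_max_body_size ${nginx_mb}M must exceed BLOB_MAX_UPLOAD_MB=${app_mb} (e.g. $((app_mb + 5))M)" >&2
    return 1
  fi
  echo "OK: Nginx ${nginx_mb}M > app ${app_mb}MB"
}
# Usage: check_body_size .env /etc/nginx/sites-available/ocr-sprint
```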
Enable the site and set up SSL:
# Enable the site
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
# Set up SSL with Let's Encrypt
sudo certbot --nginx -d ocr.yourdomain.com
Manual Deployment (Without Docker)
1. Install System Dependencies
# Ubuntu/Debian
sudo apt update
sudo apt install -y \
python3.11 python3.11-venv python3-pip \
libgl1 libglib2.0-0 libsm6 libxext6 libxrender1 \
libgomp1 libmagic1 \
redis-server postgresql
# Start services
sudo systemctl enable --now redis-server postgresql
2. Set Up the Database
# Create the database and user (quoting 'EOF' stops the shell from expanding $ in the password)
sudo -u postgres psql << 'EOF'
CREATE USER ocr WITH PASSWORD 'your-secure-password';
CREATE DATABASE ocr_sprint OWNER ocr;
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
EOF
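3. Install the Application
# Clone the repository to /opt/ocr-sprint-service, the path used by the systemd units in step 5
sudo mkdir -p /opt/ocr-sprint-service
sudo chown "$USER": /opt/ocr-sprint-service
git clone https://github.com/Adriankf59/ocr-sprint-service.git /opt/ocr-sprint-service
cd /opt/ocr-sprint-service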
# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install --upgrade pip
pip install -e ".[ocr]"
# Copy and edit .env
cp .env.example .env
nano .env
In .env, update DATABASE_URL (using the password set in step 2), REDIS_URL, and QUEUE_ENABLED:
DATABASE_URL=postgresql+psycopg://ocr:your-secure-password@localhost:5432/ocr_sprint
REDIS_URL=redis://localhost:6379/0
QUEUE_ENABLED=true
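DATABASE_URL is a URL, so a password containing characters such as @, :, / or # has to be percent-encoded, otherwise the URL is split in the wrong place (SQLAlchemy's documentation on database URLs makes the same point). A bash sketch that encodes the password before building the URL; urlencode and make_db_url are hypothetical helpers, and encoding is done byte by byte under the C locale.

```shell
# Percent-encode every byte except the RFC 3986 unreserved characters A-Z a-z 0-9 - . _ ~
urlencode() {
  local LC_ALL=C s=$1 out='' c i
  for ((i = 0; i < ${#s}; i++)); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9._~-]) out+=$c ;;
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;   # byte value as %XX
    esac
  done
  printf '%s\n' "$out"
}

# Build a DATABASE_URL for psycopg from: user, password, host, port, database.
make_db_url() {
  printf 'postgresql+psycopg://%s:%s@%s:%s/%s\n' "$1" "$(urlencode "$2")" "$3" "$4" "$5"
}
# Usage: echo "DATABASE_URL=$(make_db_url ocr 'p@ss:w/rd' localhost 5432 ocr_sprint)"
```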
4. Run Database Migrations
alembic upgrade head
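5. Set Up systemd Services
Both units below run as a system user named ocr from /opt/ocr-sprint-service. Create the user first (sudo useradd --system --shell /usr/sbin/nologin ocr), make sure it can write to the storage directories set in .env, and adjust User, WorkingDirectory, and the paths if your layout differs.
API Service (/etc/systemd/system/ocr-sprint-api.service):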
[Unit]
Description=OCR Sprint API
After=network.target postgresql.service redis.service
[Service]
Type=simple
User=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000 --workers 4
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Worker Service (/etc/systemd/system/ocr-sprint-worker.service):
[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis.service
[Service]
Type=simple
User=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery -A ocr_sprint.worker.celery_app worker -l info --concurrency=2
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Enable and start the services:
sudo systemctl daemon-reload
sudo systemctl enable --now ocr-sprint-api ocr-sprint-worker
sudo systemctl status ocr-sprint-api ocr-sprint-worker
Monitoring and Maintenance
Monitoring Logs
# Docker deployment
docker compose logs -f api worker
# Manual deployment
sudo journalctl -u ocr-sprint-api -f
sudo journalctl -u ocr-sprint-worker -f
Prometheus Metrics
Metrics are exposed at the /metrics endpoint:
curl http://localhost:8000/metrics
Key metrics:
- ocr_documents_total: total documents processed
- ocr_processing_duration_seconds: processing duration
- ocr_confidence_score: confidence score distribution
- celery_task_*: Celery worker metrics
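For quick checks from a shell without a Prometheus server, the text exposition format returned by /metrics can be summed per metric name. A bash/awk sketch; metric_sum is hypothetical, adds together all label combinations of the given name, ignores optional timestamps, and assumes label values contain no closing brace.

```shell
# Sum every sample of metric $1 (across all label sets) from Prometheus text format on stdin.
# Exits non-zero if the metric is not present.
metric_sum() {
  awk -v m="$1" '
    /^#/ { next }                                   # skip HELP/TYPE lines
    {
      name = $0; sub(/[{ \t].*/, "", name)          # name ends at "{" or whitespace
      if (name != m) next
      rest = $0; sub(/^[^ \t{]*([{][^}]*[}])?[ \t]+/, "", rest)   # drop name and labels
      split(rest, f, /[ \t]+/)                      # f[1] = value, f[2] = optional timestamp
      sum += f[1]; found = 1
    }
    END { if (!found) exit 1; printf "%g\n", sum }
  '
}
# Usage: curl -s http://localhost:8000/metrics | metric_sum ocr_documents_total
```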
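Database Backup
# Docker deployment (-T disables TTY allocation so the redirected dump is not altered)
docker compose exec -T postgres pg_dump -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql
# Manual deployment (connect via localhost, as DATABASE_URL does)
pg_dump -h localhost -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql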
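To keep backups small and bounded, the dump can be compressed and older files pruned. A bash sketch under these assumptions: backups live in one directory, file names follow the backup_YYYYMMDD pattern used above (so name order is date order), and backup_db and prune_backups are hypothetical names. PIPESTATUS is checked because gzip would otherwise succeed even when pg_dump fails.

```shell
# Dump the Compose postgres service to DIR/backup_YYYYMMDD.sql.gz; fail if pg_dump fails.
backup_db() {
  local dir=${1:-./backups} out
  mkdir -p "$dir"
  out="$dir/backup_$(date +%Y%m%d).sql.gz"
  docker compose exec -T postgres pg_dump -U ocr ocr_sprint | gzip > "$out"
  if [ "${PIPESTATUS[0]}" -ne 0 ]; then
    echo "pg_dump failed; removing $out" >&2
    rm -f "$out"
    return 1
  fi
  echo "$out"
}

# Keep only the newest KEEP backups in DIR (default 7); dated names sort chronologically.
prune_backups() {
  local dir=$1 keep=${2:-7}
  find "$dir" -maxdepth 1 -name 'backup_*.sql.gz' | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r f; do rm -f -- "$f"; done
}
# Usage (e.g. from a daily cron job):
#   backup_db /var/backups/ocr-sprint && prune_backups /var/backups/ocr-sprint 14
```

A compressed dump can be restored with gunzip -c backup_YYYYMMDD.sql.gz | docker compose exec -T postgres psql -U ocr ocr_sprint.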
Update Service
# Docker deployment
cd ocr-sprint-service
git pull
docker compose build
docker compose up -d
# Manual deployment
cd /opt/ocr-sprint-service
git pull
source .venv/bin/activate
pip install -e ".[ocr]"
alembic upgrade head
sudo systemctl restart ocr-sprint-api ocr-sprint-worker
Troubleshooting
Service does not start
# Check the logs
docker compose logs api worker
# Check the health endpoint
curl http://localhost:8000/api/v1/health
PaddleOCR model download fails
# Download the models manually into the volume
docker compose exec api python -c "from paddleocr import PaddleOCR; PaddleOCR(use_angle_cls=True, lang='latin')"
Worker does not process jobs
# Check the Redis connection
docker compose exec worker redis-cli -h redis ping
# Check Celery worker status
docker compose exec worker celery -A ocr_sprint.worker.celery_app inspect active
Database migration error
# Check the current revision
docker compose exec api alembic current
# Apply any pending migrations
docker compose exec api alembic upgrade head
Out of memory
# Reduce worker concurrency in docker-compose.yml
# Change it to --concurrency=1 (the default) or add a memory limit
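Instead of editing docker-compose.yml directly, the worker can be capped through a docker-compose.override.yml, which docker compose merges automatically when it sits next to the main file. A bash sketch that writes such an override; write_worker_override is hypothetical, the command line mirrors the one under Performance Tuning, and the 4g default limit is an arbitrary example to adjust to the server's RAM. A command set in an override replaces the original command rather than merging with it.

```shell
# Write a Compose override that pins worker concurrency and caps its memory.
# Args: FILE (default docker-compose.override.yml), MEMORY (default 4g), CONCURRENCY (default 1)
write_worker_override() {
  local file=${1:-docker-compose.override.yml} mem=${2:-4g} conc=${3:-1}
  cat > "$file" <<EOF
services:
  worker:
    command: ["celery", "-A", "ocr_sprint.worker.celery_app", "worker", "-l", "info", "--concurrency=${conc}"]
    deploy:
      resources:
        limits:
          memory: ${mem}
EOF
}
# Usage: write_worker_override docker-compose.override.yml 4g 1 && docker compose up -d worker
```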
Security Checklist
- API_KEYS set to strong random values
- Firewall configured (only ports 80/443 publicly open, plus SSH for administration)
- SSL/TLS enabled via Nginx + Let's Encrypt
- Database password changed from the default
- /metrics endpoint restricted to the internal network
- Regular backups of the database and blob storage
- Log rotation configured
- OS security updates enabled
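For the firewall item, it helps to see which TCP ports are actually listening on non-loopback addresses. Ports published by Docker (such as 8000 from docker compose) are inserted into iptables by Docker itself and are not filtered by UFW rules, so publishing the API as "127.0.0.1:8000:8000" in docker-compose.yml is one way to keep it reachable only through Nginx. A bash/awk sketch that reads ss -tlnH output; exposed_ports is hypothetical, and the default allowed list 22,80,443 assumes SSH on port 22.

```shell
# Read `ss -tlnH` output on stdin and print non-loopback listeners whose port is not allowed.
# Arg: comma-separated allowed ports (default 22,80,443).
exposed_ports() {
  awk -v allowed="${1:-22,80,443}" '
    BEGIN { n = split(allowed, a, ","); for (i = 1; i <= n; i++) ok[a[i]] = 1 }
    {
      addr = $4                                       # Local Address:Port column
      port = addr; sub(/.*:/, "", port)               # text after the last ":"
      host = addr; sub(/:[^:]*$/, "", host)           # text before it, e.g. 0.0.0.0 or [::]
      if (host ~ /^127\./ || host == "[::1]") next    # loopback only, not reachable from outside
      if (!(port in ok)) print host ":" port
    }
  '
}
# Usage: ss -tlnH | exposed_ports 22,80,443
```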
Performance Tuning
For high throughput:
- Increase worker concurrency:
  # docker-compose.yml
  command: ["celery", "-A", "ocr_sprint.worker.celery_app", "worker", "-l", "info", "--concurrency=4"]
- Scale workers horizontally:
  docker compose up -d --scale worker=3
- Enable the GPU (if available):
  # .env
  OCR_USE_GPU=true
- Tune Postgres (run as a superuser; both settings take effect only after PostgreSQL is restarted):
  -- Raise the connection limit and the shared buffer cache
  ALTER SYSTEM SET max_connections = 200;
  ALTER SYSTEM SET shared_buffers = '2GB';
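How far worker concurrency can be raised depends mostly on memory, assuming each Celery worker process holds its own copy of the OCR models. This guide does not give a per-process figure, so the bash sketch below takes it as an input (measure it, for example, with docker stats under a realistic load). suggest_concurrency is hypothetical, the 1024 MB reserve for the OS and other services is an arbitrary default, and the result is also capped at the CPU count.

```shell
# Suggest a Celery --concurrency value from available memory and CPU count.
# Args: AVAILABLE_MB PER_PROCESS_MB CPUS [RESERVE_MB=1024]
suggest_concurrency() {
  local avail_mb=$1 per_proc_mb=$2 cpus=$3 reserve_mb=${4:-1024} n
  n=$(( (avail_mb - reserve_mb) / per_proc_mb ))
  if ((n > cpus)); then n=$cpus; fi   # more processes than cores rarely helps CPU-bound OCR
  if ((n < 1)); then n=1; fi
  echo "$n"
}
# Usage on Linux (1500 MB per process is only a placeholder; measure your own):
#   avail_mb=$(awk '/^MemAvailable:/ {print int($2 / 1024)}' /proc/meminfo)
#   suggest_concurrency "$avail_mb" 1500 "$(nproc)"
```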
Support
For questions or issues, contact the development team or open an issue in the repository.