OCR Sprint Service Deployment (Existing Stack)
Deployment guide for a server that already has Python 3.12.3, PostgreSQL 16.13, and Redis 7.0.15 installed.
Server Information
- OS: Ubuntu 24.04
- Python: 3.12.3 ✅
- PostgreSQL: 16.13 ✅
- Redis: 7.0.15 ✅
All of these versions are compatible with OCR Sprint Service.
Step 1: Install System Libraries for OpenCV & PaddleOCR
# Update the package list
sudo apt update
# Install the libraries required by OpenCV and PaddleOCR
sudo apt install -y \
libgl1 \
libglib2.0-0 \
libsm6 \
libxext6 \
libxrender1 \
libgomp1 \
libmagic1 \
python3.12-venv \
python3.12-dev \
build-essential \
git
Step 2: Set Up the PostgreSQL Database
# Log in to PostgreSQL
sudo -u postgres psql
Run the following SQL commands, replacing your-password-here with the password you generate at the end of this step:
-- Create the user and database
CREATE USER ocr WITH PASSWORD 'your-password-here';
CREATE DATABASE ocr_sprint OWNER ocr;
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
-- Connect to the database to grant schema privileges
\c ocr_sprint
-- Grant schema privileges (PostgreSQL 15+)
GRANT ALL ON SCHEMA public TO ocr;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO ocr;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO ocr;
-- Verify
\l ocr_sprint
\du ocr
-- Exit
\q
Generate a secure password (do this before running the SQL above, and use the result in CREATE USER):
# Generate random password
openssl rand -base64 32
# Output: a random 44-character base64 string
Save this password; you will need it again in the configuration.
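A base64 password can contain +, / and =, and characters such as @ and / can confuse URL parsing of the DATABASE_URL set in Step 6; SQLAlchemy's documentation recommends percent-encoding special characters in URL passwords. A minimal Bash sketch for encoding the password before pasting it into DATABASE_URL is shown below. The function name urlencode is hypothetical, and it assumes ASCII input (which openssl base64 output always is). Use the raw, unencoded password in CREATE USER and for PGPASSWORD; only the URL needs the encoded form. Generating the password with openssl rand -hex 32 instead avoids the issue entirely.

```bash
#!/usr/bin/env bash
# Percent-encode a string for use inside a URL such as DATABASE_URL.
# RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) are kept as-is;
# every other character becomes %XX. Assumes ASCII input.
urlencode() {
  local s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      *)
        # A leading ' makes printf use the character's numeric code
        printf -v hex '%%%02X' "'$c"
        out+="$hex"
        ;;
    esac
  done
  printf '%s\n' "$out"
}

# Usage (DB_PASS is a hypothetical variable name):
# DB_PASS="$(openssl rand -base64 32)"
# echo "DATABASE_URL=postgresql+psycopg://ocr:$(urlencode "$DB_PASS")@localhost:5432/ocr_sprint"
```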
Step 3: Verify Redis
# Check Redis status
sudo systemctl status redis-server
# Test the connection
redis-cli ping
# Expected output: PONG
# Check the Redis config (optional)
redis-cli CONFIG GET maxmemory
If Redis is not running yet:
sudo systemctl enable redis-server
sudo systemctl start redis-server
Step 4: Create the Application User
# Create a dedicated user for the application
sudo useradd -m -s /bin/bash ocr
# Create application directory
sudo mkdir -p /opt/ocr-sprint-service
sudo chown ocr:ocr /opt/ocr-sprint-service
Step 5: Clone and Install the Application
# Switch to the ocr user
sudo su - ocr
# Clone repository
cd /opt
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
# Create a virtual environment with Python 3.12
python3.12 -m venv .venv
# Activate the virtual environment
source .venv/bin/activate
# Verify the Python version inside the venv
python --version
# Expected: Python 3.12.3
# Upgrade pip, setuptools and wheel
pip install --upgrade pip setuptools wheel
# Install the application with the OCR dependencies
# This downloads roughly 1.5GB of PaddlePaddle wheels
pip install -e ".[ocr]"
# Verify installation
python -c "import paddleocr; print('PaddleOCR OK')"
python -c "import cv2; print('OpenCV OK')"
python -c "import fastapi; print('FastAPI OK')"
Step 6: Configure the Application
# Still as the ocr user
cd /opt/ocr-sprint-service
# Copy the environment template
cp .env.example .env
# Edit the configuration
nano .env
Configuration for /opt/ocr-sprint-service/.env:
# ==== App ====
APP_ENV=prod
APP_HOST=0.0.0.0
APP_PORT=8000
APP_LOG_LEVEL=INFO
# ==== Storage ====
STORAGE_LOCAL_DIR=/opt/ocr-sprint-service/storage
BLOB_STORAGE_DIR=/opt/ocr-sprint-service/storage/blobs
BLOB_MAX_UPLOAD_MB=25
# ==== OCR ====
OCR_LANG=latin
OCR_USE_GPU=false
OCR_MAX_IMAGE_SIDE=2200
# ==== Preprocessing ====
PREPROCESS_TARGET_DPI=300
PREPROCESS_DENOISE=true
PREPROCESS_DESKEW=true
PREPROCESS_DETECT_DOCUMENT=true
PREPROCESS_REMOVE_SHADOW=true
PREPROCESS_MIN_QUAD_AREA_FRACTION=0.20
# ==== Table Extraction ====
TABLES_ENABLED=true
# ==== Confidence ====
CONFIDENCE_AUTO_APPROVE=0.95
CONFIDENCE_NEEDS_REVIEW=0.85
# ==== LLM (Phase 5, optional - disabled for now) ====
LLM_ENABLED=false
# ==== Async Pipeline ====
QUEUE_ENABLED=true
REDIS_URL=redis://localhost:6379/0
CELERY_TASK_DEFAULT_QUEUE=ocr_sprint
# ==== Database ====
# Replace 'your-password-here' with the password you generated in Step 2; percent-encode characters such as @, /, + and = (e.g. @ becomes %40)
DATABASE_URL=postgresql+psycopg://ocr:your-password-here@localhost:5432/ocr_sprint
DATABASE_ECHO=false
# ==== Auth (REQUIRED for production!) ====
# Generate with: openssl rand -hex 32
API_KEYS=paste-api-key-1-here,paste-api-key-2-here
API_KEY_HEADER=X-API-Key
Generate API keys:
# Generate 2 API keys
echo "API Key 1: $(openssl rand -hex 32)"
echo "API Key 2: $(openssl rand -hex 32)"
Copy the output and paste the keys into API_KEYS in the .env file, separated by a comma.
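If you prefer to produce the whole API_KEYS line in one go instead of copying keys by hand, a small Bash helper could look like the sketch below. The function names gen_api_keys and rand_hex32 are hypothetical; the format (64 hex characters per key, comma-separated) simply mirrors the openssl rand -hex 32 commands and the API_KEYS example above. As an assumption for minimal systems, it falls back to reading /dev/urandom through od when openssl is not installed.

```bash
#!/usr/bin/env bash
# Print 32 random bytes as 64 lowercase hex characters (same shape as `openssl rand -hex 32`).
rand_hex32() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    od -An -tx1 -N32 /dev/urandom | tr -d ' \n'
    echo
  fi
}

# Print an API_KEYS= line with N comma-separated keys (default: 2).
gen_api_keys() {
  local n="${1:-2}" keys="" i
  for (( i = 0; i < n; i++ )); do
    # ${keys:+,} adds a comma only once the list is non-empty
    keys+="${keys:+,}$(rand_hex32)"
  done
  printf 'API_KEYS=%s\n' "$keys"
}

# Usage: gen_api_keys 2   # then paste the printed line into .env
```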
Create the storage directories:
mkdir -p /opt/ocr-sprint-service/storage/blobs
chmod 755 /opt/ocr-sprint-service/storage
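Before moving on, it can help to confirm that .env no longer contains the placeholder values from the template above. The sketch below is a hypothetical check_env Bash function; the list of required keys and the placeholder strings it looks for (your-password-here, paste-api-key) are assumptions taken from the example configuration in this step, so adjust them to match your own .env.example.

```bash
#!/usr/bin/env bash
# Report required keys that are missing or still hold a template placeholder.
# Returns 0 when all keys look filled in, 1 otherwise.
check_env() {
  local file="$1" key value status=0
  local required=(DATABASE_URL REDIS_URL API_KEYS)
  for key in "${required[@]}"; do
    # Take the last assignment if a key appears more than once
    value="$(grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2- || true)"
    if [ -z "$value" ]; then
      echo "MISSING: $key" >&2
      status=1
    elif [[ "$value" == *your-password-here* || "$value" == *paste-api-key* ]]; then
      echo "PLACEHOLDER: $key" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: check_env /opt/ocr-sprint-service/.env && echo ".env looks complete"
```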
Step 7: Run Database Migrations
# Still as the ocr user, with the venv activated
cd /opt/ocr-sprint-service
source .venv/bin/activate
# Run migrations
alembic upgrade head
# Verify: this should show the current revision
alembic current
# Expected output: a revision ID, marked (head) when the schema is up to date
Step 8: Test a Manual Run
# Still as the ocr user
cd /opt/ocr-sprint-service
source .venv/bin/activate
# Test the API server
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
In another terminal (as the ubuntu user):
# Test health check
curl http://localhost:8000/api/v1/health
# Expected: {"status":"ok","version":"0.1.0"}
# Test with a sample file (if you have one)
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
-H "X-API-Key: your-api-key-here" \
-F "file=@/path/to/test.pdf"
If it works, stop the server with Ctrl+C.
Step 9: Set Up systemd Services
# Exit the ocr user session
exit
# Back as the ubuntu user (with sudo)
Create the API Service
sudo nano /etc/systemd/system/ocr-sprint-api.service
Content:
[Unit]
Description=OCR Sprint API Service
After=network.target postgresql.service redis-server.service
Wants=postgresql.service redis-server.service
[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
# Environment
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
# Start command - 4 workers for production
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn \
ocr_sprint.main:app \
--host 0.0.0.0 \
--port 8000 \
--workers 4 \
--log-level info
# Restart policy
Restart=always
RestartSec=10
StartLimitInterval=0
# Resource limits
LimitNOFILE=65536
# Security
NoNewPrivileges=true
PrivateTmp=true
[Install]
WantedBy=multi-user.target
Create the Celery Worker Service
sudo nano /etc/systemd/system/ocr-sprint-worker.service
Content:
[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis-server.service ocr-sprint-api.service
Wants=postgresql.service redis-server.service
[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
# Environment
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
# Start command - concurrency 2 for a 4-core CPU
# Adjust to the number of CPU cores on your server
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery \
-A ocr_sprint.worker.celery_app \
worker \
--loglevel=info \
--concurrency=2 \
--max-tasks-per-child=100
# Restart policy
Restart=always
RestartSec=10
StartLimitInterval=0
# Resource limits
LimitNOFILE=65536
# Security
NoNewPrivileges=true
PrivateTmp=true
[Install]
WantedBy=multi-user.target
Enable and Start the Services
# Reload systemd
sudo systemctl daemon-reload
# Enable services (auto-start on boot)
sudo systemctl enable ocr-sprint-api
sudo systemctl enable ocr-sprint-worker
# Start services
sudo systemctl start ocr-sprint-api
sudo systemctl start ocr-sprint-worker
# Check status
sudo systemctl status ocr-sprint-api
sudo systemctl status ocr-sprint-worker
Expected output: active (running), shown in green.
View Logs
# API logs (real-time)
sudo journalctl -u ocr-sprint-api -f
# Worker logs (real-time)
sudo journalctl -u ocr-sprint-worker -f
# Last 50 lines
sudo journalctl -u ocr-sprint-api -n 50
sudo journalctl -u ocr-sprint-worker -n 50
Step 10: Install and Set Up Nginx
# Install Nginx and Certbot
sudo apt install -y nginx certbot python3-certbot-nginx
# Check Nginx status
sudo systemctl status nginx
Create Nginx Configuration
sudo nano /etc/nginx/sites-available/ocr-sprint
Content (replace ocr.yourdomain.com with your domain):
# Upstream
upstream ocr_api {
server 127.0.0.1:8000;
keepalive 32;
}
# Rate limiting
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
server {
listen 80;
server_name ocr.yourdomain.com;
# Max upload size
client_max_body_size 30M;
client_body_buffer_size 128k;
# Timeouts
proxy_connect_timeout 300s;
proxy_send_timeout 300s;
proxy_read_timeout 300s;
send_timeout 300s;
# Logging
access_log /var/log/nginx/ocr-sprint-access.log;
error_log /var/log/nginx/ocr-sprint-error.log;
# API endpoints
location /api/ {
limit_req zone=api_limit burst=20 nodelay;
proxy_pass http://ocr_api;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Connection "";
proxy_buffering off;
}
# Health check
location /api/v1/health {
proxy_pass http://ocr_api;
proxy_http_version 1.1;
proxy_set_header Host $host;
access_log off;
}
# Metrics (restrict access)
location /metrics {
allow 127.0.0.1;
allow 10.0.0.0/8;
deny all;
proxy_pass http://ocr_api;
proxy_http_version 1.1;
proxy_set_header Host $host;
}
# API docs
location /docs {
proxy_pass http://ocr_api;
proxy_http_version 1.1;
proxy_set_header Host $host;
}
location /redoc {
proxy_pass http://ocr_api;
proxy_http_version 1.1;
proxy_set_header Host $host;
}
}
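client_max_body_size (30M) needs to stay above the application's BLOB_MAX_UPLOAD_MB (25 in Step 6), because a multipart/form-data upload adds boundaries and part headers on top of the file itself; if the Nginx limit is lower, Nginx rejects the request with 413 before it reaches the API. A hypothetical check_upload_limits Bash function that compares the two values is sketched below; it only understands sizes written as a number followed by M (or m), as in this guide.

```bash
#!/usr/bin/env bash
# Compare BLOB_MAX_UPLOAD_MB in .env with client_max_body_size in an Nginx config.
# Returns 0 if Nginx allows strictly more than the app, 1 if not, 2 if a value is missing.
check_upload_limits() {
  local env_file="$1" nginx_conf="$2" app_mb nginx_mb
  app_mb="$(grep -E '^BLOB_MAX_UPLOAD_MB=' "$env_file" | tail -n 1 | cut -d= -f2 || true)"
  nginx_mb="$(sed -n 's/^[[:space:]]*client_max_body_size[[:space:]]*\([0-9][0-9]*\)[mM];.*/\1/p' "$nginx_conf" | tail -n 1)"
  if [ -z "$app_mb" ] || [ -z "$nginx_mb" ]; then
    echo "could not read both limits" >&2
    return 2
  fi
  if [ "$nginx_mb" -le "$app_mb" ]; then
    echo "client_max_body_size ${nginx_mb}M should be larger than BLOB_MAX_UPLOAD_MB=${app_mb}" >&2
    return 1
  fi
  echo "OK: nginx ${nginx_mb}M > app ${app_mb}MB"
}

# Usage:
# check_upload_limits /opt/ocr-sprint-service/.env /etc/nginx/sites-available/ocr-sprint
```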
Enable Site
# Enable the site
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
# Test the configuration (nginx -t only checks sites that are already enabled)
sudo nginx -t
# Reload Nginx
sudo systemctl reload nginx
Set Up SSL (if you have a domain)
# Obtain certificate
sudo certbot --nginx -d ocr.yourdomain.com
# Test auto-renewal
sudo certbot renew --dry-run
Step 11: Set Up the Firewall
# Check UFW status
sudo ufw status
# Allow SSH (IMPORTANT: do this before enabling UFW, or you may lock yourself out)
sudo ufw allow 22/tcp
# Allow HTTP and HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Enable the firewall (if not already enabled)
sudo ufw enable
# Verify
sudo ufw status numbered
Step 12: Final Verification
Test from the Server
# Health check
curl http://localhost:8000/api/v1/health
# Test the async endpoint
curl -X POST http://localhost:8000/api/v1/documents \
-H "X-API-Key: your-api-key-here" \
-F "file=@/path/to/test.pdf"
# Expected: {"job_id":"...","status":"pending",...}
# Check the job status
curl -H "X-API-Key: your-api-key-here" \
http://localhost:8000/api/v1/documents/JOB_ID_HERE
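To script the two calls above (submit a file, then check its job), the job_id has to be pulled out of the JSON response. jq -r .job_id is the cleanest option when jq is installed; if it is not, a crude sed-based sketch such as the hypothetical json_field function below works for flat responses like {"job_id":"...","status":"pending",...}. It assumes top-level string values without escaped quotes, and it assumes the job status response also carries a status field; it is not a general JSON parser.

```bash
#!/usr/bin/env bash
# Print the value of a top-level string field from a flat JSON object on stdin.
# Crude: no nesting, no escaped quotes inside values.
json_field() {
  local field="$1"
  sed -n "s/.*\"${field}\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Usage (API key and file path are placeholders):
# JOB_ID="$(curl -s -X POST http://localhost:8000/api/v1/documents \
#   -H "X-API-Key: your-api-key-here" -F "file=@/path/to/test.pdf" | json_field job_id)"
# curl -s -H "X-API-Key: your-api-key-here" \
#   "http://localhost:8000/api/v1/documents/$JOB_ID" | json_field status
```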
Test via Domain (if SSL is already set up)
curl https://ocr.yourdomain.com/api/v1/health
Check Services
# All services should be active
sudo systemctl status ocr-sprint-api
sudo systemctl status ocr-sprint-worker
sudo systemctl status postgresql
sudo systemctl status redis-server
sudo systemctl status nginx
Monitoring
View Logs
# API logs
sudo journalctl -u ocr-sprint-api -f
# Worker logs
sudo journalctl -u ocr-sprint-worker -f
# Nginx access logs
sudo tail -f /var/log/nginx/ocr-sprint-access.log
# Nginx error logs
sudo tail -f /var/log/nginx/ocr-sprint-error.log
Prometheus Metrics
# View metrics
curl http://localhost:8000/metrics
# Key metrics:
# - ocr_documents_total
# - ocr_processing_duration_seconds
# - ocr_confidence_score
Maintenance
Restart Services
sudo systemctl restart ocr-sprint-api
sudo systemctl restart ocr-sprint-worker
Update Application
# Switch to the ocr user
sudo su - ocr
cd /opt/ocr-sprint-service
# Pull latest code
git pull
# Activate venv
source .venv/bin/activate
# Update dependencies
pip install -e ".[ocr]"
# Run migrations
alembic upgrade head
# Exit
exit
# Restart services
sudo systemctl restart ocr-sprint-api
sudo systemctl restart ocr-sprint-worker
# Check logs
sudo journalctl -u ocr-sprint-api -n 50
Database Backup
# Create backup directory
sudo mkdir -p /opt/ocr-sprint-service/backups
sudo chown ocr:ocr /opt/ocr-sprint-service/backups
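# Manual backup (runs entirely as ocr so the file can be written to the ocr-owned directory; prompts for the database password)
sudo -u ocr bash -c 'pg_dump -h localhost -U ocr ocr_sprint | gzip > /opt/ocr-sprint-service/backups/backup_$(date +%Y%m%d_%H%M%S).sql.gz'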
Set up automated backups:
# Create backup script
sudo nano /opt/ocr-sprint-service/backup.sh
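#!/bin/bash
# Stop on any error, including a pg_dump failure inside the pipeline
set -euo pipefail
BACKUP_DIR="/opt/ocr-sprint-service/backups"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"
# Backup database (use the raw password here, not a percent-encoded copy from DATABASE_URL)
PGPASSWORD='your-db-password' pg_dump -h localhost -U ocr ocr_sprint | gzip > "$BACKUP_DIR/db_$DATE.sql.gz"
# Delete backups older than 7 days
find "$BACKUP_DIR" -name "db_*.sql.gz" -mtime +7 -delete
echo "Backup completed: $DATE"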
# Make it executable and readable only by ocr (the script contains the database password)
sudo chown ocr:ocr /opt/ocr-sprint-service/backup.sh
sudo chmod 700 /opt/ocr-sprint-service/backup.sh
# Set up cron (daily at 2 AM)
sudo crontab -e -u ocr
# Add this line (the ocr user cannot write to /var/log, so log into the backup directory):
0 2 * * * /opt/ocr-sprint-service/backup.sh >> /opt/ocr-sprint-service/backups/backup.log 2>&1
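Because the backup runs unattended, it is worth checking occasionally that the newest archive is actually usable. The hypothetical verify_latest_backup function below tests the newest db_*.sql.gz file (the naming used by backup.sh) with gzip -t and also rejects archives with no content: if pg_dump fails inside a pipeline without pipefail, gzip still writes a valid but empty archive. This only checks the archive; a real restore test remains the actual proof that a backup works.

```bash
#!/usr/bin/env bash
# Check the newest db_*.sql.gz in a backup directory: it must exist,
# pass gzip's integrity test, and decompress to some content.
verify_latest_backup() {
  local dir="${1:-/opt/ocr-sprint-service/backups}" latest
  # ls -t lists newest first; no match leaves $latest empty
  latest="$(ls -1t "$dir"/db_*.sql.gz 2>/dev/null | head -n 1 || true)"
  if [ -z "$latest" ]; then
    echo "no backups found in $dir" >&2
    return 1
  fi
  if ! gzip -t "$latest" 2>/dev/null; then
    echo "corrupt archive: $latest" >&2
    return 1
  fi
  if [ -z "$(gzip -dc "$latest" 2>/dev/null | head -c 64)" ]; then
    echo "empty dump: $latest" >&2
    return 1
  fi
  echo "OK: $latest"
}

# Usage: verify_latest_backup /opt/ocr-sprint-service/backups
```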
Troubleshooting
Service does not start
# Check detailed logs
sudo journalctl -u ocr-sprint-api -n 100 --no-pager
sudo journalctl -u ocr-sprint-worker -n 100 --no-pager
# Check file permissions
ls -la /opt/ocr-sprint-service
ls -la /opt/ocr-sprint-service/storage
# Test a manual run
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
Database connection error
# Test connection
sudo -u ocr psql -h localhost -U ocr -d ocr_sprint
# Check PostgreSQL status
sudo systemctl status postgresql
# Check PostgreSQL logs
sudo journalctl -u postgresql -n 50
Redis connection error
# Test Redis
redis-cli ping
# Check Redis status
sudo systemctl status redis-server
# Check Redis logs
sudo journalctl -u redis-server -n 50
Worker is not processing jobs
# Check Celery worker status
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate
celery -A ocr_sprint.worker.celery_app inspect active
celery -A ocr_sprint.worker.celery_app inspect stats
# Check Redis queue
redis-cli LLEN ocr_sprint
PaddleOCR error
# Re-download models
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate
python << EOF
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang='latin')
print("Models downloaded successfully")
EOF
Performance Tuning
Check CPU cores
nproc
Adjust worker concurrency
# Edit the worker service
sudo nano /etc/systemd/system/ocr-sprint-worker.service
# For 4 cores: --concurrency=2
# For 8 cores: --concurrency=4
# For 16 cores: --concurrency=8
# Reload and restart
sudo systemctl daemon-reload
sudo systemctl restart ocr-sprint-worker
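The values above follow a simple rule: Celery concurrency equals half the CPU cores, presumably to leave cores free for the API workers, PostgreSQL and Redis on the same host. A hypothetical suggest_concurrency Bash function applying the same rule (with a minimum of 1) is sketched below; treat its output as a starting point and tune it with load testing.

```bash
#!/usr/bin/env bash
# Suggest --concurrency as half the CPU cores (4 -> 2, 8 -> 4, 16 -> 8), minimum 1.
# Uses nproc when no core count is given.
suggest_concurrency() {
  local cores="${1:-$(nproc)}"
  local c=$(( cores / 2 ))
  if (( c < 1 )); then
    c=1
  fi
  echo "$c"
}

# Usage: write the suggestion into the unit file, then reload and restart
# sudo sed -i "s/--concurrency=[0-9]*/--concurrency=$(suggest_concurrency)/" \
#   /etc/systemd/system/ocr-sprint-worker.service
# sudo systemctl daemon-reload && sudo systemctl restart ocr-sprint-worker
```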
PostgreSQL 16 Tuning
sudo nano /etc/postgresql/16/main/postgresql.conf
Recommended settings (adjust to your server's RAM):
# For 8GB RAM:
shared_buffers = 2GB
effective_cache_size = 6GB
maintenance_work_mem = 512MB
work_mem = 8MB
# For 16GB RAM:
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
work_mem = 10MB
# General
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
# The next two values assume SSD storage
random_page_cost = 1.1
effective_io_concurrency = 200
# Parallelism settings sized for 4 CPU cores
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
sudo systemctl restart postgresql
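The two RAM profiles above match the common PostgreSQL rule of thumb of shared_buffers at about 25% of RAM and effective_cache_size at about 75% (8GB gives 2GB/6GB, 16GB gives 4GB/12GB). For other RAM sizes, a hypothetical pg_memory_settings helper can compute the same ratios. When no size in MB is passed it reads MemTotal from /proc/meminfo, which is usually slightly below the nominal RAM size, so its results come out a little under the table values. work_mem and maintenance_work_mem depend on workload and are left to the values above.

```bash
#!/usr/bin/env bash
# Print shared_buffers (~25% of RAM) and effective_cache_size (~75% of RAM).
# Argument: total RAM in MB; defaults to MemTotal from /proc/meminfo (reported in kB).
pg_memory_settings() {
  local ram_mb="${1:-}"
  if [ -z "$ram_mb" ]; then
    ram_mb="$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)"
  fi
  echo "shared_buffers = $(( ram_mb / 4 ))MB"
  echo "effective_cache_size = $(( ram_mb * 3 / 4 ))MB"
}

# Usage: pg_memory_settings          # on the database server
#        pg_memory_settings 32768    # for a 32GB machine
```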
Security Checklist
- API keys set to strong random values
- Database password replaced with a strong generated value
- Firewall enabled (UFW)
- SSL/TLS enabled (if you have a domain)
- /metrics endpoint restricted
- PostgreSQL listens only on localhost
- Redis listens only on localhost
- Automated backups (cron job)
- OS security updates enabled
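The two "listens only on localhost" items can be checked from the output of ss. The hypothetical check_local_only Bash function below reads ss -ltnH output on stdin and flags any listener on PostgreSQL's default port 5432 or Redis's default port 6379 whose address is not 127.0.0.1 or [::1]; if you changed those ports, adjust the patterns.

```bash
#!/usr/bin/env bash
# Read `ss -ltnH` lines on stdin (4th column = local address:port) and report
# PostgreSQL (5432) or Redis (6379) listeners that are not bound to loopback.
# Returns 1 if anything is exposed, 0 otherwise.
check_local_only() {
  local bad=0 state recvq sendq addr rest
  while read -r state recvq sendq addr rest; do
    case "$addr" in
      *:5432 | *:6379)
        case "$addr" in
          127.0.0.1:* | \[::1\]:*) ;;  # loopback only: fine
          *)
            echo "EXPOSED: $addr" >&2
            bad=1
            ;;
        esac
        ;;
    esac
  done
  return "$bad"
}

# Usage: ss -ltnH | check_local_only && echo "PostgreSQL and Redis are local-only"
```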
Next Steps
- Set up monitoring: install Prometheus + Grafana (optional)
- Set up alerting: email/Slack notifications for errors
- Load testing: test with production document volumes
- Backup verification: test restoring from a backup
- Documentation: document the API keys for the team
Support
For questions or issues, contact the development team.