OCR Sprint Service Deployment (Existing Stack)

A deployment guide for servers that already have Python 3.12.3, PostgreSQL 16.13, and Redis 7.0.15 installed.

Your Server Information

  • OS: Ubuntu 24.04
  • Python: 3.12.3
  • PostgreSQL: 16.13
  • Redis: 7.0.15

All of these versions are compatible with OCR Sprint Service.

Step 1: Install System Libraries for OpenCV & PaddleOCR

# Update package list
sudo apt update

# Install the libraries required by OpenCV and PaddleOCR
sudo apt install -y \
    libgl1 \
    libglib2.0-0 \
    libsm6 \
    libxext6 \
    libxrender1 \
    libgomp1 \
    libmagic1 \
    python3.12-venv \
    python3.12-dev \
    build-essential \
    git

Step 2: Set Up the PostgreSQL Database

# Log in to PostgreSQL
sudo -u postgres psql

Run the following SQL commands:

-- Create the user and database
-- Replace 'your-password-here' with a strong password of your own
CREATE USER ocr WITH PASSWORD 'your-password-here';
CREATE DATABASE ocr_sprint OWNER ocr;

-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;

-- Connect to the database to grant schema privileges
\c ocr_sprint

-- Grant schema privileges (PostgreSQL 15+)
GRANT ALL ON SCHEMA public TO ocr;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO ocr;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO ocr;

-- Verify
\l ocr_sprint
\du ocr

-- Exit
\q

Generate a secure password:

# Generate a random password
openssl rand -base64 32
# Example output (generate your own value):
# +J33GdYQcWcfqXs169cmgPrQJpLFgybjoedr/tNb0d4=

Save this password; it is used in the configuration in Step 6.
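
Because this password is later embedded in DATABASE_URL, characters that base64 output can contain (+, / and =), as well as @, should be percent-encoded so the URL parses correctly. The following is a minimal sketch using only the Python standard library. The helper name build_database_url is hypothetical and not part of the service; the default user, host, port, and database name are taken from this guide.

```python
from urllib.parse import quote


def build_database_url(password: str,
                       user: str = "ocr",
                       host: str = "localhost",
                       port: int = 5432,
                       database: str = "ocr_sprint") -> str:
    """Return a DATABASE_URL with the password percent-encoded."""
    # safe="" makes quote() encode "/" as well, which it leaves alone by default.
    encoded = quote(password, safe="")
    return f"postgresql+psycopg://{user}:{encoded}@{host}:{port}/{database}"


if __name__ == "__main__":
    # Sample value in the same shape as the openssl output above.
    print(build_database_url("+J33GdYQcWcfqXs169cmgPrQJpLFgybjoedr/tNb0d4="))
```

The printed value can be pasted into .env as the DATABASE_URL setting.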

Step 3: Verify Redis

# Check Redis status
sudo systemctl status redis-server

# Test connection
redis-cli ping
# Expected output: PONG

# Check the Redis config (optional)
redis-cli CONFIG GET maxmemory

If Redis is not running yet:

sudo systemctl enable redis-server
sudo systemctl start redis-server

Step 4: Create the Application User

# Create a dedicated user for the application
sudo useradd -m -s /bin/bash ocr

# Create application directory
sudo mkdir -p /opt/ocr-sprint-service
sudo chown ocr:ocr /opt/ocr-sprint-service

Step 5: Clone and Install the Application

# Switch to the ocr user
sudo su - ocr

# Clone repository
cd /opt
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service

# Create a virtual environment with Python 3.12
python3.12 -m venv .venv

# Activate virtual environment
source .venv/bin/activate

# Verify the Python version inside the venv
python --version
# Expected: Python 3.12.3

# Upgrade pip
pip install --upgrade pip setuptools wheel

# Install the application with its OCR dependencies
# This downloads about 1.5 GB of PaddlePaddle wheels
pip install -e ".[ocr]"

# Verify installation
python -c "import paddleocr; print('PaddleOCR OK')"
python -c "import cv2; print('OpenCV OK')"
python -c "import fastapi; print('FastAPI OK')"

Step 6: Configure the Application

# Still as the ocr user
cd /opt/ocr-sprint-service

# Copy environment template
cp .env.example .env

# Edit the configuration
nano .env

Configuration for /opt/ocr-sprint-service/.env:

# ==== App ====
APP_ENV=prod
APP_HOST=0.0.0.0
APP_PORT=8000
APP_LOG_LEVEL=INFO

# ==== Storage ====
STORAGE_LOCAL_DIR=/opt/ocr-sprint-service/storage
BLOB_STORAGE_DIR=/opt/ocr-sprint-service/storage/blobs
BLOB_MAX_UPLOAD_MB=25

# ==== OCR ====
OCR_LANG=latin
OCR_USE_GPU=false
OCR_MAX_IMAGE_SIDE=2200

# ==== Preprocessing ====
PREPROCESS_TARGET_DPI=300
PREPROCESS_DENOISE=true
PREPROCESS_DESKEW=true
PREPROCESS_DETECT_DOCUMENT=true
PREPROCESS_REMOVE_SHADOW=true
PREPROCESS_MIN_QUAD_AREA_FRACTION=0.20

# ==== Table Extraction ====
TABLES_ENABLED=true

# ==== Confidence ====
CONFIDENCE_AUTO_APPROVE=0.95
CONFIDENCE_NEEDS_REVIEW=0.85

# ==== LLM (Phase 5, optional - keep disabled for now) ====
LLM_ENABLED=false

# ==== Async Pipeline ====
QUEUE_ENABLED=true
REDIS_URL=redis://localhost:6379/0
CELERY_TASK_DEFAULT_QUEUE=ocr_sprint

# ==== Database ====
# Replace 'your-password-here' with the password you generated in Step 2
# (percent-encode special characters such as @ and / so the URL parses correctly)
DATABASE_URL=postgresql+psycopg://ocr:your-password-here@localhost:5432/ocr_sprint
DATABASE_ECHO=false

# ==== Auth (REQUIRED in production!) ====
# Generate with: openssl rand -hex 32
API_KEYS=paste-api-key-1-here,paste-api-key-2-here
API_KEY_HEADER=X-API-Key

Generate API keys:

# Generate 2 API keys
echo "API Key 1: $(openssl rand -hex 32)"
echo "API Key 2: $(openssl rand -hex 32)"

Copy the generated values and paste them into API_KEYS in the .env file.
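
As an alternative to copying the two values by hand, the keys can be generated and joined into a ready-to-paste line. This is a minimal sketch using the Python standard library. secrets.token_hex(32) yields 32 random bytes as 64 hex characters, the same shape as openssl rand -hex 32. The function name generate_api_keys is hypothetical.

```python
import secrets


def generate_api_keys(count: int = 2) -> list[str]:
    """Return `count` random keys, each 32 random bytes as 64 hex characters."""
    return [secrets.token_hex(32) for _ in range(count)]


if __name__ == "__main__":
    keys = generate_api_keys()
    # Print a line that can be pasted into .env as-is.
    print("API_KEYS=" + ",".join(keys))
```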

Create storage directories:

mkdir -p /opt/ocr-sprint-service/storage/blobs
chmod 755 /opt/ocr-sprint-service/storage

Step 7: Run Database Migrations

# Still as the ocr user, with the venv activated
cd /opt/ocr-sprint-service
source .venv/bin/activate

# Run migrations
alembic upgrade head

# Verify - should show current revision
alembic current

# Expected output: a revision ID, marked (head) when the database is up to date

Step 8: Test a Manual Run

# Still as the ocr user
cd /opt/ocr-sprint-service
source .venv/bin/activate

# Test API server
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000

In another terminal (as the ubuntu user):

# Test health check
curl http://localhost:8000/api/v1/health

# Expected: {"status":"ok","version":"0.1.0"}

# Test with a sample file (if you have one)
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
  -H "X-API-Key: your-api-key-here" \
  -F "file=@/path/to/test.pdf"

If this works, stop the server with Ctrl+C.

Step 9: Set Up Systemd Services

# Exit from the ocr user
exit

# Back as the ubuntu user, with sudo

Create API Service

sudo nano /etc/systemd/system/ocr-sprint-api.service

Content:

[Unit]
Description=OCR Sprint API Service
After=network.target postgresql.service redis-server.service
Wants=postgresql.service redis-server.service

[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service

# Environment
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env

# Start command - 4 workers for production
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn \
    ocr_sprint.main:app \
    --host 0.0.0.0 \
    --port 8000 \
    --workers 4 \
    --log-level info

# Restart policy
Restart=always
RestartSec=10
StartLimitInterval=0

# Resource limits
LimitNOFILE=65536

# Security
NoNewPrivileges=true
PrivateTmp=true

[Install]
WantedBy=multi-user.target

Create Celery Worker Service

sudo nano /etc/systemd/system/ocr-sprint-worker.service

Content:

[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis-server.service ocr-sprint-api.service
Wants=postgresql.service redis-server.service

[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service

# Environment
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env

# Start command - concurrency 2 for a 4-core CPU
# Adjust to the number of CPU cores on your server
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery \
    -A ocr_sprint.worker.celery_app \
    worker \
    --loglevel=info \
    --concurrency=2 \
    --max-tasks-per-child=100

# Restart policy
Restart=always
RestartSec=10
StartLimitInterval=0

# Resource limits
LimitNOFILE=65536

# Security
NoNewPrivileges=true
PrivateTmp=true

[Install]
WantedBy=multi-user.target

Enable and Start Services

# Reload systemd
sudo systemctl daemon-reload

# Enable services (auto-start on boot)
sudo systemctl enable ocr-sprint-api
sudo systemctl enable ocr-sprint-worker

# Start services
sudo systemctl start ocr-sprint-api
sudo systemctl start ocr-sprint-worker

# Check status
sudo systemctl status ocr-sprint-api
sudo systemctl status ocr-sprint-worker

Expected output: active (running), shown in green.

View Logs

# API logs (real-time)
sudo journalctl -u ocr-sprint-api -f

# Worker logs (real-time)
sudo journalctl -u ocr-sprint-worker -f

# Last 50 lines
sudo journalctl -u ocr-sprint-api -n 50
sudo journalctl -u ocr-sprint-worker -n 50

Step 10: Install and Set Up Nginx

# Install Nginx and Certbot
sudo apt install -y nginx certbot python3-certbot-nginx

# Check Nginx status
sudo systemctl status nginx

Create Nginx Configuration

sudo nano /etc/nginx/sites-available/ocr-sprint

Content (replace ocr.yourdomain.com with your domain):

# Upstream
upstream ocr_api {
    server 127.0.0.1:8000;
    keepalive 32;
}

# Rate limiting
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

server {
    listen 80;
    server_name ocr.yourdomain.com;

    # Max upload size
    client_max_body_size 30M;
    client_body_buffer_size 128k;

    # Timeouts
    proxy_connect_timeout 300s;
    proxy_send_timeout 300s;
    proxy_read_timeout 300s;
    send_timeout 300s;

    # Logging
    access_log /var/log/nginx/ocr-sprint-access.log;
    error_log /var/log/nginx/ocr-sprint-error.log;

    # API endpoints
    location /api/ {
        limit_req zone=api_limit burst=20 nodelay;

        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Connection "";
        
        proxy_buffering off;
    }

    # Health check
    location /api/v1/health {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        access_log off;
    }

    # Metrics (restrict access)
    location /metrics {
        allow 127.0.0.1;
        allow 10.0.0.0/8;
        deny all;

        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
    }

    # API docs
    location /docs {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
    }

    location /redoc {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
    }
}
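
To build intuition for the rate=10r/s and burst=20 nodelay settings above, here is a simplified model of the leaky-bucket accounting that limit_req applies per client. It is a sketch for reasoning about the numbers, not nginx's actual code, and the class name LimitReqModel is hypothetical.

```python
class LimitReqModel:
    """Simplified model of limit_req with burst and nodelay for one client."""

    def __init__(self, rate_per_second: float, burst: int) -> None:
        self.rate = rate_per_second
        self.burst = burst
        self.excess = 0.0      # requests currently counted above the rate
        self.last_seen = None  # timestamp of the last accepted request

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) passes."""
        if self.last_seen is None:
            # First request from this client: nothing to drain yet.
            self.last_seen = now
            return True
        drained = self.rate * (now - self.last_seen)
        excess = max(0.0, self.excess - drained + 1.0)
        if excess > self.burst:
            return False  # nginx would answer 503 by default
        self.excess = excess
        self.last_seen = now
        return True


if __name__ == "__main__":
    model = LimitReqModel(rate_per_second=10, burst=20)
    accepted = sum(model.allow(0.0) for _ in range(30))
    print(f"accepted {accepted} of 30 simultaneous requests")
```

In this model, a client that sends 30 requests at the same instant gets 21 through (one plus the burst of 20); the allowance then refills at 10 requests per second.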

Enable Site

# Enable the site
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/

# Test the configuration (after enabling, so the new site is included in the check)
sudo nginx -t

# Reload Nginx
sudo systemctl reload nginx

Set Up SSL (if you have a domain)

# Obtain certificate
sudo certbot --nginx -d ocr.yourdomain.com

# Test auto-renewal
sudo certbot renew --dry-run

Step 11: Set Up the Firewall

# Check UFW status
sudo ufw status

# Allow SSH (IMPORTANT: do this before enabling the firewall!)
sudo ufw allow 22/tcp

# Allow HTTP and HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Enable the firewall (if not already enabled)
sudo ufw enable

# Verify
sudo ufw status numbered

Step 12: Final Verification

Test from the Server

# Health check
curl http://localhost:8000/api/v1/health

# Test async endpoint
curl -X POST http://localhost:8000/api/v1/documents \
  -H "X-API-Key: your-api-key-here" \
  -F "file=@/path/to/test.pdf"

# Expected: {"job_id":"...","status":"pending",...}

# Check job status
curl -H "X-API-Key: your-api-key-here" \
  http://localhost:8000/api/v1/documents/JOB_ID_HERE
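
Checking the job status by hand can be replaced with a small polling loop. The sketch below uses only the Python standard library and the endpoint and header shown in the curl commands above. The status value "processing" and the helper names fetch_job and wait_for_job are assumptions; only "pending" is confirmed by the expected response above, so adjust the set to the statuses the service really returns.

```python
import json
import time
import urllib.request
from typing import Callable

# "pending" appears in the expected response above; "processing" is an
# assumed in-progress status and may not match the service's real values.
IN_PROGRESS = {"pending", "processing"}


def fetch_job(base_url: str, job_id: str, api_key: str) -> dict:
    """GET /api/v1/documents/<job_id> and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{base_url}/api/v1/documents/{job_id}",
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_job(fetch: Callable[[], dict],
                 interval_s: float = 2.0,
                 max_attempts: int = 60) -> dict:
    """Call `fetch` until the job leaves the in-progress statuses."""
    for _ in range(max_attempts):
        job = fetch()
        if job.get("status") not in IN_PROGRESS:
            return job
        time.sleep(interval_s)
    raise TimeoutError("job did not finish within the polling window")
```

A call might look like wait_for_job(lambda: fetch_job("http://localhost:8000", "JOB_ID_HERE", "your-api-key-here")). Passing the fetch step as a callable keeps the loop easy to test without a running server.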

Test via the Domain (if SSL is already set up)

curl https://ocr.yourdomain.com/api/v1/health

Check Services

# All services should be active
sudo systemctl status ocr-sprint-api
sudo systemctl status ocr-sprint-worker
sudo systemctl status postgresql
sudo systemctl status redis-server
sudo systemctl status nginx

Monitoring

View Logs

# API logs
sudo journalctl -u ocr-sprint-api -f

# Worker logs
sudo journalctl -u ocr-sprint-worker -f

# Nginx access logs
sudo tail -f /var/log/nginx/ocr-sprint-access.log

# Nginx error logs
sudo tail -f /var/log/nginx/ocr-sprint-error.log

Prometheus Metrics

# View metrics
curl http://localhost:8000/metrics

# Key metrics:
# - ocr_documents_total
# - ocr_processing_duration_seconds
# - ocr_confidence_score

Maintenance

Restart Services

sudo systemctl restart ocr-sprint-api
sudo systemctl restart ocr-sprint-worker

Update Application

# Switch to the ocr user
sudo su - ocr
cd /opt/ocr-sprint-service

# Pull latest code
git pull

# Activate venv
source .venv/bin/activate

# Update dependencies
pip install -e ".[ocr]"

# Run migrations
alembic upgrade head

# Exit
exit

# Restart services
sudo systemctl restart ocr-sprint-api
sudo systemctl restart ocr-sprint-worker

# Check logs
sudo journalctl -u ocr-sprint-api -n 50

Database Backup

# Create backup directory
sudo mkdir -p /opt/ocr-sprint-service/backups
sudo chown ocr:ocr /opt/ocr-sprint-service/backups

# Manual backup (the whole pipeline runs as ocr so the redirect can write to the ocr-owned directory)
sudo -u ocr sh -c 'pg_dump -h localhost -U ocr ocr_sprint | gzip > /opt/ocr-sprint-service/backups/backup_$(date +%Y%m%d_%H%M%S).sql.gz'

Set up automated backups:

# Create the backup script
sudo nano /opt/ocr-sprint-service/backup.sh

Content:

#!/bin/bash
BACKUP_DIR="/opt/ocr-sprint-service/backups"
DATE=$(date +%Y%m%d_%H%M%S)

mkdir -p "$BACKUP_DIR"

# Back up the database
PGPASSWORD='your-db-password' pg_dump -h localhost -U ocr ocr_sprint | gzip > "$BACKUP_DIR/db_$DATE.sql.gz"

# Delete backups older than 7 days
find "$BACKUP_DIR" -name "db_*.sql.gz" -mtime +7 -delete

echo "Backup completed: $DATE"

Then make the script executable and schedule it:

# Make executable (owner-only, because the script contains the database password)
sudo chown ocr:ocr /opt/ocr-sprint-service/backup.sh
sudo chmod 700 /opt/ocr-sprint-service/backup.sh

# Set up cron (daily at 2 AM)
sudo crontab -e -u ocr

# Add this line (the log goes to the backups directory, which the ocr user can write to):
0 2 * * * /opt/ocr-sprint-service/backup.sh >> /opt/ocr-sprint-service/backups/backup.log 2>&1

Troubleshooting

Service does not start

# Check detailed logs
sudo journalctl -u ocr-sprint-api -n 100 --no-pager
sudo journalctl -u ocr-sprint-worker -n 100 --no-pager

# Check file permissions
ls -la /opt/ocr-sprint-service
ls -la /opt/ocr-sprint-service/storage

# Test manual run
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000

Database connection error

# Test connection
sudo -u ocr psql -h localhost -U ocr -d ocr_sprint

# Check PostgreSQL status
sudo systemctl status postgresql

# Check PostgreSQL logs
sudo journalctl -u postgresql -n 50

Redis connection error

# Test Redis
redis-cli ping

# Check Redis status
sudo systemctl status redis-server

# Check Redis logs
sudo journalctl -u redis-server -n 50

Worker is not processing jobs

# Check Celery worker status
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate
celery -A ocr_sprint.worker.celery_app inspect active
celery -A ocr_sprint.worker.celery_app inspect stats

# Check Redis queue
redis-cli LLEN ocr_sprint

PaddleOCR error

# Re-download models
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate

python << EOF
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang='latin')
print("Models downloaded successfully")
EOF

Performance Tuning

Check CPU cores

nproc

Adjust worker concurrency

# Edit worker service
sudo nano /etc/systemd/system/ocr-sprint-worker.service

# For 4 cores: --concurrency=2
# For 8 cores: --concurrency=4
# For 16 cores: --concurrency=8

# Reload and restart
sudo systemctl daemon-reload
sudo systemctl restart ocr-sprint-worker
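
The three examples above all use half the number of CPU cores. A minimal sketch of that pattern follows; extending it to other core counts is an assumption, and the function name suggested_concurrency is hypothetical.

```python
import os


def suggested_concurrency(cores: int) -> int:
    """Half the CPU cores, but never fewer than one worker process."""
    return max(1, cores // 2)


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to a single core.
    cores = os.cpu_count() or 1
    print(f"{cores} cores -> --concurrency={suggested_concurrency(cores)}")
```

Treat the result as a starting point and confirm it under a realistic document load.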

PostgreSQL 16 Tuning

sudo nano /etc/postgresql/16/main/postgresql.conf

Recommended settings (adjust to the server's RAM and use only one of the two memory profiles):

# For 8 GB RAM:
shared_buffers = 2GB
effective_cache_size = 6GB
maintenance_work_mem = 512MB
work_mem = 8MB

# For 16 GB RAM:
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
work_mem = 10MB

# General (the random_page_cost and effective_io_concurrency values assume SSD storage)
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4

Restart PostgreSQL to apply the changes:

sudo systemctl restart postgresql
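
The two memory profiles above follow simple ratios: shared_buffers is 25% of RAM, effective_cache_size is 75%, and maintenance_work_mem is RAM divided by 16. The sketch below applies those ratios to other RAM sizes. The ratios are inferred from the two profiles and are not an official formula; the function name memory_settings is hypothetical. work_mem is left out because the two profiles (8MB and 10MB) do not share a clean ratio.

```python
def memory_settings(ram_gb: int) -> dict[str, str]:
    """Derive the RAM-dependent settings from total RAM in GB."""
    ram_mb = ram_gb * 1024
    return {
        "shared_buffers": f"{ram_mb // 4}MB",            # 25% of RAM
        "effective_cache_size": f"{ram_mb * 3 // 4}MB",  # 75% of RAM
        "maintenance_work_mem": f"{ram_mb // 16}MB",     # RAM / 16
    }


if __name__ == "__main__":
    # Print lines in postgresql.conf syntax for a 32 GB server.
    for name, value in memory_settings(32).items():
        print(f"{name} = {value}")
```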

Security Checklist

  • API keys set to strong random values
  • Database password changed from the default
  • Firewall enabled (UFW)
  • SSL/TLS enabled (if you have a domain)
  • /metrics endpoint restricted
  • PostgreSQL listens only on localhost
  • Redis listens only on localhost
  • Backups automated (cron job)
  • OS security updates enabled

Next Steps

  1. Set up monitoring - Install Prometheus + Grafana (optional)
  2. Set up alerting - Email/Slack notifications for errors
  3. Load testing - Test with production document volumes
  4. Backup verification - Test restoring from a backup
  5. Documentation - Document the API keys for the team

Support

For questions or issues, contact the development team.