# OCR Sprint Service Deployment Guide

> This document is a step-by-step guide for deploying **ocr-sprint-service** to a production server. It reflects the actual state of the codebase as of April 2026 (Phases 1–4 complete).

---

## Table of Contents

1. [Architecture Overview](#1-architecture-overview)
2. [Server Prerequisites](#2-server-prerequisites)
3. [Option A: Docker Compose (Recommended)](#3-option-a-docker-compose-recommended)
4. [Option B: Manual (Without Docker)](#4-option-b-manual-without-docker)
5. [Production Environment Configuration](#5-production-environment-configuration)
6. [Reverse Proxy & SSL (Nginx)](#6-reverse-proxy--ssl-nginx)
7. [Firewall](#7-firewall)
8. [Deployment Verification](#8-deployment-verification)
9. [Monitoring & Maintenance](#9-monitoring--maintenance)
10. [Troubleshooting](#10-troubleshooting)
11. [Security Checklist](#11-security-checklist)

---

## 1. Architecture Overview

```
┌──────────┐     ┌──────────────┐     ┌───────┐
│  Client  │────▶│ Nginx (SSL)  │────▶│  API  │──▶ PaddleOCR
└──────────┘     └──────────────┘     │ :8000 │    Pipeline
                                      └───┬───┘
                                          │ async job
                                    ┌─────▼─────┐
                                    │   Redis   │
                                    │   :6379   │
                                    └─────┬─────┘
                                    ┌─────▼──────┐
                                    │   Worker   │──▶ PaddleOCR
                                    │  (Celery)  │    Pipeline
                                    └─────┬──────┘
                                    ┌─────▼──────┐
                                    │ PostgreSQL │
                                    │   :5432    │
                                    └────────────┘
```

**4 services** must be running:

| Service | Role |
|---------|------|
| **API** (FastAPI + Uvicorn) | Accepts document uploads and serves OCR results |
| **Worker** (Celery) | Runs OCR processing asynchronously in the background |
| **Redis** | Message broker for the job queue |
| **PostgreSQL** | Stores job state and extraction results |

Blob storage uses the **local filesystem** (S3/MinIO is not supported yet).

---

## 2. Server Prerequisites

### Minimum Specifications

| Resource | Minimum | Recommended |
|----------|---------|-------------|
| OS | Ubuntu 20.04+ / Debian 11+ | Ubuntu 22.04+ |
| CPU | 4 cores | 8 cores |
| RAM | 8 GB | 16 GB |
| Storage | 50 GB free | 100 GB free |
| Python | 3.10–3.12 | 3.11 or 3.12 |
| Network | Port 8000 (internal) | + ports 80/443 (Nginx) |

### Disk Requirements

- ~1.5 GB: PaddlePaddle wheels
- ~200 MB: PaddleOCR models (downloaded automatically on first run)
- The rest: blob storage for uploaded documents

### Required Software

- **Docker Engine with the Compose plugin** (`docker compose`): for Option A
- **Python 3.10–3.12 + PostgreSQL + Redis**: for Option B
- **Git**: both options
- **Nginx** (optional): reverse proxy + SSL

---

## 3. Option A: Docker Compose (Recommended)

> The fastest route. All services (API, Worker, Redis, Postgres) run in containers.

### 3.1 Log In & Clone

```bash
ssh user@your-server.com
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
```

### 3.2 Configure .env

```bash
cp .env.example .env
nano .env
```

See [Section 5](#5-production-environment-configuration) for production configuration details.

> [!IMPORTANT]
> With Docker Compose, **do not change** `DATABASE_URL` and `REDIS_URL`. `docker-compose.yml` already overrides them through environment variables in each container.

### 3.3 Build & Start

```bash
# Build the images (~5–10 minutes the first time)
docker compose build

# Start all services
docker compose up -d

# Follow the logs
docker compose logs -f api worker
```

The `api` container automatically runs `alembic upgrade head` before starting the server (see `command` in `docker-compose.yml`).

### 3.4 First-Run Model Download

The first request triggers the download of the PaddleOCR models (~200 MB) into the `paddle-models` Docker volume. Wait until the download finishes before testing.
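You can also fetch the models ahead of time so the first real request does not have to wait for the download. The following is a minimal sketch and is not part of the repository. It makes these assumptions:

- The Compose service is called `api`, as in the commands in this guide.
- The image includes the `paddleocr` package.
- `PaddleOCR(lang='latin')`, the call from the Troubleshooting table, matches the language the service actually uses.

Whether the files land in the `paddle-models` volume depends on how that volume is mounted. `retry` and `prewarm_models` are hypothetical helper names. `retry` covers the case where the container is still starting.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): pre-download PaddleOCR models.
# Source this file from the directory containing docker-compose.yml.

# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or attempts run out.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt ${i}/${attempts} failed: $*" >&2
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# prewarm_models [SERVICE]: instantiate PaddleOCR inside the container so the
# model files are downloaded before real traffic arrives.
# Assumption: lang='latin' matches the service's OCR language setting.
prewarm_models() {
  local service="${1:-api}"
  retry 5 10 docker compose exec -T "$service" \
    python -c "from paddleocr import PaddleOCR; PaddleOCR(lang='latin')"
}
```

Usage: `source prewarm.sh && prewarm_models api` (the file name is arbitrary). If the worker container runs OCR with its own model cache, repeat the call with `worker`. The `-T` flag disables TTY allocation so the command also works from scripts and cron.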
```bash
# Monitor the download in the logs
docker compose logs -f api
```

### 3.5 Verify

```bash
curl http://localhost:8000/api/v1/health
# Expected: {"status":"ok","version":"0.1.0"}
```

### 3.6 Updating the Service (After Code Changes)

```bash
cd ocr-sprint-service
git pull
docker compose build
docker compose up -d
```

---

## 4. Option B: Manual (Without Docker)

> For servers that already have Python, PostgreSQL, and Redis installed.

### 4.1 Install System Libraries

```bash
sudo apt update && sudo apt upgrade -y

# Libraries required by OpenCV & PaddleOCR
sudo apt install -y \
  python3.11 python3.11-venv python3.11-dev \
  libgl1 libglib2.0-0 libsm6 libxext6 libxrender1 \
  libgomp1 libmagic1 \
  build-essential git curl

# Install Redis & PostgreSQL (if not already present)
sudo apt install -y redis-server postgresql postgresql-contrib
sudo systemctl enable --now redis-server postgresql
```

> [!NOTE]
> If the server already has Python 3.12, use `python3.12` in all subsequent commands.

### 4.2 Set Up the Database

```bash
sudo -u postgres psql
```

```sql
CREATE USER ocr WITH PASSWORD 'change-me-strong-password';
CREATE DATABASE ocr_sprint OWNER ocr;
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
\c ocr_sprint
GRANT ALL ON SCHEMA public TO ocr;
\q
```

### 4.3 Create the Application User & Directory

```bash
sudo useradd -m -s /bin/bash ocr
sudo mkdir -p /opt/ocr-sprint-service
sudo chown ocr:ocr /opt/ocr-sprint-service
```

### 4.4 Clone & Install

```bash
sudo su - ocr

# Clone into the (empty) directory created in step 4.3
git clone https://github.com/Adriankf59/ocr-sprint-service.git /opt/ocr-sprint-service
cd /opt/ocr-sprint-service

# Create the virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install dependencies + OCR runtime (~1.5 GB download)
pip install --upgrade pip setuptools wheel
pip install -e ".[ocr]"

# Verify
python -c "import paddleocr; print('PaddleOCR OK')"
python -c "import fastapi; print('FastAPI OK')"
```

### 4.5 Configure .env

```bash
cp .env.example .env
nano .env
```

**Must be changed for a manual deployment.** The password in `DATABASE_URL` must match the one set in step 4.2:

```bash
APP_ENV=prod
DATABASE_URL=postgresql+psycopg://ocr:change-me-strong-password@localhost:5432/ocr_sprint
REDIS_URL=redis://localhost:6379/0
QUEUE_ENABLED=true
API_KEYS=your-generated-api-key
STORAGE_LOCAL_DIR=/opt/ocr-sprint-service/storage
BLOB_STORAGE_DIR=/opt/ocr-sprint-service/storage/blobs
```

```bash
# Create the storage directories
mkdir -p /opt/ocr-sprint-service/storage/blobs
```

### 4.6 Run Database Migrations

```bash
source .venv/bin/activate
alembic upgrade head
alembic current  # verify
```

### 4.7 Manual Test

```bash
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000

# In another terminal:
curl http://localhost:8000/api/v1/health

# Press Ctrl+C to stop
```

### 4.8 Set Up systemd Services

The `ocr` user has no sudo rights. Create the unit files below from your regular sudo-capable account, not from the `ocr` shell. If you are still logged in as `ocr` from step 4.4, type `exit` first.

**API service**: `/etc/systemd/system/ocr-sprint-api.service`

```ini
[Unit]
Description=OCR Sprint API Service
After=network.target postgresql.service redis-server.service

[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn \
    ocr_sprint.main:app \
    --host 0.0.0.0 --port 8000 --workers 4 --log-level info
Restart=always
RestartSec=10
LimitNOFILE=65536
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target
```

**Worker service**: `/etc/systemd/system/ocr-sprint-worker.service`

```ini
[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis-server.service

[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery \
    -A ocr_sprint.worker.celery_app worker \
    --loglevel=info --concurrency=2 --max-tasks-per-child=100
Restart=always
RestartSec=10
LimitNOFILE=65536
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target
```

**Enable & start:**

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now ocr-sprint-api ocr-sprint-worker
sudo systemctl status ocr-sprint-api ocr-sprint-worker
```

### 4.9 Updating the Service (Manual)

```bash
sudo su - ocr
cd /opt/ocr-sprint-service
git pull
source .venv/bin/activate
pip install -e ".[ocr]"
alembic upgrade head
exit

sudo systemctl restart ocr-sprint-api ocr-sprint-worker
```

---

## 5. Production Environment Configuration

The following `.env` variables need attention for production. Compare the Default and Production columns:

| Variable | Default | Production | Notes |
|----------|---------|------------|-------|
| `APP_ENV` | `local` | `prod` | Environment mode |
| `API_KEYS` | *(empty)* | `key1,key2` | **REQUIRED!** Authentication is disabled when empty |
| `QUEUE_ENABLED` | `false` | `true` | Enables async processing |
| `DATABASE_URL` | `sqlite:///...` | `postgresql+psycopg://...` | Docker: overridden automatically |
| `REDIS_URL` | `redis://localhost:6379/0` | Adjust as needed | Docker: overridden automatically |
| `OCR_USE_GPU` | `false` | `true` if a GPU is available | GPU mode requires the NVIDIA driver |
| `TABLES_ENABLED` | `true` | `true` | Personnel table extraction |

**Generate an API key:**

```bash
openssl rand -hex 32
```

> [!WARNING]
> Never deploy to production without setting `API_KEYS`. If it is empty, every endpoint is open without authentication.

---

## 6. Reverse Proxy & SSL (Nginx)

### Install

```bash
sudo apt install -y nginx certbot python3-certbot-nginx
```

### Configuration: `/etc/nginx/sites-available/ocr-sprint`

```nginx
upstream ocr_api {
    server 127.0.0.1:8000;
    keepalive 32;
}

server {
    listen 80;
    server_name ocr.yourdomain.com;

    client_max_body_size 30M;
    # The connect timeout usually cannot exceed 75s, so larger values have no effect
    proxy_connect_timeout 60s;
    # Allow slow responses (e.g. sync OCR requests)
    proxy_read_timeout 300s;

    location / {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        # Required for upstream keepalive: clear the "Connection" header
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    location /metrics {
        allow 127.0.0.1;
        allow 10.0.0.0/8;
        deny all;
        proxy_pass http://ocr_api;
    }
}
```

### Enable & SSL

```bash
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

# SSL (Let's Encrypt)
sudo certbot --nginx -d ocr.yourdomain.com
```

---

## 7. Firewall

```bash
sudo ufw allow 22/tcp    # SSH (IMPORTANT!)
sudo ufw allow 80/tcp    # HTTP
sudo ufw allow 443/tcp   # HTTPS
sudo ufw enable
sudo ufw status
```

> [!CAUTION]
> Make sure SSH (port 22) is allowed **before** enabling the firewall so you do not lock yourself out of the server.

> [!NOTE]
> For Option A, Docker manages its own iptables rules, so ports published by Docker bypass ufw. If `docker-compose.yml` publishes ports such as 8000, 5432, or 6379, bind them to localhost (for example `127.0.0.1:8000:8000`) so they are not reachable from outside.

---

## 8. Deployment Verification

### Health Check

```bash
curl http://localhost:8000/api/v1/health
# {"status":"ok","version":"0.1.0"}
```

### Test OCR (Sync)

```bash
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
  -H "X-API-Key: your-api-key" \
  -F "file=@/path/to/test.pdf" | jq
```

### Test OCR (Async, Production Flow)

```bash
# Submit a job
curl -X POST http://localhost:8000/api/v1/documents \
  -H "X-API-Key: your-api-key" \
  -F "file=@document.pdf" | jq
# → {"job_id":"8f2a...","status":"pending",...}

# Poll for the result
curl -H "X-API-Key: your-api-key" \
  http://localhost:8000/api/v1/documents/8f2a... | jq
# → {"status":"completed","confidence":0.93,"data":{...}}
```

### Check That All Services Are Running

```bash
# Docker
docker compose ps

# Manual
sudo systemctl status ocr-sprint-api ocr-sprint-worker postgresql redis-server nginx
```

---

## 9. Monitoring & Maintenance

### Logs

```bash
# Docker
docker compose logs -f api worker

# Manual (systemd)
sudo journalctl -u ocr-sprint-api -f
sudo journalctl -u ocr-sprint-worker -f
```

### Prometheus Metrics

```bash
curl http://localhost:8000/metrics
```

Key metrics: `ocr_documents_total`, `ocr_processing_duration_seconds`, `ocr_confidence_score`.

### Database Backup

```bash
# Docker (-T disables TTY allocation so the redirected dump is not corrupted)
docker compose exec -T postgres pg_dump -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql

# Manual
pg_dump -U ocr -h localhost ocr_sprint | gzip > backup_$(date +%Y%m%d).sql.gz
```

### Automated Backup (Cron)

This setup applies to the manual deployment. Cron cannot answer a password prompt, so give the `ocr` user a `~/.pgpass` file. It should contain `localhost:5432:ocr_sprint:ocr:<password>`. Restrict it with `chmod 600 ~/.pgpass`.

```bash
#!/bin/bash
# /opt/ocr-sprint-service/backup.sh
# pipefail: report a failure when pg_dump fails, not only when gzip does
set -euo pipefail

BACKUP_DIR="/opt/ocr-sprint-service/backups"
mkdir -p "$BACKUP_DIR"
pg_dump -U ocr -h localhost ocr_sprint | gzip > "$BACKUP_DIR/db_$(date +%Y%m%d_%H%M%S).sql.gz"
# Delete backups older than 7 days
find "$BACKUP_DIR" -name "db_*.sql.gz" -mtime +7 -delete
```

```bash
chmod +x /opt/ocr-sprint-service/backup.sh
# The ocr user cannot write to /var/log by default, so create the log file first
sudo touch /var/log/ocr-backup.log
sudo chown ocr:ocr /var/log/ocr-backup.log

# Cron: daily at 2 AM (note: this replaces the ocr user's existing crontab)
echo "0 2 * * * /opt/ocr-sprint-service/backup.sh >> /var/log/ocr-backup.log 2>&1" | sudo crontab -u ocr -
```

---

## 10. Troubleshooting

| Problem | Diagnosis | Fix |
|---------|-----------|-----|
| Service does not start | `journalctl -u ocr-sprint-api -n 100` | Check file permissions, `.env`, and the error log |
| PaddleOCR model download fails | Timeouts in the logs | Retry the download manually: `python -c "from paddleocr import PaddleOCR; PaddleOCR(lang='latin')"` |
| Worker does not process jobs | `redis-cli ping` does not return `PONG` | Make sure Redis is running and check `REDIS_URL` |
| Database migration error | `alembic current` | Inspect the failing revision with `alembic history`. Use `alembic stamp head` only if the schema already matches the latest revision: it marks the database as current without running any migration |
| Port 8000 already in use | `ss -tlnp \| grep 8000` | Kill the old process, or change the port (`--port` in the systemd unit or the port mapping in `docker-compose.yml`, plus the Nginx upstream) |
| Out of memory | OOM killer entries in the logs | Lower the worker's `--concurrency`, or add RAM |

---

## 11. Security Checklist

- [ ] `API_KEYS` set to a random key (`openssl rand -hex 32`)
- [ ] Database password changed from the default
- [ ] Firewall active (only ports 22, 80, 443 open)
- [ ] SSL/TLS enabled via Nginx + Let's Encrypt
- [ ] `/metrics` endpoint restricted to the internal network
- [ ] Automated database backups via cron
- [ ] OS security updates enabled (`unattended-upgrades`)
- [ ] `APP_ENV=prod` (not `local`)

---

## Quick Reference: Everyday Commands

```bash
# === Docker ===
docker compose up -d                              # Start
docker compose down                               # Stop
docker compose logs -f api                        # Logs
docker compose build && docker compose up -d      # Update

# === Manual ===
sudo systemctl restart ocr-sprint-api ocr-sprint-worker   # Restart
sudo journalctl -u ocr-sprint-api -f                      # Logs
curl http://localhost:8000/api/v1/health                  # Health check
```
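The async flow from Section 8 can be wrapped in a small script so the same smoke test runs after every deployment. This is a sketch under stated assumptions and is not part of the repository. The function names and the `OCR_BASE_URL` and `OCR_API_KEY` variables are hypothetical. The script also assumes:

- The submit response contains `job_id` and the poll response contains `status`.
- A job ends in either `completed` (as in Section 8) or `failed`. The `failed` value is an assumption, so check the API's actual status values.

Like the examples above, it relies on `curl` and `jq`.

```bash
#!/usr/bin/env bash
# Hypothetical smoke-test helpers for the async OCR flow (not in the repo).
OCR_BASE_URL="${OCR_BASE_URL:-http://localhost:8000}"

# Returns 0 when the job status will no longer change.
# Assumption: "failed" is the error status; only "completed" is documented.
is_terminal_status() {
  case "$1" in
    completed|failed) return 0 ;;
    *) return 1 ;;
  esac
}

# Uploads a file and prints the job_id from the API response.
submit_document() {
  local file="$1" body
  body="$(curl -fsS -X POST "${OCR_BASE_URL}/api/v1/documents" \
    -H "X-API-Key: ${OCR_API_KEY:?set OCR_API_KEY}" \
    -F "file=@${file}")" || return 1
  printf '%s\n' "$body" | jq -r '.job_id'
}

# wait_for_job JOB_ID [TIMEOUT] [INTERVAL]: polls until a terminal status,
# prints the final JSON, and exits 0 only if the job completed.
wait_for_job() {
  local job_id="$1" timeout="${2:-300}" interval="${3:-5}"
  local waited=0 body status
  while [ "$waited" -lt "$timeout" ]; do
    body="$(curl -fsS -H "X-API-Key: ${OCR_API_KEY:?set OCR_API_KEY}" \
      "${OCR_BASE_URL}/api/v1/documents/${job_id}")" || return 1
    status="$(printf '%s\n' "$body" | jq -r '.status')"
    if is_terminal_status "$status"; then
      printf '%s\n' "$body"
      [ "$status" = "completed" ]
      return
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out after ${timeout}s waiting for job ${job_id}" >&2
  return 1
}
```

Example: run `export OCR_API_KEY=your-api-key`, then `source smoke_test.sh`, then `job_id="$(submit_document document.pdf)"`, and finally `wait_for_job "$job_id" 300 5 | jq`. Because `wait_for_job` returns non-zero on a failed job or a timeout, it can gate a deployment script with `&&`.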