# OCR Sprint Service Deployment Quickstart

A guide to deploying OCR Sprint Service on a production server for processing Polri sprint letter (surat sprint) documents.

## Server Prerequisites

### Minimum Specifications

- **OS**: Linux (Ubuntu 20.04+ / Debian 11+ / RHEL 8+)
- **CPU**: 4 cores (8 cores recommended for high throughput)
- **RAM**: 8 GB minimum (16 GB recommended)
- **Storage**: 50 GB free space
  - ~3 GB for the PaddleOCR models
  - ~1.5 GB for Python dependencies
  - The remainder for document blob storage
- **Network**: Port 8000 open for API access

### Software Requirements

- Docker 24.0+ and Docker Compose v2
- Git
- (Optional) Nginx/Caddy for reverse proxy + SSL

## Deployment with Docker Compose (Recommended)

### 1. Clone Repository

```bash
# Log in to the server as a non-root user with sudo access
ssh user@your-server.com

# Clone the repository
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
```

### 2. Configure Environment

```bash
# Copy the environment template
cp .env.example .env

# Edit the production configuration
nano .env
```

**Important settings for production:**

```bash
# ==== App ====
APP_ENV=prod
APP_LOG_LEVEL=INFO

# ==== Storage ====
STORAGE_LOCAL_DIR=/app/storage
BLOB_STORAGE_DIR=/app/storage/blobs
BLOB_MAX_UPLOAD_MB=25

# ==== OCR ====
OCR_LANG=latin
# Set to true if the server has an NVIDIA GPU
OCR_USE_GPU=false
OCR_MAX_IMAGE_SIDE=2200

# ==== Preprocessing ====
PREPROCESS_TARGET_DPI=300
PREPROCESS_DENOISE=true
PREPROCESS_DESKEW=true
PREPROCESS_DETECT_DOCUMENT=true
PREPROCESS_REMOVE_SHADOW=true

# ==== Table Extraction ====
TABLES_ENABLED=true

# ==== Async Pipeline ====
QUEUE_ENABLED=true
REDIS_URL=redis://redis:6379/0
CELERY_TASK_DEFAULT_QUEUE=ocr_sprint

# ==== Database ====
DATABASE_URL=postgresql+psycopg://ocr:ocr@postgres:5432/ocr_sprint
DATABASE_ECHO=false

# ==== Auth (REQUIRED for production!) ====
API_KEYS=your-secret-key-1,your-secret-key-2
API_KEY_HEADER=X-API-Key
```

**Generate secure API keys:**

```bash
# Generate a random API key
openssl rand -hex 32
```

### 3. Build and Start Services

```bash
# Build the Docker images
docker compose build

# Start all services (API, Worker, Redis, Postgres)
docker compose up -d

# Check the logs to confirm everything is running
docker compose logs -f api worker
```

**Running services:**

- `api`: FastAPI server on port 8000
- `worker`: Celery worker for async processing
- `redis`: Message broker for the job queue
- `postgres`: Database for job state

### 4. Verify Deployment

```bash
# Health check
curl http://localhost:8000/api/v1/health

# Expected response:
# {"status":"ok","version":"0.1.0"}

# Test the OCR endpoint (sync mode for testing)
# The URL is quoted so the shell does not treat "?" as a glob character
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
  -H "X-API-Key: your-secret-key-1" \
  -F "file=@samples/pdf/example.pdf" \
  | jq
```

### 5. Set Up Reverse Proxy (Nginx)

**Install Nginx:**

```bash
sudo apt update
sudo apt install nginx certbot python3-certbot-nginx
```

**Nginx configuration (`/etc/nginx/sites-available/ocr-sprint`):**

```nginx
upstream ocr_api {
    server localhost:8000;
}

server {
    listen 80;
    server_name ocr.yourdomain.com;

    client_max_body_size 30M;  # Keep in line with BLOB_MAX_UPLOAD_MB

    location / {
        proxy_pass http://ocr_api;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts for large documents
        proxy_read_timeout 300s;
        proxy_connect_timeout 75s;
    }

    location /metrics {
        # Restrict the metrics endpoint
        allow 10.0.0.0/8;  # Internal network only
        deny all;
        proxy_pass http://ocr_api;
    }
}
```

**Enable the site and set up SSL:**

```bash
# Enable the site
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

# Set up SSL with Let's Encrypt
sudo certbot --nginx -d ocr.yourdomain.com
```
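Two of the settings above are easy to get wrong before going live: `API_KEYS` left at its placeholder value, and `BLOB_MAX_UPLOAD_MB` exceeding the Nginx `client_max_body_size`. The following is a minimal sketch of a pre-deploy check, not part of the service itself. The function names `parse_env` and `check_env`, the 32-character minimum key length, and the default limit of 30 MB (taken from the Nginx example) are assumptions made for illustration.

```python
import re

# Placeholder prefix used in the example .env of this guide (assumption:
# any key starting with it has not been replaced with a real secret).
PLACEHOLDER_PREFIX = "your-secret-key"


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and comment lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "false  # note".
        value = re.split(r"\s+#", value, maxsplit=1)[0]
        values[key.strip()] = value.strip().strip("\"'")
    return values


def check_env(env: dict[str, str], nginx_body_limit_mb: int = 30) -> list[str]:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems: list[str] = []

    keys = [k.strip() for k in env.get("API_KEYS", "").split(",") if k.strip()]
    if not keys:
        problems.append("API_KEYS is empty")
    elif any(k.startswith(PLACEHOLDER_PREFIX) for k in keys):
        problems.append("API_KEYS still contains a placeholder value")
    elif any(len(k) < 32 for k in keys):
        # Illustrative threshold: `openssl rand -hex 32` yields 64 characters.
        problems.append("API_KEYS contains a key shorter than 32 characters")

    upload = env.get("BLOB_MAX_UPLOAD_MB", "")
    if not upload.isdigit():
        problems.append("BLOB_MAX_UPLOAD_MB is missing or not an integer")
    elif int(upload) > nginx_body_limit_mb:
        problems.append(
            f"BLOB_MAX_UPLOAD_MB ({upload}) exceeds the proxy limit "
            f"({nginx_body_limit_mb} MB)"
        )
    return problems
```

To use it, read the file with `pathlib.Path(".env").read_text()`, pass the result through `parse_env`, and print whatever `check_env` returns. The parser handles only plain `KEY=VALUE` lines, which covers the example configuration in this guide but not every `.env` syntax.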
## Manual Deployment (Without Docker)

### 1. Install System Dependencies

```bash
# Ubuntu/Debian
# python3.11 and postgresql-14 may not be in the default repositories of every release
sudo apt update
sudo apt install -y \
  python3.11 python3.11-venv python3-pip \
  libgl1 libglib2.0-0 libsm6 libxext6 libxrender1 \
  libgomp1 libmagic1 \
  redis-server postgresql-14

# Start services
sudo systemctl enable --now redis-server postgresql
```

### 2. Set Up Database

```bash
# Create the database and user
sudo -u postgres psql << EOF
CREATE USER ocr WITH PASSWORD 'your-secure-password';
CREATE DATABASE ocr_sprint OWNER ocr;
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
EOF
```

### 3. Install Application

```bash
# Clone the repository
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service

# Create a virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -e ".[ocr]"

# Copy and edit .env
cp .env.example .env
nano .env
```

**Update DATABASE_URL in .env:**

```bash
DATABASE_URL=postgresql+psycopg://ocr:your-secure-password@localhost:5432/ocr_sprint
REDIS_URL=redis://localhost:6379/0
QUEUE_ENABLED=true
```

### 4. Run Database Migrations

```bash
alembic upgrade head
```

### 5. Set Up Systemd Services

The unit files below assume the repository is installed at `/opt/ocr-sprint-service` and that a system user named `ocr` exists. Adjust `WorkingDirectory`, `ExecStart`, and `User` if your setup differs.

**API Service (`/etc/systemd/system/ocr-sprint-api.service`):**

```ini
[Unit]
Description=OCR Sprint API
After=network.target postgresql.service redis.service

[Service]
Type=simple
User=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000 --workers 4
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

**Worker Service (`/etc/systemd/system/ocr-sprint-worker.service`):**

```ini
[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis.service

[Service]
Type=simple
User=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery -A ocr_sprint.worker.celery_app worker -l info --concurrency=2
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

**Enable and start the services:**

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now ocr-sprint-api ocr-sprint-worker
sudo systemctl status ocr-sprint-api ocr-sprint-worker
```

## Monitoring and Maintenance

### Monitoring Logs

```bash
# Docker deployment
docker compose logs -f api worker

# Manual deployment
sudo journalctl -u ocr-sprint-api -f
sudo journalctl -u ocr-sprint-worker -f
```

### Prometheus Metrics

Metrics are available at the `/metrics` endpoint:

```bash
curl http://localhost:8000/metrics
```

**Key metrics:**

- `ocr_documents_total`: Total documents processed
- `ocr_processing_duration_seconds`: Processing duration
- `ocr_confidence_score`: Confidence score distribution
- `celery_task_*`: Celery worker metrics

### Backup Database

```bash
# Docker deployment (-T disables the pseudo-TTY so the redirected dump stays clean)
docker compose exec -T postgres pg_dump -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql

# Manual deployment (-h localhost connects over TCP with password authentication)
pg_dump -h localhost -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql
```

### Update Service

```bash
# Docker deployment
cd ocr-sprint-service
git pull
docker compose build
docker compose up -d

# Manual deployment
cd ocr-sprint-service
git pull
source .venv/bin/activate
pip install -e ".[ocr]"
alembic upgrade head
sudo systemctl restart ocr-sprint-api ocr-sprint-worker
```

## Troubleshooting

### Service does not start

```bash
# Check the logs
docker compose logs api worker

# Run the health check
curl http://localhost:8000/api/v1/health
```

### PaddleOCR model download fails

```bash
# Download the models manually into the volume
docker compose exec api python -c "from paddleocr import PaddleOCR; PaddleOCR(use_angle_cls=True, lang='latin')"
```

### Worker does not process jobs

```bash
# Check the Redis connection from the worker container
docker compose exec worker redis-cli -h redis ping

# If the worker image has no redis-cli, ping Redis from its own container
docker compose exec redis redis-cli ping

# Check the Celery worker status
docker compose exec worker celery -A ocr_sprint.worker.celery_app inspect active
```

### Database migration error

```bash
# Check the current revision
docker compose exec api alembic current

# Force the upgrade
docker compose exec api alembic upgrade head
```

### Out of memory

```bash
# Reduce the worker concurrency in docker-compose.yml
# Change: --concurrency=1 (default), or add a memory limit
```

## Security Checklist

- [ ] API_KEYS set to strong random values
- [ ] Firewall configured (only ports 80/443 open)
- [ ] SSL/TLS enabled via Nginx + Let's Encrypt
- [ ] Database password changed from the default
- [ ] `/metrics` endpoint restricted to the internal network
- [ ] Regular backups of the database and blob storage
- [ ] Log rotation configured
- [ ] OS security updates enabled

## Performance Tuning

### For high throughput:

1. **Increase worker concurrency:**

   ```yaml
   # docker-compose.yml
   command: ["celery", "-A", "ocr_sprint.worker.celery_app", "worker", "-l", "info", "--concurrency=4"]
   ```

2. **Scale workers horizontally:**

   ```bash
   docker compose up -d --scale worker=3
   ```

3. **Enable GPU (if available):**

   ```bash
   # .env
   OCR_USE_GPU=true
   ```

4. **Tune Postgres:**

   ```sql
   -- Raise the connection limit and shared buffer size
   -- Both settings take effect only after a PostgreSQL restart
   ALTER SYSTEM SET max_connections = 200;
   ALTER SYSTEM SET shared_buffers = '2GB';
   ```

## Support

For questions or issues, contact the development team or open an issue in the repository.