# Deploying OCR Sprint Service (Existing Stack)

Deployment guide for a server that already has Python 3.12.3, PostgreSQL 16.13, and Redis 7.0.15 installed.

## Your Server Information

- **OS**: Ubuntu 24.04
- **Python**: 3.12.3 ✅
- **PostgreSQL**: 16.13 ✅
- **Redis**: 7.0.15 ✅

All of these versions are compatible with OCR Sprint Service.

## Step 1: Install System Libraries for OpenCV & PaddleOCR

```bash
# Update package list
sudo apt update

# Install the libraries required by OpenCV and PaddleOCR
sudo apt install -y \
    libgl1 \
    libglib2.0-0 \
    libsm6 \
    libxext6 \
    libxrender1 \
    libgomp1 \
    libmagic1 \
    python3.12-venv \
    python3.12-dev \
    build-essential \
    git
```

## Step 2: Set Up the PostgreSQL Database

```bash
# Log in to PostgreSQL
sudo -u postgres psql
```

Run the following SQL commands:

```sql
-- Create user and database
-- Replace 'your-password-here' with a strong password (see the openssl command below)
CREATE USER ocr WITH PASSWORD 'your-password-here';
CREATE DATABASE ocr_sprint OWNER ocr;

-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;

-- Connect to the database to grant schema privileges
\c ocr_sprint

-- Grant schema privileges (PostgreSQL 15+)
GRANT ALL ON SCHEMA public TO ocr;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO ocr;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO ocr;

-- Verify
\l ocr_sprint
\du ocr

-- Exit
\q
```

**Generate a secure password:**

```bash
# Generate a random password
openssl rand -base64 32
# Example output:
# +J33GdYQcWcfqXs169cmgPrQJpLFgybjoedr/tNb0d4=
```

Save this password; it is needed again for `DATABASE_URL` in the application configuration.
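A base64 password can contain `+`, `/`, and `=`, and a hand-picked one may contain `@`. These characters have special meaning inside a URL, so they normally need to be percent-encoded before the password is placed in a `postgresql+psycopg://...` connection string. The sketch below is one way to do that from the shell. The `urlencode` function is a hypothetical helper, not part of the project; it assumes `python3` is on the `PATH`, and the password shown is made up.

```bash
# Hypothetical helper (not part of the project): percent-encode a string
# so it can be embedded in the password part of a database URL.
urlencode() {
    python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$1"
}

# Example with a made-up password containing +, /, @ and =
urlencode 's3cr+t/p@ss='
# prints: s3cr%2Bt%2Fp%40ss%3D
```

Use the raw password in `CREATE USER` and the encoded form only inside the connection URL. Generating the password with `openssl rand -hex 32` instead avoids the issue, since hex output has no special characters.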
## Step 3: Verify Redis

```bash
# Check Redis status
sudo systemctl status redis-server

# Test connection
redis-cli ping
# Expected output: PONG

# Check Redis config (optional)
redis-cli CONFIG GET maxmemory
```

If Redis is not running yet:

```bash
sudo systemctl enable redis-server
sudo systemctl start redis-server
```

## Step 4: Create the Application User

```bash
# Create a dedicated user for the application
sudo useradd -m -s /bin/bash ocr

# Create the application directory
sudo mkdir -p /opt/ocr-sprint-service
sudo chown ocr:ocr /opt/ocr-sprint-service
```

## Step 5: Clone and Install the Application

```bash
# Switch to user ocr
sudo su - ocr

# Clone the repository into the (empty) directory created in Step 4
git clone https://github.com/Adriankf59/ocr-sprint-service.git /opt/ocr-sprint-service
cd /opt/ocr-sprint-service

# Create a virtual environment with Python 3.12
python3.12 -m venv .venv

# Activate the virtual environment
source .venv/bin/activate

# Verify the Python version inside the venv
python --version
# Expected: Python 3.12.3

# Upgrade pip
pip install --upgrade pip setuptools wheel

# Install the application with its OCR dependencies
# This downloads ~1.5GB of PaddlePaddle wheels
pip install -e ".[ocr]"

# Verify the installation
python -c "import paddleocr; print('PaddleOCR OK')"
python -c "import cv2; print('OpenCV OK')"
python -c "import fastapi; print('FastAPI OK')"
```

## Step 6: Configure the Application

```bash
# Still as user ocr
cd /opt/ocr-sprint-service

# Copy the environment template
cp .env.example .env

# Edit the configuration
nano .env
```

**Configuration in `/opt/ocr-sprint-service/.env`:**

```bash
# ==== App ====
APP_ENV=prod
APP_HOST=0.0.0.0
APP_PORT=8000
APP_LOG_LEVEL=INFO

# ==== Storage ====
STORAGE_LOCAL_DIR=/opt/ocr-sprint-service/storage
BLOB_STORAGE_DIR=/opt/ocr-sprint-service/storage/blobs
BLOB_MAX_UPLOAD_MB=25

# ==== OCR ====
OCR_LANG=latin
OCR_USE_GPU=false
OCR_MAX_IMAGE_SIDE=2200

# ==== Preprocessing ====
PREPROCESS_TARGET_DPI=300
PREPROCESS_DENOISE=true
PREPROCESS_DESKEW=true
PREPROCESS_DETECT_DOCUMENT=true
PREPROCESS_REMOVE_SHADOW=true
PREPROCESS_MIN_QUAD_AREA_FRACTION=0.20

# ==== Table Extraction ====
TABLES_ENABLED=true

# ==== Confidence ====
CONFIDENCE_AUTO_APPROVE=0.95
CONFIDENCE_NEEDS_REVIEW=0.85

# ==== LLM (Phase 5, optional - disabled for now) ====
LLM_ENABLED=false

# ==== Async Pipeline ====
QUEUE_ENABLED=true
REDIS_URL=redis://localhost:6379/0
CELERY_TASK_DEFAULT_QUEUE=ocr_sprint

# ==== Database ====
# Replace 'your-password-here' with the password you generated in Step 2.
# Percent-encode any special characters in it (for example @, /, +, =).
DATABASE_URL=postgresql+psycopg://ocr:your-password-here@localhost:5432/ocr_sprint
DATABASE_ECHO=false

# ==== Auth (REQUIRED for production!) ====
# Generate with: openssl rand -hex 32
API_KEYS=paste-api-key-1-here,paste-api-key-2-here
API_KEY_HEADER=X-API-Key
```

**Generate API keys:**

```bash
# Generate 2 API keys
echo "API Key 1: $(openssl rand -hex 32)"
echo "API Key 2: $(openssl rand -hex 32)"
```

Copy the output and paste it into `API_KEYS` in the `.env` file.

**Create storage directories:**

```bash
mkdir -p /opt/ocr-sprint-service/storage/blobs
chmod 755 /opt/ocr-sprint-service/storage
```

## Step 7: Run Database Migrations

```bash
# Still as user ocr, with the venv activated
cd /opt/ocr-sprint-service
source .venv/bin/activate

# Run migrations
alembic upgrade head

# Verify - should show the current revision
alembic current
# Expected output: (head) or a revision number
```

## Step 8: Test a Manual Run

```bash
# Still as user ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate

# Test the API server
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
```

**In another terminal (as user ubuntu):**

```bash
# Test the health check
curl http://localhost:8000/api/v1/health
# Expected: {"status":"ok","version":"0.1.0"}

# Test with a sample file (if you have one)
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
    -H "X-API-Key: your-api-key-here" \
    -F "file=@/path/to/test.pdf"
```

If this succeeds, stop the server with `Ctrl+C`.
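If the health check is issued from a script rather than typed by hand, the first request may arrive before the server has finished starting. A small retry wrapper is one way to handle that. This is a minimal sketch; `retry` is a hypothetical helper written for this guide, not something shipped with the project, and it requires bash.

```bash
# Hypothetical helper: run a command up to N times, pausing between attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    local attempts="$1" delay="$2"
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        # Stop as soon as the command succeeds
        if "$@"; then
            return 0
        fi
        # Pause before the next attempt, but not after the last one
        if (( i < attempts )); then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, `retry 30 1 curl -fsS http://localhost:8000/api/v1/health` polls the health endpoint once per second for up to 30 attempts and returns a non-zero status if the endpoint never answers.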
## Step 9: Set Up Systemd Services

```bash
# Exit from user ocr
exit

# You are now back as user ubuntu, with sudo
```

### Create API Service

```bash
sudo nano /etc/systemd/system/ocr-sprint-api.service
```

**Content:**

```ini
[Unit]
Description=OCR Sprint API Service
After=network.target postgresql.service redis-server.service
Wants=postgresql.service redis-server.service
# Disable start rate limiting so the service keeps being restarted
StartLimitIntervalSec=0

[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service

# Environment
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env

# Start command - 4 workers for production
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn \
    ocr_sprint.main:app \
    --host 0.0.0.0 \
    --port 8000 \
    --workers 4 \
    --log-level info

# Restart policy
Restart=always
RestartSec=10

# Resource limits
LimitNOFILE=65536

# Security
NoNewPrivileges=true
PrivateTmp=true

[Install]
WantedBy=multi-user.target
```

### Create Celery Worker Service

```bash
sudo nano /etc/systemd/system/ocr-sprint-worker.service
```

**Content:**

```ini
[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis-server.service ocr-sprint-api.service
Wants=postgresql.service redis-server.service
# Disable start rate limiting so the service keeps being restarted
StartLimitIntervalSec=0

[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service

# Environment
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env

# Start command - concurrency 2 for a CPU with 4 cores
# Adjust to the number of CPU cores on your server
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery \
    -A ocr_sprint.worker.celery_app \
    worker \
    --loglevel=info \
    --concurrency=2 \
    --max-tasks-per-child=100

# Restart policy
Restart=always
RestartSec=10

# Resource limits
LimitNOFILE=65536

# Security
NoNewPrivileges=true
PrivateTmp=true

[Install]
WantedBy=multi-user.target
```

### Enable and Start Services

```bash
# Reload systemd
sudo systemctl daemon-reload

# Enable services (auto-start on boot)
sudo systemctl enable ocr-sprint-api
sudo systemctl enable ocr-sprint-worker

# Start services
sudo systemctl start ocr-sprint-api
sudo systemctl start ocr-sprint-worker

# Check status
sudo systemctl status ocr-sprint-api
sudo systemctl status ocr-sprint-worker
```

**Expected output:** `active (running)`, shown in green.

### View Logs

```bash
# API logs (real-time)
sudo journalctl -u ocr-sprint-api -f

# Worker logs (real-time)
sudo journalctl -u ocr-sprint-worker -f

# Last 50 lines
sudo journalctl -u ocr-sprint-api -n 50
sudo journalctl -u ocr-sprint-worker -n 50
```

## Step 10: Install and Set Up Nginx

```bash
# Install Nginx and Certbot
sudo apt install -y nginx certbot python3-certbot-nginx

# Check Nginx status
sudo systemctl status nginx
```

### Create Nginx Configuration

```bash
sudo nano /etc/nginx/sites-available/ocr-sprint
```

**Content (replace `ocr.yourdomain.com` with your domain):**

```nginx
# Upstream
upstream ocr_api {
    server 127.0.0.1:8000;
    keepalive 32;
}

# Rate limiting
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

server {
    listen 80;
    server_name ocr.yourdomain.com;

    # Max upload size
    client_max_body_size 30M;
    client_body_buffer_size 128k;

    # Timeouts
    proxy_connect_timeout 300s;
    proxy_send_timeout 300s;
    proxy_read_timeout 300s;
    send_timeout 300s;

    # Logging
    access_log /var/log/nginx/ocr-sprint-access.log;
    error_log /var/log/nginx/ocr-sprint-error.log;

    # API endpoints
    location /api/ {
        limit_req zone=api_limit burst=20 nodelay;

        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Connection "";
        proxy_buffering off;
    }

    # Health check
    location /api/v1/health {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        access_log off;
    }

    # Metrics (restrict access)
    location /metrics {
        allow 127.0.0.1;
        allow 10.0.0.0/8;
        deny all;

        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
    }

    # API docs
    location /docs {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
    }

    location /redoc {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
    }
}
```

### Enable Site

```bash
# Enable the site
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/

# Test the configuration (after enabling, so the new site is included in the test)
sudo nginx -t

# Reload Nginx
sudo systemctl reload nginx
```

### Set Up SSL (if you have a domain)

```bash
# Obtain a certificate
sudo certbot --nginx -d ocr.yourdomain.com

# Test auto-renewal
sudo certbot renew --dry-run
```

## Step 11: Set Up the Firewall

```bash
# Check UFW status
sudo ufw status

# Allow SSH (IMPORTANT!)
sudo ufw allow 22/tcp

# Allow HTTP and HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Enable the firewall (if not enabled yet)
sudo ufw enable

# Verify
sudo ufw status numbered
```

## Step 12: Final Verification

### Test from the Server

```bash
# Health check
curl http://localhost:8000/api/v1/health

# Test the async endpoint
curl -X POST http://localhost:8000/api/v1/documents \
    -H "X-API-Key: your-api-key-here" \
    -F "file=@/path/to/test.pdf"
# Expected: {"job_id":"...","status":"pending",...}

# Check job status
curl -H "X-API-Key: your-api-key-here" \
    http://localhost:8000/api/v1/documents/JOB_ID_HERE
```

### Test via Domain (if SSL is set up)

```bash
curl https://ocr.yourdomain.com/api/v1/health
```

### Check Services

```bash
# All services should be active
sudo systemctl status ocr-sprint-api
sudo systemctl status ocr-sprint-worker
sudo systemctl status postgresql
sudo systemctl status redis-server
sudo systemctl status nginx
```

## Monitoring

### View Logs

```bash
# API logs
sudo journalctl -u ocr-sprint-api -f

# Worker logs
sudo journalctl -u ocr-sprint-worker -f

# Nginx access logs
sudo tail -f /var/log/nginx/ocr-sprint-access.log

# Nginx error logs
sudo tail -f /var/log/nginx/ocr-sprint-error.log
```

### Prometheus Metrics

```bash
# View metrics
curl http://localhost:8000/metrics

# Key metrics:
# - ocr_documents_total
# - ocr_processing_duration_seconds
# - ocr_confidence_score
```

## Maintenance

### Restart Services

```bash
sudo systemctl restart ocr-sprint-api
sudo systemctl restart ocr-sprint-worker
```

### Update Application

```bash
# Switch to user ocr
sudo su - ocr
cd /opt/ocr-sprint-service

# Pull the latest code
git pull

# Activate the venv
source .venv/bin/activate

# Update dependencies
pip install -e ".[ocr]"

# Run migrations
alembic upgrade head

# Exit
exit

# Restart services
sudo systemctl restart ocr-sprint-api
sudo systemctl restart ocr-sprint-worker

# Check logs
sudo journalctl -u ocr-sprint-api -n 50
```

### Database Backup

```bash
# Create the backup directory
sudo mkdir -p /opt/ocr-sprint-service/backups
sudo chown ocr:ocr /opt/ocr-sprint-service/backups

# Manual backup (the whole pipeline runs as ocr so that ocr writes the output file)
sudo -u ocr bash -c 'pg_dump -h localhost -U ocr ocr_sprint | gzip > /opt/ocr-sprint-service/backups/backup_$(date +%Y%m%d_%H%M%S).sql.gz'
```

**Set up automated backups:**

```bash
# Create the backup script
sudo nano /opt/ocr-sprint-service/backup.sh
```

```bash
#!/bin/bash
# Abort on errors, unset variables, and failures inside pipelines
set -euo pipefail

BACKUP_DIR="/opt/ocr-sprint-service/backups"
DATE=$(date +%Y%m%d_%H%M%S)

mkdir -p "$BACKUP_DIR"

# Back up the database
PGPASSWORD='your-db-password' pg_dump -h localhost -U ocr ocr_sprint | gzip > "$BACKUP_DIR/db_$DATE.sql.gz"

# Keep only the last 7 days
find "$BACKUP_DIR" -name "db_*.sql.gz" -mtime +7 -delete

echo "Backup completed: $DATE"
```

```bash
# Hand the script to user ocr; owner-only access because it contains the database password
sudo chown ocr:ocr /opt/ocr-sprint-service/backup.sh
sudo chmod 700 /opt/ocr-sprint-service/backup.sh

# Set up cron (daily at 2 AM)
sudo crontab -e -u ocr

# Add this line (the log goes to a directory that user ocr can write to):
0 2 * * * /opt/ocr-sprint-service/backup.sh >> /opt/ocr-sprint-service/backups/backup.log 2>&1
```

## Troubleshooting

### Service does not start

```bash
# Check detailed logs
sudo journalctl -u ocr-sprint-api -n 100 --no-pager
sudo journalctl -u ocr-sprint-worker -n 100 --no-pager

# Check file permissions
ls -la /opt/ocr-sprint-service
ls -la /opt/ocr-sprint-service/storage

# Test a manual run
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
```

### Database connection error

```bash
# Test the connection
sudo -u ocr psql -h localhost -U ocr -d ocr_sprint

# Check PostgreSQL status
sudo systemctl status postgresql

# Check PostgreSQL logs
sudo journalctl -u postgresql -n 50
```

### Redis connection error

```bash
# Test Redis
redis-cli ping

# Check Redis status
sudo systemctl status redis-server

# Check Redis logs
sudo journalctl -u redis-server -n 50
```

### Worker does not process jobs

```bash
# Check Celery worker status
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate
celery -A ocr_sprint.worker.celery_app inspect active
celery -A ocr_sprint.worker.celery_app inspect stats

# Check the Redis queue
redis-cli LLEN ocr_sprint
```

### PaddleOCR error

```bash
# Re-download models
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate
python << 'EOF'
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang='latin')
print("Models downloaded successfully")
EOF
```

## Performance Tuning

### Check CPU cores

```bash
nproc
```

### Adjust worker concurrency

```bash
# Edit the worker service
sudo nano /etc/systemd/system/ocr-sprint-worker.service

# For 4 cores: --concurrency=2
# For 8 cores: --concurrency=4
# For 16 cores: --concurrency=8

# Reload and restart
sudo systemctl daemon-reload
sudo systemctl restart ocr-sprint-worker
```

### PostgreSQL 16 Tuning

```bash
sudo nano /etc/postgresql/16/main/postgresql.conf
```

**Recommended settings (adjust to the server's RAM; use only one of the two RAM blocks):**

```
# For 8GB RAM:
shared_buffers = 2GB
effective_cache_size = 6GB
maintenance_work_mem = 512MB
work_mem = 8MB

# For 16GB RAM:
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
work_mem = 10MB

# General
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
# The next two values are typical for SSD storage
random_page_cost = 1.1
effective_io_concurrency = 200
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
```

```bash
sudo systemctl restart postgresql
```

## Security Checklist

- [ ] API keys set to strong random values
- [ ] Database password changed from the default
- [ ] Firewall enabled (UFW)
- [ ] SSL/TLS enabled (if you have a domain)
- [ ] `/metrics` endpoint restricted
- [ ] PostgreSQL listens only on localhost
- [ ] Redis listens only on localhost
- [ ] Backups automated (cron job)
- [ ] OS security updates enabled

## Next Steps

1. **Set up monitoring** - Install Prometheus + Grafana (optional)
2. **Set up alerting** - Email/Slack notifications for errors
3. **Load testing** - Test with production document volumes
4. **Backup verification** - Test restoring from a backup
5. **Documentation** - Document the API keys for the team

## Support

For questions or issues, contact the development team.