feat: implement robust personnel data extraction pipeline with text-based fallback and coordinate-aware processing

Adriankf59
2026-04-26 17:16:47 +07:00
parent dbcf480130
commit 002821ca07
20 changed files with 3326 additions and 20 deletions


@@ -0,0 +1,858 @@
# Deployment OCR Sprint Service (Existing Stack)
This guide covers deployment on a server where Python 3.12.3, PostgreSQL 16.13, and Redis 7.0.15 are already installed.
## Your Server Information
- **OS**: Ubuntu 24.04
- **Python**: 3.12.3 ✅
- **PostgreSQL**: 16.13 ✅
- **Redis**: 7.0.15 ✅
All of these versions are compatible with, and well suited to, OCR Sprint Service.
## Step 1: Install System Libraries for OpenCV & PaddleOCR
```bash
# Update package list
sudo apt update
# Install the libraries required by OpenCV and PaddleOCR
sudo apt install -y \
libgl1 \
libglib2.0-0 \
libsm6 \
libxext6 \
libxrender1 \
libgomp1 \
libmagic1 \
python3.12-venv \
python3.12-dev \
build-essential \
git
```
## Step 2: Set Up the PostgreSQL Database
```bash
# Log in to PostgreSQL
sudo -u postgres psql
```
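Run the following SQL commands:
```sql
-- Create the user and database.
-- Replace the placeholder with a strong generated password; never put a real password in documentation.
CREATE USER ocr WITH PASSWORD 'replace-with-a-strong-password';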
CREATE DATABASE ocr_sprint OWNER ocr;
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
-- Connect to the database to grant schema privileges
\c ocr_sprint
-- Grant schema privileges (PostgreSQL 15+)
GRANT ALL ON SCHEMA public TO ocr;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO ocr;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO ocr;
-- Verify
\l ocr_sprint
\du ocr
-- Exit
\q
```
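**Generate a secure password** (do this first, and use the result in place of the password in the `CREATE USER` statement above):
```bash
# Generate a random password. Hex output contains only URL-safe characters,
# so it can be placed in DATABASE_URL without percent-encoding.
openssl rand -hex 32
```
Save this password; it is needed again in the `.env` configuration later.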
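A password that ends up inside `DATABASE_URL` is easiest to handle when it contains only URL-safe characters. The following is a minimal sketch of a check you could run on a candidate password before using it; `is_url_safe` is a hypothetical helper name chosen for illustration and is not part of the service.

```bash
# Hypothetical helper: succeeds only if the argument is non-empty and consists
# solely of RFC 3986 "unreserved" characters (letters, digits, '.', '_', '~', '-').
is_url_safe() {
  case "$1" in
    '' | *[!A-Za-z0-9._~-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage (not run here):
#   is_url_safe "$DB_PASSWORD" || echo "percent-encode it, or regenerate with: openssl rand -hex 32"
```

Characters such as `@`, `/`, `+`, and `=` fail this check, which is why hex output is the simpler choice for a password that goes into a connection URL.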
## Step 3: Verify Redis
```bash
# Check Redis status
sudo systemctl status redis-server
# Test connection
redis-cli ping
# Expected output: PONG
# Check Redis config (optional)
redis-cli CONFIG GET maxmemory
```
If Redis is not running yet:
```bash
sudo systemctl enable redis-server
sudo systemctl start redis-server
```
## Step 4: Create Application User
```bash
# Create a dedicated user for the application
sudo useradd -m -s /bin/bash ocr
# Create application directory
sudo mkdir -p /opt/ocr-sprint-service
sudo chown ocr:ocr /opt/ocr-sprint-service
```
## Step 5: Clone and Install the Application
```bash
# Switch to the ocr user
sudo su - ocr
# Clone repository
cd /opt
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
# Create a virtual environment with Python 3.12
python3.12 -m venv .venv
# Activate virtual environment
source .venv/bin/activate
# Verify the Python version inside the venv
python --version
# Expected: Python 3.12.3
# Upgrade pip
pip install --upgrade pip setuptools wheel
# Install the application with its OCR dependencies
# This downloads roughly 1.5 GB of PaddlePaddle wheels
pip install -e ".[ocr]"
# Verify installation
python -c "import paddleocr; print('PaddleOCR OK')"
python -c "import cv2; print('OpenCV OK')"
python -c "import fastapi; print('FastAPI OK')"
```
## Step 6: Configure the Application
```bash
# Still as the ocr user
cd /opt/ocr-sprint-service
# Copy environment template
cp .env.example .env
# Edit the configuration
nano .env
```
**Configuration in `/opt/ocr-sprint-service/.env`:**
```bash
# ==== App ====
APP_ENV=prod
APP_HOST=0.0.0.0
APP_PORT=8000
APP_LOG_LEVEL=INFO
# ==== Storage ====
STORAGE_LOCAL_DIR=/opt/ocr-sprint-service/storage
BLOB_STORAGE_DIR=/opt/ocr-sprint-service/storage/blobs
BLOB_MAX_UPLOAD_MB=25
# ==== OCR ====
OCR_LANG=latin
OCR_USE_GPU=false
OCR_MAX_IMAGE_SIDE=2200
# ==== Preprocessing ====
PREPROCESS_TARGET_DPI=300
PREPROCESS_DENOISE=true
PREPROCESS_DESKEW=true
PREPROCESS_DETECT_DOCUMENT=true
PREPROCESS_REMOVE_SHADOW=true
PREPROCESS_MIN_QUAD_AREA_FRACTION=0.20
# ==== Table Extraction ====
TABLES_ENABLED=true
# ==== Confidence ====
CONFIDENCE_AUTO_APPROVE=0.95
CONFIDENCE_NEEDS_REVIEW=0.85
# ==== LLM (Phase 5, optional - leave disabled for now) ====
LLM_ENABLED=false
# ==== Async Pipeline ====
QUEUE_ENABLED=true
REDIS_URL=redis://localhost:6379/0
CELERY_TASK_DEFAULT_QUEUE=ocr_sprint
# ==== Database ====
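# Replace 'your-password-here' with the password you generated during the database setup (step 2).
# Percent-encode characters such as '@', '/', or ':' in the password, otherwise the URL may not parse correctly.
DATABASE_URL=postgresql+psycopg://ocr:your-password-here@localhost:5432/ocr_sprint
DATABASE_ECHO=false
# ==== Auth (REQUIRED for production!) ====
# Generate with: openssl rand -hex 32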
API_KEYS=paste-api-key-1-here,paste-api-key-2-here
API_KEY_HEADER=X-API-Key
```
**Generate API keys:**
```bash
# Generate 2 API keys
echo "API Key 1: $(openssl rand -hex 32)"
echo "API Key 2: $(openssl rand -hex 32)"
```
Copy the output and paste it into `API_KEYS` in the `.env` file.
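Because the template above ships with placeholder values, it is easy to start the service with one of them still in place. The sketch below shows one way to catch that; `check_env_placeholders` is a hypothetical helper, and the patterns it looks for are simply the placeholder strings used in this guide.

```bash
# Hypothetical helper: fails if the given env file is unreadable or still
# contains any of the placeholder values used in this guide.
check_env_placeholders() {
  env_file="$1"
  if [ ! -r "$env_file" ]; then
    echo "ERROR: cannot read $env_file" >&2
    return 2
  fi
  # -n prints the line numbers of any offending lines
  if grep -E -n 'your-password-here|paste-api-key-[0-9]+-here' "$env_file"; then
    echo "ERROR: placeholder values remain in $env_file" >&2
    return 1
  fi
  echo "No placeholders found in $env_file"
}

# Usage (not run here):
#   check_env_placeholders /opt/ocr-sprint-service/.env
```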
**Create storage directories:**
```bash
mkdir -p /opt/ocr-sprint-service/storage/blobs
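chmod 755 /opt/ocr-sprint-service/storage
# The .env file contains secrets, so restrict it to the ocr user
chmod 600 /opt/ocr-sprint-service/.env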
```
## Step 7: Run Database Migrations
```bash
# Still as the ocr user, with the venv activated
cd /opt/ocr-sprint-service
source .venv/bin/activate
# Run migrations
alembic upgrade head
# Verify - should show current revision
alembic current
# Expected output: the current revision ID, marked (head)
```
## Step 8: Test a Manual Run
```bash
# Still as the ocr user
cd /opt/ocr-sprint-service
source .venv/bin/activate
# Test API server
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
```
**In another terminal (as the ubuntu user):**
```bash
# Test health check
curl http://localhost:8000/api/v1/health
# Expected: {"status":"ok","version":"0.1.0"}
# Test with a sample file (if available)
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
-H "X-API-Key: your-api-key-here" \
-F "file=@/path/to/test.pdf"
```
If this succeeds, stop the server with `Ctrl+C`.
## Step 9: Set Up Systemd Services
```bash
# Exit from the ocr user
exit
# You are now back in the ubuntu user's shell, which has sudo access
```
### Create API Service
```bash
sudo nano /etc/systemd/system/ocr-sprint-api.service
```
**Content:**
```ini
[Unit]
Description=OCR Sprint API Service
After=network.target postgresql.service redis-server.service
Wants=postgresql.service redis-server.service
[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
# Environment
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
# Start command - 4 workers for production
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn \
ocr_sprint.main:app \
--host 0.0.0.0 \
--port 8000 \
--workers 4 \
--log-level info
# Restart policy
Restart=always
RestartSec=10
StartLimitInterval=0
# Resource limits
LimitNOFILE=65536
# Security
NoNewPrivileges=true
PrivateTmp=true
[Install]
WantedBy=multi-user.target
```
### Create Celery Worker Service
```bash
sudo nano /etc/systemd/system/ocr-sprint-worker.service
```
**Content:**
```ini
[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis-server.service ocr-sprint-api.service
Wants=postgresql.service redis-server.service
[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
# Environment
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
# Start command - concurrency 2 for a 4-core CPU
# Adjust to the number of CPU cores on your server
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery \
-A ocr_sprint.worker.celery_app \
worker \
--loglevel=info \
--concurrency=2 \
--max-tasks-per-child=100
# Restart policy
Restart=always
RestartSec=10
StartLimitInterval=0
# Resource limits
LimitNOFILE=65536
# Security
NoNewPrivileges=true
PrivateTmp=true
[Install]
WantedBy=multi-user.target
```
### Enable and Start Services
```bash
# Reload systemd
sudo systemctl daemon-reload
# Enable services (auto-start on boot)
sudo systemctl enable ocr-sprint-api
sudo systemctl enable ocr-sprint-worker
# Start services
sudo systemctl start ocr-sprint-api
sudo systemctl start ocr-sprint-worker
# Check status
sudo systemctl status ocr-sprint-api
sudo systemctl status ocr-sprint-worker
```
**Expected output:** `active (running)`, shown in green.
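A service can report `active (running)` a few seconds before the API actually answers requests. The sketch below is one way to wait for the health endpoint instead of checking once; `retry` is a hypothetical helper written for this guide, and the attempt count and delay in the usage line are arbitrary assumptions.

```bash
# Hypothetical helper: run a command up to MAX times, sleeping DELAY seconds
# between attempts. Succeeds as soon as the command succeeds.
retry() {
  max="$1"
  delay="$2"
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage (not run here): up to 10 attempts, 3 seconds apart.
#   retry 10 3 curl -fsS http://localhost:8000/api/v1/health
```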
### View Logs
```bash
# API logs (real-time)
sudo journalctl -u ocr-sprint-api -f
# Worker logs (real-time)
sudo journalctl -u ocr-sprint-worker -f
# Last 50 lines
sudo journalctl -u ocr-sprint-api -n 50
sudo journalctl -u ocr-sprint-worker -n 50
```
## Step 10: Install and Set Up Nginx
```bash
# Install Nginx and Certbot
sudo apt install -y nginx certbot python3-certbot-nginx
# Check Nginx status
sudo systemctl status nginx
```
### Create Nginx Configuration
```bash
sudo nano /etc/nginx/sites-available/ocr-sprint
```
**Content (replace `ocr.yourdomain.com` with your domain):**
```nginx
# Upstream
upstream ocr_api {
server 127.0.0.1:8000;
keepalive 32;
}
# Rate limiting
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
server {
listen 80;
server_name ocr.yourdomain.com;
# Max upload size
client_max_body_size 30M;
client_body_buffer_size 128k;
# Timeouts
proxy_connect_timeout 300s;
proxy_send_timeout 300s;
proxy_read_timeout 300s;
send_timeout 300s;
# Logging
access_log /var/log/nginx/ocr-sprint-access.log;
error_log /var/log/nginx/ocr-sprint-error.log;
# API endpoints
location /api/ {
limit_req zone=api_limit burst=20 nodelay;
proxy_pass http://ocr_api;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Connection "";
proxy_buffering off;
}
# Health check
location /api/v1/health {
proxy_pass http://ocr_api;
proxy_http_version 1.1;
proxy_set_header Host $host;
access_log off;
}
# Metrics (restrict access)
location /metrics {
allow 127.0.0.1;
allow 10.0.0.0/8;
deny all;
proxy_pass http://ocr_api;
proxy_http_version 1.1;
proxy_set_header Host $host;
}
# API docs
location /docs {
proxy_pass http://ocr_api;
proxy_http_version 1.1;
proxy_set_header Host $host;
}
location /redoc {
proxy_pass http://ocr_api;
proxy_http_version 1.1;
proxy_set_header Host $host;
}
}
```
### Enable Site
```bash
# Enable the site first so that the test below actually includes it
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
# Test the configuration
sudo nginx -t
# Reload Nginx
sudo systemctl reload nginx
```
### Set Up SSL (if you have a domain)
```bash
# Obtain certificate
sudo certbot --nginx -d ocr.yourdomain.com
# Test auto-renewal
sudo certbot renew --dry-run
```
## Step 11: Set Up the Firewall
```bash
# Check UFW status
sudo ufw status
# Allow SSH (IMPORTANT: do this before enabling the firewall!)
sudo ufw allow 22/tcp
# Allow HTTP and HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Enable the firewall (if not already enabled)
sudo ufw enable
# Verify
sudo ufw status numbered
```
## Step 12: Final Verification
### Test dari Server
```bash
# Health check
curl http://localhost:8000/api/v1/health
# Test async endpoint
curl -X POST http://localhost:8000/api/v1/documents \
-H "X-API-Key: your-api-key-here" \
-F "file=@/path/to/test.pdf"
# Expected: {"job_id":"...","status":"pending",...}
# Check job status
curl -H "X-API-Key: your-api-key-here" \
http://localhost:8000/api/v1/documents/JOB_ID_HERE
```
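Checking a job by hand means copying the `job_id` out of the upload response. The sketch below pulls it out with `sed`, assuming the response is a single-line JSON object shaped like the expected output shown above; `extract_job_id` is a hypothetical helper, and any fields beyond `job_id` and `status` are unknown here.

```bash
# Hypothetical helper: read a JSON response on stdin and print the value of
# its "job_id" string field (prints nothing if the field is absent).
extract_job_id() {
  sed -n 's/.*"job_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (not run here; the API key and file path are the placeholders from above):
#   job_id=$(curl -fsS -X POST http://localhost:8000/api/v1/documents \
#     -H "X-API-Key: your-api-key-here" \
#     -F "file=@/path/to/test.pdf" | extract_job_id)
#   curl -H "X-API-Key: your-api-key-here" \
#     "http://localhost:8000/api/v1/documents/$job_id"
```

A dedicated JSON tool is more robust than `sed` if one is installed; this version only avoids adding a dependency.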
### Test via Domain (if SSL is set up)
```bash
curl https://ocr.yourdomain.com/api/v1/health
```
### Check Services
```bash
# All services should be active
sudo systemctl status ocr-sprint-api
sudo systemctl status ocr-sprint-worker
sudo systemctl status postgresql
sudo systemctl status redis-server
sudo systemctl status nginx
```
## Monitoring
### View Logs
```bash
# API logs
sudo journalctl -u ocr-sprint-api -f
# Worker logs
sudo journalctl -u ocr-sprint-worker -f
# Nginx access logs
sudo tail -f /var/log/nginx/ocr-sprint-access.log
# Nginx error logs
sudo tail -f /var/log/nginx/ocr-sprint-error.log
```
### Prometheus Metrics
```bash
# View metrics
curl http://localhost:8000/metrics
# Key metrics:
# - ocr_documents_total
# - ocr_processing_duration_seconds
# - ocr_confidence_score
```
## Maintenance
### Restart Services
```bash
sudo systemctl restart ocr-sprint-api
sudo systemctl restart ocr-sprint-worker
```
### Update Application
```bash
# Switch to the ocr user
sudo su - ocr
cd /opt/ocr-sprint-service
# Pull latest code
git pull
# Activate venv
source .venv/bin/activate
# Update dependencies
pip install -e ".[ocr]"
# Run migrations
alembic upgrade head
# Exit
exit
# Restart services
sudo systemctl restart ocr-sprint-api
sudo systemctl restart ocr-sprint-worker
# Check logs
sudo journalctl -u ocr-sprint-api -n 50
```
### Database Backup
```bash
# Create backup directory
sudo mkdir -p /opt/ocr-sprint-service/backups
sudo chown ocr:ocr /opt/ocr-sprint-service/backups
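# Manual backup (pg_dump prompts for the ocr database password).
# The whole pipeline runs inside sh -c so that the output redirection also happens as the ocr user.
sudo -u ocr sh -c 'pg_dump -h localhost -U ocr ocr_sprint | gzip > /opt/ocr-sprint-service/backups/backup_$(date +%Y%m%d_%H%M%S).sql.gz'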
```
**Setup automated backup:**
```bash
# Create backup script
sudo nano /opt/ocr-sprint-service/backup.sh
```
```bash
#!/bin/bash
BACKUP_DIR="/opt/ocr-sprint-service/backups"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p $BACKUP_DIR
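# Back up the database (replace the placeholder; a ~/.pgpass file with mode 0600 is a safer place for the password)
PGPASSWORD='your-db-password' pg_dump -h localhost -U ocr ocr_sprint | gzip > "$BACKUP_DIR/db_$DATE.sql.gz"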
# Keep only last 7 days
find $BACKUP_DIR -name "db_*.sql.gz" -mtime +7 -delete
echo "Backup completed: $DATE"
```
```bash
# Make executable
sudo chmod +x /opt/ocr-sprint-service/backup.sh
sudo chown ocr:ocr /opt/ocr-sprint-service/backup.sh
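# Set up cron (daily at 2 AM) in the ocr user's crontab
sudo crontab -e -u ocr
# Add the following line. The log goes to the backups directory because the ocr user cannot write to /var/log:
# 0 2 * * * /opt/ocr-sprint-service/backup.sh >> /opt/ocr-sprint-service/backups/backup.log 2>&1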
```
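A backup that was cut short still leaves a file behind, so it is worth checking each archive after it is written. The sketch below is a minimal integrity check; `verify_backup` is a hypothetical helper. It only confirms that the file is non-empty, valid gzip data, which is weaker than an actual restore into a scratch database.

```bash
# Hypothetical helper: succeeds if the file exists, is non-empty, and passes
# gzip's built-in integrity test. It does not prove the SQL inside restores.
verify_backup() {
  backup_file="$1"
  if [ ! -s "$backup_file" ]; then
    echo "missing or empty: $backup_file" >&2
    return 1
  fi
  if ! gzip -t "$backup_file" 2>/dev/null; then
    echo "corrupt gzip data: $backup_file" >&2
    return 1
  fi
  echo "OK: $backup_file"
}

# Usage (not run here), e.g. at the end of backup.sh:
#   verify_backup "$BACKUP_DIR/db_$DATE.sql.gz" || exit 1
```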
## Troubleshooting
### Service Does Not Start
```bash
# Check detailed logs
sudo journalctl -u ocr-sprint-api -n 100 --no-pager
sudo journalctl -u ocr-sprint-worker -n 100 --no-pager
# Check file permissions
ls -la /opt/ocr-sprint-service
ls -la /opt/ocr-sprint-service/storage
# Test manual run
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
```
### Database connection error
```bash
# Test connection
sudo -u ocr psql -h localhost -U ocr -d ocr_sprint
# Check PostgreSQL status
sudo systemctl status postgresql
# Check PostgreSQL logs
sudo journalctl -u postgresql -n 50
```
### Redis connection error
```bash
# Test Redis
redis-cli ping
# Check Redis status
sudo systemctl status redis-server
# Check Redis logs
sudo journalctl -u redis-server -n 50
```
### Worker Is Not Processing Jobs
```bash
# Check Celery worker status
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate
celery -A ocr_sprint.worker.celery_app inspect active
celery -A ocr_sprint.worker.celery_app inspect stats
# Check Redis queue
redis-cli LLEN ocr_sprint
```
### PaddleOCR error
```bash
# Re-download models
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate
python << EOF
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang='latin')
print("Models downloaded successfully")
EOF
```
## Performance Tuning
### Check CPU cores
```bash
nproc
```
### Adjust worker concurrency
```bash
# Edit worker service
sudo nano /etc/systemd/system/ocr-sprint-worker.service
# For 4 cores: --concurrency=2
# For 8 cores: --concurrency=4
# For 16 cores: --concurrency=8
# Reload and restart
sudo systemctl daemon-reload
sudo systemctl restart ocr-sprint-worker
```
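The three examples above all set concurrency to half the number of cores. A small sketch of that rule follows; `suggest_concurrency` is a hypothetical helper, and the floor of 1 for very small machines is an assumption added here rather than something the guide states.

```bash
# Hypothetical helper: print half the core count (minimum 1).
# With no argument, the core count is taken from nproc.
suggest_concurrency() {
  cores="${1:-$(nproc)}"
  half=$((cores / 2))
  if [ "$half" -lt 1 ]; then
    half=1
  fi
  echo "$half"
}

# Usage (not run here):
#   echo "--concurrency=$(suggest_concurrency)"
```

Treat the result as a starting point: OCR workers are memory-hungry, so available RAM can be the tighter limit.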
### PostgreSQL 16 Tuning
```bash
sudo nano /etc/postgresql/16/main/postgresql.conf
```
**Recommended settings (adjust to the server's RAM; use only one of the two RAM blocks):**
```
# For 8 GB RAM:
shared_buffers = 2GB
effective_cache_size = 6GB
maintenance_work_mem = 512MB
work_mem = 8MB
# For 16 GB RAM:
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
work_mem = 10MB
# General (the random_page_cost and effective_io_concurrency values below assume SSD storage)
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
```
```bash
sudo systemctl restart postgresql
```
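The two presets above follow the same proportions: `shared_buffers` is a quarter of RAM, `effective_cache_size` is three quarters, and `maintenance_work_mem` is one sixteenth. The sketch below extends that pattern to other sizes; `pg_memory_settings` is a hypothetical helper, and it assumes the pattern keeps holding, which you should check against your own workload. `work_mem` is left out because the two presets give no clear rule for it.

```bash
# Hypothetical helper: print memory settings for a server with RAM_GB of RAM,
# using the proportions of the 8 GB and 16 GB presets in this guide.
# Intended for RAM sizes that are multiples of 4 GB.
pg_memory_settings() {
  ram_gb="$1"
  echo "shared_buffers = $((ram_gb / 4))GB"
  echo "effective_cache_size = $((ram_gb * 3 / 4))GB"
  echo "maintenance_work_mem = $((ram_gb * 1024 / 16))MB"
}

# Usage (not run here):
#   pg_memory_settings 32
```

For 16 GB this prints `maintenance_work_mem = 1024MB`, which is the same value as the `1GB` in the preset.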
## Security Checklist
- [ ] API keys set to strong random values
- [ ] Database password changed from the default
- [ ] Firewall enabled (UFW)
- [ ] SSL/TLS enabled (if you have a domain)
- [ ] `/metrics` endpoint restricted
- [ ] PostgreSQL listens only on localhost
- [ ] Redis listens only on localhost
- [ ] Automated backups configured (cron job)
- [ ] OS security updates enabled
## Next Steps
1. **Set up monitoring** - Install Prometheus + Grafana (optional)
2. **Set up alerting** - Email/Slack notifications for errors
3. **Load testing** - Test with production document volumes
4. **Backup verification** - Test restoring from a backup
5. **Documentation** - Document the API keys for the team
## Support
For questions or issues, contact the development team.

docs/DEPLOYMENT-MANUAL.md Normal file

@@ -0,0 +1,943 @@
# Manual Deployment of OCR Sprint Service (Without Docker)
A complete guide to deploying OCR Sprint Service directly on a server, without Docker.
## Server Prerequisites
### Minimum Specifications
- **OS**: Ubuntu 20.04+ / Debian 11+ / RHEL 8+
- **CPU**: 4 cores (8 cores recommended)
- **RAM**: 8 GB minimum (16 GB recommended)
- **Storage**: 50 GB free space
- **User**: Non-root user with sudo access
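The minimums above can be checked before starting the installation. The sketch below is one way to do that; `meets_minimum` is a hypothetical helper, and the thresholds are the minimum values from the list (4 cores, 8 GB RAM, 50 GB free disk).

```bash
# Hypothetical helper: compare core count, RAM (GB), and free disk (GB)
# against this guide's minimums. Prints each shortfall; returns 1 if any.
meets_minimum() {
  cores="$1"
  ram_gb="$2"
  disk_gb="$3"
  failed=0
  [ "$cores" -ge 4 ] || { echo "CPU: $cores cores (need 4+)"; failed=1; }
  [ "$ram_gb" -ge 8 ] || { echo "RAM: $ram_gb GB (need 8+)"; failed=1; }
  [ "$disk_gb" -ge 50 ] || { echo "Disk: $disk_gb GB free (need 50+)"; failed=1; }
  return "$failed"
}

# Usage on a Linux host with GNU coreutils (not run here):
#   meets_minimum "$(nproc)" \
#     "$(awk '/MemTotal/ {print int($2 / 1024 / 1024 + 0.5)}' /proc/meminfo)" \
#     "$(df -BG --output=avail /opt | tail -n 1 | tr -dc '0-9')"
```

RAM is rounded to the nearest GB in the usage line because a machine sold as 8 GB reports slightly less than that in `/proc/meminfo`.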
### Required Ports
- `8000`: API server (internal, proxied by Nginx)
- `80/443`: HTTP/HTTPS (Nginx)
- `5432`: PostgreSQL (localhost only)
- `6379`: Redis (localhost only)
## Step 1: Install System Dependencies
### Ubuntu/Debian
```bash
# Update system
sudo apt update && sudo apt upgrade -y
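# Install Python 3.11 (the deadsnakes PPA targets Ubuntu; on Debian, install Python 3.11 another way)
sudo apt install -y software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt update
sudo apt install -y python3.11 python3.11-venv python3.11-dev python3-pip
# Install system libraries for OpenCV and PaddleOCR
# (on newer releases such as Ubuntu 24.04, where libgl1-mesa-glx is no longer available, install libgl1 instead)
sudo apt install -y \
libgl1-mesa-glx \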
libglib2.0-0 \
libsm6 \
libxext6 \
libxrender1 \
libgomp1 \
libmagic1 \
build-essential \
git \
curl \
wget
# Install Redis
sudo apt install -y redis-server
sudo systemctl enable redis-server
sudo systemctl start redis-server
# Install PostgreSQL
sudo apt install -y postgresql postgresql-contrib
sudo systemctl enable postgresql
sudo systemctl start postgresql
```
### RHEL/CentOS/Rocky Linux
```bash
# Update system
sudo dnf update -y
# Install Python 3.11
sudo dnf install -y python3.11 python3.11-devel python3.11-pip
# Install system libraries
sudo dnf install -y \
mesa-libGL \
glib2 \
libSM \
libXext \
libXrender \
file-libs \
gcc \
gcc-c++ \
make \
git
# Install Redis
sudo dnf install -y redis
sudo systemctl enable redis
sudo systemctl start redis
# Install PostgreSQL
sudo dnf install -y postgresql-server postgresql-contrib
sudo postgresql-setup --initdb
sudo systemctl enable postgresql
sudo systemctl start postgresql
```
## Step 2: Set Up the PostgreSQL Database
```bash
# Open psql as the postgres user
sudo -u postgres psql
# Then run the following SQL commands:
```
```sql
-- Create the user and database
CREATE USER ocr WITH PASSWORD 'replace-with-a-strong-password';
CREATE DATABASE ocr_sprint OWNER ocr;
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
-- Connect to the database
\c ocr_sprint
-- Grant schema privileges (PostgreSQL 15+)
GRANT ALL ON SCHEMA public TO ocr;
-- Exit
\q
```
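**PostgreSQL listen address and password authentication (optional):**
```bash
# Edit postgresql.conf (the path contains the installed major version; adjust 14 as needed)
sudo nano /etc/postgresql/14/main/postgresql.conf
# Uncomment and set the following line (keep localhost for security):
#   listen_addresses = 'localhost'
# Edit pg_hba.conf
sudo nano /etc/postgresql/14/main/pg_hba.conf
# Add the following line. The application connects over TCP to localhost, so a "host" rule is needed
# (a "local" rule only covers Unix-socket connections):
#   host    ocr_sprint    ocr    127.0.0.1/32    scram-sha-256
# Restart PostgreSQL
sudo systemctl restart postgresql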
```
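## Step 3: Set Up the Application User
```bash
# Create a dedicated user for the application
sudo useradd -m -s /bin/bash ocr
# Optional, for maintenance only. Granting sudo to a service account weakens isolation, so skip this unless you need it.
sudo usermod -aG sudo ocr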
# Create application directory
sudo mkdir -p /opt/ocr-sprint-service
sudo chown ocr:ocr /opt/ocr-sprint-service
# Switch to the ocr user
sudo su - ocr
```
## Step 4: Install the Application
```bash
# Clone repository
cd /opt
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
# Create virtual environment
python3.11 -m venv .venv
# Activate virtual environment
source .venv/bin/activate
# Upgrade pip
pip install --upgrade pip setuptools wheel
# Install the application with its OCR dependencies
pip install -e ".[ocr]"
# Verify installation
python -c "import paddleocr; print('PaddleOCR installed successfully')"
```
## Step 5: Configure the Application
```bash
# Copy environment template
cp .env.example .env
# Edit the configuration
nano .env
```
**Production configuration (`/opt/ocr-sprint-service/.env`):**
```bash
# ==== App ====
APP_ENV=prod
APP_HOST=0.0.0.0
APP_PORT=8000
APP_LOG_LEVEL=INFO
# ==== Storage ====
STORAGE_LOCAL_DIR=/opt/ocr-sprint-service/storage
BLOB_STORAGE_DIR=/opt/ocr-sprint-service/storage/blobs
BLOB_MAX_UPLOAD_MB=25
# ==== OCR ====
OCR_LANG=latin
OCR_USE_GPU=false
OCR_MAX_IMAGE_SIDE=2200
# ==== Preprocessing ====
PREPROCESS_TARGET_DPI=300
PREPROCESS_DENOISE=true
PREPROCESS_DESKEW=true
PREPROCESS_DETECT_DOCUMENT=true
PREPROCESS_REMOVE_SHADOW=true
PREPROCESS_MIN_QUAD_AREA_FRACTION=0.20
# ==== Table Extraction ====
TABLES_ENABLED=true
# ==== Confidence ====
CONFIDENCE_AUTO_APPROVE=0.95
CONFIDENCE_NEEDS_REVIEW=0.85
# ==== LLM (Phase 5, optional) ====
LLM_ENABLED=false
# ==== Async Pipeline ====
QUEUE_ENABLED=true
REDIS_URL=redis://localhost:6379/0
CELERY_TASK_DEFAULT_QUEUE=ocr_sprint
# ==== Database ====
DATABASE_URL=postgresql+psycopg://ocr:replace-with-a-strong-password@localhost:5432/ocr_sprint
DATABASE_ECHO=false
# ==== Auth (REQUIRED!) ====
API_KEYS=key1-replace-with-random-string,key2-replace-with-random-string
API_KEY_HEADER=X-API-Key
```
**Generate secure API keys:**
```bash
# Generate 2 API keys
openssl rand -hex 32
openssl rand -hex 32
```
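The generated keys still have to be joined into the comma-separated `API_KEYS` value. The sketch below produces the whole line in one step; `gen_key` and `make_api_keys_line` are hypothetical helpers, and the `/dev/urandom` fallback is an assumption added for hosts without `openssl`.

```bash
# Hypothetical helper: print one 64-character hex key.
gen_key() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback: read 32 random bytes and print them as hex
    od -An -N32 -tx1 /dev/urandom | tr -d ' \n'
    echo
  fi
}

# Hypothetical helper: print an API_KEYS line with N keys (default 2).
make_api_keys_line() {
  count="${1:-2}"
  keys=""
  i=0
  while [ "$i" -lt "$count" ]; do
    key=$(gen_key)
    keys="${keys:+$keys,}$key"
    i=$((i + 1))
  done
  printf 'API_KEYS=%s\n' "$keys"
}

# Usage (not run here): print the line, then paste it into .env
#   make_api_keys_line 2
```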
**Create storage directories:**
```bash
mkdir -p /opt/ocr-sprint-service/storage/blobs
chmod 755 /opt/ocr-sprint-service/storage
```
## Step 6: Run Database Migrations
```bash
# Still as the ocr user, with the venv activated
cd /opt/ocr-sprint-service
source .venv/bin/activate
# Run migrations
alembic upgrade head
# Verify
alembic current
```
## Step 7: Test a Manual Run
```bash
# Test API server
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
# In another terminal, test the health check
curl http://localhost:8000/api/v1/health
# If it succeeds, stop the server with Ctrl+C
```
## Step 8: Set Up Systemd Services
### API Service
```bash
# Exit from the ocr user, back to the user with sudo access
exit
# Create systemd service file
sudo nano /etc/systemd/system/ocr-sprint-api.service
```
**Content `/etc/systemd/system/ocr-sprint-api.service`:**
```ini
[Unit]
Description=OCR Sprint API Service
After=network.target postgresql.service redis.service
Wants=postgresql.service redis.service
[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
# Environment
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
# Start command - 4 workers for production
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn \
ocr_sprint.main:app \
--host 0.0.0.0 \
--port 8000 \
--workers 4 \
--log-level info
# Restart policy
Restart=always
RestartSec=10
StartLimitInterval=0
# Resource limits
LimitNOFILE=65536
# MemoryMax= replaces the deprecated MemoryLimit= directive in current systemd versions
MemoryMax=6G
# Security
NoNewPrivileges=true
PrivateTmp=true
[Install]
WantedBy=multi-user.target
```
### Celery Worker Service
```bash
sudo nano /etc/systemd/system/ocr-sprint-worker.service
```
**Content `/etc/systemd/system/ocr-sprint-worker.service`:**
```ini
[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis.service ocr-sprint-api.service
Wants=postgresql.service redis.service
[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
# Environment
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
# Start command - concurrency 2 for a 4-core CPU
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery \
-A ocr_sprint.worker.celery_app \
worker \
--loglevel=info \
--concurrency=2 \
--max-tasks-per-child=100
# Restart policy
Restart=always
RestartSec=10
StartLimitInterval=0
# Resource limits
LimitNOFILE=65536
# MemoryMax= replaces the deprecated MemoryLimit= directive in current systemd versions
MemoryMax=4G
# Security
NoNewPrivileges=true
PrivateTmp=true
[Install]
WantedBy=multi-user.target
```
### Enable and Start Services
```bash
# Reload systemd
sudo systemctl daemon-reload
# Enable services (auto-start on boot)
sudo systemctl enable ocr-sprint-api
sudo systemctl enable ocr-sprint-worker
# Start services
sudo systemctl start ocr-sprint-api
sudo systemctl start ocr-sprint-worker
# Check status
sudo systemctl status ocr-sprint-api
sudo systemctl status ocr-sprint-worker
# View logs
sudo journalctl -u ocr-sprint-api -f
sudo journalctl -u ocr-sprint-worker -f
```
## Step 9: Set Up Nginx Reverse Proxy
### Install Nginx
```bash
sudo apt install -y nginx certbot python3-certbot-nginx
```
### Configure Nginx
```bash
sudo nano /etc/nginx/sites-available/ocr-sprint
```
**Content `/etc/nginx/sites-available/ocr-sprint`:**
```nginx
# Upstream for load balancing (if scaling horizontally)
upstream ocr_api {
server 127.0.0.1:8000;
keepalive 32;
}
# Rate limiting
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
server {
listen 80;
server_name ocr.yourdomain.com; # Replace with your domain
# Max upload size (keep in line with BLOB_MAX_UPLOAD_MB)
client_max_body_size 30M;
client_body_buffer_size 128k;
# Timeouts for large documents
proxy_connect_timeout 300s;
proxy_send_timeout 300s;
proxy_read_timeout 300s;
send_timeout 300s;
# Logging
access_log /var/log/nginx/ocr-sprint-access.log;
error_log /var/log/nginx/ocr-sprint-error.log;
# API endpoints
location /api/ {
# Rate limiting
limit_req zone=api_limit burst=20 nodelay;
proxy_pass http://ocr_api;
proxy_http_version 1.1;
# Headers
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Connection "";
# Disable buffering for streaming responses
proxy_buffering off;
}
# Health check endpoint (no rate limit)
location /api/v1/health {
proxy_pass http://ocr_api;
proxy_http_version 1.1;
proxy_set_header Host $host;
access_log off;
}
# Metrics endpoint (restrict access)
location /metrics {
# Allow only from internal network
allow 10.0.0.0/8;
allow 172.16.0.0/12;
allow 192.168.0.0/16;
allow 127.0.0.1;
deny all;
proxy_pass http://ocr_api;
proxy_http_version 1.1;
proxy_set_header Host $host;
}
# Docs (optional, can be disabled in production)
location /docs {
proxy_pass http://ocr_api;
proxy_http_version 1.1;
proxy_set_header Host $host;
}
location /redoc {
proxy_pass http://ocr_api;
proxy_http_version 1.1;
proxy_set_header Host $host;
}
}
```
### Enable Site
```bash
# Enable the site first so that the test below actually includes it
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
# Remove the default site (optional)
sudo rm /etc/nginx/sites-enabled/default
# Test the configuration
sudo nginx -t
# Reload Nginx
sudo systemctl reload nginx
```
### Set Up SSL with Let's Encrypt
```bash
# Install certbot
sudo apt install -y certbot python3-certbot-nginx
# Obtain a certificate (replace with your domain)
sudo certbot --nginx -d ocr.yourdomain.com
# Test auto-renewal
sudo certbot renew --dry-run
```
Certbot automatically updates the Nginx configuration for HTTPS.
## Step 10: Set Up the Firewall
```bash
# Install UFW (if not already installed)
sudo apt install -y ufw
# Allow SSH (IMPORTANT: do this first so you do not lock yourself out)
sudo ufw allow 22/tcp
# Allow HTTP and HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Enable firewall
sudo ufw enable
# Check status
sudo ufw status
```
## Step 11: Verify the Deployment
### Test dari Server
```bash
# Health check
curl http://localhost:8000/api/v1/health
# Test with an API key (the URL is quoted so the shell does not interpret the '?')
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
-H "X-API-Key: your-api-key-here" \
-F "file=@/path/to/test.pdf"
```
### Test dari Client
```bash
# Health check via domain
curl https://ocr.yourdomain.com/api/v1/health
# Upload a document
curl -X POST https://ocr.yourdomain.com/api/v1/documents \
-H "X-API-Key: your-api-key-here" \
-F "file=@document.pdf"
```
## Monitoring and Maintenance
### View Logs
```bash
# API logs
sudo journalctl -u ocr-sprint-api -f
# Worker logs
sudo journalctl -u ocr-sprint-worker -f
# Nginx logs
sudo tail -f /var/log/nginx/ocr-sprint-access.log
sudo tail -f /var/log/nginx/ocr-sprint-error.log
# PostgreSQL logs
sudo tail -f /var/log/postgresql/postgresql-14-main.log
```
### Service Management
```bash
# Restart services
sudo systemctl restart ocr-sprint-api
sudo systemctl restart ocr-sprint-worker
# Stop services
sudo systemctl stop ocr-sprint-api
sudo systemctl stop ocr-sprint-worker
# Check status
sudo systemctl status ocr-sprint-api
sudo systemctl status ocr-sprint-worker
```
### Database Backup
```bash
# Create backup script
sudo nano /opt/ocr-sprint-service/backup.sh
```
**Content `backup.sh`:**
```bash
#!/bin/bash
BACKUP_DIR="/opt/ocr-sprint-service/backups"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p $BACKUP_DIR
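# Back up the database. Under cron, pg_dump must get the password non-interactively,
# for example from a ~/.pgpass file (mode 0600) in the home directory of the user running the job.
pg_dump -U ocr -h localhost ocr_sprint | gzip > "$BACKUP_DIR/db_$DATE.sql.gz"
# Back up blobs (optional, can be large)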
# tar -czf $BACKUP_DIR/blobs_$DATE.tar.gz /opt/ocr-sprint-service/storage/blobs
# Keep only last 7 days
find $BACKUP_DIR -name "db_*.sql.gz" -mtime +7 -delete
echo "Backup completed: $DATE"
```
```bash
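# Make executable (the file was created with sudo, so sudo is needed here too)
sudo chmod +x /opt/ocr-sprint-service/backup.sh
# Set up the cron job (daily at 2 AM); this edits root's crontab
sudo crontab -e
# Add the following line:
# 0 2 * * * /opt/ocr-sprint-service/backup.sh >> /var/log/ocr-backup.log 2>&1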
```
### Log Rotation
```bash
sudo nano /etc/logrotate.d/ocr-sprint
```
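**Content** (if the distribution's own `/etc/logrotate.d/nginx` already matches `/var/log/nginx/*.log`, remove the overlap first, because logrotate rejects duplicate entries for the same file):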
```
/var/log/nginx/ocr-sprint-*.log {
daily
rotate 14
compress
delaycompress
notifempty
create 0640 www-data adm
sharedscripts
postrotate
[ -f /var/run/nginx.pid ] && kill -USR1 `cat /var/run/nginx.pid`
endscript
}
```
## Update Application
```bash
# Switch to the ocr user
sudo su - ocr
cd /opt/ocr-sprint-service
# Pull latest code
git pull
# Activate venv
source .venv/bin/activate
# Update dependencies
pip install -e ".[ocr]"
# Run migrations
alembic upgrade head
# Exit from the ocr user
exit
# Restart services
sudo systemctl restart ocr-sprint-api
sudo systemctl restart ocr-sprint-worker
# Check logs
sudo journalctl -u ocr-sprint-api -n 50
```
## Performance Tuning
### Increase Worker Concurrency
```bash
# Edit worker service
sudo nano /etc/systemd/system/ocr-sprint-worker.service
# Change --concurrency according to the number of CPU cores
# For 8 cores: --concurrency=4
# For 16 cores: --concurrency=8
# Reload and restart
sudo systemctl daemon-reload
sudo systemctl restart ocr-sprint-worker
```
### PostgreSQL Tuning
```bash
sudo nano /etc/postgresql/14/main/postgresql.conf
```
**Recommended settings for 16 GB RAM:**
```
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 10MB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
```
```bash
sudo systemctl restart postgresql
```
### Redis Tuning
```bash
sudo nano /etc/redis/redis.conf
```
**Recommended settings:**
```
maxmemory 2gb
maxmemory-policy allkeys-lru
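# Disable RDB snapshots for performance. redis.conf does not allow a comment on the same line as a directive.
# Without persistence, jobs still in the queue are lost if Redis restarts.
save ""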
```
```bash
sudo systemctl restart redis
```
## Troubleshooting
### Service Does Not Start
```bash
# Check logs
sudo journalctl -u ocr-sprint-api -n 100 --no-pager
sudo journalctl -u ocr-sprint-worker -n 100 --no-pager
# Check permissions
ls -la /opt/ocr-sprint-service
ls -la /opt/ocr-sprint-service/storage
# Test manual run
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
```
### Database connection error
```bash
# Test connection
sudo -u ocr psql -h localhost -U ocr -d ocr_sprint
# Check PostgreSQL status
sudo systemctl status postgresql
# Check pg_hba.conf
sudo grep ocr /etc/postgresql/16/main/pg_hba.conf
```
### Redis connection error
```bash
# Test Redis
redis-cli ping
# Check Redis status
sudo systemctl status redis-server
# Check Redis logs
sudo journalctl -u redis-server -n 50
```
### PaddleOCR model download fails
```bash
# Download manual
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate
python << EOF
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang='latin')
print("Models downloaded successfully")
EOF
```
### Out of memory
```bash
# Check memory usage
free -h
htop
# Reduce worker concurrency
sudo nano /etc/systemd/system/ocr-sprint-worker.service
# Set --concurrency=1
# Add swap (if needed)
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```
## Security Checklist
- [ ] API keys replaced with strong random values
- [ ] Database password changed from the default
- [ ] Firewall enabled (UFW), only ports 22, 80, 443 open
- [ ] SSL/TLS enabled via Let's Encrypt
- [ ] `/metrics` endpoint restricted to the internal network
- [ ] Nginx rate limiting configured
- [ ] PostgreSQL listens only on localhost
- [ ] Redis listens only on localhost
- [ ] Regular backups configured (cron job)
- [ ] Log rotation configured
- [ ] OS security updates enabled (`unattended-upgrades`)
- [ ] Fail2ban installed for SSH protection
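
For the firewall item, the following is a sketch using standard `ufw` commands. By default it only prints what it would do; the `APPLY` variable and the `run` and `firewall_rules` helpers are illustrative names, not part of the project. Confirm that SSH on port 22 is allowed before enabling the firewall on a remote host.

```bash
#!/usr/bin/env bash
# Dry run by default: commands are printed. Set APPLY=1 to execute them with sudo.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        sudo "$@"
    else
        echo "would run: $*"
    fi
}

# Allow only SSH, HTTP and HTTPS inbound, as listed in the checklist.
firewall_rules() {
    run ufw default deny incoming
    run ufw default allow outgoing
    for port in 22 80 443; do
        run ufw allow "${port}/tcp"
    done
    run ufw --force enable
}

firewall_rules
```

Review the printed commands first, then rerun the script with `APPLY=1` to apply them.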
## Monitoring with Prometheus (Optional)
### Install Prometheus
```bash
# Download Prometheus
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvfz prometheus-*.tar.gz
sudo mv prometheus-2.45.0.linux-amd64 /opt/prometheus
# Create user
sudo useradd --no-create-home --shell /bin/false prometheus
# Create directories
sudo mkdir /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
```
### Configure Prometheus
```bash
sudo nano /etc/prometheus/prometheus.yml
```
**Content:**
```yaml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'ocr-sprint'
    static_configs:
      - targets: ['localhost:8000']
    metrics_path: '/metrics'
```
### Create Systemd Service
```bash
sudo nano /etc/systemd/system/prometheus.service
```
**Content:**
```ini
[Unit]
Description=Prometheus
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/opt/prometheus/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus/
[Install]
WantedBy=multi-user.target
```
```bash
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
```
Access Prometheus at `http://localhost:9090`.
## Support
For questions or issues, contact the development team.

docs/DEPLOYMENT.md

@@ -0,0 +1,437 @@
# Quickstart Deployment OCR Sprint Service
Deployment guide for running OCR Sprint Service on a production server to process Polri sprint letter (surat sprint) documents.
## Server Prerequisites
### Minimum Specifications
- **OS**: Linux (Ubuntu 20.04+ / Debian 11+ / RHEL 8+)
- **CPU**: 4 cores (8 cores recommended for high throughput)
- **RAM**: 8 GB minimum (16 GB recommended)
- **Storage**: 50 GB free space
  - ~3 GB for the PaddleOCR models
  - ~1.5 GB for Python dependencies
  - The remainder for document blob storage
- **Network**: Port 8000 open for API access
### Software Requirements
- Docker 24.0+ and Docker Compose v2
- Git
- (Optional) Nginx/Caddy for reverse proxy + SSL
## Deployment with Docker Compose (Recommended)
### 1. Clone Repository
```bash
# Log in to the server as a non-root user with sudo access
ssh user@your-server.com
# Clone repository
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
```
### 2. Configure the Environment
```bash
# Copy the environment template
cp .env.example .env
# Edit the production configuration
nano .env
```
**Important settings for production:**
```bash
# ==== App ====
APP_ENV=prod
APP_LOG_LEVEL=INFO
# ==== Storage ====
STORAGE_LOCAL_DIR=/app/storage
BLOB_STORAGE_DIR=/app/storage/blobs
BLOB_MAX_UPLOAD_MB=25
# ==== OCR ====
OCR_LANG=latin
# Set to true if the server has an NVIDIA GPU
OCR_USE_GPU=false
OCR_MAX_IMAGE_SIDE=2200
# ==== Preprocessing ====
PREPROCESS_TARGET_DPI=300
PREPROCESS_DENOISE=true
PREPROCESS_DESKEW=true
PREPROCESS_DETECT_DOCUMENT=true
PREPROCESS_REMOVE_SHADOW=true
# ==== Table Extraction ====
TABLES_ENABLED=true
# ==== Async Pipeline ====
QUEUE_ENABLED=true
REDIS_URL=redis://redis:6379/0
CELERY_TASK_DEFAULT_QUEUE=ocr_sprint
# ==== Database ====
DATABASE_URL=postgresql+psycopg://ocr:ocr@postgres:5432/ocr_sprint
DATABASE_ECHO=false
# ==== Auth (REQUIRED for production!) ====
API_KEYS=your-secret-key-1,your-secret-key-2
API_KEY_HEADER=X-API-Key
```
**Generate secure API keys:**
```bash
# Generate random API key
openssl rand -hex 32
```
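Since `API_KEYS` takes a comma-separated list, several keys can be generated and joined in one step. The sketch below assumes two hypothetical helpers, `random_hex_key` and `generate_api_keys`; it uses `openssl rand -hex 32` as above and falls back to `/dev/urandom` only when `openssl` is not installed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one 64-character hex key.
random_hex_key() {
    if command -v openssl >/dev/null 2>&1; then
        openssl rand -hex 32
    else
        # Fallback: 32 random bytes rendered as hex.
        head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
        echo
    fi
}

# Hypothetical helper: print an API_KEYS line with N comma-separated keys.
generate_api_keys() {
    count="${1:-2}"
    keys=""
    i=0
    while [ "$i" -lt "$count" ]; do
        key="$(random_hex_key)"
        keys="${keys:+${keys},}${key}"
        i=$(( i + 1 ))
    done
    echo "API_KEYS=${keys}"
}

generate_api_keys 2
```

Copy the printed line into `.env`, replacing the placeholder `API_KEYS` value.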
### 3. Build and Start Services
```bash
# Build Docker images
docker compose build
# Start all services (API, Worker, Redis, Postgres)
docker compose up -d
# Check the logs to confirm everything is running
docker compose logs -f api worker
```
**Running services:**
- `api`: FastAPI server on port 8000
- `worker`: Celery worker for async processing
- `redis`: Message broker for the job queue
- `postgres`: Database for job state
### 4. Verify the Deployment
```bash
# Health check
curl http://localhost:8000/api/v1/health
# Expected response:
# {"status":"ok","version":"0.1.0"}
# Test the OCR endpoint (sync mode, for testing)
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
-H "X-API-Key: your-secret-key-1" \
-F "file=@samples/pdf/example.pdf" \
| jq
```
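The API may not answer immediately after the containers start. A small polling sketch, assuming a hypothetical helper `wait_for_health` and arbitrary default retry values (30 attempts, 2 seconds apart), can be used before running the requests above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a health URL until it returns a successful status.
# Arguments: URL, max attempts (default 30), delay in seconds (default 2).
wait_for_health() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-2}"
    n=1
    while [ "$n" -le "$attempts" ]; do
        # --fail makes curl exit non-zero on HTTP errors (4xx/5xx).
        if curl --silent --fail --max-time 5 "$url" >/dev/null 2>&1; then
            echo "healthy after ${n} attempt(s)"
            return 0
        fi
        n=$(( n + 1 ))
        sleep "$delay"
    done
    echo "not healthy after ${attempts} attempt(s)" >&2
    return 1
}

# Usage: wait_for_health http://localhost:8000/api/v1/health
```

The function returns a non-zero status when the service never becomes healthy, so it can gate later steps in a deployment script.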
### 5. Setup Reverse Proxy (Nginx)
**Install Nginx:**
```bash
sudo apt update
sudo apt install nginx certbot python3-certbot-nginx
```
**Nginx configuration (`/etc/nginx/sites-available/ocr-sprint`):**
```nginx
upstream ocr_api {
server localhost:8000;
}
server {
listen 80;
server_name ocr.yourdomain.com;
client_max_body_size 30M; # Keep in line with BLOB_MAX_UPLOAD_MB
location / {
proxy_pass http://ocr_api;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeout for large documents
proxy_read_timeout 300s;
proxy_connect_timeout 75s;
}
location /metrics {
# Restrict metrics endpoint
allow 10.0.0.0/8; # Internal network only
deny all;
proxy_pass http://ocr_api;
}
}
```
**Enable the site and set up SSL:**
```bash
# Enable site
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
# Set up SSL with Let's Encrypt
sudo certbot --nginx -d ocr.yourdomain.com
```
## Manual Deployment (Without Docker)
### 1. Install System Dependencies
```bash
# Ubuntu/Debian
sudo apt update
sudo apt install -y \
python3.11 python3.11-venv python3-pip \
libgl1 libglib2.0-0 libsm6 libxext6 libxrender1 \
libgomp1 libmagic1 \
redis-server postgresql-14
# Start services
sudo systemctl enable --now redis-server postgresql
```
### 2. Setup Database
```bash
# Create the database and user
sudo -u postgres psql << EOF
CREATE USER ocr WITH PASSWORD 'your-secure-password';
CREATE DATABASE ocr_sprint OWNER ocr;
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
EOF
```
### 3. Install Application
```bash
# Clone repository
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install --upgrade pip
pip install -e ".[ocr]"
# Copy and edit .env
cp .env.example .env
nano .env
```
**Update the connection settings in `.env`:**
```bash
DATABASE_URL=postgresql+psycopg://ocr:your-secure-password@localhost:5432/ocr_sprint
REDIS_URL=redis://localhost:6379/0
QUEUE_ENABLED=true
```
### 4. Run Database Migrations
```bash
alembic upgrade head
```
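### 5. Setup Systemd Services
The unit files below assume the application is installed in `/opt/ocr-sprint-service` and runs as a system user named `ocr`; adjust `User`, `WorkingDirectory`, and the paths if your installation differs.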
**API Service (`/etc/systemd/system/ocr-sprint-api.service`):**
```ini
[Unit]
Description=OCR Sprint API
After=network.target postgresql.service redis.service
[Service]
Type=simple
User=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin"
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000 --workers 4
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
```
**Worker Service (`/etc/systemd/system/ocr-sprint-worker.service`):**
```ini
[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis.service
[Service]
Type=simple
User=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin"
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery -A ocr_sprint.worker.celery_app worker -l info --concurrency=2
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
```
**Enable and start the services:**
```bash
sudo systemctl daemon-reload
sudo systemctl enable --now ocr-sprint-api ocr-sprint-worker
sudo systemctl status ocr-sprint-api ocr-sprint-worker
```
## Monitoring and Maintenance
### Monitoring Logs
```bash
# Docker deployment
docker compose logs -f api worker
# Manual deployment
sudo journalctl -u ocr-sprint-api -f
sudo journalctl -u ocr-sprint-worker -f
```
### Prometheus Metrics
Metrics are available at the `/metrics` endpoint:
```bash
curl http://localhost:8000/metrics
```
**Key metrics:**
- `ocr_documents_total`: Total documents processed
- `ocr_processing_duration_seconds`: Processing duration
- `ocr_confidence_score`: Confidence score distribution
- `celery_task_*`: Celery worker metrics
### Backup Database
```bash
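# Docker deployment (-T disables TTY allocation so the dump is written cleanly)
docker compose exec -T postgres pg_dump -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql
# Manual deployment (connect over TCP so password authentication is used)
pg_dump -h localhost -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql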
```
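Daily dumps named `backup_YYYYMMDD.sql` accumulate over time. The following is a sketch of a retention step, assuming a hypothetical helper `prune_backups` and an arbitrary default of keeping the 7 newest files; it relies on the date-based file names sorting chronologically.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only the newest N backup_YYYYMMDD.sql files in a directory.
# Arguments: directory, number of files to keep (default 7).
prune_backups() {
    dir="$1"
    keep="${2:-7}"
    # Reverse lexical order is newest first; skip the first $keep entries
    # and delete everything after them.
    ls -1 "$dir"/backup_*.sql 2>/dev/null | sort -r | tail -n +"$(( keep + 1 ))" |
    while IFS= read -r old; do
        rm -f -- "$old"
        echo "removed $old"
    done
}

# Usage: prune_backups /path/to/backups 7
```

Running it right after the `pg_dump` command in a cron job keeps disk usage bounded; the backup directory path is whatever location you choose for the dumps.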
### Update Service
```bash
# Docker deployment
cd ocr-sprint-service
git pull
docker compose build
docker compose up -d
# Manual deployment
cd ocr-sprint-service
git pull
source .venv/bin/activate
pip install -e ".[ocr]"
alembic upgrade head
sudo systemctl restart ocr-sprint-api ocr-sprint-worker
```
## Troubleshooting
### Service does not start
```bash
# Check logs
docker compose logs api worker
# Check the health endpoint
curl http://localhost:8000/api/v1/health
```
### PaddleOCR model download fails
```bash
# Download manually into the volume
docker compose exec api python -c "from paddleocr import PaddleOCR; PaddleOCR(use_angle_cls=True, lang='latin')"
```
### Worker does not process jobs
```bash
# Check that Redis responds (redis-cli ships in the redis container)
docker compose exec redis redis-cli ping
# Check Celery worker status
docker compose exec worker celery -A ocr_sprint.worker.celery_app inspect active
```
### Database migration error
```bash
# Check the current revision
docker compose exec api alembic current
# Force upgrade
docker compose exec api alembic upgrade head
```
### Out of memory
```bash
# Reduce worker concurrency in docker-compose.yml,
# for example --concurrency=1, or add a memory limit
```
## Security Checklist
- [ ] API_KEYS set to strong random values
- [ ] Firewall configured (only ports 80/443 open)
- [ ] SSL/TLS enabled via Nginx + Let's Encrypt
- [ ] Database password changed from the default
- [ ] `/metrics` endpoint restricted to the internal network
- [ ] Regular backups of the database and blob storage
- [ ] Log rotation configured
- [ ] OS security updates enabled
## Performance Tuning
### For high throughput:
1. **Increase worker concurrency:**
```yaml
# docker-compose.yml
command: ["celery", "-A", "ocr_sprint.worker.celery_app", "worker", "-l", "info", "--concurrency=4"]
```
2. **Scale workers horizontally:**
```bash
docker compose up -d --scale worker=3
```
3. **Enable GPU (if available):**
```bash
# .env
OCR_USE_GPU=true
```
4. **Tune Postgres:**
```sql
-- Raise the connection limit and shared buffers (both take effect only after a PostgreSQL restart)
ALTER SYSTEM SET max_connections = 200;
ALTER SYSTEM SET shared_buffers = '2GB';
```
## Support
For questions or issues, contact the development team or open an issue in the repository.