feat: implement robust personnel data extraction pipeline with text-based fallback and coordinate-aware processing
docs/DEPLOYMENT.md
@@ -0,0 +1,437 @@
# OCR Sprint Service Deployment Quickstart

A guide to deploying the OCR Sprint Service to a production server for processing Polri sprint letter (surat sprint) documents.

## Server Prerequisites

### Minimum Specifications

- **OS**: Linux (Ubuntu 20.04+ / Debian 11+ / RHEL 8+)
- **CPU**: 4 cores (8 cores recommended for high throughput)
- **RAM**: 8 GB minimum (16 GB recommended)
- **Storage**: 50 GB free space
  - ~3 GB for the PaddleOCR models
  - ~1.5 GB for Python dependencies
  - The remainder for document blob storage
- **Network**: Port 8000 open for API access

### Software Requirements

- Docker 24.0+ and Docker Compose v2
- Git
- (Optional) Nginx/Caddy for reverse proxy + SSL
## Deployment with Docker Compose (Recommended)

### 1. Clone the Repository

```bash
# Log in to the server as a non-root user with sudo access
ssh user@your-server.com

# Clone the repository
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
```

### 2. Configure the Environment

```bash
# Copy the environment template
cp .env.example .env

# Edit the production configuration
nano .env
```
**Important settings for production:**

```bash
# ==== App ====
APP_ENV=prod
APP_LOG_LEVEL=INFO

# ==== Storage ====
STORAGE_LOCAL_DIR=/app/storage
BLOB_STORAGE_DIR=/app/storage/blobs
BLOB_MAX_UPLOAD_MB=25

# ==== OCR ====
OCR_LANG=latin
# Set to true if the server has an NVIDIA GPU
# (kept on its own line because not every .env parser strips inline comments)
OCR_USE_GPU=false
OCR_MAX_IMAGE_SIDE=2200

# ==== Preprocessing ====
PREPROCESS_TARGET_DPI=300
PREPROCESS_DENOISE=true
PREPROCESS_DESKEW=true
PREPROCESS_DETECT_DOCUMENT=true
PREPROCESS_REMOVE_SHADOW=true

# ==== Table Extraction ====
TABLES_ENABLED=true

# ==== Async Pipeline ====
QUEUE_ENABLED=true
REDIS_URL=redis://redis:6379/0
CELERY_TASK_DEFAULT_QUEUE=ocr_sprint

# ==== Database ====
# Replace the default ocr:ocr credentials before going live
DATABASE_URL=postgresql+psycopg://ocr:ocr@postgres:5432/ocr_sprint
DATABASE_ECHO=false

# ==== Auth (REQUIRED for production!) ====
API_KEYS=your-secret-key-1,your-secret-key-2
API_KEY_HEADER=X-API-Key
```
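Before starting the stack, it can help to check that the placeholders were actually replaced. The following is a minimal sketch, not part of the service: `parse_env` and `check_production_env` are hypothetical helpers, and the two rules they apply (placeholder API keys, the default `ocr:ocr` database credentials) are simply read off the sample above.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and full-line comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def check_production_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means nothing was found."""
    problems: list[str] = []
    keys = [k for k in env.get("API_KEYS", "").split(",") if k.strip()]
    if not keys:
        problems.append("API_KEYS is empty")
    elif any(k.strip().startswith("your-secret-key") for k in keys):
        problems.append("API_KEYS still contains a placeholder")
    # Assumption: the default credentials appear literally as ocr:ocr in the URL.
    if "://ocr:ocr@" in env.get("DATABASE_URL", ""):
        problems.append("DATABASE_URL uses the default ocr:ocr credentials")
    return problems


# Usage (hypothetical): check_production_env(parse_env(open(".env").read()))
```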
**Generate secure API keys:**

```bash
# Generate a random API key
openssl rand -hex 32
```
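If `openssl` is not available, Python's standard `secrets` module can produce keys of the same shape. A minimal sketch: the helper name `make_api_keys_line` is an assumption, and the output format mirrors the comma-separated `API_KEYS` value from the sample configuration.

```python
import secrets


def make_api_keys_line(count: int = 2, nbytes: int = 32) -> str:
    """Build an API_KEYS=... line with `count` random hex keys.

    secrets.token_hex(32) yields 64 hex characters, the same length as
    the output of `openssl rand -hex 32`.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    keys = [secrets.token_hex(nbytes) for _ in range(count)]
    return "API_KEYS=" + ",".join(keys)


print(make_api_keys_line())
```

Paste the printed line into `.env`, replacing the placeholder value.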
### 3. Build and Start the Services

```bash
# Build the Docker images
docker compose build

# Start all services (API, Worker, Redis, Postgres)
docker compose up -d

# Check the logs to confirm everything is running
docker compose logs -f api worker
```

**Running services:**
- `api`: FastAPI server on port 8000
- `worker`: Celery worker for async processing
- `redis`: Message broker for the job queue
- `postgres`: Database for job state

### 4. Verify the Deployment

```bash
# Health check
curl http://localhost:8000/api/v1/health

# Expected response:
# {"status":"ok","version":"0.1.0"}

# Test the OCR endpoint (sync mode, for testing)
# The URL is quoted so the shell does not interpret the "?"
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
  -H "X-API-Key: your-secret-key-1" \
  -F "file=@samples/pdf/example.pdf" \
  | jq
```
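Containers may need some time before they answer requests, so a single health check issued right after `docker compose up -d` can fail spuriously. A hedged sketch of a polling loop follows; `wait_for_health`, `fetch_json` and their parameters are hypothetical, and the only facts taken from this guide are the URL and the `{"status":"ok"}` response shape.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body (raises on network/HTTP errors)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_health(
    url: str = "http://localhost:8000/api/v1/health",
    attempts: int = 30,
    delay: float = 2.0,
    fetch: Callable[[str], dict] = fetch_json,
) -> bool:
    """Poll the health endpoint until it reports status "ok" or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch(url).get("status") == "ok":
                return True
        except (OSError, ValueError):
            pass  # service not reachable yet, or the body was not valid JSON
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

`wait_for_health()` returns `True` once the service answers, so it can gate a deploy script; the `fetch` parameter exists only to make the loop testable without a running server.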
### 5. Set Up a Reverse Proxy (Nginx)

**Install Nginx:**

```bash
sudo apt update
sudo apt install nginx certbot python3-certbot-nginx
```

**Nginx configuration (`/etc/nginx/sites-available/ocr-sprint`):**

```nginx
upstream ocr_api {
    server localhost:8000;
}

server {
    listen 80;
    server_name ocr.yourdomain.com;

    client_max_body_size 30M;  # Adjust to match BLOB_MAX_UPLOAD_MB

    location / {
        proxy_pass http://ocr_api;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts for large documents
        proxy_read_timeout 300s;
        proxy_connect_timeout 75s;
    }

    location /metrics {
        # Restrict the metrics endpoint
        allow 10.0.0.0/8;  # Internal network only
        deny all;
        proxy_pass http://ocr_api;
    }
}
```

**Enable the site and set up SSL:**

```bash
# Enable the site
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

# Set up SSL with Let's Encrypt
sudo certbot --nginx -d ocr.yourdomain.com
```
## Manual Deployment (Without Docker)

### 1. Install System Dependencies

```bash
# Ubuntu/Debian
# Note: python3.11 and postgresql-14 may need an extra package repository,
# depending on the distribution release.
sudo apt update
sudo apt install -y \
  python3.11 python3.11-venv python3-pip \
  libgl1 libglib2.0-0 libsm6 libxext6 libxrender1 \
  libgomp1 libmagic1 \
  redis-server postgresql-14

# Start the services
sudo systemctl enable --now redis-server postgresql
```

### 2. Set Up the Database

```bash
# Create the database and user
sudo -u postgres psql << EOF
CREATE USER ocr WITH PASSWORD 'your-secure-password';
CREATE DATABASE ocr_sprint OWNER ocr;
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
EOF
```
### 3. Install the Application

```bash
# Clone the repository
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service

# Create a virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -e ".[ocr]"

# Copy and edit .env
cp .env.example .env
nano .env
```

**Update DATABASE_URL in .env:**

```bash
DATABASE_URL=postgresql+psycopg://ocr:your-secure-password@localhost:5432/ocr_sprint
REDIS_URL=redis://localhost:6379/0
QUEUE_ENABLED=true
```
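One detail worth noting: if the password chosen in step 2 contains characters such as `@`, `/` or `:`, it generally has to be percent-encoded inside the URL, otherwise the URL is split in the wrong place. A small sketch using only the standard library; `build_database_url` is a hypothetical helper, and the `postgresql+psycopg` scheme is copied from the example above.

```python
from urllib.parse import quote


def build_database_url(user: str, password: str, host: str, port: int, name: str) -> str:
    """Assemble a database URL, percent-encoding the user and password."""
    return (
        f"postgresql+psycopg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{name}"
    )


# Made-up password, chosen to contain characters that need encoding.
print(build_database_url("ocr", "p@ss/word", "localhost", 5432, "ocr_sprint"))
# postgresql+psycopg://ocr:p%40ss%2Fword@localhost:5432/ocr_sprint
```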
### 4. Run Database Migrations

```bash
alembic upgrade head
```

### 5. Set Up Systemd Services

The unit files below assume the application lives in `/opt/ocr-sprint-service` and runs as a system user named `ocr`; adjust both to match your installation.

**API Service (`/etc/systemd/system/ocr-sprint-api.service`):**

```ini
[Unit]
Description=OCR Sprint API
After=network.target postgresql.service redis.service

[Service]
Type=simple
User=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000 --workers 4
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

**Worker Service (`/etc/systemd/system/ocr-sprint-worker.service`):**

```ini
[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis.service

[Service]
Type=simple
User=ocr
WorkingDirectory=/opt/ocr-sprint-service
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery -A ocr_sprint.worker.celery_app worker -l info --concurrency=2
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

**Enable and start the services:**

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now ocr-sprint-api ocr-sprint-worker
sudo systemctl status ocr-sprint-api ocr-sprint-worker
```
## Monitoring and Maintenance

### Monitoring Logs

```bash
# Docker deployment
docker compose logs -f api worker

# Manual deployment
sudo journalctl -u ocr-sprint-api -f
sudo journalctl -u ocr-sprint-worker -f
```

### Prometheus Metrics

Metrics are available at the `/metrics` endpoint:

```bash
curl http://localhost:8000/metrics
```
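Since this is described as a Prometheus endpoint, its output should be in the Prometheus text exposition format. As an illustration only, the sketch below sums the samples of one metric across label sets; `sum_metric` is a hypothetical helper, the sample payload and its `status` label are invented for the example, and the parser handles only plain `name{labels} value` lines without timestamps.

```python
def sum_metric(exposition: str, name: str) -> float:
    """Sum all samples of `name` in Prometheus text-format output."""
    total = 0.0
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        # The value is the last space-separated field on the line.
        metric, _, value = line.rpartition(" ")
        base = metric.split("{", 1)[0]
        if base == name:
            total += float(value)
    return total


# Hypothetical payload; the real label names depend on the service.
sample = """\
# HELP ocr_documents_total Total documents processed
# TYPE ocr_documents_total counter
ocr_documents_total{status="ok"} 41
ocr_documents_total{status="error"} 1
"""
print(sum_metric(sample, "ocr_documents_total"))
```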
**Key metrics:**
- `ocr_documents_total`: Total documents processed
- `ocr_processing_duration_seconds`: Processing duration
- `ocr_confidence_score`: Confidence score distribution
- `celery_task_*`: Celery worker metrics

### Backup Database

```bash
# Docker deployment (-T disables TTY allocation so the redirected dump stays clean)
docker compose exec -T postgres pg_dump -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql

# Manual deployment (connects over TCP and prompts for the password)
pg_dump -h localhost -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql
```
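Daily dumps accumulate quickly, so some retention policy is usually needed. The sketch below picks which `backup_YYYYMMDD.sql` files to delete while keeping the newest few; `backups_to_delete`, `prune` and the default of seven retained files are assumptions, not something this guide prescribes.

```python
import re
from pathlib import Path

# Matches the file names produced by the backup commands above.
BACKUP_RE = re.compile(r"^backup_(\d{8})\.sql$")


def backups_to_delete(filenames: list[str], keep: int = 7) -> list[str]:
    """Return backup files older than the newest `keep`, oldest first.

    Files that do not match backup_YYYYMMDD.sql are ignored. Sorting by
    name works because YYYYMMDD sorts chronologically as text.
    """
    if keep < 0:
        raise ValueError("keep must not be negative")
    dated = sorted(f for f in filenames if BACKUP_RE.match(f))
    return dated[: max(len(dated) - keep, 0)]


def prune(directory: str, keep: int = 7) -> None:
    """Delete old backups in `directory`; call explicitly, nothing runs on import."""
    names = [p.name for p in Path(directory).iterdir() if p.is_file()]
    for name in backups_to_delete(names, keep):
        (Path(directory) / name).unlink()
```

Keeping the selection logic in a pure function makes it easy to dry-run against a directory listing before anything is deleted.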
### Update Service

```bash
# Docker deployment
cd ocr-sprint-service
git pull
docker compose build
docker compose up -d

# Manual deployment
cd ocr-sprint-service
git pull
source .venv/bin/activate
pip install -e ".[ocr]"
alembic upgrade head
sudo systemctl restart ocr-sprint-api ocr-sprint-worker
```
## Troubleshooting

### Service does not start

```bash
# Check the logs
docker compose logs api worker

# Run the health check
curl http://localhost:8000/api/v1/health
```

### PaddleOCR model download fails

```bash
# Download the models manually into the volume
docker compose exec api python -c "from paddleocr import PaddleOCR; PaddleOCR(use_angle_cls=True, lang='latin')"
```

### Worker does not process jobs

```bash
# Check the Redis connection from the worker
# (needs redis-cli in the worker image; otherwise run: docker compose exec redis redis-cli ping)
docker compose exec worker redis-cli -h redis ping

# Check the Celery worker status
docker compose exec worker celery -A ocr_sprint.worker.celery_app inspect active
```

### Database migration error

```bash
# Check the current revision
docker compose exec api alembic current

# Apply pending migrations
docker compose exec api alembic upgrade head
```

### Out of memory

```bash
# Lower the worker concurrency in docker-compose.yml
# Set: --concurrency=1 (default) or add a memory limit
```
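How far to lower concurrency depends on how much memory each worker process actually uses, which this guide does not specify. Assuming you have measured that figure (for instance with `docker stats`), a rough upper bound can be computed as below; `max_worker_concurrency` and every number passed to it are illustrative assumptions.

```python
def max_worker_concurrency(total_ram_gb: float, reserved_gb: float, per_worker_gb: float) -> int:
    """Largest --concurrency value that fits in memory, never below 1.

    total_ram_gb:  RAM on the host
    reserved_gb:   RAM set aside for the API, Redis, Postgres and the OS
    per_worker_gb: measured peak memory of one worker process
    """
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    available = total_ram_gb - reserved_gb
    return max(1, int(available // per_worker_gb))


# Example with made-up measurements: 8 GB host, 3 GB reserved, 2 GB per worker.
print(max_worker_concurrency(8, 3, 2))
```

The function floors the result at 1 because a worker needs at least one process; if even that does not fit, a memory limit or a larger host is the remaining option.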
## Security Checklist

- [ ] API_KEYS set to strong random values
- [ ] Firewall configured (only ports 80/443 open)
- [ ] SSL/TLS enabled via Nginx + Let's Encrypt
- [ ] Database password changed from the default
- [ ] `/metrics` endpoint restricted to the internal network
- [ ] Regular backups of the database and blob storage
- [ ] Log rotation configured
- [ ] OS security updates enabled
## Performance Tuning

### For high throughput:

1. **Increase worker concurrency:**
   ```yaml
   # docker-compose.yml
   command: ["celery", "-A", "ocr_sprint.worker.celery_app", "worker", "-l", "info", "--concurrency=4"]
   ```

2. **Scale workers horizontally:**
   ```bash
   docker compose up -d --scale worker=3
   ```

3. **Enable GPU (if available):**
   ```bash
   # .env
   OCR_USE_GPU=true
   ```

4. **Tune Postgres:**
   ```sql
   -- Raise the connection limit and shared memory
   -- (both settings take effect only after a PostgreSQL restart)
   ALTER SYSTEM SET max_connections = 200;
   ALTER SYSTEM SET shared_buffers = '2GB';
   ```
## Support

For questions or issues, contact the development team or open an issue in the repository.