# Quickstart: OCR Sprint Service Deployment

This guide covers deploying OCR Sprint Service to a production server for processing Polri *surat sprint* (surat perintah, i.e. assignment order) documents.

## Server Prerequisites

### Minimum Specifications

- **OS**: Linux (Ubuntu 20.04+ / Debian 11+ / RHEL 8+)
- **CPU**: 4 cores (8 cores recommended for high throughput)
- **RAM**: 8 GB minimum (16 GB recommended)
- **Storage**: 50 GB free space
  - ~3 GB for the PaddleOCR models
  - ~1.5 GB for Python dependencies
  - The remainder for document blob storage
- **Network**: port 8000 reachable for API access (behind a reverse proxy, only ports 80/443 need to be publicly reachable; see the Security Checklist)

### Software Requirements

- Docker 24.0+ and Docker Compose v2
- Git
- (Optional) Nginx or Caddy as a reverse proxy with SSL

## Deployment with Docker Compose (Recommended)

### 1. Clone the Repository

```bash
# Log in to the server as a non-root user with sudo access
ssh user@your-server.com

# Clone the repository
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service
```

### 2. Configure the Environment

```bash
# Copy the environment template
cp .env.example .env

# Edit the production configuration
nano .env
```

**Key settings for production:**

```bash
# ==== App ====
APP_ENV=prod
APP_LOG_LEVEL=INFO

# ==== Storage ====
STORAGE_LOCAL_DIR=/app/storage
BLOB_STORAGE_DIR=/app/storage/blobs
BLOB_MAX_UPLOAD_MB=25

# ==== OCR ====
OCR_LANG=latin
# Set to true if the server has an NVIDIA GPU
OCR_USE_GPU=false
OCR_MAX_IMAGE_SIDE=2200

# ==== Preprocessing ====
PREPROCESS_TARGET_DPI=300
PREPROCESS_DENOISE=true
PREPROCESS_DESKEW=true
PREPROCESS_DETECT_DOCUMENT=true
PREPROCESS_REMOVE_SHADOW=true

# ==== Table Extraction ====
TABLES_ENABLED=true

# ==== Async Pipeline ====
QUEUE_ENABLED=true
REDIS_URL=redis://redis:6379/0
CELERY_TASK_DEFAULT_QUEUE=ocr_sprint

# ==== Database ====
# Change the default ocr:ocr password; it must match the postgres service credentials
DATABASE_URL=postgresql+psycopg://ocr:ocr@postgres:5432/ocr_sprint
DATABASE_ECHO=false

# ==== Auth (REQUIRED for production!) ====
API_KEYS=your-secret-key-1,your-secret-key-2
API_KEY_HEADER=X-API-Key
```

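
Before starting the stack, you can sanity-check `.env` against the production settings above. The following is a minimal sketch and is not part of the repository. The function names (`parse_env`, `production_problems`, `check_env_file`) are hypothetical, and so are the rules. It rejects placeholder keys and keys shorter than 32 characters, requires `APP_ENV=prod`, and flags the default database password. Those rules are assumptions drawn from this guide's checklist. The parser handles only simple `KEY=VALUE` lines: no quoting, no inline comments.

```python
from pathlib import Path

# Hypothetical pre-flight check; the rules mirror this guide's production settings and checklist.
PLACEHOLDER_PREFIX = "your-secret-key"
MIN_KEY_LENGTH = 32


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and full-line comments are skipped."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def production_problems(env: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: list[str] = []
    if env.get("APP_ENV") != "prod":
        problems.append("APP_ENV should be 'prod'")
    keys = [k.strip() for k in env.get("API_KEYS", "").split(",") if k.strip()]
    if not keys:
        problems.append("API_KEYS is empty")
    for key in keys:
        if key.startswith(PLACEHOLDER_PREFIX):
            problems.append(f"API key '{key[:8]}...' is still a placeholder")
        elif len(key) < MIN_KEY_LENGTH:
            problems.append(f"API key '{key[:8]}...' is shorter than {MIN_KEY_LENGTH} characters")
    if "://ocr:ocr@" in env.get("DATABASE_URL", ""):
        problems.append("DATABASE_URL still uses the default ocr:ocr credentials")
    return problems


def check_env_file(path: str = ".env") -> list[str]:
    """Read an .env file from disk and run the checks above."""
    return production_problems(parse_env(Path(path).read_text(encoding="utf-8")))
```

Run `check_env_file()` from the repository root. An empty list means none of these checks found a problem. It does not mean the configuration is complete.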
**Generate strong API keys:**

```bash
# Generate a random API key (32 random bytes = 64 hex characters)
openssl rand -hex 32
```

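
If `openssl` is not available, Python's standard `secrets` module produces an equivalent key. `secrets.token_hex(32)` returns 32 random bytes as 64 hex characters, just like `openssl rand -hex 32`. This sketch (both helper names are hypothetical) builds a ready-to-paste, comma-separated `API_KEYS` line in the format shown in the configuration above.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a random hex key; 32 bytes give 64 hex characters, like `openssl rand -hex 32`."""
    return secrets.token_hex(num_bytes)


def api_keys_line(count: int = 2) -> str:
    """Build a comma-separated API_KEYS line for .env."""
    return "API_KEYS=" + ",".join(generate_api_key() for _ in range(count))


print(api_keys_line())
```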
### 3. Build and Start the Services

```bash
# Build the Docker images
docker compose build

# Start all services (API, worker, Redis, Postgres)
docker compose up -d

# Follow the logs to confirm everything is running
docker compose logs -f api worker
```

**Running services:**

- `api`: FastAPI server on port 8000
- `worker`: Celery worker for async processing
- `redis`: message broker for the job queue
- `postgres`: database for job state

### 4. Verify the Deployment

```bash
# Health check
curl http://localhost:8000/api/v1/health

# Expected response:
# {"status":"ok","version":"0.1.0"}

# Test the OCR endpoint (sync mode, for testing only)
# The URL is quoted so the shell does not treat "?" as a glob character
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
  -H "X-API-Key: your-secret-key-1" \
  -F "file=@samples/pdf/example.pdf" \
  | jq
```

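
Right after `docker compose up -d`, the API may not answer yet (for example while OCR models are loading). Here is a minimal Python sketch of a readiness wait. It assumes the health endpoint and the `{"status":"ok", ...}` response shown above. `wait_for_health` and its parameters are hypothetical. The `fetch` argument can be replaced so the polling logic can be tested without a running server.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

HEALTH_URL = "http://localhost:8000/api/v1/health"


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_health(
    url: str = HEALTH_URL,
    attempts: int = 30,
    delay_s: float = 2.0,
    fetch: Callable[[str], dict] = fetch_json,
) -> dict:
    """Poll until the endpoint reports status "ok"; raise TimeoutError after `attempts` tries."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            body = fetch(url)
            if body.get("status") == "ok":
                return body
        except (OSError, ValueError) as exc:
            # OSError covers URLError, connection refused, and socket timeouts;
            # ValueError covers a non-JSON reply (json.JSONDecodeError).
            last_error = exc
        time.sleep(delay_s)
    raise TimeoutError(f"{url} not healthy after {attempts} attempts: {last_error}")
```

Call `wait_for_health()` once the containers start, then run the OCR test request.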
### 5. Set Up a Reverse Proxy (Nginx)

**Install Nginx:**

```bash
sudo apt update
sudo apt install nginx certbot python3-certbot-nginx
```

**Nginx configuration (`/etc/nginx/sites-available/ocr-sprint`):**

```nginx
upstream ocr_api {
    server localhost:8000;
}

server {
    listen 80;
    server_name ocr.yourdomain.com;

    client_max_body_size 30M;  # Set slightly above BLOB_MAX_UPLOAD_MB

    location / {
        proxy_pass http://ocr_api;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts for large documents
        proxy_read_timeout 300s;
        proxy_connect_timeout 75s;
    }

    location /metrics {
        # Restrict the metrics endpoint to the internal network
        allow 10.0.0.0/8;
        deny all;
        proxy_pass http://ocr_api;
    }
}
```

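
Nginx rejects request bodies larger than `client_max_body_size` with `413 Request Entity Too Large` before they reach the API. A multipart upload is also slightly larger than the file itself because of boundaries and part headers. That is why a 25 MB application limit is paired with 30M here. This small sketch checks that relationship. The helper names and the 1 MB minimum headroom are assumptions, as is treating `BLOB_MAX_UPLOAD_MB` as mebibytes. It understands only nginx's `k`/`m`/`g` size suffixes.

```python
import re

# nginx size suffixes (case-insensitive): none = bytes, k, m, g
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def nginx_size_to_bytes(size: str) -> int:
    """Convert an nginx size such as '30M' or '512k' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", size.strip())
    if match is None:
        raise ValueError(f"unsupported nginx size: {size!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def body_limit_ok(client_max_body_size: str, blob_max_upload_mb: int, headroom_mb: int = 1) -> bool:
    """True if nginx accepts an upload at the app limit plus some multipart overhead."""
    limit = nginx_size_to_bytes(client_max_body_size)
    if limit == 0:
        return True  # 0 disables nginx's request body size check
    app_limit = blob_max_upload_mb * 1024**2  # assumes the app treats MB as MiB
    return limit >= app_limit + headroom_mb * 1024**2
```

With the values in this guide, `body_limit_ok("30M", 25)` holds. Setting both limits to 25 would fail, because the multipart overhead would push a maximum-size upload over the nginx limit.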
**Enable the site and set up SSL:**

```bash
# Enable the site
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

# Set up SSL with Let's Encrypt
sudo certbot --nginx -d ocr.yourdomain.com
```

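
Let's Encrypt certificates are valid for 90 days. The Debian/Ubuntu certbot package normally sets up automatic renewal, but an independent expiry check is cheap to run. Here is a minimal stdlib sketch. `days_until_expiry` parses the `notAfter` string that `ssl.SSLSocket.getpeercert()` returns, and `fetch_not_after` opens a verified TLS connection to read it. Both function names are illustrative, and `ocr.yourdomain.com` is the placeholder domain from the configuration above.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (format as returned by getpeercert())."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Open a verified TLS connection and return the server certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return cert["notAfter"]


# Example (needs network access and a valid certificate):
# print(round(days_until_expiry(fetch_not_after("ocr.yourdomain.com"))))
```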
## Manual Deployment (Without Docker)

### 1. Install System Dependencies

```bash
# Ubuntu/Debian
sudo apt update
sudo apt install -y \
  python3.11 python3.11-venv python3-pip \
  libgl1 libglib2.0-0 libsm6 libxext6 libxrender1 \
  libgomp1 libmagic1 \
  redis-server postgresql-14

# Start the services
sudo systemctl enable --now redis-server postgresql
```

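
Missing shared libraries often surface only later, as import errors such as `libGL.so.1: cannot open shared object file`. A quick Python check can catch them before you continue. This is an illustrative sketch, not a repository script. It assumes Ubuntu/Debian naming and looks for three commands: `python3.11`; `redis-cli`, which arrives with the `redis-server` package's tools; and `psql`, from the PostgreSQL client. It also looks for four of the libraries installed above, using `ctypes.util.find_library`. Adjust both lists for other distributions.

```python
import ctypes.util
import shutil
from typing import Callable, Optional, Sequence

# Commands and shared libraries provided by the apt command above (Ubuntu/Debian names).
REQUIRED_COMMANDS = ("python3.11", "redis-cli", "psql")
REQUIRED_LIBRARIES = ("GL", "glib-2.0", "gomp", "magic")


def missing_prerequisites(
    commands: Sequence[str] = REQUIRED_COMMANDS,
    libraries: Sequence[str] = REQUIRED_LIBRARIES,
    which: Callable[[str], Optional[str]] = shutil.which,
    find_library: Callable[[str], Optional[str]] = ctypes.util.find_library,
) -> list[str]:
    """Return the commands (looked up on PATH) and shared libraries that were not found."""
    missing = [f"command: {cmd}" for cmd in commands if which(cmd) is None]
    missing += [f"library: lib{lib}" for lib in libraries if find_library(lib) is None]
    return missing
```

Calling `missing_prerequisites()` with no arguments checks the real system. An empty list means everything on both lists was found.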
### 2. Set Up the Database

```bash
# Create the database and user
# The quoted 'EOF' stops the shell from expanding "$" characters in the password
sudo -u postgres psql <<'EOF'
CREATE USER ocr WITH PASSWORD 'your-secure-password';
CREATE DATABASE ocr_sprint OWNER ocr;
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
EOF
```

### 3. Install the Application

```bash
# Clone the repository
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service

# Create a virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -e ".[ocr]"

# Copy and edit .env
cp .env.example .env
nano .env
```

**Update the connection settings in `.env` (Postgres and Redis now run on localhost):**

```bash
DATABASE_URL=postgresql+psycopg://ocr:your-secure-password@localhost:5432/ocr_sprint
REDIS_URL=redis://localhost:6379/0
QUEUE_ENABLED=true
```

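
The database password may contain characters that are special in URLs, such as `@`, `:`, `/` or `#`. If so, it must be percent-encoded inside `DATABASE_URL`, or the URL will be split in the wrong place. Here is a minimal sketch using `urllib.parse.quote`. `build_database_url` is a hypothetical helper whose defaults mirror the values above. If the project's Alembic `env.py` passes the URL through `config.set_main_option`, any `%` would additionally need to be doubled, since that setter uses ConfigParser interpolation. Check how the repository handles this.

```python
from urllib.parse import quote


def build_database_url(
    password: str,
    user: str = "ocr",
    host: str = "localhost",
    port: int = 5432,
    database: str = "ocr_sprint",
    driver: str = "postgresql+psycopg",
) -> str:
    """Build a SQLAlchemy-style URL with percent-encoded credentials ('@', ':', '/' become safe)."""
    return f"{driver}://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/{database}"


print("DATABASE_URL=" + build_database_url("p@ss:w/rd#1"))
```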
### 4. Run Database Migrations

```bash
alembic upgrade head
```

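### 5. Set Up systemd Services

The units below assume the application lives in `/opt/ocr-sprint-service` and runs as an `ocr` system user. Create that user if it does not exist (e.g. `sudo useradd --system --no-create-home ocr`) and make sure it can access the application directory. If you cloned elsewhere, adjust `User`, `WorkingDirectory`, and the paths.

**API service (`/etc/systemd/system/ocr-sprint-api.service`):**

```ini
[Unit]
Description=OCR Sprint API
After=network.target postgresql.service redis.service

[Service]
Type=simple
User=ocr
WorkingDirectory=/opt/ocr-sprint-service
# Keep the system directories on PATH so subprocesses can still find standard tools
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000 --workers 4
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```
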
**Worker service (`/etc/systemd/system/ocr-sprint-worker.service`):**

```ini
[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis.service

[Service]
Type=simple
User=ocr
WorkingDirectory=/opt/ocr-sprint-service
# Keep the system directories on PATH so subprocesses can still find standard tools
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery -A ocr_sprint.worker.celery_app worker -l info --concurrency=2
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

**Enable and start the services:**

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now ocr-sprint-api ocr-sprint-worker
sudo systemctl status ocr-sprint-api ocr-sprint-worker
```

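
For scripted checks, such as a deploy script or a cron job, `systemctl is-active` prints one state per unit, in the order given. Its exit code is 0 if *at least one* unit is active, so parsing the printed states is more reliable than checking the exit code. Here is a minimal Python wrapper that assumes the two unit names above. The helper names are hypothetical, and `run` can be swapped out so the parsing can be tested without systemd.

```python
import subprocess
from typing import Callable, Sequence

UNITS = ("ocr-sprint-api", "ocr-sprint-worker")


def systemctl_is_active(units: Sequence[str]) -> str:
    """Run `systemctl is-active` and return stdout (one state per unit, in order)."""
    # check=False: a non-zero exit status is expected when units are not active.
    result = subprocess.run(
        ["systemctl", "is-active", *units], capture_output=True, text=True, check=False
    )
    return result.stdout


def unit_states(
    units: Sequence[str] = UNITS,
    run: Callable[[Sequence[str]], str] = systemctl_is_active,
) -> dict[str, str]:
    """Map each unit to its reported state, e.g. 'active', 'failed', 'activating'."""
    return dict(zip(units, run(units).split()))


def inactive_units(states: dict[str, str]) -> list[str]:
    """Units whose state is anything other than 'active'."""
    return [unit for unit, state in states.items() if state != "active"]
```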
## Monitoring and Maintenance

### Monitoring Logs

```bash
# Docker deployment
docker compose logs -f api worker

# Manual deployment
sudo journalctl -u ocr-sprint-api -f
sudo journalctl -u ocr-sprint-worker -f
```

### Prometheus Metrics

Metrics are exposed at the `/metrics` endpoint:

```bash
curl http://localhost:8000/metrics
```

**Key metrics:**

- `ocr_documents_total`: total documents processed
- `ocr_processing_duration_seconds`: processing duration
- `ocr_confidence_score`: distribution of confidence scores
- `celery_task_*`: Celery worker metrics

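
For quick checks without a Prometheus server, you can read the plain-text output of `/metrics` directly. This simplified sketch sums the samples of one metric across all label combinations. It does not unescape label values or validate the format. For anything beyond ad-hoc checks, the `prometheus_client` package's `text_string_to_metric_families` parser handles the full format. The last test line computes an average processing time. It assumes `ocr_processing_duration_seconds` is exported as a histogram or summary, which gives it `_sum` and `_count` series; this guide does not confirm that.

```python
from typing import Optional, Tuple


def parse_sample(line: str) -> Optional[Tuple[str, float]]:
    """Return (metric_name, value) for a sample line, or None for blank/comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank lines and HELP/TYPE comments
    if "{" in line:
        name = line[: line.index("{")]
        rest = line[line.rindex("}") + 1 :]  # the last '}' closes the label set
    else:
        name, _, rest = line.partition(" ")
    value = rest.split()[0]  # an optional timestamp may follow the value
    return name, float(value)


def sum_metric(exposition: str, name: str) -> float:
    """Sum every sample of `name` in Prometheus text format."""
    total = 0.0
    for line in exposition.splitlines():
        sample = parse_sample(line)
        if sample is not None and sample[0] == name:
            total += sample[1]
    return total


# Example:
# import urllib.request
# text = urllib.request.urlopen("http://localhost:8000/metrics").read().decode()
# print(sum_metric(text, "ocr_documents_total"))
```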
### Database Backup

```bash
# Docker deployment (-T disables TTY allocation so the dump is written cleanly to the file)
docker compose exec -T postgres pg_dump -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql

# Manual deployment (-h localhost connects over TCP, so password authentication is used)
pg_dump -h localhost -U ocr ocr_sprint > backup_$(date +%Y%m%d).sql
```

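
Daily dumps pile up, so a retention policy is needed. (The Security Checklist also calls for regular backups of the database and blob storage.) This minimal sketch selects `backup_YYYYMMDD.sql` files older than a retention window, based on the file-name pattern used above. The 14-day default and the function names are assumptions. `prune_backups` deletes files only when `dry_run=False`.

```python
import re
from datetime import date, datetime, timedelta
from pathlib import Path

BACKUP_NAME = re.compile(r"backup_(\d{8})\.sql")


def expired_backups(filenames: list[str], today: date, keep_days: int = 14) -> list[str]:
    """Return backup files dated more than `keep_days` before `today`."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in filenames:
        match = BACKUP_NAME.fullmatch(name)
        if match is None:
            continue  # leave unrelated files alone
        try:
            stamp = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits, but not a real date
        if stamp < cutoff:
            expired.append(name)
    return sorted(expired)


def prune_backups(directory: str = ".", keep_days: int = 14, dry_run: bool = True) -> list[str]:
    """List expired backups in `directory`; delete them only when dry_run is False."""
    folder = Path(directory)
    names = [p.name for p in folder.iterdir() if p.is_file()]
    expired = expired_backups(names, date.today(), keep_days)
    if not dry_run:
        for name in expired:
            (folder / name).unlink()
    return expired
```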
### Updating the Service

```bash
# Docker deployment
cd ocr-sprint-service
git pull
docker compose build
docker compose up -d

# Manual deployment
cd ocr-sprint-service
git pull
source .venv/bin/activate
pip install -e ".[ocr]"
alembic upgrade head
sudo systemctl restart ocr-sprint-api ocr-sprint-worker
```

## Troubleshooting

### Service does not start

```bash
# Check the logs
docker compose logs api worker

# Check the health endpoint
curl http://localhost:8000/api/v1/health
```

### PaddleOCR model download fails

```bash
# Download the models manually into the volume
docker compose exec api python -c "from paddleocr import PaddleOCR; PaddleOCR(use_angle_cls=True, lang='latin')"
```

### Worker is not processing jobs

```bash
# Check the Redis connection
docker compose exec worker redis-cli -h redis ping

# Check the Celery worker status
docker compose exec worker celery -A ocr_sprint.worker.celery_app inspect active
```

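
`redis-cli` may not be installed in the worker image. If it is missing, you can test the connection from the worker's own Python interpreter by sending a raw `PING` over the Redis protocol (RESP). This is a stdlib-only sketch with illustrative function names. It reads the host and port from a URL such as `REDIS_URL` and does not handle passwords or TLS (`rediss://`). A password-protected server answers with an error instead of `+PONG`, so `redis_ping` returns `False` in that case.

```python
import socket
from urllib.parse import urlparse


def parse_redis_url(url: str) -> tuple[str, int, int]:
    """Split redis://host:port/db into (host, port, db), using Redis defaults when omitted."""
    parsed = urlparse(url)
    if parsed.scheme != "redis":
        raise ValueError(f"only plain redis:// URLs are supported, got {url!r}")
    db = int(parsed.path.lstrip("/") or 0)
    return parsed.hostname or "localhost", parsed.port or 6379, db


def encode_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = part.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def redis_ping(url: str = "redis://redis:6379/0", timeout: float = 3.0) -> bool:
    """Return True if the server answers PING with +PONG."""
    host, port, _db = parse_redis_url(url)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_command("PING"))
        return sock.recv(64).startswith(b"+PONG")
```

Paste it into an interactive `docker compose exec worker python` session and call `redis_ping()`.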
### Database migration errors

```bash
# Check the current revision
docker compose exec api alembic current

# Apply all pending migrations
docker compose exec api alembic upgrade head
```

### Out of memory

Lower the worker concurrency in `docker-compose.yml` (for example `--concurrency=1`) or add a memory limit to the worker container.

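
Each Celery worker process (Celery's default pool is prefork) typically holds its own copy of the OCR models. Memory therefore grows roughly linearly with `--concurrency` times the number of worker replicas. Here is a back-of-the-envelope sketch. First measure the memory of one busy worker process, for example with `docker stats` while a worker running `--concurrency=1` processes a document. Then plug that figure in. The function name and the 2 GB reserve for the OS, API, Redis, and Postgres are assumptions, not measured values from this project.

```python
def max_worker_processes(total_ram_gb: float, per_process_gb: float, reserved_gb: float = 2.0) -> int:
    """Largest total process count (replicas x concurrency) that fits after the reserve."""
    if per_process_gb <= 0:
        raise ValueError("per_process_gb must be positive")
    available = total_ram_gb - reserved_gb
    return max(0, int(available // per_process_gb))


# Made-up example: 8 GB server, 1.5 GB measured per busy process
print(max_worker_processes(8, 1.5))  # -> 4
```

With that result, `--scale worker=3` combined with `--concurrency=1` (3 processes) fits, while `--concurrency=2` (6 processes) would not.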
## Security Checklist

- [ ] API_KEYS set to strong random values
- [ ] Firewall configured (only ports 80/443 open to the public, plus SSH for administration)
- [ ] SSL/TLS enabled via Nginx + Let's Encrypt
- [ ] Database password changed from the default
- [ ] `/metrics` endpoint restricted to the internal network
- [ ] Regular backups of the database and blob storage
- [ ] Log rotation configured
- [ ] OS security updates enabled

## Performance Tuning

### For high throughput

1. **Increase worker concurrency:**

   ```yaml
   # docker-compose.yml
   command: ["celery", "-A", "ocr_sprint.worker.celery_app", "worker", "-l", "info", "--concurrency=4"]
   ```

2. **Scale workers horizontally:**

   ```bash
   docker compose up -d --scale worker=3
   ```

3. **Enable the GPU (if available):**

   ```bash
   # .env
   OCR_USE_GPU=true
   ```

   In Docker this also requires the NVIDIA Container Toolkit on the host and a GPU build of PaddlePaddle in the image.

4. **Tune Postgres:**

   ```sql
   -- Allow more client connections and enlarge the shared buffer cache
   -- (both settings take effect only after a PostgreSQL restart)
   ALTER SYSTEM SET max_connections = 200;
   ALTER SYSTEM SET shared_buffers = '2GB';
   ```

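
You can size `max_connections` from the number of processes that hold database connections instead of picking a number. The sketch below assumes the application uses SQLAlchemy's default `QueuePool` settings (`pool_size=5`, `max_overflow=10`, so up to 15 connections per process). This guide does not confirm those settings, so check the project's engine configuration. The `shared_buffers` helper follows the PostgreSQL documentation's suggested starting point of 25% of RAM on a *dedicated* database server. When Postgres shares the machine with OCR workers, a smaller value may be more appropriate.

```python
def required_connections(
    api_processes: int,
    worker_processes: int,
    pool_size: int = 5,
    max_overflow: int = 10,
    headroom: int = 10,
) -> int:
    """Worst case if every process fills its pool, plus headroom for psql, backups, migrations."""
    per_process = pool_size + max_overflow
    return (api_processes + worker_processes) * per_process + headroom


def shared_buffers_gb(total_ram_gb: float, fraction: float = 0.25) -> float:
    """Starting value for shared_buffers on a dedicated database server."""
    return total_ram_gb * fraction


# 4 uvicorn workers + 3 worker replicas x --concurrency=4
print(required_connections(api_processes=4, worker_processes=3 * 4))  # -> 250
print(shared_buffers_gb(8))  # -> 2.0
```

Under these assumptions, combining steps 1 and 2 could need more than the 200 connections set in step 4. In that case, either raise `max_connections` or configure smaller pools.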
## Support

For questions or issues, contact the development team or open an issue in the repository.