feat: implement robust personnel data extraction pipeline with text-based fallback and coordinate-aware processing
docs/DEPLOYMENT-EXISTING-STACK.md
# Deploying OCR Sprint Service (Existing Stack)

Deployment guide for a server that already has Python 3.12.3, PostgreSQL 16.13, and Redis 7.0.15 installed.

## Your Server Information

- **OS**: Ubuntu 24.04
- **Python**: 3.12.3 ✅
- **PostgreSQL**: 16.13 ✅
- **Redis**: 7.0.15 ✅

All of these versions are compatible with OCR Sprint Service.

## Step 1: Install System Libraries for OpenCV & PaddleOCR

```bash
# Update the package list
sudo apt update

# Install the libraries required by OpenCV and PaddleOCR
sudo apt install -y \
    libgl1 \
    libglib2.0-0 \
    libsm6 \
    libxext6 \
    libxrender1 \
    libgomp1 \
    libmagic1 \
    python3.12-venv \
    python3.12-dev \
    build-essential \
    git
```

## Step 2: Set Up the PostgreSQL Database

```bash
# Log in to PostgreSQL
sudo -u postgres psql
```

Run the following SQL commands:

```sql
-- Create the user and database
-- Replace 'your-password-here' with a strong password of your own
CREATE USER ocr WITH PASSWORD 'your-password-here';
CREATE DATABASE ocr_sprint OWNER ocr;

-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;

-- Connect to the database to grant schema privileges
\c ocr_sprint

-- Grant schema privileges (PostgreSQL 15+)
GRANT ALL ON SCHEMA public TO ocr;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO ocr;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO ocr;

-- Verify
\l ocr_sprint
\du ocr

-- Exit
\q
```

**Generate a secure password:**

```bash
# Generate a random password (prints a 44-character base64 string)
openssl rand -base64 32
```

Save this password; it is used again in the application configuration. Special characters in the password (base64 output can contain `/`, `+`, and `=`) should be percent-encoded when the password is embedded in `DATABASE_URL`.
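
Because this password ends up inside the `DATABASE_URL` connection string in `.env`, characters such as `@`, `/`, `+`, or `=` are worth percent-encoding first. The following is a minimal sketch using only the Python standard library; the helper name `build_database_url` and the sample password are illustrative and not part of the project.

```python
from urllib.parse import quote


def build_database_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Return a SQLAlchemy-style URL with the password percent-encoded."""
    # safe="" makes quote() encode every reserved character, including "/" and "@"
    encoded = quote(password, safe="")
    return f"postgresql+psycopg://{user}:{encoded}@{host}:{port}/{database}"


# Hypothetical password, chosen only to show the encoding
print(build_database_url("ocr", "p@ss/word+1=", "localhost", 5432, "ocr_sprint"))
```

The printed value can be pasted as `DATABASE_URL`; the raw, unencoded password is still what PostgreSQL itself expects in `CREATE USER`.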

## Step 3: Verify Redis

```bash
# Check Redis status
sudo systemctl status redis-server

# Test the connection
redis-cli ping
# Expected output: PONG

# Check the Redis config (optional)
redis-cli CONFIG GET maxmemory
```

If Redis is not running yet:

```bash
sudo systemctl enable redis-server
sudo systemctl start redis-server
```

## Step 4: Create the Application User

```bash
# Create a dedicated user for the application
sudo useradd -m -s /bin/bash ocr

# Create the application directory
sudo mkdir -p /opt/ocr-sprint-service
sudo chown ocr:ocr /opt/ocr-sprint-service
```

## Step 5: Clone and Install the Application

```bash
# Switch to the ocr user
sudo su - ocr

# Clone the repository (into the empty directory created in Step 4)
cd /opt
git clone https://github.com/Adriankf59/ocr-sprint-service.git
cd ocr-sprint-service

# Create a virtual environment with Python 3.12
python3.12 -m venv .venv

# Activate the virtual environment
source .venv/bin/activate

# Verify the Python version inside the venv
python --version
# Expected: Python 3.12.3

# Upgrade pip
pip install --upgrade pip setuptools wheel

# Install the application with its OCR dependencies
# This downloads roughly 1.5 GB of PaddlePaddle wheels
pip install -e ".[ocr]"

# Verify the installation
python -c "import paddleocr; print('PaddleOCR OK')"
python -c "import cv2; print('OpenCV OK')"
python -c "import fastapi; print('FastAPI OK')"
```
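
The three import checks can also be run in one pass. The sketch below is an optional convenience, not part of the project; the function name `missing_modules` is made up for illustration. It only checks that each module can be located, so a missing system library (for example `libGL`) would still surface only on a real `import`.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The same three modules checked by the shell commands in this step
missing = missing_modules(["paddleocr", "cv2", "fastapi"])
print("All modules found" if not missing else f"Missing: {', '.join(missing)}")
```

Run it with the virtual environment activated so that it inspects `.venv` rather than the system Python.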

## Step 6: Configure the Application

```bash
# Still as the ocr user
cd /opt/ocr-sprint-service

# Copy the environment template
cp .env.example .env

# Edit the configuration
nano .env
```

**Configuration in `/opt/ocr-sprint-service/.env`:**

```bash
# ==== App ====
APP_ENV=prod
APP_HOST=0.0.0.0
APP_PORT=8000
APP_LOG_LEVEL=INFO

# ==== Storage ====
STORAGE_LOCAL_DIR=/opt/ocr-sprint-service/storage
BLOB_STORAGE_DIR=/opt/ocr-sprint-service/storage/blobs
BLOB_MAX_UPLOAD_MB=25

# ==== OCR ====
OCR_LANG=latin
OCR_USE_GPU=false
OCR_MAX_IMAGE_SIDE=2200

# ==== Preprocessing ====
PREPROCESS_TARGET_DPI=300
PREPROCESS_DENOISE=true
PREPROCESS_DESKEW=true
PREPROCESS_DETECT_DOCUMENT=true
PREPROCESS_REMOVE_SHADOW=true
PREPROCESS_MIN_QUAD_AREA_FRACTION=0.20

# ==== Table Extraction ====
TABLES_ENABLED=true

# ==== Confidence ====
CONFIDENCE_AUTO_APPROVE=0.95
CONFIDENCE_NEEDS_REVIEW=0.85

# ==== LLM (Phase 5, optional; disabled for now) ====
LLM_ENABLED=false

# ==== Async Pipeline ====
QUEUE_ENABLED=true
REDIS_URL=redis://localhost:6379/0
CELERY_TASK_DEFAULT_QUEUE=ocr_sprint

# ==== Database ====
# Replace 'your-password-here' with the password you generated in Step 2
DATABASE_URL=postgresql+psycopg://ocr:your-password-here@localhost:5432/ocr_sprint
DATABASE_ECHO=false

# ==== Auth (REQUIRED for production!) ====
# Generate with: openssl rand -hex 32
API_KEYS=paste-api-key-1-here,paste-api-key-2-here
API_KEY_HEADER=X-API-Key
```
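
Before starting the service, it can help to confirm that no template placeholder was left in `.env`. The sketch below is one possible check and is not shipped with the project; `parse_env`, `unfilled_keys`, and the marker strings are assumptions based on the placeholder values shown in the template.

```python
# Placeholder fragments taken from the template values in this guide
PLACEHOLDER_MARKERS = ("your-password-here", "paste-api-key")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def unfilled_keys(env: dict) -> list:
    """Return the keys whose values still contain a template placeholder."""
    return sorted(k for k, v in env.items() if any(m in v for m in PLACEHOLDER_MARKERS))


sample = """
# ==== Database ====
DATABASE_URL=postgresql+psycopg://ocr:your-password-here@localhost:5432/ocr_sprint
API_KEYS=paste-api-key-1-here,paste-api-key-2-here
APP_PORT=8000
"""
print(unfilled_keys(parse_env(sample)))  # ['API_KEYS', 'DATABASE_URL']
```

To check the real file, pass the contents of `/opt/ocr-sprint-service/.env` to `parse_env` instead of `sample`; an empty list means every placeholder was replaced.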

**Generate API keys:**

```bash
# Generate 2 API keys
echo "API Key 1: $(openssl rand -hex 32)"
echo "API Key 2: $(openssl rand -hex 32)"
```

Copy the two generated values into `API_KEYS` in the `.env` file, separated by a comma.
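
If `openssl` is not at hand, Python's standard `secrets` module produces keys of the same shape (32 random bytes as 64 hexadecimal characters). This is an equivalent sketch, not a project tool:

```python
import secrets

# token_hex(32) returns 32 random bytes encoded as 64 hexadecimal characters
keys = [secrets.token_hex(32) for _ in range(2)]

# Prints a line ready to paste into .env
print("API_KEYS=" + ",".join(keys))
```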

**Create the storage directories:**

```bash
mkdir -p /opt/ocr-sprint-service/storage/blobs
chmod 755 /opt/ocr-sprint-service/storage
```

## Step 7: Run Database Migrations

```bash
# Still as the ocr user, with the venv activated
cd /opt/ocr-sprint-service
source .venv/bin/activate

# Run migrations
alembic upgrade head

# Verify; this should show the current revision
alembic current

# Expected output: a revision ID, marked (head) when the database is up to date
```

## Step 8: Test a Manual Run

```bash
# Still as the ocr user
cd /opt/ocr-sprint-service
source .venv/bin/activate

# Test the API server
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
```

**In another terminal (as the ubuntu user):**

```bash
# Test the health check
curl http://localhost:8000/api/v1/health

# Expected: {"status":"ok","version":"0.1.0"}

# Test with a sample file (if you have one)
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
    -H "X-API-Key: your-api-key-here" \
    -F "file=@/path/to/test.pdf"
```

If it works, stop the server with `Ctrl+C`.
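
The server can take a while to answer after startup, so a scripted health check usually retries instead of calling the endpoint once. The sketch below shows one way to do that with the standard library. It assumes the `{"status":"ok",...}` response shape shown in this step; the function names, retry count, and delay are illustrative.

```python
import json
import time
import urllib.request


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_health(url, attempts=10, delay=2.0, fetch=fetch_json, sleep=time.sleep) -> bool:
    """Poll the health endpoint until it reports status "ok" or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch(url).get("status") == "ok":
                return True
        except (OSError, ValueError):
            # URLError and HTTPError are OSError subclasses; invalid JSON raises ValueError
            pass
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example call (requires the server from this step to be running):
# wait_for_health("http://localhost:8000/api/v1/health")
```

The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running server.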

## Step 9: Set Up Systemd Services

```bash
# Exit from the ocr user
exit

# Back as the ubuntu user with sudo
```

### Create the API Service

```bash
sudo nano /etc/systemd/system/ocr-sprint-api.service
```

**Content:**

```ini
[Unit]
Description=OCR Sprint API Service
After=network.target postgresql.service redis-server.service
Wants=postgresql.service redis-server.service
# Disable start rate limiting so Restart=always never gives up
StartLimitIntervalSec=0

[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service

# Environment
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env

# Start command: 4 workers for production
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn \
    ocr_sprint.main:app \
    --host 0.0.0.0 \
    --port 8000 \
    --workers 4 \
    --log-level info

# Restart policy
Restart=always
RestartSec=10

# Resource limits
LimitNOFILE=65536

# Security
NoNewPrivileges=true
PrivateTmp=true

[Install]
WantedBy=multi-user.target
```

### Create the Celery Worker Service

```bash
sudo nano /etc/systemd/system/ocr-sprint-worker.service
```

**Content:**

```ini
[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis-server.service ocr-sprint-api.service
Wants=postgresql.service redis-server.service
# Disable start rate limiting so Restart=always never gives up
StartLimitIntervalSec=0

[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service

# Environment
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env

# Start command: concurrency 2 for a 4-core CPU
# Adjust to the number of CPU cores on your server
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery \
    -A ocr_sprint.worker.celery_app \
    worker \
    --loglevel=info \
    --concurrency=2 \
    --max-tasks-per-child=100

# Restart policy
Restart=always
RestartSec=10

# Resource limits
LimitNOFILE=65536

# Security
NoNewPrivileges=true
PrivateTmp=true

[Install]
WantedBy=multi-user.target
```

### Enable and Start the Services

```bash
# Reload systemd
sudo systemctl daemon-reload

# Enable services (auto-start on boot)
sudo systemctl enable ocr-sprint-api
sudo systemctl enable ocr-sprint-worker

# Start services
sudo systemctl start ocr-sprint-api
sudo systemctl start ocr-sprint-worker

# Check status
sudo systemctl status ocr-sprint-api
sudo systemctl status ocr-sprint-worker
```

**Expected output:** `active (running)`, shown in green.

### View Logs

```bash
# API logs (real-time)
sudo journalctl -u ocr-sprint-api -f

# Worker logs (real-time)
sudo journalctl -u ocr-sprint-worker -f

# Last 50 lines
sudo journalctl -u ocr-sprint-api -n 50
sudo journalctl -u ocr-sprint-worker -n 50
```

## Step 10: Install and Set Up Nginx

```bash
# Install Nginx and Certbot
sudo apt install -y nginx certbot python3-certbot-nginx

# Check Nginx status
sudo systemctl status nginx
```

### Create the Nginx Configuration

```bash
sudo nano /etc/nginx/sites-available/ocr-sprint
```

**Content (replace `ocr.yourdomain.com` with your domain):**

```nginx
# Upstream
upstream ocr_api {
    server 127.0.0.1:8000;
    keepalive 32;
}

# Rate limiting
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

server {
    listen 80;
    server_name ocr.yourdomain.com;

    # Max upload size
    client_max_body_size 30M;
    client_body_buffer_size 128k;

    # Timeouts
    proxy_connect_timeout 300s;
    proxy_send_timeout 300s;
    proxy_read_timeout 300s;
    send_timeout 300s;

    # Logging
    access_log /var/log/nginx/ocr-sprint-access.log;
    error_log /var/log/nginx/ocr-sprint-error.log;

    # API endpoints
    location /api/ {
        limit_req zone=api_limit burst=20 nodelay;

        proxy_pass http://ocr_api;
        proxy_http_version 1.1;

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Connection "";

        proxy_buffering off;
    }

    # Health check
    location /api/v1/health {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        access_log off;
    }

    # Metrics (restrict access)
    location /metrics {
        allow 127.0.0.1;
        allow 10.0.0.0/8;
        deny all;

        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
    }

    # API docs
    location /docs {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
    }

    location /redoc {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
    }
}
```
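
To get a feel for what `rate=10r/s` with `burst=20 nodelay` means for clients, the following sketch models the limit as a leaky bucket. It is a simplified model written for illustration, not nginx's actual implementation (which keeps its accounting in milliseconds), and the function name `simulate_limit_req` is made up. The counts it prints are what this model predicts.

```python
def simulate_limit_req(arrivals, rate=10.0, burst=20):
    """Return one bool per arrival time (seconds, ascending): True if accepted."""
    excess = 0.0   # requests currently held above the steady rate
    last = None    # time of the last accepted request
    decisions = []
    for t in arrivals:
        if last is None:
            candidate = 0.0  # the first request starts with an empty bucket
        else:
            # the bucket drains at `rate` per second, then this request adds 1
            candidate = max(0.0, excess - rate * (t - last) + 1.0)
        if candidate > burst:
            decisions.append(False)  # rejected; bucket state is left unchanged
        else:
            excess, last = candidate, t
            decisions.append(True)
    return decisions


# 30 requests arriving at the same instant from one client IP
flood = simulate_limit_req([0.0] * 30)
print(sum(flood), "accepted,", flood.count(False), "rejected")
```

In this model a client that stays at or below 10 requests per second is never rejected, while a sudden flood is cut off once the burst allowance is used up. By default nginx answers rejected requests with status 503.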

### Enable the Site

```bash
# Enable the site
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/

# Test the configuration (after linking, so the new site is included in the test)
sudo nginx -t

# Reload Nginx
sudo systemctl reload nginx
```

### Set Up SSL (if you have a domain)

```bash
# Obtain a certificate
sudo certbot --nginx -d ocr.yourdomain.com

# Test auto-renewal
sudo certbot renew --dry-run
```

## Step 11: Set Up the Firewall

```bash
# Check UFW status
sudo ufw status

# Allow SSH (IMPORTANT: do this before enabling the firewall)
sudo ufw allow 22/tcp

# Allow HTTP and HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Enable the firewall (if not already enabled)
sudo ufw enable

# Verify
sudo ufw status numbered
```

## Step 12: Final Verification

### Test from the Server

```bash
# Health check
curl http://localhost:8000/api/v1/health

# Test the async endpoint
curl -X POST http://localhost:8000/api/v1/documents \
    -H "X-API-Key: your-api-key-here" \
    -F "file=@/path/to/test.pdf"

# Expected: {"job_id":"...","status":"pending",...}

# Check the job status
curl -H "X-API-Key: your-api-key-here" \
    http://localhost:8000/api/v1/documents/JOB_ID_HERE
```

### Test via the Domain (if SSL is set up)

```bash
curl https://ocr.yourdomain.com/api/v1/health
```

### Check Services

```bash
# All services should be active
sudo systemctl status ocr-sprint-api
sudo systemctl status ocr-sprint-worker
sudo systemctl status postgresql
sudo systemctl status redis-server
sudo systemctl status nginx
```

## Monitoring

### View Logs

```bash
# API logs
sudo journalctl -u ocr-sprint-api -f

# Worker logs
sudo journalctl -u ocr-sprint-worker -f

# Nginx access logs
sudo tail -f /var/log/nginx/ocr-sprint-access.log

# Nginx error logs
sudo tail -f /var/log/nginx/ocr-sprint-error.log
```

### Prometheus Metrics

```bash
# View metrics
curl http://localhost:8000/metrics

# Key metrics:
# - ocr_documents_total
# - ocr_processing_duration_seconds
# - ocr_confidence_score
```

## Maintenance

### Restart Services

```bash
sudo systemctl restart ocr-sprint-api
sudo systemctl restart ocr-sprint-worker
```

### Update the Application

```bash
# Switch to the ocr user
sudo su - ocr
cd /opt/ocr-sprint-service

# Pull the latest code
git pull

# Activate the venv
source .venv/bin/activate

# Update dependencies
pip install -e ".[ocr]"

# Run migrations
alembic upgrade head

# Exit
exit

# Restart services
sudo systemctl restart ocr-sprint-api
sudo systemctl restart ocr-sprint-worker

# Check logs
sudo journalctl -u ocr-sprint-api -n 50
```

### Database Backup

```bash
# Create the backup directory
sudo mkdir -p /opt/ocr-sprint-service/backups
sudo chown ocr:ocr /opt/ocr-sprint-service/backups

# Manual backup (the whole pipeline runs as ocr so the redirect can write to the ocr-owned directory)
sudo -u ocr sh -c 'pg_dump -h localhost -U ocr ocr_sprint | gzip > /opt/ocr-sprint-service/backups/backup_$(date +%Y%m%d_%H%M%S).sql.gz'
```

**Set up automated backups:**

```bash
# Create the backup script
sudo nano /opt/ocr-sprint-service/backup.sh
```

```bash
#!/bin/bash
# Stop on errors, unset variables, and failures anywhere in a pipeline
set -euo pipefail

BACKUP_DIR="/opt/ocr-sprint-service/backups"
DATE=$(date +%Y%m%d_%H%M%S)

mkdir -p "$BACKUP_DIR"

# Back up the database
PGPASSWORD='your-db-password' pg_dump -h localhost -U ocr ocr_sprint | gzip > "$BACKUP_DIR/db_$DATE.sql.gz"

# Delete backups older than 7 days
find "$BACKUP_DIR" -name "db_*.sql.gz" -mtime +7 -delete

echo "Backup completed: $DATE"
```
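
The `-mtime +7` test in the cleanup command is easy to misread. GNU `find` truncates a file's age to whole days before comparing, so `+7` matches files that are at least 8 full days old, not 7. The sketch below mirrors that rule in Python for illustration; `is_expired` is a hypothetical helper, not part of the backup script.

```python
SECONDS_PER_DAY = 86400


def is_expired(mtime: float, now: float, days: int = 7) -> bool:
    """Mimic `find -mtime +N`: truncate the age to whole days, then compare with >."""
    age_days = int((now - mtime) // SECONDS_PER_DAY)
    return age_days > days


now = 1_700_000_000  # arbitrary reference timestamp
print(is_expired(now - 7 * SECONDS_PER_DAY - 3600, now))  # 7 days and 1 hour old
print(is_expired(now - 8 * SECONDS_PER_DAY, now))         # exactly 8 days old
```

The first call reports `False` and the second `True`, so with a daily backup the directory holds roughly eight files at any time.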

```bash
# Make the script executable
sudo chmod +x /opt/ocr-sprint-service/backup.sh
sudo chown ocr:ocr /opt/ocr-sprint-service/backup.sh

# Create a log file that the ocr user can write to
sudo touch /var/log/ocr-backup.log
sudo chown ocr:ocr /var/log/ocr-backup.log

# Set up cron (daily at 2 AM)
sudo crontab -e -u ocr

# Add this line:
0 2 * * * /opt/ocr-sprint-service/backup.sh >> /var/log/ocr-backup.log 2>&1
```

## Troubleshooting

### Service does not start

```bash
# Check detailed logs
sudo journalctl -u ocr-sprint-api -n 100 --no-pager
sudo journalctl -u ocr-sprint-worker -n 100 --no-pager

# Check file permissions
ls -la /opt/ocr-sprint-service
ls -la /opt/ocr-sprint-service/storage

# Test a manual run
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
```

### Database connection error

```bash
# Test the connection
sudo -u ocr psql -h localhost -U ocr -d ocr_sprint

# Check PostgreSQL status
sudo systemctl status postgresql

# Check PostgreSQL logs
sudo journalctl -u postgresql -n 50
```

### Redis connection error

```bash
# Test Redis
redis-cli ping

# Check Redis status
sudo systemctl status redis-server

# Check Redis logs
sudo journalctl -u redis-server -n 50
```

### Worker is not processing jobs

```bash
# Check Celery worker status
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate
celery -A ocr_sprint.worker.celery_app inspect active
celery -A ocr_sprint.worker.celery_app inspect stats

# Check the length of the Redis queue
redis-cli LLEN ocr_sprint
```

### PaddleOCR error

```bash
# Re-download the models
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate

python << EOF
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang='latin')
print("Models downloaded successfully")
EOF
```

## Performance Tuning

### Check CPU cores

```bash
nproc
```

### Adjust worker concurrency

```bash
# Edit the worker service
sudo nano /etc/systemd/system/ocr-sprint-worker.service

# For 4 cores: --concurrency=2
# For 8 cores: --concurrency=4
# For 16 cores: --concurrency=8

# Reload and restart
sudo systemctl daemon-reload
sudo systemctl restart ocr-sprint-worker
```
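
The three examples above follow a simple pattern: use half of the available cores. A small sketch of that rule, with a floor of one worker for single-core machines (the floor and the function name `suggested_concurrency` are additions for illustration):

```python
import os


def suggested_concurrency(cores: int) -> int:
    """Half the cores, but never fewer than one worker process."""
    return max(1, cores // 2)


cores = os.cpu_count() or 1  # os.cpu_count() can return None
print(f"{cores} cores -> --concurrency={suggested_concurrency(cores)}")
```

Treat the result as a starting point; OCR is CPU- and memory-heavy, so the right value also depends on available RAM.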

### PostgreSQL 16 Tuning

```bash
sudo nano /etc/postgresql/16/main/postgresql.conf
```

**Recommended settings (adjust to the server's RAM and apply only one of the two RAM profiles):**

```
# For 8 GB RAM:
shared_buffers = 2GB
effective_cache_size = 6GB
maintenance_work_mem = 512MB
work_mem = 8MB

# For 16 GB RAM:
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
work_mem = 10MB

# General
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
```

```bash
sudo systemctl restart postgresql
```
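
For servers with a different amount of RAM, the two profiles above suggest fixed ratios: `shared_buffers` at 25% of RAM, `effective_cache_size` at 75%, and `maintenance_work_mem` at one sixteenth. The sketch below extrapolates from those ratios; it is an inference from the two examples, not an official formula, and `work_mem` is left out because its two values do not follow a simple ratio.

```python
def pg_memory_settings(ram_gb: int) -> dict:
    """Scale the memory settings from total RAM using the ratios in the profiles above."""
    return {
        "shared_buffers": f"{ram_gb // 4}GB",            # 25% of RAM
        "effective_cache_size": f"{ram_gb * 3 // 4}GB",  # 75% of RAM
        "maintenance_work_mem": f"{ram_gb * 64}MB",      # RAM / 16, expressed in MB
    }


for name, value in pg_memory_settings(8).items():
    print(f"{name} = {value}")
```

For 8 GB this reproduces the first profile; for 16 GB it yields `1024MB` for `maintenance_work_mem`, which equals the `1GB` shown above.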

## Security Checklist

- [ ] API keys set to strong random values
- [ ] Database password changed from the default
- [ ] Firewall enabled (UFW)
- [ ] SSL/TLS enabled (if you have a domain)
- [ ] `/metrics` endpoint restricted
- [ ] PostgreSQL listens only on localhost
- [ ] Redis listens only on localhost
- [ ] Backups automated (cron job)
- [ ] OS security updates enabled

## Next Steps

1. **Set up monitoring** - Install Prometheus + Grafana (optional)
2. **Set up alerting** - Email/Slack notifications for errors
3. **Load testing** - Test with production document volumes
4. **Backup verification** - Test restoring from a backup
5. **Documentation** - Document the API keys for the team

## Support

For questions or issues, contact the development team.