# OCR Sprint Service Deployment Manual (Without Docker)
A complete guide to deploying OCR Sprint Service directly on a server, without Docker.
## Server Prerequisites
### Minimum Specifications
- **OS**: Ubuntu 20.04+ / Debian 11+ / RHEL 8+
- **CPU**: 4 cores (8 cores recommended)
- **RAM**: 8 GB minimum (16 GB recommended)
- **Storage**: 50 GB free space
- **User**: Non-root user with sudo access
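The minimums above can be checked before installing anything. The following is a minimal sketch, assuming a Linux host with `nproc`, `awk`, `df`, and `/proc/meminfo`; the function names `meets_minimum` and `preflight` are illustrative and not part of the project.

```bash
# Minimums taken from the list above
MIN_CORES=4
MIN_RAM_GB=8
MIN_DISK_GB=50

# meets_minimum LABEL ACTUAL REQUIRED
# Prints a status line and returns 1 when ACTUAL is below REQUIRED.
meets_minimum() {
  if [ "$2" -ge "$3" ]; then
    echo "OK   $1: $2 (minimum $3)"
  else
    echo "FAIL $1: $2 (minimum $3)"
    return 1
  fi
}

# preflight [DIR]: check CPU cores, total RAM, and free disk space under DIR
preflight() (
  dir=${1:-/}
  cores=$(nproc)
  # MemTotal is in KiB; round to the nearest GiB because part of RAM is reserved
  ram_gb=$(awk '/^MemTotal:/ { printf "%d", ($2 + 524288) / 1048576 }' /proc/meminfo)
  # df -Pk reports available space in KiB in column 4
  disk_gb=$(df -Pk "$dir" | awk 'NR == 2 { printf "%d", $4 / 1048576 }')
  status=0
  meets_minimum "CPU cores" "$cores" "$MIN_CORES" || status=1
  meets_minimum "RAM (GB)" "$ram_gb" "$MIN_RAM_GB" || status=1
  meets_minimum "Free disk (GB)" "$disk_gb" "$MIN_DISK_GB" || status=1
  exit "$status"
)
```

Running `preflight /opt` prints one line per resource and returns a non-zero status if any of them is below the minimum.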
### Required Ports
- `8000`: API server (internal, proxied by Nginx)
- `80/443`: HTTP/HTTPS (Nginx)
- `5432`: PostgreSQL (localhost only)
- `6379`: Redis (localhost only)
## Step 1: Install System Dependencies
### Ubuntu/Debian
```bash
# Update system
sudo apt update && sudo apt upgrade -y
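# Install Python 3.11 (the deadsnakes PPA is Ubuntu-only; Debian 12 ships python3.11 in its default repositories)
sudo apt install -y software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt update
sudo apt install -y python3.11 python3.11-venv python3.11-dev python3-pip
# Install system libraries for OpenCV and PaddleOCR
# (on newer releases where libgl1-mesa-glx is no longer available, install libgl1 instead)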
sudo apt install -y \
libgl1-mesa-glx \
libglib2.0-0 \
libsm6 \
libxext6 \
libxrender1 \
libgomp1 \
libmagic1 \
build-essential \
git \
curl \
wget
# Install Redis
sudo apt install -y redis-server
sudo systemctl enable redis-server
sudo systemctl start redis-server
# Install PostgreSQL
sudo apt install -y postgresql postgresql-contrib
sudo systemctl enable postgresql
sudo systemctl start postgresql
```
### RHEL/CentOS/Rocky Linux
```bash
# Update system
sudo dnf update -y
# Install Python 3.11
sudo dnf install -y python3.11 python3.11-devel python3.11-pip
# Install system libraries
sudo dnf install -y \
mesa-libGL \
glib2 \
libSM \
libXext \
libXrender \
file-libs \
gcc \
gcc-c++ \
make \
git
# Install Redis
sudo dnf install -y redis
sudo systemctl enable redis
sudo systemctl start redis
# Install PostgreSQL
sudo dnf install -y postgresql-server postgresql-contrib
sudo postgresql-setup --initdb
sudo systemctl enable postgresql
sudo systemctl start postgresql
```
## Step 2: Set Up the PostgreSQL Database
```bash
# Open psql as the postgres user
sudo -u postgres psql
# Then run the following SQL commands:
```
```sql
-- Create the user and database (replace the placeholder password with a strong one)
CREATE USER ocr WITH PASSWORD 'ganti-dengan-password-kuat';
CREATE DATABASE ocr_sprint OWNER ocr;
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE ocr_sprint TO ocr;
-- Connect to the database
\c ocr_sprint
-- Grant schema privileges (PostgreSQL 15+)
GRANT ALL ON SCHEMA public TO ocr;
-- Exit
\q
```
**PostgreSQL access configuration (optional):**
```bash
# Edit postgresql.conf (the path assumes PostgreSQL 14 on Debian/Ubuntu; adjust the version directory to match your install)
sudo nano /etc/postgresql/14/main/postgresql.conf
# Uncomment and set:
listen_addresses = 'localhost' # Keep localhost for security
# Edit pg_hba.conf
sudo nano /etc/postgresql/14/main/pg_hba.conf
# Add this line (it applies to local Unix-socket connections):
local ocr_sprint ocr scram-sha-256
# Restart PostgreSQL
sudo systemctl restart postgresql
```
## Step 3: Set Up the Application User
```bash
# Create a dedicated user for the application
sudo useradd -m -s /bin/bash ocr
sudo usermod -aG sudo ocr # Optional, for maintenance (the admin group is wheel on RHEL)
# Create application directory
sudo mkdir -p /opt/ocr-sprint-service
sudo chown ocr:ocr /opt/ocr-sprint-service
# Switch to the ocr user
sudo su - ocr
```
## Step 4: Install the Application
```bash
# Clone the repository into the application directory (it must still be empty)
git clone https://github.com/Adriankf59/ocr-sprint-service.git /opt/ocr-sprint-service
cd /opt/ocr-sprint-service
# Create virtual environment
python3.11 -m venv .venv
# Activate virtual environment
source .venv/bin/activate
# Upgrade pip
pip install --upgrade pip setuptools wheel
# Install the application with OCR dependencies
pip install -e ".[ocr]"
# Verify installation
python -c "import paddleocr; print('PaddleOCR installed successfully')"
```
## Step 5: Configure the Application
```bash
# Copy environment template
cp .env.example .env
# Edit the configuration
nano .env
```
**Production configuration (`/opt/ocr-sprint-service/.env`):**
```bash
# ==== App ====
APP_ENV=prod
APP_HOST=0.0.0.0
APP_PORT=8000
APP_LOG_LEVEL=INFO
# ==== Storage ====
STORAGE_LOCAL_DIR=/opt/ocr-sprint-service/storage
BLOB_STORAGE_DIR=/opt/ocr-sprint-service/storage/blobs
BLOB_MAX_UPLOAD_MB=25
# ==== OCR ====
OCR_LANG=latin
OCR_USE_GPU=false
OCR_MAX_IMAGE_SIDE=2200
# ==== Preprocessing ====
PREPROCESS_TARGET_DPI=300
PREPROCESS_DENOISE=true
PREPROCESS_DESKEW=true
PREPROCESS_DETECT_DOCUMENT=true
PREPROCESS_REMOVE_SHADOW=true
PREPROCESS_MIN_QUAD_AREA_FRACTION=0.20
# ==== Table Extraction ====
TABLES_ENABLED=true
# ==== Confidence ====
CONFIDENCE_AUTO_APPROVE=0.95
CONFIDENCE_NEEDS_REVIEW=0.85
# ==== LLM (Phase 5, optional) ====
LLM_ENABLED=false
# ==== Async Pipeline ====
QUEUE_ENABLED=true
REDIS_URL=redis://localhost:6379/0
CELERY_TASK_DEFAULT_QUEUE=ocr_sprint
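# ==== Database ====
# Replace the placeholder password with the one set when creating the database user
DATABASE_URL=postgresql+psycopg://ocr:ganti-dengan-password-kuat@localhost:5432/ocr_sprint
DATABASE_ECHO=false
# ==== Auth (REQUIRED!) ====
# Replace the placeholder keys with random strings
API_KEYS=key1-ganti-dengan-random-string,key2-ganti-dengan-random-string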
API_KEY_HEADER=X-API-Key
```
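Because `DATABASE_URL` embeds the password in a URL, characters such as `@`, `/`, or `:` in the password generally need percent-encoding. The sketch below assumes the URL is parsed as a standard SQLAlchemy-style URL and that the password is ASCII; `urlencode` is an illustrative helper, not part of the project.

```bash
# urlencode STRING
# Percent-encodes every character except the RFC 3986 unreserved set.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    s=$rest
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
)
```

For example, `urlencode 'p@ss/w:rd'` prints `p%40ss%2Fw%3Ard`, which can then be placed between `ocr:` and `@localhost` in `DATABASE_URL`.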
**Generate secure API keys:**
```bash
# Generate 2 API keys
openssl rand -hex 32
openssl rand -hex 32
```
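The generated keys go into `API_KEYS` as a comma-separated list. A minimal sketch that produces a ready-to-paste line is shown here; `gen_key` and `api_keys_line` are illustrative names, and the fallback to `/dev/urandom` when `openssl` is missing is an assumption about the host.

```bash
# gen_key: 32 random bytes as 64 hex characters
gen_key() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback: read 32 bytes from the kernel RNG and print them as hex
    od -An -tx1 -N32 /dev/urandom | tr -d ' \n'
    echo
  fi
}

# api_keys_line [COUNT]: print an API_KEYS=... line with COUNT keys (default 2)
api_keys_line() (
  count=${1:-2}
  keys=
  i=0
  while [ "$i" -lt "$count" ]; do
    keys=${keys:+$keys,}$(gen_key)
    i=$((i + 1))
  done
  printf 'API_KEYS=%s\n' "$keys"
)
```

The output of `api_keys_line 2` can replace the existing `API_KEYS=` line in `.env`.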
**Create storage directories:**
```bash
mkdir -p /opt/ocr-sprint-service/storage/blobs
chmod 755 /opt/ocr-sprint-service/storage
```
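Before moving on, it can help to confirm that `.env` no longer contains the template placeholders. This is a sketch under the assumption that `DATABASE_URL`, `REDIS_URL`, and `API_KEYS` are the settings that must be present; `check_env` is an illustrative helper, and the application may require more.

```bash
# check_env FILE
# Lists problems found in FILE and returns non-zero if there are any.
check_env() (
  file=$1
  status=0
  for key in DATABASE_URL REDIS_URL API_KEYS; do
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing or empty: $key"
      status=1
    fi
  done
  # The template placeholders all contain the string "ganti-dengan"
  if grep -Eq '^(DATABASE_URL|API_KEYS)=.*ganti-dengan' "$file"; then
    echo "placeholder value still present (ganti-dengan-...)"
    status=1
  fi
  exit "$status"
)
```

Run it as `check_env /opt/ocr-sprint-service/.env`; no output and a zero exit status mean none of these checks failed.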
## Step 6: Run Database Migrations
```bash
# Still as the ocr user, with the venv activated
cd /opt/ocr-sprint-service
source .venv/bin/activate
# Run migrations
alembic upgrade head
# Verify
alembic current
```
## Step 7: Manual Test Run
```bash
# Test API server
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
# In another terminal, run the health check
curl http://localhost:8000/api/v1/health
# If it succeeds, stop the server with Ctrl+C
```
## Step 8: Set Up Systemd Services
### API Service
```bash
# Exit the ocr user's shell and return to a user with sudo
exit
# Create systemd service file
sudo nano /etc/systemd/system/ocr-sprint-api.service
```
**Content `/etc/systemd/system/ocr-sprint-api.service`:**
```ini
[Unit]
Description=OCR Sprint API Service
After=network.target postgresql.service redis.service
Wants=postgresql.service redis.service
[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
# Environment
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
# Start command - 4 workers for production
ExecStart=/opt/ocr-sprint-service/.venv/bin/uvicorn \
    ocr_sprint.main:app \
    --host 0.0.0.0 \
    --port 8000 \
    --workers 4 \
    --log-level info
# Restart policy
Restart=always
RestartSec=10
StartLimitInterval=0
# Resource limits (MemoryMax= replaces the deprecated MemoryLimit=)
LimitNOFILE=65536
MemoryMax=6G
# Security
NoNewPrivileges=true
PrivateTmp=true
[Install]
WantedBy=multi-user.target
```
### Celery Worker Service
```bash
sudo nano /etc/systemd/system/ocr-sprint-worker.service
```
**Content `/etc/systemd/system/ocr-sprint-worker.service`:**
```ini
[Unit]
Description=OCR Sprint Celery Worker
After=network.target postgresql.service redis.service ocr-sprint-api.service
Wants=postgresql.service redis.service
[Service]
Type=simple
User=ocr
Group=ocr
WorkingDirectory=/opt/ocr-sprint-service
# Environment
Environment="PATH=/opt/ocr-sprint-service/.venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=/opt/ocr-sprint-service/.env
# Start command - concurrency 2 for a 4-core CPU
ExecStart=/opt/ocr-sprint-service/.venv/bin/celery \
    -A ocr_sprint.worker.celery_app \
    worker \
    --loglevel=info \
    --concurrency=2 \
    --max-tasks-per-child=100
# Restart policy
Restart=always
RestartSec=10
StartLimitInterval=0
# Resource limits (MemoryMax= replaces the deprecated MemoryLimit=)
LimitNOFILE=65536
MemoryMax=4G
# Security
NoNewPrivileges=true
PrivateTmp=true
[Install]
WantedBy=multi-user.target
```
### Enable and Start Services
```bash
# Reload systemd
sudo systemctl daemon-reload
# Enable services (auto-start on boot)
sudo systemctl enable ocr-sprint-api
sudo systemctl enable ocr-sprint-worker
# Start services
sudo systemctl start ocr-sprint-api
sudo systemctl start ocr-sprint-worker
# Check status
sudo systemctl status ocr-sprint-api
sudo systemctl status ocr-sprint-worker
# View logs
sudo journalctl -u ocr-sprint-api -f
sudo journalctl -u ocr-sprint-worker -f
```
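The API may need some time after `systemctl start` before it answers requests, so a single health check right away can fail. A small retry helper is sketched below; `retry` is an illustrative name, and the attempt count and delay in the usage line are arbitrary choices.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping in between.
retry() (
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      exit 1   # give up: the command never succeeded
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  exit 0
)
```

For example, `retry 30 2 curl -fsS http://localhost:8000/api/v1/health` polls the health endpoint for up to about a minute and returns non-zero if it never responds successfully.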
## Step 9: Set Up the Nginx Reverse Proxy
### Install Nginx
```bash
sudo apt install -y nginx certbot python3-certbot-nginx
```
### Configure Nginx
```bash
sudo nano /etc/nginx/sites-available/ocr-sprint
```
**Content `/etc/nginx/sites-available/ocr-sprint`:**
```nginx
# Upstream for load balancing (if scaling horizontally)
upstream ocr_api {
    server 127.0.0.1:8000;
    keepalive 32;
}
# Rate limiting
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
server {
    listen 80;
    server_name ocr.yourdomain.com; # Replace with your domain
    # Max upload size (keep it at or above BLOB_MAX_UPLOAD_MB)
    client_max_body_size 30M;
    client_body_buffer_size 128k;
    # Timeouts for large documents
    proxy_connect_timeout 300s;
    proxy_send_timeout 300s;
    proxy_read_timeout 300s;
    send_timeout 300s;
    # Logging
    access_log /var/log/nginx/ocr-sprint-access.log;
    error_log /var/log/nginx/ocr-sprint-error.log;
    # API endpoints
    location /api/ {
        # Rate limiting
        limit_req zone=api_limit burst=20 nodelay;
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        # Headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Connection "";
        # Disable buffering for streaming responses
        proxy_buffering off;
    }
    # Health check endpoint (no rate limit)
    location /api/v1/health {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        access_log off;
    }
    # Metrics endpoint (restrict access)
    location /metrics {
        # Allow only from internal network
        allow 10.0.0.0/8;
        allow 172.16.0.0/12;
        allow 192.168.0.0/16;
        allow 127.0.0.1;
        deny all;
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
    }
    # Docs (optional, can be disabled in production)
    location /docs {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
    }
    location /redoc {
        proxy_pass http://ocr_api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
    }
}
```
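Nginx rejects a request body larger than `client_max_body_size` with a 413 response before it reaches the application, so that value should not be lower than `BLOB_MAX_UPLOAD_MB`. The check below is a sketch that assumes both values are written in megabytes as in this manual; `check_upload_limits` is an illustrative helper.

```bash
# check_upload_limits ENV_FILE NGINX_CONF
# Succeeds when client_max_body_size (in M) is at least BLOB_MAX_UPLOAD_MB.
check_upload_limits() (
  app_mb=$(sed -n 's/^BLOB_MAX_UPLOAD_MB=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1)
  nginx_mb=$(sed -n 's/^[[:space:]]*client_max_body_size[[:space:]][[:space:]]*\([0-9][0-9]*\)[Mm];.*/\1/p' "$2" | head -n 1)
  if [ -z "$app_mb" ] || [ -z "$nginx_mb" ]; then
    echo "could not read both limits"
    exit 2
  fi
  if [ "$nginx_mb" -ge "$app_mb" ]; then
    echo "OK: nginx ${nginx_mb}M >= app ${app_mb}M"
  else
    echo "FAIL: nginx ${nginx_mb}M < app ${app_mb}M"
    exit 1
  fi
)
```

With the values in this manual, `check_upload_limits /opt/ocr-sprint-service/.env /etc/nginx/sites-available/ocr-sprint` reports 30M against 25M.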
### Enable Site
```bash
# Enable site
sudo ln -s /etc/nginx/sites-available/ocr-sprint /etc/nginx/sites-enabled/
# Remove default site (optional)
sudo rm -f /etc/nginx/sites-enabled/default
# Test the configuration (after enabling, so the new site is included)
sudo nginx -t
# Reload Nginx
sudo systemctl reload nginx
```
### Set Up SSL with Let's Encrypt
```bash
# Install certbot
sudo apt install -y certbot python3-certbot-nginx
# Obtain certificate (replace with your domain)
sudo certbot --nginx -d ocr.yourdomain.com
# Test auto-renewal
sudo certbot renew --dry-run
```
Certbot automatically updates the Nginx configuration for HTTPS.
## Step 10: Set Up the Firewall
```bash
# Install UFW (if not already installed)
sudo apt install -y ufw
# Allow SSH (IMPORTANT: do this before enabling UFW, or you may lock yourself out)
sudo ufw allow 22/tcp
# Allow HTTP and HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Enable firewall
sudo ufw enable
# Check status
sudo ufw status
```
## Step 11: Verify the Deployment
### Test from the Server
```bash
# Health check
curl http://localhost:8000/api/v1/health
# Test with an API key (the URL is quoted so the shell does not interpret "?")
curl -X POST "http://localhost:8000/api/v1/documents?sync=true" \
    -H "X-API-Key: your-api-key-here" \
    -F "file=@/path/to/test.pdf"
```
### Test from a Client
```bash
# Health check via domain
curl https://ocr.yourdomain.com/api/v1/health
# Upload a document
curl -X POST https://ocr.yourdomain.com/api/v1/documents \
-H "X-API-Key: your-api-key-here" \
-F "file=@document.pdf"
```
## Monitoring and Maintenance
### View Logs
```bash
# API logs
sudo journalctl -u ocr-sprint-api -f
# Worker logs
sudo journalctl -u ocr-sprint-worker -f
# Nginx logs
sudo tail -f /var/log/nginx/ocr-sprint-access.log
sudo tail -f /var/log/nginx/ocr-sprint-error.log
# PostgreSQL logs
sudo tail -f /var/log/postgresql/postgresql-14-main.log
```
### Service Management
```bash
# Restart services
sudo systemctl restart ocr-sprint-api
sudo systemctl restart ocr-sprint-worker
# Stop services
sudo systemctl stop ocr-sprint-api
sudo systemctl stop ocr-sprint-worker
# Check status
sudo systemctl status ocr-sprint-api
sudo systemctl status ocr-sprint-worker
```
### Database Backup
```bash
# Create backup script
sudo nano /opt/ocr-sprint-service/backup.sh
```
**Content `backup.sh`:**
```bash
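#!/bin/bash
# Stop on errors, including a failed pg_dump inside the pipeline
set -euo pipefail
BACKUP_DIR="/opt/ocr-sprint-service/backups"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"
# Backup database. Non-interactive runs (cron) need the password in ~/.pgpass of the
# user running the script: localhost:5432:ocr_sprint:ocr:<password>, file mode 0600
pg_dump -U ocr -h localhost ocr_sprint | gzip > "$BACKUP_DIR/db_$DATE.sql.gz"
# Backup blobs (optional, can be large)
# tar -czf "$BACKUP_DIR/blobs_$DATE.tar.gz" /opt/ocr-sprint-service/storage/blobs
# Keep only the last 7 days
find "$BACKUP_DIR" -name "db_*.sql.gz" -mtime +7 -delete
echo "Backup completed: $DATE"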
```
```bash
# Make executable
chmod +x /opt/ocr-sprint-service/backup.sh
# Setup cron job (daily at 2 AM)
sudo crontab -e
# Add line:
0 2 * * * /opt/ocr-sprint-service/backup.sh >> /var/log/ocr-backup.log 2>&1
```
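A backup is only useful if the archive is intact and actually contains a dump. The following sketch checks both; `verify_backup` is an illustrative helper, and it only confirms that the file decompresses to non-empty content, not that the SQL restores cleanly.

```bash
# verify_backup FILE
# Succeeds only if FILE is a valid gzip archive whose content is not empty.
verify_backup() {
  # gzip -t checks archive integrity; grep -q . requires at least one non-empty line
  gzip -t "$1" 2>/dev/null && gzip -dc "$1" 2>/dev/null | grep -q .
}

# latest_backup DIR: print the newest db_*.sql.gz file in DIR (empty if none)
latest_backup() {
  ls -1t "$1"/db_*.sql.gz 2>/dev/null | head -n 1
}
```

For example, `verify_backup "$(latest_backup /opt/ocr-sprint-service/backups)"` can be run after the cron job to catch an empty or truncated dump.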
### Log Rotation
```bash
sudo nano /etc/logrotate.d/ocr-sprint
```
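**Content** (on Debian/Ubuntu the packaged `/etc/logrotate.d/nginx` already rotates `/var/log/nginx/*.log`, and logrotate reports duplicate log entries if both rules match the same files, so add this file only where no such rule exists):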
```
/var/log/nginx/ocr-sprint-*.log {
daily
rotate 14
compress
delaycompress
notifempty
create 0640 www-data adm
sharedscripts
postrotate
[ -f /var/run/nginx.pid ] && kill -USR1 `cat /var/run/nginx.pid`
endscript
}
```
## Update Application
```bash
# Switch to the ocr user
sudo su - ocr
cd /opt/ocr-sprint-service
# Pull latest code
git pull
# Activate venv
source .venv/bin/activate
# Update dependencies
pip install -e ".[ocr]"
# Run migrations
alembic upgrade head
# Exit user ocr
exit
# Restart services
sudo systemctl restart ocr-sprint-api
sudo systemctl restart ocr-sprint-worker
# Check logs
sudo journalctl -u ocr-sprint-api -n 50
```
## Performance Tuning
### Increase Worker Concurrency
```bash
# Edit worker service
sudo nano /etc/systemd/system/ocr-sprint-worker.service
# Set --concurrency according to the number of CPU cores
# For 8 cores: --concurrency=4
# For 16 cores: --concurrency=8
# Reload and restart
sudo systemctl daemon-reload
sudo systemctl restart ocr-sprint-worker
```
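The values in this manual (2 for 4 cores, 4 for 8 cores, 8 for 16 cores) follow a pattern of half the core count. A sketch of that pattern is given here; treating it as a general rule is an assumption, and `recommended_concurrency` is an illustrative name.

```bash
# recommended_concurrency [CORES]
# Prints half the core count (at least 1); defaults to the cores reported by nproc.
recommended_concurrency() (
  cores=${1:-$(nproc)}
  c=$((cores / 2))
  if [ "$c" -lt 1 ]; then
    c=1
  fi
  echo "$c"
)
```

`recommended_concurrency 8` prints `4`; without an argument it uses the current machine's core count. OCR workloads are memory-heavy, so available RAM may call for a lower value.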
### PostgreSQL Tuning
```bash
sudo nano /etc/postgresql/14/main/postgresql.conf
```
**Recommended settings for 16 GB RAM (the `random_page_cost` and `effective_io_concurrency` values assume SSD storage):**
```
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 10MB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
```
```bash
sudo systemctl restart postgresql
```
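The memory values above match a common rule of thumb: `shared_buffers` at about 25% of RAM and `effective_cache_size` at about 75%. The sketch below scales them to other RAM sizes; the 1/16 ratio for `maintenance_work_mem` is an assumption chosen to reproduce the 1GB value above, and `pg_memory_settings` is an illustrative name.

```bash
# pg_memory_settings RAM_GB
# Prints memory-related postgresql.conf lines scaled to the given RAM size (in MB).
pg_memory_settings() (
  ram_mb=$(($1 * 1024))
  echo "shared_buffers = $((ram_mb / 4))MB"
  echo "effective_cache_size = $((ram_mb * 3 / 4))MB"
  echo "maintenance_work_mem = $((ram_mb / 16))MB"
)
```

`pg_memory_settings 16` yields 4096MB, 12288MB, and 1024MB, the same values as the 16 GB settings listed above. Treat the output as a starting point and adjust it to the actual workload.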
### Redis Tuning
```bash
sudo nano /etc/redis/redis.conf
```
**Recommended settings:**
```
maxmemory 2gb
maxmemory-policy allkeys-lru
# Disable RDB snapshots for performance (redis.conf does not accept trailing comments)
save ""
```
```bash
sudo systemctl restart redis
```
## Troubleshooting
### Service does not start
```bash
# Check logs
sudo journalctl -u ocr-sprint-api -n 100 --no-pager
sudo journalctl -u ocr-sprint-worker -n 100 --no-pager
# Check permissions
ls -la /opt/ocr-sprint-service
ls -la /opt/ocr-sprint-service/storage
# Test manual run
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate
uvicorn ocr_sprint.main:app --host 0.0.0.0 --port 8000
```
### Database connection error
```bash
# Test connection
sudo -u ocr psql -h localhost -U ocr -d ocr_sprint
# Check PostgreSQL status
sudo systemctl status postgresql
# Check pg_hba.conf
sudo cat /etc/postgresql/14/main/pg_hba.conf | grep ocr
```
### Redis connection error
```bash
# Test Redis
redis-cli ping
# Check Redis status
sudo systemctl status redis
# Check Redis logs
sudo journalctl -u redis -n 50
```
### PaddleOCR model download fails
```bash
# Manual download
sudo su - ocr
cd /opt/ocr-sprint-service
source .venv/bin/activate
python << EOF
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang='latin')
print("Models downloaded successfully")
EOF
```
### Out of memory
```bash
# Check memory usage
free -h
htop
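# Reduce worker concurrency
sudo nano /etc/systemd/system/ocr-sprint-worker.service
# Set --concurrency=1, then reload and restart the worker
sudo systemctl daemon-reload
sudo systemctl restart ocr-sprint-worker
# Add swap (if needed)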
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```
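The two unit files in this manual cap the API at 6G and the worker at 4G, which adds up to 10G and exceeds the 8 GB minimum RAM. A sketch for comparing the configured caps with installed memory follows; the helper names are illustrative, and the parser only understands values written in whole gigabytes such as `6G`.

```bash
# sum_memory_limits_gb UNIT_FILE...
# Adds up MemoryMax=/MemoryLimit= values given in whole gigabytes (for example 6G).
sum_memory_limits_gb() {
  sed -En 's/^Memory(Max|Limit)=([0-9]+)G$/\2/p' "$@" |
    awk '{ total += $1 } END { print total + 0 }'
}

# total_ram_gb: MemTotal from /proc/meminfo, rounded to the nearest GiB
total_ram_gb() {
  awk '/^MemTotal:/ { printf "%d\n", ($2 + 524288) / 1048576 }' /proc/meminfo
}
```

Compare `sum_memory_limits_gb /etc/systemd/system/ocr-sprint-*.service` with `total_ram_gb`; if the sum is larger, lower the caps or the concurrency, keeping in mind that PostgreSQL and Redis also need memory.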
## Security Checklist
- [ ] API keys replaced with strong random values
- [ ] Database password changed from the default
- [ ] Firewall enabled (UFW), with only ports 22, 80, and 443 open
- [ ] SSL/TLS enabled via Let's Encrypt
- [ ] `/metrics` endpoint restricted to the internal network
- [ ] Nginx rate limiting configured
- [ ] PostgreSQL listens on localhost only
- [ ] Redis listens on localhost only
- [ ] Regular backup configured (cron job)
- [ ] Log rotation configured
- [ ] OS security updates enabled (`unattended-upgrades`)
- [ ] Fail2ban installed for SSH protection
## Monitoring with Prometheus (Optional)
### Install Prometheus
```bash
# Download Prometheus
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvfz prometheus-*.tar.gz
sudo mv prometheus-2.45.0.linux-amd64 /opt/prometheus
# Create user
sudo useradd --no-create-home --shell /bin/false prometheus
# Create directories
sudo mkdir /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
```
### Configure Prometheus
```bash
sudo nano /etc/prometheus/prometheus.yml
```
**Content:**
```yaml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'ocr-sprint'
    static_configs:
      - targets: ['localhost:8000']
    metrics_path: '/metrics'
```
### Create Systemd Service
```bash
sudo nano /etc/systemd/system/prometheus.service
```
**Content:**
```ini
[Unit]
Description=Prometheus
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/opt/prometheus/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus/
[Install]
WantedBy=multi-user.target
```
```bash
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
```
Access Prometheus at `http://localhost:9090`.
## Support
For questions or issues, contact the development team.