# Backup and Recovery Guide
## Overview
This guide covers comprehensive backup strategies for Readur, including database backups, document storage, configuration files, and disaster recovery procedures.
## What to Backup
### Critical Components
1. **PostgreSQL Database** - Contains all metadata, user data, and system configuration
2. **Document Storage** - Original documents and processed files
3. **Configuration Files** - Environment variables and settings
4. **SSL Certificates** - If using custom certificates
5. **Custom Code** - Any modifications or plugins
### Backup Priority Matrix
| Component | Priority | RPO | RTO | Backup Frequency |
|-----------|----------|-----|-----|------------------|
| Database | Critical | 1 hour | 30 min | Hourly |
| Documents | Critical | 24 hours | 2 hours | Daily |
| Config | High | 24 hours | 1 hour | On change |
| Logs | Medium | 7 days | N/A | Weekly |
| Cache | Low | N/A | N/A | Not required |
## Database Backup
### PostgreSQL Backup Methods
#### Method 1: pg_dump (Logical Backup)
```bash
#!/bin/bash
# backup-database.sh
# Configuration
DB_NAME="readur"
DB_USER="readur"
BACKUP_DIR="/backup/postgres"
DATE=$(date +%Y%m%d_%H%M%S)
# Create backup
pg_dump -U $DB_USER -d $DB_NAME -F custom -f "$BACKUP_DIR/readur_$DATE.dump"
# Compress backup
gzip "$BACKUP_DIR/readur_$DATE.dump"
# Keep only last 30 days
find $BACKUP_DIR -name "*.dump.gz" -mtime +30 -delete
# Upload to S3 (optional)
aws s3 cp "$BACKUP_DIR/readur_$DATE.dump.gz" s3://backup-bucket/postgres/
```
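Before relying on a dump, it is worth checking that the archive is actually readable. A minimal sketch (the filename is an example; note that custom-format dumps are already compressed internally, so the extra gzip step mainly standardizes the file extension):
```bash
# List the table of contents of a compressed custom-format dump without restoring it;
# a non-zero exit status indicates a damaged archive.
gunzip -c /backup/postgres/readur_20240115_020000.dump.gz | pg_restore --list > /dev/null \
  && echo "Dump is readable" \
  || echo "Dump appears to be corrupted"
```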
#### Method 2: Physical Backup with pg_basebackup
```bash
#!/bin/bash
# physical-backup.sh
# Stop application (optional for consistency)
docker-compose stop readur
# Perform base backup
pg_basebackup -U replicator -D /backup/pgdata_$(date +%Y%m%d) \
  -Fp -Xs -P -R
# Start application
docker-compose start readur
```
#### Method 3: Continuous Archiving (WAL)
```bash
# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'
max_wal_senders = 3
wal_keep_size = 1GB    # replaces wal_keep_segments on PostgreSQL 13+
```
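Once archiving is enabled, PostgreSQL reports its progress in the `pg_stat_archiver` view. A quick check, assuming the `readur` role is allowed to call `pg_switch_wal()` (it requires superuser or an explicit GRANT):
```bash
# Force a WAL segment switch, then confirm the archiver is keeping up;
# failed_count should stay at 0 and last_archived_time should be recent.
psql -U readur -d readur -c "SELECT pg_switch_wal();"
psql -U readur -d readur -c \
  "SELECT archived_count, failed_count, last_archived_time FROM pg_stat_archiver;"
```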
### Docker Database Backup
```bash
#!/bin/bash
# docker-db-backup.sh
# Backup database from Docker container
docker-compose exec -T postgres pg_dump -U readur readur | \
  gzip > backup_$(date +%Y%m%d_%H%M%S).sql.gz
# Alternative: Using docker run (PGPASSWORD supplies the database password non-interactively)
docker run --rm \
  --network readur_default \
  -e PGPASSWORD="$POSTGRES_PASSWORD" \
  postgres:14 \
  pg_dump -h postgres -U readur readur | \
  gzip > backup_$(date +%Y%m%d_%H%M%S).sql.gz
```
## Document Storage Backup
### Local Storage Backup
```bash
#!/bin/bash
# backup-documents.sh
SOURCE="/data/readur/documents"
BACKUP_DIR="/backup/documents"
DATE=$(date +%Y%m%d)
# Incremental backup with rsync
rsync -avz --delete \
  --backup --backup-dir="$BACKUP_DIR/incremental_$DATE" \
  "$SOURCE/" "$BACKUP_DIR/current/"
# Create tar archive
tar -czf "$BACKUP_DIR/documents_$DATE.tar.gz" \
  -C "$BACKUP_DIR" current/
# Keep only last 7 daily backups
find $BACKUP_DIR -name "documents_*.tar.gz" -mtime +7 -delete
```
### S3 Storage Backup
```bash
#!/bin/bash
# backup-s3.sh
# Sync S3 bucket to another bucket
aws s3 sync s3://readur-documents s3://readur-backup \
  --delete \
  --storage-class GLACIER_IR
# Or to local storage
aws s3 sync s3://readur-documents /backup/s3-documents \
  --delete
```
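Because `aws s3 sync --delete` propagates deletions to the backup bucket, enabling versioning there provides a safety net against an accidental or malicious sync. A minimal sketch using the bucket names above:
```bash
# Keep previous object versions so a bad sync cannot silently destroy backups
aws s3api put-bucket-versioning \
  --bucket readur-backup \
  --versioning-configuration Status=Enabled
# Optionally expire non-current versions after 90 days
aws s3api put-bucket-lifecycle-configuration \
  --bucket readur-backup \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-old-versions",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 90}
    }]
  }'
```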
### Deduplication Strategy
```bash
#!/bin/bash
# dedup-backup.sh
# Use restic for deduplication
restic -r /backup/restic init
# Backup with deduplication
restic -r /backup/restic backup \
  /data/readur/documents \
  --tag documents \
  --host readur-server
# Prune old snapshots
restic -r /backup/restic forget \
  --keep-daily 7 \
  --keep-weekly 4 \
  --keep-monthly 12 \
  --prune
```
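Deduplicated backups are only useful if they restore cleanly, so the matching restore commands are worth keeping alongside the backup script. A sketch using the repository path above (restore into a staging directory first, then move files into place):
```bash
# List available snapshots for the documents tag
restic -r /backup/restic snapshots --tag documents
# Restore the most recent documents snapshot into a staging directory
restic -r /backup/restic restore latest \
  --tag documents \
  --target /restore/documents
```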
## Configuration Backup
### Environment and Settings
```bash
#!/bin/bash
# backup-config.sh
CONFIG_DIR="/etc/readur"
BACKUP_DIR="/backup/config"
DATE=$(date +%Y%m%d_%H%M%S)
# Create config archive
tar -czf "$BACKUP_DIR/config_$DATE.tar.gz" \
$CONFIG_DIR/.env \
$CONFIG_DIR/docker-compose.yml \
$CONFIG_DIR/nginx.conf \
/etc/ssl/certs/readur* \
/etc/systemd/system/readur*
# Encrypt sensitive configuration
gpg --encrypt --recipient backup@company.com \
"$BACKUP_DIR/config_$DATE.tar.gz"
# Remove unencrypted file
rm "$BACKUP_DIR/config_$DATE.tar.gz"
```
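An encrypted archive that nobody can decrypt is worse than no backup, so a quick round-trip test is worth adding. A minimal sketch reusing the variables from the script above (requires the backup recipient's private key):
```bash
# Decrypt in memory and list the archive contents; nothing is written to disk
gpg --decrypt "$BACKUP_DIR/config_$DATE.tar.gz.gpg" | tar -tzf - > /dev/null \
  && echo "Config backup decrypts and unpacks cleanly" \
  || echo "Config backup verification FAILED"
```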
## Automated Backup Solution
### Complete Backup Script
```bash
#!/bin/bash
# readur-backup.sh
set -e
# Configuration
BACKUP_ROOT="/backup"
S3_BUCKET="s3://company-backups/readur"
SLACK_WEBHOOK="https://hooks.slack.com/services/XXX"
DATE=$(date +%Y%m%d_%H%M%S)
# Functions
log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}
notify() {
  curl -X POST -H 'Content-type: application/json' \
    --data "{\"text\":\"$1\"}" $SLACK_WEBHOOK
}
# Create backup directories
mkdir -p "$BACKUP_ROOT"/{database,documents,config,logs}
# 1. Database backup
log "Starting database backup..."
docker-compose exec -T postgres pg_dump -U readur readur | \
  gzip > "$BACKUP_ROOT/database/readur_$DATE.sql.gz"
# 2. Documents backup (if local storage)
if [ "$STORAGE_BACKEND" = "local" ]; then
  log "Starting documents backup..."
  rsync -avz --delete \
    /data/readur/documents/ \
    "$BACKUP_ROOT/documents/current/"
  tar -czf "$BACKUP_ROOT/documents/documents_$DATE.tar.gz" \
    -C "$BACKUP_ROOT/documents" current/
fi
# 3. Configuration backup
log "Starting configuration backup..."
tar -czf "$BACKUP_ROOT/config/config_$DATE.tar.gz" \
.env docker-compose.yml
# 4. Upload to S3
log "Uploading to S3..."
aws s3 sync "$BACKUP_ROOT" "$S3_BUCKET" \
--exclude "*/current/*" \
--storage-class STANDARD_IA
# 5. Cleanup old backups
log "Cleaning up old backups..."
find "$BACKUP_ROOT/database" -name "*.sql.gz" -mtime +7 -delete
find "$BACKUP_ROOT/documents" -name "*.tar.gz" -mtime +7 -delete
find "$BACKUP_ROOT/config" -name "*.tar.gz" -mtime +30 -delete
# 6. Verify backup
BACKUP_SIZE=$(du -sh "$BACKUP_ROOT" | cut -f1)
log "Backup completed. Total size: $BACKUP_SIZE"
# 7. Send notification
notify "Readur backup completed successfully. Size: $BACKUP_SIZE"
```
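The script above only notifies on success. Because it runs with `set -e`, an ERR trap placed after the function definitions can report failures as well; a hedged sketch:
```bash
# Catch failures on the left side of pipelines too (e.g. pg_dump | gzip)
set -o pipefail
# Send a Slack alert once if any command fails, then exit via set -e
trap 'notify "Readur backup FAILED at line $LINENO on $(hostname)"' ERR
```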
### Cron Schedule
```bash
# /etc/crontab
# Hourly database backup
0 * * * * root /opt/readur/scripts/backup-database.sh
# Daily full backup at 2 AM
0 2 * * * root /opt/readur/scripts/readur-backup.sh
# Weekly configuration backup
0 3 * * 0 root /opt/readur/scripts/backup-config.sh
```
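Redirecting cron output to the logs directory created by the backup script makes failed runs much easier to diagnose; a minimal sketch:
```bash
# /etc/crontab
# Capture stdout and stderr of the nightly full backup
0 2 * * * root /opt/readur/scripts/readur-backup.sh >> /backup/logs/readur-backup.log 2>&1
```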
## Recovery Procedures
### Database Recovery
#### From pg_dump Backup
```bash
#!/bin/bash
# restore-database.sh
BACKUP_FILE="$1"
# Stop application
docker-compose stop readur
# Drop existing database
docker-compose exec postgres psql -U postgres -c "DROP DATABASE IF EXISTS readur;"
docker-compose exec postgres psql -U postgres -c "CREATE DATABASE readur OWNER readur;"
# Restore backup
gunzip -c "$BACKUP_FILE" | docker-compose exec -T postgres psql -U readur readur
# Run migrations
docker-compose exec readur alembic upgrade head
# Start application
docker-compose start readur
```
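The script above expects a plain SQL dump such as the one produced by the Docker backup. If the backup was taken with Method 1 (`pg_dump -F custom`), restore it with `pg_restore` instead of `psql`; a hedged sketch (the filename is an example):
```bash
# pg_restore reads custom-format archives from stdin;
# --clean/--if-exists drop existing objects, so the DROP/CREATE DATABASE steps are not needed
gunzip -c readur_20240115_020000.dump.gz | \
  docker-compose exec -T postgres pg_restore -U readur -d readur --clean --if-exists
```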
#### Point-in-Time Recovery
```bash
# 1. Copy a base backup taken before the target time (see physical-backup.sh)
#    into an empty data directory
cp -a /backup/pgdata_20240115/. /var/lib/postgresql/data/
# 2. Add recovery settings to postgresql.conf (PostgreSQL 12+)
#    restore_command = 'cp /archive/%f %p'
#    recovery_target_time = '2024-01-15 14:30:00'
# 3. Create the recovery signal file, then start PostgreSQL;
#    archived WAL from /archive is replayed up to the target time
touch /var/lib/postgresql/data/recovery.signal
```
### Document Recovery
```bash
#!/bin/bash
# restore-documents.sh
BACKUP_FILE="$1"
TARGET_DIR="/data/readur/documents"
# Extract backup
tar -xzf "$BACKUP_FILE" -C /tmp/
# Restore with verification
rsync -avz --checksum \
  /tmp/current/ \
  "$TARGET_DIR/"
# Fix permissions
chown -R readur:readur "$TARGET_DIR"
chmod -R 755 "$TARGET_DIR"
```
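After a restore it helps to sanity-check that the documents on disk line up with the database. A rough sketch, assuming local storage and the `documents` table used elsewhere in this guide (processed files and thumbnails mean the file count can legitimately exceed the row count):
```bash
# Compare the number of files on disk with the number of document records
FILE_COUNT=$(find "$TARGET_DIR" -type f | wc -l)
DB_COUNT=$(docker-compose exec -T postgres \
  psql -U readur -d readur -t -A -c "SELECT COUNT(*) FROM documents;")
echo "Files on disk: $FILE_COUNT, rows in database: $DB_COUNT"
[ "$FILE_COUNT" -ge "$DB_COUNT" ] || echo "WARNING: fewer files than database records"
```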
### Full System Recovery
```bash
#!/bin/bash
# disaster-recovery.sh
set -e
# 1. Install Docker and dependencies
apt-get update
apt-get install -y docker.io docker-compose
# 2. Restore configuration
gpg --decrypt config_backup.tar.gz.gpg | tar -xzf - -C /etc/readur/
# 3. Pull Docker images and start the database first
docker-compose pull
docker-compose up -d postgres
until docker-compose exec -T postgres pg_isready -U readur; do sleep 2; done
# 4. Restore database
gunzip -c database_backup.sql.gz | \
  docker-compose exec -T postgres psql -U readur readur
# 5. Restore documents
tar -xzf documents_backup.tar.gz -C /data/readur/
# 6. Start the remaining services
docker-compose up -d
# 7. Verify
curl -f http://localhost:8000/health || exit 1
echo "Recovery completed successfully"
```
## Backup Verification
### Automated Testing
```bash
#!/bin/bash
# verify-backup.sh
# Test database backup
TEST_DB="readur_test"
# Create test database
createdb $TEST_DB
# Restore backup to test database
gunzip -c "$1" | psql $TEST_DB
# Verify data integrity
RECORD_COUNT=$(psql -t -c "SELECT COUNT(*) FROM documents" $TEST_DB)
echo "Restored $RECORD_COUNT documents"
# Cleanup
dropdb $TEST_DB
```
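If no local PostgreSQL installation is available, the same verification can run against a throwaway container. A sketch using the postgres:14 image referenced earlier:
```bash
# Spin up a temporary PostgreSQL instance, restore into it, then discard it
docker run --rm -d --name readur-verify \
  -e POSTGRES_PASSWORD=verify \
  -e POSTGRES_DB=readur_test \
  postgres:14
sleep 10  # crude wait; looping on pg_isready is more robust
gunzip -c "$1" | docker exec -i readur-verify psql -U postgres readur_test
docker exec readur-verify \
  psql -U postgres -t -c "SELECT COUNT(*) FROM documents" readur_test
docker stop readur-verify
```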
### Backup Monitoring
```python
#!/usr/bin/env python3
# monitor-backups.py
import os
import time
from datetime import datetime, timedelta
import smtplib
from email.mime.text import MIMEText
BACKUP_DIR = "/backup"
MAX_AGE_HOURS = 25 # Alert if backup older than 25 hours
def check_backup_age(directory):
    latest_backup = None
    latest_time = 0
    for file in os.listdir(directory):
        if file.endswith('.gz'):
            file_time = os.path.getmtime(os.path.join(directory, file))
            if file_time > latest_time:
                latest_time = file_time
                latest_backup = file
    if latest_backup:
        age = time.time() - latest_time
        return latest_backup, age / 3600  # Age in hours
    return None, float('inf')

def send_alert(message):
    msg = MIMEText(message)
    msg['Subject'] = 'Readur Backup Alert'
    msg['From'] = 'monitor@company.com'
    msg['To'] = 'admin@company.com'
    s = smtplib.SMTP('localhost')
    s.send_message(msg)
    s.quit()

# Check each backup type
for backup_type in ['database', 'documents', 'config']:
    dir_path = os.path.join(BACKUP_DIR, backup_type)
    filename, age_hours = check_backup_age(dir_path)
    if age_hours > MAX_AGE_HOURS:
        send_alert(f"WARNING: {backup_type} backup is {age_hours:.1f} hours old")
    else:
        print(f"OK: {backup_type} backup is {age_hours:.1f} hours old")
```
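The monitor only helps if something runs it; scheduling it shortly after the nightly backup window keeps alerts timely. A minimal sketch, assuming the same scripts directory as the cron examples above:
```bash
# /etc/crontab
# Check backup freshness every morning at 06:00
0 6 * * * root /usr/bin/python3 /opt/readur/scripts/monitor-backups.py
```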
## Cloud Backup Solutions
### AWS Backup Integration
```yaml
# CloudFormation template
Resources:
  BackupPlan:
    Type: AWS::Backup::BackupPlan
    Properties:
      BackupPlan:
        BackupPlanName: ReadurBackupPlan
        BackupPlanRule:
          - RuleName: DailyBackups
            TargetBackupVault: Default
            ScheduleExpression: "cron(0 5 ? * * *)"
            StartWindowMinutes: 60
            CompletionWindowMinutes: 120
            Lifecycle:
              MoveToColdStorageAfterDays: 7
              DeleteAfterDays: 97   # must be at least 90 days after the cold storage transition
```
### Backup to Multiple Destinations
```bash
#!/bin/bash
# multi-destination-backup.sh
BACKUP_FILE="readur_$(date +%Y%m%d).tar.gz"
# Local backup
cp "$BACKUP_FILE" /mnt/nas/backups/
# AWS S3
aws s3 cp "$BACKUP_FILE" s3://backup-bucket/
# Google Cloud Storage
gsutil cp "$BACKUP_FILE" gs://backup-bucket/
# Azure Blob Storage
az storage blob upload \
  --container-name backups \
  --name "$BACKUP_FILE" \
  --file "$BACKUP_FILE"
```
## Best Practices
### Security
1. **Encrypt backups** at rest and in transit
2. **Test recovery** procedures regularly
3. **Store backups** in multiple locations
4. **Rotate credentials** used for backup access
5. **Monitor backup** success and failures
### Testing
1. **Monthly recovery drills** to test procedures
2. **Quarterly full recovery** to separate environment
3. **Annual disaster recovery** exercise
4. **Document lessons learned** and update procedures
### Documentation
Maintain documentation for:
- Backup schedules and retention policies
- Recovery procedures and contact information
- RTO/RPO requirements
- Backup verification procedures
- Where encryption keys and backup credentials are stored (never the secrets themselves)
## Related Documentation
- [Storage Configuration](./storage.md)
- [Migration Guide](../migration-guide.md)
- [Security Best Practices](../security-guide.md)
- [Monitoring Setup](./monitoring.md)