Backup and Recovery Guide
Overview
This guide covers comprehensive backup strategies for Readur, including database backups, document storage, configuration files, and disaster recovery procedures.
What to Backup
Critical Components
- PostgreSQL Database - Contains all metadata, user data, and system configuration
- Document Storage - Original documents and processed files
- Configuration Files - Environment variables and settings
- SSL Certificates - If using custom certificates
- Custom Code - Any modifications or plugins
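The components above can be captured in a single manifest that all backup scripts share. A minimal sketch — the script name and every path below are illustrative assumptions, not locations Readur mandates:
#!/bin/bash
# backup-manifest.sh — example paths only; adjust to your deployment
BACKUP_TARGETS=(
  "/data/readur/documents"          # Original and processed documents
  "/etc/readur/.env"                # Environment configuration
  "/etc/readur/docker-compose.yml"  # Service definitions
  "/etc/ssl/certs/readur.crt"       # Custom SSL certificates, if any
)
# Warn early about targets that have moved or disappeared
for target in "${BACKUP_TARGETS[@]}"; do
  [ -e "$target" ] || echo "WARNING: backup target missing: $target"
done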
Backup Priority Matrix
RPO (Recovery Point Objective) is the maximum tolerable data loss; RTO (Recovery Time Objective) is the maximum tolerable time to restore service.
| Component | Priority | RPO | RTO | Backup Frequency |
|---|---|---|---|---|
| Database | Critical | 1 hour | 30 min | Hourly |
| Documents | Critical | 24 hours | 2 hours | Daily |
| Config | High | 24 hours | 1 hour | On change |
| Logs | Medium | 7 days | N/A | Weekly |
| Cache | Low | N/A | N/A | Not required |
Database Backup
PostgreSQL Backup Methods
Method 1: pg_dump (Logical Backup)
#!/bin/bash
# backup-database.sh
# Configuration
DB_NAME="readur"
DB_USER="readur"
BACKUP_DIR="/backup/postgres"
DATE=$(date +%Y%m%d_%H%M%S)
# Create backup in pg_dump custom format (restore with pg_restore, shown later)
pg_dump -U "$DB_USER" -d "$DB_NAME" -F custom -f "$BACKUP_DIR/readur_$DATE.dump"
# Compress backup (custom format is already compressed, so gains are modest)
gzip "$BACKUP_DIR/readur_$DATE.dump"
# Keep only the last 30 days
find "$BACKUP_DIR" -name "*.dump.gz" -mtime +30 -delete
# Upload to S3 (optional)
aws s3 cp "$BACKUP_DIR/readur_$DATE.dump.gz" s3://backup-bucket/postgres/
Method 2: Physical Backup with pg_basebackup
#!/bin/bash
# physical-backup.sh
# Optionally stop the application to quiesce writes; pg_basebackup itself takes a consistent online backup
docker-compose stop readur
# Perform base backup
pg_basebackup -U replicator -D /backup/pgdata_$(date +%Y%m%d) \
-Fp -Xs -P -R
# Start application
docker-compose start readur
Method 3: Continuous Archiving (WAL)
# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'
max_wal_senders = 3
wal_keep_size = '1GB'    # PostgreSQL 13+; releases before 13 use wal_keep_segments instead
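After enabling archiving, confirm that WAL segments actually reach the archive. A quick check, assuming superuser access via psql:
# Force a WAL segment switch, then inspect archiver statistics
psql -U postgres -c "SELECT pg_switch_wal();"
psql -U postgres -c "SELECT archived_count, last_archived_wal, failed_count FROM pg_stat_archiver;"
A nonzero failed_count or a stale last_archived_wal means the archive_command is failing and point-in-time recovery will not be possible.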
Docker Database Backup
#!/bin/bash
# docker-db-backup.sh
# Backup database from Docker container
docker-compose exec -T postgres pg_dump -U readur readur | \
gzip > backup_$(date +%Y%m%d_%H%M%S).sql.gz
# Alternative: using docker run (PGPASSWORD supplies credentials non-interactively)
docker run --rm \
  --network readur_default \
  -e PGPASSWORD="$DB_PASSWORD" \
  postgres:14 \
  pg_dump -h postgres -U readur readur | \
  gzip > backup_$(date +%Y%m%d_%H%M%S).sql.gz
Document Storage Backup
Local Storage Backup
#!/bin/bash
# backup-documents.sh
SOURCE="/data/readur/documents"
BACKUP_DIR="/backup/documents"
DATE=$(date +%Y%m%d)
# Incremental backup with rsync
rsync -avz --delete \
--backup --backup-dir="$BACKUP_DIR/incremental_$DATE" \
"$SOURCE/" "$BACKUP_DIR/current/"
# Create tar archive
tar -czf "$BACKUP_DIR/documents_$DATE.tar.gz" \
-C "$BACKUP_DIR" current/
# Keep only last 7 daily backups
find $BACKUP_DIR -name "documents_*.tar.gz" -mtime +7 -delete
S3 Storage Backup
#!/bin/bash
# backup-s3.sh
# Sync S3 bucket to another bucket
aws s3 sync s3://readur-documents s3://readur-backup \
--delete \
--storage-class GLACIER_IR
# Or to local storage
aws s3 sync s3://readur-documents /backup/s3-documents \
--delete
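Because `aws s3 sync --delete` propagates deletions to the backup bucket, enabling versioning on it (bucket name taken from the example above) provides a safety net against accidental or malicious removals:
# Enable versioning so deleted or overwritten objects remain recoverable
aws s3api put-bucket-versioning \
  --bucket readur-backup \
  --versioning-configuration Status=Enabled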
Deduplication Strategy
#!/bin/bash
# dedup-backup.sh
# Use restic for deduplication (set RESTIC_PASSWORD or use --password-file for non-interactive runs)
restic -r /backup/restic init
# Backup with deduplication
restic -r /backup/restic backup \
/data/readur/documents \
--tag documents \
--host readur-server
# Prune old snapshots
restic -r /backup/restic forget \
--keep-daily 7 \
--keep-weekly 4 \
--keep-monthly 12 \
--prune
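Restic can also verify repository integrity and restore snapshots, which makes backup testing straightforward; for example:
# Verify repository integrity
restic -r /backup/restic check
# List snapshots and restore the latest one to a staging directory
restic -r /backup/restic snapshots
restic -r /backup/restic restore latest --target /restore/documents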
Configuration Backup
Environment and Settings
#!/bin/bash
# backup-config.sh
CONFIG_DIR="/etc/readur"
BACKUP_DIR="/backup/config"
DATE=$(date +%Y%m%d_%H%M%S)
# Create config archive
tar -czf "$BACKUP_DIR/config_$DATE.tar.gz" \
$CONFIG_DIR/.env \
$CONFIG_DIR/docker-compose.yml \
$CONFIG_DIR/nginx.conf \
/etc/ssl/certs/readur* \
/etc/systemd/system/readur*
# Encrypt sensitive configuration
gpg --encrypt --recipient backup@company.com \
"$BACKUP_DIR/config_$DATE.tar.gz"
# Remove unencrypted file
rm "$BACKUP_DIR/config_$DATE.tar.gz"
Automated Backup Solution
Complete Backup Script
#!/bin/bash
# readur-backup.sh
set -e
# Configuration (STORAGE_BACKEND is expected to be set in the environment or sourced from .env)
BACKUP_ROOT="/backup"
S3_BUCKET="s3://company-backups/readur"
SLACK_WEBHOOK="https://hooks.slack.com/services/XXX"
DATE=$(date +%Y%m%d_%H%M%S)
# Functions
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}
notify() {
curl -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"$1\"}" "$SLACK_WEBHOOK"
}
# Alert on failure as well as success (the ERR trap fires when set -e aborts the script)
trap 'notify "Readur backup FAILED on $(hostname)"' ERR
# Create backup directories
mkdir -p "$BACKUP_ROOT"/{database,documents,config,logs}
# 1. Database backup
log "Starting database backup..."
docker-compose exec -T postgres pg_dump -U readur readur | \
gzip > "$BACKUP_ROOT/database/readur_$DATE.sql.gz"
# 2. Documents backup (if local storage)
if [ "$STORAGE_BACKEND" = "local" ]; then
log "Starting documents backup..."
rsync -avz --delete \
/data/readur/documents/ \
"$BACKUP_ROOT/documents/current/"
tar -czf "$BACKUP_ROOT/documents/documents_$DATE.tar.gz" \
-C "$BACKUP_ROOT/documents" current/
fi
# 3. Configuration backup
log "Starting configuration backup..."
tar -czf "$BACKUP_ROOT/config/config_$DATE.tar.gz" \
.env docker-compose.yml
# 4. Upload to S3
log "Uploading to S3..."
aws s3 sync "$BACKUP_ROOT" "$S3_BUCKET" \
--exclude "*/current/*" \
--storage-class STANDARD_IA
# 5. Cleanup old backups
log "Cleaning up old backups..."
find "$BACKUP_ROOT/database" -name "*.sql.gz" -mtime +7 -delete
find "$BACKUP_ROOT/documents" -name "*.tar.gz" -mtime +7 -delete
find "$BACKUP_ROOT/config" -name "*.tar.gz" -mtime +30 -delete
# 6. Verify backup
BACKUP_SIZE=$(du -sh "$BACKUP_ROOT" | cut -f1)
log "Backup completed. Total size: $BACKUP_SIZE"
# 7. Send notification
notify "Readur backup completed successfully. Size: $BACKUP_SIZE"
Cron Schedule
# /etc/crontab
# Hourly database backup
0 * * * * root /opt/readur/scripts/backup-database.sh
# Daily full backup at 2 AM
0 2 * * * root /opt/readur/scripts/readur-backup.sh
# Weekly configuration backup
0 3 * * 0 root /opt/readur/scripts/backup-config.sh
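If a backup run can overrun its interval, wrapping the cron entries in flock prevents overlapping runs; a sketch with assumed lock-file paths:
# /etc/crontab — flock -n skips the run if the previous one is still active
0 * * * * root flock -n /var/lock/readur-db-backup.lock /opt/readur/scripts/backup-database.sh
0 2 * * * root flock -n /var/lock/readur-full-backup.lock /opt/readur/scripts/readur-backup.sh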
Recovery Procedures
Database Recovery
From pg_dump Backup
#!/bin/bash
# restore-database.sh
BACKUP_FILE="$1"
# Stop application
docker-compose stop readur
# Drop existing database
docker-compose exec postgres psql -U postgres -c "DROP DATABASE IF EXISTS readur;"
docker-compose exec postgres psql -U postgres -c "CREATE DATABASE readur OWNER readur;"
# Restore backup
gunzip -c "$BACKUP_FILE" | docker-compose exec -T postgres psql -U readur readur
# Run migrations only if your deployment uses a separate migration step
# (current Readur releases apply database migrations automatically at startup)
# Start application
docker-compose start readur
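The script above handles the plain SQL dumps produced by the Docker backups. For the custom-format dumps from Method 1 (`-F custom`), use pg_restore instead of psql; a sketch assuming the same container layout:
#!/bin/bash
# restore-custom-format.sh
BACKUP_FILE="$1"   # e.g. readur_20240115_020000.dump.gz
# pg_restore reads the custom-format archive from stdin
gunzip -c "$BACKUP_FILE" | \
  docker-compose exec -T postgres pg_restore -U readur -d readur --clean --if-exists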
Point-in-Time Recovery
# 1. Restore the most recent base backup into an empty data directory
cp -a /backup/pgdata_20240115/. /var/lib/postgresql/data/
# 2. Configure recovery in postgresql.conf (PostgreSQL 12+)
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2024-01-15 14:30:00'
# 3. Create recovery.signal, then start PostgreSQL; WAL is replayed up to the target time
touch /var/lib/postgresql/data/recovery.signal
Document Recovery
#!/bin/bash
# restore-documents.sh
BACKUP_FILE="$1"
TARGET_DIR="/data/readur/documents"
# Extract backup
tar -xzf "$BACKUP_FILE" -C /tmp/
# Restore with verification
rsync -avz --checksum \
/tmp/current/ \
"$TARGET_DIR/"
# Fix permissions
chown -R readur:readur "$TARGET_DIR"
chmod -R 755 "$TARGET_DIR"
Full System Recovery
#!/bin/bash
# disaster-recovery.sh
set -e
# 1. Install Docker and dependencies
apt-get update
apt-get install -y docker.io docker-compose
# 2. Restore configuration
gpg --decrypt config_backup.tar.gz.gpg | tar -xzf - -C /etc/readur/
# 3. Pull Docker images and start the database first
docker-compose pull
docker-compose up -d postgres
# 4. Restore database once PostgreSQL is accepting connections
sleep 10
gunzip -c database_backup.sql.gz | \
docker-compose exec -T postgres psql -U readur readur
# 5. Restore documents
tar -xzf documents_backup.tar.gz -C /data/readur/
# 6. Start the remaining services
docker-compose up -d
# 7. Verify
curl -f http://localhost:8000/health || exit 1
echo "Recovery completed successfully"
Backup Verification
Automated Testing
#!/bin/bash
# verify-backup.sh — restore a database backup into a scratch database and sanity-check it
# Usage: verify-backup.sh <backup.sql.gz>
TEST_DB="readur_test"
# Create test database
createdb $TEST_DB
# Restore backup to test database
gunzip -c "$1" | psql $TEST_DB
# Verify data integrity
RECORD_COUNT=$(psql -t -c "SELECT COUNT(*) FROM documents" $TEST_DB)
echo "Restored $RECORD_COUNT documents"
# Cleanup
dropdb $TEST_DB
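Document archives can be sanity-checked the same way without a full restore; a sketch:
#!/bin/bash
# verify-archive.sh — check compression integrity and list contents without extracting
ARCHIVE="$1"
gzip -t "$ARCHIVE" || { echo "Corrupt archive: $ARCHIVE"; exit 1; }
FILE_COUNT=$(tar -tzf "$ARCHIVE" | wc -l)
echo "Archive OK: $FILE_COUNT entries in $ARCHIVE"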
Backup Monitoring
#!/usr/bin/env python3
# monitor-backups.py
import os
import time
import smtplib
from email.mime.text import MIMEText

BACKUP_DIR = "/backup"
MAX_AGE_HOURS = 25  # Alert if the newest backup is older than 25 hours

def check_backup_age(directory):
    """Return the newest .gz backup in a directory and its age in hours."""
    if not os.path.isdir(directory):
        return None, float('inf')
    latest_backup = None
    latest_time = 0
    for file in os.listdir(directory):
        if file.endswith('.gz'):
            file_time = os.path.getmtime(os.path.join(directory, file))
            if file_time > latest_time:
                latest_time = file_time
                latest_backup = file
    if latest_backup:
        age = time.time() - latest_time
        return latest_backup, age / 3600  # Age in hours
    return None, float('inf')

def send_alert(message):
    msg = MIMEText(message)
    msg['Subject'] = 'Readur Backup Alert'
    msg['From'] = 'monitor@company.com'
    msg['To'] = 'admin@company.com'
    s = smtplib.SMTP('localhost')
    s.send_message(msg)
    s.quit()

# Check each backup type
for backup_type in ['database', 'documents', 'config']:
    dir_path = os.path.join(BACKUP_DIR, backup_type)
    filename, age_hours = check_backup_age(dir_path)
    if age_hours > MAX_AGE_HOURS:
        send_alert(f"WARNING: {backup_type} backup is {age_hours:.1f} hours old")
    else:
        print(f"OK: {backup_type} backup is {age_hours:.1f} hours old")
Cloud Backup Solutions
AWS Backup Integration
# CloudFormation template
Resources:
  BackupPlan:
    Type: AWS::Backup::BackupPlan
    Properties:
      BackupPlan:
        BackupPlanName: ReadurBackupPlan
        BackupPlanRule:
          - RuleName: DailyBackups
            TargetBackupVault: Default
            ScheduleExpression: "cron(0 5 ? * * *)"
            StartWindowMinutes: 60
            CompletionWindowMinutes: 120
            Lifecycle:
              DeleteAfterDays: 30
              MoveToColdStorageAfterDays: 7
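The template can be deployed with the AWS CLI; the template file name and stack name below are illustrative:
aws cloudformation deploy \
  --template-file readur-backup-plan.yml \
  --stack-name readur-backup-plan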
Backup to Multiple Destinations
#!/bin/bash
# multi-destination-backup.sh
BACKUP_FILE="readur_$(date +%Y%m%d).tar.gz"
# Local backup
cp "$BACKUP_FILE" /mnt/nas/backups/
# AWS S3
aws s3 cp "$BACKUP_FILE" s3://backup-bucket/
# Google Cloud Storage
gsutil cp "$BACKUP_FILE" gs://backup-bucket/
# Azure Blob Storage
az storage blob upload \
--container-name backups \
--name "$BACKUP_FILE" \
--file "$BACKUP_FILE"
Best Practices
Security
- Encrypt backups at rest and in transit
- Test recovery procedures regularly
- Store backups in multiple locations
- Rotate credentials used for backup access
- Monitor backup success and failures
Testing
- Monthly recovery drills to test procedures
- Quarterly full recovery to separate environment
- Annual disaster recovery exercise
- Document lessons learned and update procedures
Documentation
Maintain documentation for:
- Backup schedules and retention policies
- Recovery procedures and contact information
- RTO/RPO requirements
- Backup verification procedures
- Encryption keys and access credentials