Configuration Reference

This document provides a comprehensive reference for all configuration options available in Readur, including environment variables, configuration files, and runtime settings.

Environment Variables

Core Configuration

| Variable | Type | Default | Description | Required |
|---|---|---|---|---|
| DATABASE_URL | String | postgresql://readur:readur@localhost/readur | PostgreSQL connection string (takes priority over individual vars) | Yes* |
| POSTGRES_HOST | String | localhost | PostgreSQL host (used if DATABASE_URL not set) | No |
| POSTGRES_PORT | String | 5432 | PostgreSQL port (used if DATABASE_URL not set) | No |
| POSTGRES_DB | String | readur | PostgreSQL database name (used if DATABASE_URL not set) | No |
| POSTGRES_USER | String | readur | PostgreSQL username (used if DATABASE_URL not set) | No |
| POSTGRES_PASSWORD | String | readur | PostgreSQL password (used if DATABASE_URL not set) | No |
| SERVER_ADDRESS | String | 0.0.0.0:8080 | Server bind address (host:port) | No |
| SERVER_HOST | String | 0.0.0.0 | Server host (used if SERVER_ADDRESS not set) | No |
| SERVER_PORT | String | 8080 | Server port (used if SERVER_ADDRESS not set) | No |
| JWT_SECRET | String | Auto-generated | Secret key for JWT tokens (min 32 chars) | Recommended |
| SESSION_SECRET | String | Auto-generated | Secret for session encryption | Recommended |
| UPLOAD_PATH | String | ./uploads | Directory for file uploads | No |
| ALLOWED_FILE_TYPES | String | pdf,txt,doc,docx,png,jpg,jpeg | Comma-separated allowed extensions | No |
| LOG_LEVEL | String | info | Logging level (debug, info, warn, error) | No |
| LOG_FORMAT | String | text | Log format (text, json) | No |

Authentication & Security

| Variable | Type | Default | Description | Required |
|---|---|---|---|---|
| AUTH_ENABLED | Boolean | true | Enable authentication | No |
| DEFAULT_USER_ROLE | String | viewer | Default role for new users (admin, editor, viewer) | No |
| AUTO_CREATE_USERS | Boolean | false | Auto-create users on first login (OIDC) | No |
| SESSION_TIMEOUT | Integer | 3600 | Session timeout in seconds | No |
| PASSWORD_MIN_LENGTH | Integer | 8 | Minimum password length | No |
| REQUIRE_EMAIL_VERIFICATION | Boolean | false | Require email verification | No |
| MAX_LOGIN_ATTEMPTS | Integer | 5 | Maximum failed login attempts | No |
| LOCKOUT_DURATION | Integer | 900 | Account lockout duration (seconds) | No |

OIDC/SSO Configuration

| Variable | Type | Default | Description | Required |
|---|---|---|---|---|
| OIDC_ENABLED | Boolean | false | Enable OIDC authentication | No |
| OIDC_CLIENT_ID | String | - | OIDC client ID | If OIDC enabled |
| OIDC_CLIENT_SECRET | String | - | OIDC client secret | If OIDC enabled |
| OIDC_ISSUER_URL | String | - | OIDC issuer URL | If OIDC enabled |
| OIDC_REDIRECT_URI | String | - | OIDC redirect URI | If OIDC enabled |
| OIDC_SCOPES | String | openid profile email | OIDC scopes | No |
| OIDC_USER_INFO_ENDPOINT | String | Auto-discovered | User info endpoint | No |
| OIDC_TOKEN_ENDPOINT | String | Auto-discovered | Token endpoint | No |
| OIDC_AUTH_ENDPOINT | String | Auto-discovered | Authorization endpoint | No |

Storage Configuration

Local Storage

| Variable | Type | Default | Description | Required |
|---|---|---|---|---|
| STORAGE_TYPE | String | local | Storage backend (local, s3, azure) | No |
| LOCAL_STORAGE_PATH | String | ./uploads | Local storage directory | No |
| TEMP_STORAGE_PATH | String | ./uploads/temp | Temporary files directory | No |
| THUMBNAIL_PATH | String | ./uploads/thumbnails | Thumbnail storage directory | No |
| BACKUP_PATH | String | ./uploads/backups | Backup directory | No |

S3 Storage

| Variable | Type | Default | Description | Required |
|---|---|---|---|---|
| S3_ENABLED | Boolean | false | Enable S3 storage backend | No |
| S3_BUCKET_NAME | String | - | S3 bucket name | If S3 enabled |
| S3_ACCESS_KEY_ID | String | - | AWS Access Key ID | If S3 enabled |
| S3_SECRET_ACCESS_KEY | String | - | AWS Secret Access Key | If S3 enabled |
| S3_REGION | String | us-east-1 | AWS region | No |
| S3_ENDPOINT_URL | String | - | Custom S3 endpoint (for MinIO, etc.) | No |
| S3_PREFIX | String | - | S3 key prefix | No |
| S3_USE_SSL | Boolean | true | Use HTTPS for S3 | No |
| S3_VERIFY_SSL | Boolean | true | Verify SSL certificates | No |
| S3_STORAGE_CLASS | String | STANDARD | S3 storage class | No |
| S3_SERVER_SIDE_ENCRYPTION | String | - | Server-side encryption (AES256, aws:kms) | No |
| S3_KMS_KEY_ID | String | - | KMS key ID for encryption | No |

Watch Directory Configuration

| Variable | Type | Default | Description | Required |
|---|---|---|---|---|
| WATCH_FOLDER | String | ./watch | Global watch directory | No |
| USER_WATCH_BASE_DIR | String | ./user_watch | Base directory for per-user folders | No |
| ENABLE_PER_USER_WATCH | Boolean | false | Enable per-user watch directories | No |
| WATCH_INTERVAL_SECONDS | Integer | 60 | Scan interval in seconds | No |
| FILE_STABILITY_CHECK_MS | Integer | 2000 | File stability check delay (ms) | No |
| MAX_FILE_AGE_HOURS | Integer | 24 | Maximum file age to process | No |
| WATCH_RECURSIVE | Boolean | true | Watch subdirectories recursively | No |
| WATCH_FILE_PATTERNS | String | * | File patterns to watch (glob) | No |
| WATCH_IGNORE_PATTERNS | String | .*,~*,*.tmp | Patterns to ignore | No |
| MOVE_AFTER_PROCESSING | Boolean | false | Move files after processing | No |
| PROCESSED_FILES_DIR | String | ./processed | Directory for processed files | No |
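
The interaction between WATCH_FILE_PATTERNS and WATCH_IGNORE_PATTERNS can be illustrated with a small Python sketch. It assumes both values are comma-separated shell-style globs matched against the file name only, and that ignore patterns win over include patterns; the function names are hypothetical, and how Readur itself evaluates these patterns is not specified in this reference.

```python
from fnmatch import fnmatchcase
from pathlib import PurePath


def split_patterns(value):
    """Split a comma-separated pattern list, dropping empty entries."""
    return [p.strip() for p in value.split(",") if p.strip()]


def should_process(path, file_patterns="*", ignore_patterns=".*,~*,*.tmp"):
    """Return True if the file name matches an include pattern and no ignore pattern.

    Defaults mirror the documented defaults; the matching rules are an assumption.
    """
    name = PurePath(path).name
    # Ignore patterns are checked first so they take priority (assumed).
    if any(fnmatchcase(name, p) for p in split_patterns(ignore_patterns)):
        return False
    return any(fnmatchcase(name, p) for p in split_patterns(file_patterns))
```

Under these assumptions the default ignore list skips hidden files, editor lock files, and .tmp files, while everything else is picked up.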

OCR Configuration

| Variable | Type | Default | Description | Required |
|---|---|---|---|---|
| OCR_ENABLED | Boolean | true | Enable OCR processing | No |
| OCR_LANGUAGE | String | eng | Default OCR language(s) | No |
| OCR_ENGINE | String | tesseract | OCR engine (tesseract, cloud) | No |
| CONCURRENT_OCR_JOBS | Integer | CPU cores / 2 | Concurrent OCR workers | No |
| OCR_TIMEOUT_SECONDS | Integer | 300 | OCR timeout per document | No |
| OCR_RETRY_ATTEMPTS | Integer | 3 | OCR retry attempts | No |
| OCR_RETRY_DELAY | Integer | 60 | Delay between retries (seconds) | No |
| OCR_CONFIDENCE_THRESHOLD | Float | 0.6 | Minimum OCR confidence | No |
| MAX_FILE_SIZE_MB | Integer | 100 | Maximum file size for OCR | No |
| OCR_DPI | Integer | 300 | DPI for image processing | No |
| OCR_PSM | Integer | 3 | Tesseract page segmentation mode | No |
| OCR_OEM | Integer | 1 | Tesseract OCR engine mode | No |
| TESSERACT_DATA_PATH | String | /usr/share/tesseract-ocr/4.00/tessdata | Tesseract data directory | No |

Database Configuration

| Variable | Type | Default | Description | Required |
|---|---|---|---|---|
| DATABASE_MAX_CONNECTIONS | Integer | 32 | Maximum database connections | No |
| DATABASE_MIN_CONNECTIONS | Integer | 5 | Minimum idle connections | No |
| DATABASE_CONNECT_TIMEOUT | Integer | 5 | Connection timeout (seconds) | No |
| DATABASE_ACQUIRE_TIMEOUT | Integer | 10 | Acquire timeout (seconds) | No |
| DATABASE_IDLE_TIMEOUT | Integer | 600 | Idle connection timeout (seconds) | No |
| DATABASE_MAX_LIFETIME | Integer | 1800 | Max connection lifetime (seconds) | No |
| DATABASE_SSL_MODE | String | prefer | SSL mode (disable, prefer, require) | No |
| DATABASE_SSL_CERT | String | - | Path to SSL certificate | No |
| DATABASE_SSL_KEY | String | - | Path to SSL key | No |
| DATABASE_SSL_ROOT_CERT | String | - | Path to root certificate | No |

Performance & Resources

| Variable | Type | Default | Description | Required |
|---|---|---|---|---|
| MEMORY_LIMIT_MB | Integer | 2048 | Memory limit in MB | No |
| CPU_CORES | Integer | Auto-detect | Number of CPU cores to use | No |
| WORKER_THREADS | Integer | CPU cores | Worker thread count | No |
| BLOCKING_THREADS | Integer | 512 | Blocking thread pool size | No |
| CACHE_SIZE_MB | Integer | 256 | In-memory cache size | No |
| BATCH_SIZE | Integer | 100 | Default batch processing size | No |
| PARALLEL_UPLOADS | Integer | 5 | Concurrent file uploads | No |
| REQUEST_TIMEOUT | Integer | 30 | HTTP request timeout (seconds) | No |
| RATE_LIMIT_ENABLED | Boolean | true | Enable rate limiting | No |
| RATE_LIMIT_PER_MINUTE | Integer | 100 | Requests per minute limit | No |

Notification Configuration

| Variable | Type | Default | Description | Required |
|---|---|---|---|---|
| NOTIFICATIONS_ENABLED | Boolean | true | Enable notifications | No |
| EMAIL_ENABLED | Boolean | false | Enable email notifications | No |
| SMTP_HOST | String | - | SMTP server host | If email enabled |
| SMTP_PORT | Integer | 587 | SMTP server port | No |
| SMTP_USERNAME | String | - | SMTP username | If email enabled |
| SMTP_PASSWORD | String | - | SMTP password | If email enabled |
| SMTP_FROM_ADDRESS | String | - | From email address | If email enabled |
| SMTP_USE_TLS | Boolean | true | Use TLS for SMTP | No |
| WEBHOOK_ENABLED | Boolean | false | Enable webhook notifications | No |
| WEBHOOK_URL | String | - | Webhook endpoint URL | If webhook enabled |
| WEBHOOK_SECRET | String | - | Webhook signing secret | No |

Monitoring & Metrics

| Variable | Type | Default | Description | Required |
|---|---|---|---|---|
| METRICS_ENABLED | Boolean | true | Enable metrics collection | No |
| PROMETHEUS_ENABLED | Boolean | false | Enable Prometheus metrics | No |
| PROMETHEUS_PORT | Integer | 9090 | Prometheus metrics port | No |
| HEALTH_CHECK_PATH | String | /health | Health check endpoint | No |
| READY_CHECK_PATH | String | /ready | Readiness check endpoint | No |
| METRICS_PATH | String | /metrics | Metrics endpoint | No |
| TRACING_ENABLED | Boolean | false | Enable distributed tracing | No |
| JAEGER_ENDPOINT | String | - | Jaeger collector endpoint | If tracing enabled |
| TRACE_SAMPLE_RATE | Float | 0.1 | Trace sampling rate (0-1) | No |

Network Configuration

| Variable | Type | Default | Description | Required |
|---|---|---|---|---|
| CORS_ENABLED | Boolean | true | Enable CORS | No |
| CORS_ALLOWED_ORIGINS | String | * | Allowed CORS origins | No |
| CORS_ALLOWED_METHODS | String | GET,POST,PUT,DELETE,OPTIONS | Allowed HTTP methods | No |
| CORS_ALLOWED_HEADERS | String | * | Allowed headers | No |
| CORS_MAX_AGE | Integer | 3600 | CORS preflight cache (seconds) | No |
| PROXY_COUNT | Integer | 0 | Number of reverse proxies | No |
| TRUSTED_PROXIES | String | - | Comma-separated trusted proxy IPs | No |
| WEBSOCKET_ENABLED | Boolean | true | Enable WebSocket support | No |
| WEBSOCKET_MAX_CONNECTIONS | Integer | 1000 | Maximum WebSocket connections | No |

Feature Flags

| Variable | Type | Default | Description | Required |
|---|---|---|---|---|
| FEATURE_ADVANCED_SEARCH | Boolean | true | Enable advanced search | No |
| FEATURE_LABELS | Boolean | true | Enable document labels | No |
| FEATURE_SOURCES | Boolean | true | Enable external sources | No |
| FEATURE_ANALYTICS | Boolean | true | Enable analytics dashboard | No |
| FEATURE_NOTIFICATIONS | Boolean | true | Enable notifications | No |
| FEATURE_MULTI_LANGUAGE_OCR | Boolean | true | Enable multi-language OCR | No |
| FEATURE_WEBDAV | Boolean | true | Enable WebDAV sync | No |
| FEATURE_API_V2 | Boolean | false | Enable API v2 endpoints | No |

Database Connection Priority

The database connection can be configured in two ways:

  1. Using DATABASE_URL (takes priority if set):

    DATABASE_URL=postgresql://username:password@host:port/database
    
  2. Using individual PostgreSQL variables (used if DATABASE_URL is not set):

    POSTGRES_HOST=localhost
    POSTGRES_PORT=5432
    POSTGRES_DB=readur
    POSTGRES_USER=readur
    POSTGRES_PASSWORD=your_password
    

This flexibility allows for easy deployment across different platforms:

  • Docker/Kubernetes: Often provide individual variables
  • Heroku/Railway: Typically provide DATABASE_URL
  • Local Development: Use either method based on preference
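
The priority rule described above can be sketched in Python as follows. This is an illustration of the documented behaviour, not Readur's actual code: the function name is hypothetical, the defaults are taken from the Core Configuration table, and percent-encoding the credentials is an added assumption so that passwords containing characters such as @ still produce a valid URL.

```python
from urllib.parse import quote


def resolve_database_url(env):
    """Return the connection string, preferring DATABASE_URL when it is set."""
    url = env.get("DATABASE_URL")
    if url:
        return url
    # Fall back to the individual variables, using the documented defaults.
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")
    database = env.get("POSTGRES_DB", "readur")
    # Percent-encode credentials (an assumption, not confirmed Readur behaviour).
    user = quote(env.get("POSTGRES_USER", "readur"), safe="")
    password = quote(env.get("POSTGRES_PASSWORD", "readur"), safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

Passing os.environ as env gives the effective URL for the current process; passing a plain dict makes the rule easy to test.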

Configuration Files

Main Configuration (readur.yml)

# readur.yml - Main configuration file
server:
  address: 0.0.0.0
  port: 8080
  workers: 4

database:
  url: postgresql://readur:password@localhost/readur
  max_connections: 32
  min_connections: 5

storage:
  type: s3  # or 'local'
  s3:
    bucket: readur-documents
    region: us-east-1
    prefix: documents/

ocr:
  enabled: true
  language: eng+fra+deu
  concurrent_jobs: 4
  timeout: 300

auth:
  jwt_secret: ${JWT_SECRET}
  session_timeout: 3600
  
oidc:
  enabled: false
  client_id: ${OIDC_CLIENT_ID}
  client_secret: ${OIDC_CLIENT_SECRET}
  issuer_url: https://auth.example.com
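
The ${JWT_SECRET} and ${OIDC_CLIENT_ID} placeholders in the file above suggest that values are substituted from the environment. A minimal Python sketch of that kind of substitution is shown here, assuming the ${NAME} syntax only and treating an unset variable as an error; the function name is hypothetical and the exact rules Readur applies are not documented in this reference.

```python
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(text, env):
    """Replace each ${NAME} in text with env[NAME]; fail loudly if NAME is unset."""

    def replace(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return PLACEHOLDER.sub(replace, text)
```

Failing on a missing variable, rather than substituting an empty string, avoids starting with an empty secret by accident.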

Docker Compose Override

# docker-compose.override.yml
version: '3.8'
services:
  readur:
    environment:
      - DATABASE_URL=postgresql://readur:${DB_PASSWORD}@db:5432/readur
      - JWT_SECRET=${JWT_SECRET}
      - S3_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - S3_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - LOG_LEVEL=debug
    volumes:
      - ./config/readur.yml:/app/config/readur.yml:ro

Kubernetes ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: readur-config
data:
  SERVER_PORT: "8080"
  DATABASE_MAX_CONNECTIONS: "50"
  OCR_LANGUAGE: "eng+spa+fra"
  CONCURRENT_OCR_JOBS: "8"
  LOG_LEVEL: "info"
  CORS_ALLOWED_ORIGINS: "https://app.example.com"

Kubernetes Secret

apiVersion: v1
kind: Secret
metadata:
  name: readur-secrets
type: Opaque
stringData:
  DATABASE_URL: "postgresql://readur:password@postgres:5432/readur"
  JWT_SECRET: "your-secure-random-secret-min-32-chars"
  S3_ACCESS_KEY_ID: "AKIAIOSFODNN7EXAMPLE"
  S3_SECRET_ACCESS_KEY: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

Configuration Precedence

Configuration values are loaded in the following order (later sources override earlier ones):

  1. Default values (built into application)
  2. Configuration file (readur.yml)
  3. Environment variables
  4. Command-line arguments
  5. Database settings (for user-specific settings)
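
The layering above amounts to merging dictionaries in order, with each later source overwriting keys from the earlier ones. The Python sketch below illustrates the idea with made-up values; the function name and the choice to skip unset (None) values are assumptions, not a description of Readur's loader.

```python
def effective_config(*layers):
    """Merge layers in order; a later layer overrides an earlier one."""
    merged = {}
    for layer in layers:
        # Skip unset values so an absent option does not mask a lower layer.
        merged.update({key: value for key, value in layer.items() if value is not None})
    return merged


# Illustrative values only, listed from lowest to highest precedence.
defaults = {"LOG_LEVEL": "info", "SERVER_PORT": "8080"}
config_file = {"LOG_LEVEL": "warn"}
environment = {"LOG_LEVEL": "debug", "SERVER_PORT": None}
cli_args = {}
database_settings = {}

config = effective_config(defaults, config_file, environment, cli_args, database_settings)
```

Here LOG_LEVEL ends up as debug because the environment layer overrides the file, while SERVER_PORT keeps its default because no higher layer sets it.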

Multi-Environment Configuration

Development

# .env.development
DATABASE_URL=postgresql://readur:readur@localhost/readur_dev
SERVER_PORT=8080
LOG_LEVEL=debug
OCR_ENABLED=false
S3_ENABLED=false
CORS_ALLOWED_ORIGINS=http://localhost:3000

Staging

# .env.staging
DATABASE_URL=postgresql://readur:${DB_PASSWORD}@db-staging.internal/readur_staging
SERVER_PORT=8080
LOG_LEVEL=info
S3_ENABLED=true
S3_BUCKET_NAME=readur-staging
CORS_ALLOWED_ORIGINS=https://staging.readur.app

Production

# .env.production
DATABASE_URL=postgresql://readur:${DB_PASSWORD}@db-prod.internal/readur_prod
SERVER_PORT=8080
LOG_LEVEL=warn
S3_ENABLED=true
S3_BUCKET_NAME=readur-production
CORS_ALLOWED_ORIGINS=https://readur.app
RATE_LIMIT_ENABLED=true
PROMETHEUS_ENABLED=true

Dynamic Configuration

Some settings can be changed at runtime without a restart:

User Settings (per-user)

  • Theme preference
  • Language preference
  • Items per page
  • Email notifications
  • OCR default language

System Settings (admin only)

  • OCR concurrent jobs
  • Rate limits
  • Feature flags
  • Notification settings

Access via API:

# Get current settings
GET /api/settings

# Update settings
PUT /api/settings
{
  "ocr_concurrent_jobs": 6,
  "rate_limit_per_minute": 200
}
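
As an illustration, the update request above could be built from Python with the standard library. The base URL, the token value, and the use of a Bearer Authorization header are assumptions made for this sketch; only the /api/settings path, the PUT method, and the two setting names come from this document.

```python
import json
import urllib.request


def build_settings_update(base_url, token, changes):
    """Build (but do not send) a PUT /api/settings request with a JSON body."""
    body = json.dumps(changes).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/settings",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # Bearer authentication is assumed here, not confirmed by this reference.
            "Authorization": f"Bearer {token}",
        },
    )


request = build_settings_update(
    "http://localhost:8080",  # hypothetical server address
    "example-token",  # hypothetical admin token
    {"ocr_concurrent_jobs": 6, "rate_limit_per_minute": 200},
)
# Sending is left to the caller, for example: urllib.request.urlopen(request)
```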

Configuration Validation

Readur validates configuration on startup:

# Test configuration
readur --config-test

# Validate specific file
readur --config-file readur.yml --validate

# Show effective configuration
readur --show-config

Common validation errors:

  1. Invalid database URL: the connection string is malformed.

    Error: Invalid DATABASE_URL format
    Expected: postgresql://user:pass@host:port/database

  2. Missing required S3 credentials

    Error: S3_ENABLED=true but S3_ACCESS_KEY_ID not set

  3. Path conflicts

    Error: UPLOAD_PATH and WATCH_FOLDER cannot be the same directory
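
The three checks behind these errors can be approximated in Python as follows. This is a sketch under assumptions: the function name is hypothetical, the URL check only verifies the scheme and the presence of a host, and the set of required S3 variables is taken from the S3 Storage table. Readur's own validator may apply different or stricter rules.

```python
import os
from urllib.parse import urlparse


def validate(env):
    """Return a list of error messages for the three documented failure cases."""
    errors = []

    url = env.get("DATABASE_URL")
    if url:
        parsed = urlparse(url)
        # Only the scheme and host are checked in this sketch.
        if parsed.scheme != "postgresql" or not parsed.hostname:
            errors.append("Invalid DATABASE_URL format")

    if env.get("S3_ENABLED", "false").lower() == "true":
        for name in ("S3_BUCKET_NAME", "S3_ACCESS_KEY_ID", "S3_SECRET_ACCESS_KEY"):
            if not env.get(name):
                errors.append(f"S3_ENABLED=true but {name} not set")

    # Compare absolute paths so ./data and data count as the same directory.
    upload = os.path.abspath(env.get("UPLOAD_PATH", "./uploads"))
    watch = os.path.abspath(env.get("WATCH_FOLDER", "./watch"))
    if upload == watch:
        errors.append("UPLOAD_PATH and WATCH_FOLDER cannot be the same directory")

    return errors
```

Collecting all errors in a list, instead of stopping at the first one, lets a single run report every problem in the configuration.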
    

Best Practices

Security

Never commit secrets: Always use environment variables or secret management systems to protect sensitive information.

# Use environment variables
JWT_SECRET=${JWT_SECRET}

# Or use secret management
JWT_SECRET=$(vault kv get -field=jwt_secret secret/readur)

Use strong secrets: Generate cryptographically secure secrets with sufficient entropy.

# Generate secure secrets
openssl rand -hex 32
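
Where openssl is not available, an equivalent secret can be produced with Python's standard secrets module. The sketch below also checks the 32-character minimum that the Core Configuration table states for JWT_SECRET; both function names are hypothetical helpers for illustration.

```python
import secrets


def generate_secret(num_bytes=32):
    """Return a hex-encoded random secret (two hex characters per byte)."""
    return secrets.token_hex(num_bytes)


def is_long_enough(secret, min_chars=32):
    """Check the documented 32-character minimum for JWT_SECRET."""
    return len(secret) >= min_chars
```

With the default of 32 bytes the result is 64 hexadecimal characters, the same length as the output of openssl rand -hex 32.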

Rotate secrets regularly: Implement a schedule for rotating sensitive credentials.

# Quarterly rotation
0 0 1 */3 * /scripts/rotate-secrets.sh

Performance

Tune database connections: Configure the optimal number of database connections based on your system's resources.

# Formula: connections = (worker_threads * 2) + management_connections
DATABASE_MAX_CONNECTIONS=$(($(nproc) * 2 + 5))

Optimize OCR workers: Set the appropriate number of concurrent OCR workers to balance performance and resource usage.

# Formula: ocr_workers = cpu_cores / 2
CONCURRENT_OCR_JOBS=$(($(nproc) / 2))

Configure caching: Set up appropriate cache sizes to improve response times while managing memory usage.

# Cache size based on available memory
CACHE_SIZE_MB=$(($(free -m | awk 'NR==2{print $7}') / 4))
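
The three sizing rules in this section can be collected into one small Python function, which makes the arithmetic easy to check for a given machine. The function name and the management_connections default of 5 (taken from the shell example) are assumptions. One deliberate difference from the shell snippets: the OCR worker count is clamped to at least 1, because integer division of a single core by 2 would otherwise yield 0.

```python
def suggested_tuning(cpu_cores, available_mb, management_connections=5):
    """Apply the sizing rules of thumb from this section to one machine."""
    return {
        # connections = (worker_threads * 2) + management_connections
        "DATABASE_MAX_CONNECTIONS": cpu_cores * 2 + management_connections,
        # ocr_workers = cpu_cores / 2, but never fewer than one worker
        "CONCURRENT_OCR_JOBS": max(1, cpu_cores // 2),
        # cache = one quarter of the currently available memory
        "CACHE_SIZE_MB": available_mb // 4,
    }
```

For example, a host with 8 cores and 4096 MB of available memory gives 21 database connections, 4 OCR workers, and a 1024 MB cache.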

Monitoring

Enable metrics in production: Turn on metrics collection to monitor system performance and health.

METRICS_ENABLED=true
PROMETHEUS_ENABLED=true

Set appropriate log levels: Configure logging verbosity based on your environment and debugging needs.

# Production
LOG_LEVEL=warn

# Debugging
LOG_LEVEL=debug

Configure alerts: Set up alerting to be notified of critical system events.

WEBHOOK_URL=https://alerts.example.com/readur

Troubleshooting Configuration

Debug Configuration Loading

# Enable verbose configuration logging
RUST_LOG=readur::config=debug readur

# Show configuration sources
readur --config-sources

Common Issues

  1. Environment variable not loading

    • Check the variable name (it must match exactly)
    • Verify there are no spaces around =
    • Check for stray quotes in values

  2. Configuration file ignored

    • Verify the file path
    • Check the YAML syntax
    • Ensure the file is readable by the Readur process

  3. Settings not taking effect

    • Check the configuration precedence order
    • Verify that no higher-precedence source overrides the value
    • Restart Readur, since some settings are applied only at startup
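
The checks listed for environment variables that fail to load can be automated with a small linter for .env lines. The Python sketch below is a hypothetical helper, not part of Readur; it flags spaces around =, names that are not valid variable identifiers, and values wrapped in quotes, which some loaders pass through literally.

```python
import re

# Conventional environment variable name: letters, digits, underscore.
KEY_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def lint_env_line(line):
    """Return a list of problems found in one line of a .env file."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return []  # blank lines and comments are fine
    if "=" not in stripped:
        return ["missing '='"]
    key, value = stripped.split("=", 1)
    problems = []
    if key != key.strip() or value != value.strip():
        problems.append("spaces around '='")
    if not KEY_RE.match(key.strip()):
        problems.append("invalid variable name")
    bare = value.strip()
    if len(bare) >= 2 and bare[0] == bare[-1] and bare[0] in "\"'":
        problems.append("value is quoted")
    return problems
```

Running it over every line of a .env file and printing the non-empty results gives a quick first pass before starting the server.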

Migration from Previous Versions

From v1.x to v2.x

#!/bin/bash
# Migration script: v1.x to v2.x

# Rename variables. Anchor on the full name so that LOCAL_STORAGE_PATH
# and TEMP_STORAGE_PATH are not rewritten by accident.
sed -i 's/^STORAGE_PATH=/UPLOAD_PATH=/' .env
sed -i 's/^OCR_WORKERS=/CONCURRENT_OCR_JOBS=/' .env

# Add the variables introduced in v2.x
echo "S3_ENABLED=false" >> .env
echo "ENABLE_PER_USER_WATCH=false" >> .env

From v2.x to v3.x

# New variables in v3.x
echo "OIDC_ENABLED=false" >> .env
echo "FEATURE_MULTI_LANGUAGE_OCR=true" >> .env