readur/docs/configuration.md
2025-08-16 18:58:35 +00:00

# Configuration Guide
This guide covers the configuration options available in Readur through environment variables and runtime settings.
> 📖 **See Also**: For a complete reference of all configuration options including S3 storage and advanced settings, see the [Configuration Reference](configuration-reference.md).
## Table of Contents
- [Environment Variables](#environment-variables)
  - [Core Configuration](#core-configuration)
  - [File Storage & Upload](#file-storage-upload)
  - [Watch Folder Configuration](#watch-folder-configuration)
  - [OCR & Processing Settings](#ocr-processing-settings)
  - [Search & Performance](#search-performance)
  - [Data Management](#data-management)
- [Port Configuration](#port-configuration)
- [Example Configurations](#example-configurations)
- [Configuration Priority](#configuration-priority)
- [Runtime Settings vs Environment Variables](#runtime-settings-vs-environment-variables)
- [Database Tuning](#database-tuning)
- [Security Configuration](#security-configuration)
- [Next Steps](#next-steps)
## Environment Variables
All application settings can be configured via environment variables:
### Core Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| `DATABASE_URL` | `postgresql://readur:readur@localhost/readur` | PostgreSQL connection string |
| `JWT_SECRET` | `your-secret-key` | Secret key for JWT tokens ⚠️ **Change in production!** |
| `SERVER_ADDRESS` | `0.0.0.0:8000` | Server bind address and port |
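
Because the default `JWT_SECRET` is a publicly documented placeholder, a deployment script can refuse to continue while it is still in place. The following is a minimal sketch of such a pre-flight check; the function name `check_jwt_secret` and the 32-character minimum are assumptions made for illustration and are not part of Readur.

```bash
# Hypothetical pre-flight check (not part of Readur): reject an unset,
# default, or very short JWT secret before starting the server.
check_jwt_secret() {
    local secret="$1"
    if [ -z "$secret" ] || [ "$secret" = "your-secret-key" ]; then
        echo "JWT_SECRET is unset or still the documented default" >&2
        return 1
    fi
    # The 32-character minimum is an assumed threshold, not a Readur rule.
    if [ "${#secret}" -lt 32 ]; then
        echo "JWT_SECRET is shorter than 32 characters" >&2
        return 1
    fi
    return 0
}

# Usage: check_jwt_secret "$JWT_SECRET" || exit 1
```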
### File Storage & Upload
| Variable | Default | Description |
|----------|---------|-------------|
| `UPLOAD_PATH` | `./uploads` | Document storage directory |
| `ALLOWED_FILE_TYPES` | `pdf,txt,doc,docx,png,jpg,jpeg` | Comma-separated allowed file extensions |
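
To make the comma-separated format of `ALLOWED_FILE_TYPES` concrete, the sketch below checks a file name against such a list. It only illustrates the format; the helper name `is_allowed_type` is hypothetical, and the case-insensitive comparison is an assumption rather than documented Readur behavior.

```bash
# Illustrative helper (not Readur's code): succeed if the file's extension
# appears in a comma-separated list such as ALLOWED_FILE_TYPES.
is_allowed_type() {
    local file="$1" allowed="$2" ext
    # No dot in the name means there is no extension to check.
    case "$file" in
        *.*) ;;
        *) return 1 ;;
    esac
    # Take the text after the last dot and lowercase it.
    # Case-insensitive matching is an assumption of this sketch.
    ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
    # Wrap both sides in commas so "doc" does not match inside "docx".
    case ",$allowed," in
        *",$ext,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: is_allowed_type "scan.pdf" "$ALLOWED_FILE_TYPES" && echo "accepted"
```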
### Watch Folder Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| `WATCH_FOLDER` | `./watch` | Directory to monitor for new files |
| `WATCH_INTERVAL_SECONDS` | `30` | Polling interval for network filesystems (seconds) |
| `FILE_STABILITY_CHECK_MS` | `500` | Time to wait for file write completion (milliseconds) |
| `MAX_FILE_AGE_HOURS` | _(none)_ | Skip files older than this many hours |
| `FORCE_POLLING_WATCH` | _(none)_ | Force polling mode even for local filesystems |
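
The idea behind `FILE_STABILITY_CHECK_MS` is to avoid picking up a file that is still being written. One common way to do this is to compare the file size before and after a short wait, as sketched below. This is a conceptual illustration only: Readur's actual check may work differently, and the function name `is_file_stable` is hypothetical.

```bash
# Conceptual sketch only; Readur's actual check may work differently.
# Succeeds when the file size is unchanged after waiting wait_ms milliseconds.
is_file_stable() {
    local file="$1" wait_ms="${2:-500}" before after
    [ -f "$file" ] || return 1
    before=$(wc -c < "$file") || return 1
    # Convert milliseconds to (possibly fractional) seconds for sleep.
    # Fractional sleep works with GNU and BSD sleep but is not required by POSIX.
    sleep "$(awk -v ms="$wait_ms" 'BEGIN { printf "%.3f", ms / 1000 }')"
    after=$(wc -c < "$file") || return 1
    [ "$before" -eq "$after" ]
}

# Usage: is_file_stable /path/to/file.pdf 500 && echo "safe to process"
```

A longer wait lowers the risk of reading a partial file on slow network mounts, at the cost of delaying ingestion by the same amount.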
### OCR & Processing Settings
*Note: These settings can also be configured per user via the web interface.*
| Variable | Default | Description |
|----------|---------|-------------|
| `OCR_LANGUAGE` | `eng` | OCR language code (eng, fra, deu, spa, etc.) |
| `CONCURRENT_OCR_JOBS` | `4` | Maximum parallel OCR processes |
| `OCR_TIMEOUT_SECONDS` | `300` | OCR processing timeout per file |
| `MAX_FILE_SIZE_MB` | `50` | Maximum file size for processing |
| `AUTO_ROTATE_IMAGES` | `true` | Automatically rotate images for better OCR |
| `ENABLE_IMAGE_PREPROCESSING` | `true` | Apply image enhancement before OCR |
### Search & Performance
| Variable | Default | Description |
|----------|---------|-------------|
| `SEARCH_RESULTS_PER_PAGE` | `25` | Default number of search results per page |
| `SEARCH_SNIPPET_LENGTH` | `200` | Length of text snippets in search results |
| `FUZZY_SEARCH_THRESHOLD` | `0.8` | Similarity threshold for fuzzy search (0.0-1.0) |
| `MEMORY_LIMIT_MB` | `512` | Memory limit for OCR processes |
| `CPU_PRIORITY` | `normal` | CPU priority: `low`, `normal`, `high` |
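
Since `FUZZY_SEARCH_THRESHOLD` must lie between 0.0 and 1.0, a deployment script may want to validate it before starting the service. The sketch below shows one way to do that in shell; `is_valid_threshold` is a hypothetical helper, and how Readur itself reacts to an out-of-range value is not covered here.

```bash
# Illustrative validation (not part of Readur): succeed only for a plain
# decimal number between 0.0 and 1.0 inclusive.
is_valid_threshold() {
    local value="$1"
    # Reject empty input, non-numeric characters, multiple dots, or a lone dot.
    case "$value" in
        ''|*[!0-9.]*|*.*.*|.) return 1 ;;
    esac
    # Force numeric comparison in awk; exit status 0 means "in range".
    awk -v v="$value" 'BEGIN { v = v + 0; exit !(v >= 0 && v <= 1) }'
}

# Usage: is_valid_threshold "$FUZZY_SEARCH_THRESHOLD" || echo "invalid threshold" >&2
```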
### Data Management
| Variable | Default | Description |
|----------|---------|-------------|
| `RETENTION_DAYS` | _(none)_ | Auto-delete documents after N days |
| `ENABLE_AUTO_CLEANUP` | `false` | Enable automatic cleanup of old documents |
| `ENABLE_COMPRESSION` | `false` | Compress stored documents to save space |
| `ENABLE_BACKGROUND_OCR` | `true` | Process OCR in background queue |
## Port Configuration
The server's bind address can be set either as a single value or as a separate host and port:
```bash
# Method 1: Specify full server address
SERVER_ADDRESS=0.0.0.0:8000
# Method 2: Use separate host and port (recommended)
SERVER_HOST=0.0.0.0
SERVER_PORT=8000
# For development: configure frontend and backend ports
CLIENT_PORT=5173
BACKEND_PORT=8000
```
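
Both methods describe the same bind address. As an illustration of how they relate, the sketch below splits a `host:port` string into its two parts using shell parameter expansion. The helper name `split_server_address` is hypothetical, and how Readur resolves a conflict when both methods are set is not something this sketch assumes.

```bash
# Split a host:port string at the last colon (illustrative helper).
split_server_address() {
    local address="$1"
    # ${address%:*}  keeps the text before the last colon (the host)
    # ${address##*:} keeps the text after the last colon (the port)
    printf '%s %s\n' "${address%:*}" "${address##*:}"
}

split_server_address "0.0.0.0:8000"   # prints: 0.0.0.0 8000
```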
## Example Configurations
### Development Configuration
```env
# Basic development setup
DATABASE_URL=postgresql://readur:readur@localhost/readur
JWT_SECRET=dev-secret-key-not-for-production
SERVER_ADDRESS=0.0.0.0:8000
UPLOAD_PATH=./uploads
WATCH_FOLDER=./watch
OCR_LANGUAGE=eng
CONCURRENT_OCR_JOBS=2
```
### Production Configuration
```env
# Core settings
DATABASE_URL=postgresql://readur:secure_password@postgres:5432/readur
JWT_SECRET=your-very-long-random-secret-key-generated-with-openssl
SERVER_ADDRESS=0.0.0.0:8000
# File handling
UPLOAD_PATH=/app/uploads
ALLOWED_FILE_TYPES=pdf,png,jpg,jpeg,tiff,bmp,gif,txt,rtf,doc,docx
# Watch folder for NFS mount
WATCH_FOLDER=/mnt/nfs/documents
WATCH_INTERVAL_SECONDS=60
FILE_STABILITY_CHECK_MS=1000
MAX_FILE_AGE_HOURS=168
FORCE_POLLING_WATCH=1
# OCR optimization
OCR_LANGUAGE=eng
CONCURRENT_OCR_JOBS=8
OCR_TIMEOUT_SECONDS=600
MAX_FILE_SIZE_MB=200
AUTO_ROTATE_IMAGES=true
ENABLE_IMAGE_PREPROCESSING=true
# Performance tuning
MEMORY_LIMIT_MB=2048
CPU_PRIORITY=high
ENABLE_COMPRESSION=true
ENABLE_BACKGROUND_OCR=true
# Search optimization
SEARCH_RESULTS_PER_PAGE=50
SEARCH_SNIPPET_LENGTH=300
FUZZY_SEARCH_THRESHOLD=0.7
# Data management (2555 days is about 7 years)
RETENTION_DAYS=2555
ENABLE_AUTO_CLEANUP=true
```
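
The time-based values in this example are easier to audit when derived from readable units. The snippet below is a small worked calculation for the two values used above; it ignores leap days, which is why 7 years comes out as 2555 days.

```bash
# Derive the time-based values used above from more readable units.
MAX_FILE_AGE_HOURS=$((7 * 24))     # 7 days  -> 168 hours
RETENTION_DAYS=$((7 * 365))        # 7 years -> 2555 days (leap days ignored)
echo "MAX_FILE_AGE_HOURS=${MAX_FILE_AGE_HOURS}"
echo "RETENTION_DAYS=${RETENTION_DAYS}"
```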
### Network Filesystem Configuration
```env
# The blocks below are alternatives: use only the one that matches your mount type.
# For NFS mounts
WATCH_FOLDER=/mnt/nfs/documents
WATCH_INTERVAL_SECONDS=60
FILE_STABILITY_CHECK_MS=1000
FORCE_POLLING_WATCH=1
# For SMB/CIFS mounts
WATCH_FOLDER=/mnt/smb/shared
WATCH_INTERVAL_SECONDS=30
FILE_STABILITY_CHECK_MS=2000
# For S3 mounts (using s3fs)
WATCH_FOLDER=/mnt/s3/bucket
WATCH_INTERVAL_SECONDS=120
FILE_STABILITY_CHECK_MS=5000
FORCE_POLLING_WATCH=1
```
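
Because a single `.env` file can hold only one value per variable, a setup script can pick the settings for one mount type instead of copying all three groups. The following sketch does that with the values from the example above; the function name `watch_settings_for` and the mount type labels `nfs`, `smb`, and `s3fs` are assumptions chosen for illustration.

```bash
# Hypothetical helper: print the watch settings from the example above
# for one mount type, so that only a single set ends up in the .env file.
watch_settings_for() {
    case "$1" in
        nfs)
            printf '%s\n' "WATCH_INTERVAL_SECONDS=60" \
                "FILE_STABILITY_CHECK_MS=1000" "FORCE_POLLING_WATCH=1" ;;
        smb)
            printf '%s\n' "WATCH_INTERVAL_SECONDS=30" \
                "FILE_STABILITY_CHECK_MS=2000" ;;
        s3fs)
            printf '%s\n' "WATCH_INTERVAL_SECONDS=120" \
                "FILE_STABILITY_CHECK_MS=5000" "FORCE_POLLING_WATCH=1" ;;
        *)
            echo "unknown mount type: $1" >&2
            return 1 ;;
    esac
}

# Usage: watch_settings_for nfs >> .env
```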
## Configuration Priority
Settings are applied in this order (later values override earlier ones):
1. **Application defaults** (built into the code)
2. **Environment variables** (system-wide configuration)
3. **User settings** (per-user database settings via web interface)

This allows flexible deployments: system administrators set the defaults, and users customize their own experience.
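
The override order can be sketched as a simple fallback chain. The helper `resolve_setting` below is hypothetical and only models the order described above; treating an empty value as "not set" is an assumption of this sketch.

```bash
# Sketch of the override order: user setting, then environment variable,
# then built-in default. Empty values count as "not set" in this sketch.
resolve_setting() {
    local user_value="$1" env_value="$2" default_value="$3"
    printf '%s\n' "${user_value:-${env_value:-$default_value}}"
}

resolve_setting ""    ""    "eng"   # prints: eng (application default)
resolve_setting ""    "deu" "eng"   # prints: deu (environment variable)
resolve_setting "fra" "deu" "eng"   # prints: fra (user setting)
```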
## Runtime Settings vs Environment Variables
Some settings can be configured in two ways:

1. **Environment Variables**: Set at container startup; they apply to the entire application.
2. **User Settings**: Configured per user via the web interface and stored in the database.

Environment variables provide the system-wide defaults and take precedence over the built-in application defaults. User settings override these defaults for individual users where applicable.

Settings configurable via the web interface:
- OCR language preferences
- Search result limits
- File type restrictions
- OCR processing options
- Data retention policies
## Database Tuning
For better search performance with large document collections:
```sql
-- Increase shared_buffers for better caching
ALTER SYSTEM SET shared_buffers = '256MB';
-- Optimize for full-text search
ALTER SYSTEM SET default_text_search_config = 'pg_catalog.english';
-- Restart PostgreSQL afterwards: shared_buffers only takes effect at server start
```
## Security Configuration
### Generating Secure Secrets
```bash
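# Generate secure JWT secret
# (openssl wraps long base64 output across lines, so strip the line breaks)
JWT_SECRET=$(openssl rand -base64 64 | tr -d '\n')
# Generate secure database password
DB_PASSWORD=$(openssl rand -base64 32)
# Save to .env file (this overwrites any existing .env)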
cat > .env << EOF
JWT_SECRET=${JWT_SECRET}
DB_PASSWORD=${DB_PASSWORD}
EOF
```
### Quick Reference - Essential Variables
For a minimal production deployment, configure these essential variables:
```bash
# Security (REQUIRED)
JWT_SECRET=your-secure-random-key-here
DATABASE_URL=postgresql://user:password@host:port/database
# File Storage
UPLOAD_PATH=/app/uploads
WATCH_FOLDER=/path/to/mounted/folder
# Watch Folder (for network mounts)
WATCH_INTERVAL_SECONDS=60
FORCE_POLLING_WATCH=1
# Performance
CONCURRENT_OCR_JOBS=4
MAX_FILE_SIZE_MB=100
```
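
A start-up script can guard against a missing required variable before launching the service. The sketch below is one way to do that; `require_vars` is a hypothetical helper and not something Readur ships.

```bash
# Hypothetical start-up guard: fail if any named environment variable
# is missing or empty.
require_vars() {
    local name missing=0
    for name in "$@"; do
        # printenv prints nothing for a variable that is not exported.
        if [ -z "$(printenv "$name")" ]; then
            echo "missing required variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: require_vars JWT_SECRET DATABASE_URL || exit 1
```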
## Next Steps
- Review [deployment options](deployment.md) for production setup
- Learn about [folder watching](WATCH_FOLDER.md) for automatic document ingestion
- Optimize [OCR performance](dev/OCR_OPTIMIZATION_GUIDE.md) for your use case