Watch Folder Guide
The watch folder feature automatically monitors a directory for new files and processes them with OCR, making them searchable in Readur. Your original files are never modified or deleted: Readur copies each file and processes the copy.
What is Watch Folder?
Watch folder allows you to:
- Drop files anywhere - Point Readur to any folder (local, network drive, cloud mount)
- Automatic processing - New files are automatically detected and processed
- Non-destructive - Original files remain exactly where you put them
- Background operation - Processing happens in the background while you continue working
Perfect for scenarios where you want to automatically process files from:
- Network drives (NFS, SMB shares)
- Cloud storage mounts (Google Drive, Dropbox, OneDrive)
- Local folders where you save scanned documents
- Shared team folders
How It Works
- Point Readur to your folder - Set the WATCH_FOLDER path to any directory you want monitored
- Drop files - Add documents to that folder (PDFs, images, text files, Word docs)
- Automatic detection - Readur notices new files within seconds on local filesystems, or within the polling interval (30 seconds by default) on network mounts
- OCR processing - Files are automatically processed to extract searchable text
- Search and find - Your documents become searchable in the Readur web interface
Key Features
✅ Works with any storage type - Local drives, network shares, cloud mounts
✅ Smart processing - Only processes supported file types
✅ Duplicate prevention - Won't process the same file twice
✅ Safe operation - Never modifies or deletes your original files
✅ Background processing - Doesn't interrupt your workflow
Quick Setup
Basic Setup (Docker Compose)
- Edit your docker-compose.yml:
    services:
      readur:
        image: ghcr.io/readur/readur:main
        volumes:
          # Mount your folder to the watch directory
          - /path/to/your/documents:/app/watch
        environment:
          - WATCH_FOLDER=/app/watch
- Start Readur:
    docker compose up -d
- Start dropping files into /path/to/your/documents - they'll be automatically processed!
Configuration Options
| Setting | Default | What it does |
|---|---|---|
| WATCH_FOLDER | ./watch | Which folder to monitor |
| WATCH_INTERVAL_SECONDS | 30 | How often to check for new files (network drives) |
| MAX_FILE_AGE_HOURS | (none) | Ignore files older than this |
| ALLOWED_FILE_TYPES | pdf,png,jpg,jpeg,tiff,bmp,txt,doc,docx | Which file types to process |
Usage
Basic Setup
- Set the watch folder path:
    export WATCH_FOLDER=/path/to/your/mounted/folder
- Start the application:
    ./readur
- Copy files to the watch folder: the application will automatically detect and process new files.
Docker Usage
# Mount your folder to the container's watch directory
docker run -d \
  -v /path/to/your/files:/app/watch \
  -e WATCH_FOLDER=/app/watch \
  -e WATCH_INTERVAL_SECONDS=60 \
  readur:latest
Docker Compose
services:
  readur:
    image: ghcr.io/readur/readur:main
    volumes:
      - /mnt/nfs/documents:/app/watch
      - readur_uploads:/app/uploads
    environment:
      WATCH_FOLDER: /app/watch
      WATCH_INTERVAL_SECONDS: 30
      FILE_STABILITY_CHECK_MS: 1000
      MAX_FILE_AGE_HOURS: 168 # 1 week
    ports:
      - "8000:8000"

# Named volumes used by a service must be declared at the top level
volumes:
  readur_uploads:
Filesystem-Specific Configuration
NFS Mounts
# Recommended settings for NFS
export WATCH_INTERVAL_SECONDS=60
export FILE_STABILITY_CHECK_MS=1000
export FORCE_POLLING_WATCH=1
SMB/CIFS Mounts
# Recommended settings for SMB
export WATCH_INTERVAL_SECONDS=30
export FILE_STABILITY_CHECK_MS=2000
S3 Mounts (s3fs, goofys, etc.)
# Recommended settings for S3
export WATCH_INTERVAL_SECONDS=120
export FILE_STABILITY_CHECK_MS=5000
export FORCE_POLLING_WATCH=1
Local Filesystems
# Optimal settings for local storage (default behavior)
# No special configuration needed - uses inotify automatically
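The per-filesystem recommendations above can be collected into one helper so the matching values are applied in a single step. This is a minimal sketch: the function name watch_settings_for and the labels nfs, smb, s3, and local are hypothetical names chosen for illustration. Only the variable names and values come from this guide.

```shell
#!/bin/sh
# Hypothetical helper (not part of Readur): print the export lines that this
# guide recommends for a given storage type.
watch_settings_for() {
    case "$1" in
        nfs)
            echo "export WATCH_INTERVAL_SECONDS=60"
            echo "export FILE_STABILITY_CHECK_MS=1000"
            echo "export FORCE_POLLING_WATCH=1"
            ;;
        smb)
            echo "export WATCH_INTERVAL_SECONDS=30"
            echo "export FILE_STABILITY_CHECK_MS=2000"
            ;;
        s3)
            echo "export WATCH_INTERVAL_SECONDS=120"
            echo "export FILE_STABILITY_CHECK_MS=5000"
            echo "export FORCE_POLLING_WATCH=1"
            ;;
        local)
            : # Defaults are fine for local storage, so nothing is printed
            ;;
        *)
            echo "unknown storage type: $1" >&2
            return 1
            ;;
    esac
}

# Apply the NFS recommendations to the current shell, then confirm one value.
eval "$(watch_settings_for nfs)"
echo "WATCH_INTERVAL_SECONDS=$WATCH_INTERVAL_SECONDS"
```

On Linux, `findmnt -n -o FSTYPE -T "$WATCH_FOLDER"` reports the filesystem type of the mount that contains the watch folder, which can help decide which label to pass.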
Supported File Types
The watch folder processes these file types for OCR:
- PDF: *.pdf
- Images: *.png, *.jpg, *.jpeg, *.tiff, *.bmp, *.gif
- Text: *.txt
- Word Documents: *.doc, *.docx
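To predict whether a given file would be picked up, its extension can be compared with a comma-separated list in the ALLOWED_FILE_TYPES format. The sketch below is illustrative only: is_allowed_type is a hypothetical helper, and the case-insensitive comparison is an assumption rather than confirmed Readur behavior.

```shell
#!/bin/sh
# Hypothetical helper (not part of Readur): succeed if the file's extension
# appears in a comma-separated list such as ALLOWED_FILE_TYPES.
# Assumption: extensions are compared case-insensitively.
is_allowed_type() {
    file_name=$1
    allowed=$2
    case "$file_name" in
        *.*) ext=${file_name##*.} ;;   # text after the last dot
        *)   return 1 ;;               # no extension at all
    esac
    [ -n "$ext" ] || return 1          # name ends with a dot
    ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
    # Wrap both sides in commas so "pd" cannot match inside "pdf"
    case ",$allowed," in
        *",$ext,"*) return 0 ;;
        *)          return 1 ;;
    esac
}

ALLOWED_FILE_TYPES=${ALLOWED_FILE_TYPES:-pdf,png,jpg,jpeg,tiff,bmp,txt,doc,docx}
for f in scan.PDF notes.txt archive.zip; do
    if is_allowed_type "$f" "$ALLOWED_FILE_TYPES"; then
        echo "$f: would be processed"
    else
        echo "$f: would be skipped"
    fi
done
```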
File Processing Priority
Files are prioritized for OCR processing based on:
- File Size: Smaller files get higher priority
- File Type: Images > Text files > PDFs > Word documents
- Queue Time: Older items get higher priority within the same size/type category
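The ordering rules above can be illustrated with a small sort. This is a sketch under stated assumptions, not Readur's scheduler: it assumes size is compared first, then file type, then queue time. The input format and the names type_rank and order_queue are hypothetical.

```shell
#!/bin/sh
# Hypothetical ranking for the type order given above:
# Images > Text files > PDFs > Word documents (lower number = higher priority).
type_rank() {
    case "$1" in
        png|jpg|jpeg|tiff|bmp|gif) echo 0 ;;
        txt)                       echo 1 ;;
        pdf)                       echo 2 ;;
        doc|docx)                  echo 3 ;;
        *)                         echo 9 ;;
    esac
}

# Reads "size_bytes extension queued_at name" lines on stdin and prints the
# names in processing order. Assumed key order: size, then type, then age.
order_queue() {
    while read -r size ext queued name; do
        printf '%s %s %s %s\n' "$size" "$(type_rank "$ext")" "$queued" "$name"
    done | sort -k1,1n -k2,2n -k3,3n | cut -d' ' -f4-
}

order_queue <<'EOF'
2048 pdf 100 report.pdf
2048 png 200 scan.png
512 docx 300 memo.docx
2048 png 150 photo.png
EOF
# Prints memo.docx, photo.png, scan.png, report.pdf (one per line)
```

Here the smallest file goes first regardless of type. Among the three 2048-byte files, the images precede the PDF, and the image queued earlier precedes the later one.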
Monitoring and Logs
The application provides detailed logging for watch folder operations:
INFO readur::watcher: Starting hybrid folder watcher on: /app/watch
INFO readur::watcher: Using watch strategy: Hybrid
INFO readur::watcher: Started polling-based watcher on: /app/watch
INFO readur::watcher: Processing new file: "/app/watch/document.pdf"
INFO readur::watcher: Successfully queued file for OCR: document.pdf (size: 2048 bytes)
Troubleshooting
Files Not Being Detected
- Check permissions:
    ls -la /path/to/watch/folder
    chmod 755 /path/to/watch/folder
- Verify file types:
    # Only supported file types are processed
    echo $ALLOWED_FILE_TYPES
- Check file stability:
    # Increase stability check time for slow networks
    export FILE_STABILITY_CHECK_MS=2000
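To see why a longer stability window helps on slow links, it may be useful to picture what a check of this kind typically does. The sketch below assumes a file counts as stable when its size is unchanged after the wait. That is a common approach, but it is an assumption here: this guide does not describe how Readur implements FILE_STABILITY_CHECK_MS. The names file_size and is_stable are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch (not Readur's implementation): a file is treated as
# stable if its size does not change across the wait period.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# Usage: is_stable PATH WAIT_MS
is_stable() {
    path=$1
    wait_ms=$2
    # Round up to whole seconds, since fractional sleep is not portable
    wait_seconds=$(( (wait_ms + 999) / 1000 ))
    [ -f "$path" ] || return 1
    before=$(file_size "$path")
    sleep "$wait_seconds"
    [ -f "$path" ] || return 1
    after=$(file_size "$path")
    [ "$before" = "$after" ]
}

# Demo on a temporary file that nothing else is writing to.
tmp=$(mktemp)
echo "hello" > "$tmp"
if is_stable "$tmp" 1000; then
    echo "stable"
else
    echo "still changing"
fi
rm -f "$tmp"
```

If a copy over a slow network stalls for longer than the window, a check like this would wrongly report the file as complete, which is why larger values are suggested for network and S3 mounts.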
High CPU Usage
- Increase polling interval:
    export WATCH_INTERVAL_SECONDS=120
- Limit file age:
    export MAX_FILE_AGE_HOURS=24
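Before choosing a MAX_FILE_AGE_HOURS value, it can help to preview which files would still fall inside the window. The sketch below assumes age is measured by modification time, which this guide does not confirm. files_within_age is a hypothetical helper, and it relies on the -mmin test, which GNU, BSD, and BusyBox find provide but POSIX does not require.

```shell
#!/bin/sh
# Hypothetical preview helper (not part of Readur): list regular files under
# DIR that were modified within the last HOURS hours.
# Assumption: "age" means time since last modification.
files_within_age() {
    dir=$1
    hours=$2
    find "$dir" -type f -mmin "-$((hours * 60))"
}

# Preview the current watch folder, if it exists.
watch_dir=${WATCH_FOLDER:-./watch}
if [ -d "$watch_dir" ]; then
    files_within_age "$watch_dir" "${MAX_FILE_AGE_HOURS:-24}"
fi
```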
Network Mount Issues
- Force polling mode:
    export FORCE_POLLING_WATCH=1
- Increase stability check:
    export FILE_STABILITY_CHECK_MS=5000
Testing
Use the provided test script to verify functionality:
./test_watch_folder.sh
This creates sample files in the watch folder for testing.
Security Considerations
- Files are copied to a secure upload directory, not processed in-place
- Original files in the watch folder are never modified or deleted
- System files and hidden files are automatically excluded
- File size limits prevent processing of excessively large files (>500MB)
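Two of the exclusion rules above, hidden files and the size limit, are simple enough to check by hand when a file is unexpectedly ignored. The following sketch is an approximation under stated assumptions: it treats any name starting with a dot as hidden, reads 500MB as 500 MiB, and does not attempt the system-file rule because this guide does not define it. should_skip is a hypothetical name.

```shell
#!/bin/sh
# Assumption: the 500MB limit is interpreted as 500 * 1024 * 1024 bytes.
MAX_BYTES=$((500 * 1024 * 1024))

# Hypothetical check (not Readur's code). Usage: should_skip PATH [MAX_BYTES]
# Succeeds (returns 0) when the file would be excluded.
should_skip() {
    path=$1
    max_bytes=${2:-$MAX_BYTES}
    name=${path##*/}
    case "$name" in
        .*) return 0 ;;                # hidden file
    esac
    [ -f "$path" ] || return 0         # missing or not a regular file
    size=$(wc -c < "$path" | tr -d ' ')
    [ "$size" -gt "$max_bytes" ]       # over the size limit
}

# Demo with temporary files.
demo_dir=$(mktemp -d)
printf 'sample' > "$demo_dir/report.pdf"
printf 'sample' > "$demo_dir/.DS_Store"
for f in "$demo_dir/report.pdf" "$demo_dir/.DS_Store"; do
    if should_skip "$f"; then
        echo "${f##*/}: skipped"
    else
        echo "${f##*/}: eligible"
    fi
done
rm -rf "$demo_dir"
```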
Performance
- Local filesystems: Near-instant detection via inotify
- Network filesystems: Detection within polling interval (default 30s)
- Concurrent processing: Multiple files processed simultaneously
- Memory efficient: Streams large files without loading entirely into memory
Examples
Basic File Drop
# Copy a file to the watch folder
cp document.pdf /app/watch/
# File will be automatically detected and processed
Batch Processing
# Copy multiple files
cp *.pdf /app/watch/
# All supported files will be queued for processing
Real-time Monitoring
# Watch the logs for processing updates (2>&1 also captures log lines written to stderr)
docker logs -f readur-container 2>&1 | grep watcher