📤 Smart File Upload Guide
Readur provides an intuitive drag-and-drop file upload system that supports multiple document formats and batch processing.
Supported File Types
- PDF Files (.pdf): Direct text extraction and OCR for scanned PDFs
- Images (.png, .jpg, .jpeg, .tiff, .bmp, .webp): Full OCR text extraction
- Text Files (.txt, .rtf): Direct text import
- Office Documents (.docx, .doc, .xlsx, .xls, .pptx, .ppt): Text extraction and OCR
Upload Methods
Drag & Drop
- Navigate to the main dashboard
- Drag files from your computer directly onto the upload area
- Multiple files can be selected and dropped simultaneously
- Progress indicators show upload and processing status
Browse & Select
- Click the "Upload Documents" button
- Use the file browser to select one or multiple files
- Click "Open" to begin the upload process
Batch Processing
- Upload multiple files at once for efficient processing
- Each file is processed independently for OCR and text extraction
- Real-time status updates show processing progress
- Failed uploads can be retried individually
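The independent, per-file behavior described above can be sketched as follows. This is a minimal illustration, not Readur's actual implementation: `process_batch`, `retry_failed`, and the `process_one` callback are hypothetical names, and the thread pool is only one possible way to run files side by side.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def process_batch(
    paths: List[str],
    process_one: Callable[[str], str],
    max_workers: int = 4,
) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Process each file independently; one failure does not stop the others."""
    done: Dict[str, str] = {}
    failed: Dict[str, Exception] = {}

    def safe(path: str):
        # Capture the error instead of raising so the rest of the batch continues.
        try:
            return path, process_one(path), None
        except Exception as exc:
            return path, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for path, result, error in pool.map(safe, paths):
            if error is None:
                done[path] = result
            else:
                failed[path] = error
    return done, failed


def retry_failed(failed: Dict[str, Exception], process_one: Callable[[str], str]):
    """Retry only the files that failed; successful files are left untouched."""
    return process_batch(list(failed), process_one)
```

Keeping the failures in their own dictionary is what makes an individual retry cheap: only the failed paths are submitted again.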
Processing Pipeline
- File Validation - Verify file type and size limits
- Enhanced File Type Detection (v2.5.4+) - Magic number detection using Rust 'infer' crate
- Storage - Secure file storage with backup (local or S3)
- OCR Processing - Automatic text extraction using Tesseract
- Indexing - Full-text search indexing in PostgreSQL
- Metadata Extraction - File properties and document information
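As an illustration of the first pipeline stage, the sketch below checks a file's extension and size before anything else runs. It is an assumption-laden example rather than Readur's code: `validate_upload` is a hypothetical helper, the extension set is copied from the supported file types listed in this guide, and the 50 MB ceiling simply mirrors the best-practice recommendation rather than a documented hard limit.

```python
from pathlib import Path
from typing import List

# Extensions taken from the "Supported File Types" list in this guide.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".webp",
    ".txt", ".rtf", ".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt",
}

# Assumption: mirrors the 50MB best-practice figure, not a confirmed server limit.
MAX_SIZE_BYTES = 50 * 1024 * 1024


def validate_upload(
    filename: str, size_bytes: int, max_size: int = MAX_SIZE_BYTES
) -> List[str]:
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > max_size:
        problems.append(f"file exceeds {max_size} bytes")
    return problems
```

Returning every problem at once, instead of stopping at the first, lets a client show the user a complete list of what to fix.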
Enhanced File Type Detection (v2.5.4+)
Readur now uses content-based file type detection rather than relying solely on file extensions:
- Magic Number Detection: Identifies files by their content signature, not just extension
- Broader Format Support: Automatically recognizes more document and image formats
- Security Enhancement: Prevents malicious files with incorrect extensions from being processed
- Performance: Fast, native Rust implementation for minimal overhead
Automatically Detected Formats:
- Documents: PDF, DOCX, XLSX, PPTX, ODT, ODS, ODP
- Images: PNG, JPEG, GIF, BMP, TIFF, WebP, HEIC
- Archives: ZIP, RAR, 7Z, TAR, GZ
- Text: TXT, MD, CSV, JSON, XML
This enhancement ensures files are correctly identified even when extensions are missing or incorrect, improving both reliability and security.
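To make the idea of magic number detection concrete, here is a small sketch that inspects the leading bytes of a file. Readur itself relies on the Rust `infer` crate; the Python below is only an illustration of the general technique, and `SIGNATURES` and `detect_type` are hypothetical names covering just a handful of well-known formats.

```python
from typing import Optional

# (signature bytes, label) pairs for a few well-known formats.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
    (b"II*\x00", "tiff"),    # little-endian TIFF
    (b"MM\x00*", "tiff"),    # big-endian TIFF
    (b"PK\x03\x04", "zip"),  # also the container used by DOCX, XLSX, PPTX, ODT
    (b"\x1f\x8b", "gzip"),
]


def detect_type(header: bytes) -> Optional[str]:
    """Guess a file type from its first bytes, ignoring the file name."""
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None
```

A file named `invoice.pdf` whose first bytes are the PNG signature would be reported as `png` here, which is the kind of extension mismatch content-based detection is meant to catch.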
Best Practices
- File Size: Keep individual files under 50MB for optimal performance
- File Names: Use descriptive names for better organization
- Batch Size: Upload 10-20 files at once for best performance
- Network: Stable internet connection recommended for large uploads
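When uploading from a script rather than the browser, the batch-size recommendation can be followed by splitting a large selection into smaller groups. The helper below is a hypothetical sketch; `chunk_batches` is not part of Readur, and the default of 20 is simply the upper end of the range suggested above.

```python
from typing import Iterator, List, Sequence


def chunk_batches(files: Sequence[str], batch_size: int = 20) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size files, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(files), batch_size):
        yield list(files[start:start + batch_size])
```

For example, 45 files would be split into groups of 20, 20, and 5, each of which can be uploaded and checked before the next one starts.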
Troubleshooting
Upload Fails
- Check file size limits
- Verify file format is supported
- Ensure stable internet connection
- Try uploading fewer files at once
OCR Issues
- Ensure images have good contrast and resolution
- Scanned PDFs may need to be rescanned at a higher quality
- Check the OCR Optimization Guide for advanced tips
Security
- All uploads are scanned for malicious content
- Files are stored securely with proper access controls
- User permissions apply to all uploaded documents
- Automatic backup ensures data safety