readur/docs/api-reference.md
2025-07-01 19:00:29 +00:00

API Reference

Readur provides a comprehensive REST API for integrating with external systems and building custom workflows.

Base URL

http://localhost:8000/api

For production deployments, replace this with your configured domain and serve the API over HTTPS.

Authentication

Readur uses JWT (JSON Web Token) authentication. Include the token in the Authorization header:

Authorization: Bearer <jwt_token>

Obtaining a Token

POST /api/auth/login
Content-Type: application/json

{
  "username": "admin",
  "password": "your_password"
}

Response:

{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "user": {
    "id": 1,
    "username": "admin",
    "email": "admin@example.com",
    "role": "admin"
  }
}
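
As a minimal sketch, the login response above can be turned into request headers as follows. The helper name `auth_headers` is hypothetical; only the `token` field and the `Authorization: Bearer` scheme come from this reference.

```python
def auth_headers(login_response: dict) -> dict:
    """Build request headers from a parsed /api/auth/login response body."""
    token = login_response.get("token")
    if not token:
        # The documented response always carries "token"; treat its absence
        # as a failed login rather than sending an empty Bearer header.
        raise ValueError("login response did not include a token")
    return {"Authorization": f"Bearer {token}"}


# Example with a shortened, made-up token value.
headers = auth_headers({"token": "abc.def.ghi", "user": {"id": 1}})
```

The resulting dictionary can be passed as the headers of any authenticated request shown in this reference.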

Error Handling

All API errors follow a consistent format:

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid request parameters",
    "details": {
      "field": "email",
      "reason": "Invalid email format"
    }
  }
}
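
A client might map this envelope onto an exception type so that callers can branch on `code`. The sketch below assumes the body has already been parsed from JSON; `ApiError` and `parse_error` are hypothetical names, and the `UNKNOWN` fallback is an illustration, not a code the API is documented to return.

```python
class ApiError(Exception):
    """Client-side exception built from the documented error envelope."""

    def __init__(self, status: int, code: str, message: str, details: dict):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.details = details


def parse_error(status: int, body) -> ApiError:
    """Convert an HTTP status and parsed body into an ApiError.

    Falls back to generic values when the body does not match the
    documented shape (for example, an HTML page from a reverse proxy).
    """
    error = body.get("error") if isinstance(body, dict) else None
    if not isinstance(error, dict):
        return ApiError(status, "UNKNOWN", "Unrecognized error body", {})
    return ApiError(
        status,
        error.get("code", "UNKNOWN"),
        error.get("message", ""),
        error.get("details") or {},
    )
```

Keeping `details` as a plain dictionary avoids guessing at fields beyond the `field` and `reason` keys shown above.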

Common HTTP status codes:

  • 200 - Success
  • 201 - Created
  • 400 - Bad Request
  • 401 - Unauthorized
  • 403 - Forbidden
  • 404 - Not Found
  • 422 - Validation Error
  • 500 - Internal Server Error

Rate Limiting

API requests are rate-limited to prevent abuse:

  • Authenticated users: 1000 requests per hour
  • Unauthenticated users: 100 requests per hour

Rate limit headers:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1640995200
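
A client can use these headers to decide how long to back off. The sketch below assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds (the sample value 1640995200 is consistent with that, but this reference does not say so) and that the headers are available as a plain dictionary keyed exactly as written above. The function name is hypothetical.

```python
import time
from typing import Optional


def seconds_until_reset(headers: dict, now: Optional[float] = None) -> float:
    """Return how long to wait before retrying, or 0.0 if requests remain."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # Assumption: the reset value is seconds since the Unix epoch.
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))
    current = time.time() if now is None else now
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, reset_at - current)
```

The `now` parameter exists only to make the function easy to test; in normal use it can be omitted.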

Endpoints

Authentication Endpoints

Register New User

POST /api/auth/register
Content-Type: application/json

{
  "username": "john_doe",
  "email": "john@example.com",
  "password": "secure_password"
}

Login

POST /api/auth/login
Content-Type: application/json

{
  "username": "john_doe",
  "password": "secure_password"
}

Get Current User

GET /api/auth/me
Authorization: Bearer <jwt_token>

OIDC Login (Redirect)

GET /api/auth/oidc/login

Redirects to the configured OIDC provider for authentication.

OIDC Callback

GET /api/auth/oidc/callback?code=<auth_code>&state=<state>

Handles the callback from the OIDC provider and issues a JWT.

Logout

POST /api/auth/logout
Authorization: Bearer <jwt_token>

Document Endpoints

Upload Document

POST /api/documents
Authorization: Bearer <jwt_token>
Content-Type: multipart/form-data

file: <binary_file_data>
tags: ["invoice", "2024"]  # Optional

Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "filename": "invoice_2024.pdf",
  "mime_type": "application/pdf",
  "size": 1048576,
  "uploaded_at": "2024-01-01T00:00:00Z",
  "ocr_status": "pending"
}
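
Because a new upload starts with `ocr_status` set to `pending`, a client that needs the OCR text usually has to poll. The sketch below is one way to do that under stated assumptions: `get_status` is a caller-supplied function (hypothetical) that returns the current `ocr_status` string, for example by reading it from Get Document Details, and any value other than the in-progress states is treated as final. This reference documents only `pending`; other status values are not listed here.

```python
import time
from typing import Callable, Iterable


def wait_for_ocr(
    get_status: Callable[[], str],
    *,
    in_progress: Iterable[str] = ("pending",),
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the status leaves the in-progress states, then return it."""
    waiting = set(in_progress)
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status not in waiting:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"OCR still '{status}' after {timeout} seconds")
        sleep(interval)
```

If the server reports an intermediate state such as a processing status, add it to `in_progress` so that polling continues through it.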

List Documents

GET /api/documents?limit=50&offset=0&sort=-uploaded_at
Authorization: Bearer <jwt_token>

Query parameters:

  • limit - Number of results (default: 50, max: 100)
  • offset - Pagination offset
  • sort - Sort field (prefix with - for descending)
  • mime_type - Filter by MIME type
  • ocr_status - Filter by OCR status
  • tag - Filter by tag
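
The `limit` and `offset` parameters can be combined to walk the full document list. The following sketch assumes that each page arrives as a list of document objects and that a page shorter than `limit` marks the end; the response shape of this endpoint is not shown in this reference, so adapt `fetch_page` (a hypothetical caller-supplied function) to whatever the server actually returns.

```python
from typing import Callable, Iterator, List


def iter_documents(
    fetch_page: Callable[[int, int], List[dict]],
    limit: int = 50,
) -> Iterator[dict]:
    """Yield every document by requesting successive limit/offset pages."""
    if not 1 <= limit <= 100:
        # The documented maximum for limit is 100.
        raise ValueError("limit must be between 1 and 100")
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:
            return  # A short (or empty) page means there is nothing left.
        offset += limit
```

Here `fetch_page(limit, offset)` would issue `GET /api/documents` with those two query parameters and return the parsed items.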

Get Document Details

GET /api/documents/{id}
Authorization: Bearer <jwt_token>

Download Document

GET /api/documents/{id}/download
Authorization: Bearer <jwt_token>

Delete Document

DELETE /api/documents/{id}
Authorization: Bearer <jwt_token>

Update Document

PATCH /api/documents/{id}
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "tags": ["invoice", "paid", "2024"]
}

Get Document Debug Information

GET /api/documents/{id}/debug
Authorization: Bearer <jwt_token>

Response:

{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "processing_pipeline": {
    "upload": "completed",
    "ocr_queue": "completed", 
    "ocr_processing": "completed",
    "validation": "completed"
  },
  "ocr_details": {
    "confidence": 89.5,
    "word_count": 342,
    "processing_time": 4.2
  },
  "file_info": {
    "mime_type": "application/pdf",
    "size": 1048576,
    "pages": 3
  }
}

Get Document Thumbnail

GET /api/documents/{id}/thumbnail
Authorization: Bearer <jwt_token>

Get Document OCR Text

GET /api/documents/{id}/ocr
Authorization: Bearer <jwt_token>

Get Document Processed Image

GET /api/documents/{id}/processed-image
Authorization: Bearer <jwt_token>

View Document in Browser

GET /api/documents/{id}/view
Authorization: Bearer <jwt_token>

Get Failed Documents

GET /api/documents/failed?limit=50&offset=0
Authorization: Bearer <jwt_token>

Query parameters:

  • limit - Number of results (default: 50)
  • offset - Pagination offset
  • stage - Filter by failure stage
  • reason - Filter by failure reason

View Failed Document

GET /api/documents/failed/{id}/view
Authorization: Bearer <jwt_token>

Get Duplicate Documents

GET /api/documents/duplicates?limit=50&offset=0
Authorization: Bearer <jwt_token>

Delete Low Confidence Documents

POST /api/documents/delete-low-confidence
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "confidence_threshold": 70.0,
  "preview_only": false
}
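
One cautious way to use this endpoint is to send the request with `preview_only` set to `true` first and repeat it with `false` only after reviewing the result. The sketch below just builds the request body. The helper name is hypothetical, and it assumes that `confidence_threshold` is a percentage between 0 and 100 and that `preview_only` reports matching documents without deleting them; neither is stated explicitly in this reference.

```python
def low_confidence_request(threshold: float, preview_only: bool = True) -> dict:
    """Build the JSON body for POST /api/documents/delete-low-confidence."""
    if not 0.0 <= threshold <= 100.0:
        # Assumption: confidence is expressed as a percentage.
        raise ValueError("confidence_threshold must be between 0 and 100")
    return {
        "confidence_threshold": float(threshold),
        "preview_only": preview_only,
    }


preview_body = low_confidence_request(70.0)                      # dry run
delete_body = low_confidence_request(70.0, preview_only=False)   # real deletion
```

Defaulting `preview_only` to `True` means a destructive call has to be requested explicitly.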

Delete Failed OCR Documents

POST /api/documents/delete-failed-ocr
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "preview_only": false
}

Bulk Delete Documents

DELETE /api/documents
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "document_ids": ["550e8400-e29b-41d4-a716-446655440000", "..."]
}

Search Endpoints

Search Documents

GET /api/search?query=invoice&limit=20
Authorization: Bearer <jwt_token>

Query parameters:

  • query - Search query (required)
  • limit - Number of results
  • offset - Pagination offset
  • mime_types - Comma-separated MIME types
  • tags - Comma-separated tags
  • date_from - Start date (ISO 8601)
  • date_to - End date (ISO 8601)
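
The query parameters above can be assembled into a request URL as sketched below. The function name is hypothetical. List-valued filters are joined with commas as described, and dates are rendered with `date.isoformat()`, which produces the ISO 8601 calendar form `YYYY-MM-DD`; whether the server also expects a time component is not stated here.

```python
from datetime import date
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_search_url(
    base_url: str,
    query: str,
    *,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    mime_types: Sequence[str] = (),
    tags: Sequence[str] = (),
    date_from: Optional[date] = None,
    date_to: Optional[date] = None,
) -> str:
    """Return the full GET /search URL with only the supplied filters."""
    if not query:
        raise ValueError("query is required")
    params = {"query": query}
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    if mime_types:
        params["mime_types"] = ",".join(mime_types)
    if tags:
        params["tags"] = ",".join(tags)
    if date_from is not None:
        params["date_from"] = date_from.isoformat()
    if date_to is not None:
        params["date_to"] = date_to.isoformat()
    # urlencode percent-encodes values, so commas and slashes are safe.
    return f"{base_url}/search?{urlencode(params)}"
```

Omitted filters are left out of the URL entirely, so the server's defaults apply.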

Response:

{
  "results": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "filename": "invoice_2024.pdf",
      "snippet": "...invoice for services rendered in Q1 2024...",
      "score": 0.95,
      "highlights": ["invoice", "2024"]
    }
  ],
  "total": 42,
  "limit": 20,
  "offset": 0
}
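
The `total`, `limit`, and `offset` fields in this response are enough to decide whether another page exists. A minimal sketch, assuming `total` counts all matches for the query rather than only the items on the current page (the helper name is hypothetical):

```python
from typing import Optional


def next_offset(page: dict) -> Optional[int]:
    """Return the offset of the next page, or None when all results are read."""
    following = page["offset"] + page["limit"]
    return following if following < page["total"] else None
```

For the sample response above (`total` 42, `limit` 20, `offset` 0), the next request would use `offset=20`.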

Advanced Search

POST /api/search/advanced
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "query": "invoice",
  "filters": {
    "mime_types": ["application/pdf"],
    "tags": ["unpaid"],
    "date_range": {
      "from": "2024-01-01",
      "to": "2024-12-31"
    },
    "file_size": {
      "min": 1024,
      "max": 10485760
    }
  },
  "options": {
    "fuzzy": true,
    "snippet_length": 200
  }
}

OCR Queue Endpoints

Get Queue Status

GET /api/queue/status
Authorization: Bearer <jwt_token>

Response:

{
  "pending": 15,
  "processing": 3,
  "completed_today": 127,
  "failed_today": 2,
  "average_processing_time": 4.5
}

Retry OCR Processing

POST /api/documents/{id}/retry-ocr
Authorization: Bearer <jwt_token>

Get Failed OCR Jobs

GET /api/queue/failed
Authorization: Bearer <jwt_token>

Get Queue Statistics

GET /api/queue/stats
Authorization: Bearer <jwt_token>

Response:

{
  "pending_count": 15,
  "processing_count": 3,
  "failed_count": 2,
  "completed_today": 127,
  "average_processing_time_seconds": 4.5,
  "queue_health": "healthy"
}

Requeue Failed Items

POST /api/queue/requeue-failed
Authorization: Bearer <jwt_token>

Enqueue Pending Documents

POST /api/queue/enqueue-pending
Authorization: Bearer <jwt_token>

Pause OCR Processing

POST /api/queue/pause
Authorization: Bearer <jwt_token>

Resume OCR Processing

POST /api/queue/resume
Authorization: Bearer <jwt_token>

Settings Endpoints

Get User Settings

GET /api/settings
Authorization: Bearer <jwt_token>

Update User Settings

PUT /api/settings
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "ocr_language": "eng",
  "search_results_per_page": 50,
  "enable_notifications": true
}

Sources Endpoints

List Sources

GET /api/sources
Authorization: Bearer <jwt_token>

Create Source

POST /api/sources
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "name": "Network Drive",
  "type": "local_folder",
  "config": {
    "path": "/mnt/network/documents",
    "scan_interval": 3600
  },
  "enabled": true
}

Update Source

PUT /api/sources/{id}
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "enabled": false
}

Delete Source

DELETE /api/sources/{id}
Authorization: Bearer <jwt_token>

Sync Source

POST /api/sources/{id}/sync
Authorization: Bearer <jwt_token>

Stop Source Sync

POST /api/sources/{id}/sync/stop
Authorization: Bearer <jwt_token>

Test Source Connection

POST /api/sources/{id}/test
Authorization: Bearer <jwt_token>

Estimate Source Crawl

POST /api/sources/{id}/estimate
Authorization: Bearer <jwt_token>

Estimate Crawl with Configuration

POST /api/sources/estimate
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "source_type": "webdav",
  "config": {
    "url": "https://example.com/webdav",
    "username": "user",
    "password": "pass"
  }
}

Test Connection with Configuration

POST /api/sources/test-connection
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "source_type": "webdav", 
  "config": {
    "url": "https://example.com/webdav",
    "username": "user",
    "password": "pass"
  }
}

WebDAV Endpoints

Test WebDAV Connection

POST /api/webdav/test-connection
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "url": "https://example.com/webdav",
  "username": "user",
  "password": "pass"
}

Estimate WebDAV Crawl

POST /api/webdav/estimate-crawl
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "url": "https://example.com/webdav",
  "username": "user", 
  "password": "pass"
}

Get WebDAV Sync Status

GET /api/webdav/sync-status
Authorization: Bearer <jwt_token>

Start WebDAV Sync

POST /api/webdav/start-sync
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "url": "https://example.com/webdav",
  "username": "user",
  "password": "pass"
}

Cancel WebDAV Sync

POST /api/webdav/cancel-sync
Authorization: Bearer <jwt_token>

Labels Endpoints

List Labels

GET /api/labels
Authorization: Bearer <jwt_token>

Create Label

POST /api/labels
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "name": "Important",
  "color": "#FF0000"
}

Update Label

PUT /api/labels/{id}
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "name": "Very Important",
  "color": "#FF00FF"
}

Delete Label

DELETE /api/labels/{id}
Authorization: Bearer <jwt_token>

User Endpoints

List Users (Admin Only)

GET /api/users
Authorization: Bearer <jwt_token>

Get User

GET /api/users/{id}
Authorization: Bearer <jwt_token>

Update User

PUT /api/users/{id}
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "email": "newemail@example.com",
  "role": "user"
}

Delete User (Admin Only)

DELETE /api/users/{id}
Authorization: Bearer <jwt_token>

Notifications Endpoints

List Notifications

GET /api/notifications?limit=50&offset=0
Authorization: Bearer <jwt_token>

Get Notification Summary

GET /api/notifications/summary
Authorization: Bearer <jwt_token>

Response:

{
  "unread_count": 5,
  "total_count": 23,
  "latest_notification": {
    "id": 1,
    "type": "ocr_completed",
    "message": "OCR processing completed for document.pdf",
    "created_at": "2024-01-01T12:00:00Z"
  }
}

Mark Notification as Read

POST /api/notifications/{id}/read
Authorization: Bearer <jwt_token>

Mark All Notifications as Read

POST /api/notifications/read-all
Authorization: Bearer <jwt_token>

Delete Notification

DELETE /api/notifications/{id}
Authorization: Bearer <jwt_token>

Ignored Files Endpoints

List Ignored Files

GET /api/ignored-files?limit=50&offset=0
Authorization: Bearer <jwt_token>

Query parameters:

  • limit - Number of results (default: 50)
  • offset - Pagination offset
  • filename - Filter by filename
  • source_type - Filter by source type

Get Ignored Files Statistics

GET /api/ignored-files/stats
Authorization: Bearer <jwt_token>

Response:

{
  "total_ignored_files": 42,
  "total_size_bytes": 104857600,
  "most_recent_ignored_at": "2024-01-01T12:00:00Z"
}

Get Ignored File Details

GET /api/ignored-files/{id}
Authorization: Bearer <jwt_token>

Remove File from Ignored List

DELETE /api/ignored-files/{id}
Authorization: Bearer <jwt_token>

Bulk Remove Files from Ignored List

DELETE /api/ignored-files/bulk-delete
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "ignored_file_ids": [1, 2, 3, 4]
}

Metrics Endpoints

Get System Metrics

GET /api/metrics
Authorization: Bearer <jwt_token>

Get Prometheus Metrics

GET /metrics

Returns Prometheus-formatted metrics (no authentication required).

Health Check

Health Check

GET /api/health

Response:

{
  "status": "healthy",
  "timestamp": "2024-01-01T12:00:00Z",
  "version": "1.0.0"
}

Examples

Python Example

import requests

# Configuration
BASE_URL = "http://localhost:8000/api"
USERNAME = "admin"
PASSWORD = "your_password"

# Login and build the Authorization header
response = requests.post(f"{BASE_URL}/auth/login", json={
    "username": USERNAME,
    "password": PASSWORD
})
response.raise_for_status()  # Fail early on bad credentials or server errors
token = response.json()["token"]
headers = {"Authorization": f"Bearer {token}"}

# Upload document (requests sets the multipart Content-Type header itself)
with open("document.pdf", "rb") as f:
    files = {"file": ("document.pdf", f, "application/pdf")}
    response = requests.post(
        f"{BASE_URL}/documents",
        headers=headers,
        files=files
    )
response.raise_for_status()
document_id = response.json()["id"]

# Search documents
response = requests.get(
    f"{BASE_URL}/search",
    headers=headers,
    params={"query": "invoice 2024"}
)
response.raise_for_status()
results = response.json()["results"]

JavaScript Example

// Configuration
const BASE_URL = 'http://localhost:8000/api';

// Login
async function login(username, password) {
  const response = await fetch(`${BASE_URL}/auth/login`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ username, password })
  });
  const data = await response.json();
  return data.token;
}

// Upload document
async function uploadDocument(token, file) {
  const formData = new FormData();
  formData.append('file', file);
  
  const response = await fetch(`${BASE_URL}/documents`, {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${token}` },
    body: formData
  });
  return response.json();
}

// Search documents
async function searchDocuments(token, query) {
  const response = await fetch(
    `${BASE_URL}/search?query=${encodeURIComponent(query)}`,
    {
      headers: { 'Authorization': `Bearer ${token}` }
    }
  );
  return response.json();
}

cURL Examples

# Login
TOKEN=$(curl -s -X POST http://localhost:8000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username":"admin","password":"your_password"}' \
  | jq -r .token)

# Upload document
curl -X POST http://localhost:8000/api/documents \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@document.pdf"

# Search documents
curl -X GET "http://localhost:8000/api/search?query=invoice" \
  -H "Authorization: Bearer $TOKEN"

# Get document
curl -X GET http://localhost:8000/api/documents/550e8400-e29b-41d4-a716-446655440000 \
  -H "Authorization: Bearer $TOKEN"

OpenAPI Specification

The complete OpenAPI specification is available at:

GET /api-docs/openapi.json

Interactive Swagger UI documentation is available at:

GET /swagger-ui

You can load the specification into tools such as Swagger UI or use it to generate client libraries.

SDK Support

Official SDKs are planned for:

  • Python
  • JavaScript/TypeScript
  • Go
  • Ruby

Check the GitHub repository for the latest SDK availability.