Fixes Paperless-ngx issues, restores missing assets, and improves stability.

- Fixed: Paperless-ngx document uploads were being incorrectly flagged as duplicates due to invalid API parameter usage (checksum → checksum__iexact).
- Fixed: API token authentication with Paperless-ngx now works properly when Two-Factor Authentication (2FA) is enabled, ensuring secure token-only integration.
- Fixed: Restored missing i18next JavaScript libraries for non-Docker installations, ensuring status page and i18n features function correctly.
- Enhanced: Replaced psycopg2-binary with psycopg2 for production stability and compatibility.
- Enhanced: Adjusted .gitignore to track /lib directory, ensuring essential libraries are available across environments.
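
The duplicate-detection fix in the first item comes down to which query parameter is sent to the Paperless-ngx documents endpoint. A minimal sketch of building that query follows. `build_duplicate_query` is a hypothetical helper, not a function in this commit, and the MD5 default is an assumption about how the checksum is computed, so substitute whatever digest the caller already uses.

```python
import hashlib


def build_duplicate_query(file_bytes: bytes, algorithm: str = "md5") -> dict:
    """Build query params for an exact-checksum lookup (hypothetical helper)."""
    # Assumption: the checksum is a hex digest of the raw file bytes.
    checksum = hashlib.new(algorithm, file_bytes).hexdigest()
    return {
        "checksum__iexact": checksum,  # filter name introduced by the fix
        "ordering": "-created",        # most recent match first
        "page_size": 1,                # one hit is enough to flag a duplicate
    }


params = build_duplicate_query(b"hello")
print(params["checksum__iexact"])
```

The resulting dictionary has the same shape as the `params` argument passed to `/api/documents/` in `backend/paperless_handler.py`.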
2026-02-11 00:08:44 -06:00 · 2025-09-18 10:56:43 -03:00
parent 2cd6cd407a
commit 6b035b59a8
11 changed files with 132 additions and 63 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -12,7 +12,6 @@ dist/
 downloads/
 eggs/
 .eggs/
-lib/
 lib64/
 parts/
 sdist/
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,43 @@
 # Changelog

+## 0.10.1.12 - 2025-09-18
+
+### Fixed
+- **Paperless-ngx Document Upload Duplicate Detection:** Fixed a critical bug where an incorrect API parameter caused all new document uploads to Paperless-ngx to be falsely identified as duplicates, so new documents were not uploaded and were instead linked to unrelated existing documents.
+  - **Root Cause:** The duplicate detection logic was using an invalid API parameter `checksum` instead of the correct `checksum__iexact` parameter when querying Paperless-ngx for existing documents, causing the API to return unexpected results that incorrectly matched all uploads as duplicates.
+  - **Solution:** Corrected the API parameter from `checksum` to `checksum__iexact` to properly perform exact checksum matching against existing documents in Paperless-ngx.
+  - **Impact:** Users can now successfully upload new documents to Paperless-ngx without false duplicate warnings, while legitimate duplicate detection continues to work correctly for actual duplicate files.
+  - **Credit:** Fix contributed by @sjafferali in PR #127.
+  - _Files: `backend/paperless_handler.py`_
+
+### Enhanced
+- **Production Database Driver Optimization:** Replaced `psycopg2-binary` with `psycopg2` for improved production stability and performance.
+  - **Production Best Practice:** Switched from the development-oriented `psycopg2-binary` package to the production-recommended `psycopg2` package to avoid potential conflicts with system libraries and improve runtime stability.
+  - **Build Dependencies:** The existing Dockerfile already contains all necessary build dependencies (`build-essential`, `libpq-dev`) required to compile `psycopg2` from source.
+  - **Impact:** Enhanced production deployment stability while maintaining full PostgreSQL database connectivity and compatibility.
+  - _Files: `backend/requirements.txt`_
+
+- **Development Environment Library Access:** Removed `/lib` directory from .gitignore to allow tracking of essential library files in the repository.
+  - **Repository Management:** Updated .gitignore configuration to include library files that were previously excluded from version control.
+  - **Impact:** Ensures necessary library dependencies are properly tracked and available for development and deployment processes.
+  - _Files: `.gitignore`_
+
+### Fixed
+- **Missing JavaScript Assets for Non-Docker Installations:** Fixed critical error preventing the status page and internationalization features from functioning correctly in non-Docker installations due to missing i18next library files.
+  - **Root Cause:** The `/lib` directory containing essential i18next JavaScript libraries was excluded from version control via .gitignore, causing these files to be missing in non-Docker deployments where they couldn't be served from CDN.
+  - **Solution:** Updated service worker cache configuration to include the three required i18next library files (`i18next.min.js`, `i18nextHttpBackend.min.js`, `i18nextBrowserLanguageDetector.min.js`) and incremented cache version to ensure users receive the updated assets.
+  - **Impact:** Status page and all internationalization features now work correctly in non-Docker installations, eliminating JavaScript errors and ensuring consistent functionality across all deployment methods.
+  - **Cache Update:** Service worker cache version updated from `v20250119001` to `v20250918001` to force cache refresh for existing users.
+  - _Files: `frontend/sw.js`_
+
+- **Paperless-ngx API Token Authentication with 2FA Enabled:** Resolved critical authentication failure when using API tokens with Paperless-ngx instances that have Two-Factor Authentication (2FA) enabled. Users can now securely connect Warracker to their 2FA-protected Paperless-ngx accounts without compromising security.
+  - **Root Cause:** The Paperless-ngx integration was inadvertently using session-based authentication paths that conflicted with 2FA requirements, causing API token requests to be rejected even when tokens were valid.
+  - **Solution:** Implemented pure token-only authentication by clearing cookies before each request, disabling automatic redirects to login pages, and ensuring all API calls use only the `Authorization: Token <token>` header without session interference.
+  - **Enhanced Error Handling:** Added detection and clear error messaging for authentication redirects (3xx responses) that would indicate token rejection, helping users troubleshoot configuration issues.
+  - **Backward Compatibility:** All existing functionality remains unchanged for users without 2FA enabled, ensuring seamless operation across different Paperless-ngx configurations.
+  - **Security Maintained:** Users can keep 2FA enabled on their Paperless-ngx accounts while using API tokens for Warracker integration, maintaining the highest security standards.
+  - _Files: `backend/paperless_handler.py`, `backend/file_routes.py`_
+
 ## 0.10.1.11 - 2025-09-07

 ### Enhanced
--- a/backend/file_routes.py
+++ b/backend/file_routes.py
@@ -527,13 +527,15 @@ def paperless_search():
        logger.info(f"Searching Paperless documents with params: {params}")
        
        # Make request to Paperless-ngx using the session from paperless handler
-        response = paperless_handler.session.get(
-            search_url,
-            params=params,
-            timeout=30
-        )
-        
-        response.raise_for_status()
+        try:
+            response = paperless_handler.get(search_url, params=params, timeout=30)
+            response.raise_for_status()
+        except Exception as e:
+            # Provide user-friendly error on auth failures
+            return jsonify({
+                'success': False,
+                'message': f'Paperless-ngx search failed: {str(e)}'
+            }), 400
        search_result = response.json()
        
        logger.info(f"Paperless search returned {len(search_result.get('results', []))} documents")
@@ -562,12 +564,11 @@ def paperless_tags():
            return jsonify({'success': False, 'message': 'Paperless-ngx integration not available'}), 400
        
        # Make request to Paperless-ngx tags endpoint
-        response = paperless_handler.session.get(
-            f"{paperless_handler.paperless_url}/api/tags/",
-            timeout=30
-        )
-        
-        response.raise_for_status()
+        try:
+            response = paperless_handler.get('/api/tags/', timeout=30)
+            response.raise_for_status()
+        except Exception as e:
+            return jsonify({'success': False, 'message': f'Paperless-ngx tags failed: {str(e)}'}), 400
        tags_result = response.json()
        
        logger.info(f"Paperless tags returned {len(tags_result.get('results', []))} tags")
--- a/backend/paperless_handler.py
+++ b/backend/paperless_handler.py
@@ -32,8 +32,58 @@ class PaperlessHandler:
        self.session = requests.Session()
        self.session.headers.update({
            'Authorization': f'Token {api_token}',
-            'User-Agent': 'Warracker-PaperlessIntegration/1.0'
+            'User-Agent': 'Warracker-PaperlessIntegration/1.0',
+            'Accept': 'application/json'
        })
+        # Ensure no environment-provided authentication (proxies, netrc) interferes
+        try:
+            self.session.trust_env = False
+        except Exception:
+            pass
+
+    def _build_url(self, url_or_path: str) -> str:
+        """Build absolute URL for Paperless-ngx API calls."""
+        if url_or_path.startswith('http://') or url_or_path.startswith('https://'):
+            return url_or_path
+        return f"{self.paperless_url.rstrip('/')}/{url_or_path.lstrip('/')}"
+
+    def _request(self, method: str, url_or_path: str, **kwargs) -> requests.Response:
+        """
+        Perform a request ensuring token-only auth:
+        - Always include Authorization: Token <token>
+        - Clear cookies before sending (avoid session/CSRF/2FA paths)
+        - Do not auto-follow redirects to login pages unless explicitly requested
+        """
+        headers = kwargs.pop('headers', {}) or {}
+        merged_headers = {
+            'Authorization': f'Token {self.api_token}',
+            'User-Agent': 'Warracker-PaperlessIntegration/1.0',
+            'Accept': 'application/json'
+        }
+        merged_headers.update(headers)
+
+        # Avoid sending any cookies that could switch us to session auth
+        try:
+            self.session.cookies.clear()
+        except Exception:
+            pass
+
+        if 'allow_redirects' not in kwargs:
+            kwargs['allow_redirects'] = False
+
+        url = self._build_url(url_or_path)
+        response = self.session.request(method, url, headers=merged_headers, **kwargs)
+        # Treat redirects to login (or any redirect) as auth failures for API token mode
+        if 300 <= response.status_code < 400:
+            location = response.headers.get('Location', '')
+            raise requests.exceptions.HTTPError(
+                f"Unexpected redirect (HTTP {response.status_code}) to '{location}'. Token auth likely rejected.",
+                response=response
+            )
+        return response
+
+    def get(self, url_or_path: str, **kwargs) -> requests.Response:
+        return self._request('GET', url_or_path, **kwargs)
        
    def test_connection(self) -> Tuple[bool, str]:
        """
@@ -43,7 +93,7 @@ class PaperlessHandler:
            (success: bool, message: str)
        """
        try:
-            response = self.session.get(f'{self.paperless_url}/api/documents/', params={'page_size': 1})
+            response = self.get('/api/documents/', params={'page_size': 1})
            response.raise_for_status()
            return True, "Connection successful"
        except requests.exceptions.ConnectionError:
@@ -86,22 +136,14 @@ class PaperlessHandler:

            

-            response = self.session.get(
-
-                f'{self.paperless_url}/api/documents/',
-
+            response = self.get(
+                '/api/documents/',
                params={
-
                    'checksum__iexact': checksum,
-
                    'ordering': '-created',
-
                    'page_size': 1
-
                },
-
                timeout=15
-
            )

            
@@ -194,10 +236,11 @@ class PaperlessHandler:
            logger.info(f"MIME type: {mime_type}")
            
            # Don't set Content-Type manually - let requests handle it
-            response = self.session.post(
-                f'{self.paperless_url}/api/documents/post_document/',
+            response = self._request(
+                'POST',
+                '/api/documents/post_document/',
                files=files,
-                headers={'Authorization': f'Token {self.api_token}'},
+                data=data,
                timeout=60  # Longer timeout for uploads
            )
            
@@ -335,10 +378,7 @@ class PaperlessHandler:
        for endpoint_name, endpoint_path in endpoints_to_try:
            try:
                logger.info(f"Fetching document {endpoint_name} from Paperless-ngx: {document_id}")
-                response = self.session.get(
-                    f'{self.paperless_url}{endpoint_path}',
-                    timeout=30
-                )
+                response = self.get(endpoint_path, timeout=30)
                
                response.raise_for_status()
                
@@ -376,10 +416,7 @@ class PaperlessHandler:
            (success: bool, content: Optional[bytes], message: str)
        """
        try:
-            response = self.session.get(
-                f'{self.paperless_url}/api/documents/{document_id}/thumb/',
-                timeout=15
-            )
+            response = self.get(f'/api/documents/{document_id}/thumb/', timeout=15)
            
            response.raise_for_status()
            return True, response.content, "Thumbnail retrieved successfully"
@@ -410,11 +447,7 @@ class PaperlessHandler:
                'page_size': min(limit, 100)  # Cap at 100 for performance
            }
            
-            response = self.session.get(
-                f'{self.paperless_url}/api/documents/',
-                params=params,
-                timeout=15
-            )
+            response = self.get('/api/documents/', params=params, timeout=15)
            
            response.raise_for_status()
            result = response.json()
@@ -437,10 +470,7 @@ class PaperlessHandler:
            (success: bool, document_info: Optional[Dict], message: str)
        """
        try:
-            response = self.session.get(
-                f'{self.paperless_url}/api/documents/{document_id}/',
-                timeout=15
-            )
+            response = self.get(f'/api/documents/{document_id}/', timeout=15)
            
            response.raise_for_status()
            document_info = response.json()
@@ -484,7 +514,7 @@ class PaperlessHandler:
        for endpoint_name, endpoint_path in endpoints_to_test:
            try:
                logger.info(f"Testing endpoint: {self.paperless_url}{endpoint_path}")
-                response = self.session.get(f'{self.paperless_url}{endpoint_path}', timeout=15)
+                response = self.get(endpoint_path, timeout=15)
                
                debug_info['endpoints_tested'][endpoint_name] = {
                    'status_code': response.status_code,
@@ -508,9 +538,7 @@ class PaperlessHandler:
        
        # Also try to list recent documents to see if our document is there
        try:
-            response = self.session.get(f'{self.paperless_url}/api/documents/', 
-                                      params={'ordering': '-created', 'page_size': 10}, 
-                                      timeout=15)
+            response = self.get('/api/documents/', params={'ordering': '-created', 'page_size': 10}, timeout=15)
            if response.status_code == 200:
                recent_docs = response.json().get('results', [])
                debug_info['recent_documents'] = [
@@ -536,10 +564,7 @@ class PaperlessHandler:
            True if document exists, False otherwise
        """
        try:
-            response = self.session.get(
-                f'{self.paperless_url}/api/documents/{document_id}/',
-                timeout=10
-            )
+            response = self.get(f'/api/documents/{document_id}/', timeout=10)
            return response.status_code == 200
        except Exception as e:
            logger.warning(f"Error checking document existence {document_id}: {e}")
@@ -559,8 +584,8 @@ class PaperlessHandler:
            logger.info(f"Searching for document by title: {title}")
            
            # Search for documents with the given title
-            response = self.session.get(
-                f'{self.paperless_url}/api/documents/',
+            response = self.get(
+                '/api/documents/',
                params={
                    'title__icontains': title,  # Case-insensitive partial match
                    'ordering': '-created',     # Most recent first
@@ -630,14 +655,14 @@ class PaperlessHandler:
        while time.time() < deadline:
            try:
                try:
-                    resp = self.session.get(task_url_primary, timeout=10)
+                    resp = self.get(task_url_primary, timeout=10)
                    if resp.status_code == 404:
                        # Fall back to legacy ?task_id=<uuid> filter
-                        resp = self.session.get(task_url_legacy_list, params={"task_id": task_id}, timeout=10)
+                        resp = self.get(task_url_legacy_list, params={"task_id": task_id}, timeout=10)
                except requests.exceptions.HTTPError as http_err:
                    if http_err.response.status_code == 404 and http_err.response.url.rstrip('/') == task_url_primary.rstrip('/'):
                        # Primary endpoint not available, try legacy
-                        resp = self.session.get(task_url_legacy_list, params={"task_id": task_id}, timeout=10)
+                        resp = self.get(task_url_legacy_list, params={"task_id": task_id}, timeout=10)
                    else:
                        raise

--- a/backend/requirements.txt
+++ b/backend/requirements.txt
@@ -1,6 +1,6 @@
 Flask==3.0.3
 gunicorn==23.0.0
-psycopg2-binary==2.9.9
+psycopg2==2.9.9
 Werkzeug==3.0.3
 flask-cors==4.0.1
 Flask-Login==0.6.3
--- a/frontend/about.html
+++ b/frontend/about.html
@@ -416,7 +416,7 @@
            // Update version display dynamically
            const versionDisplay = document.getElementById('versionDisplay');
            if (versionDisplay && window.i18next) {
-                const currentVersion = '0.10.1.11'; // This should match version-checker.js
+                const currentVersion = '0.10.1.12'; // This should match version-checker.js
                versionDisplay.textContent = window.i18next.t('about.version') + ' v' + currentVersion;
            }
            
--- a/frontend/js/lib/i18next.min.js
+++ b/frontend/js/lib/i18next.min.js
--- a/frontend/js/lib/i18nextBrowserLanguageDetector.min.js
+++ b/frontend/js/lib/i18nextBrowserLanguageDetector.min.js
--- a/frontend/js/lib/i18nextHttpBackend.min.js
+++ b/frontend/js/lib/i18nextHttpBackend.min.js
--- a/frontend/sw.js
+++ b/frontend/sw.js
@@ -1,4 +1,4 @@
-const CACHE_NAME = 'warracker-cache-v20250119001';
+const CACHE_NAME = 'warracker-cache-v20250918001';
 const urlsToCache = [
  './',
  './index.html',
@@ -16,6 +16,9 @@ const urlsToCache = [
  './footer-fix.js?v=20250119001',
  './footer-content.js?v=20250119001',
  './js/i18n.js?v=20250119001',
+  './js/lib/i18next.min.js',
+  './js/lib/i18nextHttpBackend.min.js',
+  './js/lib/i18nextBrowserLanguageDetector.min.js',
  './manifest.json',
  './img/favicon-16x16.png',
  './img/favicon-32x32.png',
--- a/frontend/version-checker.js
+++ b/frontend/version-checker.js
@@ -1,6 +1,6 @@
 // Version checker for Warracker
 document.addEventListener('DOMContentLoaded', () => {
-    const currentVersion = '0.10.1.11'; // Current version of the application
+    const currentVersion = '0.10.1.12'; // Current version of the application
    const updateStatus = document.getElementById('updateStatus');
    const updateLink = document.getElementById('updateLink');
    const versionDisplay = document.getElementById('versionDisplay');
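
The token-only request handling added to `backend/paperless_handler.py` rests on two small rules: join the base URL and the path without doubling slashes, and treat any 3xx response as a rejected token. The sketch below restates both as standalone functions using only the standard library. `build_url`, `ensure_not_redirect`, and `TokenAuthRedirectError` are illustrative names rather than part of the commit (which raises `requests.exceptions.HTTPError`), and `paperless.example.com` is a placeholder host.

```python
class TokenAuthRedirectError(Exception):
    """Illustrative stand-in for the HTTPError raised in the commit."""


def build_url(base_url: str, url_or_path: str) -> str:
    """Return an absolute URL, leaving already-absolute URLs untouched."""
    if url_or_path.startswith(("http://", "https://")):
        return url_or_path
    return f"{base_url.rstrip('/')}/{url_or_path.lstrip('/')}"


def ensure_not_redirect(status_code: int, location: str = "") -> None:
    """Raise on any 3xx status, which token-only mode treats as an auth failure."""
    if 300 <= status_code < 400:
        raise TokenAuthRedirectError(
            f"Unexpected redirect (HTTP {status_code}) to '{location}'. "
            "Token auth likely rejected."
        )


print(build_url("https://paperless.example.com/", "/api/tags/"))
ensure_not_redirect(200)  # a normal response passes silently
```

Failing fast on the redirect, instead of following it to a login page, keeps the error message tied to the token rather than to whatever HTML the login page returns.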