Fix text detection for blobs containing bytes in ASCII range up to a zero

Although we pass a length to toUnicode(), validation and decoding are performed only up to the first null character. As a result, blobs consisting of bytes in the ASCII range, followed by a zero byte and arbitrary data after it, were detected as text. See issue #1772
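
The failure mode can be illustrated without Qt. The sketch below is only a model of the behaviour described above, not the actual isTextOnly() or QTextCodec code: it uses std::string instead of QByteArray, and std::strlen() stands in for a decoder that stops at the first zero byte. The function names isAsciiUpToFirstNul and isTextWithNulGuard are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a validator that stops at the first zero byte:
// it only inspects the bytes a C-string API would see.
bool isAsciiUpToFirstNul(const std::string& data)
{
    const std::size_t visible = std::strlen(data.c_str());
    for (std::size_t i = 0; i < visible; ++i) {
        if (static_cast<unsigned char>(data[i]) > 0x7F)
            return false;
    }
    return true;
}

// Same idea as the guard added in this commit: reject any buffer that
// contains a zero byte before running the validator.
bool isTextWithNulGuard(const std::string& data)
{
    if (data.find('\0') != std::string::npos)
        return false;
    return isAsciiUpToFirstNul(data);
}
```

For a six-byte buffer such as std::string("abc\0\xFF\xFE", 6), the first function reports text because it never looks past the zero, while the guarded version rejects the buffer.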
2026-01-30 07:50:42 -06:00 · 2019-03-02 13:49:45 +01:00
parent a7fc1ab541
commit 0adb0af133
1 changed file with 6 additions and 0 deletions
--- a/src/Data.cpp
+++ b/src/Data.cpp
@@ -17,6 +17,12 @@ bool isTextOnly(QByteArray data, const QString& encoding, bool quickTest)
    if(startsWithBom(data))
        return true;

+    // We assume that text in the default encoding (UTF-8) never contains a zero byte.
+    // This has to be checked explicitly because toUnicode() ignores all bytes after
+    // the first zero.
+    if(encoding.isEmpty() && data.contains('\0'))
+        return false;
+
    // Truncate to the first couple of bytes for quick testing
    int testSize = quickTest? std::min(512, data.size()) : data.size();
    QTextCodec::ConverterState state;