Fix text detection for blobs containing bytes in ASCII range up to a zero

Although we pass a length to toUnicode(), validation and decoding are
performed only up to the first null character. As a result, a blob made of
bytes in the ASCII range followed by a zero and arbitrary data after it is
wrongly detected as text.

See issue #1772
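
The failure mode can be reproduced without Qt. The sketch below is only an analogy under stated assumptions: `looksLikeAsciiUpToNul` and `isTextOnlySketch` are hypothetical names, and the first function merely imitates a decoder that stops inspecting input at the first zero byte. It is not how QTextCodec is implemented, and its 7-bit ASCII check stands in for real UTF-8 validation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a validity check that ignores everything after
// the first zero byte, even though the full length is supplied.
static bool looksLikeAsciiUpToNul(const char* bytes, std::size_t length)
{
    const void* nul = std::memchr(bytes, '\0', length);
    const std::size_t inspected =
        nul ? static_cast<std::size_t>(static_cast<const char*>(nul) - bytes)
            : length;
    // Only the bytes before the first zero are checked for the 7-bit range.
    return std::all_of(bytes, bytes + inspected,
                       [](unsigned char c) { return c < 0x80; });
}

// Same idea as the fix in this commit: reject any embedded zero byte
// explicitly before relying on the truncating check.
static bool isTextOnlySketch(const std::string& data)
{
    if (data.find('\0') != std::string::npos)
        return false;
    return looksLikeAsciiUpToNul(data.data(), data.size());
}

// Small self-check: "abc", a zero byte, then two non-ASCII bytes.
static void checkSketch()
{
    const std::string blob("abc\0\xff\xfe", 6);
    assert(looksLikeAsciiUpToNul(blob.data(), blob.size())); // false positive
    assert(!isTextOnlySketch(blob));                         // rejected
}
```

Calling `checkSketch()` exercises both paths: the truncating check accepts the blob, while the version with the explicit zero-byte test rejects it.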
mgrojo
2019-03-02 13:49:45 +01:00
parent a7fc1ab541
commit 0adb0af133

@@ -17,6 +17,12 @@ bool isTextOnly(QByteArray data, const QString& encoding, bool quickTest)
     if(startsWithBom(data))
         return true;
+    // We can assume that the default encoding (UTF-8) cannot contain character zero.
+    // This has to be checked explicitly because toUnicode() is ignoring bytes beyond
+    // the zero.
+    if(encoding.isEmpty() && data.contains('\0'))
+        return false;
     // Truncate to the first couple of bytes for quick testing
     int testSize = quickTest? std::min(512, data.size()) : data.size();
     QTextCodec::ConverterState state;