14 Commits

Author SHA1 Message Date
Martin Kleusberg
ed9fda28ea Speed up CSV import by not querying the stream position
Avoid querying the position in the text stream using Qt's pos() function
to update the progress dialog. Instead keep track of the stream position
manually. This is possible here because we don't ever seek in the file.
As a result, this speeds up the CSV import dramatically.
2017-11-05 12:40:32 +01:00
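The idea in this commit can be sketched in standard C++, with `std::istream` standing in for Qt's text stream. This is only an illustration under stated assumptions, not the project's code: `readWithManualProgress` and `onProgress` are hypothetical names, and the chunked read loop is assumed. The position is accumulated from the sizes of the chunks read, which works only because the stream is never seeked.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads `in` in fixed-size chunks and reports progress.
// The position is summed up from the chunk sizes instead of being queried
// from the stream, which is valid only because we never seek.
std::size_t readWithManualProgress(std::istream& in,
                                   std::size_t chunkSize,
                                   const std::function<void(std::size_t)>& onProgress)
{
    assert(chunkSize > 0);
    std::string buffer(chunkSize, '\0');
    std::size_t bytesRead = 0;  // manually tracked stream position

    while (in) {
        in.read(&buffer[0], static_cast<std::streamsize>(buffer.size()));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0)
            break;
        // ... parse buffer[0 .. got) here ...
        bytesRead += got;
        onProgress(bytesRead);  // e.g. update a progress dialog
    }
    return bytesRead;
}
```

A caller would pass a callback that forwards the running byte count to whatever progress display is in use.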
Martin Kleusberg
5a14e47419 Mark some more constructors as explicit 2017-10-31 12:11:03 +01:00
Martin Kleusberg
ee32b3e4e1 Use nullptr where possible 2017-10-30 21:20:02 +01:00
Martin Kleusberg
659f38ebef Increase CSV parser performance 2017-09-18 15:10:43 +02:00
Martin Kleusberg
0eb1f65798 Optimise the CSV import performance
This commit bundles a number of smaller optimisations in the CSV parser
and import code. They do add up to a noticeable speed gain though (at
least on some systems and configurations).
2017-09-13 15:03:13 +02:00
Martin Kleusberg
6ed8080fdb Don't parse entire CSV file before inserting the first row
We were separating the CSV import into two steps: parsing the CSV file
and inserting the parsed data. This had the advantages that it keeps the
parsing code and the database code nicely separated and that we have
full knowledge of the CSV file when we start inserting the data into the
database. However, this made it necessary to keep the entire parser
results in RAM. For large CSV files this uses enormous amounts of
memory.

This commit changes the import to parse the first 20 lines and analyse
them. This should give us a good impression of what to expect from the
rest of the file. Based on that information we then parse the file row
by row and insert each row into the database as soon as it is parsed.
This means we only have to keep one row at a time in memory while more
or less keeping the possibility to analyse the file before inserting
data.

On my system this does seem to change the runtime for small files, which
take a little longer now (<5%), though these measurements aren't
conclusive. For large files, however, it changes memory consumption
from using all memory and starting to swap within seconds to almost no
memory consumption at all. And not having to swap speeds things up a
lot.
2017-09-12 10:37:28 +02:00
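The two-phase import described in this commit might look roughly like the following in standard C++. This is a sketch under assumptions, not the actual implementation: `streamImport`, `splitLine` and `insertRow` are hypothetical names, and the splitter deliberately ignores quoting to keep the example short. Only the sampled rows are buffered; every later row is handed to `insertRow` as soon as it is parsed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Naive field splitter used only to keep the sketch short; ignores quoting.
Row splitLine(const std::string& line, char sep = ',')
{
    Row fields;
    std::string field;
    std::istringstream ss(line);
    while (std::getline(ss, field, sep))
        fields.push_back(field);
    // std::getline drops a trailing empty field; restore it.
    if (!line.empty() && line.back() == sep)
        fields.push_back("");
    return fields;
}

// Hypothetical import loop: buffers only the first `sampleRows` rows for
// analysis, then passes every row to `insertRow` one at a time.
// Returns the column count estimated from the sample.
std::size_t streamImport(std::istream& in,
                         const std::function<void(const Row&)>& insertRow,
                         std::size_t sampleRows = 20)
{
    assert(static_cast<bool>(insertRow));
    std::vector<Row> sample;
    std::string line;

    // Phase 1: parse and keep the leading rows only.
    while (sample.size() < sampleRows && std::getline(in, line))
        sample.push_back(splitLine(line));

    std::size_t columns = 0;
    for (const Row& row : sample)
        columns = std::max(columns, row.size());

    // The sampled rows are inserted first, then released.
    for (const Row& row : sample)
        insertRow(row);
    sample.clear();

    // Phase 2: one row in memory at a time.
    while (std::getline(in, line))
        insertRow(splitLine(line));

    return columns;
}
```

In this sketch, peak memory depends on the sample size and the longest row rather than on the size of the file.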
Martin Kleusberg
b7a00d301a Don't track column count when parsing CSV files
When parsing a CSV file we used to check the column count for each row
and track the highest number of columns that we found. This information
then could be used to create an INSERT statement large enough for all
the data.

This column number tracking code is removed by this commit. Instead it
analyses the first 20 rows only. It does that while generating the field
list.

Performance-wise this should take a (very) little longer, but it makes it
easier to improve the performance in other ways later, which should more
than compensate for this commit.

Feature-wise this should fix some (technically invalid) corner-case CSV
files with fewer fields in the title row than in the other rows. It
should also break some other (technically invalid) corner-case CSV files
if they are imported into an existing table and have fewer columns than
the existing table in their first 20 rows but later on the exact same
number. Both cases, I think, don't matter too much.
2017-09-10 11:07:02 +02:00
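A minimal sketch of deriving the field list from the first rows only, assuming the rows have already been parsed into string vectors. The function name `generateFieldList` and the `fieldN` placeholder naming are hypothetical and not taken from the project. It illustrates the first corner case mentioned above: a title row with fewer fields than the data rows still yields enough columns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical: derive the field list from at most `sampleRows` leading rows.
// The column count is the widest row in the sample; missing or empty header
// names are filled with placeholder names.
std::vector<std::string> generateFieldList(const std::vector<Row>& rows,
                                           bool firstRowIsHeader,
                                           std::size_t sampleRows = 20)
{
    assert(sampleRows > 0);
    const std::size_t limit = std::min(rows.size(), sampleRows);

    std::size_t columns = 0;
    for (std::size_t i = 0; i < limit; ++i)
        columns = std::max(columns, rows[i].size());

    std::vector<std::string> fields;
    for (std::size_t c = 0; c < columns; ++c) {
        const bool hasName = firstRowIsHeader && limit > 0 &&
                             c < rows[0].size() && !rows[0][c].empty();
        if (hasName)
            fields.push_back(rows[0][c]);
        else
            fields.push_back("field" + std::to_string(c + 1));  // placeholder
    }
    return fields;
}
```

Rows beyond the sample that are wider than this list are not covered by the sketch, which mirrors the second corner case the commit message accepts.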
Martin Kleusberg
e64eb8a118 Only load extra byte in the CSV parser when there's more data available 2017-06-30 22:32:13 +02:00
Martin Kleusberg
c6deca1242 Fix CSV import when line breaks appear at the buffer boundary
We're reading CSV files not all at once but in chunks. And when we're
encountering a \r char we're checking if it is followed by a \n char. So
far so good. But now it might happen that we're hitting a \r char that's
right at the end of the current buffer. In this case the lookahead check
isn't working as expected because there isn't more data available yet.
This commit fixes the issue by checking for these conditions and loading
an extra byte when needed.

See issue #1033.
2017-06-30 00:59:03 +02:00
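The boundary condition can be illustrated with a small chunked line splitter in standard C++. This is a sketch under assumptions rather than the project's parser: `splitLinesChunked` is a hypothetical name and the example handles line endings only, not fields or quoting. When a chunk ends in `\r` and more data is available, one extra byte is loaded so the lookahead for `\n` still works.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical chunked line splitter. Treats "\r\n", "\r" and "\n" as line
// endings and reads the input in chunks of `chunkSize` bytes.
std::vector<std::string> splitLinesChunked(std::istream& in, std::size_t chunkSize)
{
    assert(chunkSize > 0);
    std::vector<std::string> lines;
    std::string current;
    std::string buffer(chunkSize, '\0');

    while (in) {
        in.read(&buffer[0], static_cast<std::streamsize>(chunkSize));
        std::string chunk(buffer.data(), static_cast<std::size_t>(in.gcount()));
        if (chunk.empty())
            break;

        // Boundary case: while the chunk ends in '\r' and more data is
        // available, load one extra byte so the '\n' lookahead can succeed.
        while (chunk.back() == '\r' &&
               in.peek() != std::char_traits<char>::eof())
            chunk.push_back(static_cast<char>(in.get()));

        for (std::size_t i = 0; i < chunk.size(); ++i) {
            const char c = chunk[i];
            if (c == '\r' || c == '\n') {
                // Swallow the '\n' of a "\r\n" pair.
                if (c == '\r' && i + 1 < chunk.size() && chunk[i + 1] == '\n')
                    ++i;
                lines.push_back(current);
                current.clear();
            } else {
                current.push_back(c);
            }
        }
    }
    // Flush a last line that has no trailing line ending.
    if (!current.empty())
        lines.push_back(current);
    return lines;
}
```

Without the extra-byte step, a `\r\n` pair split across two chunks would be seen as two separate line endings and would produce a spurious empty line.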
Martin Kleusberg
743bdf9941 Fix a few warnings 2015-07-06 22:48:18 +02:00
Peinthor Rene
3ae9808289 use qt int64 type to fix build 2015-04-12 20:11:51 +02:00
Samir Aguiar
ca38995013 csvparser: Add support for old Mac OS line endings
In order to detect the CR characters, the file
must be opened in binary mode, otherwise QFile just
removes them all.

See issue #212.
2015-03-04 21:28:38 +01:00
Peinthor Rene
8d55fd5c48 cvsparse: used wrong var for last row check 2015-02-05 15:58:42 +01:00
Peinthor Rene
97e2025cc9 cvsparser: Newly implemented CSV Parser
Moved parser into its own class
This parser now properly supports new lines in quoted text
and returns a QVector<QStringList> result.
2014-09-02 18:05:04 +02:00
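To illustrate what supporting new lines in quoted text involves, here is a minimal state-machine sketch in standard C++. It is not the parser from this commit: `parseCsv` and `Table` are hypothetical names, `std::vector<std::vector<std::string>>` stands in for `QVector<QStringList>`, and the sketch assumes `\n` or `\r\n` line endings and `""` as the escape for a quote character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Table = std::vector<std::vector<std::string>>;

// Hypothetical minimal CSV parser: a newline inside a quoted field is kept
// as field content instead of ending the row.
Table parseCsv(const std::string& text, char sep = ',', char quote = '"')
{
    assert(sep != quote);
    Table rows;
    std::vector<std::string> row;
    std::string field;
    bool inQuotes = false;

    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (inQuotes) {
            if (c != quote) {
                field.push_back(c);      // includes '\n' inside quotes
            } else if (i + 1 < text.size() && text[i + 1] == quote) {
                field.push_back(quote);  // escaped quote ("")
                ++i;
            } else {
                inQuotes = false;        // closing quote
            }
        } else if (c == quote) {
            inQuotes = true;
        } else if (c == sep) {
            row.push_back(field);
            field.clear();
        } else if (c == '\n') {
            row.push_back(field);
            field.clear();
            rows.push_back(row);
            row.clear();
        } else if (c != '\r') {          // ignore the '\r' of "\r\n"
            field.push_back(c);
        }
    }
    // Flush a final row that has no trailing newline.
    if (!field.empty() || !row.empty()) {
        row.push_back(field);
        rows.push_back(row);
    }
    return rows;
}
```

Lone `\r` line endings are not handled by this sketch.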