At the moment a space is inserted between all tokens that make up a
type. This adds extra spaces to types like VARCHAR(5), which become
"VARCHAR ( 5 )" and cause problems in some applications.
This patch modifies the way tokens are concatenated for a type. It makes
sure that the extra space isn't inserted before "(" and ")" and also
after "(".
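For illustration only, the concatenation rule described above could be
sketched like this. The function name joinTypeTokens and the use of
std::string are assumptions made for this example, not the code the
patch actually touches.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins the tokens of a type with single spaces, but
// inserts no space before "(" or ")" and none after "(".
std::string joinTypeTokens(const std::vector<std::string>& tokens)
{
    std::string result;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const std::string& tok = tokens[i];
        const bool first = (i == 0);
        const bool noSpaceBefore = (tok == "(" || tok == ")");
        const bool prevIsOpen = (!first && tokens[i - 1] == "(");
        // Separate tokens with a space unless one of the rules forbids it.
        if (!first && !noSpaceBefore && !prevIsOpen)
            result += ' ';
        result += tok;
    }
    return result;
}
```

With the tokens VARCHAR, (, 5, ) this yields "VARCHAR(5)", while a
multi-word type such as UNSIGNED BIG INT keeps its spaces.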
This commit bundles a number of smaller optimisations in the CSV parser
and import code. They do add up to a noticeable speed gain though (at
least on some systems and configurations).
We were separating the CSV import into two steps: parsing the CSV file
and inserting the parsed data. This had the advantages that it kept the
parsing code and the database code nicely separated and that we had
full knowledge of the CSV file when we started inserting the data into the
database. However, this made it necessary to keep the entire parser
results in RAM. For large CSV files this uses enormous amounts of
memory.
This commit changes the import to parse the first 20 lines and analyse
them. This should give us a good impression of what to expect from the
rest of the file. Based on that information we then parse the file row
by row and insert each row into the database as soon as it is parsed.
This means we only have to keep one row at a time in memory while more
or less keeping the possibility to analyse the file before inserting
data.
On my system this seems to make small files take a little longer
(<5%), though these measurements aren't conclusive. For large files,
however, it changes memory consumption from using all memory and
starting to swap within seconds to almost no memory consumption at all.
And not having to swap speeds things up a lot.
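A rough sketch of the two-pass idea described in this commit message,
under heavy assumptions: the names splitLine, analyseColumns and
importRows are made up for this example, the splitter ignores quoting
entirely, and padding or truncating rows to the analysed column count
is just one possible policy, not necessarily what the import code does.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Simplified splitter (assumption): no quoting or escaping is handled.
Row splitLine(const std::string& line, char sep = ',')
{
    Row fields;
    std::string field;
    for (char c : line) {
        if (c == sep) {
            fields.push_back(field);
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);
    return fields;
}

// Pass 1: look at the first maxRows lines only and return the widest row.
std::size_t analyseColumns(std::istream& in, std::size_t maxRows = 20)
{
    std::size_t columns = 0;
    std::string line;
    for (std::size_t n = 0; n < maxRows && std::getline(in, line); ++n)
        columns = std::max(columns, splitLine(line).size());
    return columns;
}

// Pass 2: parse row by row and hand each row to insertRow right away, so
// only one row is held in memory at a time. Returns the number of rows.
std::size_t importRows(std::istream& in, std::size_t columns,
                       const std::function<void(const Row&)>& insertRow)
{
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        Row row = splitLine(line);
        row.resize(columns); // assumed policy: pad short rows, cut long ones
        insertRow(row);
        ++count;
    }
    return count;
}
```

The stream has to be rewound between the two passes, e.g. with
in.clear() followed by in.seekg(0), so that the analysed rows are
imported as well.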
When parsing a CSV file we used to check the column count for each row
and track the highest number of columns that we found. This information
then could be used to create an INSERT statement large enough for all
the data.
This column number tracking code is removed by this commit. Instead it
analyses the first 20 rows only. It does that while generating the field
list.
Performance-wise this should take a (very) little longer, but it makes
it easier to improve the performance in other ways later, which should
more than compensate for the cost of this commit.
Feature-wise this should fix some (technically invalid) corner-case CSV
files with fewer fields in the title row than in the other rows. It
should also break some other (technically invalid) corner-case CSV files
if they are imported into an existing table and have fewer columns than
the existing table in their first 20 rows but later on the exact same
number. Both cases, I think, don't matter too much.
Simplify the code by storing the flag that indicates if the parsing was
successful in the parsed object itself instead of handing around pairs
of parsed objects and bools.
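As a minimal sketch of that pattern: ParsedTable, fullyParsed() and
parseCreateTable() are hypothetical names invented for this example and
the "parser" is a toy, but it shows a result object carrying its own
success flag instead of being returned as a pair with a bool.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a parsed object that knows whether parsing
// succeeded.
class ParsedTable
{
public:
    ParsedTable() = default; // default state: parsing failed
    explicit ParsedTable(std::string name)
        : m_name(std::move(name)), m_fullyParsed(true) {}

    const std::string& name() const { return m_name; }
    bool fullyParsed() const { return m_fullyParsed; }

private:
    std::string m_name;
    bool m_fullyParsed = false;
};

// Toy parser (assumption): only recognises "CREATE TABLE <name>".
ParsedTable parseCreateTable(const std::string& sql)
{
    const std::string prefix = "CREATE TABLE ";
    if (sql.compare(0, prefix.size(), prefix) != 0 || sql.size() == prefix.size())
        return ParsedTable();
    return ParsedTable(sql.substr(prefix.size()));
}
```

Callers then check table.fullyParsed() on the returned object rather
than unpacking a pair.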
Foreign keys used to be stored along with the column information even
though it's more or less a table constraint. However, as we only support
single column foreign keys it was easier to store it inside that single
column. Now with multi-column foreign keys coming, a mechanism has been
introduced to store those multi-column foreign keys in the table data.
This led to two different storage locations for foreign key information:
inside the field for one-column foreign keys and inside the table for
multi-column foreign keys. This commit deletes the foreign key storage
inside fields and changes all code to use the table storage.
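To illustrate the resulting layout, here is a simplified sketch. The
struct names and members are assumptions made for this example and do
not mirror the real classes. A single-column foreign key is simply a
table-level entry with one column.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical table-level foreign key; covers one or more columns.
struct ForeignKey {
    std::vector<std::string> columns;    // referencing columns
    std::string refTable;                // referenced table
    std::vector<std::string> refColumns; // referenced columns
};

// Fields no longer carry any foreign key information.
struct Field {
    std::string name;
    std::string type;
};

struct Table {
    std::string name;
    std::vector<Field> fields;
    std::vector<ForeignKey> foreignKeys; // the single place for all foreign keys
};

// Returns the first foreign key that includes the given column, or nullptr.
const ForeignKey* findForeignKey(const Table& table, const std::string& column)
{
    for (const ForeignKey& fk : table.foreignKeys) {
        if (std::find(fk.columns.begin(), fk.columns.end(), column) != fk.columns.end())
            return &fk;
    }
    return nullptr;
}
```

Code that used to ask a field for its foreign key would instead look it
up through the table, as findForeignKey does here.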
This changes the SQL grammar parser so that it parses foreign key
clauses instead of just reading to the end of the clause when
encountering one. This allows using the information inside the clause
later in a more effective way. However, as of now this isn't used yet.
This commit only attempts to imitate the old behaviour using the new
approach (and might fail doing so, causing new errors...).
Instead of a single executable running different unit tests at the
same time, split the sqlobjects and import parts out of it.
While this currently duplicates the cmake boilerplate for each,
it allows fine-tuning each properly (e.g. building only the sources it
needs, in the future) and calling each separately.
Add the QTEST_MAIN in each test, and remove the manual QCoreApplication
handling in TestImport (handled by QTEST_MAIN).
Instead of using a separate CMakeLists.txt for the tests, build them
together with the rest of the main project. This behaviour is off
by default and can be enabled using ENABLE_TESTING.
Furthermore, the testing facilities of cmake are now used, so ctest
(invoked by `make test`) knows about the sqlb-unittests. Thus, adapt
the Travis build steps to build the main sources and execute the
tests twice, once for sqlite and once for sqlcipher.
Make sure to write the temporary CSV file in the proper encoding
(i.e. the one specified by the test data), and to use that encoding
when reading back from it.
This way the test should behave correctly, no matter the current
system charset.
Furthermore, fix and extend the unicode data: the current utf8chars is
actually UTF-16 data, so rename it and change its encoding accordingly.
Add a proper utf8chars data set with UTF-8-only characters.
The argument count (first parameter) is passed by reference and thus
must be kept alive for the whole lifetime of the QCoreApplication
instance, as the Qt API docs also state.
Also properly create a "string array" for the actual args instead
of incorrectly casting a string to one.
This fixes sporadic crashes in this test.
SQL allows you to use two quote characters instead of just one in order
to escape them. Example: ['aa''bb'] is read as [aa'bb]. Add detection
for these doubled quote characters to our grammar parser.
See issue #128.
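The escaping rule can be sketched outside of the grammar as follows.
This is only an illustration: unquoteSqlString is a hypothetical helper
written for this example, not part of the grammar parser.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: strips the surrounding quotes of an SQL string
// literal and collapses each doubled quote into a single one. Returns
// false if the text is not a well-formed literal.
bool unquoteSqlString(const std::string& literal, std::string& out, char quote = '\'')
{
    if (literal.size() < 2 || literal.front() != quote || literal.back() != quote)
        return false;

    out.clear();
    // Walk over the characters between the opening and closing quote.
    for (std::size_t i = 1; i + 1 < literal.size(); ++i) {
        if (literal[i] == quote) {
            // A quote inside the literal is only valid when it is followed
            // by a second quote that is not the closing one.
            if (i + 2 >= literal.size() || literal[i + 1] != quote)
                return false;
            ++i; // skip the first quote of the pair
        }
        out += literal[i];
    }
    return true;
}
```

For the example above, the input 'aa''bb' produces aa'bb, while a
literal with a lone inner quote such as 'a'b' is rejected.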
Use a common format for all include guards, make sure each header file
has one and make sure it's named after the file name.
And as a random extra in this commit: Make sure the gen_version.h file
generated by cmake ends with a line break.
Closes #59.
Now that the dependency on the main window is removed from the
DBBrowserDB class we can get rid of all the dialog- and widget-related
stuff in the cmake file used for the unit test project. Because our
Application class depends on the main window, too, the code for the CSV
tests is changed to use Qt's standard QApplication class.
Add a new test class for testing the import functionality. Currently
it only covers some test cases for the CSV import.
The function to test here (DBBrowserDB::decodeCSV) is part of the
DBBrowserDB class, that class has a reference to the main window, and
the main window basically depends on the entire rest of the project.
As a result, the makefile unfortunately grew quite a bit.