There are two code paths for taking a logical backup of a Dolt database. One
pushes all of the existing table files to the backup destination and then
points the destination at the same root value as the existing database. The
other walks the Merkle DAG from the desired root value chunk, pushing every
chunk that is missing from the destination store.
If the destination store is missing a lot of data, the latter path currently
requires a lot of bookkeeping to track what has been pushed so far and what
still needs to be pushed, which is expensive in memory. It also requires
walking large portions of the existing database, chunk by chunk, which can be
expensive in CPU and in I/O seeks.
This change makes the code path that uploads existing table files be used more
often, in the cases where it is the better choice. To do so, that path is now
willing to convert a journal file, which should never be pushed to a remote or
a backup, into a table file. It does this on the fly, and uploads only the
resulting table file to the destination. The table file does not become part
of the source store, and this code path has no interaction with GC.
For now, there are some hardcoded heuristics for when to prefer pushing
existing table files rather than building the upload chunk by chunk. This PR
uses existing table files when the destination store is empty (its root hash
is the zero value) and either there is no existing journal file, or the
existing journal file is both less than 20% of the total repo size and less
than 16GB.
* Support staging new tables with dolt add -p
When using 'dolt add -p' to stage rows from a new table (one that exists
in working but not in staging), the workspace table UPDATE mechanism
previously failed with 'table not found' because GetTableWriter looked
for the table in the staging root where it didn't exist yet.
This change adds ensureTableExistsInStaging() which:
1. Checks if the table exists in staging (fast path - no-op)
2. If not, checks if it exists in working
3. If yes, creates an empty table in staging with the same schema
4. Updates the session state to reflect the new staging root
The table is created empty (not copied with data) because 'dolt add -p'
allows partial staging - each row selected by the user will be inserted
individually into the staging table via the workspace table UPDATE.
Also adds unit tests and BATS integration tests for the new functionality.
* Update partial staging test to use correct column names and verify exact output
* Update failing test to use csv comparison
Locally dolt will render bool values as true/false, so the remote-engine
test fails because it shows 1/0. Additionally, whitespace will probably
also be an issue. Outputting in CSV format should result in the same data
to compare in both scenarios.
* pretty the other new tests
* force output to always be true/false
When StatementBegin encounters an error (e.g., table not found in staging
root), it stores the error in wtu.err but leaves tableWriter as nil. The
Update and Delete methods were dereferencing tableWriter before checking
if it was nil, causing a panic.
This fix adds an early return to check for errors from StatementBegin
before attempting to use tableWriter, preventing the nil pointer
dereference.