a7848925be (Aaron Son): go: Optimize backups to more often transit existing table files instead of doing a merkle dag walk to push missing chunks.
There are two code paths for taking a logical backup of a Dolt database. One
pushes all of the existing table files to the backup destination and then
points the destination at the same root value as the existing database. The
other walks the Merkle DAG, starting from the desired root value chunk, and
pushes every missing chunk to the destination store.
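
A minimal sketch of the first path, assuming a hypothetical store interface;
none of these names are Dolt's actual types or APIs:

    // Hypothetical sketch only; not Dolt's real API.
    package backup

    // Hash is a content address.
    type Hash [20]byte

    // ChunkStore is a minimal stand-in for a content-addressed chunk store.
    type ChunkStore interface {
        Root() Hash                 // current root value hash
        Has(h Hash) bool            // does the store already hold this chunk?
        Get(h Hash) ([]byte, error) // read a chunk's bytes
        Put(h Hash, b []byte) error // write a chunk
        SetRoot(h Hash) error       // point the store at a new root value
    }

    // TableFileStore is a stand-in for a store whose persisted table files
    // can be copied wholesale.
    type TableFileStore interface {
        ChunkStore
        TableFiles() []string // paths of the existing table files
        UploadTableFile(dst ChunkStore, path string) error
    }

    // BackupByTableFiles pushes whole table files, then points the
    // destination at the same root value as the source.
    func BackupByTableFiles(src TableFileStore, dst ChunkStore) error {
        for _, tf := range src.TableFiles() {
            if err := src.UploadTableFile(dst, tf); err != nil {
                return err
            }
        }
        return dst.SetRoot(src.Root())
    }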

If the destination store is missing a lot of data, this latter path currently
requires a lot of bookkeeping to track what has been pushed so far and what
still needs to be pushed, which is expensive in memory. It also requires
walking large portions of the existing database, chunk by chunk, which can be
expensive in CPU and in I/O seeks.
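
The bookkeeping cost is easy to see in a sketch of the chunk-walk path,
reusing the hypothetical types above; refs is an assumed helper that parses
the addresses a chunk points to:

    // Sketch of the chunk-by-chunk path; all names are illustrative.
    func BackupByChunkWalk(src, dst ChunkStore, root Hash,
        refs func(chunk []byte) []Hash) error {

        visited := make(map[Hash]bool) // bookkeeping; grows with the walked DAG
        stack := []Hash{root}
        for len(stack) > 0 {
            h := stack[len(stack)-1]
            stack = stack[:len(stack)-1]
            if visited[h] || dst.Has(h) {
                continue
            }
            visited[h] = true
            b, err := src.Get(h) // chunk-at-a-time reads: seek-heavy I/O
            if err != nil {
                return err
            }
            if err := dst.Put(h, b); err != nil {
                return err
            }
            stack = append(stack, refs(b)...) // queue this chunk's children
        }
        return dst.SetRoot(root)
    }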

This change makes the code path that uploads existing table files apply more
often, in the cases where it is the better choice. To enable that, the code
path is now willing to convert a journal file, which should never be pushed
to a remote or a backup as-is, into a table file. It does this on the fly and
uploads only the resulting table file to the destination. The converted table
file never becomes part of the source store, so this code path has no
interaction with GC.
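
A sketch of the conversion step, again reusing the hypothetical types above;
writeTableFile stands in for the real journal-to-table-file rewrite:

    import "os"

    // ConvertAndUploadJournal sketches the on-the-fly conversion: the
    // journal's chunks are rewritten into a temporary table file, that file
    // alone is uploaded, and it is deleted afterward.
    func ConvertAndUploadJournal(src TableFileStore, dst ChunkStore,
        journalPath string, writeTableFile func(journal, out string) error) error {

        tmp, err := os.CreateTemp("", "journal-as-table-*")
        if err != nil {
            return err
        }
        tmpPath := tmp.Name()
        tmp.Close()
        defer os.Remove(tmpPath) // transient: never registered with the source store

        // Rewrite the journal's chunks into table-file format.
        if err := writeTableFile(journalPath, tmpPath); err != nil {
            return err
        }
        // Only the converted artifact is uploaded, so GC never sees it.
        return src.UploadTableFile(dst, tmpPath)
    }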

For now, there are hardcoded heuristics for when to prefer pushing existing
table files rather than building the upload chunk by chunk. This PR uses
existing table files when all of the following hold (sketched in code below):

- the destination store is empty (its root hash is the zero value), and
- either there is no existing journal file, or the existing journal file is
  both less than 20% of the total repo size and less than 16GB.
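
Encoded as a predicate over the hypothetical Hash type above; only the 20%
and 16GB thresholds come from this PR, the names and signature are invented
for illustration:

    const (
        maxJournalFraction = 0.20    // journal must be under 20% of repo size
        maxJournalBytes    = 16 << 30 // ...and under 16GB
    )

    // PreferTableFiles reports whether to take the table-file upload path.
    func PreferTableFiles(destRoot Hash, journalBytes, repoBytes int64) bool {
        if destRoot != (Hash{}) { // destination must be empty (zero root hash)
            return false
        }
        if journalBytes == 0 { // no existing journal file
            return true
        }
        return float64(journalBytes) < maxJournalFraction*float64(repoBytes) &&
            journalBytes < maxJournalBytes
    }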