Aaron Son a7848925be go: Optimize backups to more often transit existing table files instead of doing a merkle dag walk to push missing chunks.
There are two code paths for taking a logical backup of a Dolt database. One
pushes all the existing table files to the backup destination and then sets it
up so that it points at the same root value as the existing database. The other
code path does the merkle dag walk to push all the missing chunks to the
destination store, starting from the desired root value chunk.

If the destination store is missing a lot of data, this latter path currently
requires a lot of bookkeeping to keep track of what has been pushed so far and
what still needs to be pushed. This is expensive in memory. It requires walking
large portions of the existing database, chunk by chunk, which can be expensive
in CPU and in I/O seeks.

This code change seeks to use the code path which uploads existing table files
more often when it is the best choice. In order to do so, the code path is now
willing to convert a journal file, which should never be pushed to a remote or
a backup, into a table file. It will do this on the fly, and will only upload
the resulting table file to the destination. The table file does not become
part of the source store, and this code path has no interaction with GC.

For now, there are some hardcoded heuristics for when to prefer pushing
existing table files rather than trying to build the upload chunk-by-chunk.
This PR uses existing table files when the destination store is empty (has a
zero root hash value) and either there is no existing journal file, or the
existing journal file is both less than 20% of the total repo size and less
than 16GB.
2026-03-04 11:28:55 -08:00

BATS - Bash Automated Testing System

BATS is used to integration-test dolt. Our BATS tests started as a humble suite of integration tests. Over two years of development, the suite has grown to over 1,000 tests. When we find a customer-facing bug in the dolt command line or SQL implementation, we cover it with a BATS test. These tests are run on every dolt PR on Mac, Windows, and Linux using GitHub Actions.

These tests are also useful documentation. If you are wondering how a certain command or feature works in practice, using grep to find the appropriate BATS test can give you some simple examples of happy path and error case behavior.
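
As a rough illustration of the grep approach, the sketch below lists the `@test` lines that mention a keyword. The function name `find_bats_tests` is hypothetical and is not part of dolt or bats.

```shell
#!/usr/bin/env bash
# find_bats_tests KEYWORD [DIR]
# Hypothetical helper: print file:line:@test "..." for every @test line
# (or file path) that mentions KEYWORD, case-insensitively, in the
# .bats files under DIR (default: current directory).
find_bats_tests() {
    local keyword="$1" dir="${2:-.}"
    grep -rn --include='*.bats' '^@test' "$dir" | grep -i -- "$keyword"
}
```

Calling `find_bats_tests import` from the tests directory would then point at the files and line numbers worth reading first.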

The naming conventions for the test files have evolved over time. Generally, the files are named after the feature the file intends to test. However, some of the early tests are named after the schema of the table they implement, e.g. 1pk5col-ints.bats. These files were implemented to reuse setup and teardown logic. This scheme was quickly abandoned but the legacy remains.

If you find a bug in dolt, we would love a skipped bats test PR in addition to a GitHub issue.

Running for yourself

  1. Install bats.
npm install -g bats
  2. Install dolt and its utilities.
cd go/cmd/dolt && go install . && cd -
cd go/store/cmd/noms && go install . && cd -
cd go/utils/remotesrv && go install . && cd -
  3. Make sure you have python3 installed.

This came with my Mac Developer Tools and was on my PATH.

  4. Install the Python packages mysql-connector-python, pyarrow, and pandas.

I needed the mysql-connector-python package specifically. pip install mysql.connector mostly worked but caused some SSL errors.

pip3 install mysql-connector-python
pip3 install pyarrow
pip3 install pandas
  5. Install parquet and its dependencies.

I used Homebrew on Mac to install parquet. You also need to install hadoop and set PARQUET_RUNTIME_JAR to get bats to work. Here's what I ended up running.

brew install parquet-cli
brew install hadoop
export PARQUET_RUNTIME_JAR=/opt/homebrew/opt/parquet-cli/libexec/parquet-cli-1.12.3-runtime.jar
  6. Go to the directory with the bats tests and run:
bats .

This will run all the tests. Specify a particular .bats file to run only those tests.
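
Before a first run it can help to confirm that the tools from the steps above are actually on your PATH. The following is a minimal sketch. The function name `check_bats_prereqs` is hypothetical, and the binary names are assumptions inferred from the install commands (for example, that parquet-cli provides a `parquet` executable).

```shell
#!/usr/bin/env bash
# check_bats_prereqs TOOL...
# Hypothetical helper: print "missing: TOOL" for each tool not found on PATH.
# Returns 0 when everything is present, 1 otherwise.
check_bats_prereqs() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Binary names below are assumed from the install steps, not verified.
check_bats_prereqs bats dolt noms remotesrv python3 parquet hadoop \
    || echo "install the missing tools before running bats"

# The parquet tests also rely on this variable pointing at a real file.
[ -f "${PARQUET_RUNTIME_JAR:-}" ] \
    || echo "PARQUET_RUNTIME_JAR is unset or does not point to a file"
```

Once nothing is reported missing, a single file can be run by name, for example `bats 1pk5col-ints.bats`.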

Here Docs

BATS tests in Dolt make extensive use of Here Docs. Common patterns include piping SQL scripts to dolt sql:

    dolt sql <<SQL
CREATE TABLE my_table (pk int PRIMARY KEY);
SQL

And creating data files for import:

    cat <<DELIM > data.csv
pk,c1,c2
1,1,1
2,2,2
DELIM
    dolt table import -c -pk=pk my_table data.csv
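
One detail worth keeping in mind with these patterns: when the delimiter is unquoted, the shell still expands `$variables` and backticks inside the here doc, which matters for SQL that uses backtick-quoted identifiers. Quoting the delimiter turns expansion off. The sketch below uses `cat` in place of `dolt sql` so that it runs without dolt installed; the variable and column names are made up for illustration.

```shell
#!/usr/bin/env bash
table="my_table"

# Unquoted delimiter: $table is expanded before the text reaches the command.
expanded=$(cat <<SQL
SELECT * FROM $table;
SQL
)

# Quoted delimiter: the body is passed through literally, so backtick-quoted
# identifiers and dollar signs are left alone.
literal=$(cat <<'SQL'
SELECT `pk` FROM `my_table` WHERE note = '$5';
SQL
)

echo "$expanded"
echo "$literal"
```

With an unquoted delimiter, the second query would have had its backticks treated as command substitution, so quoting the delimiter is the safer default whenever the SQL does not need shell variables.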

Skipped BATS

Various tests are skipped as TODOs and/or as documentation of known bugs. For example:

@test "..." {
    ...
    skip "this test is currently failing because..."
}

Skipped BATS can still be partially useful for testing, as they execute normally up to the skip statement.
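
To see the partial execution for yourself, the sketch below writes a throwaway .bats file and runs it if bats is installed. The file name, test names, and marker file are invented for illustration. `skip` and `BATS_TEST_DIRNAME` are standard bats features.

```shell
#!/usr/bin/env bash
# Write a small .bats file whose second test is skipped partway through.
workdir=$(mktemp -d)
cat > "$workdir/skip-demo.bats" <<'BATS'
@test "runs to completion" {
    [ "$(echo hello)" = "hello" ]
}

@test "documents a known bug" {
    # This line runs, because it comes before the skip.
    echo "reached" > "$BATS_TEST_DIRNAME/before-skip.txt"
    skip "this test is currently failing because..."
    false  # never reached while the skip is in place
}
BATS

# Run the file only when bats is available on PATH.
if command -v bats >/dev/null 2>&1; then
    bats "$workdir/skip-demo.bats"
fi
```

After a run, bats reports the second test as skipped, yet `before-skip.txt` exists next to the test file, showing that the commands ahead of `skip` were executed.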

More Information

We published a blog entry on BATS with more information and some useful tips and tricks.