There are two code paths for taking a logical backup of a Dolt database. One pushes all the existing table files to the backup destination and then sets it up so that it points at the same root value as the existing database. The other code path does the merkle dag walk to push all the missing chunks to the destination store, starting from the desired root value chunk. If the destination store is missing a lot of data, this later path currently requires a lot of bookkeeping to keep track of what has been pushed so far and what still needs to be pushed. This is expensive in memory. It requires walking large portions of the existing database, chunk by chunk, which can be expensive in CPU and in I/O seeks. This code change seeks to use the code path which uploads existing table files more often when it is the best choice. In order to do so, the code path is now willing to convert a journal file, which should never be pushed to a remote or a backup, into a table file. It will do this on the fly, and will only upload the resulting table file to the destination. The table file does not become part of the source store, and this code path has no interaction with GC. For now, there are some hardcoded heuristics for when to prefer pushing existing table files rather than trying to build the upload chunk-by-chunk. This PR uses existing table files when: the destination store is empty (has a zero root hash value) and there is no existing journal file or the existing journal file is less than 20% of the total repo size and the existing journal file is less than 16GB.
BATS - Bash Automated Testing System
BATS is used to integration test dolt. Our BATS tests started as a humble suite of integration tests. Over two years
of development the suite has grown to over 1,000 tests. When we find a customer facing bug in the dolt command line or
SQL implementation, we cover it with a BATS test. These tests are run on every dolt PR on Mac, Windows, and Linux using
GitHub Actions.
These tests are also useful documentation. If you are wondering how a certain command or feature works in practice,
using grep to find the appropriate BATS test can give you some simple examples of happy path and error case behavior.
The naming conventions for the test files have evolved over time. Generally, the files are named after the feature the
file intends to test. However, some of the early tests are named after the schema of the table they implement
ie. 1pk5col-ints.bats. These files were implemented to reuse setup and teardown logic. This scheme was quickly
abandoned but the legacy remains.
If you find a bug in dolt, we would love a skipped bats test PR in addition to a GitHub issue.
Running for yourself
- Install bats.
npm install -g bats
- Install
doltand its utilities.
cd go/cmd/dolt && go install . && cd -
cd go/store/cmd/noms && go install . && cd -
cd go/utils/remotesrv && go install . && cd -
- Make sure you have
python3installed.
This came with my Mac Developer Tools and was on my PATH.
pip install mysql-connector-python,pip install pyarrowandpip install pandas
I also needed this specific version on the python mysql.connector. pip install mysql.connector mostly worked but caused some SSL errors.
pip3 install mysql-connector-python
pip3 install pyarrow
pip3 install pandas
- Install
parquetand its dependencies
I used Homebrew on Mac to install parquet. You also need to install hadoop and set PARQUET_RUNTIME_JAR to get bats to work. Here's what I ended up running.
brew install parquet-cli
brew install hadoop
export PARQUET_RUNTIME_JAR=/opt/homebrew/opt/parquet-cli/libexec/parquet-cli-1.12.3-runtime.jar
- Go to the directory with the bats tests and run:
bats .
This will run all the tests. Specify a particular .bats file to run only those tests.
Here Docs
BATS tests in Dolt make extensive use of Here Docs.
Common patterns include piping SQL scripts to dolt sql:
dolt sql <<SQL
CREATE TABLE my_table (pk int PRIMARY KEY);
SQL
And creating data files for import:
cat <<DELIM > data.csv
pk,c1,c2
1,1,1
2,2,2
DELIM
dolt table import -c -pk=pk my_table data.csv
Skipped BATS
Various tests are skipped as TODOs and/or as documentation of known bugs. Eg:
@test "..." {
...
skip "this test is currently failing because..."
}
Skipped BATS can still be partially useful for testing as they execute normally up to skip statement.
More Information
We published a blog entry on BATS with more information and some useful tips and tricks.