ChunkStores provide a Version() method so that anyone directly
using a ChunkStore (e.g. BatchStoreAdaptor) can retrieve and
check the version of the underlying store.
remoteDatabaseServer checks the version of the ChunkStore it's
backed by at startup, and then provides that version as an HTTP
header to all clients. In Go, httpBatchStore checks this header
anytime it gets a response from the server and bails if there's
version skew.
In Go, the responsibility for checking that the running code is
compatible with the data being accessed lies with the BatchStore layer.
In JS, code in fetch.js checks the header mentioned above.
Towards #1561
Fix sequence chunker bug triggered by repeatedly removing last element
The bug is that we sometimes create a prollytree whose root is a meta
sequence node containing only a single item. This is never the canonical
representation of prollytrees.
I reworked the sequence chunker to take a different approach to corner
cases. Instead of trying to be smart and avoid this case up front (which
clearly didn't work properly), it is now more liberal about creating
unnecessary nodes, then fixes them up in the finalisation step.
Apparently, the following:
s := []byte("")
s = append(s, 1, 2, 3)
f := append(s, 10, 20, 30)
g := append(s, 4, 5, 6)
results in both f and g being [1, 2, 3, 4, 5, 6], because both appends
can write into the same backing array.
This was happening to us in NewLevelDBStore, so ldb.chunkPrefix was
getting set to "/chunk/" and ldb.rootKey was "/chun" (the first 5
bytes of "/chunk/") instead of "/root". This patch fixes it, but
invalidates all existing LevelDBs.
In discussing the patch that added parallelism, raf and I realized
that it's possible to be a bit more aggressive in the cases where one
queue is 'taller' than the other. In the current code, in that case,
we will parallelize work on all the Refs from the taller queue that
have a strictly higher ref-height than the head of the shorter queue.
We realized that it's safe to also take Refs from the taller queue
that are the SAME height as those at the top of the shorter queue,
as long as you handle common Refs correctly.
Fixes #1818
This reverts commit ba909929c2, but
tweaks the size of the memory cache so that, hopefully, Travis
doesn't barf. Maybe once Travis is dead we can size up.
Fixes #1817
gzip is such a processor hog that using it to compress responses
makes reading noms data from the server take almost 4 times as long.
snappy still has pretty good compression ratios and reduces the time
to export sfcrime from 140s to about 40s with the Go csv exporter.
Adding some parallelism when httpBatchStore.getRefs() is deserializing
chunks from the server's response gets us 4 or 5 more seconds, down to
about 36s on my machine.
Toward #1764
The basic approach here is to take the max of the heights of the
source and sink queues, then grab all the refs of that height from
both and sort them into three sets: refs in the source, refs in the
sink, and refs in both. These are then processed in parallel and the
reachable refs are all added to the appropriate queue. Repeat as long
as stuff still shows up in the source queue.
Fixes #1564