Instead of writing blobs using base64, we now write the values as
hex (16 bytes per row):
Blob(00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f\n10)
Towards #1837
As one step towards #1819, we've created MapMutator, which can take a
bunch of (what would normally be) Map.Set() calls and batch them up to
be applied all at once. The keys and values are held in a LevelDB cache
until everything's done. Usage looks like this:
m := types.NewMap()
mx := m.Mx()
mx = mx.Set(types.String("foo"), types.String("bar")).
    Set(types.String("baz"), types.Number(42))
m = mx.Finish()
We intend to make this the only way to modify collections, but at first
this will only work on an empty NomsMap.
Previously, when we had an ordered (set/map) prolly tree containing
non-ordered values (blobs/refs/etc.), we'd put the ref of the largest
value in each meta node, complete with its full type info and height.
This is wasteful; all we really need is the hash of the largest item for
searching the tree. Since a bare hash isn't a value and can't be encoded
directly, this patch creates a fake Ref<Boolean> with height 0 to carry
just that hash.
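As a rough sketch of the idea (these are made-up stand-ins, not the real Noms types; metaTuple and fakeOrderKey are hypothetical names), the meta node's search key only needs to carry a hash, so a minimal Ref with a Boolean target type and height 0 does the job:
// Hypothetical stand-ins; only the shape matters here.
type Hash [20]byte
type Type int
const BoolType Type = 0
// Ref mirrors the pieces of types.Ref that matter for this sketch:
// a target hash, the target's type, and the target's ref-height.
type Ref struct {
    TargetHash Hash
    TargetType Type
    Height     uint64
}
// metaTuple is one entry in a prolly-tree meta node: a real ref to the
// child sequence plus a key used only for searching the tree.
type metaTuple struct {
    childRef Ref
    key      Ref
}
// fakeOrderKey builds the "fake Ref<Boolean> with height 0" described
// above: it carries only the hash of the subtree's largest item, instead
// of that item's full type info and ref-height.
func fakeOrderKey(h Hash) Ref {
    return Ref{TargetHash: h, TargetType: BoolType, Height: 0}
}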
In Go: there are already similar tests for numbers, but testing structs
hits the non-scalar value prolly tree code.
In JS: the set and map tests hadn't even been ported from Go, so I
ported most of their functionality.
Go creates the slice with nil for the field's type. nil is never a
valid type, and when there are recursive types we may end up calling
Hash() on the field type, which panics.
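To make the failure mode concrete, here is a hedged illustration (Type, field, and hashFields are hypothetical stand-ins, not the real Noms code): a field descriptor whose type is still nil, as can happen partway through resolving a recursive type, panics as soon as something hashes it.
// Hypothetical stand-in for a type descriptor.
type Type struct {
    name string
}
// Hash dereferences the receiver, so a nil *Type panics here.
func (t *Type) Hash() string {
    return "hash-of-" + t.name
}
// field pairs a name with its (possibly still-nil) type.
type field struct {
    name string
    t    *Type
}
func hashFields(fields []field) []string {
    hashes := make([]string, 0, len(fields))
    for _, f := range fields {
        // If f.t was left nil while a recursive type is being resolved,
        // this call panics.
        hashes = append(hashes, f.t.Hash())
    }
    return hashes
}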
Towards #1881
ChunkStores provide a Version() method so that anyone directly
using a ChunkStore (e.g. BatchStoreAdaptor) can retrieve and
check the version of the underlying store.
remoteDatabaseServer checks the version of the ChunkStore it's
backed by at startup, and then provides that version as an HTTP
header to all clients. In Go, httpBatchStore checks this header
anytime it gets a response from the server and bails if there's
version skew.
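Here is a rough sketch of the shape of that handshake (the interface, the X-Noms-Version header name, and the function names are illustrative assumptions, not the actual Noms API):
import (
    "fmt"
    "net/http"
)
// versionedChunkStore is the idea behind the new method: any ChunkStore
// can report the serialization version of the data it holds.
type versionedChunkStore interface {
    Version() string
}
// The header name here is an assumption; the real server defines its own.
const versionHeader = "X-Noms-Version"
// Server side: advertise the backing store's version on every response.
func writeVersionHeader(cs versionedChunkStore, w http.ResponseWriter) {
    w.Header().Set(versionHeader, cs.Version())
}
// Client side (what httpBatchStore does conceptually): check the header
// on every response and bail on version skew.
func checkVersion(resp *http.Response, expected string) error {
    if got := resp.Header.Get(versionHeader); got != expected {
        return fmt.Errorf("chunk store version skew: want %s, got %s", expected, got)
    }
    return nil
}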
In Go, the responsibility for checking whether the running code and
the data being accessed are compatible lies with the BatchStore layer.
In JS, there is code in fetch.js that checks the header mentioned above.
Towards #1561
Fix sequence chunker bug triggered by repeatedly removing last element
The bug is that we sometimes create a prolly tree whose root is a meta
sequence node with only a single item. This is never the canonical
representation of a prolly tree.
I reworked the sequence chunker to take a different approach to corner
cases. Instead of trying to be smart and avoid this case up front (which
clearly didn't work properly), it is now more liberal about creating
unnecessary nodes and then fixes them up in the finalization step.
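A minimal sketch of that fix-up, assuming a simplified node type (the real chunker works over meta sequences, not this toy struct): after building the tree liberally, collapse any root-level meta node that holds only a single item.
// Toy node: a meta node holds children, a leaf holds values.
type node struct {
    children []*node // nil for leaf nodes
    values   []int   // leaf payload, purely illustrative
}
func (n *node) isMeta() bool { return n.children != nil }
// canonicalRoot is the finalization pass: a meta node with exactly one
// item carries no information, so keep descending until the root is
// either a leaf or a meta node with at least two items.
func canonicalRoot(root *node) *node {
    for root.isMeta() && len(root.children) == 1 {
        root = root.children[0]
    }
    return root
}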
In discussing the patch that added parallelism, raf and I realized
that it's possible to be a bit more aggressive in the cases where one
queue is 'taller' than the other. In the current code, in that case,
we will parallelize work on all the Refs from the taller queue that
have a strictly higher ref-height than the head of the shorter queue.
We realized that it's safe to also take Refs from the taller queue
that are the SAME height as those at the top of the shorter queue,
as long as you handle common Refs correctly.
Fixes #1818
The basic approach here is to take the max of the heights of the
source and sink queues, then grab all the refs of that height from
both and sort them into three disjoint sets: refs only in the source,
refs only in the sink, and refs in both. These are then processed in
parallel, and the reachable refs are all added to the appropriate
queue. Repeat as long as refs still show up in the source queue.
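A hedged sketch of that partitioning step (this minimal Ref and the function are stand-ins, not the real datas code), assuming the refs at the current maximum height have already been popped off both queues and each batch is free of duplicates:
// Minimal stand-in for types.Ref: a ref-height plus a target hash.
type Ref struct {
    Height     uint64
    TargetHash string
}
// partition splits the refs taken from the source and sink queues at the
// current height into the three disjoint sets described above: only in
// the source, only in the sink, and in both. Each set is then processed
// in parallel, with newly reachable refs pushed onto the appropriate
// queue.
func partition(srcRefs, sinkRefs []Ref) (srcOnly, sinkOnly, common []Ref) {
    inSink := make(map[string]bool, len(sinkRefs))
    for _, r := range sinkRefs {
        inSink[r.TargetHash] = true
    }
    inBoth := make(map[string]bool)
    for _, r := range srcRefs {
        if inSink[r.TargetHash] {
            common = append(common, r)
            inBoth[r.TargetHash] = true
        } else {
            srcOnly = append(srcOnly, r)
        }
    }
    for _, r := range sinkRefs {
        if !inBoth[r.TargetHash] {
            sinkOnly = append(sinkOnly, r)
        }
    }
    return
}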
Fixes #1564
The bigger change here is having chunks.DeserializeToChan take a
channel of *Chunk instead of Chunk. This saves a couple of seconds
when committing the sfcrime dataset.
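As a hedged illustration of why the pointer channel helps (the Chunk shape and loops below are simplified stand-ins, not the real chunks package): with chan Chunk, every send and receive copies the whole struct, while chan *Chunk only moves a pointer per chunk.
// Simplified stand-in for chunks.Chunk.
type Chunk struct {
    hash [20]byte
    data []byte
}
// Before (illustrative): each send copies the struct; the receiver
// copies it again.
func deserializeByValue(raw [][]byte, out chan<- Chunk) {
    for _, b := range raw {
        out <- Chunk{data: b}
    }
    close(out)
}
// After (illustrative): only a pointer crosses the channel.
func deserializeByPointer(raw [][]byte, out chan<- *Chunk) {
    for _, b := range raw {
        out <- &Chunk{data: b}
    }
    close(out)
}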
Also, replace a d.Chk.Equal with a d.Chk.True, and increase the batch
size that ValidatingBatchSink uses to reduce stalls when putting into
a ChunkStore.
At some point, the Go ValueStore code started caching the hash of a
Chunk as a validation 'hint' for the Chunk itself. JS never did
this. This must have addressed some edge case back when it was fatal
for validation code to run across a Chunk it hadn't seen and didn't
have a hint for. The downside is that doing this can cause us to send
a hint for every novel Chunk present in a writeValue payload in the
worst case.
Since I can't remember the edge case, and that edge case will no
longer be fatal anyway, removing this to avoid the (potentially
terrible) downside makes sense.
Change Dataset.Pull to use a single algorithm to pull data from a
source to a sink, regardless of which (if any) is local. The basic
algorithm is described in the first section of pulling.md. This
implementation is equivalent but phrased a bit differently. The
algorithm actually used is described in the second section of
pulling.md.
The main changes:
- datas.Pull(), which implements the new pulling algorithm
- RefHeap, a priority queue that sorts types.Ref by ref-height and
  then by ref.TargetHash() (see the sketch after this list)
- Add has() to both Database implementations. Cache has() checks.
- Switch Dataset to use new datas.Pull(). Currently not concurrent.
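Here is a rough sketch of what RefHeap could look like on top of container/heap (ordering tallest-first is an assumption based on the pull loop above; the real type lives in datas and may differ):
import "container/heap"
// Minimal stand-in for types.Ref: a ref-height and a target hash.
type Ref struct {
    Height     uint64
    TargetHash string
}
// RefHeap orders refs by ref-height (tallest first) and breaks ties by
// target hash. It implements heap.Interface.
type RefHeap []Ref
func (h RefHeap) Len() int { return len(h) }
func (h RefHeap) Less(i, j int) bool {
    if h[i].Height != h[j].Height {
        return h[i].Height > h[j].Height // taller refs pop first
    }
    return h[i].TargetHash < h[j].TargetHash
}
func (h RefHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *RefHeap) Push(x interface{}) { *h = append(*h, x.(Ref)) }
func (h *RefHeap) Pop() interface{} {
    old := *h
    n := len(old)
    r := old[n-1]
    *h = old[:n-1]
    return r
}
// popTallest pops every ref at the current maximum height, yielding the
// per-height batch that the pulling loop works through.
func popTallest(h *RefHeap) []Ref {
    if h.Len() == 0 {
        return nil
    }
    top := (*h)[0].Height
    var batch []Ref
    for h.Len() > 0 && (*h)[0].Height == top {
        batch = append(batch, heap.Pop(h).(Ref))
    }
    return batch
}
heap.Init and heap.Push then maintain the ordering as newly reachable refs are discovered and added.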
Toward #1568
Mostly, prune reachableChunks