When reading a meta sequence, the ChildRef of each metaTuple
is populated directly during decoding. For a sequence of Type S,
this should be a Ref<S>; the old code was instead populating it
with a Ref<T> for a sequence of Type S<T>.
To facilitate validation, DataStore needs to remember which chunks
it's seen, what their refs are, and the Noms type of the Values they
encode. Then, DataStore can look at each Value that comes in via
WriteValue() and validate it by checking every embedded ref (if any)
against this cache.
Towards #654
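A minimal Go sketch of the validation described above, with made-up names (chunkInfo, refCache, string hashes and type descriptions) standing in for the real Noms chunk, ref, and type machinery:

```go
package main

import "fmt"

// chunkInfo records what we know about a chunk we've already seen.
type chunkInfo struct {
	typeDesc string // stand-in for the Noms type of the encoded Value
}

// refCache maps a chunk's ref (here just a string) to its info.
type refCache map[string]chunkInfo

// validate checks every embedded ref of an incoming value against the
// cache: each ref must be known, and its cached type must match the
// type the embedding value expects.
func (c refCache) validate(embeddedRefs map[string]string) error {
	for ref, wantType := range embeddedRefs {
		info, ok := c[ref]
		if !ok {
			return fmt.Errorf("unknown chunk ref %s", ref)
		}
		if info.typeDesc != wantType {
			return fmt.Errorf("ref %s: expected type %s, cached type %s",
				ref, wantType, info.typeDesc)
		}
	}
	return nil
}

func main() {
	cache := refCache{"sha1-abc": {typeDesc: "List<Int64>"}}
	// A value embedding a ref to a List<Int64> validates...
	fmt.Println(cache.validate(map[string]string{"sha1-abc": "List<Int64>"}) == nil)
	// ...but a mismatched type does not.
	fmt.Println(cache.validate(map[string]string{"sha1-abc": "Set<String>"}) != nil)
}
```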
In pursuit of issue #654, we want to be able to figure out all the
refs contained in a given Value, along with the Types of the Values to
which those refs point. Value.Chunks() _almost_ met those needs, but
it returned a slice of ref.Ref, which doesn't convey any type info.
To address this, this patch does two things:
1) RefBase embeds the Value interface, and
2) Chunks() now returns []types.RefBase
RefBase now provides Type() as well, by virtue of embedding Value, so
callers can just iterate through the slice returned from Chunks() and
gather type info for all the refs embedded in a given Value.
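A simplified sketch of that caller-side pattern. The Value and RefBase interfaces here are stand-ins (Type() returns a string rather than the real *types.Type), but the shape mirrors the change: RefBase embeds Value, so each ref returned by Chunks() carries its target's type info.

```go
package main

import "fmt"

// Value is a simplified stand-in for the Noms Value interface.
type Value interface {
	Type() string
	Chunks() []RefBase
}

// RefBase embeds Value, so every ref can report a Type.
type RefBase interface {
	Value
	TargetRef() string
}

type ref struct {
	target, typeDesc string
}

func (r ref) Type() string      { return r.typeDesc }
func (r ref) Chunks() []RefBase { return nil }
func (r ref) TargetRef() string { return r.target }

// list is a toy Value that embeds two refs.
type list struct {
	refs []RefBase
}

func (l list) Type() string      { return "List<Int64>" }
func (l list) Chunks() []RefBase { return l.refs }

func main() {
	v := list{refs: []RefBase{
		ref{"sha1-aaa", "Ref<Int64>"},
		ref{"sha1-bbb", "Ref<Int64>"},
	}}
	// Gather type info for every ref embedded in v.
	for _, r := range v.Chunks() {
		fmt.Println(r.TargetRef(), r.Type())
	}
}
```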
I went all the way and made RefBase a Value instead of just adding the
Type() method because both types.Ref and the generated Ref types are
actually all Values, and doing so allowed me to change the definition of
refBuilderFunc in package_registry.go to be more precise. It now returns
RefBase instead of just Value.
This patch is the first step in moving all reading and writing to the
DataStore API, so that we can validate data committed to Noms.
The big change here is that types.ReadValue() no longer exists and is
replaced with a ReadValue() method on DataStore. A similar
WriteValue() method deprecates types.WriteValue(), but fully removing
that is left for a later patch. Since a lot of code in the types
package needs to read and write values, but cannot import the datas
package without creating an import cycle, the types package exports
ValueReader and ValueWriter interfaces, which DataStore implements.
Thus, a DataStore can be passed to anything in the types package which
needs to read or write values (e.g. a collection constructor or
typed-ref).
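A sketch of the import-cycle-breaking shape described above, with simplified signatures (string refs, a toy in-memory store standing in for DataStore):

```go
package main

import "fmt"

// Value is a simplified stand-in for the types.Value interface.
type Value interface{ String() string }

// The types package exports these; the datas package's DataStore
// implements them, so types never has to import datas.
type ValueReader interface {
	ReadValue(ref string) Value
}

type ValueWriter interface {
	WriteValue(v Value) string // returns the ref of the written chunk
}

// memStore is a toy in-memory implementation standing in for DataStore.
type memStore map[string]Value

func (m memStore) ReadValue(ref string) Value { return m[ref] }
func (m memStore) WriteValue(v Value) string {
	ref := fmt.Sprintf("ref-%d", len(m))
	m[ref] = v
	return ref
}

type str string

func (s str) String() string { return string(s) }

// A collection constructor in the types package can accept a
// ValueWriter without knowing which concrete store backs it.
func newCollection(vw ValueWriter, items ...Value) []string {
	refs := make([]string, 0, len(items))
	for _, it := range items {
		refs = append(refs, vw.WriteValue(it))
	}
	return refs
}

func main() {
	ds := memStore{}
	refs := newCollection(ds, str("a"), str("b"))
	fmt.Println(ds.ReadValue(refs[0]).String())
}
```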
Relatedly, this patch also introduces the DataSink interface, so that
some public-facing APIs no longer need to provide a ChunkSink.
Towards #654
This is as opposed to chunked sets/maps/lists that were constructed in
memory. The difference is that constructed collections have their chunk
values stored in memory, whereas collections read from a ChunkStore
hold only refs.
There was a bug, fixed in abacc7644a, which
caused this to crash. This patch adds tests and an extra assertion.
It's already propagated for List.
The problem occurs when accessing/modifying chunked Sets and Maps that
have been read from the DB and then modified. For example:
set := ds.Head().(Set)
set = set.Remove(42)
// These will both crash:
set.Has(43)
set.Insert(43)
If |set| is a compoundSet then the new Set returned from Remove won't
have a ChunkSource.
- When Has is called, the set will attempt to read in chunks, but there
is no longer a ChunkSource to read from.
- When Insert is called, the set will re-chunk, which may require
reading in more chunks.
NewStreamingTypedList() reads Values from a channel and appends them
to a List, chunking as it goes and writing these chunks to a given
ChunkSink. It returns a `chan List` from which the caller can get the
finished List once they're done writing values to the `chan Value`
they provided at call-time.
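The calling pattern can be sketched as follows. This is a simplified stand-in (newStreamingList collects ints into a slice) for the real API, which chunks as it goes and writes those chunks to a ChunkSink; the channel choreography is the point:

```go
package main

import "fmt"

// newStreamingList mimics the NewStreamingTypedList pattern: it consumes
// values from an input channel and delivers the finished list on the
// returned channel once the input channel is closed.
func newStreamingList(values <-chan int) <-chan []int {
	out := make(chan []int)
	go func() {
		var list []int
		for v := range values {
			// The real implementation would chunk here and write
			// finished chunks to the provided ChunkSink.
			list = append(list, v)
		}
		out <- list // deliver the finished list after the input closes
	}()
	return out
}

func main() {
	vals := make(chan int)
	done := newStreamingList(vals)
	for i := 0; i < 5; i++ {
		vals <- i
	}
	close(vals)         // signal we're done writing values
	fmt.Println(<-done) // [0 1 2 3 4]
}
```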
The nomdl Package dependencies are populated from nomdl imports, which
get put in a Go map, then iterated over to get refs. Unfortunately, Go
map iteration order isn't stable, and the dependency order affects the
Package's ref.
I noticed this bug when indexing data imported from the Picasa importer.
Picasa's view of the RemotePhoto package had its dependencies in
reverse order, but the indexer's view had them in the natural order, so
it wasn't recognising Picasa's imported RemotePhoto structs.
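A minimal sketch of the usual fix for this class of bug, assuming a hypothetical orderedDeps helper: sort the map keys before iterating, so the dependency order (and hence the Package's ref) is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// orderedDeps returns dependency refs in a deterministic order by
// sorting the import names first. Go map iteration order is
// unspecified, so iterating the map directly would not be stable.
func orderedDeps(imports map[string]string) []string {
	names := make([]string, 0, len(imports))
	for name := range imports {
		names = append(names, name)
	}
	sort.Strings(names)
	deps := make([]string, 0, len(names))
	for _, name := range names {
		deps = append(deps, imports[name])
	}
	return deps
}

func main() {
	imports := map[string]string{"photo": "sha1-bbb", "album": "sha1-aaa"}
	fmt.Println(orderedDeps(imports)) // [sha1-aaa sha1-bbb]
}
```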
Instead of writing sequence chunks as soon as they're created (as a
result of hitting chunk boundaries), only write them once they're
referenced - which only happens if those chunks are themselves chunked.
The effect of this is that root chunks of collections/blobs aren't
written until they're committed, which makes the XML importer run twice as fast
on a month of MLB data - 60s instead of 120s, with --ldb-dump-stats
showing a PutCount of 21,272 instead of 342,254.
In the future it should be possible to avoid writing *any* chunks until
the root is committed, which will improve incremental update
performance, but that's a larger change (issue #710). This change fixes
issue #832.
The problem was that a chunk in the middle of a prollytree can contain
only a single item, which can happen if the first item in a sequence
was a chunk boundary; in some circumstances sequenceChunker would
mistake such a chunk for the root of the tree.
This is done by creating a cursor for each set, over the actual values
in that set. We then pick the "smallest" value from across the cursors
and advance that cursor. This continues until we have exhausted all
the cursors.
setA.Union(set0, ... setN)
The time complexity is O(len(setA) + len(set0) + ... + len(setN))
This reverts commit 60ab9c7f0c.
Fixes initial patch to correctly use test harness. Implementation is
based on newTypedSet(), so hopefully has similar performance
characteristics.