This is in contrast to chunked sets/maps/lists that were constructed
directly. The difference is that constructed collections hold their
chunk values in memory, whereas collections read from a chunkstore only
hold refs.
A bug that caused this to crash was fixed in abacc7644a. This patch
adds tests and an extra assertion.
It's already propagated for List.
The problem occurs when accessing or modifying chunked Sets and Maps
that have been read from the DB and then modified. For example:
set := ds.Head().(Set)
set = set.Remove(42)
// These will both crash:
set.Has(43)
set.Insert(43)
If |set| is a compoundSet then the new Set returned from Remove won't
have a ChunkSource.
- When Has is called, the set will attempt to read in chunks, but there
is no longer a ChunkSource to read from.
- When Insert is called, the set will re-chunk, which may require
reading in more chunks.
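To make the failure mode concrete, here is a minimal, self-contained
toy (all names are invented, not the real noms types) showing why
Remove has to carry the ChunkSource forward into the Set it returns:

package main

import "fmt"

// Toy stand-ins for the real noms types; all names here are hypothetical.
type ChunkSource map[string][]string // ref -> chunk of values

type compoundSet struct {
	chunkRefs []string    // refs to the chunks holding the actual values
	cs        ChunkSource // needed to resolve those refs on later reads
}

// Has reads chunks through the ChunkSource; with no ChunkSource this is
// where the crash described above would happen.
func (s compoundSet) Has(v string) bool {
	for _, ref := range s.chunkRefs {
		for _, val := range s.cs[ref] {
			if val == v {
				return true
			}
		}
	}
	return false
}

// Remove must carry s.cs over into the set it returns; dropping it is
// the essence of the bug.
func (s compoundSet) Remove(v string) compoundSet {
	out := compoundSet{cs: s.cs} // <- propagate the ChunkSource
	for _, ref := range s.chunkRefs {
		var kept []string
		for _, val := range s.cs[ref] {
			if val != v {
				kept = append(kept, val)
			}
		}
		newRef := fmt.Sprintf("ref(%v)", kept)
		s.cs[newRef] = kept // toy: the map doubles as source and sink
		out.chunkRefs = append(out.chunkRefs, newRef)
	}
	return out
}

func main() {
	cs := ChunkSource{"r1": {"42", "43"}}
	set := compoundSet{chunkRefs: []string{"r1"}, cs: cs}
	set = set.Remove("42")
	fmt.Println(set.Has("43"), set.Has("42")) // true false
}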
NewStreamingTypedList() reads Values from a channel and appends them
to a List, chunking as it goes and writing these chunks to a given
ChunkSink. It returns a `chan List` from which the caller can get the
finished List once they have finished writing values to the
`chan Value` they provided at call time.
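A usage sketch for the API described above. The exact signature is an
assumption here (the argument order, and whether an element Type is
passed); `listType`, `cs` and `valuesToAppend` stand for values the
caller already has:

// Assumed shape, not the verbatim noms signature.
vChan := make(chan types.Value)
listChan := types.NewStreamingTypedList(listType, cs, vChan)

for _, v := range valuesToAppend {
	vChan <- v // chunks are written to cs as boundaries are hit
}
close(vChan) // no more values

list := <-listChan // the finished List arrives once vChan is closed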
The nomdl Package dependencies are populated from nomdl imports, which
are put in a Go map and then iterated over to get refs. Unfortunately,
Go map iteration order isn't stable, and the dependency order affects
the Package's ref.
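The usual way to remove this non-determinism in Go is to sort the map
keys before iterating. A small standalone illustration (the names and
data are hypothetical, not the actual nomdl code):

package main

import (
	"fmt"
	"sort"
)

func main() {
	// Hypothetical import-name -> ref map; ranging over it directly
	// gives a different order on every run.
	imports := map[string]string{
		"RemotePhoto": "sha1-aaaa",
		"Geo":         "sha1-bbbb",
	}

	names := make([]string, 0, len(imports))
	for name := range imports {
		names = append(names, name)
	}
	sort.Strings(names) // fixed order => the Package's ref is deterministic

	for _, name := range names {
		fmt.Println(name, imports[name])
	}
}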
I noticed this bug when indexing data imported from the Picasa importer.
Picasa's view of the RemotePhoto package had its dependencies in
reverse order, but the indexer's view had them in the natural order, so
it wasn't recognising Picasa's imported RemotePhoto structs.
Instead of writing sequence chunks as soon as they're created (as a
result of hitting chunk boundaries), only write them once they're
referenced, which only happens if those chunks are themselves chunked.
The effect of this is that root chunks of collections/blobs aren't
written until they're committed, which makes the XML importer run
twice as fast on a month of MLB data: 60s instead of 120s, with
--ldb-dump-stats showing a PutCount of 21,272 instead of 342,254.
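A rough, self-contained sketch of the write-on-reference idea (all
names invented; the real sequenceChunker is more involved): chunks
produced at boundaries are buffered, and only hit the ChunkSink when a
parent level actually takes their ref.

package main

import "fmt"

// bufferingSink defers writes: chunks are held in memory until
// something actually references them.
type bufferingSink struct {
	pending map[string][]byte // chunks created but not yet written
	puts    int               // stands in for calls to ChunkSink.Put
}

// chunked is called at a chunk boundary; nothing is written yet.
func (s *bufferingSink) chunked(ref string, data []byte) {
	s.pending[ref] = data
}

// referenced is called when a parent chunk embeds this ref; only then
// is the child chunk written.
func (s *bufferingSink) referenced(ref string) {
	if _, ok := s.pending[ref]; ok {
		s.puts++ // ChunkSink.Put(s.pending[ref]) in the real thing
		delete(s.pending, ref)
	}
}

func main() {
	s := &bufferingSink{pending: map[string][]byte{}}
	s.chunked("leaf-1", []byte("..."))
	s.chunked("leaf-2", []byte("..."))
	s.referenced("leaf-1") // a parent chunk was built over leaf-1
	fmt.Println(s.puts, len(s.pending)) // 1 write; leaf-2 (a root) still pending
}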
In the future it should be possible to avoid writing *any* chunks until
the root is committed, which will improve incremental update
performance, but that's a larger change (issue #710). This change fixes
issue #832.
The problem was that a chunk in the middle of a prollytree can contain
only a single item (which can happen if the first item in a sequence
is a chunk boundary), and in some circumstances sequenceChunker would
mistake such a chunk for the root of the tree.
This is done by creating a cursor for each set, over the actual values
in that set. We then pick the "smallest" value across the cursors and
advance the cursor it came from. This continues until all the cursors
are exhausted (sketched below).
setA.Union(set0, ... setN)
The time complexity is O(len(setA) + len(set0) + ... + len(setN))
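A self-contained sketch of the cursor merge described above, using
plain sorted slices as stand-in "sets" (the real implementation walks
prolly-tree cursors, and Union is a method on the receiving set):

package main

import "fmt"

// union merges already-sorted slices by repeatedly taking the smallest
// head value and advancing the cursors sitting on it.
func union(sets ...[]int) []int {
	pos := make([]int, len(sets)) // one cursor per set
	var out []int
	for {
		best, found := 0, false
		for i, s := range sets {
			if pos[i] < len(s) && (!found || s[pos[i]] < best) {
				best, found = s[pos[i]], true
			}
		}
		if !found {
			return out // every cursor is exhausted
		}
		if len(out) == 0 || out[len(out)-1] != best {
			out = append(out, best) // skip duplicates across sets
		}
		for i, s := range sets {
			if pos[i] < len(s) && s[pos[i]] == best {
				pos[i]++ // advance every cursor on the smallest value
			}
		}
	}
}

func main() {
	fmt.Println(union([]int{1, 3, 5}, []int{2, 3, 6}, []int{5, 7}))
	// Output: [1 2 3 5 6 7]
}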
This reverts commit 60ab9c7f0c.
Fixes the initial patch to correctly use the test harness. The
implementation is based on newTypedSet(), so it hopefully has similar
performance characteristics.