Commit Graph

330 Commits

Author SHA1 Message Date
Chris Masone
599fb8b173 Don't pass ChunkStores to collection leaves
It turns out that the collection leaves don't actually use the
ChunkStore we give them, so stop passing it to them.
2016-02-26 13:08:16 -08:00
Benjamin Kalman
b60f1a0e7e Add tests for modifying chunked sets/maps/lists read from a chunkstore.
This is opposed to chunked sets/maps/lists that were constructed. The
difference is that constructed collections have their chunk values stored
in memory, whereas those read from a chunkstore only have refs.

There was a bug fixed in abacc7644a which
caused this to crash. This patch adds tests and an extra assertion.
2016-02-25 15:53:18 -08:00
Benjamin Kalman
abacc7644a Propagate the ChunkSource through modifications to Set and Map.
It's already propagated for List.

The problem occurs when accessing/modifying chunked Sets and Maps that
have been read from the DB and then modified. For example:

  set := ds.Head().(Set)
  set = set.Remove(42)
  // These will both crash:
  set.Has(43)
  set.Insert(43)

If |set| is a compoundSet then the new Set returned from Remove won't
have a ChunkSource.
- When Has is called, the set will attempt to read in chunks, but there
  is no longer a ChunkSource to read from.
- When Insert is called, the set will re-chunk, which may require
  reading in more chunks.
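
The failure mode can be sketched roughly as follows. This is a minimal model, not the noms implementation: chunkSource, chunk, lazySet and the remove method with its propagate flag are all hypothetical names, and the sketch returns an error where the real code crashed.

```go
package main

import (
	"errors"
	"fmt"
)

// chunkSource is a hypothetical stand-in for a ChunkSource: ref -> chunk values.
type chunkSource map[string][]int

// chunk holds either in-memory values or only a ref to be read later.
type chunk struct {
	ref    string
	values []int // nil until the chunk is held in memory
}

// lazySet is a hypothetical chunked set that reads chunks on demand.
type lazySet struct {
	chunks []chunk
	cs     chunkSource // nil means there is nowhere to read chunks from
}

func contains(vals []int, v int) bool {
	for _, x := range vals {
		if x == v {
			return true
		}
	}
	return false
}

// load returns a chunk's values, reading from the source if they are not in memory.
func (s lazySet) load(c chunk) ([]int, error) {
	if c.values != nil {
		return c.values, nil
	}
	if s.cs == nil {
		return nil, errors.New("no chunk source to read from")
	}
	return s.cs[c.ref], nil
}

// Has reports whether v is in the set, loading chunks as needed.
func (s lazySet) Has(v int) (bool, error) {
	for _, c := range s.chunks {
		vals, err := s.load(c)
		if err != nil {
			return false, err
		}
		if contains(vals, v) {
			return true, nil
		}
	}
	return false, nil
}

// remove returns a new set without v. Untouched chunks remain refs, so the
// result still needs a source; propagate controls whether it gets one.
func (s lazySet) remove(v int, propagate bool) (lazySet, error) {
	out := lazySet{}
	if propagate {
		out.cs = s.cs
	}
	for _, c := range s.chunks {
		vals, err := s.load(c)
		if err != nil {
			return lazySet{}, err
		}
		if !contains(vals, v) {
			out.chunks = append(out.chunks, c)
			continue
		}
		kept := []int{}
		for _, x := range vals {
			if x != v {
				kept = append(kept, x)
			}
		}
		out.chunks = append(out.chunks, chunk{values: kept})
	}
	return out, nil
}

func main() {
	cs := chunkSource{"a": {41, 42}, "b": {43, 44}}
	set := lazySet{chunks: []chunk{{ref: "a"}, {ref: "b"}}, cs: cs}

	broken, _ := set.remove(42, false)
	_, err := broken.Has(43)
	fmt.Println("without propagation:", err)

	fixed, _ := set.remove(42, true)
	ok, err := fixed.Has(43)
	fmt.Println("with propagation:", ok, err)
}
```

In this model, only the chunk that contained the removed value is held in memory afterwards; every other chunk is still a ref, which is why the new set needs the source carried over.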
2016-02-24 16:14:22 -08:00
Aaron Boodman
befeac553d Merge pull request #981 from aboodman/vendor-aws
Vendor using 'vendor' directory instead of Godeps
2016-02-09 09:46:38 -08:00
Aaron Boodman
cff0de3696 non-vendor changes 2016-02-08 23:15:09 -08:00
Chris Masone
83372d9596 Added newMetaTuple(), fixed new test 2016-02-08 13:47:21 -08:00
Chris Masone
937dd624d0 Introduce NewStreamingTypedList()
NewStreamingTypedList() reads Values from a channel and appends them
to a List, chunking as it goes and writing these chunks to a given
ChunkSink. It returns a `chan List` from which the caller can get the
finished List once they have finished writing values to the `chan Value`
they provided at call-time.
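
The channel-in, channel-out shape described above might look roughly like this. It is only a sketch: newStreamingList, chunkSink and list are hypothetical names, values are plain ints, and chunks are cut at a fixed size for simplicity, which is an assumption rather than how the real chunker picks boundaries.

```go
package main

import "fmt"

// chunkSink is a hypothetical stand-in for a ChunkSink: it receives finished chunks.
type chunkSink func(chunk []int)

// list is a hypothetical finished list: its length and the number of chunks written.
type list struct {
	length int
	chunks int
}

// newStreamingList reads values from in until it is closed, hands a chunk to
// sink every chunkSize values (chunkSize must be positive), and delivers the
// finished list on the returned channel.
func newStreamingList(in <-chan int, sink chunkSink, chunkSize int) <-chan list {
	out := make(chan list, 1)
	go func() {
		var l list
		buf := make([]int, 0, chunkSize)
		flush := func() {
			if len(buf) == 0 {
				return
			}
			sink(append([]int(nil), buf...)) // copy so buf can be reused
			l.chunks++
			buf = buf[:0]
		}
		for v := range in {
			buf = append(buf, v)
			l.length++
			if len(buf) == chunkSize {
				flush()
			}
		}
		flush() // write the final, possibly short, chunk
		out <- l
		close(out)
	}()
	return out
}

func main() {
	in := make(chan int)
	var written [][]int
	done := newStreamingList(in, func(c []int) { written = append(written, c) }, 3)
	for i := 0; i < 7; i++ {
		in <- i
	}
	close(in) // signals that no more values are coming
	l := <-done
	fmt.Println(l.length, l.chunks, len(written))
}
```

The caller closes the input channel to signal completion and then receives exactly one value from the returned channel.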
2016-02-08 13:47:21 -08:00
Aaron Boodman
2c05a26c0b Vendor buzhash using Git submodules and Go 1.5 vendoring rather than Godeps 2016-02-07 15:37:16 -08:00
John Huang
a14cd21af4 CSV exporter to stdout 2016-02-05 10:39:39 -08:00
Rafael Weinstein
b8c399417f Reduce size of test collections 2016-02-03 10:19:57 -08:00
Rafael Weinstein
4d77492c46 JS Chunking 2016-02-02 16:13:54 -08:00
Rafael Weinstein
6c3239a1d0 Collections no longer need a ChunkStore on creation 2016-02-02 13:39:26 -08:00
Rafael Weinstein
4f93398e49 Don't drop refs to child chunks on write for p-trees 2016-02-02 13:22:56 -08:00
Benjamin Kalman
8cca9354c1 Move sorting to NewPackage. 2016-01-22 18:43:41 -08:00
Benjamin Kalman
bfc987ae14 Force Package dependencies to have a stable ordering.
The nomdl Package dependencies are populated from nomdl imports, which
get put in a Go map, then iterated over to get refs. Unfortunately, Go
map iteration order isn't stable, and the dependency order affects the
Package's ref.
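
One way to get a stable order out of a Go map is to collect the values and sort them before use. The sketch below assumes dependencies are kept as a map from import alias to a ref string; sortedRefs and that map shape are hypothetical, not the actual Package code.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedRefs returns the refs stored in deps in a deterministic order.
// Ranging over a Go map visits entries in an unspecified order, so the
// refs are sorted after being collected.
func sortedRefs(deps map[string]string) []string {
	refs := make([]string, 0, len(deps))
	for _, ref := range deps {
		refs = append(refs, ref)
	}
	sort.Strings(refs)
	return refs
}

func main() {
	// The alias -> ref entries below are made up for illustration.
	deps := map[string]string{
		"photo":  "sha1-cc",
		"geo":    "sha1-aa",
		"common": "sha1-bb",
	}
	fmt.Println(sortedRefs(deps))
}
```

Any value derived from the dependency list, such as a hash, then no longer depends on map iteration order.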

I noticed this bug when indexing data imported from the Picasa importer.
Picasa's view of the RemotePhoto package had its dependencies in
reverse order, but the indexer's view had them in the natural order, so
it wasn't recognising Picasa's imported RemotePhoto structs.
2016-01-22 16:01:43 -08:00
Rafael Weinstein
3f8e608cd1 fixupTypeRef avoids creating a new value when no work is done 2016-01-12 16:57:05 -08:00
Benjamin Kalman
c3322dbb2d sequenceChunker cleanup to replace pendingFirst slice with a bool. 2016-01-08 14:15:37 -08:00
Benjamin Kalman
4a090617a5 Don't write sequence chunks until the root metaSequence is written. 2016-01-08 10:59:43 -08:00
Benjamin Kalman
e27980dbd3 Don't calculate metaSequence chunk refs until necessary.
This saves a lot of work for the XML importer.
2016-01-07 17:17:32 -08:00
Benjamin Kalman
6ac90bbd9c Store metaTuple.child as its Value not its internal Value. 2016-01-07 17:09:45 -08:00
Benjamin Kalman
232492003d Lazily write sequence chunks.
Instead of writing sequence chunks as soon as they're created (as a
result of hitting chunk boundaries), only write them once they're
referenced - which only happens if those chunks are themselves chunked.
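
The difference between the two strategies can be modelled very roughly as below. Everything here is a hypothetical simplification: chunkWriter only counts writes, split cuts at a fixed size rather than at content-defined boundaries, and "referenced" is approximated as "there is more than one chunk, so a parent must point at them".

```go
package main

import "fmt"

// chunkWriter is a hypothetical stand-in for a chunk store that counts writes.
type chunkWriter struct{ puts int }

func (w *chunkWriter) put(chunk []int) { w.puts++ }

// split cuts items into chunks of at most size items (size must be positive).
func split(items []int, size int) [][]int {
	var chunks [][]int
	for len(items) > size {
		chunks = append(chunks, items[:size])
		items = items[size:]
	}
	return append(chunks, items)
}

// writeEager writes every chunk as soon as it exists, including a lone root chunk.
func writeEager(w *chunkWriter, items []int, size int) {
	for _, c := range split(items, size) {
		w.put(c)
	}
}

// writeLazy writes chunks only when a parent has to reference them, modelled
// here as "more than one chunk". A lone root chunk stays in memory.
func writeLazy(w *chunkWriter, items []int, size int) {
	chunks := split(items, size)
	if len(chunks) < 2 {
		return
	}
	for _, c := range chunks {
		w.put(c)
	}
}

func main() {
	eager, lazy := &chunkWriter{}, &chunkWriter{}
	small := []int{1, 2, 3}
	// Many small collections, each fitting in a single chunk.
	for i := 0; i < 1000; i++ {
		writeEager(eager, small, 4)
		writeLazy(lazy, small, 4)
	}
	fmt.Println("eager puts:", eager.puts, "lazy puts:", lazy.puts)
}
```

In this model the saving comes entirely from collections small enough to fit in a single chunk, since those never need a parent.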

The effect of this is root chunks of collections/blobs aren't written
until they're committed, which makes the XML importer run twice as fast
on a month of MLB data - 60s instead of 120s, with --ldb-dump-stats
showing a PutCount of 21,272 instead of 342,254.

In the future it should be possible to avoid writing *any* chunks until
the root is committed, which will improve incremental update
performance, but that's a larger change (issue #710). This change fixes
issue #832.
2016-01-05 15:57:45 +11:00
Ben Kalman
ce334cfebb Merge pull request #790 from kalman/setmap-fix-tests
Fix bug in sequenceChunker when rechunking single item chunks.
2015-12-29 10:33:39 +11:00
Benjamin Kalman
c9b20fbb21 Make compoundList.Remove convert return value to List not compoundList. 2015-12-28 12:52:41 +11:00
Benjamin Kalman
c2f0ed3e08 Fix bug in sequenceChunker when rechunking single item chunks.
The problem was that if there are chunks in the middle of a prollytree
with only a single item, which can happen if the first item in a
sequence was a chunk boundary, in some circumstances sequenceChunker
would think that it's the root of the tree.
2015-12-24 10:58:31 +11:00
Erik Arvidsson
d915d748e9 Merge pull request #799 from arv/cursor-at-first
Clean up compoundSet sequenceCursorAtFirst
2015-12-21 14:57:33 -08:00
Erik Arvidsson
483beb3863 Clean up compoundSet sequenceCursorAtFirst
Fixes #795
2015-12-21 13:49:54 -08:00
Erik Arvidsson
dad5dda1ec Add runtime type checks to compound map
Fixes #812
2015-12-21 13:44:55 -08:00
Erik Arvidsson
81c0f03a43 Add runtime type assertions for compound list 2015-12-21 13:35:48 -08:00
Erik Arvidsson
e28852f4c0 Merge pull request #802 from arv/compound-set-missing-type-check
Add missing runtime type check to compoundSet Insert
2015-12-21 09:56:02 -08:00
Erik Arvidsson
a15ed4963b Add missing runtime type check to compoundSet Insert 2015-12-17 15:38:12 -05:00
Erik Arvidsson
1edbee5f34 Remove Set Subtract 2015-12-17 15:23:28 -05:00
Erik Arvidsson
b8be6908f8 Implement Set Union
This is done by creating a cursor for each set. This is a cursor for
the actual values in the sets. We then pick the "smallest" value from
the cursors and advance that cursor. This continues until we have
exhausted all the cursors.

  setA.Union(set0, ... setN)

The time complexity is O(len(setA) + len(set0) + ... + len(setN))
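
The cursor-based merge can be sketched as follows, assuming each set is already available as a sorted, duplicate-free slice of ints. The union function here is a hypothetical illustration, with a slice index standing in for each cursor; it is not the Set.Union implementation.

```go
package main

import "fmt"

// union merges sorted, duplicate-free slices into one sorted, duplicate-free slice.
// cursors[i] is the position of the next unread value in sets[i].
func union(sets ...[]int) []int {
	cursors := make([]int, len(sets))
	var out []int
	for {
		// Find the cursor currently pointing at the smallest value.
		best := -1
		for i, s := range sets {
			if cursors[i] >= len(s) {
				continue // this cursor is exhausted
			}
			if best == -1 || s[cursors[i]] < sets[best][cursors[best]] {
				best = i
			}
		}
		if best == -1 {
			return out // all cursors are exhausted
		}
		v := sets[best][cursors[best]]
		cursors[best]++ // advance only the cursor that was picked
		// Skip values that another set has already contributed.
		if len(out) == 0 || out[len(out)-1] != v {
			out = append(out, v)
		}
	}
}

func main() {
	fmt.Println(union([]int{1, 3, 5}, []int{2, 3}, []int{5, 6}))
}
```

Each step advances exactly one cursor, so the number of steps is the sum of the set lengths. Note that this sketch scans every cursor to find the smallest value, so each step also costs time proportional to the number of sets.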
2015-12-17 10:18:04 -05:00
Rafael Weinstein
27d5f0d240 Ensure sequenceChunker.Done() returns an internal type so that callers don't have to 2015-12-17 06:20:28 -08:00
Chris Masone
a70c21116a Some compound{Set,Map} cleanup
Remove a bad comment, one-line a few things in Filter()
2015-12-16 15:57:22 -08:00
cmasone-attic
3163ff00dc Merge pull request #781 from cmasone-attic/mapfilter
Implement compoundMap Filter()
2015-12-16 15:50:37 -08:00
Dan Willhite
19228ba9d8 Merge pull request #784 from willhite/panics
Implement Filter on compoundList.
2015-12-16 14:06:57 -08:00
Dan Willhite
20f22e1020 Implement Filter on compoundList. 2015-12-16 13:32:00 -08:00
Chris Masone
a30209bcc7 Implement compoundMap Filter() 2015-12-16 12:57:41 -08:00
Chris Masone
57e2303a62 Re-land "Implement compoundSet.Filter()"
This reverts commit 60ab9c7f0c.

Fixes initial patch to correctly use test harness.  Implementation is
based on newTypedSet(), so hopefully has similar performance
characteristics.
2015-12-16 12:53:13 -08:00
cmasone-attic
60ab9c7f0c Revert "Implement compoundSet.Filter()" 2015-12-16 12:42:33 -08:00
cmasone-attic
f7f7cfaab0 Merge pull request #777 from cmasone-attic/depanic
Implement compoundSet.Filter()
2015-12-16 12:32:43 -08:00
Benjamin Kalman
2f22372c86 Implement chunked insert/remove for sets and maps.
There are some corner case failing tests, but this may be an existing bug
in the sequence chunker.
2015-12-16 11:02:41 -08:00
Dan Willhite
4e4cff2bd2 Merge pull request #779 from willhite/panics
Implement Map and MapP on compoundList.
2015-12-16 09:55:42 -08:00
Rafael Weinstein
9b107c145f Fix xml-importer & pitchmap/indexer 2015-12-15 21:33:44 -08:00
Chris Masone
18c446e934 Add set-equivalence check to unit test 2015-12-15 20:08:51 -08:00
Dan Willhite
4268ded39c Implement Map and MapP on compoundList. 2015-12-15 17:53:51 -08:00
Dan Willhite
eeea1e2056 Merge pull request #773 from willhite/panics
Implement IterAllP on compoundList.
2015-12-15 17:52:32 -08:00
Dan Willhite
efd9a2e00c Implement IterAllP on compoundList. 2015-12-15 17:42:05 -08:00
Chris Masone
8171014b20 remove comment 2015-12-15 17:37:25 -08:00
Chris Masone
0d54e01dda Implement compoundSet.Filter()
Implementation is based on newTypedSet(), so hopefully has similar
performance characteristics.
2015-12-15 17:14:11 -08:00