340 Commits

Author SHA1 Message Date
Erik Arvidsson 6fad84b6e2 Normalize strings produced by Go JSON encoder
Fixes #1135
2016-03-25 10:27:45 -07:00
cmasone-attic d5e8342d7c Merge pull request #1131 from cmasone-attic/issue654
Fix ChildRef Type when reading meta sequences
2016-03-23 11:52:30 -07:00
Chris Masone 04c093f345 Fix ChildRef Type when reading meta sequences
When reading a meta sequence, the ChildRef of each metaTuple
is populated directly during decoding. This should be a
Ref<S> for a sequence of Type S. The old code was putting
in a Ref<T> for a sequence of Type S<T>.
2016-03-23 11:30:45 -07:00
Rafael Weinstein 38292e29c6 RefValue 2016-03-22 16:43:13 -07:00
Chris Masone 44bb16385c Add DataStore 'Type' cache to remember Noms type info as well.
To facilitate validation, DataStore needs to remember which chunks
it's seen, what their refs are, and the Noms type of the Values they
encode. Then, DataStore can look at each Value that comes in via
WriteValue() and validate it by checking every embedded ref (if any)
against this cache.

Towards #654
2016-03-22 16:34:26 -07:00
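
The validation idea described in 44bb16385c could be sketched roughly as follows. This is a minimal illustration, not the DataStore implementation: `typeCache`, `embeddedRef`, and `validate` are hypothetical names, and refs and types are modelled as plain strings.

```go
package main

import "fmt"

// typeCache maps the ref of every chunk seen so far to the type name of the
// Value it encodes. (Hypothetical stand-in for the DataStore cache.)
type typeCache map[string]string

// embeddedRef is a ref found inside an incoming Value, together with the
// type the ref claims its target has.
type embeddedRef struct {
	target   string
	wantType string
}

// validate checks every embedded ref against the cache: the target must have
// been seen before, and its cached type must match the claimed type.
func validate(cache typeCache, refs []embeddedRef) error {
	for _, r := range refs {
		got, seen := cache[r.target]
		if !seen {
			return fmt.Errorf("ref %s points to an unseen chunk", r.target)
		}
		if got != r.wantType {
			return fmt.Errorf("ref %s: cached type %s, claimed type %s", r.target, got, r.wantType)
		}
	}
	return nil
}

func main() {
	cache := typeCache{"sha1-aaa": "List<Int32>"}
	fmt.Println(validate(cache, []embeddedRef{{"sha1-aaa", "List<Int32>"}}))
	fmt.Println(validate(cache, []embeddedRef{{"sha1-aaa", "Set<Int32>"}}))
}
```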
Chris Masone 82338bb5be Change Value.Chunks() to return []types.RefBase
In pursuit of issue #654, we want to be able to figure out all the
refs contained in a given Value, along with the Types of the Values to
which those refs point. Value.Chunks() _almost_ met those needs, but
it returned a slice of ref.Ref, which doesn't convey any type info.

To address this, this patch does two things:
1) RefBase embeds the Value interface, and
2) Chunks() now returns []types.RefBase

RefBase now provides Type() as well, by virtue of embedding Value, so
callers can just iterate through the slice returned from Chunks() and
gather type info for all the refs embedded in a given Value.

I went all the way and made RefBase a Value instead of just adding the
Type() method because both types.Ref and the generated Ref types are
actually all Values, and doing so allowed me to change the definition of
refBuilderFunc in package_registry.go to be more precise. It now returns
RefBase instead of just Value.
2016-03-21 16:13:14 -06:00
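
The interface embedding described in 82338bb5be might look something like the sketch below. The `Type`, `ref`, and `list` definitions and the `TargetRef()` method are simplified assumptions for illustration; only the shape (RefBase embedding Value, Chunks() returning a slice of RefBase) comes from the commit message.

```go
package main

import "fmt"

// Type is a simplified stand-in for Noms type information.
type Type string

// Value is a cut-down version of the Value interface.
type Value interface {
	Type() Type
	Chunks() []RefBase
}

// RefBase embeds Value, so every RefBase provides Type() as well.
type RefBase interface {
	Value
	TargetRef() string // hypothetical accessor for the ref's target
}

// ref is a toy typed ref.
type ref struct {
	target string
	typ    Type
}

func (r ref) Type() Type        { return r.typ }
func (r ref) Chunks() []RefBase { return []RefBase{r} } // assumption: a ref reports itself
func (r ref) TargetRef() string { return r.target }

// list is a toy collection that holds refs to its children.
type list struct{ refs []RefBase }

func (l list) Type() Type        { return "List" }
func (l list) Chunks() []RefBase { return l.refs }

// childTypes iterates the slice returned from Chunks() and gathers the type
// of every ref embedded in v.
func childTypes(v Value) []Type {
	var out []Type
	for _, r := range v.Chunks() {
		out = append(out, r.Type())
	}
	return out
}

func main() {
	l := list{refs: []RefBase{ref{"sha1-aaa", "Ref<String>"}, ref{"sha1-bbb", "Ref<Int32>"}}}
	fmt.Println(childTypes(l))
}
```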
Rafael Weinstein c515b80c88 types.WriteValue -> types.EncodeValue 2016-03-18 13:53:04 -07:00
cmasone-attic cb3c00df68 Merge pull request #1089 from cmasone-attic/datastore
Move ReadValue and WriteValue to DataStore
2016-03-17 13:03:12 -07:00
Chris Masone 119a56c3a9 Move ReadValue and WriteValue to DataStore
This patch is the first step in moving all reading and writing to the
DataStore API, so that we can validate data committed to Noms.

The big change here is that types.ReadValue() no longer exists and is
replaced with a ReadValue() method on DataStore. A similar
WriteValue() method deprecates types.WriteValue(), but fully removing
that is left for a later patch. Since a lot of code in the types
package needs to read and write values, but cannot import the datas
package without creating an import cycle, the types package exports
ValueReader and ValueWriter interfaces, which DataStore implements.
Thus, a DataStore can be passed to anything in the types package which
needs to read or write values (e.g. a collection constructor or
typed-ref).

Relatedly, this patch also introduces the DataSink interface, so that
some public-facing APIs no longer need to provide a ChunkSink.

Towards #654
2016-03-17 12:57:44 -07:00
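
The way 119a56c3a9 avoids the import cycle can be illustrated with a single-file sketch. The method signatures here are assumptions (values and refs are plain strings), and the package boundaries are only indicated in comments; the real interfaces in the types package will differ.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// --- would live in the types package ---

// ValueReader and ValueWriter are the only things types-package code needs
// to know about storage. Signatures are simplified for illustration.
type ValueReader interface {
	ReadValue(ref string) (string, bool)
}

type ValueWriter interface {
	WriteValue(v string) string
}

// roundTrip stands in for types-package code (e.g. a collection constructor)
// that reads and writes values without importing the datas package.
func roundTrip(w ValueWriter, r ValueReader, v string) (string, bool) {
	return r.ReadValue(w.WriteValue(v))
}

// --- would live in the datas package, which imports types ---

// DataStore implements both interfaces over an in-memory map.
type DataStore struct{ chunks map[string]string }

func NewDataStore() *DataStore { return &DataStore{chunks: map[string]string{}} }

// WriteValue stores v under a content-derived ref and returns that ref.
func (ds *DataStore) WriteValue(v string) string {
	r := fmt.Sprintf("sha1-%x", sha1.Sum([]byte(v)))
	ds.chunks[r] = v
	return r
}

// ReadValue looks up the value stored under ref r.
func (ds *DataStore) ReadValue(r string) (string, bool) {
	v, ok := ds.chunks[r]
	return v, ok
}

func main() {
	ds := NewDataStore()
	v, ok := roundTrip(ds, ds, "hello")
	fmt.Println(v, ok)
}
```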
Erik Arvidsson 9ac02981d1 JS: Implement blob writing 2016-03-17 11:31:07 -07:00
Chris Masone 599fb8b173 Don't pass ChunkStores to collection leaves
It turns out that the collection leaves don't actually use the
ChunkStore we give them, so stop passing it to them.
2016-02-26 13:08:16 -08:00
Benjamin Kalman b60f1a0e7e Add tests for modifying chunked sets/maps/lists read from a chunkstore.
This is as opposed to chunked sets/maps/lists that were constructed. The
difference is that constructed collections have their chunk values stored
in memory, whereas those read from a chunkstore only have refs.

There was a bug fixed in abacc7644a which
caused this to crash. This patch adds tests and an extra assertion.
2016-02-25 15:53:18 -08:00
Benjamin Kalman abacc7644a Propagate the ChunkSource through modifications to Set and Map.
It's already propagated for List.

The problem occurs when accessing/modifying chunked Sets and Maps that
have been read from the DB and then modified. For example:

  set := ds.Head().(Set)
  set = set.Remove(42)
  // These will both crash:
  set.Has(43)
  set.Insert(43)

If |set| is a compoundSet then the new Set returned from Remove won't
have a ChunkSource.
- When Has is called, the set will attempt to read in chunks, but there
  is no longer a ChunkSource to read from.
- When Insert is called, the set will re-chunk, which may require
  reading in more chunks.
2016-02-24 16:14:22 -08:00
Aaron Boodman befeac553d Merge pull request #981 from aboodman/vendor-aws
Vendor using 'vendor' directory instead of Godeps
2016-02-09 09:46:38 -08:00
Aaron Boodman cff0de3696 non-vendor changes 2016-02-08 23:15:09 -08:00
Chris Masone 83372d9596 Added newMetaTuple(), fixed new test 2016-02-08 13:47:21 -08:00
Chris Masone 937dd624d0 Introduce NewStreamingTypedList()
NewStreamingTypedList() reads Values from a channel and appends them
to a List, chunking as it goes and writing these chunks to a given
ChunkSink. It returns a `chan List` from which the caller can get the
finished List once they are done writing values to the `chan Value`
they provided at call-time.
2016-02-08 13:47:21 -08:00
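
The channel-based pattern described in 937dd624d0 could be sketched as below. This is not the real NewStreamingTypedList(): `newStreamingList`, the `sink` callback, and the fixed `chunkSize` are assumptions made for illustration, and a "list" here is just a slice of int chunks.

```go
package main

import "fmt"

// chunkSize is a hypothetical fixed chunk size; it only stands in for
// whatever rule the real chunker uses to decide chunk boundaries.
const chunkSize = 3

// newStreamingList reads values from the given channel, groups them into
// chunks as they arrive, hands every finished chunk to sink, and delivers
// the completed list on the returned channel once values is closed.
func newStreamingList(sink func([]int), values <-chan int) <-chan [][]int {
	out := make(chan [][]int, 1)
	go func() {
		var list [][]int
		var cur []int
		for v := range values {
			cur = append(cur, v)
			if len(cur) == chunkSize {
				sink(cur)
				list = append(list, cur)
				cur = nil
			}
		}
		if len(cur) > 0 { // flush the final, partially filled chunk
			sink(cur)
			list = append(list, cur)
		}
		out <- list
	}()
	return out
}

func main() {
	values := make(chan int)
	written := 0
	done := newStreamingList(func(chunk []int) { written++ }, values)
	for i := 0; i < 7; i++ {
		values <- i
	}
	close(values) // signals that the caller is done writing values
	list := <-done
	fmt.Println(len(list), written)
}
```

The caller closes the value channel to signal completion and only then receives from the result channel, which matches the usage the commit message describes.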
Aaron Boodman 2c05a26c0b Vendor buzhash using Git submodules and Go 1.5 vendoring rather than Godeps 2016-02-07 15:37:16 -08:00
John Huang a14cd21af4 CSV exporter to stdout 2016-02-05 10:39:39 -08:00
Rafael Weinstein b8c399417f Reduce size of test collections 2016-02-03 10:19:57 -08:00
Rafael Weinstein 4d77492c46 JS Chunking 2016-02-02 16:13:54 -08:00
Rafael Weinstein 6c3239a1d0 Collections no longer need a ChunkStore on creation 2016-02-02 13:39:26 -08:00
Rafael Weinstein 4f93398e49 Don't drop refs to child chunks on write for p-trees 2016-02-02 13:22:56 -08:00
Benjamin Kalman 8cca9354c1 Move sorting to NewPackage. 2016-01-22 18:43:41 -08:00
Benjamin Kalman bfc987ae14 Force Package dependencies to have a stable ordering.
The nomdl Package dependencies are populated from nomdl imports, which
get put in a Go map, then iterated over to get refs. Unfortunately, Go
map iteration order isn't stable, and the dependency order affects the
Package's ref.

I noticed this bug when indexing data imported from the Picasa importer.
Picasa's view of the RemotePhoto package had its dependencies in
reverse order, but the indexer's view had them in the natural order, so
it wasn't recognising Picasa's imported RemotePhoto structs.
2016-01-22 16:01:43 -08:00
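
The ordering problem described in bfc987ae14 comes from Go deliberately not guaranteeing map iteration order. One common way to get a stable order is to collect the values and sort them, as in this sketch; `sortedDeps` and the sample refs are hypothetical and not taken from the Noms code.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedDeps collects the dependency refs from a map of imports and returns
// them in a deterministic (sorted) order, regardless of map iteration order.
func sortedDeps(imports map[string]string) []string {
	refs := make([]string, 0, len(imports))
	for _, r := range imports {
		refs = append(refs, r)
	}
	sort.Strings(refs)
	return refs
}

func main() {
	imports := map[string]string{
		"photo": "sha1-bbb",
		"geo":   "sha1-aaa",
	}
	fmt.Println(sortedDeps(imports))
}
```

Because anything hashed over the dependency list depends on its order, sorting before hashing means two processes that build the same package arrive at the same ref.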
Rafael Weinstein 3f8e608cd1 fixupTypeRef avoids creating a new value when no work is done 2016-01-12 16:57:05 -08:00
Benjamin Kalman c3322dbb2d sequenceChunker cleanup to replace pendingFirst slice with a bool. 2016-01-08 14:15:37 -08:00
Benjamin Kalman 4a090617a5 Don't write sequence chunks until the root metaSequence is written. 2016-01-08 10:59:43 -08:00
Benjamin Kalman e27980dbd3 Don't calculate metaSequence chunk refs until necessary.
This saves a lot of work for the XML importer.
2016-01-07 17:17:32 -08:00
Benjamin Kalman 6ac90bbd9c Store metaTuple.child as its Value not its internal Value. 2016-01-07 17:09:45 -08:00
Benjamin Kalman 232492003d Lazily write sequence chunks.
Instead of writing sequence chunks as soon as they're created (as a
result of hitting chunk boundaries), only write them once they're
referenced - which only happens if those chunks are themselves chunked.

The effect of this is root chunks of collections/blobs aren't written
until they're committed, which makes the XML importer run twice as fast
on a month of MLB data - 60s instead of 120s, with --ldb-dump-stats
showing a PutCount of 21,272 instead of 342,254.

In the future it should be possible to avoid writing *any* chunks until
the root is committed, which will improve incremental update
performance, but that's a larger change (issue #710). This change fixes
issue #832.
2016-01-05 15:57:45 +11:00
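
The lazy-write idea in 232492003d can be shown with a toy model. Everything here (`store`, `lazyChunk`, `ref`) is a hypothetical simplification: a chunk is written the first time something asks for its ref, not when it is created.

```go
package main

import "fmt"

// store counts writes, standing in for a chunk store's PutCount.
type store struct{ puts int }

func (s *store) put(data string) { s.puts++ }

// lazyChunk keeps its data in memory until something references it.
type lazyChunk struct {
	data    string
	written bool
}

// ref writes the chunk the first time it is referenced and returns an
// identifier for it; later calls do not write again.
func (c *lazyChunk) ref(s *store) string {
	if !c.written {
		s.put(c.data)
		c.written = true
	}
	return "ref:" + c.data
}

func main() {
	s := &store{}
	leaf := &lazyChunk{data: "leaf"}
	root := &lazyChunk{data: "root"}
	fmt.Println(s.puts) // creating chunks writes nothing

	leaf.ref(s) // a parent chunk references the leaf
	leaf.ref(s) // a second reference does not write again
	fmt.Println(s.puts)

	root.ref(s) // the root is only written when it is committed
	fmt.Println(s.puts)
}
```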
Ben Kalman ce334cfebb Merge pull request #790 from kalman/setmap-fix-tests
Fix bug in sequenceChunker when rechunking single item chunks.
2015-12-29 10:33:39 +11:00
Benjamin Kalman c9b20fbb21 Make compoundList.Remove convert return value to List not compoundList. 2015-12-28 12:52:41 +11:00
Benjamin Kalman c2f0ed3e08 Fix bug in sequenceChunker when rechunking single item chunks.
The problem was that if there are chunks in the middle of a prollytree
with only a single item, which can happen if the first item in a
sequence was a chunk boundary, in some circumstances sequenceChunker
would think that it's the root of the tree.
2015-12-24 10:58:31 +11:00
Erik Arvidsson d915d748e9 Merge pull request #799 from arv/cursor-at-first
Clean up compoundSet sequenceCursorAtFirst
2015-12-21 14:57:33 -08:00
Erik Arvidsson 483beb3863 Clean up compoundSet sequenceCursorAtFirst
Fixes #795
2015-12-21 13:49:54 -08:00
Erik Arvidsson dad5dda1ec Add runtime type checks to compound map
Fixes #812
2015-12-21 13:44:55 -08:00
Erik Arvidsson 81c0f03a43 Add runtime type assertions for compound list 2015-12-21 13:35:48 -08:00
Erik Arvidsson e28852f4c0 Merge pull request #802 from arv/compound-set-missing-type-check
Add missing runtime type check to compoundSet Insert
2015-12-21 09:56:02 -08:00
Erik Arvidsson a15ed4963b Add missing runtime type check to compoundSet Insert 2015-12-17 15:38:12 -05:00
Erik Arvidsson 1edbee5f34 Remove Set Subtract 2015-12-17 15:23:28 -05:00
Erik Arvidsson b8be6908f8 Implement Set Union
This is done by creating a cursor for each set. This is a cursor for
the actual values in the sets. We then pick the "smallest" value from
the cursors and advance that cursor. This continues until we have
exhausted all the cursors.

  setA.Union(set0, ... setN)

The time complexity is O(len(setA) + len(set0) + ... + len(setN))
2015-12-17 10:18:04 -05:00
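
The cursor-based merge described in b8be6908f8 can be sketched over plain sorted slices. The `cursor` type and `union` function are illustrative assumptions, not the compoundSet code. Note that this sketch scans all cursors linearly on every step, so its cost also grows with the number of sets; the bound quoted in the commit message treats that number as a constant.

```go
package main

import "fmt"

// cursor walks one sorted, duplicate-free set of ints.
type cursor struct {
	vals []int
	pos  int
}

func (c *cursor) valid() bool  { return c.pos < len(c.vals) }
func (c *cursor) current() int { return c.vals[c.pos] }

// union merges any number of sorted sets by repeatedly picking the cursor
// with the smallest current value, emitting that value, and advancing that
// cursor, until every cursor is exhausted. Duplicates across sets are
// emitted once.
func union(sets ...[]int) []int {
	cursors := make([]*cursor, len(sets))
	for i, s := range sets {
		cursors[i] = &cursor{vals: s}
	}
	var out []int
	for {
		var smallest *cursor
		for _, c := range cursors {
			if c.valid() && (smallest == nil || c.current() < smallest.current()) {
				smallest = c
			}
		}
		if smallest == nil {
			return out // all cursors exhausted
		}
		v := smallest.current()
		smallest.pos++
		if len(out) == 0 || out[len(out)-1] != v {
			out = append(out, v)
		}
	}
}

func main() {
	fmt.Println(union([]int{1, 3, 5}, []int{2, 3, 6}, []int{5, 7}))
}
```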
Rafael Weinstein 27d5f0d240 Ensure sequenceChunker.Done() returns an internal type so that callers don't have to 2015-12-17 06:20:28 -08:00
Chris Masone a70c21116a Some compound{Set,Map} cleanup
Remove a bad comment, one-line a few things in Filter()
2015-12-16 15:57:22 -08:00
cmasone-attic 3163ff00dc Merge pull request #781 from cmasone-attic/mapfilter
Implement compoundMap Filter()
2015-12-16 15:50:37 -08:00
Dan Willhite 19228ba9d8 Merge pull request #784 from willhite/panics
Implement Filter on compoundList.
2015-12-16 14:06:57 -08:00
Dan Willhite 20f22e1020 Implement Filter on compoundList. 2015-12-16 13:32:00 -08:00
Chris Masone a30209bcc7 Implement compoundMap Filter() 2015-12-16 12:57:41 -08:00
Chris Masone 57e2303a62 Re-land "Implement compoundSet.Filter()"
This reverts commit 60ab9c7f0c.

Fixes initial patch to correctly use test harness.  Implementation is
based on newTypedSet(), so hopefully has similar performance
characteristics.
2015-12-16 12:53:13 -08:00
cmasone-attic 60ab9c7f0c Revert "Implement compoundSet.Filter()" 2015-12-16 12:42:33 -08:00