Commit Graph

342 Commits

Author SHA1 Message Date
cmasone-attic 22d8e175f7 Modify httpBatchStore so that writing values maintains some locality (#2983)
NBS benefits from related chunks being near one another. Initially,
let's use write-order as a proxy for "related".

This patch contains a pretty heinous hack to allow sync to continue
putting chunks into httpBatchStore top-down without breaking
server-side validation. Work to fix this is tracked in #2982.

This patch fixes #2968, at least for now.

* Introduces PullWithFlush() to allow noms sync to explicitly
pull chunks over and flush directly after. This allows UpdateRoot
to behave as before.

Also clears out all the legacy batch-put machinery. Now, Flush()
just directly calls sendWriteRequests().
2016-12-23 11:48:42 -08:00
Aaron Boodman fe53c15552 Marshal() should accept original field with empty value. (#2985) 2016-12-23 00:20:29 -08:00
Aaron Boodman e451a01441 Add support for original in marshal.Marshal (#2978)
Add support for `original` in marshal.Marshal

Now it roundtrips
2016-12-22 13:12:51 -08:00
Rafael Weinstein d8d8c6c7e1 Parallel s3 Slice reads (#2979)
GetMany() calls can now be serviced by <= N goroutines, where N is the number of physical reads the request is broken down into.

This patch also adds a maxReadSize param to the code that decides how to break chunk reads into physical reads, and sets the s3 blockSize to 5MB, which experimentally resulted in lower total latency.

Lastly, some small refactors.
2016-12-22 11:45:33 -08:00
Erik Arvidsson 6bc82b03dd Revert "Marshal: Be more lenient when unmarshal a struct (#2975)" (#2977)
This reverts commit b64f1d5dc9.

Reason: It is breaking samples/go/photo-index

Reopens: #2971
2016-12-21 19:14:59 -08:00
Erik Arvidsson b64f1d5dc9 Marshal: Be more lenient when unmarshal a struct (#2975)
This is a potentially breaking change!

Before this change we required all the fields in a Go struct to be
present in the Noms struct when we unmarshal the Noms struct onto the
Go struct. This is no longer the case, which means that all fields in
the Go struct that are present in the Noms struct will be copied over.

This also means that `omitempty` is useless in Unmarshal and it has been
removed.

This might break your code if you expected to get errors when the field
names did not match!

Fixes #2971
2016-12-21 16:29:05 -08:00
Erik Arvidsson 521e7b6bae Marshal: Use nil for empty Go collections (#2973)
This is a breaking change!

We used to create empty Go collections `[]int{}` when unmarshalling an
empty Noms collection onto a Go collection that was `nil`. Now we keep
the Go collection as `nil` which means that you will get `[]int(nil)`
for an empty Noms List.

Fixes #2969
2016-12-21 16:21:15 -08:00
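For readers unsure what the nil-versus-empty distinction in the commit above means for calling code, here is a minimal sketch in plain Go. It does not use the marshal package at all; `describe` is a hypothetical helper that only illustrates how `[]int(nil)` and `[]int{}` differ.

```go
package main

import "fmt"

// describe is a hypothetical helper: it reports whether a slice is nil
// and how many elements it holds.
func describe(s []int) string {
	return fmt.Sprintf("nil=%t len=%d", s == nil, len(s))
}

func main() {
	var unset []int  // []int(nil), what the commit says you now get
	empty := []int{} // an allocated empty slice, the old behaviour

	fmt.Println(describe(unset)) // nil=true len=0
	fmt.Println(describe(empty)) // nil=false len=0

	// Ranging over and appending to a nil slice are both safe, so most
	// code only notices the change if it compares against nil.
	for range unset {
	}
	unset = append(unset, 1)
	fmt.Println(describe(unset)) // nil=false len=1
}
```

Code that checks `len(s) == 0` behaves the same before and after the change; only explicit `s == nil` checks are affected.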
cmasone-attic d129580007 Add frag tool to measure nbs fragmentation (#2963)
Before we can defragment NBS stores, we need to understand how
fragmented they are. This tool provides a measure of fragmentation in
which optimal chunk-graph layout implies that ALL children of a given
parent can be read in one storage-layer operation (e.g. disk read, S3
transaction, etc).
2016-12-20 17:01:18 -08:00
Ben Kalman 4eca6085cb Allow skipping invalid fields when marshalling (#2967)
Currently unexported fields refuse to marshal or unmarshal, even if
they're told to skip. Now, they can be skipped. By default, they are
still errors.
2016-12-20 16:47:49 -08:00
Ben Kalman fa1ab8a61c Make datas.localFactory public and add NewLocalFactory (#2961)
I want to adapt datas.NewLocalFactory(chunks.NewMemoryStoreFactory()).
2016-12-19 10:46:25 -08:00
Dan Willhite d41a4bc6f7 Add set methods to verbose. (#2962) 2016-12-19 10:26:58 -08:00
cmasone-attic 9998ec0301 NBS: tableSet must exclude empty tables during ToSpecs() (#2959)
Fixes #2957
2016-12-15 11:45:39 -08:00
cmasone-attic 35385900d6 Close some NBS tableSets during testing (#2956)
Leaving these open can leak file handles
2016-12-14 17:17:58 -08:00
Rafael Weinstein d2289219c3 Don't flush for every batch during VBS.Enqueue (#2955) 2016-12-14 16:36:12 -08:00
Rafael Weinstein f409db8c74 Add s3 tableIndex cache (#2953) 2016-12-14 15:06:31 -08:00
Rafael Weinstein e242c9d168 Move concrete impls out of table_persister.go (#2952) 2016-12-14 12:48:05 -08:00
Rafael Weinstein cc8ffacddf Factor tableIndex out of tableReader (#2950)
Factor tableIndex out of tableReader
2016-12-14 12:41:01 -08:00
Rafael Weinstein d7ee0025d6 Open NBS tables in parallel (#2946) 2016-12-13 14:15:25 -08:00
Rafael Weinstein 373900c790 fix 2016-12-13 10:25:49 -08:00
Rafael Weinstein c159876992 Make read amplification threshold configurable (#2941) 2016-12-13 09:57:41 -08:00
cmasone-attic 7f36fad716 tablePersister.Compact returns a chunkSource (#2939)
It turns out the only caller of Compact() immediately
turns around and calls Open, so why don't I just do
that FOR you?

Fixes #2935
2016-12-13 06:20:33 -08:00
Rafael Weinstein ef4e6c48d3 AWSStoreFactory (#2938) 2016-12-12 15:55:56 -08:00
Rafael Weinstein 335454b34c ChunkSink.Flush() (#2937)
Add ChunkSink.Flush() which signals the ChunkSink that any previously Put chunks should be made durable.
2016-12-12 15:39:13 -08:00
cmasone-attic de6e49c9e0 compactingChunkStore crash fix (#2936)
compactingChunkStore.close() must wait for compactions to finish.
2016-12-12 14:43:46 -08:00
cmasone-attic 7fe3b18a6b Make compaction async (#2934)
Introduce a 'compactingChunkStore', which knows how to compact itself
in the background. It satisfies get/has requests from an in-memory
table until compaction is complete. Once compaction is done, it
destroys the in-memory table and switches over to using solely the
persistent table.

Fixes #2879
2016-12-12 14:15:30 -08:00
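The behaviour described in the commit above (serve reads from the in-memory table while compaction runs, then switch to the persistent table) can be sketched roughly as follows. This is not the NBS code: `table`, `compactingStore`, `newCompactingStore`, and the `compact` callback are all hypothetical stand-ins, and the "persistent" table is just another map. The `close` method waits for compaction, which is the property the follow-up crash fix (#2936) calls for.

```go
package main

import (
	"fmt"
	"sync"
)

// table is a hypothetical stand-in for both in-memory and persistent tables.
type table map[string][]byte

// compactingStore serves reads from mem until background compaction is done.
type compactingStore struct {
	mu         sync.RWMutex
	mem        table // nil once compaction has finished
	persistent table
	wg         sync.WaitGroup
}

// newCompactingStore starts compaction in the background. The compact
// callback must only read from the table it is given.
func newCompactingStore(mem table, compact func(table) table) *compactingStore {
	s := &compactingStore{mem: mem}
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		p := compact(mem)
		s.mu.Lock()
		s.persistent, s.mem = p, nil // drop the in-memory table
		s.mu.Unlock()
	}()
	return s
}

// get reads from the in-memory table while it exists, else from the
// persistent one.
func (s *compactingStore) get(key string) []byte {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if s.mem != nil {
		return s.mem[key]
	}
	return s.persistent[key]
}

// close blocks until any in-flight compaction has finished.
func (s *compactingStore) close() { s.wg.Wait() }

func main() {
	s := newCompactingStore(table{"a": []byte("x")}, func(t table) table {
		out := table{}
		for k, v := range t {
			out[k] = v
		}
		return out
	})
	fmt.Printf("%s\n", s.get("a")) // served from whichever table is current
	s.close()
	fmt.Printf("%s\n", s.get("a")) // now served from the persistent table
}
```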
Rafael Weinstein 0652e0b3e0 Add ChunkSource.GetMany(); RemoteBatchStore getRefs uses GetMany() (#2933)
Add GetMany(), which most ChunkStores implement by repeated calls to their own Get(), but which creates the opportunity for stores to optimize reads of larger blocks of potentially sequential chunks (e.g. NBS).

Add RemoteBatchStore getRefs endpoint support for calling GetMany() rather than Get()

Remove ReadThroughChunkStore which was dead code.
2016-12-12 11:18:22 -08:00
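The "repeated calls to their own Get()" fallback mentioned in the commit above might look something like this. The types and signatures here (`Hash`, `Chunk`, `ChunkSource`, `memStore`) are simplified assumptions for illustration and do not match the real Noms interfaces.

```go
package main

import "fmt"

// Hash and Chunk are simplified, hypothetical stand-ins for the real types.
type Hash string
type Chunk []byte

// ChunkSource is a hypothetical reduction of the interface to two methods.
type ChunkSource interface {
	Get(h Hash) Chunk
	GetMany(hashes []Hash) []Chunk
}

// memStore is a toy store backed by a map.
type memStore map[Hash]Chunk

func (m memStore) Get(h Hash) Chunk { return m[h] }

// GetMany is the naive implementation: one Get per requested hash,
// skipping hashes the store does not have. A store with locality
// could instead batch nearby chunks into fewer physical reads.
func (m memStore) GetMany(hashes []Hash) []Chunk {
	out := make([]Chunk, 0, len(hashes))
	for _, h := range hashes {
		if c := m.Get(h); c != nil {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	var src ChunkSource = memStore{"h1": Chunk("one"), "h2": Chunk("two")}
	for _, c := range src.GetMany([]Hash{"h1", "missing", "h2"}) {
		fmt.Println(string(c))
	}
}
```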
Erik Arvidsson 80d4894fcc JS: Fix walk
The promise was resolving too early.
2016-12-09 18:51:49 -08:00
Aaron Boodman fe59ae9504 Revert "Fix JS walk handling for values of type Type (#2913)" (#2931)
This reverts commit 2717908d2b.
2016-12-09 17:43:26 -08:00
Erik Arvidsson 2717908d2b Fix JS walk handling for values of type Type (#2913)
Fixes #2911
2016-12-08 16:38:10 -08:00
cmasone-attic dff8b67aba Change NBS s3TablePersister to use Multipart upload (#2922)
Instead of putting an entire table to S3 in a single request, split it
into 5MB parts (the smallest allowable) and send all the parts in
parallel.
2016-12-08 16:10:37 -08:00
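As a rough illustration of the split-and-send-in-parallel approach in the commit above, the sketch below slices a buffer into 5MB parts and hands each part to an upload callback on its own goroutine. It does not call the AWS SDK; `splitParts`, `uploadAll`, and the `upload` callback are hypothetical names, and a real uploader would also need to start and complete the multipart request.

```go
package main

import (
	"fmt"
	"sync"
)

const partSize = 5 * 1024 * 1024 // 5MB, the part size named in the commit

// splitParts slices data into consecutive parts of exactly size bytes,
// except the last part, which holds whatever remains.
func splitParts(data []byte, size int) [][]byte {
	var parts [][]byte
	for len(data) > size {
		parts = append(parts, data[:size])
		data = data[size:]
	}
	return append(parts, data)
}

// uploadAll runs upload once per part, all concurrently, and returns the
// error of the lowest-numbered failing part, if any. Part numbers start at 1.
func uploadAll(parts [][]byte, upload func(partNum int, p []byte) error) error {
	errs := make([]error, len(parts))
	var wg sync.WaitGroup
	for i, p := range parts {
		wg.Add(1)
		go func(i int, p []byte) {
			defer wg.Done()
			errs[i] = upload(i+1, p) // each goroutine writes its own slot
		}(i, p)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	tableData := make([]byte, 12*1024*1024)
	parts := splitParts(tableData, partSize)
	sizes := make([]int, len(parts))
	err := uploadAll(parts, func(n int, p []byte) error {
		sizes[n-1] = len(p) // stand-in for the real network call
		return nil
	})
	fmt.Println(len(parts), sizes, err)
}
```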
Dan Willhite e30272abeb Implement poke functionality using diff.Apply function (#2828) 2016-12-07 11:57:48 -08:00
cmasone-attic 0750459e4e Add AWS backend for NBS (#2914)
Add a new backend for NBS that stores tables in S3 and
manifests in DynamoDB.

Fixes #2877
2016-12-06 15:12:28 -08:00
Ben Kalman dc289245cb Add path @at for positional indexing, and negative indices (#2910)
This lets you do foo.bar@at(n) to get the nth element of a list, set, or
map (for lists, this is equivalent to foo.bar[n]). This patch also adds
support for negative indices to @at(-n) and [-n] to get the nth element
relative to the back of collections.
2016-12-05 14:43:41 -08:00
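One way to picture the negative-index handling described above is the small sketch below. `resolveIndex` is a hypothetical helper, not part of the Noms path code, and it assumes that -1 refers to the last element.

```go
package main

import "fmt"

// resolveIndex is a hypothetical helper that maps a possibly negative
// index onto a position in a collection of the given length. It assumes
// -1 means the last element, -2 the one before it, and so on. The second
// result is false when the index falls outside the collection.
func resolveIndex(idx, length int) (int, bool) {
	if idx < 0 {
		idx += length
	}
	if idx < 0 || idx >= length {
		return 0, false
	}
	return idx, true
}

func main() {
	list := []string{"a", "b", "c", "d"}
	for _, idx := range []int{0, 3, -1, -4, 4, -5} {
		if pos, ok := resolveIndex(idx, len(list)); ok {
			fmt.Printf("%d -> %s\n", idx, list[pos])
		} else {
			fmt.Printf("%d -> out of range\n", idx)
		}
	}
}
```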
Rafael Weinstein 6edea9665e Verify chunks using suffix index not computing address from data. (#2907)
Revert to verifying chunks using the suffix index. Replace the inline 4-byte suffix used as integrity check with a more standard and efficient CRC32.
2016-12-05 11:44:43 -08:00
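A CRC32 integrity check of the kind the commit above mentions can be sketched with Go's standard `hash/crc32` package. The record layout here (data followed by a 4-byte big-endian checksum) and the choice of the IEEE polynomial are assumptions made for illustration; the commit message does not say which polynomial or layout NBS uses.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// appendChecksum returns data followed by its CRC32 as 4 big-endian bytes.
// Layout and polynomial are assumptions for this sketch.
func appendChecksum(data []byte) []byte {
	out := make([]byte, len(data)+4)
	copy(out, data)
	binary.BigEndian.PutUint32(out[len(data):], crc32.ChecksumIEEE(data))
	return out
}

// verify splits a record into data and checksum and reports whether the
// checksum matches the data.
func verify(record []byte) ([]byte, bool) {
	if len(record) < 4 {
		return nil, false
	}
	data := record[:len(record)-4]
	want := binary.BigEndian.Uint32(record[len(record)-4:])
	return data, crc32.ChecksumIEEE(data) == want
}

func main() {
	record := appendChecksum([]byte("chunk bytes"))
	_, ok := verify(record)
	fmt.Println(ok) // true

	record[0] ^= 0x01 // simulate a single corrupted bit
	_, ok = verify(record)
	fmt.Println(ok) // false
}
```

Checking a CRC is much cheaper than recomputing a cryptographic address from the data, which is presumably the efficiency gain the commit refers to.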
cmasone-attic b3eef38fa4 Break NomsBlockStore dependency on disk storage (#2905)
This patch introduces/expands the 'manifest' and 'tableSet'
abstractions, so that NomsBlockStore is no longer explicitly using any
file system operations

Towards issue #2877
2016-12-05 09:05:40 -08:00
Ben Kalman 0a10704149 Implement Set.At and Map.At in Go (#2903)
These get the set/map element at a specific index.
I haven't implemented it in JS yet because the JS code has no method to
create a cursor at an index. This exists in Go because a refactor was
done a few months ago to add it, but it hasn't been ported to JS.
2016-12-04 11:27:35 -08:00
Aaron Boodman 87958507b0 marshal.Unmarshal(): introduce omitempty and original support (#2900)
* marshal.Unmarshal(): introduce omitempty and original support.

Fixes #2795
Fixes #2796

* review comments
2016-12-02 15:13:00 -08:00
Rafael Weinstein a67bb9bf7b Minor rework of hash.Hash API (#2888)
Define the hash.Hash type to be a 20-byte array, rather than embed one. Hash API Changes: `hash.FromSlice` -> `hash.New`, `hash.FromData` -> `hash.Of`
2016-12-02 12:11:00 -08:00
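To show what defining the type as a 20-byte array (rather than a struct embedding one) buys, here is a self-contained sketch. The names `New` and `Of` come from the commit message, but their bodies here are guesses, and SHA-1 is used only because it happens to produce 20 bytes; it is not a claim about which hash function Noms uses.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// ByteLen is the hash length stated in the commit message.
const ByteLen = 20

// Hash is itself a 20-byte array, so it can be compared with == and
// used as a map key without reaching into a wrapped field.
type Hash [ByteLen]byte

// New copies a 20-byte slice into a Hash. Panicking on a wrong length
// is an assumption of this sketch.
func New(b []byte) Hash {
	if len(b) != ByteLen {
		panic("hash: wrong length")
	}
	var h Hash
	copy(h[:], b)
	return h
}

// Of computes a Hash from data. SHA-1 is a placeholder chosen only
// because its digest is 20 bytes long.
func Of(data []byte) Hash {
	return Hash(sha1.Sum(data))
}

func main() {
	h := Of([]byte("abc"))
	fmt.Println(New(h[:]) == h) // true

	seen := map[Hash]bool{h: true}
	fmt.Println(seen[Of([]byte("abc"))]) // true
}
```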
Eric Halpern ca232f0ad7 Support marshaling from and unmarshaling to *types.Type (#2892)
* Support marshaling from and unmarshaling to *types.Type

* Incorporate code review suggestions.
- Use fallthrough in switch
- Update decode godoc to spec *types.Type handling. Godoc for encode is fine as is.

toward: #2889
2016-12-02 11:32:35 -08:00
Rafael Weinstein 9cbe8e8bc8 uint32ChunkCount (#2894)
Encode chunk counts consistently as uint32 until #2873 is addressed. This also fixes an error in passing chunkCounts resulting from compaction that don't account for dropped (duplicate) chunks.
2016-12-02 10:12:40 -08:00
Rafael Weinstein a00a5f5611 Implement experimental block store (#2870)
* Move NBS into Noms

* vendor in deps
2016-12-01 10:04:09 -08:00
Aaron Boodman b221caabea Add @type support to paths in Go (#2860)
Add @type support to paths in Go
2016-11-29 18:13:29 -08:00
Eric Halpern 3f60749013 Fix race conditions in encoder to address photo-dedup segfault (#2852)
toward: #2849
2016-11-23 15:39:14 -08:00
Ben Kalman b86b18f395 Add MustParsePath (#2851) 2016-11-23 15:24:08 -08:00
Ben Kalman 96d10ac29f Improve Set marshaling to add encoding support, and decoding to map (#2845)
The only support that marshal has for Set at the moment is decoding to
slice.
2016-11-22 11:24:18 -08:00
Erik Arvidsson 5e901b0924 Cache OIDs as we descend (#2840)
Remove validation/normalization of union order and struct field order as we decode a chunk into a type.

Instead the validation happens in ValidatingBatchSink.

We still normalize the union order when a struct type is created directly (not from a chunk) using makeStructType.

The motivation for this change is that computing the OID (order ID) is expensive, and it used to be O(n^2) because we kept recomputing it as we traversed the type hierarchy.

Towards #2836
2016-11-21 15:18:02 -08:00
cmasone-attic 0cf72d5b85 Add debug logging to HandleWriteValue (#2846)
This patch introduces optional debug logging in util/verbose, and adds
some usage of it to HandleWriteValue and the httpBatchStore
SchedulePut code path. It also modifies chunks.DeserializeToChan() so
that callers can better recover from panics in there.

https://github.com/attic-labs/attic/issues/103
2016-11-21 15:11:34 -08:00
Ben Kalman ff4ee3c3a9 Small refactor to marshal code to support upcoming "set" tag (#2843) 2016-11-18 17:44:10 -08:00
Eric Halpern b29e50379f Reduce time required by sequence_iterator_test by using smaller chunk sizes (#2831)
* Reduce time required by sequence_iterator_test by using smaller chunk
sizes

* Simplify data generation
2016-11-16 11:26:43 -08:00
Aaron Boodman 8e64e636aa Introduce photo-dedup-by-date (#2826)
Introduce photo-dedup-by-date

This program deduplicates photos by the date they were taken. It considers two photos a group if they were separated by less than 5 seconds.
2016-11-15 14:07:57 -08:00