NBS benefits from related chunks being near one another. Initially,
let's use write-order as a proxy for "related".
This patch contains a pretty heinous hack to allow sync to continue
putting chunks into httpBatchStore top-down without breaking
server-side validation. Work to fix this is tracked in #2982.
This patch fixes #2968, at least for now.
* Introduces PullWithFlush() to allow noms sync to explicitly
pull chunks over and flush directly after. This allows UpdateRoot
to behave as before.
Also clears out all the legacy batch-put machinery. Now, Flush()
just directly calls sendWriteRequests().
GetMany() calls can now be serviced by <= N goroutines, where N is the number of physical reads the request is broken down into.
This patch also adds a maxReadSize param to the code which decides how to break chunk reads into physical reads, and sets the s3 blockSize to 5MB, which experimentally resulted in lower total latency.
Lastly, some small refactors.
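As an illustration of how chunk reads might be broken into physical reads, here is a minimal sketch. It is not the NBS code: the chunkRead and physicalRead types and the planReads function are hypothetical, and the merge rule (coalesce neighbours whose gap is at most blockSize, as long as the combined read stays within maxReadSize) is an assumption about what these two parameters control.

```go
package main

import "fmt"

// chunkRead is a hypothetical record: the byte range of one chunk in a table.
type chunkRead struct {
	offset, length uint64
}

// physicalRead is one contiguous storage-layer read covering one or more chunks.
type physicalRead struct {
	offset, length uint64
	chunks         []chunkRead
}

// planReads assumes reads are sorted by offset and do not overlap. A chunk is
// merged into the current physical read when the gap to it is at most
// blockSize and the merged read would not exceed maxReadSize (both assumed).
func planReads(reads []chunkRead, blockSize, maxReadSize uint64) []physicalRead {
	var plan []physicalRead
	for _, r := range reads {
		if n := len(plan); n > 0 {
			cur := &plan[n-1]
			end := cur.offset + cur.length
			newLen := r.offset + r.length - cur.offset
			if r.offset >= end && r.offset-end <= blockSize && newLen <= maxReadSize {
				cur.length = newLen
				cur.chunks = append(cur.chunks, r)
				continue
			}
		}
		plan = append(plan, physicalRead{r.offset, r.length, []chunkRead{r}})
	}
	return plan
}

func main() {
	reads := []chunkRead{{0, 100}, {150, 100}, {10000, 50}, {10050, 50}}
	for _, p := range planReads(reads, 512, 4096) {
		fmt.Printf("read offset=%d length=%d chunks=%d\n", p.offset, p.length, len(p.chunks))
	}
}
```

Each element of the resulting plan could then be handed to its own goroutine, which is one way to arrive at the "<= N goroutines" bound mentioned above.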
This is a potentially breaking change!
Before this change we required all the fields in a Go struct to be
present in the Noms struct when we unmarshal the Noms struct onto the
Go struct. This is no longer the case, which means that all fields in
the Go struct that are present in the Noms struct will be copied over.
This also means that `omitempty` is useless in Unmarshal and it has been
removed.
This might break your code if you expected to get errors when the field
names did not match!
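To make the new rule concrete, here is a stand-alone sketch that mimics it with the standard library only. It does not use the Noms marshal package: copyFields, the person type, and the use of a map in place of a Noms struct are all hypothetical stand-ins, and returning an error on a type mismatch is an assumption.

```go
package main

import (
	"fmt"
	"reflect"
)

// person is a hypothetical destination struct.
type person struct {
	Name string
	Age  int
}

// copyFields copies every field of dst that is present in src and silently
// skips the ones that are missing, which is the lenient behavior described
// above. src stands in for a Noms struct.
func copyFields(src map[string]interface{}, dst interface{}) error {
	v := reflect.ValueOf(dst)
	if v.Kind() != reflect.Ptr || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("dst must be a pointer to a struct")
	}
	s := v.Elem()
	for i := 0; i < s.NumField(); i++ {
		f := s.Type().Field(i)
		val, ok := src[f.Name]
		if !ok || !s.Field(i).CanSet() {
			continue // missing in src (or unexported): keep the zero value
		}
		rv := reflect.ValueOf(val)
		if !rv.IsValid() || !rv.Type().AssignableTo(f.Type) {
			return fmt.Errorf("field %s: cannot assign %T", f.Name, val)
		}
		s.Field(i).Set(rv)
	}
	return nil
}

func main() {
	var p person
	err := copyFields(map[string]interface{}{"Name": "Ann"}, &p)
	fmt.Println(p.Name, p.Age, err)
}
```

Code that relied on an error to detect a missing field would, under this rule, need to check for zero values itself.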
Fixes #2971
This is a breaking change!
We used to create empty Go collections `[]int{}` when unmarshalling an
empty Noms collection onto a Go collection that was `nil`. Now we keep
the Go collection as `nil` which means that you will get `[]int(nil)`
for an empty Noms List.
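The practical difference shows up only in code that compares against nil. The snippet below is plain Go and does not call into Noms; isEmpty is a hypothetical helper illustrating a check that works for both the old and the new result.

```go
package main

import "fmt"

// isEmpty treats a nil slice and an empty, allocated slice the same way.
func isEmpty(xs []int) bool {
	return len(xs) == 0
}

func main() {
	var after []int   // []int(nil): the result for an empty Noms List now
	before := []int{} // empty but non-nil: the previous result

	fmt.Println(after == nil, before == nil)     // true false
	fmt.Println(isEmpty(after), isEmpty(before)) // true true
}
```

Ranging over, appending to, and taking len() of a nil slice all work, so callers that avoid `== nil` comparisons should be unaffected.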
Fixes #2969
Before we can defragment NBS stores, we need to understand how
fragmented they are. This tool provides a measure of fragmentation in
which optimal chunk-graph layout implies that ALL children of a given
parent can be read in one storage-layer operation (e.g. disk read, S3
transaction, etc).
Introduce a 'compactingChunkStore', which knows how to compact itself
in the background. It satisfies get/has requests from an in-memory
table until compaction is complete. Once compaction is done, it
destroys the in-memory table and switches over to using solely the
persistent table.
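The switch-over can be sketched roughly as follows. This is not the actual compactingChunkStore: the table interface, memTable, copyCompact, and the use of string hashes are hypothetical simplifications, and copyCompact merely copies the map where the real code would write a persistent table.

```go
package main

import (
	"fmt"
	"sync"
)

// table is a hypothetical minimal read interface.
type table interface {
	get(hash string) ([]byte, bool)
}

type memTable map[string][]byte

func (m memTable) get(h string) ([]byte, bool) { d, ok := m[h]; return d, ok }

type compactingStore struct {
	mu        sync.RWMutex
	mem       table         // serves reads until compaction completes
	persisted table         // set once compaction is done
	done      chan struct{} // closed after the switch-over
}

// newCompactingStore starts compaction in the background and returns at once.
func newCompactingStore(mem memTable, compact func(memTable) table) *compactingStore {
	s := &compactingStore{mem: mem, done: make(chan struct{})}
	go func() {
		p := compact(mem)
		s.mu.Lock()
		s.persisted, s.mem = p, nil // drop the in-memory table
		s.mu.Unlock()
		close(s.done)
	}()
	return s
}

func (s *compactingStore) get(h string) ([]byte, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if s.mem != nil {
		return s.mem.get(h)
	}
	return s.persisted.get(h)
}

// copyCompact is a stand-in for writing a persistent table.
func copyCompact(m memTable) table {
	out := make(memTable, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

func main() {
	s := newCompactingStore(memTable{"abc": []byte("chunk")}, copyCompact)
	d, ok := s.get("abc") // answered by whichever table is current
	fmt.Println(string(d), ok)
	<-s.done
	d, ok = s.get("abc") // now answered by the persisted table
	fmt.Println(string(d), ok)
}
```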
Fixes #2879
Add GetMany(), which most ChunkStores implement by repeated calls to their own Get(), but which gives stores the opportunity to optimize reads of larger blocks of potentially sequential chunks (e.g. NBS).
Add RemoteBatchStore getRefs endpoint support for calling GetMany() rather than Get().
Remove ReadThroughChunkStore, which was dead code.
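The default "repeated Get()" strategy might look like the sketch below. The chunkGetter interface, the string hashes, and the channel-based getMany signature are simplifications chosen for illustration and are not the actual ChunkStore API.

```go
package main

import "fmt"

// chunkGetter is a hypothetical single-chunk read interface.
type chunkGetter interface {
	Get(hash string) ([]byte, bool)
}

// getMany is the naive implementation: one Get() per requested hash, sending
// each chunk that is found. A store such as NBS could instead batch these
// into larger sequential reads.
func getMany(cs chunkGetter, hashes []string, found chan<- []byte) {
	for _, h := range hashes {
		if data, ok := cs.Get(h); ok {
			found <- data
		}
	}
}

// mapStore is a toy in-memory store used only for the demonstration.
type mapStore map[string][]byte

func (m mapStore) Get(h string) ([]byte, bool) { d, ok := m[h]; return d, ok }

func main() {
	store := mapStore{"h1": []byte("one"), "h2": []byte("two")}
	found := make(chan []byte, 3) // buffered so getMany never blocks here
	getMany(store, []string{"h1", "missing", "h2"}, found)
	close(found)
	for d := range found {
		fmt.Println(string(d))
	}
}
```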
This lets you do foo.bar@at(n) to get the nth element of a list, set, or
map (for lists, this is equivalent to foo.bar[n]). This patch also adds
support for negative indices, @at(-n) and [-n], which get the nth
element relative to the back of a collection.
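One way to resolve such an index is sketched below. resolveIndex is a hypothetical helper, not the path-parsing code from this patch, and it assumes that -1 refers to the last element.

```go
package main

import "fmt"

// resolveIndex maps a possibly negative index onto a position in a collection
// of the given length. It reports false when the index is out of range.
// Assumption: -1 is the last element, -2 the one before it, and so on.
func resolveIndex(idx int64, length uint64) (uint64, bool) {
	if idx >= 0 {
		if uint64(idx) < length {
			return uint64(idx), true
		}
		return 0, false
	}
	back := uint64(-idx) // distance from the end
	if back > length {
		return 0, false
	}
	return length - back, true
}

func main() {
	for _, idx := range []int64{0, 4, -1, -5, -6, 5} {
		pos, ok := resolveIndex(idx, 5)
		fmt.Println(idx, pos, ok)
	}
}
```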
This patch introduces/expands the 'manifest' and 'tableSet'
abstractions, so that NomsBlockStore is no longer explicitly using any
file system operations.
Towards issue #2877
These get the set/map element at a specific index.
I haven't implemented it in JS yet because the JS code has no method to
create a cursor at an index. This exists in Go because a refactor was
done a few months ago to add it, but it hasn't been ported to JS.
* Support marshaling from and unmarshaling to *types.Type
* Incorporate code review suggestions.
- Use fallthrough in switch
- Update decode godoc to spec *types.Type handling. Godoc for encode is fine as is.
toward: #2889
Encode chunk counts consistently as uint32 until #2873 is addressed. This also fixes an error in which the chunkCounts passed along after compaction didn't account for dropped (duplicate) chunks.
Remove validation/normalization of union order and struct field order as we decode a chunk into a type.
Instead the validation happens in ValidatingBatchSink.
We still normalize the union order when a struct type is created directly (not from a chunk) using makeStructType.
The motivation for this change is that computing the OID (order ID) is expensive, and it used to be O(n^2) because we kept recomputing it as we traversed the type hierarchy.
Towards #2836
This patch introduces optional debug logging in util/verbose, and adds
some usage of it to HandleWriteValue and the httpBatchStore
SchedulePut code path. It also modifies chunks.DeserializeToChan() so
that callers can better recover from panics in there.
https://github.com/attic-labs/attic/issues/103
Introduce photo-dedup-by-date
This program deduplicates photos by the date they were taken. It treats two photos as part of the same group if they were taken less than 5 seconds apart.
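The grouping rule can be sketched as follows. This is not the photo-dedup-by-date source: groupByDate, sampleTimes, and the threshold constant are hypothetical, and the sketch assumes groups chain, so a photo joins a group when it falls within the threshold of that group's most recent photo.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// threshold mirrors the 5-second rule described above.
const threshold = 5 * time.Second

// groupByDate sorts the timestamps and starts a new group whenever the gap to
// the previous photo is at least the threshold.
func groupByDate(taken []time.Time, threshold time.Duration) [][]time.Time {
	sorted := append([]time.Time(nil), taken...) // leave the input untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Before(sorted[j]) })

	var groups [][]time.Time
	for i, t := range sorted {
		if i > 0 && t.Sub(sorted[i-1]) < threshold {
			last := len(groups) - 1
			groups[last] = append(groups[last], t)
			continue
		}
		groups = append(groups, []time.Time{t})
	}
	return groups
}

// sampleTimes returns made-up timestamps at +0s, +3s, +7s and +60s.
func sampleTimes() []time.Time {
	base := time.Date(2017, 1, 1, 12, 0, 0, 0, time.UTC)
	return []time.Time{
		base.Add(60 * time.Second),
		base,
		base.Add(7 * time.Second),
		base.Add(3 * time.Second),
	}
}

func main() {
	for i, g := range groupByDate(sampleTimes(), threshold) {
		fmt.Printf("group %d: %d photo(s)\n", i, len(g))
	}
}
```

With chaining, the photos at +0s, +3s and +7s end up in one group even though the first and last are 7 seconds apart; a stricter rule would compare against the first photo of each group instead.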