NBS benefits from related chunks being near one another. Initially,
let's use write-order as a proxy for "related".
This patch contains a pretty heinous hack to allow sync to continue
putting chunks into httpBatchStore top-down without breaking
server-side validation. Work to fix this is tracked in #2982.
This patch fixes #2968, at least for now.
* Introduces PullWithFlush() to allow noms sync to explicitly
pull chunks over and flush directly after. This allows UpdateRoot
to behave as before.
Also clears out all the legacy batch-put machinery. Now, Flush()
just directly calls sendWriteRequests().
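Roughly, the new entry point has this shape (a hedged, self-contained sketch; Database, pull, and the body shown are illustrative stand-ins, not the real noms signatures):

```go
// Illustrative stand-ins only; not the actual noms API.
type Database interface {
	Flush() // flush buffered chunks to the underlying store
}

func pull(src, sink Database) {
	// ... existing pull machinery copies chunks from src to sink ...
}

// PullWithFlush pulls and then flushes immediately, so that a
// subsequent UpdateRoot operates on fully persisted chunks.
func PullWithFlush(src, sink Database) {
	pull(src, sink)
	sink.Flush()
}
```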
This lets you do foo.bar@at(n) to get the nth element of a list, set, or
map (for lists, this is equivalent to foo.bar[n]). This patch also adds
support for negative indices: @at(-n) and [-n] get the nth element
relative to the back of the collection.
For sets and maps, @at(n) gets the element at a specific index.
I haven't implemented it in JS yet because the JS code has no method to
create a cursor at an index. This exists in Go because a refactor was
done a few months ago to add it, but it hasn't been ported to JS.
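For example, given a list at foo.bar, all of these are valid paths under the new syntax:

```
foo.bar@at(0)    // first element
foo.bar[0]       // equivalent indexing for lists
foo.bar@at(-1)   // last element, counting from the back
foo.bar[-1]      // same element via bracket syntax
```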
Remove validation/normalization of union order and struct field order as we decode a chunk into a type.
Instead, the validation happens in ValidatingBatchingSink.
We still normalize the union order when a struct type is created directly (not from a chunk) using makeStructType.
The motivation for this change is that computing the OID (order ID) is expensive, and it used to be O(n^2) because we kept recomputing it as we traversed the type hierarchy.
Towards #2836
* Implement read-ahead in sequence_cursor
For each meta-sequence that contains leaf sequences, start reading ahead in
parallel and deliver in order to a buffered channel. Each advance of the cursor gets
the next sequence in the read-ahead channel.
Toward #2079
* Address code review comments:
- Use // for all comments
- Fix label format
- Increase channel read timeout
* Rework read-ahead to use a map[int]chan sequence instead of a channel of sequences
* Rework sequence cursor read-ahead for better throughput
- Guts of read-ahead now encapsulated in sequenceReadAhead
- New implementation uses a cursor to iterate across the leaves ahead
of the current cursor
- It reads ahead using short-lived goroutines that place each read-ahead
sequence in a channel that is then stored by hash in a map
- When the sequence is needed, the cursor first looks in the map. If found,
it reads the sequence from the channel stored in the map. If not, it reads
it normally (see the sketch after this list).
- This approach allows for reading ahead in parallel without requiring a
long-running pool of goroutines
- Introduce sequenceIterator to encapsulate read-ahead behind an abstraction that
always reads forward. This is currently used narrowly but could be used more
widely as the core implementation for all sequence iterators
* Address review comments
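A minimal, self-contained sketch of the per-hash channel pattern described above (readAhead and fetch are illustrative names, not the actual noms identifiers):

```go
package main

import "sync"

// fetch stands in for reading a leaf sequence from the ChunkStore.
func fetch(hash string) []byte { return []byte(hash) }

// readAhead prefetches sequences with short-lived goroutines, storing a
// buffered channel per hash so results can be claimed later.
type readAhead struct {
	mu      sync.Mutex
	pending map[string]chan []byte
}

func (ra *readAhead) start(hash string) {
	ch := make(chan []byte, 1) // buffered: the goroutine never blocks
	ra.mu.Lock()
	ra.pending[hash] = ch
	ra.mu.Unlock()
	go func() { ch <- fetch(hash) }() // short-lived goroutine per read-ahead
}

// get returns the prefetched sequence if one is in flight; otherwise it
// falls back to a normal read.
func (ra *readAhead) get(hash string) []byte {
	ra.mu.Lock()
	ch, ok := ra.pending[hash]
	delete(ra.pending, hash)
	ra.mu.Unlock()
	if ok {
		return <-ch
	}
	return fetch(hash)
}

func main() {
	ra := &readAhead{pending: map[string]chan []byte{}}
	ra.start("abc") // the cursor schedules leaves ahead of its position
	_ = ra.get("abc")
}
```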
This is a side-by-side port, taking inspiration from the old dataspec.go
code. Notably:
- LDB support has been added in Go. It wasn't needed in JS.
- There is an Href() method on Spec now.
- Go now handles IPV6.
- Go no longer treats access_token specially.
- Go now has Pin.
- I found some issues in the JS while doing this; I'll fix them later.
I've also updated the config code to use the new API, so basically
all the Go samples use the new code even if they don't really change.
ValueStore caches Values that are read out of it, but it doesn't
do the same for Values that are written. This is because we expect
that reading Values shortly after writing them is an uncommon usage
pattern, and because the Chunks that make up novel Values are
generally efficiently retrievable from the BatchStore that backs
a ValueStore. The problem discovered in issue #2802 is that ValueStore
caches non-existence as well as existence of read Values. So, reading
a Value that doesn't exist in the DB would result in the ValueStore
permanently returning nil for that Value -- even if you then go and
write it to the DB.
This patch drops the cache entry for a Value whenever it's written.
Fixes #2802
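A toy sketch of the shape of the fix (valueCache and its methods are hypothetical stand-ins for the real internals):

```go
// A cache that records misses (nil) as well as hits must drop its entry
// when the value is later written; otherwise the cached miss shadows the
// newly written Value forever.
type valueCache struct {
	entries map[string]interface{} // hash -> value; nil means "known absent"
}

func (c *valueCache) write(hash string, v interface{}) {
	delete(c.entries, hash) // the fix: drop any (possibly negative) entry
	// ... persist v to the backing store here ...
}
```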
Previously we would clone them from the original cursor, to (a) not
modify the original cursor, and (b) have initialization and finalization
not interfere with each other.
However, this isn't necessary, and it just creates unnecessary churn. For
example, when we read ahead, it would be wasteful to re-read the
read-ahead chunks from initialization.
This puts the flow header after the copyright header.
It also:
* Fixes the existing files to have valid headers
* Makes sure the script can handle doctype
There are two places where ValidatingBatchingSink could be more
concurrent: Prepare(), where it's reading in hints, and Enqueue().
Making Prepare() handle many hints concurrently is easy because the
hints don't depend on one another, so that method now just spins up
a number of goroutines and runs them all at once.
Enqueue() is more complex, because while Chunk decoding and validation
of its hash can proceed concurrently, validating that a given Chunk is
'ref-complete' requires that the chunks in the writeValue payload all
be processed in order. So, this patch uses orderedparallel to run the
new Decode() method on chunks in parallel, but then return to serial
operation before calling the modified Enqueue() method.
Fixes #1935
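The pattern, roughly (a simplified stand-in with stub types, not orderedparallel's real API):

```go
type chunk struct{ data []byte }
type decoded struct{} // decoded Value plus validated hash

func decode(c chunk) decoded { return decoded{} } // CPU-heavy, parallel-safe
func enqueue(d decoded)      {}                   // ref-completeness check; order-sensitive

// processBatch decodes chunks in parallel but hands results to enqueue
// strictly in submission order.
func processBatch(chunks []chunk) {
	results := make([]chan decoded, len(chunks))
	for i, c := range chunks {
		ch := make(chan decoded, 1)
		results[i] = ch
		go func(c chunk, ch chan decoded) { ch <- decode(c) }(c, ch)
	}
	for _, ch := range results { // receive in submission order
		enqueue(<-ch)
	}
}
```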
This allows setting a field in a struct to a new type, or setting a
non-existent field in a struct.
In JS this is done through StructMirror.p.set, and in Go it is
done through Struct.Set.
Fixes #2181
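In Go, usage looks something like this hedged sketch (NewStruct, StructData, and Struct.Set from the Go types package, signatures assumed):

```go
s := types.NewStruct("Point", types.StructData{"x": types.Number(1)})
s = s.Set("x", types.String("one")) // set an existing field to a new type
s = s.Set("y", types.Number(2))     // set a previously non-existent field
```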
This patch modifies LocalDatabase so that it no longer swaps out
its embedded ValueStore during Pull(). The reason it was doing this
is that Pull() injects chunks directly into a Database, without
doing any work on its own to ensure correctness. For LocalDatabase,
WriteValue() performs de-facto validation as it goes, so it does not
need this additional validation in the general case. To address the
former without impacting the latter, we were making LocalDatabase
swap out its ValueStore during Pull(), replacing it with one that
performs validation.
This led to inconsistencies, seen in issue #2598. Collections read
from the DB _before_ this swap could have weird behavior if they
were modified and written after this swap.
The new code just lazily creates a BatchStore for use during Pull().
Fixes #2598
Blob.Concat is a simple use of the sequence concat code that List.Concat uses.
NewBlob uses Blob.Concat to construct a Blob in parallel.
Perf tests for parallel NewBlob write N temporary files, then construct a Blob
from them, so there is some I/O, but it appears to be mostly CPU bound. NewBlob
doesn't get much more than 50% faster with any P >= 2.
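Usage, as a hedged sketch (NewBlob's exact signature is assumed):

```go
b1 := types.NewBlob(strings.NewReader("hello, "))
b2 := types.NewBlob(strings.NewReader("world"))
b := b1.Concat(b2) // splices the chunked sequences; no byte-level copy
```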
Noms SDK users frequently shoot themselves in the foot because they're
holding onto an "old" Database object. That is, they have a Database
tucked away in some internal state, they call Commit() on it, and
don't replace the object in their internal state with the new Database
returned from Commit.
This PR changes the Database and Dataset Go API to be in line with the
proposal in Issue #2589. JS follows in a separate patch.
These were previously intertwined into one writer that was
embedded in and only usable by the 'noms' command.
This commit separates them into two separate writers that
can be used independently or combined. I also moved them
into go/utils/writers so they can be used by other code.
The main impetus to do this was to fix Bug #2593.
This makes a number of changes to simplify code:
1. metaSequence is now a struct, not an interface
2. orderedMetaSequence and indexedMetaSequence are gone
3. metaSequenceObject is gone
4. indexedSequence is gone
5. add leafSequence struct for leaf sequences to embed
Everything but change 5 was done by rafael@atticlabs.io.
Once we integrate noms-merge into the `noms commit` command, this
function will allow us to stop requiring users to pass in the common
ancestor to be used when merging. The code can just find it and merge
away.
Toward #2535
It exploits the chunked structure of Lists to allow concatenating
arbitrarily large Lists.
I've added both functional and perf tests for Concat, and the perf tests
only made sense with tests for building and reading Lists, so now we
have those too.
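Usage, as a hedged sketch (the variadic NewList constructor is assumed):

```go
l1 := types.NewList(types.Number(1), types.Number(2))
l2 := types.NewList(types.Number(3), types.Number(4))
l3 := l1.Concat(l2) // [1, 2, 3, 4], reusing the existing chunk structure
```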
When merge.ThreeWay() merges list splices, it frequently needs to
extract a slice of a List or check whether a given range of values is
exactly equal in two different Lists. Repeatedly Getting elements from
a List is expensive, because it creates a new cursor internally every
time. Adding IteratorAt(idx uint64) allows code to iterate over a
range of a List while only creating a cursor once. This allows a slice
of Values to be extracted or used in comparisons efficiently.
Toward #2445
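A hedged sketch of the intended usage (iterator method names assumed):

```go
// Given a types.List l: one cursor for the whole range, not one per Get.
it := l.IteratorAt(10)
vals := make([]types.Value, 0, 10)
for i := 0; i < 10; i++ {
	vals = append(vals, it.Next())
}
```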
For compound types (List, Set, Map, Ref) the concrete type may be a
union. If that is the case, all the types in the union must be a
subtype of the concrete type's element type.
`C<T | V>` is a subtype of `C<S>` iff T is a subtype of S and V is a
subtype of S.
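For instance, in a hedged sketch against the Go types package (IsSubtype's argument order is assumed):

```go
sub := types.MakeListType(types.MakeUnionType(types.NumberType, types.StringType))
super := types.MakeListType(types.MakeUnionType(types.NumberType, types.StringType, types.BoolType))
// Number and String are each subtypes of Number | String | Bool, so:
fmt.Println(types.IsSubtype(super, sub)) // expected: true
```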
Fixes are based on Go report card output:
- `gofmt -s` eliminates some duplication in struct/slice initialization
- `golint` found some issues like: `warning: should drop = nil from declaration of var XXX; it is the zero value`
- `golint` found some issues like: `warning: receiver name XXX should be consistent with previous receiver name YYY for ZZZ`
- `golint` says not to use underscores for function/variable names
- `golint` found several issues like: `warning: if block ends with a return statement, so drop this else and outdent its block`
No functional changes are included - just source code quality improvements.
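For example, the last rewrite looks like this:

```go
// before:
if err != nil {
	return err
} else {
	process()
}

// after: drop the else and outdent its block
if err != nil {
	return err
}
process()
```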