Commit Graph

162 Commits

Author SHA1 Message Date
cmasone-attic 22d8e175f7 Modify httpBatchStore so that writing values maintains some locality (#2983)
NBS benefits from related chunks being near one another. Initially,
let's use write-order as a proxy for "related".

This patch contains a pretty heinous hack to allow sync to continue
putting chunks into httpBatchStore top-down without breaking
server-side validation. Work to fix this is tracked in #2982

This patch fixes #2968, at least for now

* Introduces PullWithFlush() to allow noms sync to explicitly
pull chunks over and flush directly after. This allows UpdateRoot
to behave as before.

Also clears out all the legacy batch-put machinery. Now, Flush()
just directly calls sendWriteRequests().
2016-12-23 11:48:42 -08:00
Rafael Weinstein d2289219c3 Don't flush for every batch during VBS.Enqueue (#2955) 2016-12-14 16:36:12 -08:00
Rafael Weinstein 335454b34c ChunkSink.Flush() (#2937)
Add ChunkSink.Flush() which signals the ChunkSink that any previously Put chunks should be made durable.
2016-12-12 15:39:13 -08:00
Erik Arvidsson 80d4894fcc JS: Fix walk
The promise was resolving too early.
2016-12-09 18:51:49 -08:00
Aaron Boodman fe59ae9504 Revert "Fix JS walk handling for values of type Type (#2913)" (#2931)
This reverts commit 2717908d2b.
2016-12-09 17:43:26 -08:00
Erik Arvidsson 2717908d2b Fix JS walk handling for values of type Type (#2913)
Fixes #2911
2016-12-08 16:38:10 -08:00
Ben Kalman dc289245cb Add path @at for positional indexing, and negative indices (#2910)
This lets you do foo.bar@at(n) to get the nth element of a list, set, or
map (for lists, this is equivalent to foo.bar[n]). This patch also adds
support for negative indices to @at(-n) and [-n] to get the nth element
relative to the back of collections.
2016-12-05 14:43:41 -08:00
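The index handling described for `@at(n)` and `[-n]` in the commit above can be sketched as follows. This is a minimal illustration rather than the Noms implementation: `resolveIndex` is a hypothetical helper, and the sketch assumes that -1 refers to the last element of a collection.

```go
package main

import "fmt"

// resolveIndex is a hypothetical helper: it maps a possibly negative index
// onto a position in a collection of the given length. Negative values count
// from the back (assumed here: -1 is the last element). ok is false when the
// index falls outside the collection.
func resolveIndex(idx int64, length uint64) (pos uint64, ok bool) {
	if idx >= 0 {
		pos = uint64(idx)
		return pos, pos < length
	}
	back := uint64(-idx) // distance from the end, at least 1
	if back > length {
		return 0, false
	}
	return length - back, true
}

func main() {
	items := []string{"a", "b", "c"}
	for _, idx := range []int64{0, 2, -1, -3, 3, -4} {
		if pos, ok := resolveIndex(idx, uint64(len(items))); ok {
			fmt.Printf("%d -> %s\n", idx, items[pos])
		} else {
			fmt.Printf("%d -> out of range\n", idx)
		}
	}
}
```

Resolving the index once, before touching the collection, keeps the bounds check in a single place for lists, sets, and maps alike.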
Ben Kalman 0a10704149 Implement Set.At and Map.At in Go (#2903)
These get the set/map element at a specific index.
I haven't implemented it in JS yet because the JS code has no method to
create a cursor at an index. This exists in Go because a refactor was
done a few months ago to add it, but it hasn't been ported to JS.
2016-12-04 11:27:35 -08:00
Rafael Weinstein a67bb9bf7b Minor rework of hash.Hash API (#2888)
Define the hash.Hash type to be a 20-byte array, rather than embed one. Hash API Changes: `hash.FromSlice` -> `hash.New`, `hash.FromData` -> `hash.Of`
2016-12-02 12:11:00 -08:00
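A rough sketch of what defining the hash type as a 20-byte array (instead of a struct that embeds one) looks like in Go. The names `New` and `Of` come from the commit message; their signatures here are assumptions, and SHA-1 is only a stand-in chosen because its digest happens to be 20 bytes. It is not necessarily the function Noms uses.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// ByteLen is the size of a hash in bytes, per the commit message.
const ByteLen = 20

// Hash is defined directly as an array, so values are comparable with ==
// and usable as map keys without reaching into an embedded field.
type Hash [ByteLen]byte

// New copies a slice of exactly ByteLen bytes into a Hash (signature assumed).
func New(data []byte) Hash {
	if len(data) != ByteLen {
		panic("hash: input must be exactly 20 bytes")
	}
	var h Hash
	copy(h[:], data)
	return h
}

// Of hashes arbitrary data. SHA-1 is a stand-in, not a claim about Noms.
func Of(data []byte) Hash {
	return Hash(sha1.Sum(data))
}

func main() {
	a := Of([]byte("abc"))
	b := New(a[:])
	fmt.Println(a == b) // array types compare by value

	seen := map[Hash]bool{a: true}
	fmt.Println(seen[b]) // and work directly as map keys
}
```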
Aaron Boodman b221caabea Add @type support to paths in Go (#2860)
Add @type support to paths in Go
2016-11-29 18:13:29 -08:00
Ben Kalman b86b18f395 Add MustParsePath (#2851) 2016-11-23 15:24:08 -08:00
Erik Arvidsson 5e901b0924 Cache OIDs as we descend (#2840)
Remove validation/normalization of union order and struct field order as we decode a chunk into a type.

Instead the validation happens in ValidatingBatchSink.

We still normalize the union order when a struct type is created directly (not from a chunk) using makeStructType.

The motivation for this change is that computing the OID (order ID) is expensive, and it used to be O(n^2) since we kept recomputing it as we traversed the type hierarchy.

Towards #2836
2016-11-21 15:18:02 -08:00
Eric Halpern b29e50379f Reduce time required by sequence_iterator_test by using smaller chunk sizes (#2831)
* Reduce time required by sequence_iterator_test by using smaller chunk
sizes

* Simplify data generation
2016-11-16 11:26:43 -08:00
Eric Halpern 94a61c6aad Better test fix. Need to obtain raw bytes before reading from buffer (#2830) 2016-11-14 11:55:26 -08:00
Eric Halpern 3da7461480 Fix test break (#2829) 2016-11-14 11:13:55 -08:00
Eric Halpern 242b782748 Improve sequence read performance using read-ahead (#2711)
* Implement read-ahead in sequence_cursor
For each meta-sequence that contains leaf sequences, start reading ahead in
parallel and deliver in order to a buffered channel. Each advance of the cursor gets
the next sequence in the read-ahead channel.

toward: #2079

* Address code review comments:
- Use // for all comments
- Fix label format
- Increase channel read timeout

* Rework read-ahead to use map[int]channel sequence instead of a channel of sequences

* Rework sequence cursor read-ahead for better throughput

- Guts of read-ahead now encapsulated in sequenceReadAhead
- New implementation uses a cursor to iterate across the leaves ahead
  of the current cursor
  - It reads ahead using short-lived goroutines that place each read-ahead
    sequence in a channel that is then stored by hash in a map
  - When the sequence is needed, the cursor first looks in the map. If found,
    it reads the sequence from the channel stored in the map. If not, it reads
    it normally.
  - This approach allows for reading ahead in parallel without requiring a long
    running pool of goroutines
- Introduce sequenceIterator to encapsulate read-ahead behind an abstraction that
  always reads forward. This is currently used narrowly but could be used more
  widely as the core implementation for all sequence iterators

* Address review comments
2016-11-12 11:51:26 -08:00
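The map-of-channels read-ahead described in the commit above can be sketched roughly like this. All names (`readAhead`, `prefetch`, `get`, `loadFn`) are hypothetical, the "sequence" is a toy slice, and the sketch assumes a single cursor goroutine owns the map, so it is not safe for concurrent callers.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// sequence and loadFn are hypothetical stand-ins for a leaf sequence and the
// (slow) function that reads one from storage by hash.
type sequence []int
type loadFn func(hash string) sequence

// readAhead holds one buffered channel per prefetched hash.
type readAhead struct {
	load    loadFn
	pending map[string]chan sequence
}

func newReadAhead(load loadFn) *readAhead {
	return &readAhead{load: load, pending: map[string]chan sequence{}}
}

// prefetch starts a short-lived goroutine per hash. Each goroutine delivers
// its result to a channel of capacity 1 and exits, so no worker pool is needed.
func (r *readAhead) prefetch(hashes ...string) {
	for _, h := range hashes {
		if _, ok := r.pending[h]; ok {
			continue // already being read ahead
		}
		ch := make(chan sequence, 1)
		r.pending[h] = ch
		go func(h string) { ch <- r.load(h) }(h)
	}
}

// get returns the prefetched sequence if one is pending, and otherwise
// falls back to a normal synchronous read.
func (r *readAhead) get(h string) sequence {
	if ch, ok := r.pending[h]; ok {
		delete(r.pending, h)
		return <-ch
	}
	return r.load(h)
}

func main() {
	var loads int64
	ra := newReadAhead(func(h string) sequence {
		atomic.AddInt64(&loads, 1)
		return sequence{len(h)}
	})
	ra.prefetch("aa", "bbb")
	fmt.Println(ra.get("aa"), ra.get("bbb"), ra.get("c"))
	fmt.Println("loads:", atomic.LoadInt64(&loads))
}
```

Because each channel has capacity 1, a prefetching goroutine never blocks on delivery, even if the cursor never asks for that sequence.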
Ben Kalman 172b991ac1 Port new Spec API to Go from JS (#2807)
This is a side-by-side port, taking inspiration from the old dataspec.go
code. Notably:

- LDB support has been added in Go. It wasn't needed in JS.
- There is an Href() method on Spec now.
- Go now handles IPV6.
- Go no longer treats access_token specially.
- Go now has Pin.
- I found some issues in the JS while doing this; I'll fix them later.

I've also updated the config code to use the new API so that basically
all the Go samples use the code, even if they don't really change.
2016-11-08 14:18:47 -08:00
cmasone-attic 12ddb66fc5 Clobber ValueStore cache entry on WriteValue (#2804)
ValueStore caches Values that are read out of it, but it doesn't
do the same for Values that are written. This is because we expect
that reading Values shortly after writing them is an uncommon usage
pattern, and because the Chunks that make up novel Values are
generally efficiently retrievable from the BatchStore that backs
a ValueStore. The problem discovered in issue #2802 is that ValueStore
caches non-existence as well as existence of read Values. So, reading
a Value that doesn't exist in the DB would result in the ValueStore
permanently returning nil for that Value -- even if you then go and
write it to the DB.

This patch drops the cache entry for a Value whenever it's written.

Fixes #2802
2016-11-04 15:53:26 -07:00
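The failure mode and fix described in the commit above can be reproduced with a toy cache. This is a sketch under simplifying assumptions: `store` is a hypothetical stand-in for ValueStore, keyed by string rather than by hash, with plain maps for both the cache and the backing store.

```go
package main

import "fmt"

// store is a toy stand-in for ValueStore: a backing map plus a read cache
// that remembers misses (nil entries) as well as hits.
type store struct {
	backing map[string]*string
	cache   map[string]*string
}

func newStore() *store {
	return &store{backing: map[string]*string{}, cache: map[string]*string{}}
}

// Read consults the cache first. A cached nil means "known to be absent".
func (s *store) Read(key string) *string {
	if v, ok := s.cache[key]; ok {
		return v
	}
	v := s.backing[key] // nil when absent
	s.cache[key] = v
	return v
}

// Write stores the value and drops any cache entry for the key, so a
// previously cached miss cannot mask the newly written value.
func (s *store) Write(key, val string) {
	s.backing[key] = &val
	delete(s.cache, key)
}

func main() {
	s := newStore()
	fmt.Println(s.Read("k") == nil) // miss, and the miss is now cached
	s.Write("k", "v")
	fmt.Println(*s.Read("k")) // visible because Write clobbered the entry
}
```

Without the `delete` in `Write`, the second `Read` would keep returning nil, which is the behavior the commit message attributes to issue #2802.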
Dan Willhite 46586ee928 Remove msg args from d.PanicIfTrue and d.PanicIfFalse. (#2757)
Should discourage people from writing code that does unnecessary work
to generate a msg every time that an error condition is checked. Fixes #2741
2016-11-03 11:43:57 -07:00
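The cost the commit above is concerned with comes from Go evaluating call arguments eagerly. The sketch below uses hypothetical lowercase helpers (`panicIfTrueMsg`, `panicIfTrue`, `describe`) rather than the real `d` package, and simply counts how often a message is built when the check never fails.

```go
package main

import "fmt"

var built int // counts how many times a message was formatted

// describe stands in for any non-trivial message construction.
func describe(n int) string {
	built++
	return fmt.Sprintf("unexpected value: %d", n)
}

// panicIfTrueMsg mimics a check that accepts a message argument.
// The caller pays for building msg even when cond is false.
func panicIfTrueMsg(cond bool, msg string) {
	if cond {
		panic(msg)
	}
}

// panicIfTrue mimics a check without message arguments.
func panicIfTrue(cond bool) {
	if cond {
		panic("expected condition to be false")
	}
}

func main() {
	for i := 0; i < 1000; i++ {
		panicIfTrueMsg(i < 0, describe(i)) // message built on every iteration
	}
	fmt.Println("messages built without any failure:", built)

	for i := 0; i < 1000; i++ {
		panicIfTrue(i < 0) // nothing to build
	}
}
```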
Dan Willhite 1cd34e0ebd Add Append() method to types.Path (#2783) 2016-10-29 12:37:50 -07:00
Erik Arvidsson 2bd617ade8 Add delete field to structs (#2779) 2016-10-28 15:22:37 -07:00
Erik Arvidsson 0eb940e50a First cut at noms migrate (#2594)
This iterates over all the values of the old version and creates new
values of the new version.

Closes #2428
Fixes #2272
2016-10-21 15:16:29 -07:00
Ben Kalman 007ba18987 Use the same cursor when initializing and finalizing the chunker (#2729)
Previously we would clone them from the original cursor, to (a) not
modify the original cursor, and (b) have initialization and finalization
not interfere with each other.

However, this isn't necessary and it just creates unnecessary churn. For
example, when we read-ahead, it would be wasteful to re-read the
read-ahead chunks from initialization.
2016-10-20 16:04:03 -07:00
Dan Willhite e1e143a27a Extract print functionality from Diff function. (#2722)
The Diff function now returns Difference objects that can be
used in different contexts (e.g. print_diff)

Towards #609
2016-10-19 16:44:37 -07:00
Erik Arvidsson e164f8aeec Flow header after copyright (#2734)
This puts the flow header after the copyright header.

It also:
 * fixes the existing files to have valid headers
 * Makes sure the script can handle doctype
2016-10-19 11:36:48 -07:00
Erik Arvidsson b05a857d99 Handle NaN and Infinity in number encoding (#2731)
We were hitting infinite loops for these non-finite numbers.
2016-10-18 16:49:35 -07:00
cmasone-attic 3afdbd99c4 Enhance TestValidatingBatchingSinkPrepare (#2705)
The test now actually validates that processing the provided
hints populates the VBS' chunk cache
2016-10-13 13:34:59 -07:00
cmasone-attic c9c1bb9ff5 Add concurrency to use of ValidatingBatchingSink (#2684)
There are two places where ValidatingBatchingSink could be more
concurrent: Prepare(), where it's reading in hints, and Enqueue().

Making Prepare() handle many hints concurrently is easy because the
hints don't depend on one another, so that method now just spins up
a number of goroutines and runs them all at once.

Enqueue() is more complex, because while Chunk decoding and validation
of its hash can proceed concurrently, validating that a given Chunk is
'ref-complete' requires that the chunks in the writeValue payload all
be processed in order. So, this patch uses orderedparallel to run the
new Decode() method on chunks in parallel, but then return to serial
operation before calling the modified Enqueue() method.

Fixes #1935
2016-10-10 15:33:35 -07:00
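The "decode in parallel, then return to serial operation" pattern from the commit above can be sketched as below. The API of `orderedparallel` is not shown in the message, so this sketch swaps in a plain `sync.WaitGroup` with an index-addressed result slice; `decodeAll` and the string "chunks" are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// decodeAll runs decode on every chunk concurrently and returns the results
// in input order, regardless of which goroutine finishes first.
func decodeAll(chunks []string, decode func(string) string) []string {
	out := make([]string, len(chunks))
	var wg sync.WaitGroup
	for i, c := range chunks {
		wg.Add(1)
		go func(i int, c string) {
			defer wg.Done()
			out[i] = decode(c) // each goroutine writes only its own slot
		}(i, c)
	}
	wg.Wait()
	return out
}

func main() {
	decoded := decodeAll([]string{"a", "b", "c"}, strings.ToUpper)
	// Serial phase: order-dependent work (such as checking that a chunk is
	// ref-complete) happens here, one chunk at a time, in payload order.
	for _, d := range decoded {
		fmt.Println("enqueue", d)
	}
}
```

Writing to distinct slice elements from different goroutines is safe in Go, which is what lets the parallel phase preserve order without any extra sorting.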
Erik Arvidsson 239b02cfd5 Allow struct set to cause a type change (#2542)
This allows setting a field in a struct to a value of a new type, or
setting a non-existent field in a struct.

In JS this is done through the StructMirror.p.set and in Go this is
done through Struct Set.

Fixes #2181
2016-10-07 12:38:29 -07:00
zcstarr 40b28f94e5 Refactor Chunks and ChildValues API to work iteratively (#2599)
* Refactors the Chunks and ChildValues API to be iterative. This change
also exposes WalkValues, which replaces SomeP and AllP.
2016-09-30 16:53:00 -07:00
cmasone-attic c250406a0e LocalDatabase vends separate validating BatchStore (#2624)
This patch modifies LocalDatabase so that it no longer swaps out
its embedded ValueStore during Pull(). The reason it was doing this
is that Pull() injects chunks directly into a Database, without
doing any work on its own to ensure correctness. For LocalDatabase,
WriteValue() performs de-facto validation as it goes, so it does not
need this additional validation in the general case. To address the
former without impacting the latter, we were making LocalDatabase
swap out its ValueStore() during Pull(), replacing it with one that
performs validation.

This led to inconsistencies, seen in issue #2598. Collections read
from the DB _before_ this swap could have weird behavior if they
were modified and written after this swap.

The new code just lazily creates a BatchStore for use during Pull().

Fixes #2598
2016-09-27 14:24:57 -07:00
Ben Kalman 35d88dd3c6 Implement Blob.Concat and make NewBlob parallel
Blob.Concat is a simple use of the sequence concat code that List.Concat uses.
NewBlob uses Blob.Concat to construct a Blob in parallel.

Perf tests for parallel NewBlob write N temporary files then construct a Blob
from them, so there is some I/O, but it appears to be mostly CPU bound.  NewBlob
doesn't get much more than 50% faster with any P >= 2.
2016-09-27 11:08:31 -07:00
Aaron Boodman 362a5630d9 Add photo-index: a simple photo indexer. For now only indexes by tag. (#2610)
Add photo-index: a simple photo indexer. For now only indexes by tag.

Will add indexing by face/geo in subsequent patches.
2016-09-27 10:50:37 -07:00
cmasone-attic 2e462b11a5 Make Database a mutable API that vends immutable Datasets (#2617)
Noms SDK users frequently shoot themselves in the foot because they're
holding onto an "old" Database object. That is, they have a Database
tucked away in some internal state, they call Commit() on it, and
don't replace the object in their internal state with the new Database
returned from Commit.

This PR changes the Database and Dataset Go API to be in line with the
proposal in Issue #2589. JS follows in a separate patch.
2016-09-26 12:18:14 -07:00
Dan Willhite 1b256fa72a Implement GraphBuilder (#2548) 2016-09-20 16:04:28 -07:00
Dan Willhite 1a7bfd0627 Cleanup Prefix and MaxLine writers (#2591)
These were previously intertwined into one writer that was
embedded in and only usable by the 'noms' command.
This commit separates them into two separate writers that
can be used independently or combined. I also moved them
into go/utils/writers so they can be used by other code.

The main impetus to do this was to fix Bug #2593.
2016-09-20 10:31:16 -07:00
Erik Arvidsson e3bea0f274 Tweak the display of the type of an empty Map (#2590)
Fixes #2247
Closes #2252
2016-09-16 17:48:47 -07:00
Ben Kalman 068eb0cb71 Go sequence interface/struct hierarchy refactor (#2583)
This makes a number of changes to simplify code:
1. metaSequence is now a struct, not an interface
2. orderedMetaSequence and indexedMetaSequence are gone
3. metaSequenceObject is gone
4. indexedSequence is gone
5. add leafSequence struct for leaf sequences to embed

Everything but change 5 was done by rafael@atticlabs.io.
2016-09-16 11:18:41 -07:00
cmasone-attic 96cc9ffb1c Add FindCommonAncestor for Commits (#2579)
Once we integrate noms-merge into the `noms commit` command, this
function will allow us to stop requiring users to pass in the common
ancestor to be used when merging. The code can just find it and merge
away.

Toward #2535
2016-09-15 16:00:52 -07:00
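To illustrate what finding a common ancestor involves, here is a deliberately simple sketch: collect every ancestor of one commit, then walk breadth-first from the other until a collected commit is reached. This is a swapped-in textbook approach, not the algorithm used by Noms' FindCommonAncestor, and the `commit` type is hypothetical.

```go
package main

import "fmt"

// commit is a hypothetical minimal commit: an id plus parent links.
type commit struct {
	id      string
	parents []*commit
}

// findCommonAncestor returns an ancestor shared by a and b (either commit may
// itself be the answer), or nil if the two histories are disjoint.
func findCommonAncestor(a, b *commit) *commit {
	// Phase 1: record a and everything reachable from it.
	seen := map[*commit]bool{}
	queue := []*commit{a}
	for len(queue) > 0 {
		c := queue[0]
		queue = queue[1:]
		if seen[c] {
			continue
		}
		seen[c] = true
		queue = append(queue, c.parents...)
	}
	// Phase 2: walk breadth-first from b; the first recorded commit wins.
	visited := map[*commit]bool{}
	queue = []*commit{b}
	for len(queue) > 0 {
		c := queue[0]
		queue = queue[1:]
		if seen[c] {
			return c
		}
		if visited[c] {
			continue
		}
		visited[c] = true
		queue = append(queue, c.parents...)
	}
	return nil
}

func main() {
	root := &commit{id: "root"}
	base := &commit{id: "base", parents: []*commit{root}}
	left := &commit{id: "left", parents: []*commit{base}}
	right := &commit{id: "right", parents: []*commit{base}}
	fmt.Println(findCommonAncestor(left, right).id)
}
```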
Ben Kalman da336c3aab Introduce List.Concat to the Go API (#2550)
It exploits the chunked structure of Lists to allow concatenating
arbitrarily large Lists.

I've added both functional and perf tests for Concat, and the perf tests
only made sense with tests for building and reading Lists, so now we
have those too.
2016-09-14 14:15:14 -07:00
Erik Arvidsson 5edf89cf3d Replace d.Chk.True with d.PanicIfFalse (#2563)
And same for d.Chk.False
2016-09-14 13:11:28 -07:00
Erik Arvidsson 8fbcb58ca7 Add jsdoc/godoc for List (#2554)
Towards #2297
2016-09-14 11:16:42 -07:00
Erik Arvidsson 679305dad3 Add jsdoc/godoc for Hash and Value (#2553)
Towards #2297
2016-09-13 17:55:19 -07:00
cmasone-attic 652613d09e Go: Add List.IteratorAt() and use in list merging (#2512)
When merge.ThreeWay() merges list splices, it frequently needs to
extract a slice of a List or check whether a given range of values is
exactly equal in two different Lists. Repeatedly Getting elements from
a List is expensive, because it creates a new cursor internally every
time. Adding IteratorAt(idx uint64) allows code to iterate over a
range of a List while only creating a cursor once. This allows a slice
of Values to be extracted or used in comparisons efficiently.

Toward #2445
2016-09-06 13:50:08 -07:00
Erik Arvidsson 061f7694ad JS: Fix subtype issue with unions as elem types (#2518)
For compound types (List, Set, Map, Ref) the concrete type may be a
union. If that is the case all the types in the union must be a
subtype of the concrete type's element type.

`C<T | V>` is a subtype of `C<S>` iff T is a subtype of S and V is a
subtype of S.
2016-09-06 13:38:51 -07:00
Mike Gray 47565f39d1 Improve code based on tool analysis feedback (#2521)
Fixes are based on Go report card output:
- `gofmt -s` eliminates some duplication in struct/slice initialization
- `golint` found some issues like: `warning: should drop = nil from declaration of var XXX; it is the zero value`
- `golint` found some issues like: `warning: receiver name XXX should be consistent with previous receiver name YYY for ZZZ`
- `golint` says not to use underscores for function/variable names
- `golint` found several issues like: `warning: if block ends with a return statement, so drop this else and outdent its block`

No functional changes are included - just source code quality improvements.
2016-09-06 16:35:25 -04:00
Erik Arvidsson 017d599b1d Fix subtype issue with unions as elem types (#2506)
For compound types (List, Set, Map, Ref) the concrete type may be a
union. If that is the case all the types in the union must be a
subtype of the concrete type's element type.

`C<T | V>` is a subtype of `C<S>` iff T is a subtype of S and V is a
subtype of S.
2016-09-01 18:40:51 -07:00
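The rule quoted in the commit above, that `C<T | V>` is a subtype of `C<S>` iff both T and V are subtypes of S, can be checked with a small recursive function. The `typ` model below is hypothetical and far simpler than Noms' type system; it only distinguishes a kind name and a list of element or member types.

```go
package main

import "fmt"

// typ is a hypothetical, minimal type description. For "Union", elems holds
// the member types; for compound kinds such as "List", elems holds the
// element type(s); primitives have no elems.
type typ struct {
	kind  string
	elems []*typ
}

func prim(kind string) *typ     { return &typ{kind: kind} }
func union(members ...*typ) *typ { return &typ{kind: "Union", elems: members} }
func list(elem *typ) *typ       { return &typ{kind: "List", elems: []*typ{elem}} }

// isSubtype reports whether t can be used where s is required.
func isSubtype(t, s *typ) bool {
	if t.kind == "Union" {
		// Every member of a concrete union must fit the required type.
		for _, m := range t.elems {
			if !isSubtype(m, s) {
				return false
			}
		}
		return true
	}
	if s.kind == "Union" {
		// A non-union type fits a required union if it fits any member.
		for _, m := range s.elems {
			if isSubtype(t, m) {
				return true
			}
		}
		return false
	}
	if t.kind != s.kind || len(t.elems) != len(s.elems) {
		return false
	}
	for i := range t.elems {
		if !isSubtype(t.elems[i], s.elems[i]) {
			return false
		}
	}
	return true
}

func main() {
	num, str := prim("Number"), prim("String")
	fmt.Println(isSubtype(list(union(num, str)), list(union(num, str))))
	fmt.Println(isSubtype(list(union(num, str)), list(num)))
	fmt.Println(isSubtype(list(num), list(union(num, str))))
}
```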
zcstarr aeb5c42bcc Add special encoding to csv imported struct fields (#2441)
CSV importing is changed to strip invalid characters from CSV fields
and to camel-case across spaces, e.g. "ca-mel case" is translated to "camelCase".
2016-08-30 14:59:10 -07:00
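A minimal sketch of the field-name cleanup described in the commit above. The helper name `camelCaseField` is hypothetical, and the set of valid characters (ASCII letters, digits, and underscore) is an assumption; the commit message does not say which characters the importer treats as invalid.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isValid reports whether r may appear in a field name.
// Assumed here: ASCII letters, digits, and underscore.
func isValid(r rune) bool {
	return r == '_' ||
		(r >= 'a' && r <= 'z') ||
		(r >= 'A' && r <= 'Z') ||
		(r >= '0' && r <= '9')
}

// camelCaseField drops invalid characters and joins space-separated words,
// upper-casing the first kept character of every word after the first.
func camelCaseField(name string) string {
	var b strings.Builder
	upperNext := false
	for _, r := range name {
		switch {
		case r == ' ':
			// A space starts a new word, except before the first character.
			upperNext = b.Len() > 0
		case isValid(r):
			if upperNext {
				r = unicode.ToUpper(r)
				upperNext = false
			}
			b.WriteRune(r)
		}
		// Any other rune is dropped.
	}
	return b.String()
}

func main() {
	fmt.Println(camelCaseField("ca-mel case"))
}
```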
Aaron Boodman 24c99ae3b5 Introduce poke: a simple tool for modifying Noms data (#2449)
Introduce poke: a simple tool for modifying Noms data
2016-08-30 00:24:38 -07:00
Mike Gray 2f66e67763 fixing misspellings, fixing IneffAssign reported issues (#2436)
also removing encode-perf-rig since codec-perf-rig is more current and real
2016-08-25 13:32:34 -04:00