Commit Graph

64 Commits

Author SHA1 Message Date
Ben Kalman
03b7221c36 Use stretchr/testify not attic-labs/testify (#3677)
stretchr has fixed a bug with the -count flag. I could merge these
changes into attic-labs, but it's easier to just use stretchr.

We forked stretchr a long time ago so that we didn't link in the HTTP
testing libraries into the noms binaries (because we were using d.Chk in
production code). The HTTP issue doesn't seem to happen anymore, even
though we're still using d.Chk.
2017-09-07 15:01:03 -07:00
cmasone-attic
4e7db2cd49 Fix httpChunkStore race between Close() and Has/Get (#3571)
In httpChunkStore, calls to Get/Has and friends put a request object
with a 'return channel' onto a queue channel, and then block on the
return channel. The queue channel was buffered, which made it
impossible to cause Get, Has et al to terminate reliably when the
store was closed.

This patch removes the buffering on the channel so we can
deterministically bail from Get/Has et al when closing the store. I
don't think we were actually seeing any benefit from the buffer on the
queue channels, because everywhere we write to one of them we
immediately block on another channel, waiting for the result of the
request.
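
A minimal sketch of the pattern described above, not the actual httpChunkStore code: the store type, its fields, and errClosed are hypothetical names. Each caller selects between handing its request to an unbuffered queue channel and observing a closed done channel, so once Close() has returned a request can never sit unanswered in a buffer.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errClosed is a hypothetical error returned to callers that lose the race with Close().
var errClosed = errors.New("store closed")

// hasRequest carries a key plus the "return channel" for the answer.
type hasRequest struct {
	key   string
	reply chan bool
}

// store is a toy stand-in for a chunk store with a request queue.
type store struct {
	requests chan hasRequest // unbuffered: a send succeeds only if the worker takes it
	done     chan struct{}   // closed by Close()
	wg       sync.WaitGroup
	data     map[string]bool // touched only by the worker goroutine after construction
}

func newStore(keys ...string) *store {
	s := &store{
		requests: make(chan hasRequest),
		done:     make(chan struct{}),
		data:     map[string]bool{},
	}
	for _, k := range keys {
		s.data[k] = true
	}
	s.wg.Add(1)
	go s.serve()
	return s
}

// serve answers requests until the store is closed.
func (s *store) serve() {
	defer s.wg.Done()
	for {
		select {
		case req := <-s.requests:
			req.reply <- s.data[req.key]
		case <-s.done:
			return
		}
	}
}

// Has either gets its request accepted (and then always receives a reply)
// or observes that the store is closed and bails out.
func (s *store) Has(key string) (bool, error) {
	req := hasRequest{key: key, reply: make(chan bool, 1)}
	select {
	case s.requests <- req:
		return <-req.reply, nil
	case <-s.done:
		return false, errClosed
	}
}

// Close stops the worker. Calling it twice would panic in this sketch.
func (s *store) Close() {
	close(s.done)
	s.wg.Wait()
}

func main() {
	s := newStore("abc")
	ok, err := s.Has("abc")
	fmt.Println(ok, err)
	s.Close()
	_, err = s.Has("abc")
	fmt.Println(err)
}
```

With a buffered queue, a send could succeed after the worker had exited, leaving the caller blocked on its return channel forever; the unbuffered send rules that out.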

Fixes #3566
2017-06-27 13:02:29 -07:00
Rafael Weinstein
3ff92950d8 Revert removal of |last| from Commit() (#3531) 2017-06-09 11:20:45 -07:00
cmasone-attic
0fd37ba20d Add chunks.Factory::CreateStoreFromCache() (#3529)
Add a method to chunks.Factory that allows the caller to
signal that it's willing to try to make forward progress
using an out-of-date ChunkStore. This allows AWSStoreFactory
and LocalStoreFactory to vend NBS instances without hammering
persistent storage every time.

Towards #3491
2017-06-09 08:52:22 -07:00
Rafael Weinstein
214054986b Enforce clearer concurrency semantics of ValueStore (#3527) 2017-06-08 11:40:22 -07:00
cmasone-attic
286555560c Add Stats() to Database (#3497)
This just returns interface{}, allowing underlying ChunkStore
implementations to return whatever kind of stats struct they want.

Fixes #3493
2017-05-23 09:55:01 -07:00
cmasone-attic
46cf38eaae Simplify Pull() (#3490)
In an NBS world, bulk 'has' checks are waaaay cheaper than they used
to be. In light of this, we can toss out the complex logic we were
using in Pull() -- which basically existed for no reason other than to
avoid doing 'has' checks. Now, the code basically just descends down a
tree of chunks breadth-first, using HasMany() at each level to figure
out which chunks are not yet in the sink all at once, and GetMany() to
pull them from the source in bulk.
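
The level-by-level descent described above can be sketched roughly as follows. This is an illustration under assumptions, not the Pull() implementation: chunk, memStore, hasMany and getMany are hypothetical toy types, and the sketch assumes that a chunk already in the sink has everything it references there too.

```go
package main

import "fmt"

// chunk is a toy chunk: an ID plus the IDs of the chunks it references.
type chunk struct {
	id   string
	refs []string
}

// memStore is a toy in-memory store keyed by chunk ID.
type memStore map[string]chunk

// hasMany reports which of ids are present, in one bulk call.
func (m memStore) hasMany(ids []string) map[string]bool {
	present := map[string]bool{}
	for _, id := range ids {
		if _, ok := m[id]; ok {
			present[id] = true
		}
	}
	return present
}

// getMany returns the chunks for those ids that are present.
func (m memStore) getMany(ids []string) []chunk {
	var out []chunk
	for _, id := range ids {
		if c, ok := m[id]; ok {
			out = append(out, c)
		}
	}
	return out
}

// pull copies every chunk reachable from root that the sink lacks and
// returns how many chunks were copied.
func pull(source, sink memStore, root string) int {
	copied := 0
	level := []string{root}
	for len(level) > 0 {
		present := sink.hasMany(level) // one bulk check per level
		seen := map[string]bool{}
		var absent []string
		for _, id := range level {
			if !present[id] && !seen[id] {
				seen[id] = true
				absent = append(absent, id)
			}
		}
		var next []string
		for _, c := range source.getMany(absent) { // one bulk fetch per level
			sink[c.id] = c
			copied++
			next = append(next, c.refs...)
		}
		level = next
	}
	return copied
}

func main() {
	source := memStore{
		"root": {id: "root", refs: []string{"a", "b"}},
		"a":    {id: "a", refs: []string{"c"}},
		"b":    {id: "b", refs: []string{"c"}},
		"c":    {id: "c"},
	}
	sink := memStore{"b": source["b"], "c": source["c"]}
	fmt.Println(pull(source, sink, "root"), len(sink))
}
```

Only "root" and "a" are fetched here; the subtree under "b" is skipped as soon as the bulk check finds "b" in the sink.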

Fixes #3182, Towards #3384
2017-05-22 15:50:12 -07:00
Rafael Weinstein
3cdb43146a Remove PutMany (#3448) 2017-05-03 10:40:24 -07:00
Rafael Weinstein
6e47be3899 make MemoryStoreFactory public (#3417) 2017-04-21 17:46:11 -07:00
cmasone-attic
fef871c1a7 ValueStore.Flush() no longer persists Chunks (#3416)
ValueStore.Flush() now Puts all Chunks buffered in the ValueStore
layer into the underlying ChunkStore. The Chunks are not persistent
at this point, not until and unless the caller calls Commit() on
the ChunkStore.

This patch also removes ChunkStore.Flush(). The same effect can be
achieved by calling ChunkStore.Commit() with the current Root for both
last and current.

NB: newTestValueStore is now private to the types package.
The logic is that, now, outside the types package, callers
need to hold onto the underlying ChunkStore if they want to
persist Chunks.

Toward #3404
2017-04-21 17:30:56 -07:00
cmasone-attic
16ef8884a7 Make MemoryStore come correct (#3406)
It's important that MemoryStore (and, by extension TestStore)
correctly implement the new ChunkStore semantics before we go
shifting around the Flush semantics like we want to do in #3404.

In order to make this a reality, I introduced a "persistence"
layer for MemoryStore called MemoryStorage, which can vend
MemoryStoreView objects that represent a snapshot of the
persistent storage and implement the ChunkStore contract.
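
One way to picture the storage/view split is the toy below. It is a sketch under assumptions, not the MemoryStorage code: the storage and view types, their fields, and the use of plain strings for hashes and chunk data are all hypothetical. Each view works against a snapshot, keeps novel chunks to itself until Commit(), and only sees other writers' work after Rebase().

```go
package main

import (
	"fmt"
	"sync"
)

// storage plays the role of the shared "persistent" layer.
type storage struct {
	mu     sync.Mutex
	chunks map[string]string
	root   string
}

func newStorage() *storage { return &storage{chunks: map[string]string{}} }

// snapshot copies the persisted chunks and root under the lock.
func (s *storage) snapshot() (map[string]string, string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	snap := make(map[string]string, len(s.chunks))
	for k, v := range s.chunks {
		snap[k] = v
	}
	return snap, s.root
}

// view is a snapshot of storage plus the chunks written through it.
type view struct {
	backing *storage
	snap    map[string]string // persisted chunks as of the last Rebase
	pending map[string]string // novel chunks: visible here, not yet persistent
	root    string            // cached root
}

func (s *storage) newView() *view {
	v := &view{backing: s, pending: map[string]string{}}
	v.Rebase()
	return v
}

// Rebase reloads the snapshot and the cached root from storage.
func (v *view) Rebase() { v.snap, v.root = v.backing.snapshot() }

// Root returns the cached root without consulting storage.
func (v *view) Root() string { return v.root }

// Put makes a chunk visible to this view only.
func (v *view) Put(key, data string) { v.pending[key] = data }

func (v *view) Has(key string) bool {
	if _, ok := v.pending[key]; ok {
		return true
	}
	_, ok := v.snap[key]
	return ok
}

// Commit persists pending chunks and moves the root from last to current,
// failing if storage's root is no longer last.
func (v *view) Commit(current, last string) bool {
	s := v.backing
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.root != last {
		return false
	}
	for k, d := range v.pending {
		s.chunks[k] = d
		v.snap[k] = d
	}
	v.pending = map[string]string{}
	s.root, v.root = current, current
	return true
}

func main() {
	st := newStorage()
	a, b := st.newView(), st.newView()
	a.Put("c1", "data")
	fmt.Println(a.Has("c1"), b.Has("c1"))
	fmt.Println(a.Commit("r1", ""), b.Commit("r2", ""))
	b.Rebase()
	fmt.Println(b.Has("c1"), b.Commit("r2", "r1"))
}
```

In this sketch a failed Commit() leaves the view untouched, so the caller has to Rebase() and retry; whether the real store rebases automatically on failure is not stated above.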

Fixes #3400

Removed Rebase() in HandleRootGet, and added ChunkStore
tests to validate the new Put behavior more fully
2017-04-21 14:13:52 -07:00
cmasone-attic
ff7cae6d34 Merge chunks.RootTracker interface into chunks.ChunkStore (#3408)
You can't fully specify RootTracker without referring to the
ChunkStore interface, so they should just merge.

Fixes #3402
2017-04-19 21:34:20 -07:00
cmasone-attic
dc41f18498 Code review changes from #3403 (#3405) 2017-04-19 13:34:12 -07:00
cmasone-attic
cb930dee81 Merge BatchStore into ChunkStore (#3403)
BatchStore is dead, long live ChunkStore! Merging these two required
some modification of the old ChunkStore contract to make it more
BatchStore-like in places, most specifically around Root(), Put() and
PutMany().

The first big change is that Root() now returns a cached value for the
root hash of the Store. This is how NBS worked already, so the more
interesting change here is the addition of Rebase(), which loads the
latest persistent root. Any chunks that appeared in backing storage
since the ChunkStore was opened (or last rebased) also become
visible.

UpdateRoot() has been replaced with Commit(), because UpdateRoot() was
ALREADY doing the work of persisting novel chunks as well as moving
the persisted root hash of the ChunkStore in both NBS and
httpBatchStore. This name, and the new contract (essentially Flush() +
UpdateRoot()), is a more accurate representation of what's going on.

As for Put(), the former contract claimed to block until the chunk
was durable. That's no longer the case. Indeed, NBS was already not
fulfilling this contract. The new contract reflects this, asserting
that novel chunks aren't persisted until a Flush() or Commit() --
which has replaced UpdateRoot(). Novel chunks are immediately visible
to Get and Has calls, however.

In addition to this larger change, there are also some tweaks to
ValueStore and Database. ValueStore.Flush() no longer takes a hash,
and instead just persists any and all Chunks it has buffered since the
last time anyone called Flush(). Database.Close() used to have some
side effects where it persisted Chunks belonging to any Values the
caller had written -- that is no longer so. Values written to a
Database only become persistent upon a Commit-like operation (Commit,
CommitValue, FastForward, SetHead, or Delete).

/******** New ChunkStore interface ********/

type ChunkStore interface {
     ChunkSource
     RootTracker
}

// RootTracker allows querying and management of the root of an entire tree of
// references. The "root" is the single mutable variable in a ChunkStore. It
// can store any hash, but it is typically used by higher layers (such as
// Database) to store a hash to a value that represents the current state and
// entire history of a database.
type RootTracker interface {
     // Rebase brings this RootTracker into sync with the persistent storage's
     // current root.
     Rebase()

     // Root returns the currently cached root value.
     Root() hash.Hash

     // Commit atomically attempts to persist all novel Chunks and update the
     // persisted root hash from last to current. If last doesn't match the
     // root in persistent storage, returns false.
     // TODO: is last now redundant? Maybe this should just try to update from
     // the cached root to current?
     // TODO: Does having a separate RootTracker make sense anymore? BUG 3402
     Commit(current, last hash.Hash) bool
}

// ChunkSource is a place chunks live.
type ChunkSource interface {
     // Get the Chunk for the value of the hash in the store. If the hash is
     // absent from the store nil is returned.
     Get(h hash.Hash) Chunk

     // GetMany gets the Chunks with |hashes| from the store. On return,
     // |foundChunks| will have been fully sent all chunks which have been
     // found. Any non-present chunks will silently be ignored.
     GetMany(hashes hash.HashSet, foundChunks chan *Chunk)

     // Returns true iff the value at the address |h| is contained in the
     // source
     Has(h hash.Hash) bool

     // Returns a new HashSet containing any members of |hashes| that are
     // present in the source.
     HasMany(hashes hash.HashSet) (present hash.HashSet)

     // Put caches c in the ChunkSink. Upon return, c must be visible to
     // subsequent Get and Has calls, but must not be persistent until a call
     // to Flush(). Put may be called concurrently with other calls to Put(),
     // PutMany(), Get(), GetMany(), Has() and HasMany().
     Put(c Chunk)

     // PutMany caches chunks in the ChunkSink. Upon return, all members of
     // chunks must be visible to subsequent Get and Has calls, but must not be
     // persistent until a call to Flush(). PutMany may be called concurrently
     // with other calls to Put(), PutMany(), Get(), GetMany(), Has() and
     // HasMany().
     PutMany(chunks []Chunk)

     // Returns the NomsVersion with which this ChunkSource is compatible.
     Version() string

     // On return, any previously Put chunks must be durable. It is not safe to
     // call Flush() concurrently with Put() or PutMany().
     Flush()

     io.Closer
}

Fixes #2945
2017-04-19 13:31:58 -07:00
cmasone-attic
192bdf6801 Remove DynamoStore (#3388)
We're no longer using this, and forthcoming changes to ChunkStore
mean that we'd have to do work to continue supporting it.
2017-04-14 13:41:38 -07:00
cmasone-attic
de76d37f09 Rip out hinting, reverse-order hack; make validation lazy (#3340)
* Add HasMany() to the ChunkStore interface

We'll need this as a part of #3180

* Rip out hinting

The hinting mechanism used to assist in server-side validation
of values served us well, but now it's in the way of building a
more suitable validation strategy. Tear it out and go without
validation for a hot minute until #3180 gets done.

Fixes #3178

* Implement server-side lazy ref validation

The server, when handling writeValue, now just keeps track of all the
refs it sees in the novel chunks coming from the client. Once it's
processed all the incoming chunks, it just does a big bulk HasMany to
determine if any of them aren't present in the storage backend.
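
As a rough illustration of the lazy strategy (a sketch only; chunk, backend, hasMany and writeChunks are hypothetical names, and the real handler works on hashes and a ChunkStore rather than strings and a map): the handler stores chunks as they arrive, remembers every ref it saw, and makes a single bulk presence check at the end.

```go
package main

import (
	"fmt"
	"sort"
)

// chunk is a toy chunk: an ID plus the IDs it references.
type chunk struct {
	id   string
	refs []string
}

// backend is a toy storage backend that only tracks which IDs it holds.
type backend map[string]bool

// hasMany returns the subset of ids that the backend holds.
func (b backend) hasMany(ids map[string]bool) map[string]bool {
	present := map[string]bool{}
	for id := range ids {
		if b[id] {
			present[id] = true
		}
	}
	return present
}

// writeChunks stores all incoming chunks, then validates every ref seen
// along the way with one bulk check. It returns the dangling refs, sorted.
func writeChunks(b backend, incoming []chunk) []string {
	seen := map[string]bool{}
	for _, c := range incoming {
		b[c.id] = true
		for _, r := range c.refs {
			seen[r] = true
		}
	}
	present := b.hasMany(seen)
	var missing []string
	for r := range seen {
		if !present[r] {
			missing = append(missing, r)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	b := backend{"x": true}
	missing := writeChunks(b, []chunk{
		{id: "p", refs: []string{"q", "x"}}, // parent arrives before child q
		{id: "q", refs: []string{"z"}},      // z is nowhere to be found
	})
	fmt.Println(missing)
}
```

Because the check happens after the whole batch has been processed, the order in which parent and child chunks arrive does not matter in this sketch.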

Fixes #3180

* Remove chunk-write-order requirements

With our old validation strategy, it was critical that
chunk graphs be written bottom-up, during both novel value
creation and sync. With the strategy implemented in #3180,
this is no longer required, which lets us get rid of a bunch
of machinery:

1) The reverse-order hack in httpBatchStore
2) the EnumerationOrder stuff in NomsBlockCache
3) the orderedPutCache in datas/
4) the refHeight arg on SchedulePut()

Fixes #2982
2017-04-06 16:54:40 -07:00
cmasone-attic
69c351affa Remove ChunkStore backpressure mechanism (#3278)
This was something that evolved from the way that Dynamo stores
data, and a way to allow clients to make incremental write
progress. We never actually made the clients handle it
properly, though, and so much has changed since we wrote it
that it's only going to be in the way of building something
better.

Fixes #3234
2017-03-17 12:54:58 -07:00
cmasone-attic
b1e918d1d4 Share s3, dynamodb clients (#3212)
These objects manage their own pools of HTTP connections and
other resources, so it's generally best to share them
process-wide if you can.

Fixes #3027
2017-02-22 13:41:23 -08:00
Rafael Weinstein
83b657fe62 Remove LevelDBStore (#3193)
2017-02-14 21:52:25 -08:00
Ben Kalman
b0927d852c gofmt -s -w (#3159) 2017-02-08 09:37:15 -08:00
Rafael Weinstein
759c36c96f Walk avoids blobs (#3074) 2017-01-13 16:32:40 -08:00
Rafael Weinstein
5fa5484f46 remove orderedparallel (#3050) 2017-01-10 15:45:05 -08:00
Aaron Boodman
a09ef6fb44 Revert "Introduce noms version 8. Use it to guard type simplification." (#3043) 2017-01-09 16:30:25 -08:00
cmasone-attic
ee2a5fa510 Make DeserializeToChan return an error, use it everywhere (#3042)
Chunk deserialization can run into errors sometimes if, e.g. the
client hangs up during a writeValue request. The old error strategy
worked by throwing a "catchable" error and recovering. That's OK if
you've only got one goroutine, but since the writeValue handler starts
so many goroutines, architecting the code to deal with error handling
by panic/recover is dicey.

Instead, make DeserializeToChan return an error in the more common
failure cases and handle it by passing it over a channel and raising
it from a central place.
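
The general shape of that approach, sketched with hypothetical names (decodeToChan and decodeAll are stand-ins, not the DeserializeToChan signature): each worker returns an ordinary error, forwards it over a channel, and a single caller decides what to do with it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// decodeToChan is a toy deserializer: it parses comma-separated integers,
// sends each one on out, and returns an error instead of panicking when
// the input is malformed.
func decodeToChan(payload string, out chan<- int) error {
	for _, tok := range strings.Split(payload, ",") {
		n, err := strconv.Atoi(tok)
		if err != nil {
			return fmt.Errorf("bad token %q: %w", tok, err)
		}
		out <- n
	}
	return nil
}

// decodeAll runs one goroutine per payload and surfaces the first reported
// error from a single, central place.
func decodeAll(payloads []string) (sum int, err error) {
	out := make(chan int)
	errs := make(chan error, len(payloads)) // buffered so workers never block on reporting
	var wg sync.WaitGroup
	for _, p := range payloads {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			if e := decodeToChan(p, out); e != nil {
				errs <- e
			}
		}(p)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	for n := range out {
		sum += n
	}
	// out is closed only after every worker has returned, so no further
	// sends on errs can happen and closing it here is safe.
	close(errs)
	for e := range errs {
		if err == nil {
			err = e
		}
	}
	return sum, err
}

func main() {
	fmt.Println(decodeAll([]string{"1,2", "3"}))
	_, err := decodeAll([]string{"1,x", "3"})
	fmt.Println(err != nil)
}
```

No goroutine needs a recover() of its own here; a failure in any worker reaches the caller as a plain return value.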
2017-01-09 13:58:44 -08:00
cmasone-attic
e7a96c3748 Add ValueStore.ReadManyValues() (#3036)
The more code can use GetMany(), the better performance gets on top of
NBS. To this end, add a call to ValueStore that allows code to read
many values concurrently. This can be used e.g. by read-ahead code
that's navigating prolly trees to increase performance.

Fixes #3019
2017-01-08 14:37:37 -08:00
Aaron Boodman
a4ffa5ba9b Introduce noms version 8. Use it to guard type simplification. (#3035)
2017-01-06 17:32:32 -08:00
cmasone-attic
14c20ebdd7 Make server use GetMany to load hinted chunks (#3026)
Now that we have GetMany, the server can use it directly to let the
chunk-fetching layer figure out the best way to batch up requests. A
small refactor allows ValidatingBatchingSink to directly update the
hint cache instead of relying on logic inside ReadValue to do it. I
made that change because ReadValue now also does a bunch of other
things around caching read values and checking a cache of 'pending
Puts' that will never have anything in it when used from the server.

Toward issue #3019
2017-01-05 10:59:26 -08:00
Rafael Weinstein
3242f18c20 [NBS] Implement Streaming GetMany (#3002)
Adds the ability to stream individual chunks requested via GetMany() back to caller.

Removes readAmpThresh and maxReadSize. Lowers the S3ReadBlockSize to 512k.
2017-01-03 12:25:01 -08:00
cmasone-attic
ca31583a08 Add new spec for nbs-aws (#2997)
The new spec is a URI, akin to what we use for HTTP. It allows the
specification of a DynamoDB table, an S3 bucket, a database ID, and a
dataset ID: aws://table-name:bucket-name/database::dataset

The bucket name is optional and, if not provided, Noms will use a
ChunkStore implementation backed only by DynamoDB.
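
For illustration, a spec of that shape could be split apart as below. This is a sketch, not the Noms spec parser: awsSpec and parseAWSSpec are hypothetical names, and treating the dataset part as optional is an assumption. It needs Go 1.18 or later for strings.Cut.

```go
package main

import (
	"fmt"
	"strings"
)

// awsSpec holds the parts of aws://table-name:bucket-name/database::dataset.
type awsSpec struct {
	Table, Bucket, Database, Dataset string
}

// parseAWSSpec splits a spec string by hand. net/url is avoided on purpose:
// it would try to read ":bucket-name" as a port number.
func parseAWSSpec(spec string) (awsSpec, error) {
	const prefix = "aws://"
	if !strings.HasPrefix(spec, prefix) {
		return awsSpec{}, fmt.Errorf("spec %q does not start with %s", spec, prefix)
	}
	authority, path, ok := strings.Cut(strings.TrimPrefix(spec, prefix), "/")
	if !ok {
		return awsSpec{}, fmt.Errorf("spec %q has no database part", spec)
	}
	var out awsSpec
	out.Table, out.Bucket, _ = strings.Cut(authority, ":") // bucket is optional
	out.Database, out.Dataset, _ = strings.Cut(path, "::") // dataset optional (assumption)
	if out.Table == "" || out.Database == "" {
		return awsSpec{}, fmt.Errorf("spec %q needs a table and a database", spec)
	}
	return out, nil
}

func main() {
	s, err := parseAWSSpec("aws://table-name:bucket-name/database::dataset")
	fmt.Printf("%+v %v\n", s, err)
	s, err = parseAWSSpec("aws://table-name/database::dataset")
	fmt.Printf("%+v %v\n", s, err)
}
```

An empty Bucket in the result corresponds to the DynamoDB-only case described above.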
2017-01-02 08:24:45 -08:00
Rafael Weinstein
335454b34c ChunkSink.Flush() (#2937)
Add ChunkSink.Flush() which signals the ChunkSink that any previously Put chunks should be made durable.
2016-12-12 15:39:13 -08:00
Rafael Weinstein
0652e0b3e0 Add ChunkSource.GetMany(); RemoteBatchStore getRefs uses GetMany() (#2933)
Add GetMany(), which most ChunkStores implement by repeated calls to their own Get(), but creates the opportunity for stores to optimize reads of larger blocks of potentially sequential chunks (e.g. NBS).

Add RemoteBatchStore getRefs endpoint support for calling GetMany() rather than Get()

Remove ReadThroughChunkStore which was dead code.
2016-12-12 11:18:22 -08:00
Rafael Weinstein
a67bb9bf7b Minor rework of hash.Hash API (#2888)
Define the hash.Hash type to be a 20-byte array, rather than embed one. Hash API Changes: `hash.FromSlice` -> `hash.New`, `hash.FromData` -> `hash.Of`
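
A minimal sketch of what an array-typed hash looks like, using the New and Of names from the message above. The function bodies are illustrative, and truncating a SHA-512 digest to 20 bytes is an assumption (suggested by the earlier "Change hash function to sha512" entries in this log, not confirmed here).

```go
package main

import (
	"crypto/sha512"
	"fmt"
)

// ByteLen is the digest length named in the commit message.
const ByteLen = 20

// Hash is the array itself rather than a struct that embeds one.
type Hash [ByteLen]byte

// New wraps an existing digest; it panics if the slice has the wrong length.
func New(digest []byte) Hash {
	if len(digest) != ByteLen {
		panic("hash: digest must be 20 bytes")
	}
	var h Hash
	copy(h[:], digest)
	return h
}

// Of computes the hash of data. Truncated SHA-512 is an assumption here.
func Of(data []byte) Hash {
	sum := sha512.Sum512(data)
	var h Hash
	copy(h[:], sum[:ByteLen])
	return h
}

func main() {
	h := Of([]byte("abc"))
	fmt.Println(len(h), h == New(h[:]))

	// Array types are comparable, so a Hash works directly as a map key.
	seen := map[Hash]bool{h: true}
	fmt.Println(seen[Of([]byte("abc"))], seen[Of([]byte("abd"))])
}
```

With the array as the type itself, callers can slice (h[:]), index and take len() of a Hash without going through an inner field.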
2016-12-02 12:11:00 -08:00
cmasone-attic
0cf72d5b85 Add debug logging to HandleWriteValue (#2846)
This patch introduces optional debug logging in util/verbose, and adds
some usage of it to HandleWriteValue and the httpBatchStore
SchedulePut code path. It also modifies chunks.DeserializeToChan() so
that callers can better recover from panics in there.

https://github.com/attic-labs/attic/issues/103
2016-11-21 15:11:34 -08:00
Dan Willhite
46586ee928 Remove msg args from d.PanicIfTrue and d.PanicIfFalse. (#2757)
Should discourage people from writing code that does unnecessary work
to generate a msg every time that an error condition is checked. Fixes #2741
2016-11-03 11:43:57 -07:00
cmasone-attic
c9c1bb9ff5 Add concurrency to use of ValidatingBatchingSink (#2684)
There are two places where ValidatingBatchingSink could be more
concurrent: Prepare(), where it's reading in hints, and Enqueue().

Making Prepare() handle many hints concurrently is easy because the
hints don't depend on one another, so that method now just spins up
a number of goroutines and runs them all at once.

Enqueue() is more complex, because while Chunk decoding and validation
of its hash can proceed concurrently, validating that a given Chunk is
'ref-complete' requires that the chunks in the writeValue payload all
be processed in order. So, this patch uses orderedparallel to run the
new Decode() method on chunks in parallel, but then return to serial
operation before calling the modified Enqueue() method.
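
The decode-in-parallel, enqueue-in-order shape can be sketched like this. It does not use the orderedparallel API; decodeInOrder and upper are hypothetical helpers that only show the idea of one result slot per input, drained serially.

```go
package main

import (
	"fmt"
	"strings"
)

// upper is a stand-in for an expensive, order-independent decode step.
func upper(s string) string { return strings.ToUpper(s) }

// decodeInOrder runs decode on every input concurrently, then hands the
// results to enqueue one at a time, in the original input order.
func decodeInOrder(inputs []string, decode func(string) string, enqueue func(string)) {
	results := make([]chan string, len(inputs))
	for i, in := range inputs {
		results[i] = make(chan string, 1) // buffered so a worker never blocks on delivery
		go func(in string, out chan<- string) {
			out <- decode(in)
		}(in, results[i])
	}
	// Serial phase: slot i is read before slot i+1, whatever order the
	// workers happened to finish in.
	for _, r := range results {
		enqueue(<-r)
	}
}

func main() {
	var got []string
	decodeInOrder([]string{"a", "b", "c"}, upper, func(s string) {
		got = append(got, s)
	})
	fmt.Println(got)
}
```

The enqueue callback runs only on the calling goroutine, so order-dependent state (such as a ref-completeness check) needs no locking in this sketch.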

Fixes #1935
2016-10-10 15:33:35 -07:00
Eric Halpern
27cbfdd489 Fix noms-sync surprising quantity (#2531)
* Use sampling for a better bytes-written estimate for noms sync
* Confirmed that remaining overestimate of data written is consistent with leveldb stats and opened #2567 to track
2016-09-20 10:57:40 -07:00
Erik Arvidsson
5edf89cf3d Replace d.Chk.True with d.PanicIfFalse (#2563)
And same for d.Chk.False
2016-09-14 13:11:28 -07:00
Mike Gray
4e54c44d56 no functional changes, improving code quality (#2410)
fix misspellings; fix code that was not gofmt'd - plus take advantage of gofmt -s too; couple of unreachable golint reported fixes; reference go report card results and tests
2016-08-23 13:51:38 -04:00
Mike Gray
22bc81e355 adding godoc synopsis for several top level packages (#2394) 2016-08-22 13:50:31 -04:00
Sungguk Lim
6697c2e6fc Replace github.com/tsuru/gnuflag with github.com/juju/gnuflag (#2340)
Replace vendor folder and where it is used.
2016-08-11 10:29:57 -07:00
cmasone-attic
4ccaa7014a Rate-limit LevelDBStore read operations as well (#2239)
Under load, our server can exhaust the number of file descriptors it's
allowed to have open at one time. Part of this is because of how many
incoming connections it's handling, but we believe that handling lots
of simultaneous reads to leveldb is the larger part of the issue.

This patch applies the rate limit we were using for writing to both
read and write operations.
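
A minimal sketch of sharing one limit across reads and writes, assuming a counting semaphore built from a buffered channel. The limitedStore type and the hammer helper are hypothetical, and the real patch may limit differently.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// limitedStore guards every operation, read or write, with one semaphore.
type limitedStore struct {
	sem    chan struct{} // capacity = maximum operations in flight
	mu     sync.Mutex
	data   map[string]string
	active int32 // operations currently in flight
	peak   int32 // highest value active has reached
}

func newLimitedStore(limit int) *limitedStore {
	return &limitedStore{sem: make(chan struct{}, limit), data: map[string]string{}}
}

// limited runs op while holding a semaphore slot and records peak concurrency.
func (s *limitedStore) limited(op func()) {
	s.sem <- struct{}{}
	defer func() { <-s.sem }()
	n := atomic.AddInt32(&s.active, 1)
	for {
		p := atomic.LoadInt32(&s.peak)
		if n <= p || atomic.CompareAndSwapInt32(&s.peak, p, n) {
			break
		}
	}
	op()
	atomic.AddInt32(&s.active, -1)
}

func (s *limitedStore) Put(k, v string) {
	s.limited(func() {
		s.mu.Lock()
		s.data[k] = v
		s.mu.Unlock()
	})
}

func (s *limitedStore) Get(k string) (v string) {
	s.limited(func() {
		s.mu.Lock()
		v = s.data[k]
		s.mu.Unlock()
	})
	return v
}

// hammer issues n concurrent operations, alternating writes and reads,
// and returns the peak number that were in flight together.
func hammer(s *limitedStore, n int) int32 {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if i%2 == 0 {
				s.Put("k", "v")
			} else {
				s.Get("k")
			}
		}(i)
	}
	wg.Wait()
	return atomic.LoadInt32(&s.peak)
}

func main() {
	s := newLimitedStore(3)
	fmt.Println(hammer(s, 50) <= 3)
}
```

Because Get and Put draw from the same pool of slots, a burst of reads cannot push the total number of in-flight operations past the limit.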

Fixes #2227
2016-08-04 09:35:35 -07:00
mgedigian
321350d7e5 fixing typos, stale comments, broken link (#2250) 2016-08-02 15:47:04 -07:00
cmasone-attic
55025ee801 Add caching layer to demo-server (#2228)
This patch creates a new kind of chunks.Factory that demo-server
uses to vend ChunkStore instances that all share the same
MemoryStore-based Chunk cache. This cache _will_ grow without bound,
but the current RAM/data ratio on demo.noms.io means that, in practice,
we will be fine for a bit.

This will need to be removed in favor of a real solution in Issue #2227

Fixes #2009
2016-08-01 11:55:16 -07:00
Erik Arvidsson
ed0364cc19 Switch to gnuflag (#2206)
This is to support:
- shorthands
- Putting commands anywhere (after positional arguments too)
2016-07-29 18:08:23 -07:00
Chris Masone
b0112ba52b Remove NewSerializer
NewSerializer spun up a goroutine within itself. We've decided
this is an anti-pattern. Furthermore, we were using this inside
our remote database handler code, and a panic inside that goroutine
could take down the server. The callsites now use Serialize() directly.

Fixes #2169
2016-07-28 16:05:03 -07:00
Chris Masone
afb50a6272 Add comment explaining DynamoDB table format to dynamo_store.go
Fixes #2170
2016-07-27 12:37:13 -07:00
Vinicius Baggio Fuentes
9e7f1aaef7 go/chunks: changes convenience constructor for DynamoStore.
The convenience constructor changed in this patch takes in an aws.Config
object directly. This allows any implementation of the mentioned interface
to be passed in to Noms's Dynamo store -- giving flexibility for client
code to add their own credential acquisition mechanisms, for instance.

[fixes #2151]
2016-07-26 11:52:09 -07:00
Erik Arvidsson
f2a83346ca JS: Change hash function to sha512
For browser support we use npm asmcrypto.js-sha512. For node we use its
builtin crypto module.
2016-07-12 13:59:09 -07:00
Erik Arvidsson
1507b8dd8f Go: Change hash function to sha512 2016-07-12 13:59:08 -07:00
Mike Gray
a7f29a716d noms as one command line application, with version and help (#1874) 2016-07-06 15:38:25 -04:00