Using ChunkStore.PutMany() means that the DataStore server code
can detect when the ChunkStore it's writing to can't handle
the amount of data being pushed. This patch reports that
status back across the wire to the client that's attempting
to write a Value graph. Due to Issue #1259, the only thing the
client can currently do is retry the entire batch, but we hope
to do better in the future.
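The retry behavior the client is left with can be sketched like this (ErrBackpressure's name and PutMany's exact shape are assumptions for illustration, not the real Noms API):

```go
package main

import (
	"errors"
	"fmt"
)

// Chunk stands in for a Noms chunk; only the payload matters here.
type Chunk struct{ Data []byte }

// ErrBackpressure is the (assumed) error a ChunkStore surfaces when it
// cannot absorb the volume of chunks being pushed.
var ErrBackpressure = errors.New("store cannot accept more chunks right now")

// ChunkSink is a minimal stand-in for the PutMany side of ChunkStore.
type ChunkSink interface {
	PutMany(chunks []Chunk) error
}

// flakySink rejects the first attempt, simulating server backpressure.
type flakySink struct{ calls int }

func (s *flakySink) PutMany(chunks []Chunk) error {
	s.calls++
	if s.calls == 1 {
		return ErrBackpressure
	}
	return nil
}

// putWithRetry is all the client can do today (per Issue #1259):
// on backpressure, retry the entire batch.
func putWithRetry(sink ChunkSink, chunks []Chunk, maxAttempts int) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		if err = sink.PutMany(chunks); err == nil {
			return nil
		}
		if !errors.Is(err, ErrBackpressure) {
			return err // not retryable
		}
	}
	return err
}

func main() {
	sink := &flakySink{}
	err := putWithRetry(sink, []Chunk{{[]byte("a")}, {[]byte("b")}}, 3)
	fmt.Println(err == nil, sink.calls) // succeeds on the second attempt
}
```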
A novel chunk may contain references to any other novel chunk, as long
as there are no cycles. This means that breaking up the stream of
novel chunks being written to the server into batches risks creating
races -- chunks in one batch might reference chunks in another,
meaning that the server would somehow need to be able to
cross-reference batches. This seems super hard, so we've just forced
the code to write in one massive batch upon Commit(). We'll evaluate
the performance of this solution and see what we need to change.
Also, there's a terrible hack in HandleWriteValue to make it so that
pulls can work by back-channeling all their chunks via postRefs/ and
then writing the final Commit object via writeValue/. This can be
fixed once we fix issue 822.
This patch is unfortunately large, but it seemed necessary to make all
these changes at once to transition away from having an HTTP
ChunkStore that could allow for invalid state in the DB. Now, we have
a RemoteDataStoreClient that allows for reading and writing of Values,
and performs validation on the server side before persisting chunks.
The semantics of DataStore are that written values can be read back
out immediately, but are not guaranteed to be persistent until after
Commit(). Put() now blocks until the Chunk is persisted, and the new
PutMany() can be used to write a number of Chunks all at once.
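A toy model of those semantics — not the real DataStore, just the contract (writes immediately readable, durable only after Commit()):

```go
package main

import "fmt"

type Value string

// toyStore models the described contract: written values can be read
// back immediately, but become persistent only at Commit().
type toyStore struct {
	pending   map[string]Value // readable, not yet durable
	persisted map[string]Value // durable after Commit()
}

func newToyStore() *toyStore {
	return &toyStore{pending: map[string]Value{}, persisted: map[string]Value{}}
}

// WriteValue makes v immediately readable but not yet persistent.
func (s *toyStore) WriteValue(key string, v Value) { s.pending[key] = v }

// ReadValue sees both pending and persisted values.
func (s *toyStore) ReadValue(key string) (Value, bool) {
	if v, ok := s.pending[key]; ok {
		return v, true
	}
	v, ok := s.persisted[key]
	return v, ok
}

// Commit persists everything written so far in one batch, mirroring
// the "one massive batch upon Commit()" strategy described above.
func (s *toyStore) Commit() {
	for k, v := range s.pending {
		s.persisted[k] = v
	}
	s.pending = map[string]Value{}
}

func main() {
	s := newToyStore()
	s.WriteValue("greeting", "hello")
	v, ok := s.ReadValue("greeting")
	fmt.Println(v, ok) // readable before Commit
	s.Commit()
	fmt.Println(len(s.persisted)) // durable after Commit
}
```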
From a command-line tool point of view, -h and -h-auth still work as
expected.
This patch is the first step in moving all reading and writing to the
DataStore API, so that we can validate data committed to Noms.
The big change here is that types.ReadValue() no longer exists and is
replaced with a ReadValue() method on DataStore. A similar
WriteValue() method deprecates types.WriteValue(), but fully removing
that is left for a later patch. Since a lot of code in the types
package needs to read and write values, but cannot import the datas
package without creating an import cycle, the types package exports
ValueReader and ValueWriter interfaces, which DataStore implements.
Thus, a DataStore can be passed to anything in the types package that
needs to read or write values (e.g. a collection constructor or
typed-ref).
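The interface split might look roughly like this (signatures simplified; the real ones in the types package differ):

```go
package main

import "fmt"

// Stand-ins for the real types.
type Ref string
type Value string

// The types package exports these interfaces so it can read and write
// values without importing the datas package (avoiding an import cycle).
type ValueReader interface {
	ReadValue(r Ref) Value
}

type ValueWriter interface {
	WriteValue(v Value) Ref
}

// DataStore (in the datas package) implements both interfaces.
type DataStore struct {
	byRef map[Ref]Value
}

func NewDataStore() *DataStore { return &DataStore{byRef: map[Ref]Value{}} }

func (ds *DataStore) ReadValue(r Ref) Value { return ds.byRef[r] }

func (ds *DataStore) WriteValue(v Value) Ref {
	r := Ref(fmt.Sprintf("ref-%d", len(ds.byRef))) // toy addressing, not a real hash
	ds.byRef[r] = v
	return r
}

// Code in the types package depends only on the interfaces, so a
// DataStore can be handed to, e.g., a collection constructor.
func NewListOfOne(vw ValueWriter, v Value) Ref { return vw.WriteValue(v) }

func main() {
	ds := NewDataStore()
	r := NewListOfOne(ds, "hello")
	fmt.Println(ds.ReadValue(r)) // hello
}
```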
Relatedly, this patch also introduces the DataSink interface, so that
some public-facing APIs no longer need to provide a ChunkSink.
Towards #654
These items are never compressed, but all other items carry a tag, so
it makes sense for these to carry one as well. Remain tolerant,
though, of old roots that are not tagged.
Most people don't want to see these most of the time, so
now there's a flag I can set when I want to see the
read/write stats that DynamoStore keeps.
Fixes #1037
With this patch, DynamoStore will now hold off writing anything to
the backend until it's got a full batch or someone calls UpdateRoot().
When it's time to write, DynamoStore will now fire off a new goroutine
to build the request (including compressing chunks), send it, and wait
for a response. It will keep up to dynamoWriteConcurrency concurrent
batch writes in flight.
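The bounded-concurrency idea can be sketched with a semaphore channel; the value of dynamoWriteConcurrency and the request-building details here are stand-ins:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const dynamoWriteConcurrency = 3 // value assumed for illustration

// sendBatch stands in for building, compressing, and sending one
// batch write and waiting for the response; here it just tracks the
// high-water mark of concurrent calls.
func sendBatch(batch []string, inFlight, peak *int64) {
	n := atomic.AddInt64(inFlight, 1)
	for {
		p := atomic.LoadInt64(peak)
		if n <= p || atomic.CompareAndSwapInt64(peak, p, n) {
			break
		}
	}
	atomic.AddInt64(inFlight, -1)
}

// writeBatches fires off a goroutine per batch but caps the number in
// flight with a buffered channel used as a semaphore.
func writeBatches(batches [][]string) int64 {
	sem := make(chan struct{}, dynamoWriteConcurrency)
	var wg sync.WaitGroup
	var inFlight, peak int64
	for _, b := range batches {
		b := b
		sem <- struct{}{} // blocks once the limit is reached
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { <-sem }()
			sendBatch(b, &inFlight, &peak)
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	peak := writeBatches(make([][]string, 10))
	fmt.Println(peak <= dynamoWriteConcurrency) // never exceeds the limit
}
```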
Includes statKeeper, which provides a way for concurrent goroutines to
keep stats like write count, bytes written, etc. Also includes a
refactor that makes unwrittenPutCache a separate type shared by
HTTPStore and DynamoStore, instead of copy-pasted code.
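What statKeeper provides — safe concurrent counters — can be sketched with atomics (field and method names here are illustrative):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// statKeeper accumulates stats from many goroutines; atomic adds keep
// it safe without a mutex.
type statKeeper struct {
	writes       int64
	bytesWritten int64
}

func (s *statKeeper) recordWrite(n int) {
	atomic.AddInt64(&s.writes, 1)
	atomic.AddInt64(&s.bytesWritten, int64(n))
}

func (s *statKeeper) snapshot() (writes, bytes int64) {
	return atomic.LoadInt64(&s.writes), atomic.LoadInt64(&s.bytesWritten)
}

func main() {
	var sk statKeeper
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100; j++ {
				sk.recordWrite(10) // e.g. a 10-byte chunk written
			}
		}()
	}
	wg.Wait()
	w, b := sk.snapshot()
	fmt.Println(w, b) // 400 4000
}
```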
It turns out that many large chunks are quite compressible, and
writing smaller chunks to DynamoDB saves time, and allows more
headroom before hitting the provisioned capacity on the backing
table. Compressed chunks are tagged with the algorithm used to
compress them, though we treat untagged chunks as uncompressed for
backward compatibility.
In order to allow tools like shove to correctly address named
DataStores, we need to have the notion of a name be settable at the
DataStoreFlags level. Once we've done that, it doesn't really make
sense to have API surface for creating DataStores without a name --
though, for compatibility, the code will continue to accept an empty
string for a DataStore's name.
We use a conceptually similar approach to the one we use with DynamoDB
to allow a single database to provide multiple named datastores: keys
are all namespaced using the name of the datastore. The implementation
is a bit different, as we need to run the backing database ourselves.
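The namespacing can be sketched as a simple key prefix (the separator and helper names are assumptions for this sketch):

```go
package main

import (
	"fmt"
	"strings"
)

// namespacedKey prefixes every key with the datastore's name so that
// multiple named datastores can share one backing database.
func namespacedKey(datastore, key string) string {
	return datastore + "/" + key
}

// belongsTo reports whether a raw database key is part of the named
// datastore — useful when scanning one datastore's data.
func belongsTo(rawKey, datastore string) bool {
	return strings.HasPrefix(rawKey, datastore+"/")
}

func main() {
	k := namespacedKey("mydatastore", "root")
	fmt.Println(k)                           // mydatastore/root
	fmt.Println(belongsTo(k, "mydatastore")) // true
	fmt.Println(belongsTo(k, "other"))       // false
}
```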
Fixes #917
Add DataStore.Factory and ChunkStore.Factory so that client programs
that need to create multiple namespaced {Chunk,Data}Stores of the kind
indicated by command line flags have a convenient way to do so. The
details of how this is implemented are mostly contained in the various
ChunkStore.Factory implementations:
1) MemoryStore, TestStore - no change
2) LevelDBStore - the namespace is used as a subdirectory of the path
provided by the user
3) HTTPStore - the namespace is used as a path prefix for the endpoints
supported by DataStoreServer
4) DynamoStore - the namespace is used as a prefix on all keys
This change also required that DataStoreServer be able to handle URLs
of the form "http://server:port/namespace/endpoint", e.g.
"http://localhost:8000/mydatastore/getRefs". It currently still handles
the non-namespaced endpoints as well.
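The dual routing can be sketched by peeling an optional namespace off the front of the path (splitRoute is a hypothetical helper, not DataStoreServer's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// splitRoute separates an optional datastore namespace from the
// endpoint, so both "/getRefs" and "/mydatastore/getRefs" dispatch to
// the same handler.
func splitRoute(path string, endpoints map[string]bool) (namespace, endpoint string, ok bool) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	switch len(parts) {
	case 1:
		if endpoints[parts[0]] {
			return "", parts[0], true // non-namespaced form, still supported
		}
	case 2:
		if endpoints[parts[1]] {
			return parts[0], parts[1], true
		}
	}
	return "", "", false
}

func main() {
	endpoints := map[string]bool{"getRefs": true}
	ns, ep, ok := splitRoute("/mydatastore/getRefs", endpoints)
	fmt.Println(ns, ep, ok) // mydatastore getRefs true
	ns, ep, ok = splitRoute("/getRefs", endpoints)
	fmt.Printf("%q %s %v\n", ns, ep, ok) // "" getRefs true
}
```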
In order to make code from DataStoreServer more re-usable in other
contexts, the functions that handle calls to the server's various
endpoints are now standalone and live in datas/handlers.go.
DataStoreServer just contains logic for dispatch and server lifetime
management.
This DynamoDB store borrows some logic from HTTPStore, in that Get,
Has and Put requests are dumped into channels that are watched by
code in goroutines and batched up to be sent to the backend.
A few structs have been factored out of http_store.go and moved to
remote_requests.go, where they are enhanced so that DynamoStore can use
them as well.