Using ChunkStore.PutMany() means that the DataStore server code
can detect when the ChunkStore it's writing to can't handle
the amount of data being pushed. This patch reports that
status back across the wire to the client that's attempting
to write a Value graph. Due to Issue #1259, the only thing the
client can currently do is retry the entire batch, but we hope
to do better in the future.
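A minimal sketch of the client side, in Go; BackpressureError, putAll, and the retry-everything loop are assumptions about the shape of the API, not the actual code:

    package sketch

    import "fmt"

    // Chunk stands in for a Noms chunk.
    type Chunk struct{ Data []byte }

    // BackpressureError is a hypothetical error a ChunkStore might return
    // from PutMany when it cannot absorb the whole batch.
    type BackpressureError struct{ Rejected []Chunk }

    func (e BackpressureError) Error() string {
        return fmt.Sprintf("store rejected %d chunks", len(e.Rejected))
    }

    // putAll retries the entire batch on backpressure; per Issue #1259,
    // retrying everything is all a client can do for now.
    func putAll(putMany func([]Chunk) error, batch []Chunk) error {
        for {
            err := putMany(batch)
            if err == nil {
                return nil
            }
            if _, ok := err.(BackpressureError); !ok {
                return err
            }
        }
    }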
The initial refactor had some pretty confusing struct and method
names, so this patch renames a number of things and migrates a bunch
of code from datas/ to types/, where it seems to be a better
logical fit.
datas.cachingValueStore -> types.ValueStore
datas.hintedChunkStore interface -> types.BatchStore
datas.naiveHintedChunkSink -> types.BatchStoreAdaptor
datas.httpHintedChunkStore -> datas.httpBatchStore
datas.notAHintedChunkSink -> datas.notABatchStore
Also, types now exports a ValidatingBatchingSink, which is used by
datas.HandleWriteValue to process incoming hints and validate incoming
Chunks before putting them into a ChunkStore.
Towards Issue #1250
For performance reasons, Package objects for generated Noms Types are
side-loaded when reading Values. This means that the
opportunistically-populated chunk->Type map used by DataStore when
validating writes won't see these chunks in a number of cases. This
can lead to false negatives and erroneous validation failures. This
patch special-cases RefOfPackage when caching the Chunks reachable
from a newly-read Value, manually fetching them from the
types.PackageRegistry and crawling their reachable Chunks.
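A rough sketch of the special case, with stand-in types; lookupPackage and isPackageRef play the roles of the PackageRegistry lookup and the RefOfPackage check, and are assumptions:

    package sketch

    // Ref and Type stand in for ref.Ref and a Noms Type description.
    type Ref string
    type Type struct{ Desc string }

    // Value is a pared-down stand-in for types.Value.
    type Value interface {
        Type() Type
        ChunkRefs() []Ref
    }

    // lookupPackage stands in for fetching a side-loaded Package from the
    // types.PackageRegistry.
    var lookupPackage func(Ref) Value

    // cacheReachable records the Types of chunks reachable from a
    // newly-read Value. Package chunks are side-loaded and never seen on
    // the wire, so RefOfPackage targets are fetched from the registry and
    // crawled here.
    func cacheReachable(v Value, isPackageRef func(Ref) bool, cache map[Ref]Type) {
        for _, r := range v.ChunkRefs() {
            if _, seen := cache[r]; seen {
                continue
            }
            if isPackageRef(r) {
                pkg := lookupPackage(r)
                cache[r] = pkg.Type()
                cacheReachable(pkg, isPackageRef, cache)
            }
        }
    }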
Fixes #1229.
A novel chunk may contain references to any other novel chunk, as long
as there are no cycles. This means that breaking up the stream of
novel chunks being written to the server into batches risks creating
races -- chunks in one batch might reference chunks in another,
meaning that the server would somehow need to be able to
cross-reference batches. This seems super hard, so we've just forced
the code to write in one massive batch upon Commit(). We'll evaluate
the performance of this solution and see what we need to change.
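In sketch form, with stand-in types (batchingSink and its fields are illustrative, not the real struct):

    package sketch

    type Chunk struct{ Data []byte }

    // batchingSink buffers every novel chunk and writes the whole set in
    // one batch at Commit(), so chunks may reference any other chunk in
    // the batch without the server needing to cross-reference batches.
    type batchingSink struct {
        pending []Chunk
        putMany func([]Chunk) error
    }

    func (s *batchingSink) Put(c Chunk) {
        s.pending = append(s.pending, c)
    }

    func (s *batchingSink) Commit() error {
        err := s.putMany(s.pending)
        if err == nil {
            s.pending = nil
        }
        return err
    }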
Also, there's a terrible hack in HandleWriteValue to make it so that
pulls can work by back-channeling all their chunks via postRefs/ and
then writing the final Commit object via writeValue/. This can be
fixed once we fix Issue #822.
This patch is unfortunately large, but it seemed necessary to make all
these changes at once to transition away from having an HTTP
ChunkStore that could allow for invalid state in the DB. Now, we have
a RemoteDataStoreClient that allows for reading and writing of Values,
and performs validation on the server side before persisting chunks.
The semantics of DataStore are that written values can be read back
out immediately, but are not guaranteed to be persistent until after
Commit(). Put() now blocks until the Chunk is persisted, and the new
PutMany() can be used to write a number of Chunks all at once.
From a command-line tool point of view, -h and -h-auth still work as
expected.
This patch removes the special RemoteDataStore implementation of
CopyReachableChunksP, as it is seldom used and adds complexity
that stands in the way of Issue #654.
To facilitate validation, DataStore needs to remember which chunks
it's seen, what their refs are, and the Noms type of the Values they
encode. Then, DataStore can look at each Value that comes in via
WriteValue() and validate it by checking every embedded ref (if any)
against this cache.
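A simplified sketch of such a cache; refCache and validate are illustrative names, and the real validation is richer than a presence check:

    package sketch

    import "fmt"

    type Ref string
    type Type struct{ Desc string }

    type Value interface {
        Type() Type
        ChunkRefs() []Ref
    }

    // refCache remembers the Type of every chunk the DataStore has seen.
    type refCache map[Ref]Type

    // validate checks each ref embedded in an incoming Value against the
    // cache. This version only checks presence; a fuller one would also
    // compare the cached Type against the one the ref claims to point at.
    func (c refCache) validate(v Value) error {
        for _, r := range v.ChunkRefs() {
            if _, ok := c[r]; !ok {
                return fmt.Errorf("value references unknown chunk %s", r)
            }
        }
        return nil
    }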
Towards #654
In pursuit of issue #654, we want to be able to figure out all the
refs contained in a given Value, along with the Types of the Values to
which those refs point. Value.Chunks() _almost_ met those needs, but
it returned a slice of ref.Ref, which doesn't convey any type info.
To address this, this patch does two things:
1) RefBase embeds the Value interface, and
2) Chunks() now returns []types.RefBase
RefBase now provides Type() as well, by virtue of embedding Value, so
callers can just iterate through the slice returned from Chunks() and
gather type info for all the refs embedded in a given Value.
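Gathering ref and type info together then looks roughly like this (stand-in interfaces that mirror the shape described above; TargetRef is an assumed accessor):

    package sketch

    type Ref string
    type Type struct{ Desc string }

    // Value and RefBase mirror the relationship described above: RefBase
    // embeds Value, so every ref can report a Type of its own.
    type Value interface {
        Type() Type
        Chunks() []RefBase
    }

    type RefBase interface {
        Value
        TargetRef() Ref // assumed accessor for the ref's target
    }

    // refTypes walks a Value's embedded refs; each ref's own Type (e.g.
    // Ref<T>) conveys the type of the Value it points to.
    func refTypes(v Value) map[Ref]Type {
        out := map[Ref]Type{}
        for _, r := range v.Chunks() {
            out[r.TargetRef()] = r.Type()
        }
        return out
    }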
I went all the way and made RefBase a Value instead of just adding the
Type() method because both types.Ref and the generated Ref types are
actually all Values, and doing so allowed me to change the definition of
refBuilderFunc in package_registry.go to be more precise. It now returns
RefBase instead of just Value.
This patch is the first step in moving all reading and writing to the
DataStore API, so that we can validate data committed to Noms.
The big change here is that types.ReadValue() no longer exists and is
replaced with a ReadValue() method on DataStore. A similar
WriteValue() method deprecates types.WriteValue(), but fully removing
that is left for a later patch. Since a lot of code in the types
package needs to read and write values, but cannot import the datas
package without creating an import cycle, the types package exports
ValueReader and ValueWriter interfaces, which DataStore implements.
Thus, a DataStore can be passed to anything in the types package which
needs to read or write values (e.g. a collection constructor or
typed-ref).
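The shape of the seam, roughly; the signatures here are illustrative, and the real WriteValue may return something richer than a bare ref:

    package sketch

    // Ref and Value stand in for ref.Ref and types.Value.
    type Ref string
    type Value interface{}

    // ValueReader and ValueWriter sketch the seam: the types package
    // exports them, DataStore implements them, and no import cycle is
    // needed.
    type ValueReader interface {
        ReadValue(r Ref) Value
    }

    type ValueWriter interface {
        WriteValue(v Value) Ref
    }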
Relatedly, this patch also introduces the DataSink interface, so that
some public-facing APIs no longer need to provide a ChunkSink.
Towards #654
This is the most naive implementation of Dataset deletion. It does no
cleanup of data, but simply drops the Dataset's head from the map at
the root of the DataStore.
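In sketch form (a stand-in Go map; the real root is a Noms value, not a Go map):

    package sketch

    type Ref string

    // deleteDataset drops the named head from the map at the DataStore's
    // root. History chunks are left in place; no cleanup happens here.
    func deleteDataset(root map[string]Ref, name string) {
        delete(root, name)
    }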
There was no way to create a datas.Factory from outside
the datas package except via datas.Flags. Also, this patch
moves the CreateFactory() method on Flags into the file
where all the rest of the Flags stuff is defined.
In order to allow tools like shove to correctly address named
DataStores, we need to have the notion of a name be settable at the
DataStoreFlags level. Once we've done that, it doesn't really make
sense to have API surface for creating DataStores without a name --
though for compatibility, the code will continue to accept an empty
string for a DataStore's name.
We use a conceptually similar approach to the one we use with DynamoDB
to allow a single database to provide multiple named DataStores: keys
are all namespaced using the name of the DataStore. The implementation
is a bit different, as we need to run the backing database ourselves.
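Conceptually the scheme is just key prefixing; a sketch, with the separator an assumption:

    package sketch

    // nsKey namespaces a chunk key under a datastore name so that several
    // datastores can share one backing database without colliding, e.g.
    // nsKey("mydatastore", "sha1-abc") -> "mydatastore:sha1-abc".
    func nsKey(datastore, key string) string {
        return datastore + ":" + key
    }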
Fixes #917.
Add DataStore.Factory and ChunkStore.Factory so that client programs
that need to create multiple namespaced {Chunk,Data}Stores of the kind
indicated by command line flags have a convenient way to do so. The
details of how this is implemented are mostly contained in the various
ChunkStore.Factory implementations (see the sketch after this list):
1) MemoryStore, TestStore - no change
2) LevelDBStore - the namespace is used as a subdirectory of the path
provided by the user
3) HTTPStore - the namespace is used as a path prefix for the endpoints
supported by DataStoreServer
4) DynamoStore - the namespace is used as a prefix on all keys
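A sketch of the factory idea under assumed names (ChunkStore, Factory, and ldbFactory here are illustrative):

    package sketch

    import "path/filepath"

    // ChunkStore and Factory are stand-ins for the real interfaces.
    type ChunkStore interface{}

    type Factory interface {
        CreateStore(ns string) ChunkStore
    }

    // ldbFactory shows the LevelDB flavor: the namespace becomes a
    // subdirectory of the user-supplied path.
    type ldbFactory struct{ dir string }

    func (f ldbFactory) CreateStore(ns string) ChunkStore {
        path := filepath.Join(f.dir, ns)
        _ = path // a real implementation would open a LevelDB store at path
        return nil
    }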
This change also required that DataStoreServer be able to handle URLs
of the form "http://server:port/namespace/endpoint", e.g.
"http://localhost:8000/mydatastore/getRefs". It currently still handles
the non-namespaced endpoints as well.
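One way to handle both URL forms is to peel an optional namespace segment off the path; a sketch using net/http, with all names assumed:

    package sketch

    import (
        "net/http"
        "strings"
    )

    // routes maps endpoint names (e.g. "getRefs") to their handlers.
    var routes = map[string]http.HandlerFunc{}

    // serve dispatches both "/endpoint" and "/namespace/endpoint" forms by
    // peeling an optional leading namespace segment off the path.
    func serve(w http.ResponseWriter, r *http.Request) {
        parts := strings.SplitN(strings.Trim(r.URL.Path, "/"), "/", 2)
        ns, endpoint := "", parts[0]
        if len(parts) == 2 {
            ns, endpoint = parts[0], parts[1]
        }
        h, ok := routes[endpoint]
        if !ok {
            http.NotFound(w, r)
            return
        }
        _ = ns // a real server would use ns to select the backing store
        h(w, r)
    }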
In order to make code from DataStoreServer more re-usable in other
contexts, the functions that handle calls to the server's various
endpoints are now standalone and live in datas/handlers.go.
DataStoreServer just contains logic for dispatch and server lifetime
management.
This DynamoDB store borrows some logic from HttpStore, in that Get,
Has and Put requests are dumped into channels that are watched by
code in goroutines and batched up to be sent to the backend.
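The pattern, reduced to its bones in a sketch (channel, batch size, and idle timeout are all illustrative):

    package sketch

    import "time"

    type Chunk struct{ Data []byte }

    // batcher drains a request channel and flushes to the backend when a
    // batch fills up or the channel goes briefly idle. flush is assumed to
    // be done with the slice before it returns, since the buffer is reused.
    func batcher(reqs <-chan Chunk, flush func([]Chunk), max int) {
        batch := make([]Chunk, 0, max)
        for {
            select {
            case c, ok := <-reqs:
                if !ok {
                    if len(batch) > 0 {
                        flush(batch)
                    }
                    return
                }
                batch = append(batch, c)
                if len(batch) == max {
                    flush(batch)
                    batch = batch[:0]
                }
            case <-time.After(100 * time.Millisecond):
                if len(batch) > 0 {
                    flush(batch)
                    batch = batch[:0]
                }
            }
        }
    }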
A few structs have been factored out of http_store.go and moved to
remote_requests.go, where they are enhanced so that DynamoStore can use
them as well.