This patch is unfortunately large, but it seemed necessary to make all
these changes at once to transition away from having an HTTP
ChunkStore that could allow for invalid state in the DB. Now, we have
a RemoteDataStoreClient that allows for reading and writing of Values,
and performs validation on the server side before persisting chunks.
The semantics of DataStore are that written values can be read back
out immediately, but are not guaranteed to be persistent until after
Commit(). The semantics are now that Put() blocks until the Chunk is
persisted, and the new PutMany() can be used to write a number of
Chunks all at once.
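To make the Put()/PutMany() contract concrete, here is a minimal sketch. It is not the code from this patch: the chunk and memoryStore types are hypothetical, and an in-memory map stands in for real persistence. It only shows the shape of the contract, where both calls return after the data has been stored.

```go
package main

import (
	"fmt"
	"sync"
)

// chunk pairs a ref with its bytes. Hypothetical type for this sketch.
type chunk struct {
	ref  string
	data []byte
}

// memoryStore stands in for a persistent backend (assumption: a map is
// "persistent enough" for illustration).
type memoryStore struct {
	mu        sync.Mutex
	persisted map[string][]byte
}

func newMemoryStore() *memoryStore {
	return &memoryStore{persisted: map[string][]byte{}}
}

// Put returns only after the chunk has been stored.
func (s *memoryStore) Put(c chunk) {
	s.PutMany([]chunk{c})
}

// PutMany stores a batch of chunks under a single lock acquisition and
// returns once all of them are stored.
func (s *memoryStore) PutMany(chunks []chunk) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, c := range chunks {
		s.persisted[c.ref] = c.data
	}
}

// Has reports whether a chunk with the given ref has been stored.
func (s *memoryStore) Has(ref string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.persisted[ref]
	return ok
}

func main() {
	s := newMemoryStore()
	s.Put(chunk{ref: "r1", data: []byte("a")})
	s.PutMany([]chunk{{ref: "r2", data: []byte("b")}, {ref: "r3", data: []byte("c")}})
	fmt.Println(s.Has("r1"), s.Has("r3"))
}
```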
From a command-line tool point of view, -h and -h-auth still work as
expected.
We use a conceptually similar approach to the one we use with DynamoDB
to allow a single database to provide multiple named datastores: keys
are all namespaced using the name of the datastore. The implementation
is a bit different, as we need to run the backing database ourselves.
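The key namespacing described above can be sketched as follows. The "/" separator and the function names are assumptions made for illustration; the actual key layout in this patch may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// namespacedKey prefixes a chunk key with the datastore name so that several
// named datastores can share one backing database. The "/" separator is an
// assumption of this sketch.
func namespacedKey(namespace, key string) string {
	return namespace + "/" + key
}

// belongsTo reports whether a stored key was written under the namespace.
func belongsTo(namespace, storedKey string) bool {
	return strings.HasPrefix(storedKey, namespace+"/")
}

func main() {
	db := map[string][]byte{} // stands in for the shared backing database
	db[namespacedKey("photos", "sha1-abc")] = []byte("chunk A")
	db[namespacedKey("music", "sha1-abc")] = []byte("chunk B")
	// The same key written under two namespaces yields two distinct entries.
	for k := range db {
		fmt.Println(k, belongsTo("photos", k))
	}
}
```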
Fixes #917
Add DataStore.Factory and ChunkStore.Factory so that client programs
that need to create multiple namespaced {Chunk,Data}Stores of the kind
indicated by command line flags have a convenient way to do so. The
details of how this is implemented are mostly contained in the various
ChunkStore.Factory implementations:
1) MemoryStore, TestStore - no change
2) LevelDBStore - the namespace is used as a subdirectory of the path
provided by the user
3) HTTPStore - the namespace is used as a path prefix for the endpoints
supported by DataStoreServer
4) DynamoStore - the namespace is used as a prefix on all keys
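A rough sketch of how a Factory might apply the namespace per backend, following the list above. The Factory interface shape, the Store type, and the Location strings are all hypothetical; only the three namespacing strategies (subdirectory, path prefix, key prefix) come from the description.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Store stands in for a ChunkStore; only its resolved location matters here.
type Store struct{ Location string }

// Factory creates one Store per namespace. Hypothetical shape.
type Factory interface {
	CreateStore(namespace string) Store
}

// levelDBFactory uses the namespace as a subdirectory of the user's path.
type levelDBFactory struct{ dir string }

func (f levelDBFactory) CreateStore(ns string) Store {
	return Store{Location: filepath.Join(f.dir, ns)}
}

// httpFactory uses the namespace as a path prefix for server endpoints.
type httpFactory struct{ host string }

func (f httpFactory) CreateStore(ns string) Store {
	return Store{Location: f.host + "/" + ns}
}

// dynamoFactory uses the namespace as a prefix on all keys. The
// "table:prefix" rendering is only illustrative.
type dynamoFactory struct{ table string }

func (f dynamoFactory) CreateStore(ns string) Store {
	return Store{Location: f.table + ":" + ns}
}

func main() {
	factories := []Factory{
		levelDBFactory{dir: "/var/noms"},
		httpFactory{host: "http://localhost:8000"},
		dynamoFactory{table: "noms"},
	}
	for _, f := range factories {
		fmt.Println(f.CreateStore("mydatastore").Location)
	}
}
```

A client that needs several namespaced stores of the kind selected on the command line would hold one Factory and call CreateStore once per name.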
This change also required that DataStoreServer be able to handle URLs
of the form "http://server:port/namespace/endpoint", e.g.
"http://localhost:8000/mydatastore/getRefs". It currently still handles
the non-namespaced endpoints as well.
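One way the server could separate namespace from endpoint, while still accepting non-namespaced URLs, is sketched below. splitPath is a hypothetical helper and not the routing code from this change.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitPath separates a request path into an optional namespace and an
// endpoint. A path with a single segment is treated as non-namespaced.
func splitPath(urlPath string) (namespace, endpoint string) {
	trimmed := strings.Trim(urlPath, "/")
	if i := strings.LastIndex(trimmed, "/"); i >= 0 {
		return trimmed[:i], trimmed[i+1:]
	}
	return "", trimmed
}

func main() {
	for _, raw := range []string{
		"http://localhost:8000/mydatastore/getRefs",
		"http://localhost:8000/getRefs",
	} {
		u, err := url.Parse(raw)
		if err != nil {
			fmt.Println("bad URL:", err)
			continue
		}
		ns, ep := splitPath(u.Path)
		fmt.Printf("namespace=%q endpoint=%q\n", ns, ep)
	}
}
```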
In order to make code from DataStoreServer more re-usable in other
contexts, the functions that handle calls to the server's various
endpoints are now standalone and live in datas/handlers.go.
DataStoreServer just contains logic for dispatch and server lifetime
management.
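The split between standalone handlers and a thin dispatching server might look roughly like this. The handler signature, the chunkStore type, and the "ref" query parameter are assumptions for the sketch; the real functions in datas/handlers.go may take different arguments.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// chunkStore is a stand-in for the real ChunkStore.
type chunkStore map[string]string

// handleGetRefs is standalone: everything it needs arrives as an argument,
// so it can be reused outside the server.
func handleGetRefs(w http.ResponseWriter, req *http.Request, cs chunkStore) {
	data, ok := cs[req.URL.Query().Get("ref")]
	if !ok {
		http.Error(w, "unknown ref", http.StatusNotFound)
		return
	}
	fmt.Fprint(w, data)
}

// newDispatcher is all the server keeps: routing from endpoint to handler.
func newDispatcher(cs chunkStore) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/getRefs", func(w http.ResponseWriter, req *http.Request) {
		handleGetRefs(w, req, cs)
	})
	return mux
}

// call exercises a handler without opening a socket.
func call(h http.Handler, target string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	d := newDispatcher(chunkStore{"sha1-abc": "chunk bytes"})
	fmt.Println(call(d, "/getRefs?ref=sha1-abc"))
}
```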
Also, factor out a separate NewDataStoreWithRootTracker() since
the common case is to use the same value for both the ChunkStore
and the RootTracker.
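The relationship between the two constructors can be sketched as below. The interfaces are simplified and the signatures are assumptions; the sketch only shows the common-case constructor delegating to the more general one with the same value passed twice.

```go
package main

import "fmt"

// Simplified stand-ins for the real interfaces.
type ChunkStore interface {
	Put(ref string, data []byte)
	Get(ref string) []byte
}

type RootTracker interface {
	Root() string
	UpdateRoot(current, last string) bool
}

type DataStore struct {
	cs ChunkStore
	rt RootTracker
}

// NewDataStoreWithRootTracker covers the case where chunks and the root
// are tracked by different values.
func NewDataStoreWithRootTracker(cs ChunkStore, rt RootTracker) DataStore {
	return DataStore{cs: cs, rt: rt}
}

// chunkStoreAndRootTracker is a hypothetical helper interface.
type chunkStoreAndRootTracker interface {
	ChunkStore
	RootTracker
}

// NewDataStore handles the common case: one value plays both roles.
func NewDataStore(store chunkStoreAndRootTracker) DataStore {
	return NewDataStoreWithRootTracker(store, store)
}

// memoryStore implements both interfaces for the example.
type memoryStore struct {
	chunks map[string][]byte
	root   string
}

func newMemoryStore() *memoryStore { return &memoryStore{chunks: map[string][]byte{}} }

func (m *memoryStore) Put(ref string, data []byte) { m.chunks[ref] = data }
func (m *memoryStore) Get(ref string) []byte       { return m.chunks[ref] }
func (m *memoryStore) Root() string                { return m.root }

// UpdateRoot succeeds only if the caller's view of the last root is current.
func (m *memoryStore) UpdateRoot(current, last string) bool {
	if m.root != last {
		return false
	}
	m.root = current
	return true
}

func main() {
	ds := NewDataStore(newMemoryStore())
	fmt.Println(ds.rt.UpdateRoot("r1", ""), ds.rt.Root())
}
```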
Fixes #134
In addition to putting in the 'pull' tool that I forgot to add in my initial PR,
I added an extra unit test to cover a case that we found to be buggy, as well
as addressing some comments by aa and arv.
1) Switched to io.Copy in CopyChunks
2) Added NewFlagsWithPrefix()
3) Cleaned up some error reporting
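As an illustration of why prefixed flags help a tool like pull, which needs two stores configured from one command line, here is a sketch built on the standard flag package. newFlagsWithPrefix and parsePullArgs are hypothetical helpers, and the "ldb" and "h" flag names and the "source-"/"sink-" prefixes are assumptions; the real NewFlagsWithPrefix() may register different flags.

```go
package main

import (
	"flag"
	"fmt"
)

// storeFlags holds the flags that select one store.
type storeFlags struct {
	ldb  *string
	host *string
}

// newFlagsWithPrefix registers the store flags on fs under the given prefix,
// so the same set can be registered more than once without name clashes.
func newFlagsWithPrefix(fs *flag.FlagSet, prefix string) storeFlags {
	return storeFlags{
		ldb:  fs.String(prefix+"ldb", "", "directory of a LevelDB-backed store"),
		host: fs.String(prefix+"h", "", "address of an HTTP-backed store"),
	}
}

// parsePullArgs parses source and sink store flags from one argument list.
func parsePullArgs(args []string) (sourceDir, sinkDir string, err error) {
	fs := flag.NewFlagSet("pull", flag.ContinueOnError)
	source := newFlagsWithPrefix(fs, "source-")
	sink := newFlagsWithPrefix(fs, "sink-")
	if err := fs.Parse(args); err != nil {
		return "", "", err
	}
	return *source.ldb, *sink.ldb, nil
}

func main() {
	src, sink, err := parsePullArgs([]string{"-source-ldb", "/tmp/src", "-sink-ldb", "/tmp/sink"})
	fmt.Println(src, sink, err)
}
```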
We've been keeping some special-case code in DataStore to handle the
situation where the Root of the store is nonexistent. There was some
checking for the empty Ref and creating a SetOfCommit out of whole
cloth. This meant that if you asked an empty DataStore (or Dataset)
what its Heads() were, it would give you back a Value that wasn't
backed by a Chunk in its underlying ChunkStore. This caused some
issues with pull code, so we decided to change things such that a
DataStore is primed with an initial empty SetOfCommit upon creation.
This means creating a Dataset in a DataStore now shows up in DataStore
history, which it did not before. Essentially, every Dataset now has
an "initial commit" of an empty SetOfCommit when it's created. I think
that having these show up as part of the DataStore history makes sense,
but the model may evolve over time.
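The priming step can be sketched as follows. This is a simplified illustration: the "[]" encoding of an empty SetOfCommit, the "sha1-" ref format, and the dataStore type are placeholders, not the actual serialization. The point is that the root always refers to a chunk that exists in the underlying store.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// dataStore is a stand-in holding chunks and the current root ref.
type dataStore struct {
	chunks map[string][]byte
	root   string
}

// emptySetOfCommit is a placeholder encoding of an empty set of commits.
var emptySetOfCommit = []byte("[]")

// put stores data under a content-derived ref and returns that ref.
func put(chunks map[string][]byte, data []byte) string {
	sum := sha1.Sum(data)
	ref := "sha1-" + hex.EncodeToString(sum[:])
	chunks[ref] = data
	return ref
}

// newDataStore primes the store with an empty SetOfCommit so that the root
// is never an empty ref.
func newDataStore() *dataStore {
	ds := &dataStore{chunks: map[string][]byte{}}
	ds.root = put(ds.chunks, emptySetOfCommit)
	return ds
}

// headsBacked reports whether the root refers to a stored chunk.
func (ds *dataStore) headsBacked() bool {
	_, ok := ds.chunks[ds.root]
	return ok
}

func main() {
	ds := newDataStore()
	fmt.Println(ds.root != "", ds.headsBacked())
}
```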
This initial implementation requires that both the "remote" and local
ChunkStores be accessible by the machine running the pull command.
I took an initial pass at splitting up the functions so that, e.g.,
calculating which refs are needed could be done on an actual remote
machine, and so that a chunk-copying routine that fetches data over
the network can be added later.
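The two-phase split described here (first compute the refs that are needed, then copy them) might be sketched as below. The chunk and store types are hypothetical, and the sketch assumes that a chunk already present in the sink implies everything reachable from it is present too, and that the source is complete.

```go
package main

import "fmt"

// chunk holds data plus the refs of the chunks it points to.
type chunk struct {
	data     string
	children []string
}

// store maps refs to chunks; it stands in for a ChunkStore.
type store map[string]chunk

// neededRefs walks the source from head and returns the refs missing from
// the sink. It does not descend into chunks the sink already has.
func neededRefs(source, sink store, head string) []string {
	var needed []string
	seen := map[string]bool{}
	var walk func(ref string)
	walk = func(ref string) {
		if seen[ref] {
			return
		}
		seen[ref] = true
		if _, ok := sink[ref]; ok {
			return
		}
		needed = append(needed, ref)
		for _, c := range source[ref].children {
			walk(c)
		}
	}
	walk(head)
	return needed
}

// copyChunks is the separate copying step; a networked variant could
// replace it without touching neededRefs.
func copyChunks(source, sink store, refs []string) {
	for _, r := range refs {
		sink[r] = source[r]
	}
}

func main() {
	source := store{
		"c": {data: "head", children: []string{"b"}},
		"b": {data: "middle", children: []string{"a"}},
		"a": {data: "base"},
	}
	sink := store{"a": {data: "base"}}
	refs := neededRefs(source, sink, "c")
	copyChunks(source, sink, refs)
	fmt.Println(refs, len(sink))
}
```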
Towards issue #81
Also change ChunkStore from being embedded in server to being a field
of it. It is equivalent, but this feels more semantically correct and
it's less code anyway.