this point the compoundBlob only contains blob leafs but a future
change will create multiple tiers. Both these implement the new Blob
interface.
The splitting is done by using a rolling hash over the last 64 bytes,
when that hash ends with 13 consecutive ones we split the data.
Issue #17
Also, factor out a separate NewDataStoreWithRootTracker() since
the common case is to use the same value for both the ChunkStore
and the RootTracker.
Fixes#134
Fixes#145.
I looked for other occurences via `git grep Bytes()`. Only other
suspicious cases I found were related to types.Blob, which doesn't
deal in io.Reader yet, and memoryChunkWriter. In memoryChunkWriter
we're copying from a buffer, so I don't think there's any win, so
left that one.
In addition to putting in the 'pull' tool that I forgot to add in my initial PR,
I added an extra unit test to cover a case that we found to be buggy, as well
as addressing some comments by aa and arv.
1) Switched to io.Copy in CopyChunks
2) Added NewFlagsWithPrefix()
3) Cleaned up some error reporting
I accidentally left this commented out at some point, and it rusted
a bit when I made the change to have walk.doTreeWalk() walk over
chunks rather than values. So, re-enable the tests and fix them.
We've been keeping some special-case code in Datastore to handle the
situation where the Root of the store is nonexistent. There was some
checking for the empty Ref and creating a SetOfCommit out of whole
cloth. This meant that if you asked an empty Datastore (or Dataset)
what its Heads() were, it would give you back a Value that wasn't
backed by a Chunk in its underlying ChunkStore. This caused some
issues with pull code, so we decided to change things such that a
DataStore is primed with an initial empty SetOfCommit upon creation.
This means creating a Dataset in a DataStore now shows up in DataStore
history, which it did not before. Essentially, every Dataset now has
an "initial commit" of an empty SetOfCommit when it's created. I think
that having these show up as part of the DataStore history makes sense,
but the model may evolve over time.
This initial implementation requires that both the "remote" and local
ChunkStores be accessible by the machine running the pull command.
I took an initial pass at splitting up the functions so that, e.g.,
calculating which refs are needed could be done on an actual remote
machine, and we can add a chunk copying routine that gets data from
the network or something.
Towards issue #81
ReadValue() tries to Get() the ref it's given from the ChunkSource it's given.
We recently changed ChunkSource to return nil with no error if the ref is not
in the ChunkSource. ReadValue, though, soldiers on in the case of a nil
return value from Get, calling Close() on it and other things. This is, I
think, bad.
The walk package contains walk.Some() and walk.All() which let you
walk the Chunk graph starting at a given Ref.
I also added GetReachabilitySetDiff(), which will determine which
refs in a given ChunkSource can be reached from one given ref, but not
the other.
Towards issue #82
This will enable us to walk the chunk graph without having to go
through weird contortions to figure out which values don't have
chunks in any chunkstore (because they were inlined).
Towards issue #82