After a compound blob is created, we try to chunk it again, much as we
chunk Lists. We take the refs of the sub blobs and compute a rolling
hash over them. If the hash matches a pattern, we split the existing
compound blob into a new compound blob whose sub blobs are slices of
the original compound blob.
Issue #17
When building the chunked lists we had a lot of loops that did
O(log n) Get operations. Since we are just getting consecutive elements
from the list, we can make getting the next one O(1), taking these loops
from O(n*log(n)) to O(n).
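
A minimal sketch of the idea, not the actual implementation: chunkedList,
cursor and their methods are hypothetical names, and the list is flattened
to a single level of leaf chunks. Get does a binary search on every call,
while the cursor remembers where it is, so stepping to the next element
needs no search.

```go
package main

import (
	"fmt"
	"sort"
)

// chunkedList is a hypothetical stand-in for a chunked list. Elements live in
// leaf chunks, and ends[i] is the index just past the last element of chunk i.
type chunkedList struct {
	chunks [][]int
	ends   []int
}

func newChunkedList(chunks [][]int) chunkedList {
	ends := make([]int, len(chunks))
	total := 0
	for i, c := range chunks {
		total += len(c)
		ends[i] = total
	}
	return chunkedList{chunks: chunks, ends: ends}
}

// Len returns the total number of elements.
func (l chunkedList) Len() int {
	if len(l.ends) == 0 {
		return 0
	}
	return l.ends[len(l.ends)-1]
}

// Get locates the chunk with a binary search, so every call is O(log n).
// idx must be in [0, Len()).
func (l chunkedList) Get(idx int) int {
	ci := sort.Search(len(l.ends), func(i int) bool { return l.ends[i] > idx })
	start := 0
	if ci > 0 {
		start = l.ends[ci-1]
	}
	return l.chunks[ci][idx-start]
}

// cursor remembers the current chunk and the position inside it.
type cursor struct {
	list  chunkedList
	chunk int
	pos   int
}

// Next returns the next element, or false once the list is exhausted.
// It only ever moves forward, so a full traversal is O(n).
func (c *cursor) Next() (int, bool) {
	// Skip past chunks that are used up (or empty).
	for c.chunk < len(c.list.chunks) && c.pos >= len(c.list.chunks[c.chunk]) {
		c.chunk++
		c.pos = 0
	}
	if c.chunk >= len(c.list.chunks) {
		return 0, false
	}
	v := c.list.chunks[c.chunk][c.pos]
	c.pos++
	return v, true
}

func main() {
	l := newChunkedList([][]int{{1, 2}, {3}, {4, 5, 6}})
	c := &cursor{list: l}
	for v, ok := c.Next(); ok; v, ok = c.Next() {
		fmt.Println(v)
	}
}
```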
Issue #215
When we write the part after the change and hit a chunk split, we
check whether the list also had a split at the same index (adjusted
for added/removed elements). If it did, we know that the rest of the
sub lists are the same.
Issue #215
Instead of returning errors, these now use d.Exp to raise catchable
errors.
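
For illustration only, here is one way "catchable errors" can be built on
panic and recover. The names usageError, expect and try are hypothetical
and are not meant to mirror the d package's actual API.

```go
package main

import "fmt"

// usageError marks the panics that callers are expected to catch.
type usageError struct{ msg string }

func (e usageError) Error() string { return e.msg }

// expect panics with a usageError when cond is false (hypothetical helper).
func expect(cond bool, format string, args ...interface{}) {
	if !cond {
		panic(usageError{fmt.Sprintf(format, args...)})
	}
}

// try runs f and turns a usageError panic into an ordinary error value.
// Any other panic is re-raised untouched.
func try(f func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			ue, ok := r.(usageError)
			if !ok {
				panic(r)
			}
			err = ue
		}
	}()
	f()
	return nil
}

func main() {
	err := try(func() {
		expect(1+1 == 3, "bad math: got %d", 1+1)
	})
	fmt.Println(err)
}
```

The code doing the work stays free of error plumbing, and only the caller
that wants to handle the failure wraps the call in try.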
Also, added the commit hash at which the code was pulled from encoding/json.
Marshal io.Reader into a Blob, unmarshal Blob into io.Writer
Unmarshal and Marshal are tools for moving data from Noms into native Go and
back. The rules are described in the documentation of the two functions, but
the behavior is broadly similar to encoding/json.
Towards issue #160
The JSON serialization now only contains the length of each individual
blob child.
The Go representation still uses offsets, but each offset is now the
end offset of its child.
For "hi" "bye" we get
{"cb", [{"ref": "sha1-hi"}, 2, {"ref": "sha1-bye"}, 3]}
compoundBlob{[2, 5], [sha1-hi, sha1-bye]}
Keeping the lengths in the serialization makes it smaller.
Using end offsets makes the binary search simpler and lets us use the
last entry as the total length.
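
A small sketch of the two representations, using the "hi" / "bye" example
above. endOffsets and childAt are hypothetical helper names, not functions
from the code base.

```go
package main

import (
	"fmt"
	"sort"
)

// endOffsets converts the serialized child lengths into cumulative end
// offsets, e.g. [2, 3] becomes [2, 5].
func endOffsets(lengths []uint64) []uint64 {
	offsets := make([]uint64, len(lengths))
	var end uint64
	for i, n := range lengths {
		end += n
		offsets[i] = end
	}
	return offsets
}

// childAt returns the index of the child that holds byte position pos and
// the position inside that child. If pos is at or past the total length,
// child equals len(offsets) and the caller has to treat that as out of range.
func childAt(offsets []uint64, pos uint64) (child int, local uint64) {
	// The first child whose end offset is greater than pos contains pos.
	child = sort.Search(len(offsets), func(i int) bool { return offsets[i] > pos })
	local = pos
	if child > 0 {
		local = pos - offsets[child-1]
	}
	return child, local
}

func main() {
	offsets := endOffsets([]uint64{2, 3}) // "hi", "bye"
	fmt.Println(offsets)                  // [2 5]

	child, local := childAt(offsets, 3)
	fmt.Println(child, local) // 1 1, i.e. the 'y' in "bye"

	// The last end offset doubles as the total length.
	fmt.Println(offsets[len(offsets)-1]) // 5
}
```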
Issue #17
At this point the compoundBlob only contains blob leaves, but a future
change will create multiple tiers. Both of these implement the new Blob
interface.
The splitting is done by using a rolling hash over the last 64 bytes.
When that hash ends with 13 consecutive ones, we split the data.
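
A rough sketch of this kind of content-defined splitting. It uses a simple
polynomial rolling hash as a stand-in, which is an assumption and not
necessarily the hash function the code uses. The multiplier, the type and
function names, and the choice not to reset the hash after a split are
likewise illustrative.

```go
package main

import "fmt"

const (
	windowSize = 64                // bytes covered by the rolling hash
	splitMask  = 1<<13 - 1         // the low 13 bits
	base       = uint32(16777619)  // arbitrary odd multiplier (assumption)
)

// rollingHash is a polynomial hash over the last windowSize bytes seen.
// Arithmetic wraps around modulo 2^32. Before windowSize bytes have been
// seen, the window is treated as padded with leading zero bytes.
type rollingHash struct {
	window [windowSize]byte
	pos    int
	sum    uint32
	pow    uint32 // base^windowSize, used to remove the outgoing byte
}

func newRollingHash() *rollingHash {
	pow := uint32(1)
	for i := 0; i < windowSize; i++ {
		pow *= base
	}
	return &rollingHash{pow: pow}
}

// roll pushes b into the window, drops the oldest byte, and returns the hash.
func (h *rollingHash) roll(b byte) uint32 {
	out := h.window[h.pos]
	h.window[h.pos] = b
	h.pos = (h.pos + 1) % windowSize
	h.sum = h.sum*base + uint32(b) - uint32(out)*h.pow
	return h.sum
}

// split cuts data after every byte at which the low 13 bits of the hash are
// all ones. Any remaining bytes form the final chunk.
func split(data []byte) [][]byte {
	var chunks [][]byte
	h := newRollingHash()
	start := 0
	for i, b := range data {
		if h.roll(b)&splitMask == splitMask {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

func main() {
	// Deterministic pseudo-random input, just to have something to split.
	data := make([]byte, 1<<16)
	x := uint32(1)
	for i := range data {
		x = x*1664525 + 1013904223
		data[i] = byte(x >> 24)
	}
	for i, c := range split(data) {
		fmt.Printf("chunk %d: %d bytes\n", i, len(c))
	}
}
```

With 13 bits in the pattern, a well-mixed hash matches at roughly one
position in 8192, so chunks average about 8 KB. Because a boundary depends
only on the preceding 64 bytes, an edit early in the data does not move the
boundaries that come well after it.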
Issue #17
ReadValue() tries to Get() the ref it's given from the ChunkSource it's given.
We recently changed ChunkSource to return nil with no error if the ref is not
in the ChunkSource. ReadValue, though, soldiers on in the case of a nil
return value from Get, calling Close() on it, among other things. This is, I
think, bad.
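
A sketch of the guard this calls for, under simplified assumptions: the
chunkSource interface, mapSource, and the signatures below are stand-ins
for illustration and do not reproduce the real ChunkSource or ReadValue.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// chunkSource is a simplified stand-in. Get returns nil, with no error,
// when the ref is not present.
type chunkSource interface {
	Get(ref string) io.ReadCloser
}

// mapSource is a toy in-memory chunkSource keyed by ref string.
type mapSource map[string]string

func (m mapSource) Get(ref string) io.ReadCloser {
	s, ok := m[ref]
	if !ok {
		return nil
	}
	return io.NopCloser(strings.NewReader(s))
}

// readValue returns the chunk's bytes and true, or false when the ref is
// absent. It checks for nil before touching the reader at all.
func readValue(ref string, cs chunkSource) ([]byte, bool, error) {
	r := cs.Get(ref)
	if r == nil {
		return nil, false, nil
	}
	defer r.Close()

	data, err := io.ReadAll(r)
	if err != nil {
		return nil, false, err
	}
	return data, true, nil
}

func main() {
	src := mapSource{"sha1-hi": "hi"}
	for _, ref := range []string{"sha1-hi", "sha1-missing"} {
		data, ok, err := readValue(ref, src)
		fmt.Printf("%s: %q found=%v err=%v\n", ref, data, ok, err)
	}
}
```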