37 Commits

Author SHA1 Message Date
Ben Kalman 27c498e032 Merge pull request #716 from kalman/buz-window-size
Correctly distinguish between chunking window size and buzhash window…
2015-12-03 15:26:56 -08:00
Benjamin Kalman 338de4e583 Correctly distinguish between chunking window size and buzhash window size.
Previously the buzhash boundary checker used a single value for the
window size, both as the buzhash buffer size when constructing a hash
object, and reported as its window size to the boundary checker
interface. This was wrong because we don't always pass single byte
values to the hasher, for example refs are 20 bytes.

The compound list chunking compensated for this by only passing the
first byte of each list leaf's ref rather than the full ref. This is bad
because there is obviously less entropy in 1 byte vs 20 bytes.

The meta sequence chunking compensated for this by multiplying the
chunking window size by 20, but this also had the effect of
unnecessarily considering 20 times more chunked elements than would fit
in the buzhash buffer.
2015-12-03 14:58:35 -08:00
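
The distinction drawn in the commit above can be sketched as follows. This is a minimal illustration, not the noms implementation: `boundaryChecker`, `newBoundaryChecker`, `fakeRef`, and the xorshift-filled `table` are hypothetical names, and the buzhash here is a plain cyclic-polynomial rolling hash. The point it shows is that the hash buffer is sized in bytes (items times bytes per item) while the window reported to the chunker stays in items.

```go
package main

import (
	"fmt"
	"math/bits"
)

// refSize is the size of a ref in bytes, as stated in the commit message.
const refSize = 20

// table maps each byte value to a pseudo-random word.
// Assumption: a stand-in for whatever table a real buzhash library uses.
var table [256]uint32

func init() {
	x := uint32(2463534242)
	for i := range table {
		// xorshift32 step, used only to fill the table deterministically.
		x ^= x << 13
		x ^= x >> 17
		x ^= x << 5
		table[i] = x
	}
}

// boundaryChecker (hypothetical) keeps the two window sizes separate.
type boundaryChecker struct {
	itemWindow int    // chunking window, counted in items
	buf        []byte // hash window, counted in bytes
	next       int    // next slot to overwrite in buf
	written    int    // total bytes hashed so far
	hash       uint32
}

func newBoundaryChecker(itemWindow, itemSize int) *boundaryChecker {
	return &boundaryChecker{
		itemWindow: itemWindow,
		buf:        make([]byte, itemWindow*itemSize),
	}
}

// WindowSize reports the window in items, which is what the chunker needs.
func (c *boundaryChecker) WindowSize() int { return c.itemWindow }

// Write feeds every byte of one item into the rolling hash.
func (c *boundaryChecker) Write(item []byte) {
	n := len(c.buf)
	for _, b := range item {
		c.hash = bits.RotateLeft32(c.hash, 1) ^ table[b]
		if c.written >= n {
			// Cancel the contribution of the byte leaving the window.
			c.hash ^= bits.RotateLeft32(table[c.buf[c.next]], n)
		}
		c.buf[c.next] = b
		c.next = (c.next + 1) % n
		c.written++
	}
}

// fakeRef builds a deterministic 20-byte value for demonstration only.
func fakeRef(seed int) []byte {
	ref := make([]byte, refSize)
	for i := range ref {
		ref[i] = byte(seed*31 + i*7)
	}
	return ref
}

func main() {
	c := newBoundaryChecker(8, refSize)
	for i := 0; i < 20; i++ {
		c.Write(fakeRef(i))
	}
	fmt.Println("window in items:", c.WindowSize())
	fmt.Println("hash buffer in bytes:", len(c.buf))
}
```

With this split, full 20-byte refs can be hashed without truncating them to one byte and without inflating the number of items the chunker has to consider.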
Rafael Weinstein 4c1f4464af Compound & Leaf values now have same Type() 2015-12-02 10:19:27 -08:00
Erik Arvidsson 61f14f8c9a Rename noms UInt* to Uint*
Fixes #673
2015-12-02 12:01:42 -05:00
Benjamin Kalman 4b901cdc84 Support compoundList.Append. 2015-11-20 16:40:19 -08:00
Rafael Weinstein da2c7461df CompoundList lives again 2015-11-16 11:27:47 -08:00
Erik Arvidsson a72ce41a1d Go: TypeRef -> Type
Remaining identifiers
2015-11-13 17:54:53 -05:00
Benjamin Kalman efa52fed84 Implement a generic sequence chunker, and use it to create blobs. 2015-11-13 14:21:26 -08:00
Rafael Weinstein f0ebce2d74 Added metaSequence + enc/dec & tests 2015-11-10 15:29:44 -08:00
Erik Arvidsson 756b893e8f Remove FromVal functions
The generated objects are all type.Values now so FromVal is not needed
2015-11-04 12:13:55 -05:00
Aaron Boodman 0622c8c860 Remove Future 2015-11-02 13:44:47 -08:00
Aaron Boodman bad6be3037 Remove futureFromValue. Another step toward removing Future. 2015-11-02 13:44:17 -08:00
Aaron Boodman c52bf0bbf5 Convert NewBlob() away from using resolvedFutures
Instead, use a backing MemoryStore. This is part of removing Futures.
2015-10-30 12:50:56 -07:00
Erik Arvidsson ede5f43204 Value should also have a TypeRef
This is so that we can get the runtime type of a value
2015-09-30 16:15:13 -04:00
Erik Arvidsson e10e6224b0 Codegen for NomDL
This adds a new codegen that reads .noms files and generates Go
API for these types

Issue #304
2015-09-17 14:01:49 -04:00
Aaron Boodman f58670bc83 NewBlob(): Reader.Read() can return both data and error.
Fixes #264
2015-09-04 15:02:29 -07:00
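
The `io.Reader` contract behind this fix allows a single `Read` call to return `n > 0` together with a non-nil error (often `io.EOF`), so the bytes must be consumed before the error is inspected. A minimal sketch of that pattern follows; `eofWithData` and `readAll` are hypothetical names for illustration and are not taken from the noms code.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// eofWithData (hypothetical) returns its final bytes together with io.EOF
// in the same Read call, which the io.Reader contract permits.
type eofWithData struct{ data []byte }

func (r *eofWithData) Read(p []byte) (int, error) {
	n := copy(p, r.data)
	r.data = r.data[n:]
	if len(r.data) == 0 {
		return n, io.EOF
	}
	return n, nil
}

// readAll consumes the n bytes of every Read before looking at the error.
func readAll(r io.Reader) ([]byte, error) {
	var out bytes.Buffer
	buf := make([]byte, 4)
	for {
		n, err := r.Read(buf)
		out.Write(buf[:n]) // keep the data even when err != nil
		if err == io.EOF {
			return out.Bytes(), nil
		}
		if err != nil {
			return out.Bytes(), err
		}
	}
}

func main() {
	got, err := readAll(&eofWithData{data: []byte("hello blob")})
	fmt.Printf("%q %v\n", got, err)
}
```

A loop that checks `err` first and breaks would silently drop the last partial buffer with a reader like this one.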
Erik Arvidsson d06da3ca0a Chunking: Multi level chunking for blobs
After a compound blob is created we try to chunk it again in a similar
way to how we chunk Lists. We use the refs of the sub blob and compute
a rolling hash over these. If the hash matches a pattern then we split
the existing compound blob into a new compound blob with sub blobs
which are slices of the original compound blob.

Issue #17
2015-09-03 19:47:17 -04:00
Erik Arvidsson b6197aadc4 Add support for compound lists.
Lists are now either a leafList or a compoundList. The compoundList
consists of sublists that are the chunks of the whole list.
2015-08-28 15:41:51 -04:00
Dan Willhite ab34143ba5 Pin dependencies using godep tool. Rewrite dep urls. 2015-08-26 14:05:40 -07:00
Erik Arvidsson ddebdcaefd Slight modification to compound blob encoding
The json serialization now only contains the length of each individual
blob child.

The go representation of this still uses offsets but the offsets are
for the end delimiter.

For "hi" "bye" we get

{"cb", [{"ref": "sha1-hi"}, 2, {"ref": "sha1-bye"}, 3]}

compoundBlob{[2, 5], [sha1-hi, sha1-bye]}

Keeping the length in the serialization leads to smaller serializations

Using the end offset leads to simpler binary search and allows us to
use the last entry as the length.

Issue #17
2015-08-07 11:24:27 -04:00
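
The trade-off described in this commit can be sketched as follows: per-child lengths are what gets serialized, while cumulative end offsets are what gets searched. The function names `endOffsets`, `childIndex`, and `totalLen` are hypothetical and only illustrate the idea; they are not the noms API.

```go
package main

import (
	"fmt"
	"sort"
)

// endOffsets converts serialized per-child lengths into cumulative end offsets.
func endOffsets(lengths []uint64) []uint64 {
	offsets := make([]uint64, len(lengths))
	var sum uint64
	for i, l := range lengths {
		sum += l
		offsets[i] = sum
	}
	return offsets
}

// childIndex finds the child that contains byte position pos and the position
// inside that child. It returns len(offsets) when pos is at or past the end.
func childIndex(offsets []uint64, pos uint64) (idx int, within uint64) {
	idx = sort.Search(len(offsets), func(i int) bool { return offsets[i] > pos })
	within = pos
	if idx > 0 {
		within = pos - offsets[idx-1]
	}
	return idx, within
}

// totalLen reads the blob length from the last end offset.
func totalLen(offsets []uint64) uint64 {
	if len(offsets) == 0 {
		return 0
	}
	return offsets[len(offsets)-1]
}

func main() {
	offsets := endOffsets([]uint64{2, 3}) // "hi", "bye"
	fmt.Println(offsets, totalLen(offsets))
	idx, within := childIndex(offsets, 4) // the 'e' in "bye"
	fmt.Println(idx, within)
}
```

For the "hi" "bye" example, lengths `[2, 3]` become end offsets `[2, 5]`, and position 4 resolves to child 1 at offset 2 within it. The same lookup is what a `Seek` needs in order to read only the relevant chunk.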
Erik Arvidsson ea52c4ac7c Implement Seek for Blob.Reader()
This allows us to only read the relevant chunks

Issue #17, #155
2015-08-06 12:22:41 -04:00
Erik Arvidsson 1e7db8e341 Switch to use offsets in compound blobs
This is in preparation for Seek

Issue #155, #17
2015-08-04 19:11:44 -04:00
Erik Arvidsson 4e69837ef0 This introduces two new internal values, blobLeaf and compoundBlob. At
this point the compoundBlob only contains blob leafs but a future
change will create multiple tiers. Both these implement the new Blob
interface.

The splitting is done by using a rolling hash over the last 64 bytes,
when that hash ends with 13 consecutive ones we split the data.

Issue #17
2015-08-03 20:09:42 -04:00
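
The splitting rule in this commit can be sketched as a content-defined chunker. This is an illustration under stated assumptions, not the original code: the rolling hash is a simple cyclic-polynomial (buzhash-style) hash with a table filled from a fixed seed, "ends with 13 consecutive ones" is read as "the low 13 bits are all set", and `splitPoints` and `demoData` are hypothetical helpers. The hash is not reset after a split here, which the original may or may not do.

```go
package main

import (
	"fmt"
	"math/bits"
	"math/rand"
)

const (
	windowSize = 64        // bytes covered by the rolling hash
	pattern    = 1<<13 - 1 // low 13 bits all ones
)

// table maps each byte value to a pseudo-random word (assumed stand-in table).
var table [256]uint32

func init() {
	r := rand.New(rand.NewSource(1))
	for i := range table {
		table[i] = r.Uint32()
	}
}

// splitPoints returns the end offsets at which data would be split.
func splitPoints(data []byte) []int {
	var h uint32
	var points []int
	for i, b := range data {
		h = bits.RotateLeft32(h, 1) ^ table[b]
		if i >= windowSize {
			// Cancel the byte that just left the 64-byte window.
			h ^= bits.RotateLeft32(table[data[i-windowSize]], windowSize)
		}
		if i+1 >= windowSize && h&pattern == pattern {
			points = append(points, i+1)
		}
	}
	return points
}

// demoData produces deterministic pseudo-random bytes for experiments.
func demoData(n int, seed int64) []byte {
	r := rand.New(rand.NewSource(seed))
	data := make([]byte, n)
	for i := range data {
		data[i] = byte(r.Intn(256))
	}
	return data
}

func main() {
	data := demoData(1<<20, 42)
	points := splitPoints(data)
	fmt.Printf("%d split points in %d bytes\n", len(points), len(data))
}
```

Because each decision depends only on the previous 64 bytes, inserting bytes at the front of a blob shifts the split points but does not otherwise change them, so most chunks keep the same content. With a 13-bit pattern, a split is expected roughly once every 8192 bytes on random data.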
Erik Arvidsson c5964aadcd Make Blob take a Reader instead of byte array
This is in preparation for chunking
2015-07-30 18:53:22 -04:00
Aaron Boodman 7944c1b3af Revert "Make WriteValue return a "skinny" copy of input value" 2015-07-30 09:23:35 -07:00
Aaron Boodman a84893c0d8 Make WriteValue return a "skinny" copy of input value
Fixes #141
2015-07-29 16:06:54 -07:00
Chris Masone 4fe00d4f81 Address aa's comments
- Return factory methods to privacy
- use tighter syntax inside Chunks() methods
- Rename Futures() -> Chunks()
2015-07-23 15:32:38 -07:00
Chris Masone a560139d73 Make types.future public
This will enable us to walk the chunk graph without having to go
through weird contortions to figure out which values don't have
chunks in any chunkstore (because they were inlined).

Towards issue #82
2015-07-23 15:32:26 -07:00
Erik Arvidsson 3fdc008f5c Codegen: Add support for noms types
This makes it possible to do a List of Bool or Map of Int32 etc
2015-07-22 12:24:27 -07:00
Rafael Weinstein 50029d1380 Removing cachedRef in favor of func ensureRef() 2015-07-10 16:16:13 -07:00
Aaron Boodman 96f21c4a60 Remove the Foo/flatFoo abstraction in the types package.
Fixes #24.
2015-07-10 11:29:03 -07:00
Aaron Boodman 53003f23f2 Add Value::Ref() 2015-06-12 15:22:27 -07:00
Aaron Boodman 22eb927c4e Blob::Read()->Blob::Reader() 2015-06-12 08:45:29 -07:00
Aaron Boodman 4f8d555bf8 ByteLen()->Len() ... since String is now aggregated in Blob, don't need this weird naming. 2015-06-09 22:58:10 -07:00
Aaron Boodman 266a96a084 Factor Blob back out into a method of String, rather than embedded. Embedding makes branching on nom type hard. 2015-06-09 22:56:17 -07:00
Aaron Boodman 7d00f1bf12 Add initial String implementation 2015-06-04 14:42:28 -07:00
Aaron Boodman eb7db31b90 Add initial Blob implementation 2015-06-04 12:24:36 -07:00