This makes all Noms values except types.Type be backed by a []byte.
The motivation is to reduce allocations and the work needed when we
read parts of a value (especially prolly trees).
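A minimal sketch of the idea, using placeholder names rather than the actual Noms types:
```go
// Hypothetical sketch: a value keeps its serialized bytes and decodes pieces
// on demand, so reading part of a large value doesn't allocate the whole
// decoded structure.
type lazyValue struct {
	buf []byte // raw Noms encoding of the value
}

// fieldAt decodes only the bytes for one field, leaving the rest untouched.
func (v lazyValue) fieldAt(offset, length int) []byte {
	return v.buf[offset : offset+length]
}
```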
Towards #2270
In most cases this will avoid writing the root chunk of a prolly tree,
which is the behavior we're aiming for: a prolly tree might be used
inline, in which case the root never needs to be written.
The solution in this patch is imperfect because it may unnecessarily
write chunks, but this is rare.
Fixes https://github.com/attic-labs/noms/issues/3645
This allows parsing all Noms values from the string representation
used by the human readable encoding:
```
v, err := nomdl.Parse(vrw, `map {"abc": 42}`)
```
Fixes #1466
This patch implements a new strategy for Pull() that pulls the chunks
of a given level of the graph in the order they'll be encountered by
clients reading the graph.
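A minimal sketch of the strategy, with placeholder types rather than the real Pull() signature:
```go
// Hypothetical sketch: copy the graph one level at a time, visiting chunks in
// the order their parents reference them, so the sink receives them in the
// order a reader will later request them.
func pullInReadOrder(get func(h string) []byte, put func(c []byte), refs func(c []byte) []string, root string) {
	level := []string{root}
	for len(level) > 0 {
		var next []string
		for _, h := range level { // within a level, preserve reference order
			c := get(h)
			put(c)
			next = append(next, refs(c)...) // children in encoding order
		}
		level = next
	}
}
```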
Fixes #2968
The new version of this tool now estimates the locality of a DB
written using the "grandchild" strategy implemented by
types.ValueStore. It does so by dividing each level of the graph
into groups that are roughly the size of the branching factor
of that level, and then calculating how many physical reads are
needed to read each group.
In the case of perfect locality, each group could be read in a
single physical read, so that's what the tool uses as its estimate
of the optimal case.
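A minimal sketch of the calculation, with placeholder types for chunk locations and the physical-read counter:
```go
// chunkLoc is a placeholder for wherever a chunk physically lives.
type chunkLoc struct {
	table          string
	offset, length uint64
}

// Hypothetical sketch: split each level into groups of roughly
// branching-factor size, count the physical reads each group needs, and
// compare against the ideal of one physical read per group.
func estimateLocality(levels [][]chunkLoc, branching int, physicalReads func([]chunkLoc) int) (actual, optimal int) {
	for _, level := range levels {
		for i := 0; i < len(level); i += branching {
			end := i + branching
			if end > len(level) {
				end = len(level)
			}
			actual += physicalReads(level[i:end]) // reads given where chunks actually landed
			optimal++                             // perfect locality: one read per group
		}
	}
	return actual, optimal
}
```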
Toward #2968
When requesting a range of values, read all the chunks ahead of time.
This works for indexed sequences; it does not include support for ordered sequences.
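A minimal sketch of the read-ahead, with placeholder types rather than the actual sequence cursor API:
```go
// Hypothetical sketch: resolve the leaf chunks that cover the requested index
// range and fetch them in a single batch up front, instead of fetching each
// leaf lazily as the cursor reaches it.
func readRange(leafHashes []string, batchGet func([]string) map[string][]byte, decodeLeaf func([]byte) []interface{}) []interface{} {
	chunks := batchGet(leafHashes) // one round trip for every leaf in the range
	var out []interface{}
	for _, h := range leafHashes { // decode in index order
		out = append(out, decodeLeaf(chunks[h])...)
	}
	return out
}
```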
Work towards https://github.com/attic-labs/noms/issues/3619
stretchr has fixed a bug with the -count flag. I could merge these
changes into attic-labs, but it's easier to just use stretchr.
We forked stretchr a long time ago so that we didn't link the HTTP
testing libraries into the noms binaries (because we were using d.Chk in
production code). The HTTP issue doesn't seem to happen anymore, even
though we're still using d.Chk.
Print Ref values as #123 instead of 123
Since our hashes are SHA-512 and we write them using Base32, they can easily be confused with other tokens in NomDL; the # prefix makes them unambiguous.
Towards #1466
This removes the type-tagged version of the human readable encoding.
Motivation: simplify it in preparation for making the HRS unambiguous so that we can write a parser.
Towards #1466
Introduce Sloppy, an estimating compression function for snappy, which allows the rolling hash to better produce a given target chunk size after compression.
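A minimal sketch of the idea (not the actual Sloppy implementation): feed an estimated post-compression size into the chunk-boundary decision instead of the raw byte count.
```go
// Hypothetical sketch: only accept a rolling-hash boundary once the bytes
// seen so far are estimated to compress down to roughly the target size.
func shouldSplit(rawLen int, estRatio float64, hashBoundary bool, targetCompressed int) bool {
	estCompressed := int(float64(rawLen) * estRatio) // ratio estimated from the data seen so far
	return hashBoundary && estCompressed >= targetCompressed
}
```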
When we added GetMany and HasMany, we didn't realize that requests
could then be larger than the allowable HTTP form size. This patch
serializes the bodies of getRefs and hasRefs as binary instead,
which addresses this issue and actually makes the request body more
compact.
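A minimal sketch of the binary framing, assuming a simple count-plus-raw-bytes layout rather than the exact wire format:
```go
import (
	"encoding/binary"
	"io"
)

// Hypothetical sketch: instead of an HTTP form ("ref=<base32>&ref=..."),
// write a count followed by the raw digest bytes. There's no form-size
// limit to hit, and the payload is smaller than Base32 text.
func serializeHashes(w io.Writer, hashes [][]byte) error {
	if err := binary.Write(w, binary.BigEndian, uint32(len(hashes))); err != nil {
		return err
	}
	for _, h := range hashes {
		if _, err := w.Write(h); err != nil { // fixed-length raw digests
			return err
		}
	}
	return nil
}
```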
Fixes #3589
* use kingpin for help and new commands, set up dummy command for noms blob (see the sketch after this list)
* document existing commands using kingpin
* remove noms-get and noms-set in favor of new noms blob command
* normalize bool flags in tests, remove redundant cases that kingpin now handles
* add kingpin to vendor files
* make profile flags global
* move --verbose and --quiet to global flags
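A minimal sketch of the kingpin wiring; the command and flag names below are illustrative, not the exact noms definitions:
```go
package main

import (
	"fmt"
	"os"

	"gopkg.in/alecthomas/kingpin.v2"
)

func main() {
	app := kingpin.New("noms", "Noms command line tool.")
	// Global flags are defined on the app, so every command inherits them.
	verbose := app.Flag("verbose", "Enable verbose output.").Short('v').Bool()

	// Commands and nested subcommands get generated help for free.
	blob := app.Command("blob", "Work with blob values.")
	blobGet := blob.Command("get", "Read a blob.")
	spec := blobGet.Arg("spec", "Path to the blob.").Required().String()

	switch kingpin.MustParse(app.Parse(os.Args[1:])) {
	case blobGet.FullCommand():
		fmt.Println("blob get", *spec, "verbose:", *verbose)
	}
}
```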
Prior to this patch, whenever we created a chunkSource for a table
persisted to AWS, awsTablePersister::Open() would hit DynamoDB to
check whether the table data was stored there. That's how it knew
whether to create a dynamoTableReader or an s3TableReader. This
meant consulting Dynamo (or the in-memory small-table cache)
every time we went to open a table. Most of the time this isn't
necessary: we cache table indices separately, and the index is all
we really need at Open() time.
This patch defers reading table data if possible.
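A minimal sketch of the deferral, with placeholder types rather than the real nbs internals:
```go
// tableReader is a placeholder for whatever actually reads table data.
type tableReader interface {
	read(h string) []byte
}

// Hypothetical sketch: decide whether the table lives in Dynamo or S3 only
// when data is first read; Open() just records how to resolve it later.
type lazyChunkSource struct {
	name    string
	resolve func(name string) tableReader // consults Dynamo/S3 on first use
	reader  tableReader
}

func (s *lazyChunkSource) read(h string) []byte {
	if s.reader == nil {
		s.reader = s.resolve(s.name) // Open() no longer pays this cost
	}
	return s.reader.read(h)
}
```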
Fixes #3607
This will likely bloat the cache with tables no one's going to read
data from, BUT doing this also means that most checks to see if a table
is in Dynamo at all can proceed locally. Stopgap until #3607 lands
Fail over to fully-consistent reads if no result. This means that
misses will get more expensive, but hits will cost us half what they
were costing in the initial version of the code.
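A minimal sketch of the fallback using the AWS SDK for Go; not the exact noms code:
```go
import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

// Try the cheaper eventually-consistent read first; on a miss, retry with a
// strongly consistent read in case the item simply hasn't propagated yet.
func getWithFailover(db *dynamodb.DynamoDB, table string, key map[string]*dynamodb.AttributeValue) (*dynamodb.GetItemOutput, error) {
	out, err := db.GetItem(&dynamodb.GetItemInput{
		TableName: aws.String(table),
		Key:       key,
	})
	if err != nil {
		return nil, err
	}
	if len(out.Item) > 0 {
		return out, nil // hit at half the read-capacity cost
	}
	return db.GetItem(&dynamodb.GetItemInput{
		TableName:      aws.String(table),
		Key:            key,
		ConsistentRead: aws.Bool(true),
	})
}
```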
Fixes #3604
Looking at metrics on staging today, there are frequent spikes of
tens of thousands of throttled DynamoDB reads. One explanation is
that we're constantly evicting 'hot' tables from the in-memory
cache because the working set is larger than the space we've
allotted for the cache.
There seems to be a floor on the amount of time required to persist
small objects to S3. For workloads that generate lots of small tables,
this can really add up. DynamoDB is much faster to read/write, and can
hold items of up to 400k. This patch stores tables with < 64 chunks
that are < 400k in DynamoDB, caching them in memory on persist and
open to reduce load on the back end.
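A minimal sketch of the persist decision, using the limits from this message and placeholder interfaces:
```go
const (
	maxDynamoChunks = 64
	maxDynamoSize   = 400 * 1024 // DynamoDB item size limit
)

// Placeholders for the real persisters and cache.
type tablePutter interface {
	putTable(name string, data []byte) error
}
type tableCache interface {
	put(name string, data []byte)
}

// Hypothetical sketch: small tables go to DynamoDB for lower latency, larger
// ones to S3; either way the bytes are cached in memory so the table we just
// persisted (or opened) doesn't have to be fetched again.
func persistTable(name string, data []byte, chunkCount int, cache tableCache, ddb, s3 tablePutter) error {
	cache.put(name, data)
	if chunkCount < maxDynamoChunks && len(data) < maxDynamoSize {
		return ddb.putTable(name, data)
	}
	return s3.putTable(name, data)
}
```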
Fixes #3559