Added --invert argument to indicate column-major order
Added --append argument to append imported data to the current head of the dataset
Added --limit-records argument to limit the number of rows imported
* Modifications to ipfs-chat and ipfs chunkstore
* Change ipfs paths to include the directory where the ipfs repo is stored.
* Rework ipfs-chat to create ipfs chunkstores manually rather than
relying on Spec.ForDataset. This enables creating two chunkstores
(one local and one network) using the same IpfsNode (ipfs repo).
* Create a separate replicate function for the daemon and a mergeMessage
  function for the client, to experiment with slightly different behaviors
  for each.
* Reorganize code to remove duplication.
The main points are:
* added an event loop to process events synchronously (sketched below, after this list)
* more aggressive about not re-processing messages from other nodes
that we've already processed
* fixed a bug in the ipfs chunkstore's HasMany()
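For reference, the synchronous event-loop shape looks roughly like this; the event type, field names, and dedup map are made up for illustration and don't come from ipfs-chat itself:
```
package main

import "fmt"

// event is a stand-in for whatever ipfs-chat receives (local input,
// messages replicated from other nodes, etc.).
type event struct {
	msgID string
	body  string
}

func main() {
	events := make(chan event, 64)
	seen := map[string]bool{} // message IDs we've already processed

	// Producers (network listeners, UI input, ...) send into the channel
	// from their own goroutines.
	go func() {
		events <- event{msgID: "a1", body: "hello"}
		events <- event{msgID: "a1", body: "hello"} // duplicate from another node
		close(events)
	}()

	// Single consumer: events are handled one at a time, so shared state
	// needs no locking, and already-seen messages are dropped instead of
	// being re-processed.
	for ev := range events {
		if seen[ev.msgID] {
			continue
		}
		seen[ev.msgID] = true
		fmt.Println("processing", ev.msgID, ev.body)
	}
}
```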
* Add go-base58 library
This makes all values except types.Type be backed by a []byte.
The motivation is to reduce allocations and the work needed when we
read parts of a value (especially prolly trees).
Towards #2270
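As a rough illustration of the idea (not the actual noms encoding), a value backed by its raw bytes can decode just the piece a caller asks for:
```
package main

import (
	"encoding/binary"
	"fmt"
)

// rawValue is a hypothetical []byte-backed value: it holds the encoded
// bytes and only decodes the parts a caller asks for.
type rawValue struct {
	buf []byte
}

// uint64At decodes a single 8-byte field at the given offset without
// touching (or allocating for) the rest of the buffer.
func (v rawValue) uint64At(offset int) uint64 {
	return binary.BigEndian.Uint64(v.buf[offset : offset+8])
}

func main() {
	buf := make([]byte, 24)
	binary.BigEndian.PutUint64(buf[8:], 42)
	v := rawValue{buf: buf}
	fmt.Println(v.uint64At(8)) // 42
}
```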
This allows parsing any Noms value from the string representation
used by the human-readable encoding:
```
v, err := nomdl.Parse(vrw, `map {"abc": 42}`)
```
Fixes #1466
Tweaking the main loop that processes list entries to avoid some
map assignments, lookups, and allocations saves roughly 15%, an
overall savings of about 1 minute on the 6-minute runtime of our test
workload (as run on my laptop).
Towards #3690
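The flavor of the change, sketched on made-up code rather than the actual noms loop: when entries arrive grouped (as entries from a sorted list do), per-entry map lookups and assignments can be replaced with a cheap local comparison.
```
package main

import "fmt"

// slowCounts hits the map twice per entry: one lookup and one assignment.
func slowCounts(keys []string) map[string]int {
	counts := map[string]int{}
	for _, k := range keys {
		c := counts[k]    // per-entry map lookup
		counts[k] = c + 1 // per-entry map assignment
	}
	return counts
}

// fastCounts assumes keys arrive grouped and only touches the map when
// the key changes.
func fastCounts(keys []string) map[string]int {
	counts := map[string]int{}
	cur, run := "", 0
	for _, k := range keys {
		if k != cur {
			if run > 0 {
				counts[cur] += run
			}
			cur, run = k, 0
		}
		run++
	}
	if run > 0 {
		counts[cur] += run
	}
	return counts
}

func main() {
	keys := []string{"a", "a", "a", "b", "b"}
	fmt.Println(slowCounts(keys), fastCounts(keys)) // both: map[a:3 b:2]
}
```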
Takes the output of a CSV file imported as a List of Struct and
"inverts" it so that it's now a Struct of Lists.
Example:
```
List<Struct Row {
  Base?: String,
  DOLocationID?: String,
}>
```
becomes
```
Struct Columnar {
  base: List<String>,
  dolocationid: List<String>,
}
```
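In plain Go terms (ordinary slices and structs rather than Noms types, with made-up row values), the inversion looks like this:
```
package main

import "fmt"

// row mirrors the Struct Row above.
type row struct {
	Base         string
	DOLocationID string
}

// columnar mirrors the Struct Columnar above: one list per column.
type columnar struct {
	base         []string
	dolocationid []string
}

// invert turns a list of rows into a struct of columns.
func invert(rows []row) columnar {
	var c columnar
	for _, r := range rows {
		c.base = append(c.base, r.Base)
		c.dolocationid = append(c.dolocationid, r.DOLocationID)
	}
	return c
}

func main() {
	rows := []row{{"yellow", "132"}, {"green", "7"}} // hypothetical rows
	fmt.Printf("%+v\n", invert(rows))
}
```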
stretchr has fixed a bug with the -count flag. I could merge these
changes into attic-labs, but it's easier to just use stretchr.
We forked stretchr a long time ago so that we wouldn't link the HTTP
testing libraries into the noms binaries (because we were using d.Chk in
production code). The HTTP issue doesn't seem to happen anymore, even
though we're still using d.Chk.
* Add --lowercase option to map column names to lowercase struct names
By default, each column name maps to a struct field that preserves the original case.
If --lowercase is specified, the resulting struct field names are always lowercase.
Introduce Sloppy, an estimating compression function for snappy, which allows the rolling hash to better hit a given target chunk size after compression.
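Sloppy avoids running snappy in full; as a rough sketch of the underlying idea only, one could estimate the compression ratio on a sample and scale the rolling hash's target accordingly (this sketch uses github.com/golang/snappy as the estimator, which the real Sloppy does not):
```
package main

import (
	"fmt"

	"github.com/golang/snappy"
)

// estimateRatio guesses how much snappy will shrink data like sample.
// Sloppy estimates this without fully compressing; this stand-in just
// runs snappy on a sample to illustrate the idea.
func estimateRatio(sample []byte) float64 {
	if len(sample) == 0 {
		return 1
	}
	compressed := snappy.Encode(nil, sample)
	return float64(len(compressed)) / float64(len(sample))
}

func main() {
	const targetCompressed = 4096 // desired on-disk chunk size, post-compression

	sample := []byte("some representative chunk data, some representative chunk data")
	ratio := estimateRatio(sample)

	// Aim the rolling hash at a larger *uncompressed* boundary so that,
	// after compression, chunks land near the target size.
	targetUncompressed := int(float64(targetCompressed) / ratio)
	fmt.Println("estimated ratio:", ratio, "target uncompressed size:", targetUncompressed)
}
```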
When we added GetMany and HasMany, we didn't realize that requests
could then be larger than the allowable HTTP form size. This patch
serializes the bodies of getRefs and hasRefs as binary instead,
which addresses the issue and also makes the request body more
compact.
Fixes #3589
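The framing idea, sketched with hypothetical helpers rather than the actual getRefs/hasRefs wire code: write the 20-byte noms hash digests back-to-back as raw bytes.
```
package main

import (
	"bytes"
	"fmt"
	"io"
)

const hashLen = 20 // noms hash digests are 20 bytes

// encodeHashes writes a batch of digests back-to-back as raw bytes, so
// the body stays compact and isn't subject to HTTP form size limits.
func encodeHashes(w io.Writer, hashes [][hashLen]byte) error {
	for _, h := range hashes {
		if _, err := w.Write(h[:]); err != nil {
			return err
		}
	}
	return nil
}

// decodeHashes reads digests until EOF.
func decodeHashes(r io.Reader) ([][hashLen]byte, error) {
	var out [][hashLen]byte
	for {
		var h [hashLen]byte
		if _, err := io.ReadFull(r, h[:]); err == io.EOF {
			return out, nil
		} else if err != nil {
			return nil, err
		}
		out = append(out, h)
	}
}

func main() {
	var buf bytes.Buffer
	hashes := [][hashLen]byte{{1}, {2}}
	if err := encodeHashes(&buf, hashes); err != nil {
		panic(err)
	}
	decoded, _ := decodeHashes(&buf)
	fmt.Println(len(buf.Bytes()), "bytes,", len(decoded), "hashes")
}
```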
* use kingpin for help and new commands, set up dummy command for noms blob (see the sketch after this list)
* document existing commands using kingpin
* remove noms-get and noms-set in favor of new noms blob command
* normalize bool flags in tests, remove redundant cases that kingpin now handles
* add kingpin to vendor files
* make profile flags global
* move --verbose and --quiet to global flags
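The kingpin wiring looks roughly like the sketch below; the commands, flags, and help strings here are illustrative, not the exact noms definitions.
```
package main

import (
	"fmt"
	"os"

	"gopkg.in/alecthomas/kingpin.v2"
)

var (
	app     = kingpin.New("noms", "Noms command-line tool.")
	verbose = app.Flag("verbose", "Verbose output.").Short('v').Bool() // global flag
	quiet   = app.Flag("quiet", "Suppress output.").Bool()             // global flag

	blob       = app.Command("blob", "Work with blobs.")
	blobGet    = blob.Command("get", "Read a blob.")
	blobGetArg = blobGet.Arg("path", "Path to the blob.").Required().String()
)

func main() {
	// Parse returns the full command path; dispatch on it.
	switch kingpin.MustParse(app.Parse(os.Args[1:])) {
	case blobGet.FullCommand():
		fmt.Println("would read blob at", *blobGetArg, "verbose:", *verbose, "quiet:", *quiet)
	}
}
```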
Right now, the only Commits the server will retry are those that conflicted
with commits to different datasets. That is, if another client concurrently
landed a change to some other dataset, and that is the only thing causing
your Commit attempt to fail, the server will retry it.
Fixes #3582
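Conceptually (plain Go maps standing in for the noms root map of dataset heads, not the real server code), the retry check reduces to:
```
package main

import "fmt"

// canRetry reports whether a failed Commit to dataset is safe to retry:
// it is if the head of dataset itself is unchanged between the root the
// Commit was based on and the current root, i.e. the only conflicting
// changes were concurrent commits to other datasets.
func canRetry(dataset string, expectedRoot, currentRoot map[string]string) bool {
	return expectedRoot[dataset] == currentRoot[dataset]
}

func main() {
	expected := map[string]string{"photos": "h1", "notes": "h7"}
	current := map[string]string{"photos": "h1", "notes": "h8"} // someone else committed to "notes"
	fmt.Println(canRetry("photos", expected, current))          // true: rebase the root and retry
}
```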
In an NBS world, bulk 'has' checks are waaaay cheaper than they used
to be. In light of this, we can toss out the complex logic we were
using in Pull() -- which basically existed for no reason other than to
avoid doing 'has' checks. Now, the code basically just descends down a
tree of chunks breadth-first, using HasMany() at each level to figure
out which chunks are not yet in the sink all at once, and GetMany() to
pull them from the source in bulk.
Fixes #3182, Towards #3384
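A sketch of the new shape of Pull(), written against a deliberately minimal in-memory store rather than the real noms ChunkStore interface (whose signatures differ):
```
package main

import "fmt"

// chunk is a simplified chunk: an address plus the addresses it references.
type chunk struct {
	addr string
	refs []string
}

// store is a minimal stand-in for a chunk store that supports bulk operations.
type store interface {
	HasMany(addrs []string) map[string]bool // addr -> already present here
	GetMany(addrs []string) []chunk
	PutMany(cs []chunk)
}

// pull walks the chunk graph breadth-first from root. At each level it asks
// the sink, in bulk, which chunks it is missing, fetches only those from the
// source in bulk, and queues their children for the next level. Chunks the
// sink already has are not descended into.
func pull(source, sink store, root string) {
	level := []string{root}
	for len(level) > 0 {
		present := sink.HasMany(level)
		var missing []string
		for _, a := range level {
			if !present[a] {
				missing = append(missing, a)
			}
		}
		fetched := source.GetMany(missing)
		sink.PutMany(fetched)

		var next []string
		for _, c := range fetched {
			next = append(next, c.refs...)
		}
		level = next
	}
}

// memStore is an in-memory store used to exercise pull.
type memStore struct{ chunks map[string]chunk }

func (m *memStore) HasMany(addrs []string) map[string]bool {
	out := map[string]bool{}
	for _, a := range addrs {
		_, ok := m.chunks[a]
		out[a] = ok
	}
	return out
}

func (m *memStore) GetMany(addrs []string) []chunk {
	var out []chunk
	for _, a := range addrs {
		out = append(out, m.chunks[a])
	}
	return out
}

func (m *memStore) PutMany(cs []chunk) {
	for _, c := range cs {
		m.chunks[c.addr] = c
	}
}

func main() {
	source := &memStore{chunks: map[string]chunk{
		"root": {addr: "root", refs: []string{"a", "b"}},
		"a":    {addr: "a"},
		"b":    {addr: "b"},
	}}
	sink := &memStore{chunks: map[string]chunk{"a": {addr: "a"}}} // already has "a"
	pull(source, sink, "root")
	fmt.Println(len(sink.chunks)) // 3: root and b copied, a already present
}
```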
Used to be that an NBS table was named by hashing the hashes
of every chunk present in the table, in hash order. That means
that to generate the name of a table you'd need to iterate
the prefix map and load every associated suffix. That would
be expensive when e.g. compacting multiple tables. This is
waaay cheaper and only slightly more likely to wind up with a
name collision.
Toward #3411