Add optional merging functionality to noms commit.
noms merge <database> <left-dataset-name> <right-dataset-name> <output-dataset-name>
The command above will look in the given Database for the two named
Datasets and, if possible, merge their HeadValue()s and commit the
result back to <output-dataset-name>.
Fixes#2535
This puts the flow header after the copyright header.
It also:
* fixes the existing files to have valid headers
* Makes sure the script can handle doctype
There are two places where ValidatingBatchingSink could be more
concurrent: Prepare(), where it's reading in hints, and Enqueue().
Making Prepare() handle many hints concurrently is easy because the
hints don't depend on one another, so that method now just spins up
a number of goroutines and runs them all at once.
Enqueue() is more complex, because while Chunk decoding and validation
of its hash can proceed concurrently, validating that a given Chunk is
'ref-complete' requires that the chunks in the writeValue payload all
be processed in order. So, this patch uses orderedparallel to run the
new Decode() method on chunks in parallel, but then return to serial
operation before calling the modified Enqueue() method.
Fixes#1935
This patch implements evolving support for configuring aliases and defaults for the noms cli (started with #2131)
For an introduction, please take a look at the sample code here: https://github.com/attic-labs/noms/blob/master/samples/cli/nomsconfig/README.md
Improvements include:
- All go samples now work with .nomsconfig
- Absolute paths in ldb specs are now properly handled
- Add -v|--verbose flag to commands to debug expansion
- Make default just another alias and change [default] section to [db.default]
- Introduce the `.` shorthand to refer to a previously mentioned dataset/object
This patch modifies LocalDatabase so that it no longer swaps out
its embedded ValueStore during Pull(). The reason it was doing this
is that Pull() injects chunks directly into a Database, without
doing any work on its own to ensure correctness. For LocalDatabase,
WriteValue() performs de-facto validation as it goes, so it does not
need this additional validation in the general case. To address the
former wtithout impacting the latter, we were making LocalDatabase
swap out its ValueStore() during Pull(), replacing it with one that
performs validation.
This led to inconsistencies, seen in issue #2598. Collections read
from the DB _before_ this swap could have weird behavior if they
were modified and written after this swap.
The new code just lazily creates a BatchStore for use during Pull().
Fixes#2598
Noms SDK users frequently shoot themselves in the foot because they're
holding onto an "old" Database object. That is, they have a Database
tucked away in some internal state, they call Commit() on it, and
don't replace the object in their internal state with the new Database
returned from Commit.
This PR changes the Database and Dataset Go API to be in line with the
proposal in Issue #2589. JS follows in a separate patch.
* Implement noms cli configuration support
- Introduce .nomsconfig
- Supports a default db to use when no explicit db is given
- Supports defining db aliases to use as short hand for db urls
- See samples/cli/nomsconfig for more info
fix: #2131
* Use sampling for a better bytes-written estimate for noms sync
* Confirmed that remaining overestimate of data written is consistent with leveldb stats and opened #2567 to track
These were previously intertwined into one writer that was
embedded in and only usable by the 'noms' command.
This commit separates them into to separate writers that
can be used independently or combined. I also moved them
into go/utils/writers so they can be used by other code.
The main impetus to do this was to fix Bug #2593.
* Add "noms commit" command
* Updated csv-import, json-import, xml-import and url-fetch to (optionally) not commit results
* Added helpers for creating commit meta-data struct through command line or function calls
Fixes are based on Go report card output:
- `gofmt -s` eliminates some duplication in struct/slice initialization
- `golint` found some issues like: `warning: should drop = nil from declaration of var XXX; it is the zero value`
- `golint` found some issues like: `warning: receiver name XXX should be consistent with previous receiver name YYY for ZZZ`
- `golint` says not to use underscores for function/variable names
- `golint` found several issues like: `warning: if block ends with a return statement, so drop this else and outdent its block`
No functional changes are included - just source code quality improvements.
fix misspellings; fix code that was not gofmt'd - plus take advantage of gofmt -s too; couple of unreachable golint reported fixes; reference go report card results and tests
This isn't triggering any problem in particular, I just noticed it. All
it meant was that noms log might take unnecessarily long with list diff
when there are a lot of changes.
In 7c103344 I changed noms log to use left-right diff so that it
completes faster, but it changed noms diff to use left-right as well.
Noms diff is supposed to use best diff.
* Fix noms_sync status message to indicate when new dataset is created
Originally, syncing a commit to a non-existent dataset and an already sync'ed dataset resulted in the same message: "<src> is up to date"
This fix distinguishes 2 cases:
- Syncing to a new dataset prints "<dest> created"
- Syncing to an identical dataset prints "<dest> already up to date"
Resolves: #2053
* Make message when noms sync creates new dataset more fun
Addresses: #2053
The Dataset.Commit() code pathway still enforces fast-forward-only
behavior, but a new SetHead() method allows the HEAD of a Dataset to
be forced to any other Commit.
noms sync detects the case where the source Commit is not a descendent
of the provided sink Dataset's HEAD and uses the new API to force the
sink to the desired new Commit, printing out the now-abandoned old
HEAD.
Fixes#2240
Previously there were 3 ways to construct a Path: via methods like
AddIndex, via parsing/AddPath, and simply via using append() (since Path
is a slice).
This patch deletes the former and leaves only ParsePath and append(),
with the ability to manually create the various path parts (I've made
them public). I also added documentation for the public types/fields.
It (a) had a spurious blank line, (b) should print the dataset path not
just its ID, (c) should include the # for the hash. Now it says:
```
> noms ds -d path/to/db::my-dataset
Deleted path/to/db::my-dataset (was #lvnbmpj94r9h3nuaj1t0m4jeqp25o1ve)
>
```
This fixes https://github.com/attic-labs/noms/issues/2235 where set/map
diff was still running when noms log has finished, and already closed
its database. Commonly this would crash.
I've removed panic-based control flow to make error handling explicit,
and added a "DiffLeftRight" method to set/map so that noms log completes
ASAP. See the issue for details.
For Blobs we print:
```
- Blob (42 kB)
+ Blob (1 B)
```
For Refs we just print the hashes:
```
- abcdeabcdeabcdeabcde
+ defghdefghdefghdefgh
```
Fixes#2213
Earlier I'd used 2 channels and selected over them, but as I found
elsewhere, this doesn't scale very well. It's simpler to just use a
close channel with a buffer size of 1.
Shows number of changes between two top level values.
```
noms diff -summarize $l::#t5p4im6uug7n5m72frr0dnjnkm04e4ph.value $l::#ueo0utduuqsf9vrntrhn25lnc19m848l.value
```
Prints:
```
13,636 insertions, 0 deletions, 107 changes, (1,919,894 values vs 1,933,530 values)
```
Where the numbers are updated as more data is computed from the diff.
Towards #2031
- Too have same API as all the other diff methods
- Send changes to channel without intermediary slices and without the
need to union and sort the fields
Previously a header was only printed once per entire map. This led to a
confusing diff where if a map like:
```
{
"a": {
"b": 1
}
}
```
changed to:
```
{
"a": {
"b": 2
},
"c": {
"d": 3
}
}
```
then the diff would be:
```
/["a"]
- "b": 1
+ "b": 2
+ "c": {
+ "d": 3
+ }
```
which makes it look like the "c" change is part of the ["a"]["b"] one.
After this change, the diff will be:
```
/["a"]
- "b": 1
+ "b": 2
/
+ "c": {
+ "d": 3
+ }
```
I also fixed up a bit of the diff formatting in this change.