The old strategy for writing values was to recursively encode them,
putting the resulting chunks into a BatchStore from the bottom up as
they were generated. The BatchStore implementation was responsible for
handling concurrency, so chunks from different Values would be
interleaved if there were multiple calls to WriteValue happening
at the same time.
The new strategy tries to keep chunks from the same 'level' of a
graph together by caching chunks as they're encoded and only writing
them once they're referenced by some other value. When a collection
is written, the graph representing it is encoded recursively, and
chunks are generated bottom-up. The new strategy should, in practice,
mean that the children of a given parent node in this graph will be
cached until that parent gets written, and then they'll get written
all at once.
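A minimal sketch of the caching idea, in Go with toy stand-ins for the real chunk and store types (pendingWriter, hash, and chunk here are illustrative, not the actual Noms API):
```go
package main

import "fmt"

// hash and chunk are stand-ins for Noms' real hash.Hash and chunks.Chunk.
type hash string

type chunk struct {
	h    hash
	data []byte
	refs []hash // hashes of the child chunks this chunk references
}

// pendingWriter caches encoded chunks until a parent references them,
// so chunks from the same level of a graph get written together.
type pendingWriter struct {
	pending map[hash]chunk
	store   func(chunk) // the underlying BatchStore write, abstracted away
}

// put caches a freshly encoded chunk instead of writing it immediately.
func (w *pendingWriter) put(c chunk) {
	w.pending[c.h] = c
}

// writeParent flushes all cached children of a parent, then the parent
// itself, keeping each level of the graph together in the store.
func (w *pendingWriter) writeParent(parent chunk) {
	for _, r := range parent.refs {
		if child, ok := w.pending[r]; ok {
			w.store(child)
			delete(w.pending, r)
		}
	}
	w.store(parent)
}

func main() {
	w := &pendingWriter{
		pending: map[hash]chunk{},
		store:   func(c chunk) { fmt.Println("wrote", c.h) },
	}
	w.put(chunk{h: "leaf1"})
	w.put(chunk{h: "leaf2"})
	w.writeParent(chunk{h: "parent", refs: []hash{"leaf1", "leaf2"}})
}
```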
This patch adds an optional MergePolicy callback field to CommitOptions.
If the caller sets it, then the commit code will look for a
common ancestor between the Dataset HEAD and the provided Commit. If
the caller-provided Commit descends from HEAD, then Commit proceeds as
normal.
If it does not, but there is a common ancestor, the code runs
merge.ThreeWay() on the values of the provided Commit, HEAD, and the
common ancestor, invoking the MergePolicy callback to resolve
conflicts. If merge succeeds, a merge Commit is created that descends
from both HEAD and the caller-provided Commit. This becomes the new
HEAD of the Dataset.
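A sketch of the merge flow under these assumptions -- the Value, MergePolicy, and threeWay names below are illustrative stand-ins for the real types.Value, the callback's signature, and merge.ThreeWay():
```go
package main

import (
	"errors"
	"fmt"
)

// Value stands in for types.Value; MergePolicy models the callback.
type Value interface{}

type MergePolicy func(key string, a, b Value) (Value, error)

// threeWay is a toy stand-in for merge.ThreeWay(): keys changed on only
// one side merge cleanly; true conflicts go to the caller's policy.
func threeWay(a, b, ancestor map[string]Value, resolve MergePolicy) (map[string]Value, error) {
	merged := map[string]Value{}
	for k, av := range a {
		bv, anc := b[k], ancestor[k]
		switch {
		case av == bv: // both sides agree
			merged[k] = av
		case bv == anc: // only the incoming commit changed this key
			merged[k] = av
		case av == anc: // only HEAD changed this key
			merged[k] = bv
		case resolve == nil:
			return nil, errors.New("conflict at " + k + " and no MergePolicy set")
		default:
			v, err := resolve(k, av, bv)
			if err != nil {
				return nil, err
			}
			merged[k] = v
		}
	}
	for k, bv := range b { // keep keys present only in HEAD
		if _, ok := merged[k]; !ok {
			merged[k] = bv
		}
	}
	return merged, nil
}

func main() {
	ancestor := map[string]Value{"count": 1}
	head := map[string]Value{"count": 2}
	incoming := map[string]Value{"count": 3}
	merged, err := threeWay(incoming, head, ancestor, func(k string, a, b Value) (Value, error) {
		return a, nil // resolve conflicts in favor of the incoming commit
	})
	fmt.Println(merged, err) // map[count:3] <nil>
}
```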
Fixes #2534
Noms SDK users frequently shoot themselves in the foot because they're
holding onto an "old" Database object. That is, they have a Database
tucked away in some internal state, they call Commit() on it, and
don't replace the object in their internal state with the new Database
returned from Commit.
This PR changes the Database and Dataset Go API to be in line with the
proposal in Issue #2589. JS follows in a separate patch.
Prior to this change, concurrent modification of two different
Datasets in a given Database would result in an
ErrOptimisticLockFailed for whichever process lost the race. As of
this patch, concurrent changes to the _same_ Dataset will result in an
ErrMergeNeeded for the loser, but concurrent changes to different
Datasets will succeed.
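A toy illustration of the foot-gun and of the pattern the new API enforces (this Database is a stand-in, not the real datas.Database):
```go
package main

import "fmt"

type Database struct {
	root string // stand-in for the database's root hash
}

// Commit returns a *new* Database reflecting the commit; the receiver
// is left pointing at the old root and must not be reused.
func (db Database) Commit(v string) Database {
	return Database{root: v}
}

func main() {
	db := Database{root: "r0"}
	// Wrong: committing while keeping the old db around.
	db.Commit("r1")
	fmt.Println(db.root) // still "r0" -- the foot-gun described above

	// Right: always replace the stored Database with the returned one.
	db = db.Commit("r1")
	fmt.Println(db.root) // "r1"
}
```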
Fixes #2524
Fixes are based on Go report card output:
- `gofmt -s` eliminates some duplication in struct/slice initialization
- `golint` found some issues like: `warning: should drop = nil from declaration of var XXX; it is the zero value`
- `golint` found some issues like: `warning: receiver name XXX should be consistent with previous receiver name YYY for ZZZ`
- `golint` says not to use underscores for function/variable names
- `golint` found several issues like: `warning: if block ends with a return statement, so drop this else and outdent its block` (see the sketch after this list)
No functional changes are included - just source code quality improvements.
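For example, the else-after-return issue from the list above looks like this before and after the fix:
```go
// Illustrative only: the exact pattern golint flags and its fix.
package lintexample

// Before: golint warns "if block ends with a return statement, so
// drop this else and outdent its block".
func absBefore(x int) int {
	if x < 0 {
		return -x
	} else {
		return x
	}
}

// After: the else is dropped and its block outdented.
func absAfter(x int) int {
	if x < 0 {
		return -x
	}
	return x
}
```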
- Move to commit.CommitDescendsFrom
- When searching for an ancestor in the commit history, prune
  whenever commit.Height() < ancestor.Height() (see the sketch below)
- Add new test TestCommitDescendsFrom to verify correctness and pruning.
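A sketch of the pruned ancestor search, using a toy commit type in place of the real one:
```go
package main

import "fmt"

// commit is a toy stand-in for Noms' Commit struct: each commit knows
// its parents and its height (1 + the max parent height).
type commit struct {
	name    string
	height  int
	parents []*commit
}

// descendsFrom reports whether c has ancestor in its history, pruning
// any branch whose height is already below the ancestor's height --
// such a branch cannot possibly contain the ancestor.
func descendsFrom(c, ancestor *commit) bool {
	if c == ancestor {
		return true
	}
	for _, p := range c.parents {
		if p.height < ancestor.height {
			continue // prune: p and all of its ancestors are lower still
		}
		if descendsFrom(p, ancestor) {
			return true
		}
	}
	return false
}

func main() {
	a := &commit{name: "a", height: 1}
	b := &commit{name: "b", height: 2, parents: []*commit{a}}
	c := &commit{name: "c", height: 3, parents: []*commit{b}}
	fmt.Println(descendsFrom(c, a)) // true
	fmt.Println(descendsFrom(a, c)) // false
}
```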
Fix misspellings; gofmt code that was not gofmt'd, taking advantage of `gofmt -s` as well; fix a couple of unreachable-code issues reported by golint; reference Go report card results and tests.
In service of streaming set and map mutators, it'd be helpful for
ValueStore to implement some methods that are exposed only to other
code within the types package, even when a ValueStore is embedded
in a Database. That said, we don't want those methods to become a
part of the public Database API.
To enable this, this patch changes databaseCommon to embed a
*types.ValueStore, instead of composing one. This causes all the
ValueStore methods to be a part of the databaseCommon API, but
since all the Database construction methods return a Database by
interface, stuff like BatchStore() all gets masked. Thus, callers
still can't access ValueStore methods. Databases are passed into
the types package as ValueReadWriters, though, so code in there
can call any method on that interface -- including unexported stuff.
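A single-package toy showing the mechanism -- embedding promotes the ValueStore methods onto databaseCommon, while returning the struct behind a narrow interface masks them (all names here are illustrative):
```go
package main

import "fmt"

// valueStore stands in for types.ValueStore; the lowercased method is
// the kind of package-private hook described above.
type valueStore struct{}

func (vs *valueStore) ReadValue(h string) string { return "value-for-" + h }
func (vs *valueStore) pendingPuts() int          { return 0 } // unexported: callable only within this package

// databaseCommon embeds *valueStore, so all of its methods are
// promoted onto the struct rather than hidden behind a field.
type databaseCommon struct {
	*valueStore
}

// Database is the public interface; it exposes only the methods we
// choose, masking the rest of the promoted ValueStore API.
type Database interface {
	ReadValue(h string) string
}

func newDatabase() Database {
	return &databaseCommon{valueStore: &valueStore{}}
}

func main() {
	db := newDatabase()
	fmt.Println(db.ReadValue("abc"))
	// db.pendingPuts() would not compile: the interface masks it. Code
	// in the defining package, which receives the concrete type (or a
	// wider interface), can still call it.
}
```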
The Dataset.Commit() code pathway still enforces fast-forward-only
behavior, but a new SetHead() method allows the HEAD of a Dataset to
be forced to any other Commit.
noms sync detects the case where the source Commit is not a descendant
of the provided sink Dataset's HEAD and uses the new API to force the
sink to the desired new Commit, printing out the now-abandoned old
HEAD.
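An illustrative sketch of the two code paths, with a toy Dataset standing in for the real API:
```go
package main

import "fmt"

type Dataset struct {
	head string // stand-in for the HEAD commit's hash
}

// Commit refuses anything that isn't a fast-forward of the current HEAD.
func (ds *Dataset) Commit(newHead, parent string) error {
	if parent != ds.head {
		return fmt.Errorf("commit %s does not descend from HEAD %s", newHead, ds.head)
	}
	ds.head = newHead
	return nil
}

// SetHead forces HEAD to any commit, returning the abandoned old HEAD.
func (ds *Dataset) SetHead(newHead string) (oldHead string) {
	oldHead, ds.head = ds.head, newHead
	return oldHead
}

func main() {
	ds := &Dataset{head: "c1"}
	if err := ds.Commit("x9", "c0"); err != nil {
		// noms sync hits this case, then forces the sink:
		old := ds.SetHead("x9")
		fmt.Println("abandoned old HEAD:", old)
	}
}
```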
Fixes #2240
In the server side of the Remote Database, the handler for
UpdateRoot now verifies that the new proposed Root is of a
legal type: empty map OR Map<String, Ref<Commit-like>>
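A sketch of the shape of that check, with toy stand-ins for the Noms map and ref types:
```go
package main

import (
	"errors"
	"fmt"
)

// ref is a toy stand-in for a Noms Ref; a real check would inspect the
// ref's target type rather than a boolean flag.
type ref struct {
	targetIsCommitLike bool
}

// validateRoot mirrors the server-side check described above: the
// proposed root must be an empty map or Map<String, Ref<Commit-like>>.
func validateRoot(root map[string]ref) error {
	for name, r := range root {
		if !r.targetIsCommitLike {
			return errors.New("dataset " + name + " does not point at a Commit-like value")
		}
	}
	return nil // the empty map is trivially legal
}

func main() {
	fmt.Println(validateRoot(map[string]ref{}))              // <nil>
	fmt.Println(validateRoot(map[string]ref{"ds": {true}}))  // <nil>
	fmt.Println(validateRoot(map[string]ref{"ds": {false}})) // error
}
```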
Fixes #2116
We now compute the commit type based on the type of the value and
the type of the parents.
For the first commit we get:
```
struct Commit {
  parents: Set<Ref<Cycle<0>>>,
  value: T,
}
```
As long as we continue to commit values with type T that type stays
the same.
When we later commit a value of type U we get:
```
struct Commit {
  parents: Set<Ref<struct Commit {
    parents: Set<Ref<Cycle<0>>>,
    value: T | U,
  }>>,
  value: U,
}
```
The new type gets combined as a union type for the value of the inner
commit struct.
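A toy model of that widening, treating types as strings for brevity (the real implementation operates on Noms type values, not strings):
```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// unionOf combines the value types seen so far into one deduplicated,
// sorted union -- a toy model of how the inner Commit struct's value
// field becomes T | U above.
func unionOf(types ...string) string {
	seen := map[string]bool{}
	var uniq []string
	for _, t := range types {
		for _, part := range strings.Split(t, " | ") {
			if !seen[part] {
				seen[part] = true
				uniq = append(uniq, part)
			}
		}
	}
	sort.Strings(uniq)
	return strings.Join(uniq, " | ")
}

func main() {
	valueType := unionOf("T")           // first commit: T
	valueType = unionOf(valueType, "U") // a later commit of U widens it
	fmt.Println(valueType)              // "T | U"
}
```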
Fixes #1495
Change Dataset.Pull to use a single algorithm to pull data from a
source to a sink, regardless of which (if any) is local. The basic
algorithm is described in the first section of pulling.md. This
implementation is equivalent but phrased a bit differently. The
algorithm actually used is described in the second section of
pulling.md
The main changes:
- datas.Pull(), which implements the new pulling algorithm
- RefHeap, a priority queue that sorts types.Ref by ref-height and
  then by ref.TargetHash() (see the sketch after this list)
- Add has() to both Database implementations. Cache has() checks.
- Switch Dataset to use new datas.Pull(). Currently not concurrent.
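A sketch of what a priority queue like RefHeap can look like using container/heap; the ref struct below is a stand-in for types.Ref:
```go
package main

import (
	"container/heap"
	"fmt"
)

// ref is a stand-in for types.Ref: a height plus a target hash.
type ref struct {
	height     uint64
	targetHash string
}

// refHeap pops the highest ref first, breaking ties on target hash, so
// a pull can walk the graph level by level from the top down.
type refHeap []ref

func (h refHeap) Len() int      { return len(h) }
func (h refHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h refHeap) Less(i, j int) bool {
	if h[i].height != h[j].height {
		return h[i].height > h[j].height // taller refs first
	}
	return h[i].targetHash < h[j].targetHash
}
func (h *refHeap) Push(x interface{}) { *h = append(*h, x.(ref)) }
func (h *refHeap) Pop() interface{} {
	old := *h
	n := len(old)
	r := old[n-1]
	*h = old[:n-1]
	return r
}

func main() {
	h := &refHeap{}
	heap.Init(h)
	heap.Push(h, ref{2, "bb"})
	heap.Push(h, ref{3, "aa"})
	heap.Push(h, ref{2, "aa"})
	for h.Len() > 0 {
		fmt.Println(heap.Pop(h).(ref)) // {3 aa}, {2 aa}, {2 bb}
	}
}
```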
Toward #1568
Mostly, prune reachableChunks