Noms SDK users frequently shoot themselves in the foot by holding onto
an "old" Database object. That is, they tuck a Database away in some
internal state, call Commit() on it, and never replace the stored
object with the new Database that Commit() returns.
This PR changes the Database and Dataset Go API to be in line with the
proposal in Issue #2589. JS follows in a separate patch.
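The pattern the new API encourages can be sketched with simplified, hypothetical types (these stand in for the real datas package, which differs in detail):

```go
package main

import "fmt"

// Hypothetical, simplified stand-in for a Noms Database. The one
// property that matters here: Commit returns a NEW Database value, and
// the receiver keeps seeing the old state.
type Database struct {
	head int // pretend state: the latest committed value
}

// Commit returns a new Database reflecting the commit; the receiver is
// left pointing at the old state, which is the root of the foot-gun.
func (db Database) Commit(v int) Database {
	return Database{head: v}
}

func main() {
	db := Database{}

	// Wrong: the result of Commit is dropped, so db still sees head == 0.
	db.Commit(1)
	fmt.Println("stale head:", db.head) // 0

	// Right: always replace the stored handle with the returned one.
	db = db.Commit(1)
	fmt.Println("fresh head:", db.head) // 1
}
```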
* Use sampling for a better bytes-written estimate for noms sync
* Confirmed that the remaining overestimate of data written is consistent with LevelDB stats; opened #2567 to track it
Once we integrate noms-merge into the `noms commit` command, this
function will allow us to stop requiring users to pass in the common
ancestor to be used when merging. The code can just find it and merge
away.
Toward #2535
The Dataset.Commit() code pathway still enforces fast-forward-only
behavior, but a new SetHead() method allows the HEAD of a Dataset to
be forced to any other Commit.
noms sync detects the case where the source Commit is not a descendant
of the provided sink Dataset's HEAD and uses the new API to force the
sink to the desired new Commit, printing out the now-abandoned old
HEAD.
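The fast-forward-vs-force distinction can be sketched like this (hypothetical, simplified types; the names mirror the description above, not the exact Noms signatures):

```go
package main

import "fmt"

// Hypothetical single-parent commit; real Noms commits have a parents set.
type Commit struct {
	parent *Commit
	name   string
}

// isDescendant reports whether c can reach ancestor by walking parents.
func isDescendant(c, ancestor *Commit) bool {
	for ; c != nil; c = c.parent {
		if c == ancestor {
			return true
		}
	}
	return false
}

type Dataset struct{ head *Commit }

// Commit enforces fast-forward-only: the new commit must descend from HEAD.
func (ds *Dataset) Commit(c *Commit) error {
	if ds.head != nil && !isDescendant(c, ds.head) {
		return fmt.Errorf("not a fast-forward")
	}
	ds.head = c
	return nil
}

// SetHead forces HEAD to any commit and returns the abandoned old HEAD.
func (ds *Dataset) SetHead(c *Commit) *Commit {
	old := ds.head
	ds.head = c
	return old
}

func main() {
	a := &Commit{name: "a"}
	b := &Commit{parent: a, name: "b"}
	other := &Commit{name: "other"}

	ds := &Dataset{}
	ds.Commit(a)
	ds.Commit(b) // fast-forward: ok
	if err := ds.Commit(other); err != nil {
		fmt.Println("Commit refused:", err)
	}
	old := ds.SetHead(other) // forced, like noms sync on divergence
	fmt.Println("abandoned HEAD:", old.name)
}
```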
Fixes #2240
It turns out Pull() was making some bad assumptions about how the Go
heap package used its backing storage. Since it wasn't really relying
on heap guarantees anyway, this changes the code to use a slice of Ref
that's sorted in order of increasing ref-height: RefByHeight.
LocalDatabase generally uses a BatchStoreAdaptor, which is a kinda
dumb wrapper around ChunkStore. During a Pull(), though, this would
cause Chunks to be Put in a top-down fashion, meaning that Chunks
wound up in the backing store _before_ other Chunks that they
reference. This means that LocalDatabases were transiently invalid,
and that cancelling an in-progress pull could lead to an invalid DB.
Now, calling validatingBatchStore() on a LocalDatabase returns a
BatchStore that uses the same strategy as RemoteDatabaseClient,
caching chunks as they come in and putting them into the backing store
bottom-up when Flush() is called.
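The cache-then-flush-bottom-up strategy can be sketched like this (hypothetical types; the real BatchStore deals in hashed Chunks, not strings):

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical chunk: Put buffers it with the height of the tree node
// it encodes, rather than writing it through immediately.
type chunk struct {
	height int
	data   string
}

type bufferingStore struct {
	pending []chunk
	backing []string // stand-in for the real ChunkStore
}

func (s *bufferingStore) Put(c chunk) {
	s.pending = append(s.pending, c)
}

// Flush writes buffered chunks in order of increasing height, so a
// chunk never lands in the backing store before the chunks it references.
func (s *bufferingStore) Flush() {
	sort.Slice(s.pending, func(i, j int) bool {
		return s.pending[i].height < s.pending[j].height
	})
	for _, c := range s.pending {
		s.backing = append(s.backing, c.data)
	}
	s.pending = nil
}

func main() {
	s := &bufferingStore{}
	// A pull delivers chunks top-down: the root (height 2) arrives first.
	s.Put(chunk{2, "root"})
	s.Put(chunk{1, "leaf"})
	s.Flush()
	fmt.Println(s.backing) // [leaf root]: leaves land before the root
}
```

Until Flush() runs, an interrupted pull leaves the backing store untouched, which is the transient-invalidity fix described above.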
Fixes #1915
We now compute the commit type based on the type of the value and
the type of the parents.
For the first commit we get:
```
struct Commit {
  parents: Set<Ref<Cycle<0>>>,
  value: T,
}
```
As long as we continue to commit values of type T, the Commit type
stays the same.
When we later commit a value of type U we get:
```
struct Commit {
  parents: Set<Ref<struct Commit {
    parents: Set<Ref<Cycle<0>>>,
    value: T | U,
  }>>,
  value: U,
}
```
The two value types are combined into a union for the value field of
the inner commit struct.
Fixes #1495
In discussing the patch that added parallelism, raf and I realized
that it's possible to be a bit more aggressive in the cases where one
queue is 'taller' than the other. In the current code, in that case,
we will parallelize work on all the Refs from the taller queue that
have a strictly higher ref-height than the head of the shorter queue.
We realized that it's safe to also take Refs from the taller queue
that are the SAME height as those at the top of the shorter queue,
as long as you handle common Refs correctly.
Fixes #1818
The basic approach here is to take the max of the heights of the
source and sink queues, then grab all the refs of that height from
both and sort them into three sets: refs in the source, refs in the
sink, and refs in both. These are then processed in parallel and the
reachable refs are all added to the appropriate queue. Repeat as long
as stuff still shows up in the source queue.
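One round of that loop can be sketched as follows (hypothetical (height, hash) pairs stand in for real types.Ref values):

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical ref: a (height, hash) pair.
type ref struct {
	height int
	hash   string
}

// popHeight removes and returns every ref of exactly height h from the
// tail of a queue sorted by increasing height.
func popHeight(q *[]ref, h int) []ref {
	var out []ref
	for len(*q) > 0 && (*q)[len(*q)-1].height == h {
		out = append(out, (*q)[len(*q)-1])
		*q = (*q)[:len(*q)-1]
	}
	return out
}

func main() {
	src := []ref{{1, "a"}, {2, "b"}, {2, "c"}}
	snk := []ref{{1, "a"}, {2, "c"}}
	sort.Slice(src, func(i, j int) bool { return src[i].height < src[j].height })
	sort.Slice(snk, func(i, j int) bool { return snk[i].height < snk[j].height })

	// Take the max height across both queues.
	h := src[len(src)-1].height
	if t := snk[len(snk)-1].height; t > h {
		h = t
	}

	// Grab every ref of that height from both sides...
	fromSrc, fromSnk := popHeight(&src, h), popHeight(&snk, h)

	// ...and sort them into source-only, sink-only, and common sets.
	inSnk := map[string]bool{}
	for _, r := range fromSnk {
		inSnk[r.hash] = true
	}
	var srcOnly, both []string
	for _, r := range fromSrc {
		if inSnk[r.hash] {
			both = append(both, r.hash)
			delete(inSnk, r.hash)
		} else {
			srcOnly = append(srcOnly, r.hash)
		}
	}
	fmt.Println("source-only:", srcOnly, "common:", both, "sink-only:", len(inSnk))
}
```

Each set can then be processed in parallel, with the refs they reach pushed back onto the appropriate queue for the next round.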
Fixes #1564
Change Dataset.Pull to use a single algorithm to pull data from a
source to a sink, regardless of which (if any) is local. The basic
algorithm is described in the first section of pulling.md. This
implementation is equivalent but phrased a bit differently. The
algorithm actually used is described in the second section of
pulling.md.
The main changes:
- datas.Pull(), which implements the new pulling algorithm
- RefHeap, a priority queue that sorts types.Ref by ref-height and
then by ref.TargetHash()
- Add has() to both Database implementations. Cache has() checks.
- Switch Dataset to use new datas.Pull(). Currently not concurrent.
Toward #1568
Mostly, prune reachableChunks