In discussing the patch that added parallelism, raf and I realized
that it's possible to be a bit more aggressive in the cases where one
queue is 'taller' than the other. In the current code, in that case,
we will parallelize work on all the Refs from the taller queue that
have a strictly higher ref-height than the head of the shorter queue.
We realized that it's safe to also take Refs from the taller queue
that are the SAME height as those at the top of the shorter queue,
as long as you handle common Refs correctly.
Fixes#1818
The basic approach here is to take the max of the heights of the
source and sink queues, then grab all the refs of that height from
both and sort them into three sets: refs in the source, refs in the
sink, and refs in both. These are then processed in parallel and the
reachable refs are all added to the appropriate queue. Repeat as long
as stuff still shows up in the source queue.
Fixes#1564
Change Dataset.Pull to use a single algorithm to pull data from a
source to a sink, regardless of which (if any) is local. The basic
algorithm is described in the first section of pulling.md. This
implementation is equivalent but phrased a bit differently. The
algorithm actually used is described in the second section of
pulling.md
The main changes:
- datas.Pull(), which implements the new pulling algorithm
- RefHeap, a priority queue that sorts types.Ref by ref-height and
then by ref.TargetHash()
- Add has() to both Database implementations. Cache has() checks.
- Switch Dataset to use new datas.Pull(). Currently not concurrent.
Toward #1568
Mostly, prune reachableChunks