mirror of
https://github.com/dolthub/dolt.git
synced 2026-01-24 18:59:02 -06:00
93 lines
4.2 KiB
Markdown
# Dataset pulling algorithm

The approach is to explore the chunk graph of both sink and source in order of decreasing ref-height. As the code walks, it uses the knowledge gained about which chunks are present in the sink to both prune the source-graph-walk and build up a set of `hints` that can be sent to a remote Database to aid in chunk validation.

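Walking in order of decreasing ref-height needs a queue that always yields the tallest `Ref` first. A minimal sketch of such a queue in Go, built on the standard `container/heap` package, might look like the following. The `Ref` struct, `refHeap`, and `DrainByHeight` are hypothetical names chosen for illustration and are not taken from the actual implementation.

```go
package main

import "container/heap"

// Ref is a hypothetical stand-in for a chunk reference: the hash of the
// target chunk plus the height of the graph below it.
type Ref struct {
	TargetHash string
	Height     uint64
}

// refHeap implements heap.Interface so that the tallest Ref is popped first.
type refHeap []Ref

func (h refHeap) Len() int            { return len(h) }
func (h refHeap) Less(i, j int) bool  { return h[i].Height > h[j].Height }
func (h refHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *refHeap) Push(x interface{}) { *h = append(*h, x.(Ref)) }

// Pop removes the last element; container/heap moves the top element there
// before calling it.
func (h *refHeap) Pop() interface{} {
	old := *h
	n := len(old)
	r := old[n-1]
	*h = old[:n-1]
	return r
}

// DrainByHeight pushes refs onto a heap and returns their heights in the
// order in which they are popped.
func DrainByHeight(refs []Ref) []uint64 {
	h := &refHeap{}
	for _, r := range refs {
		heap.Push(h, r)
	}
	var out []uint64
	for h.Len() > 0 {
		out = append(out, heap.Pop(h).(Ref).Height)
	}
	return out
}
```

Refs pushed with heights 1, 3, 2 come back out as 3, 2, 1, which is the visiting order the algorithm depends on.
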
## Basic algorithm

- let `sink` be the *sink* database
- let `source` be the *source* database
- let `snkQ` and `srcQ` be priority queues of `Ref` prioritized by highest `Ref.height`
- let `hints` be a map of `hash => hash`
- let `reachableChunks` be a set of hashes
- let `snkHdRef` be the ref (of `Commit`) of the head of the *sink* dataset
- let `srcHdRef` be the ref of the *source* `Commit`, which must descend from the `Commit` indicated by `snkHdRef`

- let `traverseSource(srcRef, srcQ, sink, source, reachableChunks)` be
  - pop `srcRef` from `srcQ`
  - if `!sink.has(srcRef)`
    - let `c` = `source.batchStore().Get(srcRef.targetHash)`
    - let `v` = `types.DecodeValue(c, source)`
    - insert all child refs, `cr`, from `v` into `srcQ` and into `reachableChunks`
    - `sink.batchStore().Put(c, srcRef.height, no hints)`
      - (hints will all be gathered and handed to `sink.batchStore` at the end)

- let `traverseSink(snkRef, snkQ, sink, hints)` be
  - pop `snkRef` from `snkQ`
  - if `snkRef.height` > 1
    - let `v` = `sink.readValue(snkRef.targetHash)`
    - insert all child refs, `cr`, from `v` into `snkQ` and set `hints[cr] = snkRef`

- let `traverseCommon(comRef, snkHdRef, snkQ, srcQ, sink, hints)` be
  - pop `comRef` from both `snkQ` and `srcQ`
  - if `comRef.height` > 1
    - if `comRef` is a `Ref` of `Commit`
      - let `v` = `sink.readValue(comRef.targetHash)`
      - if `comRef` == `snkHdRef`
        - *ignore all parent refs*
        - insert each other child ref `cr` from `v` into `snkQ` *only*, set `hints[cr] = comRef`
      - else
        - insert each child ref `cr` from `v` into both `snkQ` and `srcQ`, set `hints[cr] = comRef`

- let `pull(source, sink, srcHdRef, snkHdRef)` be
  - insert `snkHdRef` into `snkQ` and `srcHdRef` into `srcQ`
  - create empty `hints` and `reachableChunks`
  - while `srcQ` is non-empty
    - let `srcHt` and `snkHt` be the respective heights of the *top* `Ref` in each of `srcQ` and `snkQ`
    - if `srcHt` > `snkHt`, for every `ref` in `srcQ` which is of greater height than `snkHt`
      - `traverseSource(ref, srcQ, sink, source, reachableChunks)`
    - else if `snkHt` > `srcHt`, for every `ref` in `snkQ` which is of greater height than `srcHt`
      - `traverseSink(ref, snkQ, sink, hints)`
    - else
      - for every `comRef` which is common to `snkQ` and `srcQ` and is of height `srcHt` (and `snkHt`)
        - `traverseCommon(comRef, snkHdRef, snkQ, srcQ, sink, hints)`
      - for every remaining `ref` in `srcQ` which is of height `srcHt`
        - `traverseSource(ref, srcQ, sink, source, reachableChunks)`
      - for every remaining `ref` in `snkQ` which is of height `snkHt`
        - `traverseSink(ref, snkQ, sink, hints)`
  - for all `hash` in `reachableChunks`
    - `sink.batchStore().addHint(hints[hash])`

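The walk described in this section can be sketched end to end against two in-memory stores. This is only a sketch under stated assumptions: `Ref`, `Store`, `queue`, and `Pull` are hypothetical stand-ins rather than the real database types, the queues are plain maps instead of heaps, heights are assumed to decrease strictly from parent to child, and refs found in both queues are simply dropped from both, so the `Commit`-specific branch of `traverseCommon` is not modeled.

```go
package main

// Ref, Store and queue are hypothetical in-memory stand-ins, for illustration only.
type Ref struct {
	Hash   string
	Height int
}

// Store maps a chunk hash to the child refs that the chunk contains.
type Store map[string][]Ref

// queue is a set of refs keyed by hash; a real implementation would use a heap.
type queue map[string]Ref

// top reports the greatest height in the queue, or 0 when it is empty.
func (q queue) top() (h int) {
	for _, r := range q {
		if r.Height > h {
			h = r.Height
		}
	}
	return h
}

// take removes and returns every ref whose height is at least floor.
func (q queue) take(floor int) (out []Ref) {
	for k, r := range q {
		if r.Height >= floor {
			out = append(out, r)
			delete(q, k)
		}
	}
	return out
}

// Pull copies the chunks reachable from srcHead that sink lacks and returns the
// set of hint hashes. Commit-specific handling of common refs is left out.
func Pull(source, sink Store, srcHead, snkHead Ref) map[string]bool {
	srcQ, snkQ := queue{srcHead.Hash: srcHead}, queue{snkHead.Hash: snkHead}
	hints, reachable := map[string]string{}, map[string]bool{}
	traverseSource := func(r Ref) {
		if _, ok := sink[r.Hash]; ok {
			return // sink already has this chunk, so the walk is pruned here
		}
		for _, cr := range source[r.Hash] {
			srcQ[cr.Hash], reachable[cr.Hash] = cr, true
		}
		sink[r.Hash] = source[r.Hash]
	}
	traverseSink := func(r Ref) {
		if r.Height <= 1 {
			return // height-1 refs are treated as leaves in this sketch
		}
		for _, cr := range sink[r.Hash] {
			snkQ[cr.Hash], hints[cr.Hash] = cr, r.Hash
		}
	}
	for len(srcQ) > 0 {
		srcHt, snkHt := srcQ.top(), snkQ.top()
		switch {
		case srcHt > snkHt:
			for _, r := range srcQ.take(snkHt + 1) {
				traverseSource(r)
			}
		case snkHt > srcHt:
			for _, r := range snkQ.take(srcHt + 1) {
				traverseSink(r)
			}
		default: // equal heights: refs present in both queues are common
			srcRefs, snkRefs := srcQ.take(srcHt), snkQ.take(snkHt)
			inSrc, inSnk := map[string]bool{}, map[string]bool{}
			for _, r := range srcRefs {
				inSrc[r.Hash] = true
			}
			for _, r := range snkRefs {
				inSnk[r.Hash] = true
			}
			for _, r := range srcRefs {
				if !inSnk[r.Hash] {
					traverseSource(r)
				}
			}
			for _, r := range snkRefs {
				if !inSrc[r.Hash] {
					traverseSink(r)
				}
			}
		}
	}
	out := map[string]bool{}
	for hash := range reachable {
		if hint, ok := hints[hash]; ok {
			out[hint] = true
		}
	}
	return out
}
```

When the source and sink share a subtree, `Pull` copies only the chunks the sink lacks. The returned set names the sink chunks that reference children shared with the newly copied chunks.
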
## Isomorphic, but less clear, algorithm

- let all identifiers be as above
- let `traverseSource`, `traverseSink`, and `traverseCommon` be as above

- let `higherThan(refA, refB)` be
  - if `refA.height` == `refB.height`
    - return `refA.targetHash` < `refB.targetHash`
  - return `refA.height` > `refB.height`

- let `pull(source, sink, srcHdRef, snkHdRef)` be
  - insert `snkHdRef` into `snkQ` and `srcHdRef` into `srcQ`
  - create empty `hints` and `reachableChunks`
  - while `srcQ` is non-empty
    - if `snkQ` is empty
      - let `ref` be the head of `srcQ`
      - `traverseSource(ref, srcQ, sink, source, reachableChunks)`
    - else if `higherThan(head of srcQ, head of snkQ)`
      - let `ref` be the head of `srcQ`
      - `traverseSource(ref, srcQ, sink, source, reachableChunks)`
    - else if `higherThan(head of snkQ, head of srcQ)`
      - let `ref` be the head of `snkQ`
      - `traverseSink(ref, snkQ, sink, hints)`
    - else, heads of both queues are the same
      - let `comRef` be the shared head of `snkQ` and `srcQ`
      - `traverseCommon(comRef, snkHdRef, snkQ, srcQ, sink, hints)`
  - for all `hash` in `reachableChunks`
    - `sink.batchStore().addHint(hints[hash])`

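The three-way dispatch in this variant can be sketched as a small Go function. `Ref` and `nextStep` are hypothetical names used only for illustration, and a `nil` sink head stands in for an empty `snkQ`. The sketch also assumes that both queues order their contents with the same `higherThan` comparison, since otherwise two equal refs might never reach the head of both queues at the same time.

```go
package main

// Ref is a hypothetical stand-in for a chunk reference.
type Ref struct {
	TargetHash string
	Height     uint64
}

// higherThan orders refs by descending height and breaks ties by ascending
// target hash, so that any two distinct refs are strictly ordered.
func higherThan(a, b Ref) bool {
	if a.Height == b.Height {
		return a.TargetHash < b.TargetHash
	}
	return a.Height > b.Height
}

// nextStep reports which traversal the loop would run for the given queue
// heads. A nil snkHead models an empty sink queue.
func nextStep(srcHead Ref, snkHead *Ref) string {
	switch {
	case snkHead == nil || higherThan(srcHead, *snkHead):
		return "traverseSource"
	case higherThan(*snkHead, srcHead):
		return "traverseSink"
	default:
		// Neither head is higher, so both queues hold the same ref on top.
		return "traverseCommon"
	}
}
```

Because ties in height are broken by hash, the final branch is reached only when the two heads have the same height and the same target hash. That is what lets this version process one ref per iteration instead of one height level per iteration.
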