Previously, an NBS table was named by hashing the hashes
of every chunk present in the table, in hash order. That meant
that to generate the name of a table you'd need to iterate
the prefix map and load every associated suffix, which was
expensive when e.g. compacting multiple tables. The new scheme
is far cheaper and only slightly more likely to wind up with a
name collision.
Toward #3411
All this really does is tell the underlying ChunkStore
to go fetch the current root from persistent storage,
and drop the now-out-of-date Dataset map on the floor.
Fixes https://github.com/attic-labs/attic/issues/1157
ValueStore.Flush() now Puts all Chunks buffered in the ValueStore
layer into the underlying ChunkStore. The Chunks are not persistent
at this point, not until and unless the caller calls Commit() on
the ChunkStore.
This patch also removes ChunkStore.Flush(). The same effect can be
achieved by calling ChunkStore.Commit() with the current Root for both
last and current.
NB: newTestValueStore is now private to the types package.
The logic is that, now, outside the types package, callers
need to hold onto the underlying ChunkStore if they want to
persist Chunks.
Toward #3404
It's important that MemoryStore (and, by extension TestStore)
correctly implement the new ChunkStore semantics before we go
shifting around the Flush semantics like we want to do in #3404
In order to make this a reality, I introduced a "persistence"
layer for MemoryStore called MemoryStorage, which can vend
MemoryStoreView objects that represent a snapshot of the
persistent storage and implement the ChunkStore contract.
Fixes#3400
Removed Rebase() in HandleRootGet, and added ChunkStore
tests to validate the new Put behavior more fully
Change the strategy for choosing which tables to compact: choose 2 or more of the N smallest tables such that the resulting table will be the smallest (or tied for smallest) table in the store.
BatchStore is dead, long live ChunkStore! Merging these two required
some modification of the old ChunkStore contract to make it more
BatchStore-like in places, most specifically around Root(), Put() and
PutMany().
The first big change is that Root() now returns a cached value for the
root hash of the Store. This is how NBS worked already, so the more
interesting change here is the addition of Rebase(), which loads the
latest persistent root. Any chunks that appeared in backing storage
since the ChunkStore was opened (or last rebased) also become
visible.
UpdateRoot() has been replaced with Commit(), because UpdateRoot() was
ALREADY doing the work of persisting novel chunks as well as moving
the persisted root hash of the ChunkStore in both NBS and
httpBatchStore. This name, and the new contract (essentially Flush() +
UpdateRoot()), is a more accurate representation of what's going on.
As for Put(), the former contract claimed to block until the chunk
was durable. That's no longer the case. Indeed, NBS was already not
fulfilling this contract. The new contract reflects this, asserting
that novel chunks aren't persisted until a Flush() or Commit() --
which has replaced UpdateRoot(). Novel chunks are immediately visible
to Get and Has calls, however.
In addition to this larger change, there are also some tweaks to
ValueStore and Database. ValueStore.Flush() no longer takes a hash,
and instead just persists any and all Chunks it has buffered since the
last time anyone called Flush(). Database.Close() used to have some
side effects where it persisted Chunks belonging to any Values the
caller had written -- that is no longer so. Values written to a
Database only become persistent upon a Commit-like operation (Commit,
CommitValue, FastForward, SetHead, or Delete).
/******** New ChunkStore interface ********/

type ChunkStore interface {
	ChunkSource
	RootTracker
}

// RootTracker allows querying and management of the root of an entire tree of
// references. The "root" is the single mutable variable in a ChunkStore. It
// can store any hash, but it is typically used by higher layers (such as
// Database) to store a hash to a value that represents the current state and
// entire history of a database.
type RootTracker interface {
	// Rebase brings this RootTracker into sync with the persistent storage's
	// current root.
	Rebase()

	// Root returns the currently cached root value.
	Root() hash.Hash

	// Commit atomically attempts to persist all novel Chunks and update the
	// persisted root hash from last to current. If last doesn't match the
	// root in persistent storage, returns false.
	// TODO: Is last now redundant? Maybe this should just try to update from
	// the cached root to current?
	// TODO: Does having a separate RootTracker make sense anymore? BUG 3402
	Commit(current, last hash.Hash) bool
}

// ChunkSource is a place where chunks live.
type ChunkSource interface {
	// Get returns the Chunk stored at hash h. If h is absent from the store,
	// nil is returned.
	Get(h hash.Hash) Chunk

	// GetMany gets the Chunks with |hashes| from the store. On return, all
	// chunks that were found will have been sent on |foundChunks|; any
	// non-present chunks are silently ignored.
	GetMany(hashes hash.HashSet, foundChunks chan *Chunk)

	// Has returns true iff the value at the address |h| is contained in the
	// source.
	Has(h hash.Hash) bool

	// HasMany returns a new HashSet containing any members of |hashes| that
	// are present in the source.
	HasMany(hashes hash.HashSet) (present hash.HashSet)

	// Put caches c in the ChunkSink. Upon return, c must be visible to
	// subsequent Get and Has calls, but must not be persistent until a call
	// to Flush(). Put may be called concurrently with other calls to Put(),
	// PutMany(), Get(), GetMany(), Has() and HasMany().
	Put(c Chunk)

	// PutMany caches chunks in the ChunkSink. Upon return, all members of
	// chunks must be visible to subsequent Get and Has calls, but must not be
	// persistent until a call to Flush(). PutMany may be called concurrently
	// with other calls to Put(), PutMany(), Get(), GetMany(), Has() and
	// HasMany().
	PutMany(chunks []Chunk)

	// Version returns the NomsVersion with which this ChunkSource is
	// compatible.
	Version() string

	// Flush ensures that, on return, any previously Put chunks are durable.
	// It is not safe to call Flush() concurrently with Put() or PutMany().
	Flush()

	io.Closer
}
Fixes #2945
ValidatingBatchingSink stopped batching a while ago, and has generally
made less and less sense over time. Splitting out decoding from
validation allows for clearer code in the server-side writeValue
handler.
Additionally, since we intend to make new chunks that are Put into a
ChunkStore not persist until Flush() or UpdateRoot(), splitting out
these concerns allows localBatchStore to stop caching all new chunks
as it goes.
Fixes #3343
This adds IsValueSubtypeOf which skips computing the type of the value.
Use IsValueSubtypeOf to implement IsCommit which checks if a value is a
commit.
Replace usages of IsSubtype(t, TypeOf(v)) with IsValueSubtypeOf(v, t).
Fixes #3326. Fixes #3348
At one point, we had some hard-to-diagnose failures
decoding chunks. We tracked the problem down and fixed
it, but it's good to keep the error reporting. It's done
more cleanly and efficiently, now.
Fixes #3148
The last patch did this in order to allow bad-behavers
to still have a chance of succeeding if they write Values
top down. This ensures that they won't, and therefore
will run afoul of lazy completeness checking.
Follow on for #3371
Before ripping out all the hinting and associated proactive
validation checking, it was impossible to write Values through
the ValueStore API in any way other than bottom-up. That meant
that we could enforce the invariant that a chunk could not
appear in pendingParents unless it was also in pendingPuts.
Now, it's possible to construct legal sequences of calls to
ValueStore that result in this check being violated. Raf and
I don't think that violating this check can actually lead to
an invalid underlying database, as the lazy validation that
replaced the proactive validation should still catch any such
issues.
Fixes #3371
Our Number encoding consists of two parts. First we convert the float
into f * 2**exp, then we uvarint encode f and exp. However, we didn't
normalize f so in theory we could end up with multiple representations
of the same number.
This changes the representation to make the f the smallest possible
integer that fulfills the formula above.
For example we used to encode 256 as (0x100, 0) but with this we instead
encode it as (0x01, 8).
Fixes #2307
1. Decoder no longer needed to remove struct cycles. That happened as we decoded
2. Remove no-op tests.
3. Remove dead code
4. Refactor and add test.
5. Just moving code around (mostly type_cache -> make_type)
6. Remove (no longer necessary) typeDepth counter in ValueDecoder
Introduce a "lock" hash into NBS manifests to address the bad
interaction between Flush() and optimistic locking. Our original
design didn't include Flush(), which changes the set of tables without
updating the root. Thus... an optimistic locking strategy predicated
on checking the currently-persisted root hash is not robust to
interleaved Flush() calls from multiple clients.
Fixes #3349
When we clone it is possible to run into loops between named and
unnamed structs. We need to stop cloning the type tree when we see a
cycle.
Also update Describe to print the actual type tree, even if it is
invalid.
This moves the type off of the value; instead, we compute it as we ask for it.
This also changes how we detect cycles. If a named struct contains a struct with the
same name we now create a cycle between them. This also means that cycle types
now take a string and not a number.
For encoding we no longer write the type with the value (unless it is a types.Ref).
This is a format change so this takes us to 7.6
Fixes #3328. Fixes #3325. Fixes #3324. Fixes #3323
* Add HasMany() to the ChunkStore interface
We'll need this as a part of #3180
* Rip out hinting
The hinting mechanism used to assist in server-side validation
of values served us well, but now it's in the way of building a
more suitable validation strategy. Tear it out and go without
validation for a hot minute until #3180 gets done.
Fixes #3178
* Implement server-side lazy ref validation
The server, when handling writeValue, now just keeps track of all the
refs it sees in the novel chunks coming from the client. Once it's
processed all the incoming chunks, it just does a big bulk HasMany to
determine if any of them aren't present in the storage backend.
Fixes #3180
* Remove chunk-write-order requirements
With our old validation strategy, it was critical that
chunk graphs be written bottom-up, during both novel value
creation and sync. With the strategy implemented in #3180,
this is no longer required, which lets us get rid of a bunch
of machinery:
1) The reverse-order hack in httpBatchStore
2) the EnumerationOrder stuff in NomsBlockCache
3) the orderedPutCache in datas/
4) the refHeight arg on SchedulePut()
Fixes #2982
BREAKING CHANGE
This removes the `Type()` method from the `types.Value` interface.
Instead use the `types.TypeOf(v types.Value) *types.Type` function.
Fixes #3324
DynamoDB doesn't allow empty strings in records. We were sending an
empty string in the case where a store had no tables in it. Instead,
the right thing is to leave this attribute out of the record, and then
detect the case where the attribute is empty when reading the record.
* Update struct in memory representation
To contain struct name and field names.
This is in preparation for removing types from the value.
Towards #3324
* Rename structFields to structTypeFields
* Undo change in vendoring
* searchField and copy