Introduce photo-dedup-by-date
This program deduplicates photos by the date they were taken. It considers two photos part of the same group if they were taken less than 5 seconds apart.
This is a side-by-side port, taking inspiration from the old dataspec.go
code. Notably:
- LDB support has been added in Go. It wasn't needed in JS.
- There is an Href() method on Spec now.
- Go now handles IPv6.
- Go no longer treats access_token specially.
- Go now has Pin.
- I found some issues in the JS while doing this; I'll fix them later.
I've also updated the config code to use the new API so that essentially
all the Go samples use the new code, even if they don't change much.
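One reason IPv6 needs care is that a bracketed host such as `[::1]` itself contains `::`. The sketch below assumes a noms-style `<database>::<dataset>` spec string and starts the separator search after any closing `]`. `splitSpec` is a hypothetical helper, not the code in dataspec.go.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSpec splits a "<database>::<dataset>" string. The separator search
// starts after any bracketed IPv6 host, so the "::" inside "[::1]" is not
// mistaken for the separator.
func splitSpec(spec string) (db, dataset string) {
	start := 0
	if open := strings.Index(spec, "["); open >= 0 {
		if end := strings.Index(spec[open:], "]"); end >= 0 {
			start = open + end + 1
		}
	}
	if i := strings.Index(spec[start:], "::"); i >= 0 {
		return spec[:start+i], spec[start+i+2:]
	}
	// No separator: the whole string names a database.
	return spec, ""
}

func main() {
	for _, s := range []string{"http://[::1]:8000::photos", "ldb:/tmp/db::photos", "http://[::1]:8000"} {
		db, ds := splitSpec(s)
		fmt.Printf("%q %q\n", db, ds)
	}
}
```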
* Add jobs for grouping similar photos into PhotoGroups
Outline:
- The first job, photo-dhash, adds a dhash field to each photo. The dhash is a 128-bit
downsampled representation of the photo that works well for visual similarity comparisons.
- The second job, photo-dedup, groups photos that have similar dhashes into PhotoGroups.
fixes: #2787
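A sketch of how a 128-bit difference hash and its similarity comparison could work. It assumes the photo has already been downsampled to a 9x9 grayscale grid and that the 128 bits are 64 horizontal-gradient bits plus 64 vertical-gradient bits. That bit layout is an assumption, not necessarily the one photo-dhash uses. Similarity is the Hamming distance between hashes.

```go
package main

import (
	"fmt"
	"math/bits"
)

// DHash is a 128-bit difference hash: horizontal-gradient bits in [0] and
// vertical-gradient bits in [1] (an assumed layout).
type DHash [2]uint64

// computeDHash takes an already-downsampled 9x9 grayscale grid. Each bit
// records whether brightness increases from a pixel to its right (or lower)
// neighbor.
func computeDHash(g [9][9]uint8) DHash {
	var h DHash
	bit := 0
	for r := 0; r < 8; r++ {
		for c := 0; c < 8; c++ {
			if g[r][c] < g[r][c+1] {
				h[0] |= 1 << uint(bit)
			}
			if g[r][c] < g[r+1][c] {
				h[1] |= 1 << uint(bit)
			}
			bit++
		}
	}
	return h
}

// distance is the Hamming distance between two hashes; small values
// suggest visually similar photos.
func distance(a, b DHash) int {
	return bits.OnesCount64(a[0]^b[0]) + bits.OnesCount64(a[1]^b[1])
}

func main() {
	var a, b [9][9]uint8
	for r := 0; r < 9; r++ {
		for c := 0; c < 9; c++ {
			a[r][c] = uint8(r*20 + c*7)
			b[r][c] = a[r][c]
		}
	}
	b[4][4] = 255 // a small local edit only flips a few bits
	fmt.Println(distance(computeDHash(a), computeDHash(b)))
}
```

photo-dedup could then put two photos in the same PhotoGroup when their distance falls below some threshold. The threshold value is not given here.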
The big change here is adding a new Spec class in spec.js. This replaces
DatabaseSpec/DatasetSpec/PathSpec in specs.js, but I'm leaving those in
and moving code over in a later patch. For now, only the photos UI uses it.
The photos UI change is to plumb the authorization token through
the Spec code. For now, it's reading it from a URL parameter, but soon
I'll make it session based (probably localStorage).
The demo-server change is to add the Authorization header into CORS.
Private databases begin with "/p/" - for example, "/kalman" is not
private, but "/p/kalman" is private. They are not the same database.
The bulk of this work is the receipt infrastructure.
A receipt is form data that gives access to a database, encrypted using
secretbox. For example, "Database=/p/kalman&Date=12345678" might encrypt
to "SFH5bcIJ3_XgEbtmi_AdCKTItW20fl90czVl5_pF5PAXhNQ366U1yOpYGAjT".
* A new tool receiptkey generates random receipt (secretbox) keys.
* A new tool receipttool generates receipts for databases.
* demo-server has been updated to check for a receipt in the
Authorization header to access private databases.
receipttool and demo-server must be given the same receipt key.
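The receipt flow can be sketched end to end. In this sketch, a random key plays the role of receiptkey, encrypted form data plays the role of receipttool's output, and a check for the `/p/` prefix plus a decryptable receipt naming that database plays the role of demo-server. To keep the example standard-library only, it **substitutes AES-GCM for secretbox** (`golang.org/x/crypto/nacl/secretbox`). The function names are hypothetical, and the `Date` field is carried but not checked.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// newKey stands in for receiptkey: a random 32-byte key.
func newKey() ([]byte, error) {
	key := make([]byte, 32)
	_, err := rand.Read(key)
	return key, err
}

// newAEAD builds AES-256-GCM (swapped in here for secretbox).
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// issueReceipt stands in for receipttool: encrypt percent-escaped form data
// and prepend the random nonce, then encode as unpadded URL-safe base64.
func issueReceipt(key []byte, database, date string) (string, error) {
	aead, err := newAEAD(key)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	form := url.Values{"Database": {database}, "Date": {date}}.Encode()
	sealed := aead.Seal(nonce, nonce, []byte(form), nil)
	return base64.RawURLEncoding.EncodeToString(sealed), nil
}

// openReceipt decrypts a receipt and parses its form data.
func openReceipt(key []byte, receipt string) (url.Values, error) {
	raw, err := base64.RawURLEncoding.DecodeString(receipt)
	if err != nil {
		return nil, err
	}
	aead, err := newAEAD(key)
	if err != nil {
		return nil, err
	}
	n := aead.NonceSize()
	if len(raw) < n {
		return nil, errors.New("receipt too short")
	}
	plain, err := aead.Open(nil, raw[:n], raw[n:], nil)
	if err != nil {
		return nil, err
	}
	return url.ParseQuery(string(plain))
}

// canAccess stands in for demo-server's check: databases outside "/p/" are
// public; private ones need a valid receipt naming exactly that database.
func canAccess(key []byte, database, receipt string) bool {
	if !strings.HasPrefix(database, "/p/") {
		return true
	}
	form, err := openReceipt(key, receipt)
	return err == nil && form.Get("Database") == database
}

func main() {
	key, err := newKey()
	if err != nil {
		panic(err)
	}
	receipt, err := issueReceipt(key, "/p/kalman", "12345678")
	if err != nil {
		panic(err)
	}
	fmt.Println(receipt)
	fmt.Println(canAccess(key, "/p/kalman", receipt), canAccess(key, "/p/other", receipt))
}
```

A receipt sealed under one key fails to open under another. This is why the issuing tool and the server must share the same key.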
Add optional merging functionality to noms commit.
noms merge <database> <left-dataset-name> <right-dataset-name> <output-dataset-name>
The command above will look in the given Database for the two named
Datasets and, if possible, merge their HeadValue()s and commit the
result back to <output-dataset-name>.
Fixes #2535
Adds face merge functionality: it takes a photo's set of
face center points and face rectangles and returns the
set of faces whose face rectangle contains a face
center point. We store a new photo object with a set of faces that
have the names from the face center points and the rectangles from the
face rectangles.
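A sketch of the containment-based merge, assuming hypothetical `Point`, `Rect`, `NamedPoint`, and `Face` types with normalized coordinates. When several center points fall inside one rectangle, this version keeps the first, which is a design choice of the sketch and not something the description specifies.

```go
package main

import "fmt"

type Point struct{ X, Y float64 }

type Rect struct{ Left, Top, Width, Height float64 }

// Contains reports whether p lies inside r, edges inclusive.
func (r Rect) Contains(p Point) bool {
	return p.X >= r.Left && p.X <= r.Left+r.Width &&
		p.Y >= r.Top && p.Y <= r.Top+r.Height
}

// NamedPoint is a face center point labeled with a person's name.
type NamedPoint struct {
	Name   string
	Center Point
}

// Face combines the name from a center point with a face rectangle.
type Face struct {
	Name string
	Rect Rect
}

// mergeFaces keeps each rectangle that contains some center point and
// labels it with the name of the first such point.
func mergeFaces(centers []NamedPoint, rects []Rect) []Face {
	var faces []Face
	for _, r := range rects {
		for _, c := range centers {
			if r.Contains(c.Center) {
				faces = append(faces, Face{Name: c.Name, Rect: r})
				break
			}
		}
	}
	return faces
}

func main() {
	centers := []NamedPoint{{"alice", Point{0.2, 0.2}}, {"bob", Point{0.8, 0.8}}}
	rects := []Rect{{0.1, 0.1, 0.2, 0.2}, {0.7, 0.7, 0.2, 0.2}}
	fmt.Println(mergeFaces(centers, rects))
}
```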
* dropbox/find-photos: encode auth token in photo URLs so they can
work in UI.
* Remove requirement for datePublished from photo-index
Dropbox doesn't have a publish date
* fix test
* review comments
* npm test
demo-server was using a read-through cache to allow it to serve more
concurrent requests more quickly by reducing disk I/O. As seen in issue
across Databases, leading to incorrect sync results in some instances.
Since we're not worried about demo-server load right now, simply delete
the cache.
Fixes #2688
Subsequent runs of url-fetch on Jenkins are much faster, and this
appears to be because committing is much faster on subsequent runs. The
perf tests now use a new database each time.
This patch continues adding support for configuring aliases and defaults for the noms CLI (work started in #2131).
For an introduction, please take a look at the sample code here: https://github.com/attic-labs/noms/blob/master/samples/cli/nomsconfig/README.md
Improvements include:
- All go samples now work with .nomsconfig
- Absolute paths in ldb specs are now properly handled
- Add -v|--verbose flag to commands to debug expansion
- Make default just another alias and change [default] section to [db.default]
- Introduce the `.` shorthand to refer to a previously mentioned dataset/object
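To illustrate "default is just another alias", here is a sketch of expanding the database part of a `<db>::<dataset>` argument from an alias table. The map stands in for parsed `[db.*]` sections. It does not reproduce the actual .nomsconfig syntax (see the README linked above), `expandSpec` is hypothetical, and the `.` shorthand is not modeled. An empty database part falls back to the `default` alias.

```go
package main

import (
	"fmt"
	"strings"
)

// expandSpec rewrites the database part of a "<db>::<dataset>" argument
// using an alias table. An empty database part uses the "default" alias;
// anything that is not an alias passes through unchanged.
func expandSpec(arg string, aliases map[string]string) string {
	db, dataset, hasDataset := strings.Cut(arg, "::")
	if db == "" {
		db = "default"
	}
	if target, ok := aliases[db]; ok {
		db = target
	}
	if hasDataset {
		return db + "::" + dataset
	}
	return db
}

func main() {
	aliases := map[string]string{
		"default": "http://localhost:8000", // from a [db.default]-like section
		"work":    "ldb:/data/work",        // hypothetical alias
	}
	for _, arg := range []string{"::photos", "work::photos", "work"} {
		fmt.Println(arg, "=>", expandSpec(arg, aliases))
	}
}
```

Printing each expansion, as `main` does, is roughly what a `-v|--verbose` flag helps with when debugging.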
This just involves changing types.NewBlob(io.MultiReader(files...)) to
types.NewBlob(files...). On my laptop it improves
Test01ImportSfCrimeBlobFromTestdata from 21s to 16s, though much of
that time is dominated by commit, which this change doesn't affect.
Blob.Concat is a simple use of the sequence concat code that List.Concat uses.
NewBlob uses Blob.Concat to construct a Blob in parallel.
Perf tests for parallel NewBlob write N temporary files and then construct a Blob
from them, so there is some I/O, but the work appears to be mostly CPU-bound. NewBlob
doesn't get much more than 50% faster with any P >= 2.
Noms SDK users frequently shoot themselves in the foot because they're
holding onto an "old" Database object. That is, they have a Database
tucked away in some internal state, they call Commit() on it, and
don't replace the object in their internal state with the new Database
returned from Commit.
This PR changes the Database and Dataset Go API to be in line with the
proposal in Issue #2589. JS follows in a separate patch.
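The pitfall above can be reproduced with a toy immutable handle. This sketch does not reproduce the API from #2589. `Database`, `Commit`, and `App` here are hypothetical types. `Commit` leaves the receiver untouched and returns a new handle, so any code that keeps the old one sees stale state.

```go
package main

import "fmt"

// Database is a hypothetical immutable handle: Commit never mutates the
// receiver and instead returns a new Database that sees the new head.
type Database struct {
	heads map[string]string // dataset name -> head value
}

func NewDatabase() Database { return Database{heads: map[string]string{}} }

func (db Database) Head(ds string) (string, bool) {
	v, ok := db.heads[ds]
	return v, ok
}

// Commit copies the head map so older handles keep their old view.
func (db Database) Commit(ds, value string) Database {
	next := make(map[string]string, len(db.heads)+1)
	for k, v := range db.heads {
		next[k] = v
	}
	next[ds] = value
	return Database{heads: next}
}

// App shows the safe pattern: always store the Database that Commit returns.
type App struct{ db Database }

func (a *App) Save(value string) {
	a.db = a.db.Commit("photos", value) // replace the stale handle
}

func main() {
	old := NewDatabase()
	updated := old.Commit("photos", "v1")
	_, inOld := old.Head("photos")
	v, _ := updated.Head("photos")
	fmt.Println(inOld, v)
}
```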
* Add "noms commit" command
* Updated csv-import, json-import, xml-import and url-fetch to (optionally) not commit results
* Added helpers for creating commit metadata structs through command-line flags or function calls
This patch modifies merge.ThreeWay() to take a callback that allows
for custom conflict resolution. The noms-merge command-line tool uses
this to inject a callback that accepts input from the console
dictating whether to accept the value from the 'left' or 'right' merge
candidates.
Toward #2445
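A sketch of a three-way merge with a conflict-resolution callback, using flat string maps in place of noms values. `threeWay`, `Resolver`, and `promptResolver` are hypothetical and are not the signatures of `merge.ThreeWay` or noms-merge. Changes made on only one side are taken automatically. Only keys changed differently on both sides reach the callback, where a deletion shows up as `""`.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// Side says which merge candidate a resolver picked.
type Side int

const (
	Left Side = iota
	Right
)

// Resolver decides a key that both sides changed in different ways.
type Resolver func(key, left, right string) Side

// threeWay merges left and right relative to their common ancestor base.
func threeWay(base, left, right map[string]string, resolve Resolver) map[string]string {
	keys := map[string]bool{}
	for _, m := range []map[string]string{base, left, right} {
		for k := range m {
			keys[k] = true
		}
	}
	out := map[string]string{}
	for k := range keys {
		b, inB := base[k]
		l, inL := left[k]
		r, inR := right[k]
		leftSame := inL == inB && l == b
		rightSame := inR == inB && r == b
		var v string
		var keep bool
		switch {
		case rightSame: // only left changed, or neither did
			v, keep = l, inL
		case leftSame: // only right changed
			v, keep = r, inR
		case inL == inR && l == r: // both made the same change
			v, keep = l, inL
		case resolve(k, l, r) == Left:
			v, keep = l, inL
		default:
			v, keep = r, inR
		}
		if keep {
			out[k] = v
		}
	}
	return out
}

// promptResolver asks on out and reads "l" or "r" from in, like a
// console prompt; on EOF without a valid answer it falls back to left.
func promptResolver(in *bufio.Reader, out io.Writer) Resolver {
	return func(key, left, right string) Side {
		for {
			fmt.Fprintf(out, "conflict on %q: left=%q right=%q [l/r]? ", key, left, right)
			line, err := in.ReadString('\n')
			switch strings.TrimSpace(line) {
			case "l":
				return Left
			case "r":
				return Right
			}
			if err != nil {
				return Left
			}
		}
	}
}

func main() {
	base := map[string]string{"title": "Beach"}
	left := map[string]string{"title": "Beach 2016"}
	right := map[string]string{"title": "Santa Cruz"}
	// Scripted console input so the example runs unattended.
	resolve := promptResolver(bufio.NewReader(strings.NewReader("r\n")), os.Stdout)
	fmt.Println(threeWay(base, left, right, resolve))
}
```

Keeping the resolver as a parameter means the merge logic does not need to know whether answers come from a console, a fixed policy, or a test.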