The central processor and Google Photos data sources now implement more verbose (but rudimentary) logging. We should probably switch to zap at some point.
Timeliner has always had the ability to merge items: if reprocessing, new items would overwrite data from existing items with the same ID. Now, identical items can be merged, including a new mode called "soft merge" which works on items with different IDs that are similar enough to be considered identical.
For example, Google Photos can be downloaded via the API or imported via a Takeout archive. The Takeout archive provides location metadata, but unfortunately it does not provide item IDs, so using both the API and Takeout would duplicate the entire library. Enabling soft merging compares the timestamp and filename of each item and, if both match, considers the items identical and combines them. Yay!
This also made it necessary to configure which values are preferred for certain fields, for example the old or new ID, the old or new data file, etc.
This is a big refactor and likely introduced some bugs but it worked in my initial, tired testing well after midnight.
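Roughly, the soft-merge check and the field preference look like the following sketch; the type and field names here are illustrative, not timeliner's actual code.

```go
package merge

import "time"

// ItemRow is an illustrative stand-in for an item record; these field
// names are assumptions, not timeliner's actual types.
type ItemRow struct {
	ID        string
	Timestamp time.Time
	Filename  string
	DataFile  string
}

// softMatch reports whether two items with different IDs are similar
// enough to be considered identical: same timestamp and same filename.
func softMatch(existing, incoming ItemRow) bool {
	return existing.Timestamp.Equal(incoming.Timestamp) &&
		existing.Filename == incoming.Filename
}

// mergeItems combines two matched items, keeping the existing row and
// taking the incoming data file only when that preference is configured.
func mergeItems(existing, incoming ItemRow, preferNewDataFile bool) ItemRow {
	merged := existing
	if preferNewDataFile && incoming.DataFile != "" {
		merged.DataFile = incoming.DataFile
	}
	return merged
}
```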
This is a very rough implementation that I only tested on a relatively small archive from Google Takeout (~1.7 GB).
It seems to work well enough, but it has far less information than the API provides; quite notably, though, Takeout archives include location metadata and original file uploads!
Items in a Takeout archive do not have IDs, so this will not merge well with API-downloaded items. In fact, it could duplicate your entire library. Eek.
Use with caution, for now.
Checkpoints should only be resumed if the parameters of the command are the same; otherwise, some providers (e.g. Google Photos) return errors when trying to get the "next page" using different parameters.
Also add -start and -end flags for get-all (and -end for get-latest) so that you can customize the date range of items to get, either by duration (relative) or date (absolute). This is useful, for example, when you want to download only items that are at least 10 days old (`-end "-240h"`).
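Assuming the usual `timeliner [flags] <command> <data source/account>` invocation (the account ID below is just a placeholder), that looks something like:

```
timeliner -end "-240h" get-all google_photos/you@example.com
```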
* fix incorrect "coordinates as a string" type, remove deprecated "Geo" field
"Geo" and "Coordinates" fields have (lat,lng) in swapped order, therefore we
couldn't use the same structure ("tweetGeo") for them anyway - easier to remove
the deprecated field than to take this into account.
* add test for coordinate fix by decoding a "kitchen sink" API response
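For reference, the corrected shape is roughly as follows; the Go names here are illustrative rather than the actual source. The API's "coordinates" object carries a [longitude, latitude] pair, while the removed "geo" object had (lat, lng).

```go
package example

// tweetCoordinates is a sketch of the corrected coordinates type.
type tweetCoordinates struct {
	Type        string    `json:"type"`        // "Point"
	Coordinates []float64 `json:"coordinates"` // [longitude, latitude]
}

type tweet struct {
	// ...other fields elided...
	Coordinates *tweetCoordinates `json:"coordinates"` // nil when the tweet has no location
}
```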
* Show link to the auth page even if we couldn't open a browser.
If we couldn't open a browser using `xdg-open` (for example, xdg-utils may not
be installed on the system), the user will still be able to follow the link
(by copy-pasting it from the terminal) and obtain the OAuth code; a minimal
sketch of this fallback appears after this entry.
The message will look like this:
```
Can't open browser (exec: "xdg-open": executable file not found in $PATH: ). Please follow the link: https://accounts.google.com/o/oauth2
```
* Add line break.
Co-Authored-By: Matt Holt <mholt@users.noreply.github.com>
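A minimal sketch of the fallback pattern (not the actual timeliner code):

```go
package main

import (
	"fmt"
	"os/exec"
)

// openAuthPage tries to open authURL in a browser via xdg-open; if that
// fails (e.g. xdg-utils isn't installed), it prints the link so the user
// can copy-paste it from the terminal.
func openAuthPage(authURL string) {
	if err := exec.Command("xdg-open", authURL).Start(); err != nil {
		fmt.Printf("Can't open browser (%v). Please follow the link: %s\n", err, authURL)
	}
}

func main() {
	openAuthPage("https://accounts.google.com/o/oauth2")
}
```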
Before these changes, get-latest would always go back until it reached the most
recent item downloaded from the given account. This could skip items if
get-latest was interrupted and then run again later, because get-latest stops
once it finds the most recently downloaded item, which is usually one of the
first things the interrupted run downloaded.
So, this adds a cursor/marker to the DB for the account so that we know
which item ID was most recent as part of the last successful run; that
way, interrupted runs will not move the cursor, and thus no items will
be lost simply because get-latest was too naive to know that it should
keep scanning until an older timestamp.
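A rough sketch of the idea; the table and column names are made up for illustration and are not the actual schema:

```go
package cursor

import "database/sql"

// advanceCursor records the newest item ID seen by this run, but only if
// the run finished successfully. An interrupted run leaves the cursor
// alone, so the next get-latest scans far enough back to fill the gap.
func advanceCursor(db *sql.DB, accountID int64, newestItemID string, runErr error) error {
	if runErr != nil {
		return nil // failed/interrupted run: don't move the cursor
	}
	_, err := db.Exec(`UPDATE accounts SET latest_item_id=? WHERE id=?`, newestItemID, accountID)
	return err
}
```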
When a token URL is set but no auth URL, the credentials flow is assumed to use
a bearer token instead of needing an intermediate app to authorize with.
Tested with a partial / PoC implementation of the Twitter API.
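One way to picture this with the standard Go oauth2 packages -- a sketch of the concept, not timeliner's actual implementation:

```go
package auth

import (
	"context"
	"net/http"

	"golang.org/x/oauth2/clientcredentials"
)

// bearerTokenClient builds an HTTP client for a provider that has only a
// token URL (no auth URL): the two-legged client-credentials flow fetches
// a bearer token directly, with no interactive authorization step.
func bearerTokenClient(ctx context.Context, clientID, clientSecret, tokenURL string) *http.Client {
	conf := &clientcredentials.Config{
		ClientID:     clientID,
		ClientSecret: clientSecret,
		TokenURL:     tokenURL,
	}
	return conf.Client(ctx)
}
```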