removed portions of noms we don't need / won't maintain in preparation for moving to our repo

This commit is contained in:
Brian Hendriks
2019-06-20 10:28:18 -07:00
parent b970e80530
commit 17bd99a876
1864 changed files with 0 additions and 758041 deletions
@@ -1,7 +0,0 @@
.git
doc
codecov.yml
CONTRIBUTING.md
LICENSE
README.md
samples
@@ -1,4 +0,0 @@
sudo: required
services:
- docker
script: docker build .
@@ -1,97 +0,0 @@
Contributing to Noms
====================
## Install Go
First setup Go on your machine per https://golang.org/doc/install.
Don't forget to [set up your `$GOPATH` and `$BIN` environment variables](https://golang.org/doc/install) correctly. Everybody forgets that.
You can test your setup like so:
```shell
# This should print something
echo $GOPATH
# We need at least version 1.7
go version
```
## Setup Noms Environment
Add `NOMS_VERSION_NEXT=1` to your environment. The current trunk codebase is a development version of the format and this environment variable is a safety check to ensure people aren't accidentally using this development format against production servers.
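For example, in a bash-style shell (adjust for your own environment):

```shell
# Set the flag for the current session; add this line to your shell
# profile (e.g. ~/.bashrc) to make it permanent.
export NOMS_VERSION_NEXT=1
echo "NOMS_VERSION_NEXT=$NOMS_VERSION_NEXT"
```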
## Get and build Noms
```shell
go get github.com/attic-labs/noms/cmd/noms
cd $GOPATH/src/github.com/attic-labs/noms/cmd/noms
go build
go test
```
## License
Noms is open source software, licensed under the [Apache License, Version 2.0](LICENSE).
## Contributing code
Due to legal reasons, all contributors must sign a contributor agreement, either for an [individual](https://attic-labs.github.io/ca/individual.html) or [corporation](https://attic-labs.github.io/ca/corporation.html), before a pull request can be accepted.
## Languages
* Use Go, JS, or Python.
* Shell script is not allowed.
## Coding style
* Go code is formatted with `gofmt`; it's advisable to hook it into your editor
* JS follows the [Airbnb Style Guide](https://github.com/airbnb/javascript)
* Tag PRs with either `toward: #<bug>` or `fixes: #<bug>` to help establish context for why a change is happening
* Commit messages follow [Chris Beam's awesome commit message style guide](http://chris.beams.io/posts/git-commit/)
### Go error reporting
In general, the public API in Noms uses Go-style error returns by default.
For non-exposed code, we provide and use some wrappers for exception-style error handling. There *must* be an overriding rationale for using this style, however. One reason is that the current code doesn't know how to proceed and needs to panic, but you want to signal that a calling function somewhere up the stack might be able to recover from the failure and continue.
For these cases, please use the following family of functions to 'raise' a 'catchable' error (see [go/d/try.go](https://godoc.org/github.com/attic-labs/noms/go/d)):
* `d.PanicIfError()`
* `d.PanicIfTrue()`
* `d.PanicIfFalse()`
You might see some older code using similar-looking functions whose names start with `d.Chk`; however, those are going to be removed, so don't use them in new code. See #3258 for details.
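The panic/recover style can be sketched as follows. This is a minimal illustration of the pattern only, using a hypothetical `PanicIfError` with the same behavior as the `d` package's, not the library's actual code:

```go
package main

import (
	"errors"
	"fmt"
)

// PanicIfError sketches the behavior of noms' d.PanicIfError: it converts
// a Go-style error value into a panic that a caller further up the stack
// may recover from.
func PanicIfError(err error) {
	if err != nil {
		panic(err)
	}
}

// riskyOperation always fails here, standing in for code that cannot proceed.
func riskyOperation() error {
	return errors.New("chunk store unavailable")
}

// runWithRecovery shows a caller that knows how to continue: it recovers
// the 'raised' error and degrades gracefully instead of crashing.
func runWithRecovery() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprintf("recovered: %v", r)
		}
	}()
	PanicIfError(riskyOperation())
	return "ok"
}

func main() {
	fmt.Println(runWithRecovery())
}
```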
## Submitting PRs
We follow a code review protocol derived from the one that the [Chromium team](https://www.chromium.org/) uses:
1. Create a GitHub fork of the repo you want to modify (e.g., fork `https://github.com/attic-labs/noms` to `https://github.com/<username>/noms`).
2. Add your fork as a remote in your local clone: `git remote add <username> https://github.com/<username>/noms`.
3. Push your changes to a branch at your fork: `git push <username> <branch>`
4. Create a PR using the branch you just created. Usually you can do this by navigating to https://github.com/attic-labs/noms in a browser; GitHub recognizes the new branch and offers to create a PR for you.
5. When you're ready for review, make a comment on the PR asking for a review. Reviewers sometimes won't start until you do this, because they can't tell whether you consider the PR ready.
6. Iterate with your reviewer using the normal GitHub review flow.
7. Once the reviewer is happy with the changes, they will submit them.
## Running the tests
You can use the `go test` command, e.g.:
* `go test $(go list ./... | grep -v /vendor/)` runs every test except those in vendor packages.
If you have commit rights, Jenkins automatically runs the Go tests on every PR and on each subsequent patch. To ask Jenkins to run immediately, any committer can reply "Jenkins: test this" (no quotes) to your PR.
### Perf tests
By default, neither `go test` nor Jenkins runs the perf tests, because they take a while.
To run them yourself, pass the `-perf` and `-v` flags to `go test`, e.g.:
* `go test -v ./samples/go/csv/... -perf mem`
See https://godoc.org/github.com/attic-labs/noms/go/perf/suite for full documentation and flags.
To ask Jenkins to run the perf tests for you, reply "Jenkins: perf this" (no quotes) to your PR. Your results will be viewable at http://perf.noms.io/?ds=http://demo.noms.io/perf::pr_$your-pull-request-number/csv-import. Again, only a committer can do this.
@@ -1,24 +0,0 @@
FROM golang:latest AS build
ENV NOMS_SRC=$GOPATH/src/github.com/attic-labs/noms
ENV CGO_ENABLED=0
ENV GOOS=linux
ENV NOMS_VERSION_NEXT=1
RUN mkdir -pv $NOMS_SRC
COPY . ${NOMS_SRC}
RUN go test github.com/attic-labs/noms/...
RUN go install -v github.com/attic-labs/noms/cmd/noms
RUN cp $GOPATH/bin/noms /bin/noms
FROM alpine:latest
COPY --from=build /bin/noms /bin/noms
VOLUME /data
EXPOSE 8000
ENV NOMS_VERSION_NEXT=1
ENTRYPOINT [ "noms" ]
CMD ["serve", "/data"]
@@ -1,696 +0,0 @@
# This file is autogenerated, do not edit; changes may be undone by the next 'dep ensure'.
[[projects]]
digest = "1:08636edd4ac1b095a9689b7a07763aa70e035068e0ff0e9dbfe2b6299b98e498"
name = "cloud.google.com/go"
packages = [
"compute/metadata",
"iam",
"internal",
"internal/optional",
"internal/trace",
"internal/version",
"storage",
]
pruneopts = "UT"
revision = "0ebda48a7f143b1cce9eb37a8c1106ac762a3430"
version = "v0.34.0"
[[projects]]
digest = "1:9f3b30d9f8e0d7040f729b82dcbc8f0dead820a133b3147ce355fc451f32d761"
name = "github.com/BurntSushi/toml"
packages = ["."]
pruneopts = "UT"
revision = "3012a1dbe2e4bd1391d42b32f0577cb7bbc7f005"
version = "v0.3.1"
[[projects]]
digest = "1:e92f5581902c345eb4ceffdcd4a854fb8f73cf436d47d837d1ec98ef1fe0a214"
name = "github.com/StackExchange/wmi"
packages = ["."]
pruneopts = "UT"
revision = "5d049714c4a64225c3c79a7cf7d02f7fb5b96338"
version = "1.0.0"
[[projects]]
branch = "master"
digest = "1:315c5f2f60c76d89b871c73f9bd5fe689cad96597afd50fb9992228ef80bdd34"
name = "github.com/alecthomas/template"
packages = [
".",
"parse",
]
pruneopts = "UT"
revision = "a0175ee3bccc567396460bf5acd36800cb10c49c"
[[projects]]
branch = "master"
digest = "1:c198fdc381e898e8fb62b8eb62758195091c313ad18e52a3067366e1dda2fb3c"
name = "github.com/alecthomas/units"
packages = ["."]
pruneopts = "UT"
revision = "2efee857e7cfd4f3d0138cc3cbb1b4966962b93a"
[[projects]]
branch = "master"
digest = "1:f6e569e4a0c5d9c7fab4a9613cf55ac0e2160c17cc1eae1c96b78b842619c64a"
name = "github.com/attic-labs/graphql"
packages = [
".",
"gqlerrors",
"language/ast",
"language/kinds",
"language/lexer",
"language/location",
"language/parser",
"language/printer",
"language/source",
"language/typeInfo",
"language/visitor",
]
pruneopts = "UT"
revision = "917f92ca24a759a0e3bfd1b135850f9b0c04682e"
[[projects]]
branch = "master"
digest = "1:c34bc967eedd84e16c16905429ad84c0de1355c0d16126b35b0eca8eb6581056"
name = "github.com/attic-labs/kingpin"
packages = ["."]
pruneopts = "UT"
revision = "442efcfac769eef3072317c696afe5861c6f7a15"
[[projects]]
digest = "1:7dd0b657dc55cd8ff3e6adfa2f74056d6b840e11897c3be7384ef3e294bf0241"
name = "github.com/aws/aws-sdk-go"
packages = [
"aws",
"aws/awserr",
"aws/awsutil",
"aws/client",
"aws/client/metadata",
"aws/corehandlers",
"aws/credentials",
"aws/credentials/ec2rolecreds",
"aws/credentials/endpointcreds",
"aws/credentials/processcreds",
"aws/credentials/stscreds",
"aws/crr",
"aws/csm",
"aws/defaults",
"aws/ec2metadata",
"aws/endpoints",
"aws/request",
"aws/session",
"aws/signer/v4",
"internal/ini",
"internal/s3err",
"internal/sdkio",
"internal/sdkrand",
"internal/sdkuri",
"internal/shareddefaults",
"private/protocol",
"private/protocol/eventstream",
"private/protocol/eventstream/eventstreamapi",
"private/protocol/json/jsonutil",
"private/protocol/jsonrpc",
"private/protocol/query",
"private/protocol/query/queryutil",
"private/protocol/rest",
"private/protocol/restxml",
"private/protocol/xml/xmlutil",
"service/dynamodb",
"service/s3",
"service/sts",
]
pruneopts = "UT"
revision = "62936e15518acb527a1a9cb4a39d96d94d0fd9a2"
version = "v1.16.15"
[[projects]]
digest = "1:8e18047715934056ed06c61d4d512ffd80787671d95bb1808cebb56adac56d34"
name = "github.com/clbanning/mxj"
packages = ["."]
pruneopts = "UT"
revision = "79cfe7d36986ce108bd1e2c1d0a2a85c895237a2"
version = "v1.8.3"
[[projects]]
branch = "master"
digest = "1:3651a7691180a385540fadaf7ebcc8708c061d3b1a9777312a23f7ba10ff6025"
name = "github.com/codahale/blake2"
packages = ["."]
pruneopts = "UT"
revision = "8d10d0420cbfbdc9c1164c0c4ad3457a6c3771b9"
[[projects]]
digest = "1:ffe9824d294da03b391f44e1ae8281281b4afc1bdaa9588c9097785e3af10cec"
name = "github.com/davecgh/go-spew"
packages = ["spew"]
pruneopts = "UT"
revision = "8991bc29aa16c548c550c7ff78260e27b9ab7c73"
version = "v1.1.1"
[[projects]]
digest = "1:6f9339c912bbdda81302633ad7e99a28dfa5a639c864061f1929510a9a64aa74"
name = "github.com/dustin/go-humanize"
packages = ["."]
pruneopts = "UT"
revision = "9f541cc9db5d55bce703bd99987c9d5cb8eea45e"
version = "v1.0.0"
[[projects]]
digest = "1:edb569dd02419a41ddd98768cc0e7aec922ef19dae139731e5ca750afcf6f4c5"
name = "github.com/edsrzf/mmap-go"
packages = ["."]
pruneopts = "UT"
revision = "188cc3b666ba704534fa4f96e9e61f21f1e1ba7c"
version = "v1.0.0"
[[projects]]
digest = "1:c96d16a4451e48e2c44b2c3531fd8ec9248d822637f1911a88959ca0bcae4a64"
name = "github.com/go-ole/go-ole"
packages = [
".",
"oleutil",
]
pruneopts = "UT"
revision = "39dc8486bd0952279431257138bc428275b86797"
version = "v1.2.2"
[[projects]]
digest = "1:5d1b5a25486fc7d4e133646d834f6fca7ba1cef9903d40e7aa786c41b89e9e91"
name = "github.com/golang/protobuf"
packages = [
"proto",
"protoc-gen-go/descriptor",
"ptypes",
"ptypes/any",
"ptypes/duration",
"ptypes/timestamp",
]
pruneopts = "UT"
revision = "aa810b61a9c79d51363740d207bb46cf8e620ed5"
version = "v1.2.0"
[[projects]]
branch = "master"
digest = "1:4a0c6bb4805508a6287675fac876be2ac1182539ca8a32468d8128882e9d5009"
name = "github.com/golang/snappy"
packages = ["."]
pruneopts = "UT"
revision = "2e65f85255dbc3072edf28d6b5b8efc472979f5a"
[[projects]]
digest = "1:236d7e1bdb50d8f68559af37dbcf9d142d56b431c9b2176d41e2a009b664cda8"
name = "github.com/google/uuid"
packages = ["."]
pruneopts = "UT"
revision = "9b3b1e0f5f99ae461456d768e7d301a7acdaa2d8"
version = "v1.1.0"
[[projects]]
digest = "1:cd9864c6366515827a759931746738ede6079faa08df9c584596370d6add135c"
name = "github.com/googleapis/gax-go"
packages = [
".",
"v2",
]
pruneopts = "UT"
revision = "c8a15bac9b9fe955bd9f900272f9a306465d28cf"
version = "v2.0.3"
[[projects]]
digest = "1:b934afd6ff135f3f1c9a5c573247aaa7c81d9653fb52f59b802ebc7ab809b79f"
name = "github.com/hanwen/go-fuse"
packages = [
"fuse",
"fuse/nodefs",
"fuse/pathfs",
"splice",
]
pruneopts = "UT"
revision = "5690be47d614355a22931c129e1075c25a62e9ac"
version = "v20170619"
[[projects]]
digest = "1:d2d38625c95af7eb19435356d15af129e11869ceaff17150cda8d28e3b25bb8d"
name = "github.com/ipfs/go-ipfs"
packages = [
".",
"core",
"core/coreapi/interface",
"core/coreapi/interface/options",
"dagutils",
"exchange/reprovide",
"filestore",
"filestore/pb",
"fuse/mount",
"keystore",
"namesys",
"namesys/opts",
"namesys/republisher",
"p2p",
"pin",
"pin/internal/pb",
"repo",
"thirdparty/cidv0v1",
"thirdparty/math2",
"thirdparty/verifbs",
]
pruneopts = "UT"
revision = "aefc746f34e5ffdee5fba1915c6603b65a0ebf81"
version = "v0.4.18"
[[projects]]
digest = "1:fa3a7dc0a780fb19838fb96d4e18c0b9a019d9bb618798308d7b6ca48fcb9876"
name = "github.com/jbenet/go-base58"
packages = ["."]
pruneopts = "UT"
revision = "6237cf65f3a6f7111cd8a42be3590df99a66bc7d"
version = "1.0.0"
[[projects]]
digest = "1:bb81097a5b62634f3e9fec1014657855610c82d19b9a40c17612e32651e35dca"
name = "github.com/jmespath/go-jmespath"
packages = ["."]
pruneopts = "UT"
revision = "c2b33e84"
[[projects]]
digest = "1:b6bbd2f9e0724bd81890c8644259f920c6d61c08453978faff0bebd25f3e7d3e"
name = "github.com/jpillora/backoff"
packages = ["."]
pruneopts = "UT"
revision = "8eab2debe79d12b7bd3d10653910df25fa9552ba"
version = "1.0.0"
[[projects]]
digest = "1:114ecad51af93a73ae6781fd0d0bc28e52b433c852b84ab4b4c109c15e6c6b6d"
name = "github.com/jroimartin/gocui"
packages = ["."]
pruneopts = "UT"
revision = "c055c87ae801372cd74a0839b972db4f7697ae5f"
version = "v0.4.0"
[[projects]]
branch = "master"
digest = "1:a34ff13c37101cd363a34dec05ef3ca896c91162cc7e612d9e4768caba9910b3"
name = "github.com/juju/fslock"
packages = ["."]
pruneopts = "UT"
revision = "4d5c94c67b4b207e1ab4ebca6b4e47f174618b86"
[[projects]]
branch = "master"
digest = "1:b8d72d48e77c5a93e09f82d57cd05a30c302ff0835388b0b7745f4f9cf3e0652"
name = "github.com/juju/gnuflag"
packages = ["."]
pruneopts = "UT"
revision = "2ce1bb71843d6d179b3f1c1c9cb4a72cd067fc65"
[[projects]]
digest = "1:f97285a3b0a496dcf8801072622230d513f69175665d94de60eb042d03387f6c"
name = "github.com/julienschmidt/httprouter"
packages = ["."]
pruneopts = "UT"
revision = "348b672cd90d8190f8240323e372ecd1e66b59dc"
version = "v1.2.0"
[[projects]]
branch = "master"
digest = "1:975079ef1a4b94c23122af1c18891ef9518b47f9fa30e8905b34802c5d7c7adc"
name = "github.com/kch42/buzhash"
packages = ["."]
pruneopts = "UT"
revision = "9bdec3dec7c611fa97beadc374d75bdf02cd880e"
[[projects]]
branch = "add-daylon"
digest = "1:2d0f8845c6bb182b7f2d6d5d9f6d2e80569412d19a0470c92183f23adf8aa175"
name = "github.com/liquidata-inc/ld"
packages = [
"go/libraries/ldio",
"go/libraries/ldset",
"go/libraries/textdb",
]
pruneopts = "UT"
revision = "5276363e6eea62b858a43d872b41969e2fbee0f3"
[[projects]]
digest = "1:c658e84ad3916da105a761660dcaeb01e63416c8ec7bc62256a9b411a05fcd67"
name = "github.com/mattn/go-colorable"
packages = ["."]
pruneopts = "UT"
revision = "167de6bfdfba052fa6b2d3664c8f5272e23c9072"
version = "v0.0.9"
[[projects]]
digest = "1:0981502f9816113c9c8c4ac301583841855c8cf4da8c72f696b3ebedf6d0e4e5"
name = "github.com/mattn/go-isatty"
packages = ["."]
pruneopts = "UT"
revision = "6ca4dbf54d38eea1a992b3c722a76a5d1c4cb25c"
version = "v0.0.4"
[[projects]]
digest = "1:0356f3312c9bd1cbeda81505b7fd437501d8e778ab66998ef69f00d7f9b3a0d7"
name = "github.com/mattn/go-runewidth"
packages = ["."]
pruneopts = "UT"
revision = "3ee7d812e62a0804a7d0a324e0249ca2db3476d3"
version = "v0.0.4"
[[projects]]
branch = "master"
digest = "1:2b32af4d2a529083275afc192d1067d8126b578c7a9613b26600e4df9c735155"
name = "github.com/mgutz/ansi"
packages = ["."]
pruneopts = "UT"
revision = "9520e82c474b0a04dd04f8a40959027271bab992"
[[projects]]
branch = "master"
digest = "1:f3fc7efada7606d5abc88372e1f838ed897fa522077957070fbc2207a50d6faa"
name = "github.com/nsf/termbox-go"
packages = ["."]
pruneopts = "UT"
revision = "0938b5187e61bb8c4dcac2b0a9cf4047d83784fc"
[[projects]]
digest = "1:0028cb19b2e4c3112225cd871870f2d9cf49b9b4276531f03438a88e94be86fe"
name = "github.com/pmezard/go-difflib"
packages = ["difflib"]
pruneopts = "UT"
revision = "792786c7400a136282c1664665ae0a8db921c6c2"
version = "v1.0.0"
[[projects]]
digest = "1:5331094ce2c687a921af5ec1367fe96e894e5b6866c2c3b8d415e86b65e69bce"
name = "github.com/shirou/gopsutil"
packages = [
"cpu",
"disk",
"host",
"internal/common",
"mem",
"net",
"process",
]
pruneopts = "UT"
revision = "ccc1c1016bc5d10e803189ee43417c50cdde7f1b"
version = "v2.18.12"
[[projects]]
branch = "master"
digest = "1:99c6a6dab47067c9b898e8c8b13d130c6ab4ffbcc4b7cc6236c2cd0b1e344f5b"
name = "github.com/shirou/w32"
packages = ["."]
pruneopts = "UT"
revision = "bb4de0191aa41b5507caa14b0650cdbddcd9280b"
[[projects]]
branch = "master"
digest = "1:e564a9e23c65422754afbc07ec84252048a83b5c9f0a2e76a761cd35472216e5"
name = "github.com/skratchdot/open-golang"
packages = ["open"]
pruneopts = "UT"
revision = "a2dfa6d0dab6634ecf39251031a3d52db73b5c7e"
[[projects]]
digest = "1:8ff03ccc603abb0d7cce94d34b613f5f6251a9e1931eba1a3f9888a9029b055c"
name = "github.com/stretchr/testify"
packages = [
"assert",
"require",
"suite",
]
pruneopts = "UT"
revision = "ffdc059bfe9ce6a4e144ba849dbedead332c6053"
version = "v1.3.0"
[[projects]]
branch = "master"
digest = "1:685fdfea42d825ebd39ee0994354b46c374cf2c2b2d97a41a8dee1807c6a9b62"
name = "github.com/syndtr/goleveldb"
packages = [
"leveldb",
"leveldb/cache",
"leveldb/comparer",
"leveldb/errors",
"leveldb/filter",
"leveldb/iterator",
"leveldb/journal",
"leveldb/memdb",
"leveldb/opt",
"leveldb/storage",
"leveldb/table",
"leveldb/util",
]
pruneopts = "UT"
revision = "b001fa50d6b27f3f0bb175a87d0cb55426d0a0ae"
[[projects]]
digest = "1:3b5a3bc35810830ded5e26ef9516e933083a2380d8e57371fdfde3c70d7c6952"
name = "go.opencensus.io"
packages = [
".",
"exemplar",
"internal",
"internal/tagencoding",
"plugin/ochttp",
"plugin/ochttp/propagation/b3",
"stats",
"stats/internal",
"stats/view",
"tag",
"trace",
"trace/internal",
"trace/propagation",
"trace/tracestate",
]
pruneopts = "UT"
revision = "b7bf3cdb64150a8c8c53b769fdeb2ba581bd4d4b"
version = "v0.18.0"
[[projects]]
branch = "master"
digest = "1:0303de617dda42e24d7c55ce621bfeb982320396bcacbc5e22966f3552205808"
name = "golang.org/x/net"
packages = [
"context",
"context/ctxhttp",
"html",
"html/atom",
"http/httpguts",
"http2",
"http2/hpack",
"idna",
"internal/timeseries",
"trace",
]
pruneopts = "UT"
revision = "45ffb0cd1ba084b73e26dee67e667e1be5acce83"
[[projects]]
branch = "master"
digest = "1:23443edff0740e348959763085df98400dcfd85528d7771bb0ce9f5f2754ff4a"
name = "golang.org/x/oauth2"
packages = [
".",
"google",
"internal",
"jws",
"jwt",
]
pruneopts = "UT"
revision = "d668ce993890a79bda886613ee587a69dd5da7a6"
[[projects]]
branch = "master"
digest = "1:c4f2af053602f247b8625846cd88dfbf9295e3d02e82c58c27ebe3be06bef80c"
name = "golang.org/x/sys"
packages = [
"unix",
"windows",
]
pruneopts = "UT"
revision = "20be8e55dc7b4b7a1b1660728164a8509d8c9209"
[[projects]]
digest = "1:a2ab62866c75542dd18d2b069fec854577a20211d7c0ea6ae746072a1dccdd18"
name = "golang.org/x/text"
packages = [
"collate",
"collate/build",
"internal/colltab",
"internal/gen",
"internal/tag",
"internal/triegen",
"internal/ucd",
"language",
"secure/bidirule",
"transform",
"unicode/bidi",
"unicode/cldr",
"unicode/norm",
"unicode/rangetable",
]
pruneopts = "UT"
revision = "f21a4dfb5e38f5895301dc265a8def02365cc3d0"
version = "v0.3.0"
[[projects]]
digest = "1:768c35ec83dd17029060ea581d6ca9fdcaef473ec87e93e4bb750949035f6070"
name = "google.golang.org/api"
packages = [
"gensupport",
"googleapi",
"googleapi/internal/uritemplates",
"googleapi/transport",
"internal",
"iterator",
"option",
"storage/v1",
"transport/http",
"transport/http/internal/propagation",
]
pruneopts = "UT"
revision = "19e022d8cf43ce81f046bae8cc18c5397cc7732f"
version = "v0.1.0"
[[projects]]
digest = "1:fa026a5c59bd2df343ec4a3538e6288dcf4e2ec5281d743ae82c120affe6926a"
name = "google.golang.org/appengine"
packages = [
".",
"internal",
"internal/app_identity",
"internal/base",
"internal/datastore",
"internal/log",
"internal/modules",
"internal/remote_api",
"internal/urlfetch",
"urlfetch",
]
pruneopts = "UT"
revision = "e9657d882bb81064595ca3b56cbe2546bbabf7b1"
version = "v1.4.0"
[[projects]]
branch = "master"
digest = "1:a7d48ca460ca1b4f6ccd8c95502443afa05df88aee84de7dbeb667a8754e8fa6"
name = "google.golang.org/genproto"
packages = [
"googleapis/api/annotations",
"googleapis/iam/v1",
"googleapis/rpc/code",
"googleapis/rpc/status",
]
pruneopts = "UT"
revision = "bd9b4fb69e2ffd37621a6caa54dcbead29b546f2"
[[projects]]
digest = "1:9edd250a3c46675d0679d87540b30c9ed253b19bd1fd1af08f4f5fb3c79fc487"
name = "google.golang.org/grpc"
packages = [
".",
"balancer",
"balancer/base",
"balancer/roundrobin",
"binarylog/grpc_binarylog_v1",
"codes",
"connectivity",
"credentials",
"credentials/internal",
"encoding",
"encoding/proto",
"grpclog",
"internal",
"internal/backoff",
"internal/binarylog",
"internal/channelz",
"internal/envconfig",
"internal/grpcrand",
"internal/grpcsync",
"internal/syscall",
"internal/transport",
"keepalive",
"metadata",
"naming",
"peer",
"resolver",
"resolver/dns",
"resolver/passthrough",
"stats",
"status",
"tap",
]
pruneopts = "UT"
revision = "df014850f6dee74ba2fc94874043a9f3f75fbfd8"
version = "v1.17.0"
[[projects]]
digest = "1:c06d9e11d955af78ac3bbb26bd02e01d2f61f689e1a3bce2ef6fb683ef8a7f2d"
name = "gopkg.in/alecthomas/kingpin.v2"
packages = ["."]
pruneopts = "UT"
revision = "947dcec5ba9c011838740e680966fd7087a71d0d"
version = "v2.2.6"
[solve-meta]
analyzer-name = "dep"
analyzer-version = 1
input-imports = [
"cloud.google.com/go/storage",
"github.com/BurntSushi/toml",
"github.com/attic-labs/graphql",
"github.com/attic-labs/graphql/gqlerrors",
"github.com/attic-labs/kingpin",
"github.com/aws/aws-sdk-go/aws",
"github.com/aws/aws-sdk-go/aws/awserr",
"github.com/aws/aws-sdk-go/aws/credentials",
"github.com/aws/aws-sdk-go/aws/session",
"github.com/aws/aws-sdk-go/service/dynamodb",
"github.com/aws/aws-sdk-go/service/s3",
"github.com/clbanning/mxj",
"github.com/codahale/blake2",
"github.com/dustin/go-humanize",
"github.com/edsrzf/mmap-go",
"github.com/golang/snappy",
"github.com/google/uuid",
"github.com/hanwen/go-fuse/fuse",
"github.com/hanwen/go-fuse/fuse/nodefs",
"github.com/hanwen/go-fuse/fuse/pathfs",
"github.com/ipfs/go-ipfs/core",
"github.com/jbenet/go-base58",
"github.com/jpillora/backoff",
"github.com/jroimartin/gocui",
"github.com/juju/fslock",
"github.com/juju/gnuflag",
"github.com/julienschmidt/httprouter",
"github.com/kch42/buzhash",
"github.com/liquidata-inc/ld/go/libraries/ldio",
"github.com/liquidata-inc/ld/go/libraries/textdb",
"github.com/mattn/go-isatty",
"github.com/mgutz/ansi",
"github.com/shirou/gopsutil/cpu",
"github.com/shirou/gopsutil/disk",
"github.com/shirou/gopsutil/host",
"github.com/shirou/gopsutil/mem",
"github.com/skratchdot/open-golang/open",
"github.com/stretchr/testify/assert",
"github.com/stretchr/testify/suite",
"github.com/syndtr/goleveldb/leveldb",
"github.com/syndtr/goleveldb/leveldb/iterator",
"github.com/syndtr/goleveldb/leveldb/opt",
"github.com/syndtr/goleveldb/leveldb/util",
"golang.org/x/net/context",
"golang.org/x/net/html",
"golang.org/x/oauth2",
"google.golang.org/api/googleapi",
"gopkg.in/alecthomas/kingpin.v2",
]
solver-name = "gps-cdcl"
solver-version = 1
@@ -1,158 +0,0 @@
# Gopkg.toml example
#
# Refer to https://golang.github.io/dep/docs/Gopkg.toml.html
# for detailed Gopkg.toml documentation.
#
# required = ["github.com/user/thing/cmd/thing"]
# ignored = ["github.com/user/project/pkgX", "bitbucket.org/user/project/pkgA/pkgY"]
#
# [[constraint]]
# name = "github.com/user/project"
# version = "1.0.0"
#
# [[constraint]]
# name = "github.com/user/project2"
# branch = "dev"
# source = "github.com/myfork/project2"
#
# [[override]]
# name = "github.com/x/y"
# version = "2.4.0"
#
# [prune]
# non-go = false
# go-tests = true
# unused-packages = true
[[constraint]]
name = "cloud.google.com/go"
version = "0.34.0"
[[constraint]]
name = "github.com/BurntSushi/toml"
version = "0.3.1"
[[override]]
name = "github.com/attic-labs/graphql"
branch = "master"
[[constraint]]
name = "github.com/attic-labs/kingpin"
branch = "master"
[[constraint]]
name = "github.com/aws/aws-sdk-go"
version = "1.16.15"
[[constraint]]
name = "github.com/clbanning/mxj"
version = "1.8.3"
[[constraint]]
branch = "master"
name = "github.com/codahale/blake2"
[[constraint]]
name = "github.com/dustin/go-humanize"
version = "1.0.0"
[[constraint]]
name = "github.com/edsrzf/mmap-go"
version = "1.0.0"
[[constraint]]
branch = "master"
name = "github.com/golang/snappy"
[[constraint]]
name = "github.com/google/uuid"
version = "1.1.0"
[[constraint]]
name = "github.com/hanwen/go-fuse"
version = "20170619.0.0"
[[constraint]]
name = "github.com/ipfs/go-ipfs"
version = "0.4.18"
[[constraint]]
name = "github.com/jbenet/go-base58"
version = "1.0.0"
[[constraint]]
name = "github.com/jpillora/backoff"
version = "1.0.0"
[[constraint]]
name = "github.com/jroimartin/gocui"
version = "0.4.0"
[[constraint]]
branch = "master"
name = "github.com/juju/fslock"
[[constraint]]
branch = "master"
name = "github.com/juju/gnuflag"
[[constraint]]
name = "github.com/julienschmidt/httprouter"
version = "1.2.0"
[[constraint]]
branch = "master"
name = "github.com/kch42/buzhash"
[[constraint]]
branch = "add-daylon"
name = "github.com/liquidata-inc/ld"
[[constraint]]
name = "github.com/mattn/go-isatty"
version = "0.0.4"
[[constraint]]
branch = "master"
name = "github.com/mgutz/ansi"
[[constraint]]
name = "github.com/shirou/gopsutil"
version = "2.18.12"
[[constraint]]
branch = "master"
name = "github.com/skratchdot/open-golang"
[[constraint]]
name = "github.com/stretchr/testify"
version = "1.3.0"
[[constraint]]
branch = "master"
name = "github.com/syndtr/goleveldb"
[[constraint]]
branch = "master"
name = "golang.org/x/net"
[[constraint]]
branch = "master"
name = "golang.org/x/oauth2"
[[constraint]]
branch = "master"
name = "golang.org/x/sys"
[[constraint]]
name = "google.golang.org/api"
version = "0.1.0"
[[constraint]]
name = "gopkg.in/alecthomas/kingpin.v2"
version = "2.2.6"
[prune]
go-tests = true
unused-packages = true
@@ -1,135 +0,0 @@
<img src='doc/nommy_cropped_smaller.png' width='350' title='Nommy, the snacky otter'>
[Use Cases](#use-cases)&nbsp; | &nbsp;[Setup](#setup)&nbsp; | &nbsp;[Status](#status)&nbsp; | &nbsp;[Documentation](./doc/intro.md)&nbsp; | &nbsp;[Contact](#contact-us)
<br><br>
[![Docker Build Status](https://img.shields.io/docker/build/noms/noms.svg)](https://hub.docker.com/r/noms/noms/)
[![GoDoc](https://godoc.org/github.com/attic-labs/noms?status.svg)](https://godoc.org/github.com/attic-labs/noms)
# Welcome
*Noms* is a decentralized database philosophically descendant from the Git version control system.
Like Git, Noms is:
* **Versioned:** By default, all previous versions of the database are retained. You can trivially track how the database evolved to its current state, easily and efficiently compare any two versions, or even rewind and branch from any previous version.
* **Synchronizable:** Instances of a single Noms database can be disconnected from each other for any amount of time, then later reconcile their changes efficiently and correctly.
Unlike Git, Noms is a database, so it also:
* Primarily **stores structured data**, not files and directories (see: [the Noms type system](https://github.com/attic-labs/noms/blob/master/doc/intro.md#types))
* **Scales well** to large amounts of data and concurrent clients
* Supports **atomic transactions** (a single instance of Noms is CP, but Noms is typically run in production backed by S3, in which case it is "[effectively CA](https://cloud.google.com/spanner/docs/whitepapers/SpannerAndCap.pdf)")
* Supports **efficient indexes** (see: [Noms prolly-trees](https://github.com/attic-labs/noms/blob/master/doc/intro.md#prolly-trees-probabilistic-b-trees))
* Features a **flexible query model** (see: [GraphQL](./go/ngql/README.md))
A Noms database can reside within a file system or in the cloud:
* The built-in [NBS](./go/nbs) `ChunkStore` implementation offers two back-ends for persisting Noms databases: one that stores data in a file system and one that stores it in an S3 bucket.
Finally, because Noms is content-addressed, it yields a very pleasant programming model.
Working with Noms is ***declarative***. You don't `INSERT` new data, `UPDATE` existing data, or `DELETE` old data. You simply *declare* what the data ought to be right now. If you commit the same data twice, it will be deduplicated because of content-addressing. If you commit _almost_ the same data, only the part that is different will be written.
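This deduplication falls straight out of content-addressing. A toy sketch in Go of the underlying idea (hypothetical types for illustration only, not the actual noms chunk format):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Store is a toy content-addressed store: each value is keyed by the hash
// of its bytes, so committing identical data twice stores it only once.
type Store map[string][]byte

// Put stores data under its content hash and returns the address.
func (s Store) Put(data []byte) string {
	sum := sha256.Sum256(data)
	addr := hex.EncodeToString(sum[:])
	if _, ok := s[addr]; !ok {
		s[addr] = data
	}
	return addr
}

func main() {
	s := Store{}
	a := s.Put([]byte("hello"))
	b := s.Put([]byte("hello")) // same content -> same address, deduplicated
	fmt.Println(a == b, len(s)) // prints "true 1"
}
```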
<br>
## Use Cases
#### [Decentralization](./doc/decent/about.md)
Because Noms is very good at sync, it makes a decent basis for rich, collaborative, fully-decentralized applications.
#### ClientDB (coming someday)
Embed Noms into mobile applications, making it easier to build offline-first, fully synchronizing mobile applications.
<br>
## Setup
```shell
# You probably want to add this to your environment
export NOMS_VERSION_NEXT=1
go get github.com/attic-labs/noms/cmd/noms
go install github.com/attic-labs/noms/cmd/noms
```
<br>
## Run
Import some data:
```shell
go install github.com/attic-labs/noms/samples/go/csv/csv-import
curl 'https://data.cityofnewyork.us/api/views/kku6-nxdu/rows.csv?accessType=DOWNLOAD' > /tmp/data.csv
csv-import /tmp/data.csv /tmp/noms::nycdemo
```
Explore:
```shell
noms show /tmp/noms::nycdemo
```
Should show:
```go
struct Commit {
meta: struct Meta {
date: "2017-09-19T19:33:01Z",
inputFile: "/tmp/data.csv",
},
parents: set {},
value: [ // 236 items
struct Row {
countAmericanIndian: "0",
countAsianNonHispanic: "3",
countBlackNonHispanic: "21",
countCitizenStatusTotal: "44",
countCitizenStatusUnknown: "0",
countEthnicityTotal: "44",
...
```
<br>
## Status
### Data Format
We are fairly confident in the core data format, and plan to support Noms database [version `7`](https://github.com/attic-labs/noms/blob/v7/go/constants/version.go#L9) and forward. If you create a database with Noms today, future versions will have migration tools to pull your databases forward.
### Roadmap
We plan to implement the following for Noms version 8:
- [x] Horizontal scalability (Done! See: [nbs](./go/nbs/README.md))
- [x] Automatic merge (Done! See: [CommitOptions.Policy](https://godoc.org/github.com/attic-labs/noms/go/datas#CommitOptions) and the `noms merge` subcommand).
- [x] Query language (Done! See [ngql](./go/ngql/README.md))
- [ ] Garbage Collection (https://github.com/attic-labs/noms/issues/3374)
- [x] Optional fields (https://github.com/attic-labs/noms/issues/2327)
- [ ] Implement migration (https://github.com/attic-labs/noms/issues/3363)
- [ ] Fix sync performance with long commit chains (https://github.com/attic-labs/noms/issues/2233)
- [ ] [Various other smaller bugs and improvements](https://github.com/attic-labs/noms/issues?q=is%3Aissue+is%3Aopen+label%3AP0)
<br>
## Learn More About Noms
For the decentralized web: [The Decentralized Database](doc/decent/about.md)
Learn the basics: [Technical Overview](doc/intro.md)
Tour the CLI: [Command-Line Interface Tour](doc/cli-tour.md)
Tour the Go API: [Go SDK Tour](doc/go-tour.md)
<br>
## Contact Us
Interested in using Noms? Awesome! We would be happy to work with you to help understand whether Noms is a fit for your problem. Reach out at:
- [Mailing List](https://groups.google.com/forum/#!forum/nomsdb)
- [Twitter](https://twitter.com/nomsdb)
@@ -1,52 +0,0 @@
codecov:
branch: master
bot: "mikegray"
ci:
- "jenkins3.noms.io"
coverage:
precision: 2 # how many decimal places to display in the UI: 0 <= value <= 4
round: down # how coverage is rounded: down/up/nearest
range: 70...100 # custom range of coverage colors from red -> yellow -> green
notify:
slack:
default:
url: "secret:n+BYhIXTXsaCiMKB3vOf6yP68ytdKd3WpXtJFWPEUsEWXDiGnU5dTB5DO2yv8tR0COdxvs7K31hVpEfHEXdoXOaQhUw3FKf3fh8KZDLN7CGTbeDhw1uNGGyBr2d2TWnopzYtcXomdwMmuckARtiWQx0YXJiZY9YyCrIoDK9HIJQ="
branches: null
threshold: 5.0
attachments: "tree, diff"
status:
project:
default:
enabled: yes
target: auto
branches: null
threshold: null
if_no_uploads: error
if_not_found: success
if_ci_failed: error
patch:
default:
enabled: yes
target: auto
branches: null
threshold: null
if_no_uploads: error
if_not_found: success
if_ci_failed: error
changes:
default:
enabled: yes
branches: null
if_no_uploads: error
if_not_found: success
if_ci_failed: error
comment:
layout: "tree"
branches: null
behavior: default
Binary file not shown (before: 127 KiB).

@@ -1,161 +0,0 @@
[Home](../README.md) »
[Technical Overview](intro.md)&nbsp; | &nbsp;[Use Cases](../README.md#use-cases)&nbsp; | &nbsp;**Command-Line Interface**&nbsp; | &nbsp;[Go bindings Tour](go-tour.md) | &nbsp;[Path Syntax](spelling.md)&nbsp; | &nbsp;[FAQ](faq.md)&nbsp;
<br><br>
# A Short Tour of the Noms CLI
This is a quick introduction to the Noms command-line interface. It should only take a few minutes to read, but there's also a screencast if you prefer:
[<img src="cli-screencast.png" width="500">](https://www.youtube.com/watch?v=NeBsaNdAn68)
## Install Noms
... if you haven't already. Follow the instructions [here](https://github.com/attic-labs/noms#setup).
## The `noms` command
Now you should be able to run `noms`:
```shell
> noms
Noms is a tool for goofing with Noms data.
Usage:
noms command [arguments]
The commands are:
diff Shows the difference between two objects
ds Noms dataset management
log Displays the history of a Noms dataset
serve Serves a Noms database over HTTP
show Shows a serialization of a Noms object
sync Moves datasets between or within databases
version Display noms version
Use "noms help [command]" for more information about a command.
```
Without any arguments, `noms` lists out all available commands. To get information on a specific command, we can use `noms help [command]`:
```shell
> noms help sync
usage: noms sync [options] <source-object> <dest-dataset>
See Spelling Objects at https://github.com/attic-labs/noms/blob/master/doc/spelling.md for details on the object and dataset arguments.
...
```
## noms ds
The `noms ds` command lists the _datasets_ within a particular database:
```shell
> noms ds http://demo.noms.io
...
sf-film-locations/raw
sf-film-locations
...
```
## noms log
Noms datasets are versioned. You can see the history with `log`:
```shell
> noms log http://demo.noms.io::sf-film-locations
commit aprsmg0j2eegk8eehbgj7cd3tmmd1be8
Parent: None
Date: "2017-09-19T21:42:46Z"
InputPath: "http://localhost:8000::#dksek6tuf8ens06bi4culq85tfp5q4cg.value"
...
```
Note that Noms is a typed system. What is being shown here for each entry is not text, but a serialization of the diff between two datasets.
## noms show
You can see the entire serialization of any object in the database with `noms show`:
```shell
> noms show 'http://demo.noms.io::#aprsmg0j2eegk8eehbgj7cd3tmmd1be8'
struct Commit {
meta: struct {},
parents: Set<Ref<Cycle<Commit>>>,
value: List<struct Row {
Actor1: String,
Actor2: String,
Actor3: String,
Director: String,
Distributor: String,
FunFacts: String,
Locations: String,
ProductionCompany: String,
ReleaseYear: Number,
Title: String,
Writer: String,
}>,
}({
meta: Meta {
date: "2016-07-25T18:34:00+0000",
inputPath: "http://localhost:8000::sf-film-locations/raw.value",
},
parents: {
c506ta03786j48a07he83ju669u78qa2,
},
value: [ // 1,241 items
Row {
Actor1: "Siddarth",
...
```
## noms sync
You can work with remote Noms databases exactly the same way you work with local databases. But it's frequently useful to move data to a local machine, for example, to make a private fork or to work with the data disconnected from the source database.
Moving data in Noms is done with the `sync` command. Note that unlike Git, we do not make a distinction between _push_ and _pull_. It's the same operation in both directions:
```shell
> noms sync http://demo.noms.io::sf-film-locations /tmp/noms::films
> noms ds /tmp/noms
films
```
We can now make an edit locally:
```shell
> go install github.com/attic-labs/noms/samples/go/csv/...
> csv-export /tmp/noms::films > /tmp/film-locations.csv
```
Open /tmp/film-locations.csv and edit it, then:
```shell
> csv-import --column-types=String,String,String,String,String,String,String,String,Number,String,String \
/tmp/film-locations.csv /tmp/noms::films
```
## noms diff
The `noms diff` command can show you the differences between any two values. Let's see our change:
```shell
> noms diff http://demo.noms.io::sf-film-locations /tmp/noms::films
./.meta {
- "date": "2016-07-25T18:51:23+0000"
+ "date": "2016-07-25T22:51:14+0000"
+ "inputFile": "/tmp/film-locations.csv"
- "inputPath": "http://demo.noms.io::sf-film-locations/raw.value"
./.parents {
- pckdvpvr9br1fie6c3pjudrlthe7na18
+ q4jcc2i7kntkjiipvjgpr5r02ldroj0g
}
./.value[0] {
- "Locations": "Epic Roasthouse (399 Embarcadero)"
+ "Locations": "Epic Roadhouse (399 Embarcadero)"
```
@@ -1,77 +0,0 @@
[Home](../../README.md) » [Use Cases](../../README.md#use-cases) » **Decentralized** »
**About**&nbsp; | &nbsp;[Quickstart](quickstart.md)&nbsp; | &nbsp;[Architectures](architectures.md)&nbsp; | &nbsp;[P2P Chat Demo](demo-p2p-chat.md)&nbsp; | &nbsp;[IPFS Chat Demo](demo-ipfs-chat.md)
<br><br>
# Noms — The Decentralized Database
[Noms](http://noms.io) makes it ~~easy~~ tractable to create rich,
multiuser, collaborative, fully-decentralized applications.
Like most databases, Noms features a rich data model, atomic
transactions, support for large-scale data, and efficient searches,
scans, reads, and updates.
Unlike any other database, Noms has built-in multiparty sync and
conflict resolution. This feature makes Noms a very good fit for P2P
decentralized applications.
Any number of dapp peers in a P2P network can
concurrently modify the same logical Noms database, and continuously
and efficiently sync their changes with each other. All peers will
converge to the same state.
For many applications, peers can store an entire local copy of the
data they are interested in. For larger applications, it should be
possible to back Noms by a decentralized blockstore like IPFS, Swarm,
or Sia (or in the future, Filecoin), and store large-scale data in a
completely decentralized way, without replicating it on every
node. Noms also has a blockstore for S3, which is ideal for
applications that have some centralized components.
**We'd love to talk to you about the possibility of using noms in your project**, so please don't hesitate to contact us at [noms@attic.io](mailto:noms@attic.io).
## How it Works
Think of Noms like a programmable Git: changes are bundled as commits
which reference previous states of the database. Apps pull changes
from peers and merge them using a principled set of APIs and
strategies. Except that rather than users manually pulling and
merging, applications typically do this continuously, automatically
converging to a shared state.
Your application uses a [Go client
library](https://github.com/attic-labs/noms/blob/master/doc/go-tour.md)
to interact with Noms data. There is also a [command-line
interface](https://github.com/attic-labs/noms/blob/master/doc/cli-tour.md)
for working with data and initial support for a [GraphQL-based query
language](https://github.com/attic-labs/noms/blob/master/go/ngql/README.md).
Some additional features include:
* **Versioning**: It's easy to use, compare, or revert to older database versions
* **Efficient diffs**: diffing even huge datasets is efficient due to
Noms' use of a novel BTree-like data structure called a [Prolly
Tree](https://github.com/attic-labs/noms/blob/master/doc/intro.md#prolly-trees-probabilistic-b-trees)
* **Efficient storage**: data are chunked and content-addressable, so
there is exactly one copy of each chunk in the database, shared by
other data that reference it. Small changes to massive data
structures always result in small operations.
* **Verifiable**: The entire database rolls up to a single 20-byte hash
that uniquely represents the database at that moment - anyone can
verify that a particular database hashes to the same value
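The rollup works like a Merkle tree: leaves hash their bytes, inner nodes hash the concatenation of their children's hashes, and the root is a single 20-byte digest that changes if any value anywhere changes. A minimal sketch of the idea (SHA-1 chosen here only because it produces 20 bytes, like a Noms hash; this is not the Noms serialization):

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// node is a toy value tree: either a leaf with raw bytes, or an inner
// node with children.
type node struct {
	leaf     []byte
	children []*node
}

// rollup hashes the tree bottom-up into one 20-byte digest.
func rollup(n *node) [20]byte {
	if n.children == nil {
		return sha1.Sum(n.leaf)
	}
	var buf []byte
	for _, c := range n.children {
		h := rollup(c)
		buf = append(buf, h[:]...)
	}
	return sha1.Sum(buf)
}

func main() {
	db := &node{children: []*node{
		{leaf: []byte("dataset-a")},
		{leaf: []byte("dataset-b")},
	}}
	// Anyone hashing the same tree gets the same 20-byte root.
	fmt.Printf("%x\n", rollup(db))
}
```

Because the root is deterministic, two parties can compare entire databases by exchanging 20 bytes.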
Read the [Noms design overview](https://github.com/attic-labs/noms/blob/master/doc/decent/intro.md).
## Status
For overall status of the database, see [Noms Status](../../README.md#status).
For the decentralized use case in particular: we are fairly confident in this approach and are actively looking for partners to work with to build it out.
- [x] Demonstrate core concept of using Noms to continuously sync across many users (Done! See noms-chat demos)
- [ ] Demonstrate using libp2p or similar to traverse NATs
- [ ] Investigate backing IPFS with Noms rather than the reverse - this should improve stability and dramatically improve local performance
- [ ] Demonstrate using IPFS with a schema that permits nodes to disappear
**_If you would like to use noms in your project we'd love to hear from you_**:
drop us an email ([noms@attic.io](mailto:noms@attic.io)) or send us a
message in slack ([slack.noms.io](http://slack.noms.io)).
@@ -1,71 +0,0 @@
[Home](../../README.md) » [Use Cases](../../README.md#use-cases) » **Decentralized** »
[About](about.md)&nbsp; | &nbsp;[Quickstart](quickstart.md)&nbsp; | **Architectures**&nbsp; | &nbsp;[P2P Chat Demo](demo-p2p-chat.md)&nbsp; | &nbsp;[IPFS Chat Demo](demo-ipfs-chat.md)
<br><br>
# Architectures
There are many possible ways to use Noms as part of a decentralized application. Noms can naturally be mixed and matched with other decentralized tools like blockchains, IPFS, etc. This page lists a few approaches we find promising.
## Classic P2P Architecture
Noms can be used to implement apps in a peer-to-peer configuration. Each instance of the application (i.e., each "node") maintains a database locally with the data that is relevant to it. When a node creates new data, it commits that data to its database and broadcasts a message to its peers that contains the hash of its latest commit.
![P2P Architecture](./p2p-arch.png)
Peers that are listening for these messages can decide if that data is relevant to them. Those that are interested can pull the new data from the publisher. The two clients communicate efficiently so that only data that isn't present in the requesting client is transmitted (much the same way that one git client sends source changes to another).
Peers can use a flow similar to the following in order to sync changes with one another:
```nohighlight
for {
listen for new message
if new msg is relevant {
if new msg is ancestor of current commit {
// nothing to do
continue
}
pull new data from sender of msg
if current head is ancestor of new msg {
// fast forward to the new commit
set head of dataset to new commit
continue
}
merge new with current head and commit
publish new commit
}
}
```
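The ancestor and fast-forward checks in the loop above can be sketched in Go against a toy commit DAG — a plain map from commit hash to parent hashes, purely illustrative and not the Noms API:

```go
package main

import "fmt"

// A toy commit DAG: each commit maps to its parents. c3 and c4 are
// concurrent children of c2.
var parents = map[string][]string{
	"c1": {},
	"c2": {"c1"},
	"c3": {"c2"},
	"c4": {"c2"},
}

// isAncestor reports whether a is reachable from b by following parents.
func isAncestor(a, b string) bool {
	if a == b {
		return true
	}
	for _, p := range parents[b] {
		if isAncestor(a, p) {
			return true
		}
	}
	return false
}

// decide picks the action a peer takes on seeing a remote head.
func decide(local, remote string) string {
	switch {
	case isAncestor(remote, local):
		return "ignore" // remote commit is already in our history
	case isAncestor(local, remote):
		return "fast-forward" // remote strictly extends our history
	default:
		return "merge" // histories diverged; merge and commit
	}
}

func main() {
	fmt.Println(decide("c3", "c1")) // ignore
	fmt.Println(decide("c2", "c3")) // fast-forward
	fmt.Println(decide("c3", "c4")) // merge
}
```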
Noms has a default [merge policy](https://github.com/attic-labs/noms/blob/2d0e9e738370d49cc09e8fa6e290ceca1c3e2005/go/merge/three_way.go#L14) that covers many classes of concurrent operations. If the application restricts itself to only operations that are mergeable by this policy, then Noms can automatically merge all concurrent changes. In this case, the entire database is effectively a CRDT.
If this is not sufficient, then applications can create their own merge policies, implementing whatever merge is appropriate for their use case.
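A custom policy is, at heart, a function that reconciles two values against their common ancestor. A minimal three-way merge over string maps — a loose sketch of the shape such a policy takes, not the Noms merge API — might look like:

```go
package main

import "fmt"

// threeWay merges maps a and b against their common parent, key by key:
// edits to different keys commute; conflicting edits to the same key
// are reported rather than silently resolved.
func threeWay(parent, a, b map[string]string) (map[string]string, error) {
	keys := map[string]bool{}
	for _, m := range []map[string]string{parent, a, b} {
		for k := range m {
			keys[k] = true
		}
	}
	out := map[string]string{}
	for k := range keys {
		pv, pok := parent[k]
		av, aok := a[k]
		bv, bok := b[k]
		switch {
		case aok == bok && av == bv: // both sides agree
			if aok {
				out[k] = av
			}
		case aok == pok && av == pv: // only b changed this key
			if bok {
				out[k] = bv
			}
		case bok == pok && bv == pv: // only a changed this key
			if aok {
				out[k] = av
			}
		default: // both changed it, differently
			return nil, fmt.Errorf("conflict on key %q", k)
		}
	}
	return out, nil
}

func main() {
	parent := map[string]string{"title": "draft", "author": "ann"}
	a := map[string]string{"title": "final", "author": "ann"}
	b := map[string]string{"title": "draft", "author": "ann", "tag": "chat"}
	merged, err := threeWay(parent, a, b)
	fmt.Println(merged, err) // concurrent edits to different keys merge cleanly
}
```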
## Decentralized Chunkstore Architecture
Another potential architecture for decentralized apps uses a decentralized chunkstore (such as IPFS, Swarm, or Sia) rather than local databases. In this case, rather than each node maintaining a local datastore, Noms chunks are stored in a decentralized chunkstore. The underlying chunkstore is responsible for making chunks available when needed.
![Decentralized Architecture](./dist-arch.png)
The flow used by peers to sync with one another is similar to the peer-to-peer architecture. The main difference is that data is not duplicated on local machines and doesn't have to be pulled during sync. Each app keeps track of its latest commit in the chunk store.
```nohighlight
for {
listen for new message
if new msg is relevant {
if new msg is ancestor of current commit {
// nothing to do
continue
}
// No pull necessary
if current head is ancestor of new msg {
// fast forward to the new commit
set head of dataset to new commit
continue
}
merge new with current head and commit
publish new commit
}
}
```
We have a prototype implementation of an IPFS-based chunkstore. If you are interested in pursuing this direction, let us know!
@@ -1,49 +0,0 @@
[Home](../../README.md) » [Use Cases](../../README.md#use-cases) » **Decentralized** »
[About](about.md)&nbsp; | &nbsp;[Quickstart](quickstart.md)&nbsp; | &nbsp;[Architectures](architectures.md)&nbsp; | &nbsp;[P2P Chat Demo](demo-p2p-chat.md)&nbsp; | &nbsp;**IPFS Chat Demo**
<br><br>
# Demo App: IPFS-based Decentralized Chat
This sample app demonstrates backing a P2P noms app by a decentralized blockstore (in this case, IPFS). Data is pulled off the network dynamically as needed - each client doesn't need a complete copy.
## Build and Run
Demo app code is in the
[ipfs-chat](https://github.com/attic-labs/noms/tree/master/samples/go/decent/ipfs-chat/)
directory. To get it up and running take the following steps:
* Fetch the noms repository onto your computer:
```shell
go get github.com/attic-labs/noms/samples/go/decent/ipfs-chat
```
* From the noms/samples/go/decent/ipfs-chat directory, build the program with the following command:
```shell
go build
```
* Run the ipfs-chat client with the following command:
```shell
./ipfs-chat client --username <aname1> --node-idx=1 ipfs:/tmp/ipfs1::chat >& /tmp/err1
```
* Run a second ipfs-chat client with the following command:
```shell
./ipfs-chat client --username <aname2> --node-idx=2 ipfs:/tmp/ipfs2::chat >& /tmp/err2
```
If desired, ipfs-chat can be run as a daemon which will replicate all
chat content in a local store which will enable clients to go offline
without causing data to become unavailable to other clients:
```shell
./ipfs-chat daemon --node-idx=3 ipfs:/tmp/ipfs3::chat
```
Note: the 'node-idx' argument ensures that each IPFS-based program
uses a distinct set of ports. This is useful when running multiple
IPFS-based programs on the same machine.
@@ -1,46 +0,0 @@
[Home](../../README.md) » [Use Cases](../../README.md#use-cases) » **Decentralized** »
[About](about.md)&nbsp; | &nbsp;[Quickstart](quickstart.md)&nbsp; | &nbsp;[Architectures](architectures.md)&nbsp; | &nbsp;**P2P Chat Demo**&nbsp; | &nbsp;[IPFS Chat Demo](demo-ipfs-chat.md)
<br><br>
# Demo App: P2P Decentralized Chat
This sample demonstrates the simplest possible case of building a p2p app on top of Noms. Each node stores a complete copy of the data it is interested in, and peers find each other using [IPFS pubsub](https://ipfs.io/blog/25-pubsub/).
Currently, nodes have to have a publicly routable IP, but it should be possible to use [libP2P](https://github.com/libp2p) or similar to connect to most nodes.
## Build and Run
Demo app code is in the
[p2p](https://github.com/attic-labs/noms/tree/master/samples/go/decent/p2p-chat)
directory. To get it up and running take the following steps:
* Fetch the noms repository onto your computer:
```shell
go get github.com/attic-labs/noms/samples/go/decent/p2p-chat
```
* From the noms/samples/go/decent/p2p-chat directory, build the program with the following command:
```shell
go build
```
* Run the p2p client with the following command:
```shell
mkdir /tmp/noms1
./p2p-chat client --username=<aname1> --node-idx=1 /tmp/noms1 >& /tmp/err1
```
* Run a second p2p client with the following command:
```shell
mkdir /tmp/noms2
./p2p-chat client --username=<aname2> --node-idx=2 /tmp/noms2 >& /tmp/err2
```
Note: the p2p client relies on IPFS for its pub/sub implementation. The
'node-idx' argument ensures that each IPFS-based node uses a distinct set
of ports. This is useful when running multiple IPFS-based programs on
the same machine.
@@ -1,133 +0,0 @@
[Home](../../README.md) » [Use Cases](../../README.md#use-cases) » **Decentralized** »
[About](about.md)&nbsp; | &nbsp;**Quickstart**&nbsp; | &nbsp;[Architectures](architectures.md)&nbsp; | &nbsp;[P2P Chat Demo](demo-p2p-chat.md)&nbsp; | &nbsp;[IPFS Chat Demo](demo-ipfs-chat.md)
<br><br>
# How to Use Noms in a Decentralized App
If you'd like to use noms in your project we'd love to hear from you:
drop us an email ([noms@attic.io](mailto:noms@attic.io)) or send us a
message in slack ([slack.noms.io](http://slack.noms.io)).
The steps you'll need to take are:
1. Decide how you'll model your problem using noms datatypes: boolean,
number, string, blob, map, list, set, structs, ref, and
union. (Note: if you are interested in using CRDTs as an alternative
to classic datatypes please let us know.)
2. Consider...
* How peers will discover each other
* How peers will notify each other of changes
* How and when they will pull changes, and
* What potential there is for conflicting changes. Consider modeling
your problem so that changes commute in order to make merging
easier.
In our [p2p sample](https://github.com/attic-labs/noms/blob/master/doc/decent/demo-p2p-chat.md) application, all peers periodically broadcast their HEAD on a known channel using [IPFS pubsub](https://ipfs.io/blog/25-pubsub/), pull each others' changes immediately, and avoid conflicts by using operations that can be resolved with Noms' built in merge policies.
This is basically the simplest possible approach, but lots of options are possible. For example, an alternate approach for discoverability could be to keep a registry of all participating nodes in a blockchain (e.g., by storing them in an Ethereum smart contract). One could store either the current HEAD of each node (updated whenever the node changes state), or just an IPNS name that the node is writing to.
As an example of changes that commute consider modeling a stream
of chat messages. Appending messages from both parties to a list
is not commutative; the result depends on the order in which
messages are added to the list. An example of a commutative
strategy is adding the messages to a `Map` keyed by
`Struct{sender, ordinal}`: the resulting `Map` is the same no
matter what order messages are added.
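That keying strategy can be checked in plain Go, with a native map standing in for the Noms `Map` (illustrative only): two peers that receive the same messages in different orders end up with identical state.

```go
package main

import (
	"fmt"
	"sort"
)

// key mirrors Struct{sender, ordinal}: each sender numbers its own messages.
type key struct {
	sender  string
	ordinal int
}

type msg struct {
	k    key
	text string
}

// apply inserts messages into the state map; unique keys mean the final
// map is independent of arrival order.
func apply(state map[key]string, msgs []msg) {
	for _, m := range msgs {
		state[m.k] = m.text
	}
}

// render gives a stable display order: by ordinal, then sender.
func render(state map[key]string) string {
	keys := make([]key, 0, len(state))
	for k := range state {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].ordinal != keys[j].ordinal {
			return keys[i].ordinal < keys[j].ordinal
		}
		return keys[i].sender < keys[j].sender
	})
	out := ""
	for _, k := range keys {
		out += fmt.Sprintf("%s/%d: %s\n", k.sender, k.ordinal, state[k])
	}
	return out
}

func main() {
	a := msg{key{"alice", 0}, "hi"}
	b := msg{key{"bob", 0}, "hey"}
	c := msg{key{"alice", 1}, "how are you?"}

	peer1, peer2 := map[key]string{}, map[key]string{}
	apply(peer1, []msg{a, b, c}) // one arrival order
	apply(peer2, []msg{c, a, b}) // another
	fmt.Println(render(peer1) == render(peer2)) // true: order doesn't matter
}
```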
3. Vendor the code into your project.
4. Set `NOMS_VERSION_NEXT=1` in your environment.
5. Decide which type of storage you'd like to use: memory (convenient for playing around), disk, IPFS, or S3. (If you want to implement a store on top of another type of storage that's possible too; email us or reach out on slack and we can help.)
6. Set up and instantiate a database for your storage. Generally, you use the spec package to parse a [dataset spec](https://github.com/attic-labs/noms/blob/master/doc/spelling.md) like `mem::mydataset` which you can then ask for [`Database`](https://github.com/attic-labs/noms/blob/master/go/datas/database.go) and [`Dataset`](https://github.com/attic-labs/noms/blob/master/go/datas/dataset.go).
* **Memory**: no setup required, just instantiate it:
```go
sp := spec.ForDataset("mem::test") // Dataset name is "test"
```
* **Disk**: identify a directory for storage, say `/path/to/chunks`, and then instantiate:
```go
sp := spec.ForDataset("/path/to/chunks::test") // Dataset name is "test"
```
* **IPFS**: identify an IPFS node by directory. If an IPFS node doesn't exist at that directory, one will be created:
```go
sp := spec.ForDataset("ipfs:/path/to/ipfs_repo::test") // Dataset name is "test"
```
* **S3**: Follow the [S3 setup instructions](https://github.com/attic-labs/noms/blob/master/go/nbs/NBS-on-AWS.md) then instantiate a database and dataset:
```go
sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("us-west-2")))
store := nbs.NewAWSStore("dynamo-table", "store-name", "s3-bucket", s3.New(sess), dynamodb.New(sess), 1<<28)
database := datas.NewDatabase(store)
dataset := database.GetDataset("test") // Dataset name is "test"
```
7. Implement using the [Go API](https://github.com/attic-labs/noms/blob/master/doc/go-tour.md). If you're just playing around you could try something like this:
```go
package main
import (
"fmt"
"os"
"github.com/attic-labs/noms/go/spec"
"github.com/attic-labs/noms/go/types"
)
// Usage: quickstart /path/to/store::ds
func main() {
sp, err := spec.ForDataset(os.Args[1])
if err != nil {
fmt.Fprintf(os.Stderr, "Unable to parse spec: %s, error: %s\n", sp, err)
os.Exit(1)
}
defer sp.Close()
db := sp.GetDatabase()
if headValue, ok := sp.GetDataset().MaybeHeadValue(); !ok {
data := types.NewList(sp.GetDatabase(),
newPerson("Rickon", true),
newPerson("Bran", true),
newPerson("Arya", false),
newPerson("Sansa", false),
)
fmt.Fprintf(os.Stdout, "data type: %v\n", types.TypeOf(data).Describe())
_, err = db.CommitValue(sp.GetDataset(), data)
if err != nil {
fmt.Fprintf(os.Stderr, "Error committing: %s\n", err)
os.Exit(1)
}
} else {
// type assertion to convert Head to List
personList := headValue.(types.List)
// type assertion to convert List Value to Struct
personStruct := personList.Get(0).(types.Struct)
// prints: Rickon
fmt.Fprintf(os.Stdout, "given: %v\n", personStruct.Get("given"))
}
}
func newPerson(givenName string, male bool) types.Struct {
return types.NewStruct("Person", types.StructData{
"given": types.String(givenName),
"male": types.Bool(male),
})
}
```
8. You can inspect data that you've committed via the [noms command-line interface](https://github.com/attic-labs/noms/blob/master/doc/cli-tour.md). For example:
```shell
noms log /path/to/store::ds
noms show /path/to/store::ds
```
> Note that Memory tables won't be inspectable because they exist only in the memory of the process that created them.
9. Implement pull and merge. The [pull API](../../go/datas/pull.go) is used to pull changes from a peer and the [merge API](../../go/merge/) is used to merge changes before commit. There's an [example of merging in the IPFS-based-chat sample
app](https://github.com/attic-labs/noms/blob/master/samples/go/ipfs-chat/pubsub.go).
@@ -1,66 +0,0 @@
[Home](../README.md) »
[Technical Overview](intro.md)&nbsp; | &nbsp;[Use Cases](../README.md#use-cases)&nbsp; | &nbsp;[Command-Line Interface](cli-tour.md)&nbsp; | &nbsp;[Go bindings Tour](go-tour.md) | &nbsp;[Path Syntax](spelling.md)&nbsp; | &nbsp;**FAQ**&nbsp;
<br><br>
# Frequently Asked Questions
### Decentralized like BitTorrent?
No, decentralized like Git.
Specifically, Noms isn't itself a peer-to-peer network. If you can get two instances to share data, somehow, then they can synchronize. Noms doesn't define how this should happen though.
Currently, instances mainly share data via either HTTP/DNS or a filesystem. But it should be easy to add other mechanisms. For example, it seems like Noms could run well on top of BitTorrent, or IPFS. You should [look into it](https://github.com/attic-labs/noms/issues/2123).
### Isn't it wasteful to store every version?
Noms deduplicates chunks of data that are identical within one database. So if multiple versions of one dataset share a lot of data, or if the same data is present in multiple datasets, Noms only stores one copy.
That said, it is definitely possible to have write patterns that defeat this. Deduplication is done at the chunk level, and chunks are currently set to an average size of 4KB. So if you change about 1 byte in every 4096 in a single commit, and those changed bytes are well-distributed throughout the dataset, then we will end up making a complete copy of the dataset.
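A content-addressed store deduplicates for free: chunks are keyed by their hash, so storing the same bytes a second time is a no-op. A toy sketch of the idea, with SHA-256 keys standing in for Noms hashes and four-byte strings standing in for ~4KB chunks:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// store keeps exactly one copy of each chunk, keyed by its hash.
type store map[[32]byte][]byte

// put stores a chunk only if its hash is not already present.
func (s store) put(chunk []byte) [32]byte {
	h := sha256.Sum256(chunk)
	if _, ok := s[h]; !ok {
		s[h] = append([]byte(nil), chunk...)
	}
	return h
}

func main() {
	s := store{}
	v1 := []string{"aaaa", "bbbb", "cccc"} // version 1's chunks
	v2 := []string{"aaaa", "bbbb", "dddd"} // version 2: one chunk changed
	for _, c := range append(v1, v2...) {
		s.put([]byte(c))
	}
	fmt.Println(len(s)) // 4: the shared chunks are stored once, not twice
}
```

Two versions of a dataset that differ in one chunk cost one extra chunk of storage, which is why small edits to large datasets stay cheap until the write pattern touches every chunk.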
### Is there a way to not store the entire history?
Theoretically, definitely. In Git, for example, the concept of "shallow clones" exists, and we could do something similar in Noms. This has not been implemented yet.
### How does Noms handle conflicts?
Noms provides several built-in policies that can automatically merge common cases of conflicts. For example concurrent edits to sets are always mergeable and concurrent edits to different keys in a map or struct are also mergeable.
The conflict resolution system is pluggable so new policies that are application-specific can be added. However, it's possible to build surprisingly complex applications with just the built-in policies.
### Why don't you just use CRDTs?
[Convergent (or Commutative) Replicated Data Types (CRDTs)](http://hal.upmc.fr/inria-00555588/document) are a class of distributed data structures that provably converge to some agreed-upon state with no synchronization. Stated differently: CRDTs define a merge policy that is commutative over all their operations.
CRDTs are nice because they require no custom conflict/merge code from the developer.
Noms defines a set of intuitive built-in merge policies for its core datatypes. For example, the default policy makes all operations on Noms Sets commute (add wins in the case of concurrent remove/add). This means that with the default policy, Noms Sets are a CRDT.
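The "Set operations commute" claim can be illustrated with a state-based three-way set merge in plain Go — a sketch of the policy's effect, not the actual Noms merge code: elements kept by both sides or newly added by either side survive, and the result is the same whichever order the two sides are given.

```go
package main

import "fmt"

type set map[string]bool

// merge does a three-way set merge against a common parent: keep an
// element if both sides still have it, or if either side added it.
func merge(parent, a, b set) set {
	out := set{}
	for e := range a {
		if b[e] || !parent[e] { // kept by both, or newly added by a
			out[e] = true
		}
	}
	for e := range b {
		if !parent[e] { // newly added by b
			out[e] = true
		}
	}
	return out
}

func equal(x, y set) bool {
	if len(x) != len(y) {
		return false
	}
	for e := range x {
		if !y[e] {
			return false
		}
	}
	return true
}

func main() {
	parent := set{"x": true, "y": true}
	a := set{"x": true, "z": true}            // removed y, added z
	b := set{"x": true, "y": true, "w": true} // added w
	// Swapping the two sides yields the same merged set.
	fmt.Println(equal(merge(parent, a, b), merge(parent, b, a))) // true
}
```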
If your application uses only operations on Noms datatypes that can be merged with whatever merge policy you are using, then your schema is a CRDT. It's possible to build surprisingly complex applications this way with just the default policy.
Noms also allows you to provide your own custom policy. If your policy commutes, then the resulting datatype will be a CRDT.
However, it would be nice if application developers could more easily opt-in to using only mergeable operations, thereby enforcing that their schema is a CRDT, and providing confidence that custom merge logic doesn't need to be implemented.
More generally, perhaps there could be a way to test that all possible conflict cases have been handled by the developer. This would allow developers to implement their own custom CRDTs. This is something we'd like to research in the future.
### Why don't you support Windows?
We are a tiny team and we all personally use Macs as our development machines, and we use Linux in production. These two platforms are very close to identical, and so we can generally test on Mac and assume it will work on Linux. Adding Windows would add significant complexity to our code and build processes which we're not willing to take on.
### But you'll accept patches for Windows, right?
No, because then we'll have to maintain those patches.
### Are there any workarounds for Windows?
You can use it in a virtual machine. We have also heard Noms works OK with Git Bash or Cygwin, but that's coincidental.
### Why is it called Noms?
1. It's insert-only. OMNOMNOM.
2. It's content addressed. Every value has its own hash, or [name](http://dictionary.reverso.net/french-english/nom).
### Are you sure Noms doesn't stand for something?
Pretty sure. But if you like, you can pretend it stands for Non-Mutable Store.
@@ -1,315 +0,0 @@
[Home](../README.md) »
[Technical Overview](intro.md)&nbsp; | &nbsp;[Use Cases](../README.md#use-cases)&nbsp; | &nbsp;[Command-Line Interface](cli-tour.md)&nbsp; | &nbsp;**Go bindings Tour** | &nbsp;[Path Syntax](spelling.md)&nbsp; | &nbsp;[FAQ](faq.md)&nbsp;
<br><br>
# A Short Tour of Noms for Go
This is a short introduction to using Noms from Go. It should only take a few minutes if you have some familiarity with Go.
During the tour, you can refer to the complete [Go SDK Reference](https://godoc.org/github.com/attic-labs/noms) for more information on anything you see.
## Requirements
* [Noms command-line tools](https://github.com/attic-labs/noms#setup)
* [Go v1.6+](https://golang.org/dl/)
* Ensure your [$GOPATH](https://github.com/golang/go/wiki/GOPATH) is configured
## Start a Local Database
Let's create a local database to play with:
```sh
> mkdir /tmp/noms-go-tour
> noms serve /tmp/noms-go-tour
```
## [Database](https://github.com/attic-labs/noms/blob/master/go/datas/database.go)
Leave the server running, and in a separate terminal:
```sh
> mkdir noms-tour
> cd noms-tour
```
Then use your favorite editor so that we can start to play with code. To get started with Noms, first create a Database:
```go
package main
import (
"fmt"
"os"
"github.com/attic-labs/noms/go/spec"
)
func main() {
sp, err := spec.ForDatabase("http://localhost:8000")
if err != nil {
fmt.Fprintf(os.Stderr, "Could not access database: %s\n", err)
return
}
defer sp.Close()
}
```
Now let's run it:
```sh
> go run noms-tour.go
```
If you did not leave the server running, you will see ```Could not access database``` here; otherwise your program should exit cleanly.
See [Spelling in Noms](https://github.com/attic-labs/noms/blob/master/doc/spelling.md) for more information on database spec strings.
## [Dataset](https://github.com/attic-labs/noms/blob/master/go/dataset/dataset.go)
Datasets are the main interface you'll use to work with Noms. Let's update our example to use a Dataset spec string:
```go
package main
import (
"fmt"
"os"
"github.com/attic-labs/noms/go/spec"
)
func main() {
sp, err := spec.ForDataset("http://localhost:8000::people")
if err != nil {
fmt.Fprintf(os.Stderr, "Could not create dataset: %s\n", err)
return
}
defer sp.Close()
if _, ok := sp.GetDataset().MaybeHeadValue(); !ok {
fmt.Fprintf(os.Stdout, "head is empty\n")
}
}
```
Now let's run it:
```sh
> go run noms-tour.go
head is empty
```
Since the dataset does not yet have any values, you see ```head is empty```. Let's add some data to make it more interesting:
```go
package main
import (
"fmt"
"os"
"github.com/attic-labs/noms/go/spec"
"github.com/attic-labs/noms/go/types"
)
func newPerson(givenName string, male bool) types.Struct {
return types.NewStruct("Person", types.StructData{
"given": types.String(givenName),
"male": types.Bool(male),
})
}
func main() {
sp, err := spec.ForDataset("http://localhost:8000::people")
if err != nil {
fmt.Fprintf(os.Stderr, "Could not create dataset: %s\n", err)
return
}
defer sp.Close()
db := sp.GetDatabase()
data := types.NewList(db,
newPerson("Rickon", true),
newPerson("Bran", true),
newPerson("Arya", false),
newPerson("Sansa", false),
)
fmt.Fprintf(os.Stdout, "data type: %v\n", types.TypeOf(data).Describe())
_, err = db.CommitValue(sp.GetDataset(), data)
if err != nil {
fmt.Fprintf(os.Stderr, "Error committing: %s\n", err)
}
}
```
Now you will get output of the data type of our Dataset value:
```shell
> go run noms-tour.go
data type: List<struct {
given: String
male: Bool
}>
```
Now you can access the data via your program:
```go
package main
import (
"fmt"
"os"
"github.com/attic-labs/noms/go/spec"
"github.com/attic-labs/noms/go/types"
)
func main() {
sp, err := spec.ForDataset("http://localhost:8000::people")
if err != nil {
fmt.Fprintf(os.Stderr, "Could not create dataset: %s\n", err)
return
}
defer sp.Close()
if headValue, ok := sp.GetDataset().MaybeHeadValue(); !ok {
fmt.Fprintf(os.Stdout, "head is empty\n")
} else {
// type assertion to convert Head to List
personList := headValue.(types.List)
// type assertion to convert List Value to Struct
personStruct := personList.Get(0).(types.Struct)
// prints: Rickon
fmt.Fprintf(os.Stdout, "given: %v\n", personStruct.Get("given"))
}
}
```
Running it now:
```sh
> go run noms-tour.go
given: Rickon
```
You can see this data using the command-line too:
```sh
> noms ds http://localhost:8000
people
> noms show http://localhost:8000::people
struct Commit {
meta: struct {},
parents: set {},
value: [ // 4 items
struct Person {
given: "Rickon",
male: true,
},
struct Person {
given: "Bran",
male: true,
},
struct Person {
given: "Arya",
male: false,
},
struct Person {
given: "Sansa",
male: false,
},
],
}
```
Let's add some more data.
```go
package main
import (
"fmt"
"os"
"github.com/attic-labs/noms/go/spec"
"github.com/attic-labs/noms/go/types"
)
func main() {
sp, err := spec.ForDataset("http://localhost:8000::people")
if err != nil {
fmt.Fprintf(os.Stderr, "Could not create dataset: %s\n", err)
return
}
defer sp.Close()
if headValue, ok := sp.GetDataset().MaybeHeadValue(); !ok {
fmt.Fprintf(os.Stdout, "head is empty\n")
} else {
// type assertion to convert Head to List
personList := headValue.(types.List)
personEditor := personList.Edit()
data := personEditor.Append(
types.NewStruct("Person", types.StructData{
"given": types.String("Jon"),
"family": types.String("Snow"),
"male": types.Bool(true),
}),
).List()
fmt.Fprintf(os.Stdout, "data type: %v\n", types.TypeOf(data).Describe())
_, err = sp.GetDatabase().CommitValue(sp.GetDataset(), data)
if err != nil {
fmt.Fprintf(os.Stderr, "Error committing: %s\n", err)
}
}
}
```
Running this:
```sh
> go run noms-tour.go
data type: List<Struct Person {
family?: String,
given: String,
male: Bool,
}>
```
Datasets are versioned. When you *commit* a new value, you aren't overwriting the old value, but adding to a historical log of values:
```sh
> noms log http://localhost:8000::people
commit ba3lvopbgcqqnofm3qk7sk4j2doroj1l
Parent: f0b1befu9jp82r1vcd4gmuhdno27uobi
(root) {
+   struct Person {
+     family: "Snow",
+     given: "Jon",
+     male: true,
+   }
  }
commit f0b1befu9jp82r1vcd4gmuhdno27uobi
Parent: hshltip9kss28uu910qadq04mhk9kuko
commit hshltip9kss28uu910qadq04mhk9kuko
Parent: None
```
## Values
Noms supports a [variety of datatypes](https://github.com/attic-labs/noms/blob/master/doc/intro.md#types) beyond the List, Struct, String, and Bool types used above.
## Samples
You can continue learning more about the Noms Go SDK by looking at the documentation and by reviewing the [samples](https://github.com/attic-labs/noms/blob/master/samples/go). The [hr sample](https://github.com/attic-labs/noms/blob/master/samples/go/hr) is a more complete implementation of our example above and will help you see how the other datatypes are used.
[Home](../README.md) »
**Technical Overview**&nbsp; | &nbsp;[Use Cases](../README.md#use-cases)&nbsp; | &nbsp;[Command-Line Interface](cli-tour.md)&nbsp; | &nbsp;[Go bindings Tour](go-tour.md) | &nbsp;[Path Syntax](spelling.md)&nbsp; | &nbsp;[FAQ](faq.md)&nbsp;
<br><br>
# Noms Technical Overview
Most conventional database systems share two central properties:
1. Data is modeled as a single point-in-time. Once a transaction commits, the previous state of the database is either lost, or available only as a fallback by reconstructing from transaction logs.
2. Data is modeled as a single source of truth. Even large-scale distributed databases, which are internally a fault-tolerant network of nodes, present to clients the abstraction of a single logical master with which they must coordinate in order to change state.
Noms blends the properties of decentralized systems, such as [Git](https://git-scm.com/), with properties of traditional databases in order to create a general-purpose decentralized database, in which:
1. Any peer's state is as valid as any other's.
2. All commits of the database are retained and available at any time.
3. Any peer is free to move forward independently of communication from any other—while retaining the ability to reconcile changes at some point in the future.
4. The basic properties of structured databases (efficient queries, updates, and range scans) are retained.
5. Diffs between any two sets of data can be computed efficiently.
6. Synchronization between disconnected copies of the database can be performed efficiently and correctly.
## Basics
As in Git, [Bitcoin](https://bitcoin.org/en/), [Ethereum](https://www.ethereum.org/), [IPFS](https://ipfs.io/), [Camlistore](https://camlistore.org/), [bup](https://bup.github.io/), and other systems, Noms models data as a [directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) of nodes in which every node has a _hash_. A node's hash is derived from the values encoded in the node and (transitively) from the values encoded in all nodes which are reachable from that node.
In other words, a Noms database is a single large [Merkle DAG](https://github.com/jbenet/random-ideas/issues/20).
When two nodes have the same hash, they represent identical logical values, and the subgraphs of nodes reachable from each are topologically equivalent. Importantly, in Noms, the reverse is also true: a single logical value has one and only one hash. When two nodes have different hashes, they represent different logical values.
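The hash-of-hashes construction is easy to sketch in plain Go. This toy `nodeHash` is not the real Noms encoding (which serializes typed values), but it shows why equal subgraphs always produce equal hashes:

```go
package main

import (
	"crypto/sha512"
	"fmt"
)

// nodeHash computes a toy Merkle hash: a node's hash covers its own
// payload plus the hashes of its children, so the hash of a node is
// derived (transitively) from everything reachable from it.
func nodeHash(payload []byte, children ...[sha512.Size]byte) [sha512.Size]byte {
	h := sha512.New()
	h.Write(payload)
	for _, c := range children {
		h.Write(c[:])
	}
	var out [sha512.Size]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	leafA := nodeHash([]byte("Rickon"))
	leafB := nodeHash([]byte("Bran"))
	root1 := nodeHash([]byte("list"), leafA, leafB)
	// Built independently, but the same logical value...
	root2 := nodeHash([]byte("list"), nodeHash([]byte("Rickon")), leafB)
	fmt.Println(root1 == root2) // ...so the same hash
}
```

Because the hash is a pure function of the reachable content, two peers that build the same value independently end up with bit-identical graphs.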
Noms extends the ideas of prior systems to enable efficiently computing and reconciling differences, synchronizing state, and building indexes over large-scale, structured data.
## Databases and Datasets
A _database_ is the top-level abstraction in Noms.
A database has two responsibilities: it provides storage of [content-addressed](https://en.wikipedia.org/wiki/Content-addressable_storage) chunks of data, and it keeps track of zero or more _datasets_.
A Noms database can be implemented on top of any underlying storage system that provides key/value storage with at least optional optimistic concurrency. We only use optimistic concurrency to store the current value of each dataset. Chunks themselves are immutable.
We have implementations of Noms databases on top of our own file-backed store [Noms Block Store (NBS)](https://github.com/attic-labs/noms/tree/master/go/nbs) (usually used locally), our own [HTTP protocol](https://github.com/attic-labs/noms/blob/master/go/datas/database_server.go) (used for working with a remote database), [Amazon DynamoDB](https://aws.amazon.com/dynamodb/), and [memory](https://github.com/attic-labs/noms/blob/master/go/chunks/memory_store.go) (mainly used for testing).
Here's an example of creating an http-backed database using the [Go Noms SDK](go-tour.md):
```go
package main

import (
	"fmt"
	"os"

	"github.com/attic-labs/noms/go/spec"
)

func main() {
	sp, err := spec.ForDatabase("http://localhost:8000")
	if err != nil {
		fmt.Fprintf(os.Stderr, "Could not access database: %s\n", err)
		return
	}
	defer sp.Close()
}
```
A dataset is nothing more than a named pointer into the DAG. Consider the following command to copy the dataset named `foo` to the dataset named `bar` within a database:
```shell
noms sync http://localhost:8000::foo http://localhost:8000::bar
```
This command is trivial and causes basically zero IO. Noms first resolves the dataset name `foo` in `http://localhost:8000`. This results in a hash. Noms then checks whether that hash exists in the destination database (which in this case is the same as the source database), finds that it does, and then adds a new dataset pointing at that chunk.
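A toy model in plain Go (not the Noms API) makes the zero-IO claim concrete: a database is just a content-addressed chunk store plus a name-to-hash map, so copying a dataset within one database is a single pointer write:

```go
package main

import "fmt"

// A toy model of why copying a dataset is nearly free: a database maps
// dataset names to root hashes, and stores each chunk once, keyed by hash.
type db struct {
	datasets map[string]string // dataset name -> root hash
	chunks   map[string][]byte // hash -> chunk bytes
}

// sync makes dst's dataset point at src's root, copying only chunks dst
// is missing. (The real sync walks the whole DAG; this copies one node.)
func sync(src, dst *db, from, to string) {
	root := src.datasets[from]
	if _, ok := dst.chunks[root]; !ok {
		dst.chunks[root] = src.chunks[root]
	}
	dst.datasets[to] = root
}

func main() {
	d := &db{
		datasets: map[string]string{"foo": "ba3lvopb"},
		chunks:   map[string][]byte{"ba3lvopb": []byte("...")},
	}
	sync(d, d, "foo", "bar") // same database: the chunk already exists
	fmt.Println(d.datasets["bar"] == d.datasets["foo"])
}
```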
Syncs across databases can be efficient by the same logic if the destination database already has all or most of the required chunks.
## Time
All data in Noms is immutable. Once a piece of data is stored, it is never changed. To represent state changes, Noms uses a progression of `Commit` structures.
[TODO - diagram]
As in Git, Commits typically have one _parent_, which is the previous commit in time. But in the cases of merges, a Noms commit can have multiple parents.
### Chunks
When a value is stored in Noms, it is stored as one or more chunks of data. Chunk boundaries are typically created implicitly, as a way to store large collections efficiently (see [Prolly Trees](#prolly-trees-probabilistic-b-trees)). Programmers can also create explicit chunk boundaries using the `Ref` type (see [Types](#types)).
[TODO - Diagram]
Every chunk encodes a single logical value (which may be a component of another value and/or be composed of sub-values). Chunks are [addressed](https://en.wikipedia.org/wiki/Content-addressable_storage) in the Noms persistence layer by the hash of the value they encode.
## Types
Noms is a typed system, meaning that every Noms value is classified into one of the following _types_:
* `Boolean`
* `Number` (arbitrary precision binary)
* `String` (utf8-encoded)
* `Blob` (raw binary data)
* `Set<T>`
* `List<T>`
* `Map<K,V>`
* Unions: `T|U|V|...`
* `Ref<T>` (explicit out-of-line references)
* `Struct` (user-defined record types, e.g., `Struct Person { name: String, age?: Number }`)
* `Type` (A value that stores a Noms type)
Blobs, sets, lists, and maps can be gigantic - Noms will _chunk_ these types into reasonably sized parts internally for efficient storage, searching, and updating (see [Prolly Trees](#prolly-trees-probabilistic-b-trees) below for more on this).
Strings, numbers, unions, and structs are not chunked, and should be used for "reasonably-sized" values. Use `Ref` if you need to force a particular value to be in a different chunk for some reason.
Types serve several purposes in Noms:
1. Most importantly, types make Noms data self-describing. You can use the `types.TypeOf` function on any Noms `Value`, no matter how large, and get a very precise description of the entire value and all values reachable from it. This allows software to interoperate without prior agreement or planning.
2. Users of Noms can define their own structures and publish data that uses them. This allows for ad-hoc standardization of types within communities working on similar data.
3. Types can be used _structurally_. A program can check incoming data against a required type. If the incoming root chunk matches the type, or is a superset of it, then the program can proceed with certainty of the shape of all accessible data. This enables richer interoperability between software, since schemas can be expanded over time as long as a compatible subset remains.
4. Eventually, we plan to add type restrictions to datasets, which would enforce the allowed types that can be committed to a dataset. This would allow something akin to schema validation in traditional databases.
### Refs vs Hashes
A _hash_ in Noms is just like the hashes used elsewhere in computing: a short string of bytes that uniquely identifies a larger value. Every value in Noms has a hash. Noms currently uses the [sha2-512](https://github.com/attic-labs/noms/blob/master/go/hash/hash.go#L7) hash function, but that can change in future versions of the system.
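As a rough sketch of what those hash strings are: judging from the linked `hash.go`, a Noms hash appears to be the first 20 bytes of the sha2-512 digest, rendered in a `0-9a-v` base32 alphabet, which is why ids like `o38hugtf3l1e8rqtj89mijj1dq57eh4m` are 32 characters long. The exact truncation and alphabet here are my reading of the source, not a guarantee:

```go
package main

import (
	"crypto/sha512"
	"encoding/base32"
	"fmt"
)

// nomsEncoding mirrors the 0-9a-v alphabet the hash strings appear to
// use (an assumption based on go/hash/hash.go).
var nomsEncoding = base32.NewEncoding("0123456789abcdefghijklmnopqrstuv")

// hashString renders the first 20 bytes of a sha2-512 digest; 160 bits
// at 5 bits per character yields the 32-character ids seen in noms output.
func hashString(data []byte) string {
	sum := sha512.Sum512(data)
	return nomsEncoding.EncodeToString(sum[:20])
}

func main() {
	fmt.Println(hashString([]byte("hello")), len(hashString([]byte("hello"))))
}
```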
A _ref_ is different in subtle, but important ways. A `Ref` is a part of the type system - a `Ref` is a value. Anywhere you can find a Noms value, you can find a `Ref`. For example, you can commit a `Ref<T>` to a dataset, but you can't commit a bare hash.
The difference is that `Ref` carries the type of its target, along with the hash. This allows us to efficiently validate commits that include `Ref`, among other things.
### Type Accretion
Noms is an immutable database, which leads to the question: How do you change the schema? If I have a dataset containing `Set<Number>`, and I later decide that it should be `Set<String>`, what do I do?
You might say that you just commit the new type, but that would mean that users can't look at a dataset and understand what types previous versions contained, without manually exploring every one of those commits.
We call our solution to this problem _Type Accretion_.
If you construct a `Set` containing only `Number`s, its type will be `Set<Number>`. If you then insert a string into this set, the type of the resulting value is `Set<Number|String>`.
This is usually completely implicit, done based on the data you store (you can set types explicitly though, which is useful in some cases).
We do the same thing for datasets. If you commit a `Set<Number>`, the type of the commit we create for you is:
```go
Struct Commit {
  Value: Set<Number>
  Parents: Set<Ref<Cycle<Commit>>>
}
```
This tells you that the current and all previous commits have values of type `Set<Number>`.
But if you then commit a `Set<String>` to this same dataset, then the type of that commit will be:
```go
Struct Commit {
  Value: Set<String>
  Parents: Set<Ref<Cycle<Commit>> |
    Ref<Struct Commit {
      Value: Set<Number>
      Parents: Cycle<Commit>
    }>>
}
```
This tells you that the dataset's current commit has a value of type `Set<String>` and that previous commits are either the same, or else have a value of type `Set<Number>`.
Type accretion has a number of benefits related to schema changes:
1. You can widen the type of any container (list, set, map) without rewriting any existing data. `Set<Struct { name: String }>` becomes `Set<Struct { name: String } | Struct { name: String, age: Number }>` and all existing data is reused.
2. You can widen containers in ways that other databases wouldn't allow. For example, you can go from `Set<Number>` to `Set<Number|String>`. Existing data is still reused.
3. You can change the type of a dataset in either direction - either widening or narrowing it, and the dataset remains self-documenting as to its current and previous types.
## Prolly Trees: Probabilistic B-Trees
A critical invariant of Noms is that the same value will be represented by the same graph, having the same chunk boundaries, regardless of what past sequence of logical mutations resulted in the value. This is the essence of content-addressing, and it is what makes deduplication, efficient sync, indexing, and other features of Noms possible.
But this invariant also rules out the use of classical B-Trees, because a B-Tree's internal state depends upon its mutation history. In order to model large mutable collections in Noms, of the type where B-Trees would typically be used, Noms instead introduces _Prolly Trees_.
A Prolly Tree is a [search tree](https://en.wikipedia.org/wiki/Search_tree) where the number of values stored in each node is determined probabilistically, based on the data which is stored in the tree.
A Prolly Tree is similar in many ways to a B-Tree, except that the number of values in each node has a probabilistic average rather than an enforced upper and lower bound, and the set of values in each node is determined by the output of a rolling hash function over the values, rather than via split and join operations when upper and lower bounds are exceeded.
### Indexing and Searching with Prolly Trees
Like B-Trees, Prolly Trees are sorted. Keys of type Boolean, Number, and String sort in their natural order. Other types sort by their hash.
Because of this sorting, Noms collections can be used as efficient indexes, in the same manner as primary and secondary indexes in traditional databases.
For example, say you want to quickly be able to find `Person` structs by their age. You could build a map of type `Map<Number, Set<Person>>`. This would allow you to quickly (~log<sub>k</sub>(n) seeks, where `k` is average prolly tree width, which is currently 64) find all the people of an exact age. But it would _also_ allow you to find all people within a range of ages efficiently (~num_results/log<sub>k</sub>(n) seeks), even if the ages are non-integral.
Also, because Noms collections are ordered search trees, it is possible to implement set operations like union and intersect efficiently on them.
So, for example, if you wanted to find all the people of a particular age AND having a particular hair color, you could construct a second map having type `Map<String, Set<Person>>`, and intersect the two sets.
Over time, we plan to develop this basic capability into support for some kind of generalized query system.
[Home](../README.md) »
[Technical Overview](intro.md)&nbsp; | &nbsp;[Use Cases](../README.md#use-cases)&nbsp; | &nbsp;[Command-Line Interface](cli-tour.md)&nbsp; | &nbsp;[Go bindings Tour](go-tour.md) | &nbsp;**Path Syntax**&nbsp; | &nbsp;[FAQ](faq.md)&nbsp;
<br><br>
# Spelling in Noms
Many commands and APIs in Noms accept database, dataset, or value specifications as arguments. This document describes how to construct these specifications.
## Spelling Databases
Database specifications take the form:
```nohighlight
<protocol>[:<path>]
```
The `path` part of the name is interpreted differently depending on the protocol:
- **http(s)** specs describe a remote database to be accessed over HTTP. In this case, the entire database spec is a normal http(s) URL. For example: `https://dev.noms.io/aa`.
- **mem** specs describe an ephemeral memory-backed database. In this case, the path component is not used and must be empty.
- **nbs** specs describe a local [Noms Block Store (NBS)](https://github.com/attic-labs/noms/tree/master/go/nbs)-backed database. In this case, the path component should be a relative or absolute path on disk to a directory in which to store the data, e.g. `nbs:/tmp/noms-data`.
- In Go, `nbs:` can be omitted (just `/tmp/noms-data` will work).
- **aws** specs describe a remote Noms Block Store backed directly by Amazon Web Services, specifically DynamoDB and S3. The format is a URI containing the names of the DynamoDB table to use, the S3 bucket to use, and the database to serve. For example: `aws://dynamo-table:s3-bucket/database`.
## Spelling Datasets
Dataset specifications take the form:
```nohighlight
<database>::<dataset>
```
See [spelling databases](#spelling-databases) for how to build the `database` part of the name. The `dataset` part is just any string matching the regex `^[a-zA-Z0-9\-_/]+$`.
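That regex is easy to check with the standard library; a quick sketch:

```go
package main

import (
	"fmt"
	"regexp"
)

// datasetRe is the dataset-name pattern quoted above: letters, digits,
// hyphens, underscores, and slashes, with at least one character.
var datasetRe = regexp.MustCompile(`^[a-zA-Z0-9\-_/]+$`)

func main() {
	for _, name := range []string{"my-dataset", "sf/films", "bad name!"} {
		fmt.Println(name, datasetRe.MatchString(name))
	}
}
```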
Example datasets:
```nohighlight
/tmp/test-db::my-dataset
nbs:/tmp/test-db::my-dataset
http://localhost:8000::registered-businesses
https://demo.noms.io/aa::music
```
## Spelling Values
Value specifications take the form:
```nohighlight
<database>::<root><path>
```
See [spelling databases](#spelling-databases) for how to build the database part of the name.
The `root` part can be either a hash or a dataset name. If `root` begins with `#` it is interpreted as a hash; otherwise it is used as a dataset name. See [spelling datasets](#spelling-datasets) for how to build the dataset part of the name.
The `path` part is relative to the `root` provided.
### Specifying Struct Fields
Elements of a Noms struct can be referenced using a period `.`.
For example, if the `root` is a dataset, then one can use `.value` to get the root of the data in the dataset. In this case `.value` selects the `value` field from the `Commit` struct at the top of the dataset. One could instead use `.meta` to select the `meta` struct from the `Commit` struct. The `root` does not need to be a dataset though, so if it is a hash that references a struct, the same notation still works: `#o38hugtf3l1e8rqtj89mijj1dq57eh4m.field`.
### Specifying Collection Values
Elements of a Noms list, map, or set can be retrieved using brackets `[...]`.
For example, if the dataset is a Noms map of number to struct then one could use `.value[42]` to get the Noms struct associated with the key 42. Similarly selecting the first element from a Noms list would be `.value[0]`. If the Noms map was keyed by string, then using `.value["0000024-02-999"]` would reference the Noms struct associated with key "0000024-02-999".
Noms lists also support indexing from the back: `.value[-1]` means the last element of a list, `.value[-2]` the second to last, and so on.
If the key of a Noms map or set is a Noms struct or a more complex value, then you can index into the collection using the hash of that value with the same bracket syntax described above, e.g. `http://localhost:8000::dataset.value[#o38hugtf3l1e8rqtj89mijj1dq57eh4m].field`.
Similarly, the key is addressable using `@key` syntax. One use for this is when you have the hash of a complex value but need to retrieve the key (rather than, or in addition to, the value) in a Noms map. The syntax is to append `@key` after the closing bracket of the index specifier, e.g. `http://localhost:8000::dataset.value[#o38hugtf3l1e8rqtj89mijj1dq57eh4m]@key` retrieves the key element specified by the hash `#o38hugtf3l1e8rqtj89mijj1dq57eh4m` from the `dataset.value` collection.
### Specifying Collection Positions
Elements of a Noms list, map, or set can be retrieved _by their position_ using the `@at(index)` annotation.
For lists, this is exactly equivalent to `[index]`. For sets and maps, note that Noms has a stable ordering, so `@at(0)` will always return the smallest element, `@at(1)` the 2nd smallest, and so on. `@at(-1)` will return the largest. For maps, adding the `@key` annotation will retrieve the key of the map entry instead of the value.
### Examples
```sh
# “sf-registered-business” dataset at https://demo.noms.io/cli-tour
https://demo.noms.io/cli-tour::sf-registered-business
# value o38hugtf3l1e8rqtj89mijj1dq57eh4m in the monkey database at https://localhost:8000
https://localhost:8000/monkey::#o38hugtf3l1e8rqtj89mijj1dq57eh4m
# “bonk” dataset at /foo/bar
/foo/bar::bonk
# from https://demo.noms.io/cli-tour, select the "sf-registered-business" dataset,
# the root value is a Noms map, select the value of the Noms map identified by string
# key "0000024-02-999", then from that resulting struct select the Ownership_Name field
https://demo.noms.io/cli-tour::sf-registered-business.value["0000024-02-999"].Ownership_Name
```
Be careful with shell escaping. Your shell might require escaping of the double quotes and other characters or use single quotes around the entire command line argument. e.g.:
```sh
> noms show https://demo.noms.io/cli-tour::sf-registered-business.value["0000024-02-999"].Ownership_Name
error: Invalid index: 0000024-02-999
> noms show https://demo.noms.io/cli-tour::sf-registered-business.value[\"0000024-02-999\"].Ownership_Name
"EASTMAN KODAK CO"
> noms show 'https://demo.noms.io/cli-tour::sf-registered-business.value["0000024-02-999"].Ownership_Name'
"EASTMAN KODAK CO"
```
# Default database URL to be used whenever a database is not explicitly provided
[db.default]
url = "ldb:.noms/tour" # This path is relative to the location of .nomsconfig
# DB alias named `origin` that refers to the remote cli-tour db
[db.origin]
url = "http://demo.noms.io/cli-tour"
# DB alias named `temp` that refers to a noms db stored under /tmp
[db.temp]
url = "ldb:/tmp/noms/shared"
# DB alias named `http` that refers to the local http db
[db.http]
url = "http://localhost:8000"
-75
View File
@@ -1,75 +0,0 @@
# nomsconfig
The noms cli now provides experimental support for configuring a convenient default database and database aliases.
You can enable this support by placing a *.nomsconfig* config file (like the [one](.nomsconfig) in this sample) in the directory where you'd like to use the configuration. Like git, any noms command issued from that directory or below will use it.
# Features
- *Database Aliases* - Define simple names to be used in place of database URLs
- *Default Database* - Define one database to be used by default when no database is mentioned
- *Dot (`.`) Shorthand* - Use `.` instead of repeating dataset/object name in destination
# Example
This example defines a simple [.nomsconfig](.nomsconfig) to try:
```shell
# Default database URL to be used whenever a database is not explicitly provided
[db.default]
url = "ldb:.noms/tour"
# DB alias named `origin` that refers to the remote cli-tour db
[db.origin]
url = "http://demo.noms.io/cli-tour"
# DB alias named `temp` that refers to a noms db stored under /tmp
[db.temp]
url = "ldb:/tmp/noms/shared"
```
The *[db.default]* section:
- Defines a default database
- It will be used implicitly whenever a database url is omitted in a command
The *[db.origin]* and *[db.temp]* sections:
- Define aliases that can be used wherever a db url is required
- You can define additional aliases by adding *[db.**alias**]* sections using any **alias** you prefer
Dot (`.`) shorthand:
- When issuing a command that requires a source and destination (like `noms sync`),
you can use `.` in place of the dataset/object in the destination. This is shorthand
that repeats whatever was used in the source (see below).
You can kick the tires by running noms commands from this directory. Here are some examples and what to expect:
```shell
noms ds # -> noms ds ldb:.noms/tour
noms ds default # -> noms ds ldb:.noms/tour
noms ds origin # -> noms ds http://demo.noms.io/cli-tour
noms sync origin::sf-film-locations sf-films # sync ds from origin to default
noms log sf-films # -> noms log ldb:.noms/tour::sf-films
noms log origin::sf-film-locations # -> noms log http://demo.noms.io/cli-tour::sf-film-locations
noms show '#1a2aj8svslsu7g8hplsva6oq6iq3ib6c' # -> noms show ldb:.noms/tour::'#1a2a...'
noms show origin::'#1a2aj8svslsu7g8hplsva6oq6iq3ib6c' # -> noms show http://demo.noms.io/cli-tour::'#1a2a...'
noms diff '#1a2aj8svslsu7g8hplsva6oq6iq3ib6c' origin::. # diff default::object with origin::object
noms sync origin::sf-bike-parking . # sync origin::sf-bike-parking to default::sf-bike-parking
```
A few more things to note:
- Relative paths will be expanded relative to the directory where the *.nomsconfig* is defined
- Use `noms config` to see the current alias definitions with expanded paths
- Use `-v` or `--verbose` on any command to see how the command arguments are being resolved
- Explicit DB urls are still fully supported
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0

package main

import (
	"fmt"
	"os"

	"github.com/attic-labs/noms/go/config"
	"github.com/attic-labs/noms/go/types"
	"github.com/attic-labs/noms/go/util/verbose"
	flag "github.com/juju/gnuflag"
)

func main() {
	flag.Usage = func() {
		fmt.Fprintf(os.Stderr, "usage: %s [options] <dataset>\n", os.Args[0])
		flag.PrintDefaults()
	}
	verbose.RegisterVerboseFlags(flag.CommandLine)
	flag.Parse(true)

	if flag.NArg() != 1 {
		fmt.Fprintln(os.Stderr, "Missing required dataset argument")
		return
	}

	cfg := config.NewResolver()
	db, ds, err := cfg.GetDataset(flag.Arg(0))
	if err != nil {
		fmt.Fprintf(os.Stderr, "Could not create dataset: %s\n", err)
		return
	}
	defer db.Close()

	newVal := uint64(1)
	if lastVal, ok := ds.MaybeHeadValue(); ok {
		newVal = uint64(lastVal.(types.Float)) + 1
	}

	_, err = db.CommitValue(ds, types.Float(newVal))
	if err != nil {
		fmt.Fprintf(os.Stderr, "Error committing: %s\n", err)
		return
	}

	fmt.Println(newVal)
}
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0

package main

import (
	"testing"

	"github.com/attic-labs/noms/go/spec"
	"github.com/attic-labs/noms/go/util/clienttest"
	"github.com/stretchr/testify/suite"
)

func TestCounter(t *testing.T) {
	suite.Run(t, &counterTestSuite{})
}

type counterTestSuite struct {
	clienttest.ClientTestSuite
}

func (s *counterTestSuite) TestCounter() {
	spec := spec.CreateValueSpecString("nbs", s.DBDir, "counter")
	args := []string{spec}

	stdout, stderr := s.MustRun(main, args)
	s.Equal("1\n", stdout)
	s.Equal("", stderr)

	stdout, stderr = s.MustRun(main, args)
	s.Equal("2\n", stdout)
	s.Equal("", stderr)

	stdout, stderr = s.MustRun(main, args)
	s.Equal("3\n", stdout)
	s.Equal("", stderr)
}
# CSV Importer
Imports a CSV file as `List<T>` where `T` is a struct with fields corresponding to the CSV's column headers. The struct spec can also be set manually with the `-header` flag.
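The first step of any importer like this is pairing records with the header row. A dependency-free sketch using `encoding/csv` (this is illustrative, not the sample's actual code, which goes on to build typed Noms structs):

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readRows reads the header row, then pairs every subsequent record
// with the column names - the same mapping the importer uses to turn
// columns into struct fields.
func readRows(data string) ([]map[string]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	header, err := r.Read()
	if err != nil {
		return nil, err
	}
	var rows []map[string]string
	for {
		rec, err := r.Read()
		if err != nil {
			break // io.EOF ends the loop; production code would inspect err
		}
		row := map[string]string{}
		for i, name := range header {
			row[name] = rec[i]
		}
		rows = append(rows, row)
	}
	return rows, nil
}

func main() {
	rows, _ := readRows("given,male\nRickon,true\nArya,false\n")
	fmt.Println(len(rows), rows[0]["given"])
}
```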
## Usage
```shell
$ cd csv-import
$ go build
$ ./csv-import <PATH> http://localhost:8000::foo
```
## Some places for CSV files
- https://data.cityofnewyork.us/api/views/kku6-nxdu/rows.csv?accessType=DOWNLOAD
- http://www.opendatacache.com/
# CSV Exporter
Export a dataset in CSV format to stdout with column headers.
## Usage
```shell
$ cd csv-export
$ go build
$ ./csv-export http://localhost:8000::foo
```
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0

package csv

import (
	"fmt"
	"unicode/utf8"
)

// StringToRune returns the rune contained in delimiter or an error.
func StringToRune(delimiter string) (rune, error) {
	dlimLen := len(delimiter)
	if dlimLen == 0 {
		return 0, fmt.Errorf("delimiter flag must contain exactly one character (rune), not an empty string")
	}

	d, runeSize := utf8.DecodeRuneInString(delimiter)
	if d == utf8.RuneError {
		return 0, fmt.Errorf("Invalid utf8 string in delimiter flag: %s", delimiter)
	}
	if dlimLen != runeSize {
		return 0, fmt.Errorf("delimiter flag is too long. It must contain exactly one character (rune), but instead it is: %s", delimiter)
	}

	return d, nil
}
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0

package main

import (
	"fmt"
	"io"
	"os"
	"strings"

	"github.com/attic-labs/noms/go/d"
	"github.com/attic-labs/noms/go/types"
	"github.com/attic-labs/noms/go/util/profile"
	"github.com/attic-labs/noms/samples/go/csv"
	flag "github.com/juju/gnuflag"
)

func main() {
	// Actually the delimiter uses runes, which can be multiple characters long.
	// https://blog.golang.org/strings
	delimiter := flag.String("delimiter", ",", "field delimiter for csv file, must be exactly one character long.")
	header := flag.String("header", "", "header row. If empty, we'll use the first row of the file")
	skipRecords := flag.Uint("skip-records", 0, "number of records to skip at beginning of file")
	detectColumnTypes := flag.Bool("detect-column-types", false, "detect column types by analyzing a portion of csv file")
	detectPrimaryKeys := flag.Bool("detect-pk", false, "detect primary key candidates by analyzing a portion of csv file")
	numSamples := flag.Int("num-samples", 1000000, "number of records to use for samples")
	numFieldsInPK := flag.Int("num-fields-pk", 3, "maximum number of columns to consider when detecting PKs")

	profile.RegisterProfileFlags(flag.CommandLine)

	flag.Usage = func() {
		fmt.Fprintf(os.Stderr, "Usage: csv-analyze [options] <csvfile>\n\n")
		flag.PrintDefaults()
	}
	flag.Parse(true)

	if flag.NArg() != 1 {
		flag.Usage()
		return
	}

	defer profile.MaybeStartProfile().Stop()

	var r io.Reader
	var filePath string

	filePath = flag.Arg(0)
	res, err := os.Open(filePath)
	d.CheckError(err)
	defer res.Close()
	r = res

	comma, err := csv.StringToRune(*delimiter)
	d.CheckErrorNoUsage(err)

	cr := csv.NewCSVReader(r, comma)
	csv.SkipRecords(cr, *skipRecords)

	var headers []string
	if *header == "" {
		headers, err = cr.Read()
		d.PanicIfError(err)
	} else {
		headers = strings.Split(*header, string(comma))
	}

	kinds := []types.NomsKind{}
	if *detectColumnTypes {
		kinds = csv.GetSchema(cr, *numSamples, len(headers))
		fmt.Fprintf(os.Stdout, "%s\n", strings.Join(csv.KindsToStrings(kinds), ","))
	}

	if *detectPrimaryKeys {
		pks := csv.FindPrimaryKeys(cr, *numSamples, *numFieldsInPK, len(headers))
		for _, pk := range pks {
			fmt.Fprintf(os.Stdout, "%s\n", strings.Join(csv.GetFieldNamesFromIndices(headers, pk), ","))
		}
	}
}
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0

package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"os"
	"testing"

	"github.com/attic-labs/noms/go/d"
	"github.com/attic-labs/noms/go/util/clienttest"
	"github.com/stretchr/testify/suite"
)

func TestCSVAnalyze(t *testing.T) {
	suite.Run(t, &csvAnalyzeTestSuite{})
}

type csvAnalyzeTestSuite struct {
	clienttest.ClientTestSuite
	tmpFileName string
}

func writeSampleData(w io.Writer) {
	_, err := io.WriteString(w, "Date,Time,Temperature\n")
	d.Chk.NoError(err)
	// 30 samples of String,String,Number
	for i := 0; i < 30; i++ {
		_, err = io.WriteString(w, fmt.Sprintf("08/14/2016,12:%d,73.4%d\n", i, i))
		d.Chk.NoError(err)
	}
	// an extra sample of String,String,String so that type detection finds Number only with smaller samples
	_, err = io.WriteString(w, "08/14/2016,13:01,none\n")
	d.Chk.NoError(err)
	// an extra sample with a duplicate Date,Temperature pair so that pk detection rules it out (with smaller samples)
	_, err = io.WriteString(w, "08/14/2016,13:02,73.42\n")
	d.Chk.NoError(err)
}

func (s *csvAnalyzeTestSuite) SetupTest() {
	input, err := ioutil.TempFile(s.TempDir, "")
	d.Chk.NoError(err)
	s.tmpFileName = input.Name()
	writeSampleData(input)
	defer input.Close()
}

func (s *csvAnalyzeTestSuite) TearDownTest() {
	os.Remove(s.tmpFileName)
}

func (s *csvAnalyzeTestSuite) TestCSVAnalyzeDetectColumnTypes() {
	stdout, stderr := s.MustRun(main, []string{"--detect-column-types=1", s.tmpFileName})
	s.Equal("String,String,String\n", stdout)
	s.Equal("", stderr)
}

func (s *csvAnalyzeTestSuite) TestCSVAnalyzeDetectColumnTypesSamples20() {
	stdout, stderr := s.MustRun(main, []string{"--detect-column-types=1", "--num-samples=20", s.tmpFileName})
	s.Equal("String,String,Number\n", stdout)
	s.Equal("", stderr)
}

func (s *csvAnalyzeTestSuite) TestCSVAnalyzeDetectPrimaryKeys() {
	stdout, stderr := s.MustRun(main, []string{"--detect-pk=1", s.tmpFileName})
	s.Equal("Time\nDate,Time\nTime,Temperature\nDate,Time,Temperature\n", stdout)
	s.Equal("", stderr)
}

func (s *csvAnalyzeTestSuite) TestCSVAnalyzeDetectPrimaryKeysSamples20() {
	stdout, stderr := s.MustRun(main, []string{"--detect-pk=1", "--num-samples=20", s.tmpFileName})
	s.Equal("Time\nTemperature\nDate,Time\nDate,Temperature\nTime,Temperature\nDate,Time,Temperature\n", stdout)
	s.Equal("", stderr)
}

func (s *csvAnalyzeTestSuite) TestCSVAnalyzeDetectPrimaryKeysSingleField() {
	stdout, stderr := s.MustRun(main, []string{"--detect-pk=1", "--num-fields-pk=1", s.tmpFileName})
	s.Equal("Time\n", stdout)
	s.Equal("", stderr)
}

func (s *csvAnalyzeTestSuite) TestCSVAnalyzeDetectPrimaryKeysTwoFields() {
	stdout, stderr := s.MustRun(main, []string{"--detect-pk=1", "--num-fields-pk=2", s.tmpFileName})
	s.Equal("Time\nDate,Time\nTime,Temperature\n", stdout)
	s.Equal("", stderr)
}
@@ -1 +0,0 @@
csv-export
@@ -1,67 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"errors"
"fmt"
"os"
"github.com/attic-labs/noms/go/config"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/types"
"github.com/attic-labs/noms/go/util/profile"
"github.com/attic-labs/noms/go/util/verbose"
"github.com/attic-labs/noms/samples/go/csv"
flag "github.com/juju/gnuflag"
)
func main() {
// The delimiter is parsed as a single rune, which may be more than one byte long in UTF-8.
// https://blog.golang.org/strings
delimiter := flag.String("delimiter", ",", "field delimiter for csv file, must be exactly one character long.")
verbose.RegisterVerboseFlags(flag.CommandLine)
profile.RegisterProfileFlags(flag.CommandLine)
flag.Usage = func() {
fmt.Fprintln(os.Stderr, "Usage: csv-export [options] dataset > filename")
flag.PrintDefaults()
}
flag.Parse(true)
if flag.NArg() != 1 {
d.CheckError(errors.New("expected dataset arg"))
}
cfg := config.NewResolver()
db, ds, err := cfg.GetDataset(flag.Arg(0))
d.CheckError(err)
defer db.Close()
comma, err := csv.StringToRune(*delimiter)
d.CheckError(err)
err = d.Try(func() {
defer profile.MaybeStartProfile().Stop()
hv := ds.HeadValue()
if l, ok := hv.(types.List); ok {
structDesc := csv.GetListElemDesc(l, db)
csv.WriteList(l, structDesc, comma, os.Stdout)
} else if m, ok := hv.(types.Map); ok {
structDesc := csv.GetMapElemDesc(m, db)
csv.WriteMap(m, structDesc, comma, os.Stdout)
} else {
panic(fmt.Sprintf("Expected ListKind or MapKind, found %s", hv.Kind()))
}
})
if err != nil {
fmt.Println("Failed to export dataset as CSV:")
fmt.Println(err)
}
}
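The export path above hands each row to a CSV writer configured with the parsed delimiter rune. A stdlib-only sketch of that final step, with placeholder data rather than noms values:

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// writeRows mirrors the delimiter handling above: encoding/csv's Writer
// takes the field separator as a single rune via its Comma field.
func writeRows(comma rune, rows [][]string) string {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	w.Comma = comma
	w.WriteAll(rows) // WriteAll flushes internally
	return buf.String()
}

func main() {
	fmt.Print(writeRows('|', [][]string{{"a", "b"}, {"1", "2"}}))
}
```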
@@ -1,126 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"encoding/csv"
"io"
"strings"
"testing"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/nbs"
"github.com/attic-labs/noms/go/spec"
"github.com/attic-labs/noms/go/types"
"github.com/attic-labs/noms/go/util/clienttest"
"github.com/stretchr/testify/suite"
)
func TestCSVExporter(t *testing.T) {
suite.Run(t, &testSuite{})
}
type testSuite struct {
clienttest.ClientTestSuite
header []string
payload [][]string
}
func (s *testSuite) createTestData(buildAsMap bool) []types.Value {
s.header = []string{"a", "b", "c"}
structName := "SomeStruct"
s.payload = [][]string{
{"4", "10", "255"},
{"5", "7", "100"},
{"512", "12", "55"},
}
sliceLen := len(s.payload)
if buildAsMap {
sliceLen *= 2
}
structs := make([]types.Value, sliceLen)
for i, row := range s.payload {
fields := make(types.ValueSlice, len(s.header))
for j, v := range row {
fields[j] = types.String(v)
}
if buildAsMap {
structs[i*2] = fields[0]
structs[i*2+1] = types.NewStruct(structName, types.StructData{
"a": fields[0],
"b": fields[1],
"c": fields[2],
})
} else {
structs[i] = types.NewStruct(structName, types.StructData{
"a": fields[0],
"b": fields[1],
"c": fields[2],
})
}
}
return structs
}
func verifyOutput(s *testSuite, stdout string) {
csvReader := csv.NewReader(strings.NewReader(stdout))
row, err := csvReader.Read()
d.Chk.NoError(err)
s.Equal(s.header, row)
for i := 0; i < len(s.payload); i++ {
row, err := csvReader.Read()
d.Chk.NoError(err)
s.Equal(s.payload[i], row)
}
_, err = csvReader.Read()
s.Equal(io.EOF, err)
}
// FIXME: run with pipe
func (s *testSuite) TestCSVExportFromList() {
setName := "csvlist"
// Setup data store
db := datas.NewDatabase(nbs.NewLocalStore(s.DBDir, clienttest.DefaultMemTableSize))
ds := db.GetDataset(setName)
// Build data rows
structs := s.createTestData(false)
db.CommitValue(ds, types.NewList(db, structs...))
db.Close()
// Run exporter
dataspec := spec.CreateValueSpecString("nbs", s.DBDir, setName)
stdout, stderr := s.MustRun(main, []string{dataspec})
s.Equal("", stderr)
verifyOutput(s, stdout)
}
func (s *testSuite) TestCSVExportFromMap() {
setName := "csvmap"
// Setup data store
db := datas.NewDatabase(nbs.NewLocalStore(s.DBDir, clienttest.DefaultMemTableSize))
ds := db.GetDataset(setName)
// Build data rows
structs := s.createTestData(true)
db.CommitValue(ds, types.NewMap(db, structs...))
db.Close()
// Run exporter
dataspec := spec.CreateValueSpecString("nbs", s.DBDir, setName)
stdout, stderr := s.MustRun(main, []string{dataspec})
s.Equal("", stderr)
verifyOutput(s, stdout)
}
@@ -1 +0,0 @@
csv-import
@@ -1,258 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"errors"
"fmt"
"io"
"math"
"os"
"strings"
"time"
"github.com/attic-labs/noms/go/config"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/spec"
"github.com/attic-labs/noms/go/types"
"github.com/attic-labs/noms/go/util/profile"
"github.com/attic-labs/noms/go/util/progressreader"
"github.com/attic-labs/noms/go/util/status"
"github.com/attic-labs/noms/go/util/verbose"
"github.com/attic-labs/noms/samples/go/csv"
humanize "github.com/dustin/go-humanize"
flag "github.com/juju/gnuflag"
)
const (
destList = iota
destMap = iota
)
func main() {
// The delimiter is parsed as a single rune, which may be more than one byte long in UTF-8.
// https://blog.golang.org/strings
delimiter := flag.String("delimiter", ",", "field delimiter for csv file, must be exactly one character long.")
header := flag.String("header", "", "header row. If empty, we'll use the first row of the file")
lowercase := flag.Bool("lowercase", false, "convert column names to lowercase (otherwise preserve the case in the resulting struct fields)")
name := flag.String("name", "Row", "struct name. The user-visible name to give to the struct type that will hold each row of data.")
columnTypes := flag.String("column-types", "", "a comma-separated list of types representing the desired type of each column. if absent all types default to be String")
pathDescription := "noms path to blob to import"
path := flag.String("path", "", pathDescription)
flag.StringVar(path, "p", "", pathDescription)
noProgress := flag.Bool("no-progress", false, "prevents progress from being output if true")
destType := flag.String("dest-type", "list", "the destination type to import to. can be 'list' or 'map:<pk>', where <pk> is a list of comma-delimited column headers or indexes (0-based) used to uniquely identify a row")
skipRecords := flag.Uint("skip-records", 0, "number of records to skip at beginning of file")
limit := flag.Uint64("limit-records", math.MaxUint64, "maximum number of records to process")
performCommit := flag.Bool("commit", true, "commit the data to head of the dataset (otherwise only write the data to the dataset)")
append := flag.Bool("append", false, "append new data to list at head of specified dataset.")
invert := flag.Bool("invert", false, "import rows in column major format rather than row major")
spec.RegisterCommitMetaFlags(flag.CommandLine)
verbose.RegisterVerboseFlags(flag.CommandLine)
profile.RegisterProfileFlags(flag.CommandLine)
flag.Usage = func() {
fmt.Fprintf(os.Stderr, "Usage: csv-import [options] <csvfile> <dataset>\n\n")
flag.PrintDefaults()
}
flag.Parse(true)
var err error
switch {
case flag.NArg() == 0:
err = errors.New("Maybe you put options after the dataset?")
case flag.NArg() == 1 && *path == "":
err = errors.New("If <csvfile> isn't specified, you must specify a noms path with -p")
case flag.NArg() == 2 && *path != "":
err = errors.New("Cannot specify both <csvfile> and a noms path with -p")
case flag.NArg() > 2:
err = errors.New("Too many arguments")
case strings.HasPrefix(*destType, "map") && *append:
err = errors.New("--append is only compatible with list imports")
case strings.HasPrefix(*destType, "map") && *invert:
err = errors.New("--invert is only compatible with list imports")
}
d.CheckError(err)
defer profile.MaybeStartProfile().Stop()
var r io.Reader
var size uint64
var filePath string
var dataSetArgN int
cfg := config.NewResolver()
if *path != "" {
db, val, err := cfg.GetPath(*path)
d.CheckError(err)
if val == nil {
d.CheckError(fmt.Errorf("Path %s not found\n", *path))
}
blob, ok := val.(types.Blob)
if !ok {
d.CheckError(fmt.Errorf("Path %s not a Blob: %s\n", *path, types.EncodedValue(types.TypeOf(val))))
}
defer db.Close()
preader, pwriter := io.Pipe()
go func() {
blob.Copy(pwriter)
pwriter.Close()
}()
r = preader
size = blob.Len()
dataSetArgN = 0
} else {
filePath = flag.Arg(0)
res, err := os.Open(filePath)
d.CheckError(err)
defer res.Close()
fi, err := res.Stat()
d.CheckError(err)
r = res
size = uint64(fi.Size())
dataSetArgN = 1
}
if !*noProgress {
r = progressreader.New(r, getStatusPrinter(size))
}
delim, err := csv.StringToRune(*delimiter)
d.CheckErrorNoUsage(err)
var dest int
var strPks []string
if *destType == "list" {
dest = destList
} else if strings.HasPrefix(*destType, "map:") {
dest = destMap
strPks = strings.Split(strings.TrimPrefix(*destType, "map:"), ",")
if len(strPks) == 0 {
fmt.Println("Invalid dest-type map:", *destType)
return
}
} else {
fmt.Println("Invalid dest-type:", *destType)
return
}
cr := csv.NewCSVReader(r, delim)
err = csv.SkipRecords(cr, *skipRecords)
if err == io.EOF {
err = fmt.Errorf("skip-records skipped past EOF")
}
d.CheckErrorNoUsage(err)
var headers []string
if *header == "" {
headers, err = cr.Read()
d.PanicIfError(err)
} else {
headers = strings.Split(*header, ",")
}
if *lowercase {
for i := range headers {
headers[i] = strings.ToLower(headers[i])
}
}
uniqueHeaders := make(map[string]bool)
for _, header := range headers {
uniqueHeaders[header] = true
}
if len(uniqueHeaders) != len(headers) {
d.CheckErrorNoUsage(fmt.Errorf("Invalid headers specified, headers must be unique"))
}
kinds := []types.NomsKind{}
if *columnTypes != "" {
kinds = csv.StringsToKinds(strings.Split(*columnTypes, ","))
if len(kinds) != len(uniqueHeaders) {
d.CheckErrorNoUsage(fmt.Errorf("Invalid column-types specified, column types do not correspond to number of headers"))
}
}
db, ds, err := cfg.GetDataset(flag.Arg(dataSetArgN))
d.CheckError(err)
defer db.Close()
var value types.Value
if dest == destMap {
value = csv.ReadToMap(cr, *name, headers, strPks, kinds, db, *limit)
} else if *invert {
value = csv.ReadToColumnar(cr, *name, headers, kinds, db, *limit)
} else {
value = csv.ReadToList(cr, *name, headers, kinds, db, *limit)
}
if *performCommit {
meta, err := spec.CreateCommitMetaStruct(ds.Database(), "", "", additionalMetaInfo(filePath, *path), nil)
d.CheckErrorNoUsage(err)
if *append {
if headVal, present := ds.MaybeHeadValue(); present {
switch headVal.Kind() {
case types.ListKind:
l, isList := headVal.(types.List)
d.PanicIfFalse(isList)
ref := db.WriteValue(value)
value = l.Concat(ref.TargetValue(db).(types.List))
case types.StructKind:
hstr, isStruct := headVal.(types.Struct)
d.PanicIfFalse(isStruct)
d.PanicIfFalse(hstr.Name() == "Columnar")
str := value.(types.Struct)
hstr.IterFields(func(fieldname string, v types.Value) {
hl := v.(types.Ref).TargetValue(db).(types.List)
nl := str.Get(fieldname).(types.Ref).TargetValue(db).(types.List)
l := hl.Concat(nl)
r := db.WriteValue(l)
str = str.Set(fieldname, r)
})
value = str
default:
d.Panic("append can only be used with list or columnar")
}
}
}
_, err = db.Commit(ds, value, datas.CommitOptions{Meta: meta})
if !*noProgress {
status.Clear()
}
d.PanicIfError(err)
} else {
ref := db.WriteValue(value)
if !*noProgress {
status.Clear()
}
fmt.Fprintf(os.Stdout, "#%s\n", ref.TargetHash().String())
}
}
func additionalMetaInfo(filePath, nomsPath string) map[string]string {
fileOrNomsPath := "inputPath"
path := nomsPath
if path == "" {
path = filePath
fileOrNomsPath = "inputFile"
}
return map[string]string{fileOrNomsPath: path}
}
func getStatusPrinter(expected uint64) progressreader.Callback {
startTime := time.Now()
return func(seen uint64) {
percent := float64(seen) / float64(expected) * 100
elapsed := time.Since(startTime)
rate := float64(seen) / elapsed.Seconds()
status.Printf("%.2f%% of %s (%s/s)...",
percent,
humanize.Bytes(expected),
humanize.Bytes(uint64(rate)))
}
}
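The -delimiter flag must encode exactly one rune, which may span multiple UTF-8 bytes. A sketch of the validation `csv.StringToRune` presumably performs (an assumption about its behavior, not the original code):

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// stringToRune accepts a string that encodes exactly one rune (possibly
// multiple bytes in UTF-8) and returns that rune.
func stringToRune(s string) (rune, error) {
	if utf8.RuneCountInString(s) != 1 {
		return 0, errors.New("delimiter must be exactly one character")
	}
	r, _ := utf8.DecodeRuneInString(s)
	return r, nil
}

func main() {
	r, err := stringToRune("§")
	fmt.Println(r, err)
}
```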
@@ -1,478 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"bytes"
"fmt"
"io"
"io/ioutil"
"os"
"testing"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/nbs"
"github.com/attic-labs/noms/go/spec"
"github.com/attic-labs/noms/go/types"
"github.com/attic-labs/noms/go/util/clienttest"
"github.com/stretchr/testify/suite"
)
const (
TEST_DATA_SIZE = 100
TEST_YEAR = 2012
TEST_FIELDS = "Float,String,Float,Float"
)
func TestCSVImporter(t *testing.T) {
suite.Run(t, &testSuite{})
}
type testSuite struct {
clienttest.ClientTestSuite
tmpFileName string
}
func (s *testSuite) SetupTest() {
input, err := ioutil.TempFile(s.TempDir, "")
d.Chk.NoError(err)
defer input.Close()
s.tmpFileName = input.Name()
writeCSV(input)
}
func (s *testSuite) TearDownTest() {
os.Remove(s.tmpFileName)
}
func writeCSV(w io.Writer) {
writeCSVWithHeader(w, "year,a,b,c\n", 0)
}
func writeCSVWithHeader(w io.Writer, header string, startingValue int) {
_, err := io.WriteString(w, header)
d.Chk.NoError(err)
for i := 0; i < TEST_DATA_SIZE; i++ {
j := i + startingValue
_, err = io.WriteString(w, fmt.Sprintf("%d,a%d,%d,%d\n", TEST_YEAR+j%3, j, j, j*2))
d.Chk.NoError(err)
}
}
func (s *testSuite) validateList(l types.List) {
s.Equal(uint64(TEST_DATA_SIZE), l.Len())
i := uint64(0)
l.IterAll(func(v types.Value, j uint64) {
s.Equal(i, j)
st := v.(types.Struct)
s.Equal(types.Float(TEST_YEAR+i%3), st.Get("year"))
s.Equal(types.String(fmt.Sprintf("a%d", i)), st.Get("a"))
s.Equal(types.Float(i), st.Get("b"))
s.Equal(types.Float(i*2), st.Get("c"))
i++
})
}
func (s *testSuite) validateMap(vrw types.ValueReadWriter, m types.Map) {
// --dest-type=map:1 so key is field "a"
s.Equal(uint64(TEST_DATA_SIZE), m.Len())
for i := 0; i < TEST_DATA_SIZE; i++ {
v := m.Get(types.String(fmt.Sprintf("a%d", i))).(types.Struct)
s.True(v.Equals(
types.NewStruct("Row", types.StructData{
"year": types.Float(TEST_YEAR + i%3),
"a": types.String(fmt.Sprintf("a%d", i)),
"b": types.Float(i),
"c": types.Float(i * 2),
})))
}
}
func (s *testSuite) validateNestedMap(vrw types.ValueReadWriter, m types.Map) {
// --dest-type=map:0,1 so keys are fields "year", then field "a"
s.Equal(uint64(3), m.Len())
for i := 0; i < TEST_DATA_SIZE; i++ {
n := m.Get(types.Float(TEST_YEAR + i%3)).(types.Map)
o := n.Get(types.String(fmt.Sprintf("a%d", i))).(types.Struct)
s.True(o.Equals(types.NewStruct("Row", types.StructData{
"year": types.Float(TEST_YEAR + i%3),
"a": types.String(fmt.Sprintf("a%d", i)),
"b": types.Float(i),
"c": types.Float(i * 2),
})))
}
}
func (s *testSuite) validateColumnar(vrw types.ValueReadWriter, str types.Struct, reps int) {
s.Equal("Columnar", str.Name())
lists := map[string]types.List{}
for _, nm := range []string{"year", "a", "b", "c"} {
l := str.Get(nm).(types.Ref).TargetValue(vrw).(types.List)
s.Equal(uint64(reps*TEST_DATA_SIZE), l.Len())
lists[nm] = l
}
for i := 0; i < reps*TEST_DATA_SIZE; i++ {
s.Equal(types.Float(TEST_YEAR+i%3), lists["year"].Get(uint64(i)))
s.Equal(types.String(fmt.Sprintf("a%d", i)), lists["a"].Get(uint64(i)))
s.Equal(types.Float(i), lists["b"].Get(uint64(i)))
s.Equal(types.Float(i*2), lists["c"].Get(uint64(i)))
}
}
func (s *testSuite) TestCSVImporter() {
setName := "csv"
dataspec := spec.CreateValueSpecString("nbs", s.DBDir, setName)
stdout, stderr := s.MustRun(main, []string{"--no-progress", "--column-types", TEST_FIELDS, s.tmpFileName, dataspec})
s.Equal("", stdout)
s.Equal("", stderr)
db := datas.NewDatabase(nbs.NewLocalStore(s.DBDir, clienttest.DefaultMemTableSize))
defer os.RemoveAll(s.DBDir)
defer db.Close()
ds := db.GetDataset(setName)
s.validateList(ds.HeadValue().(types.List))
}
func (s *testSuite) TestCSVImporterLowercase() {
input, err := ioutil.TempFile(s.TempDir, "")
d.Chk.NoError(err)
defer input.Close()
writeCSVWithHeader(input, "YeAr,a,B,c\n", 0)
defer os.Remove(input.Name())
setName := "csv"
dataspec := spec.CreateValueSpecString("nbs", s.DBDir, setName)
stdout, stderr := s.MustRun(main, []string{"--no-progress", "--lowercase", "--column-types", TEST_FIELDS, input.Name(), dataspec})
s.Equal("", stdout)
s.Equal("", stderr)
db := datas.NewDatabase(nbs.NewLocalStore(s.DBDir, clienttest.DefaultMemTableSize))
defer os.RemoveAll(s.DBDir)
defer db.Close()
ds := db.GetDataset(setName)
s.validateList(ds.HeadValue().(types.List))
}
func (s *testSuite) TestCSVImporterLowercaseDuplicate() {
input, err := ioutil.TempFile(s.TempDir, "")
d.Chk.NoError(err)
defer input.Close()
writeCSVWithHeader(input, "YeAr,a,B,year\n", 0)
defer os.Remove(input.Name())
setName := "csv"
dataspec := spec.CreateValueSpecString("nbs", s.DBDir, setName)
_, stderr, _ := s.Run(main, []string{"--no-progress", "--lowercase", "--column-types", TEST_FIELDS, input.Name(), dataspec})
s.Contains(stderr, "must be unique")
}
func (s *testSuite) TestCSVImporterFromBlob() {
test := func(pathFlag string) {
defer os.RemoveAll(s.DBDir)
newDB := func() datas.Database {
os.Mkdir(s.DBDir, 0777)
cs := nbs.NewLocalStore(s.DBDir, clienttest.DefaultMemTableSize)
return datas.NewDatabase(cs)
}
db := newDB()
rawDS := db.GetDataset("raw")
csv := &bytes.Buffer{}
writeCSV(csv)
db.CommitValue(rawDS, types.NewBlob(db, csv))
db.Close()
stdout, stderr := s.MustRun(main, []string{
"--no-progress", "--column-types", TEST_FIELDS,
pathFlag, spec.CreateValueSpecString("nbs", s.DBDir, "raw.value"),
spec.CreateValueSpecString("nbs", s.DBDir, "csv"),
})
s.Equal("", stdout)
s.Equal("", stderr)
db = newDB()
defer db.Close()
csvDS := db.GetDataset("csv")
s.validateList(csvDS.HeadValue().(types.List))
}
test("--path")
test("-p")
}
func (s *testSuite) TestCSVImporterToMap() {
setName := "csv"
dataspec := spec.CreateValueSpecString("nbs", s.DBDir, setName)
stdout, stderr := s.MustRun(main, []string{"--no-progress", "--column-types", TEST_FIELDS, "--dest-type", "map:1", s.tmpFileName, dataspec})
s.Equal("", stdout)
s.Equal("", stderr)
db := datas.NewDatabase(nbs.NewLocalStore(s.DBDir, clienttest.DefaultMemTableSize))
defer os.RemoveAll(s.DBDir)
defer db.Close()
ds := db.GetDataset(setName)
m := ds.HeadValue().(types.Map)
s.validateMap(db, m)
}
func (s *testSuite) TestCSVImporterToNestedMap() {
setName := "csv"
dataspec := spec.CreateValueSpecString("nbs", s.DBDir, setName)
stdout, stderr := s.MustRun(main, []string{"--no-progress", "--column-types", TEST_FIELDS, "--dest-type", "map:0,1", s.tmpFileName, dataspec})
s.Equal("", stdout)
s.Equal("", stderr)
db := datas.NewDatabase(nbs.NewLocalStore(s.DBDir, clienttest.DefaultMemTableSize))
defer os.RemoveAll(s.DBDir)
defer db.Close()
ds := db.GetDataset(setName)
m := ds.HeadValue().(types.Map)
s.validateNestedMap(db, m)
}
func (s *testSuite) TestCSVImporterToNestedMapByName() {
setName := "csv"
dataspec := spec.CreateValueSpecString("nbs", s.DBDir, setName)
stdout, stderr := s.MustRun(main, []string{"--no-progress", "--column-types", TEST_FIELDS, "--dest-type", "map:year,a", s.tmpFileName, dataspec})
s.Equal("", stdout)
s.Equal("", stderr)
db := datas.NewDatabase(nbs.NewLocalStore(s.DBDir, clienttest.DefaultMemTableSize))
defer os.RemoveAll(s.DBDir)
defer db.Close()
ds := db.GetDataset(setName)
m := ds.HeadValue().(types.Map)
s.validateNestedMap(db, m)
}
func (s *testSuite) TestCSVImporterToColumnar() {
setName := "csv"
dataspec := spec.CreateValueSpecString("nbs", s.DBDir, setName)
stdout, stderr := s.MustRun(main, []string{"--no-progress", "--invert", "--column-types", TEST_FIELDS, s.tmpFileName, dataspec})
s.Equal("", stdout)
s.Equal("", stderr)
db := datas.NewDatabase(nbs.NewLocalStore(s.DBDir, clienttest.DefaultMemTableSize))
defer os.RemoveAll(s.DBDir)
defer db.Close()
ds := db.GetDataset(setName)
str := ds.HeadValue().(types.Struct)
s.validateColumnar(db, str, 1)
}
func (s *testSuite) TestCSVImporterToColumnarAppend() {
setName := "csv"
dataspec := spec.CreateValueSpecString("nbs", s.DBDir, setName)
stdout, stderr := s.MustRun(main, []string{"--no-progress", "--invert", "--column-types", TEST_FIELDS, s.tmpFileName, dataspec})
s.Equal("", stdout)
s.Equal("", stderr)
input, err := ioutil.TempFile(s.TempDir, "")
d.Chk.NoError(err)
defer input.Close()
writeCSVWithHeader(input, "year,a,b,c\n", 100)
defer os.Remove(input.Name())
stdout, stderr = s.MustRun(main, []string{"--no-progress", "--invert", "--append", "--column-types", TEST_FIELDS, input.Name(), dataspec})
s.Equal("", stdout)
s.Equal("", stderr)
db := datas.NewDatabase(nbs.NewLocalStore(s.DBDir, clienttest.DefaultMemTableSize))
defer os.RemoveAll(s.DBDir)
defer db.Close()
ds := db.GetDataset(setName)
str := ds.HeadValue().(types.Struct)
s.validateColumnar(db, str, 2)
}
func (s *testSuite) TestCSVImporterWithPipe() {
input, err := ioutil.TempFile(s.TempDir, "")
d.Chk.NoError(err)
defer input.Close()
defer os.Remove(input.Name())
_, err = input.WriteString("a|b\n1|2\n")
d.Chk.NoError(err)
setName := "csv"
dataspec := spec.CreateValueSpecString("nbs", s.DBDir, setName)
stdout, stderr := s.MustRun(main, []string{"--no-progress", "--column-types", "String,Float", "--delimiter", "|", input.Name(), dataspec})
s.Equal("", stdout)
s.Equal("", stderr)
db := datas.NewDatabase(nbs.NewLocalStore(s.DBDir, clienttest.DefaultMemTableSize))
defer os.RemoveAll(s.DBDir)
defer db.Close()
ds := db.GetDataset(setName)
l := ds.HeadValue().(types.List)
s.Equal(uint64(1), l.Len())
v := l.Get(0)
st := v.(types.Struct)
s.Equal(types.String("1"), st.Get("a"))
s.Equal(types.Float(2), st.Get("b"))
}
func (s *testSuite) TestCSVImporterWithExternalHeader() {
input, err := ioutil.TempFile(s.TempDir, "")
d.Chk.NoError(err)
defer input.Close()
defer os.Remove(input.Name())
_, err = input.WriteString("7,8\n")
d.Chk.NoError(err)
setName := "csv"
dataspec := spec.CreateValueSpecString("nbs", s.DBDir, setName)
stdout, stderr := s.MustRun(main, []string{"--no-progress", "--column-types", "String,Float", "--header", "x,y", input.Name(), dataspec})
s.Equal("", stdout)
s.Equal("", stderr)
db := datas.NewDatabase(nbs.NewLocalStore(s.DBDir, clienttest.DefaultMemTableSize))
defer os.RemoveAll(s.DBDir)
defer db.Close()
ds := db.GetDataset(setName)
l := ds.HeadValue().(types.List)
s.Equal(uint64(1), l.Len())
v := l.Get(0)
st := v.(types.Struct)
s.Equal(types.String("7"), st.Get("x"))
s.Equal(types.Float(8), st.Get("y"))
}
func (s *testSuite) TestCSVImporterWithInvalidExternalHeader() {
input, err := ioutil.TempFile(s.TempDir, "")
d.Chk.NoError(err)
defer input.Close()
defer os.Remove(input.Name())
_, err = input.WriteString("7#8\n")
d.Chk.NoError(err)
setName := "csv"
dataspec := spec.CreateValueSpecString("nbs", s.DBDir, setName)
stdout, stderr, exitErr := s.Run(main, []string{"--no-progress", "--column-types", "String,Float", "--header", "x,x", input.Name(), dataspec})
s.Equal("", stdout)
s.Equal("error: Invalid headers specified, headers must be unique\n", stderr)
s.Equal(clienttest.ExitError{1}, exitErr)
}
func (s *testSuite) TestCSVImporterWithInvalidNumColumnTypeSpec() {
input, err := ioutil.TempFile(s.TempDir, "")
d.Chk.NoError(err)
defer input.Close()
defer os.Remove(input.Name())
_, err = input.WriteString("7,8\n")
d.Chk.NoError(err)
setName := "csv"
dataspec := spec.CreateValueSpecString("nbs", s.DBDir, setName)
stdout, stderr, exitErr := s.Run(main, []string{"--no-progress", "--column-types", "String", "--header", "x,y", input.Name(), dataspec})
s.Equal("", stdout)
s.Equal("error: Invalid column-types specified, column types do not correspond to number of headers\n", stderr)
s.Equal(clienttest.ExitError{1}, exitErr)
}
func (s *testSuite) TestCSVImportSkipRecords() {
input, err := ioutil.TempFile(s.TempDir, "")
d.Chk.NoError(err)
defer input.Close()
defer os.Remove(input.Name())
_, err = input.WriteString("garbage foo\n")
d.Chk.NoError(err)
_, err = input.WriteString("garbage bar\n")
d.Chk.NoError(err)
_, err = input.WriteString("a,b\n")
d.Chk.NoError(err)
_, err = input.WriteString("7,8\n")
d.Chk.NoError(err)
setName := "csv"
dataspec := spec.CreateValueSpecString("nbs", s.DBDir, setName)
stdout, stderr := s.MustRun(main, []string{"--no-progress", "--skip-records", "2", input.Name(), dataspec})
s.Equal("", stdout)
s.Equal("", stderr)
db := datas.NewDatabase(nbs.NewLocalStore(s.DBDir, clienttest.DefaultMemTableSize))
defer os.RemoveAll(s.DBDir)
defer db.Close()
ds := db.GetDataset(setName)
l := ds.HeadValue().(types.List)
s.Equal(uint64(1), l.Len())
v := l.Get(0)
st := v.(types.Struct)
s.Equal(types.String("7"), st.Get("a"))
s.Equal(types.String("8"), st.Get("b"))
}
func (s *testSuite) TestCSVImportSkipRecordsTooMany() {
input, err := ioutil.TempFile(s.TempDir, "")
d.Chk.NoError(err)
defer input.Close()
defer os.Remove(input.Name())
_, err = input.WriteString("a,b\n")
d.Chk.NoError(err)
setName := "csv"
dataspec := spec.CreateValueSpecString("nbs", s.DBDir, setName)
stdout, stderr, recoveredErr := s.Run(main, []string{"--no-progress", "--skip-records", "100", input.Name(), dataspec})
s.Equal("", stdout)
s.Equal("error: skip-records skipped past EOF\n", stderr)
s.Equal(clienttest.ExitError{1}, recoveredErr)
}
func (s *testSuite) TestCSVImportSkipRecordsCustomHeader() {
input, err := ioutil.TempFile(s.TempDir, "")
d.Chk.NoError(err)
defer input.Close()
defer os.Remove(input.Name())
_, err = input.WriteString("a,b\n")
d.Chk.NoError(err)
_, err = input.WriteString("7,8\n")
d.Chk.NoError(err)
setName := "csv"
dataspec := spec.CreateValueSpecString("nbs", s.DBDir, setName)
stdout, stderr := s.MustRun(main, []string{"--no-progress", "--skip-records", "1", "--header", "x,y", input.Name(), dataspec})
s.Equal("", stdout)
s.Equal("", stderr)
db := datas.NewDatabase(nbs.NewLocalStore(s.DBDir, clienttest.DefaultMemTableSize))
defer os.RemoveAll(s.DBDir)
defer db.Close()
ds := db.GetDataset(setName)
l := ds.HeadValue().(types.List)
s.Equal(uint64(1), l.Len())
v := l.Get(0)
st := v.(types.Struct)
s.Equal(types.String("7"), st.Get("x"))
s.Equal(types.String("8"), st.Get("y"))
}
@@ -1,109 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"fmt"
"io"
"os"
"os/exec"
"path"
"testing"
"github.com/attic-labs/noms/go/perf/suite"
"github.com/attic-labs/noms/go/types"
"github.com/attic-labs/noms/samples/go/csv"
humanize "github.com/dustin/go-humanize"
"github.com/stretchr/testify/assert"
)
// CSV perf suites require the testdata directory to be checked out at $GOPATH/src/github.com/attic-labs/testdata (i.e. ../testdata relative to the noms directory).
type perfSuite struct {
suite.PerfSuite
csvImportExe string
}
func (s *perfSuite) SetupSuite() {
// Trick the temp file logic into creating a unique path for the csv-import binary.
f := s.TempFile()
f.Close()
os.Remove(f.Name())
s.csvImportExe = f.Name()
err := exec.Command("go", "build", "-o", s.csvImportExe, "github.com/attic-labs/noms/samples/go/csv/csv-import").Run()
assert.NoError(s.T, err)
}
func (s *perfSuite) Test01ImportSfCrimeBlobFromTestdata() {
assert := s.NewAssert()
files := s.OpenGlob(s.Testdata, "sf-crime", "2016-07-28.*")
defer s.CloseGlob(files)
blob := types.NewBlob(s.Database, files...)
fmt.Fprintf(s.W, "\tsf-crime is %s\n", humanize.Bytes(blob.Len()))
ds := s.Database.GetDataset("sf-crime/raw")
_, err := s.Database.CommitValue(ds, blob)
assert.NoError(err)
}
func (s *perfSuite) Test02ImportSfCrimeCSVFromBlob() {
s.execCsvImportExe("sf-crime")
}
func (s *perfSuite) Test03ImportSfRegisteredBusinessesFromBlobAsMap() {
assert := s.NewAssert()
files := s.OpenGlob(s.Testdata, "sf-registered-businesses", "2016-07-25.csv")
defer s.CloseGlob(files)
blob := types.NewBlob(s.Database, files...)
fmt.Fprintf(s.W, "\tsf-reg-bus is %s\n", humanize.Bytes(blob.Len()))
ds := s.Database.GetDataset("sf-reg-bus/raw")
_, err := s.Database.CommitValue(ds, blob)
assert.NoError(err)
s.execCsvImportExe("sf-reg-bus", "--dest-type", "map:0")
}
func (s *perfSuite) Test04ImportSfRegisteredBusinessesFromBlobAsMultiKeyMap() {
s.execCsvImportExe("sf-reg-bus", "--dest-type", "map:Zip_Code,Business_Start_Date")
}
func (s *perfSuite) execCsvImportExe(dsName string, args ...string) {
assert := s.NewAssert()
blobSpec := fmt.Sprintf("%s::%s/raw.value", s.DatabaseSpec, dsName)
destSpec := fmt.Sprintf("%s::%s", s.DatabaseSpec, dsName)
args = append(args, "-p", blobSpec, destSpec)
importCmd := exec.Command(s.csvImportExe, args...)
importCmd.Stdout = s.W
importCmd.Stderr = os.Stderr
assert.NoError(importCmd.Run())
}
func (s *perfSuite) TestParseSfCrime() {
assert := s.NewAssert()
files := s.OpenGlob(path.Join(s.Testdata, "sf-crime", "2016-07-28.*"))
defer s.CloseGlob(files)
reader := csv.NewCSVReader(io.MultiReader(files...), ',')
for {
_, err := reader.Read()
if err != nil {
assert.Equal(io.EOF, err)
break
}
}
}
func TestPerf(t *testing.T) {
suite.Run("csv-import", t, &perfSuite{})
}
@@ -1 +0,0 @@
csv-invert
@@ -1,119 +0,0 @@
// Copyright 2017 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"fmt"
"os"
"strings"
"github.com/attic-labs/noms/go/config"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/types"
"github.com/attic-labs/noms/go/util/profile"
flag "github.com/juju/gnuflag"
)
func main() {
flag.Usage = func() {
fmt.Fprintf(os.Stderr, "Usage: %s [options] <dataset-to-invert> <output-dataset>\n", os.Args[0])
flag.PrintDefaults()
}
profile.RegisterProfileFlags(flag.CommandLine)
flag.Parse(true)
if flag.NArg() != 2 {
flag.Usage()
return
}
cfg := config.NewResolver()
inDB, inDS, err := cfg.GetDataset(flag.Arg(0))
d.CheckError(err)
defer inDB.Close()
head, present := inDS.MaybeHead()
if !present {
d.CheckErrorNoUsage(fmt.Errorf("The dataset %s has no head", flag.Arg(0)))
}
v := head.Get(datas.ValueField)
l, isList := v.(types.List)
if !isList {
d.CheckErrorNoUsage(fmt.Errorf("The head value of %s is not a list, but rather %s", flag.Arg(0), types.TypeOf(v).Describe()))
}
outDB, outDS, err := cfg.GetDataset(flag.Arg(1))
d.CheckError(err)
defer outDB.Close()
// Avoid allocating a new types.Value on every zeroVal() call by keeping a map of canned values to reference.
zeroVals := map[types.NomsKind]types.Value{
types.BoolKind: types.Bool(false),
types.FloatKind: types.Float(0),
types.StringKind: types.String(""),
}
zeroVal := func(t *types.Type) types.Value {
v, present := zeroVals[t.TargetKind()]
if !present {
d.CheckErrorNoUsage(fmt.Errorf("csv-invert doesn't support values of type %s", t.Describe()))
}
return v
}
defer profile.MaybeStartProfile().Stop()
type stream struct {
ch chan types.Value
zeroVal types.Value
}
streams := map[string]*stream{}
lists := map[string]<-chan types.List{}
lowers := map[string]string{}
sDesc := types.TypeOf(l).Desc.(types.CompoundDesc).ElemTypes[0].Desc.(types.StructDesc)
sDesc.IterFields(func(name string, t *types.Type, optional bool) {
lowerName := strings.ToLower(name)
if _, present := streams[lowerName]; !present {
s := &stream{make(chan types.Value, 1024), zeroVal(t)}
streams[lowerName] = s
lists[lowerName] = types.NewStreamingList(outDB, s.ch)
}
lowers[name] = lowerName
})
filledCols := make(map[string]struct{}, len(streams))
l.IterAll(func(v types.Value, index uint64) {
// First, iterate the fields that are present in |v| and append values to the correct lists
v.(types.Struct).IterFields(func(name string, value types.Value) {
ln := lowers[name]
filledCols[ln] = struct{}{}
streams[ln].ch <- value
})
// Second, iterate all the streams, skipping the ones we already sent a value for, and send an empty String for the remaining ones.
for lowerName, stream := range streams {
if _, present := filledCols[lowerName]; present {
delete(filledCols, lowerName)
continue
}
stream.ch <- stream.zeroVal
}
})
invertedStructData := types.StructData{}
for lowerName, stream := range streams {
close(stream.ch)
invertedStructData[lowerName] = <-lists[lowerName]
}
str := types.NewStruct("Columnar", invertedStructData)
parents := types.NewSet(outDB)
if headRef, present := outDS.MaybeHeadRef(); present {
parents = types.NewSet(outDB, headRef)
}
_, err = outDB.Commit(outDS, str, datas.CommitOptions{Parents: parents, Meta: head.Get(datas.MetaField).(types.Struct)})
d.PanicIfError(err)
}
-53
View File
@@ -1,53 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package csv
import (
"bufio"
"encoding/csv"
"io"
)
var (
rByte byte = 13 // the byte that corresponds to the '\r' rune.
nByte byte = 10 // the byte that corresponds to the '\n' rune.
)
type reader struct {
r *bufio.Reader
}
// Read replaces CR line endings in the source reader with LF line endings if the CR is not followed by a LF.
func (r reader) Read(p []byte) (n int, err error) {
n, err = r.r.Read(p)
// Peek one byte past this read so a CRLF pair split across reads is not mistaken for a bare CR.
bn, _ := r.r.Peek(1)
for i := 0; i < n; i++ {
// if the current byte is a CR and the next byte is NOT a LF then replace the current byte with a LF
if p[i] == rByte && ((i+1 < n && p[i+1] != nByte) || (i+1 == n && (len(bn) == 0 || bn[0] != nByte))) {
p[i] = nByte
}
}
return
}
// SkipRecords reads and discards n records from r, returning the first error encountered, or nil if all n records were skipped.
func SkipRecords(r *csv.Reader, n uint) error {
for i := uint(0); i < n; i++ {
if _, err := r.Read(); err != nil {
return err
}
}
return nil
}
// NewCSVReader returns a new csv.Reader that splits records on the given comma rune and normalizes bare CR line endings to LF
func NewCSVReader(res io.Reader, comma rune) *csv.Reader {
bufRes := bufio.NewReader(res)
r := csv.NewReader(reader{r: bufRes})
r.Comma = comma
r.FieldsPerRecord = -1 // Don't enforce number of fields.
return r
}
-83
View File
@@ -1,83 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package csv
import (
"bytes"
"strings"
"testing"
"github.com/stretchr/testify/assert"
)
func TestCR(t *testing.T) {
testFile := []byte("a,b,c\r1,2,3\r")
delimiter, err := StringToRune(",")
r := NewCSVReader(bytes.NewReader(testFile), delimiter)
lines, err := r.ReadAll()
assert.NoError(t, err, "An error occurred while reading the data: %v", err)
if len(lines) != 2 {
t.Errorf("Wrong number of lines. Expected 2, got %d", len(lines))
}
}
func TestLF(t *testing.T) {
testFile := []byte("a,b,c\n1,2,3\n")
delimiter, err := StringToRune(",")
r := NewCSVReader(bytes.NewReader(testFile), delimiter)
lines, err := r.ReadAll()
assert.NoError(t, err, "An error occurred while reading the data: %v", err)
if len(lines) != 2 {
t.Errorf("Wrong number of lines. Expected 2, got %d", len(lines))
}
}
func TestCRLF(t *testing.T) {
testFile := []byte("a,b,c\r\n1,2,3\r\n")
delimiter, err := StringToRune(",")
r := NewCSVReader(bytes.NewReader(testFile), delimiter)
lines, err := r.ReadAll()
assert.NoError(t, err, "An error occurred while reading the data: %v", err)
if len(lines) != 2 {
t.Errorf("Wrong number of lines. Expected 2, got %d", len(lines))
}
}
func TestCRInQuote(t *testing.T) {
testFile := []byte("a,\"foo,\rbar\",c\r1,\"2\r\n2\",3\r")
delimiter, err := StringToRune(",")
r := NewCSVReader(bytes.NewReader(testFile), delimiter)
lines, err := r.ReadAll()
assert.NoError(t, err, "An error occurred while reading the data: %v", err)
if len(lines) != 2 {
t.Errorf("Wrong number of lines. Expected 2, got %d", len(lines))
}
if strings.Contains(lines[1][1], "\n\n") {
t.Error("The CRLF was converted to a LFLF")
}
}
func TestCRLFEndOfBufferLength(t *testing.T) {
testFile := make([]byte, 4096*2)
testFile[4095] = 13 // \r byte
testFile[4096] = 10 // \n byte
delimiter, err := StringToRune(",")
r := NewCSVReader(bytes.NewReader(testFile), delimiter)
lines, err := r.ReadAll()
assert.NoError(t, err, "An error occurred while reading the data: %v", err)
if len(lines) != 2 {
t.Errorf("Wrong number of lines. Expected 2, got %d", len(lines))
}
}
-37
View File
@@ -1,37 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package csv
import (
"fmt"
"strconv"
"strings"
"github.com/attic-labs/noms/go/types"
)
// KindSlice is an alias for []types.NomsKind. It's needed because types.NomsKind is really just an 8-bit unsigned int, the same underlying type Go uses for 'byte', which confuses the Go JSON marshal/unmarshal code: it treats such slices as byte arrays and base64 encodes them!
type KindSlice []types.NomsKind
func (ks KindSlice) MarshalJSON() ([]byte, error) {
elems := make([]string, len(ks))
for i, k := range ks {
elems[i] = fmt.Sprintf("%d", k)
}
return []byte("[" + strings.Join(elems, ",") + "]"), nil
}
func (ks *KindSlice) UnmarshalJSON(value []byte) error {
elems := strings.Split(string(value[1:len(value)-1]), ",")
*ks = make(KindSlice, len(elems))
for i, e := range elems {
ival, err := strconv.ParseUint(e, 10, 8)
if err != nil {
return err
}
(*ks)[i] = types.NomsKind(ival)
}
return nil
}
-29
View File
@@ -1,29 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package csv
import (
"encoding/json"
"fmt"
"testing"
"github.com/attic-labs/noms/go/types"
"github.com/stretchr/testify/assert"
)
func TestKindSliceJSON(t *testing.T) {
assert := assert.New(t)
ks := KindSlice{types.FloatKind, types.StringKind, types.BoolKind}
b, err := json.Marshal(&ks)
assert.NoError(err)
assert.Equal(fmt.Sprintf("[%d,%d,%d]", ks[0], ks[1], ks[2]), string(b))
var uks KindSlice
err = json.Unmarshal(b, &uks)
assert.NoError(err, "error with json.Unmarshal")
assert.Equal(ks, uks)
}
-280
View File
@@ -1,280 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package csv
import (
"encoding/csv"
"fmt"
"io"
"sort"
"strconv"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/types"
)
// StringToKind maps names of valid NomsKinds (e.g. Bool, Float, etc) to their associated types.NomsKind
var StringToKind = func(kindMap map[types.NomsKind]string) map[string]types.NomsKind {
m := map[string]types.NomsKind{}
for k, v := range kindMap {
m[v] = k
}
return m
}(types.KindToString)
// StringsToKinds looks up each element of strs in the StringToKind map and returns a slice of answers
func StringsToKinds(strs []string) KindSlice {
kinds := make(KindSlice, len(strs))
for i, str := range strs {
k, ok := StringToKind[str]
if !ok {
d.Panic("StringToKind[%s] failed", str)
}
kinds[i] = k
}
return kinds
}
// KindsToStrings looks up each element of kinds in the types.KindToString map and returns a slice of answers
func KindsToStrings(kinds KindSlice) []string {
strs := make([]string, len(kinds))
for i, k := range kinds {
strs[i] = k.String()
}
return strs
}
// EscapeStructFieldFromCSV removes special characters and camelCases space-separated words (e.g. "camel case" becomes "camelCase")
func EscapeStructFieldFromCSV(input string) string {
if types.IsValidStructFieldName(input) {
return input
}
return types.CamelCaseFieldName(input)
}
// MakeStructTemplateFromHeaders creates a struct type from the headers using |kinds| as the type of each field. If |kinds| is empty, default to strings.
func MakeStructTemplateFromHeaders(headers []string, structName string, kinds KindSlice) (temp types.StructTemplate, fieldOrder []int, kindMap []types.NomsKind) {
useStringType := len(kinds) == 0
d.PanicIfFalse(useStringType || len(headers) == len(kinds))
fieldMap := make(map[string]types.NomsKind, len(headers))
origOrder := make(map[string]int, len(headers))
fieldNames := make(sort.StringSlice, len(headers))
for i, key := range headers {
fn := EscapeStructFieldFromCSV(key)
origOrder[fn] = i
kind := types.StringKind
if !useStringType {
kind = kinds[i]
}
_, ok := fieldMap[fn]
if ok {
d.Panic(`Duplicate field name "%s"`, key)
}
fieldMap[fn] = kind
fieldNames[i] = fn
}
sort.Sort(fieldNames)
kindMap = make([]types.NomsKind, len(fieldMap))
fieldOrder = make([]int, len(fieldMap))
for i, fn := range fieldNames {
kindMap[i] = fieldMap[fn]
fieldOrder[origOrder[fn]] = i
}
temp = types.MakeStructTemplate(structName, fieldNames)
return
}
// ReadToList takes a CSV reader and reads data into a typed List of structs.
// Each row is read into a struct named structName, described by headers. If
// the original data contained headers, the input reader is expected to have
// already consumed them and to be pointing at the first data row.
// If kinds is non-empty, it will be used to type the fields in the generated
// structs; otherwise, they will be left as string-fields.
func ReadToList(r *csv.Reader, structName string, headers []string, kinds KindSlice, vrw types.ValueReadWriter, limit uint64) (l types.List) {
temp, fieldOrder, kindMap := MakeStructTemplateFromHeaders(headers, structName, kinds)
valueChan := make(chan types.Value, 128) // TODO: Make this a function param?
listChan := types.NewStreamingList(vrw, valueChan)
cnt := uint64(0)
for {
row, err := r.Read()
if cnt >= limit || err == io.EOF {
close(valueChan)
break
} else if err != nil {
panic(err)
}
cnt++
fields := readFieldsFromRow(row, headers, fieldOrder, kindMap)
valueChan <- temp.NewStruct(fields)
}
return <-listChan
}
type column struct {
ch chan types.Value
list <-chan types.List
zeroValue types.Value
hdr string
}
// ReadToColumnar takes a CSV reader and reads data from each column into a
// separate list. Values from columns in each successive row are appended to the
// column-specific lists whose type is described by headers. Finally, a new
// "Columnar" struct is created that consists of one field for each column,
// where each field contains a list of values.
// If the original data contained headers, the input reader is expected to have
// already consumed them and to be pointing at the first data row.
// If kinds is non-empty, it will be used to type the fields in the generated
// structs; otherwise, they will be left as string-fields.
func ReadToColumnar(r *csv.Reader, structName string, headers []string, kinds KindSlice, vrw types.ValueReadWriter, limit uint64) (s types.Struct) {
cols := []column{}
fieldOrder := []int{}
for i, hdr := range headers {
ch := make(chan types.Value, 1024)
cols = append(cols, column{
ch: ch,
list: types.NewStreamingList(vrw, ch),
hdr: hdr,
})
fieldOrder = append(fieldOrder, i)
}
cnt := uint64(0)
for {
row, err := r.Read()
if cnt >= limit || err == io.EOF {
break
} else if err != nil {
} else if err != nil {
panic(err)
}
cnt++
fields := readFieldsFromRow(row, headers, fieldOrder, kinds)
for i, v := range fields {
cols[i].ch <- v
}
}
sd := types.StructData{}
for _, col := range cols {
close(col.ch)
r := vrw.WriteValue(<-col.list)
sd[col.hdr] = r
}
return types.NewStruct("Columnar", sd)
}
// getFieldIndexByHeaderName returns the index of name within headers, or -1 if it is not found
func getFieldIndexByHeaderName(headers []string, name string) int {
for i, header := range headers {
if header == name {
return i
}
}
return -1
}
// getPkIndices resolves each primary key in strPks to a column index: a key that parses as an integer is used directly as the index, otherwise the key is looked up by name in headers. The returned indices preserve the order of strPks.
func getPkIndices(strPks []string, headers []string) []int {
result := make([]int, len(strPks))
for i, pk := range strPks {
pkIdx, err := strconv.Atoi(pk)
if err == nil {
result[i] = pkIdx
} else {
result[i] = getFieldIndexByHeaderName(headers, pk)
}
if result[i] < 0 {
d.Chk.Fail(fmt.Sprintf("Invalid pk: %v", pk))
}
}
return result
}
func readFieldsFromRow(row []string, headers []string, fieldOrder []int, kindMap []types.NomsKind) types.ValueSlice {
fields := make(types.ValueSlice, len(headers))
for i, v := range row {
if i < len(headers) {
fieldOrigIndex := fieldOrder[i]
val, err := StringToValue(v, kindMap[fieldOrigIndex])
if err != nil {
d.Chk.Fail(fmt.Sprintf("Error parsing value for column '%s': %s", headers[i], err))
}
fields[fieldOrigIndex] = val
}
}
return fields
}
// primaryKeyValuesFromFields extracts the values of the primaryKey fields into
// an array. The values are in the user-specified order. This function returns 2
// objects:
// 1) a ValueSlice containing the first n-1 keys.
// 2) a single Value which will be used as the key in the leaf map created by
// GraphBuilder
func primaryKeyValuesFromFields(fields types.ValueSlice, fieldOrder, pkIndices []int) (types.ValueSlice, types.Value) {
numPrimaryKeys := len(pkIndices)
if numPrimaryKeys == 1 {
return nil, fields[fieldOrder[pkIndices[0]]]
}
keys := make(types.ValueSlice, numPrimaryKeys-1)
var value types.Value
for i, idx := range pkIndices {
k := fields[fieldOrder[idx]]
if i < numPrimaryKeys-1 {
keys[i] = k
} else {
value = k
}
}
return keys, value
}
// ReadToMap takes a CSV reader and reads data into a typed Map of structs. Each
// row gets read into a struct named structName, described by headers. If the
// original data contained headers, the input reader is expected to have
// already consumed them and to be pointing at the first data row.
// If kinds is non-empty, it will be used to type the fields in the generated
// structs; otherwise, they will be left as string-fields.
func ReadToMap(r *csv.Reader, structName string, headersRaw []string, primaryKeys []string, kinds KindSlice, vrw types.ValueReadWriter, limit uint64) types.Map {
temp, fieldOrder, kindMap := MakeStructTemplateFromHeaders(headersRaw, structName, kinds)
pkIndices := getPkIndices(primaryKeys, headersRaw)
d.Chk.True(len(pkIndices) >= 1, "No primary key defined when reading into map")
gb := types.NewGraphBuilder(vrw, types.MapKind)
cnt := uint64(0)
for {
row, err := r.Read()
if cnt >= limit || err == io.EOF {
break
} else if err != nil {
panic(err)
}
cnt++
fields := readFieldsFromRow(row, headersRaw, fieldOrder, kindMap)
graphKeys, mapKey := primaryKeyValuesFromFields(fields, fieldOrder, pkIndices)
st := temp.NewStruct(fields)
gb.MapSet(graphKeys, mapKey, st)
}
return gb.Build().(types.Map)
}
-236
View File
@@ -1,236 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package csv
import (
"bytes"
"encoding/csv"
"math"
"testing"
"github.com/attic-labs/noms/go/chunks"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/types"
"github.com/stretchr/testify/assert"
)
var LIMIT = uint64(math.MaxUint64)
func TestReadToList(t *testing.T) {
assert := assert.New(t)
storage := &chunks.MemoryStorage{}
db := datas.NewDatabase(storage.NewView())
dataString := `a,1,true
b,2,false
`
r := NewCSVReader(bytes.NewBufferString(dataString), ',')
headers := []string{"A", "B", "C"}
kinds := KindSlice{types.StringKind, types.FloatKind, types.BoolKind}
l := ReadToList(r, "test", headers, kinds, db, LIMIT)
assert.Equal(uint64(2), l.Len())
assert.True(l.Get(0).(types.Struct).Get("A").Equals(types.String("a")))
assert.True(l.Get(1).(types.Struct).Get("A").Equals(types.String("b")))
assert.True(l.Get(0).(types.Struct).Get("B").Equals(types.Float(1)))
assert.True(l.Get(1).(types.Struct).Get("B").Equals(types.Float(2)))
assert.True(l.Get(0).(types.Struct).Get("C").Equals(types.Bool(true)))
assert.True(l.Get(1).(types.Struct).Get("C").Equals(types.Bool(false)))
}
func TestReadToMap(t *testing.T) {
assert := assert.New(t)
storage := &chunks.MemoryStorage{}
db := datas.NewDatabase(storage.NewView())
dataString := `a,1,true
b,2,false
`
r := NewCSVReader(bytes.NewBufferString(dataString), ',')
headers := []string{"A", "B", "C"}
kinds := KindSlice{types.StringKind, types.FloatKind, types.BoolKind}
m := ReadToMap(r, "test", headers, []string{"0"}, kinds, db, LIMIT)
assert.Equal(uint64(2), m.Len())
assert.True(types.TypeOf(m).Equals(
types.MakeMapType(types.StringType, types.MakeStructType("test",
types.StructField{"A", types.StringType, false},
types.StructField{"B", types.FloatType, false},
types.StructField{"C", types.BoolType, false},
))))
assert.True(m.Get(types.String("a")).Equals(types.NewStruct("test", types.StructData{
"A": types.String("a"),
"B": types.Float(1),
"C": types.Bool(true),
})))
assert.True(m.Get(types.String("b")).Equals(types.NewStruct("test", types.StructData{
"A": types.String("b"),
"B": types.Float(2),
"C": types.Bool(false),
})))
}
func testTrailingHelper(t *testing.T, dataString string) {
assert := assert.New(t)
storage := &chunks.MemoryStorage{}
db1 := datas.NewDatabase(storage.NewView())
defer db1.Close()
r := NewCSVReader(bytes.NewBufferString(dataString), ',')
headers := []string{"A", "B"}
kinds := KindSlice{types.StringKind, types.StringKind}
l := ReadToList(r, "test", headers, kinds, db1, LIMIT)
assert.Equal(uint64(3), l.Len())
storage = &chunks.MemoryStorage{}
db2 := datas.NewDatabase(storage.NewView())
defer db2.Close()
r = NewCSVReader(bytes.NewBufferString(dataString), ',')
m := ReadToMap(r, "test", headers, []string{"0"}, kinds, db2, LIMIT)
assert.Equal(uint64(3), m.Len())
}
func TestReadTrailingHole(t *testing.T) {
dataString := `a,b,
d,e,
g,h,
`
testTrailingHelper(t, dataString)
}
func TestReadTrailingHoles(t *testing.T) {
dataString := `a,b,,
d,e
g,h
`
testTrailingHelper(t, dataString)
}
func TestReadTrailingValues(t *testing.T) {
dataString := `a,b
d,e,f
g,h,i,j
`
testTrailingHelper(t, dataString)
}
func TestEscapeStructFieldFromCSV(t *testing.T) {
assert := assert.New(t)
cases := []string{
"a", "a",
"1a", "a",
"AaZz19_", "AaZz19_",
"Q", "Q",
"AQ", "AQ",
"_content", "content",
"Few ¢ents Short", "fewEntsShort",
"CAMEL💩case letTerS", "camelcaseLetters",
"https://picasaweb.google.com/data", "httpspicasawebgooglecomdata",
"💩", "",
"11 1💩", "",
"-- A B", "aB",
"-- A --", "a",
"-- A -- B", "aB",
}
for i := 0; i < len(cases); i += 2 {
orig, expected := cases[i], cases[i+1]
assert.Equal(expected, EscapeStructFieldFromCSV(orig))
}
}
func TestReadParseError(t *testing.T) {
assert := assert.New(t)
storage := &chunks.MemoryStorage{}
db := datas.NewDatabase(storage.NewView())
dataString := `a,"b`
r := NewCSVReader(bytes.NewBufferString(dataString), ',')
headers := []string{"A", "B"}
kinds := KindSlice{types.StringKind, types.StringKind}
func() {
defer func() {
r := recover()
assert.NotNil(r)
_, ok := r.(*csv.ParseError)
assert.True(ok, "Should be a ParseError")
}()
ReadToList(r, "test", headers, kinds, db, LIMIT)
}()
}
func TestDuplicateHeaderName(t *testing.T) {
assert := assert.New(t)
storage := &chunks.MemoryStorage{}
db := datas.NewDatabase(storage.NewView())
dataString := "1,2\n3,4\n"
r := NewCSVReader(bytes.NewBufferString(dataString), ',')
headers := []string{"A", "A"}
kinds := KindSlice{types.StringKind, types.StringKind}
assert.Panics(func() { ReadToList(r, "test", headers, kinds, db, LIMIT) })
}
func TestEscapeFieldNames(t *testing.T) {
assert := assert.New(t)
storage := &chunks.MemoryStorage{}
db := datas.NewDatabase(storage.NewView())
dataString := "1,2\n"
r := NewCSVReader(bytes.NewBufferString(dataString), ',')
headers := []string{"A A", "B"}
kinds := KindSlice{types.FloatKind, types.FloatKind}
l := ReadToList(r, "test", headers, kinds, db, LIMIT)
assert.Equal(uint64(1), l.Len())
assert.Equal(types.Float(1), l.Get(0).(types.Struct).Get(EscapeStructFieldFromCSV("A A")))
r = NewCSVReader(bytes.NewBufferString(dataString), ',')
m := ReadToMap(r, "test", headers, []string{"1"}, kinds, db, LIMIT)
assert.Equal(uint64(1), m.Len())
assert.Equal(types.Float(1), m.Get(types.Float(2)).(types.Struct).Get(EscapeStructFieldFromCSV("A A")))
}
func TestDefaults(t *testing.T) {
assert := assert.New(t)
storage := &chunks.MemoryStorage{}
db := datas.NewDatabase(storage.NewView())
dataString := "42,,,\n"
r := NewCSVReader(bytes.NewBufferString(dataString), ',')
headers := []string{"A", "B", "C", "D"}
kinds := KindSlice{types.FloatKind, types.FloatKind, types.BoolKind, types.StringKind}
l := ReadToList(r, "test", headers, kinds, db, LIMIT)
assert.Equal(uint64(1), l.Len())
row := l.Get(0).(types.Struct)
assert.Equal(types.Float(42), row.Get("A"))
assert.Equal(types.Float(0), row.Get("B"))
assert.Equal(types.Bool(false), row.Get("C"))
assert.Equal(types.String(""), row.Get("D"))
}
func TestBooleanStrings(t *testing.T) {
assert := assert.New(t)
storage := &chunks.MemoryStorage{}
db := datas.NewDatabase(storage.NewView())
dataString := "true,false\n1,0\ny,n\nY,N\nY,\n"
r := NewCSVReader(bytes.NewBufferString(dataString), ',')
headers := []string{"T", "F"}
kinds := KindSlice{types.BoolKind, types.BoolKind}
l := ReadToList(r, "test", headers, kinds, db, LIMIT)
assert.Equal(uint64(5), l.Len())
for i := uint64(0); i < l.Len(); i++ {
row := l.Get(i).(types.Struct)
assert.True(types.Bool(true).Equals(row.Get("T")))
assert.True(types.Bool(false).Equals(row.Get("F")))
}
}
-244
View File
@@ -1,244 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package csv
import (
"encoding/csv"
"fmt"
"io"
"math"
"strconv"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/types"
)
type schemaOptions []*typeCanFit
func newSchemaOptions(fieldCount int) schemaOptions {
options := make([]*typeCanFit, fieldCount, fieldCount)
for i := 0; i < fieldCount; i++ {
options[i] = &typeCanFit{true, true, true}
}
return options
}
func (so schemaOptions) Test(fields []string) {
for i, t := range so {
if i < len(fields) {
t.Test(fields[i])
}
}
}
func (so schemaOptions) MostSpecificKinds() KindSlice {
kinds := make(KindSlice, len(so))
for i, t := range so {
kinds[i] = t.MostSpecificKind()
}
return kinds
}
func (so schemaOptions) ValidKinds() []KindSlice {
kinds := make([]KindSlice, len(so))
for i, t := range so {
kinds[i] = t.ValidKinds()
}
return kinds
}
type typeCanFit struct {
boolType bool
numberType bool
stringType bool
}
func (tc *typeCanFit) MostSpecificKind() types.NomsKind {
if tc.boolType {
return types.BoolKind
} else if tc.numberType {
return types.FloatKind
} else {
return types.StringKind
}
}
func (tc *typeCanFit) ValidKinds() (kinds KindSlice) {
if tc.numberType {
kinds = append(kinds, types.FloatKind)
}
if tc.boolType {
kinds = append(kinds, types.BoolKind)
}
kinds = append(kinds, types.StringKind)
return kinds
}
func (tc *typeCanFit) Test(value string) {
tc.testNumbers(value)
tc.testBool(value)
}
func (tc *typeCanFit) testNumbers(value string) {
if !tc.numberType {
return
}
fval, err := strconv.ParseFloat(value, 64)
if err != nil {
tc.numberType = false
return
}
// Defensive only: ParseFloat reports out-of-range values through err above.
if fval > math.MaxFloat64 {
tc.numberType = false
}
}
func (tc *typeCanFit) testBool(value string) {
if !tc.boolType {
return
}
_, err := strconv.ParseBool(value)
tc.boolType = err == nil
}
func GetSchema(r *csv.Reader, numSamples int, numFields int) KindSlice {
so := newSchemaOptions(numFields)
for i := 0; i < numSamples; i++ {
row, err := r.Read()
if err == io.EOF {
break
}
so.Test(row)
}
return so.MostSpecificKinds()
}
func GetFieldNamesFromIndices(headers []string, indices []int) []string {
result := make([]string, len(indices))
for i, idx := range indices {
result[i] = headers[idx]
}
return result
}
// combinationsWithLength emits every n-choose-m combination (without repeats) of exactly `length` elements taken from values
func combinationsWithLength(values []int, length int, emit func([]int)) {
n := len(values)
if length > n {
return
}
indices := make([]int, length)
for i := range indices {
indices[i] = i
}
result := make([]int, length)
for i, l := range indices {
result[i] = values[l]
}
emit(result)
for {
i := length - 1
for ; i >= 0 && indices[i] == i+n-length; i -= 1 {
}
if i < 0 {
return
}
indices[i] += 1
for j := i + 1; j < length; j += 1 {
indices[j] = indices[j-1] + 1
}
for ; i < len(indices); i += 1 {
result[i] = values[indices[i]]
}
emit(result)
}
}
// combinationsLengthsFromTo emits all n-choose-m combinations (without repeats) of every length from smallestLength to largestLength (inclusive)
func combinationsLengthsFromTo(values []int, smallestLength, largestLength int, emit func([]int)) {
for i := smallestLength; i <= largestLength; i++ {
combinationsWithLength(values, i, emit)
}
}
func makeKeyString(row []string, indices []int, separator string) string {
var result string
for _, i := range indices {
result += separator
result += row[i]
}
return result
}
// FindPrimaryKeys reads up to numSamples rows from r and, considering the first numFields columns, returns the sets of column indices ([]int slices) that uniquely identify those samples
func FindPrimaryKeys(r *csv.Reader, numSamples, maxLenPrimaryKeyList, numFields int) [][]int {
dataToTest := make([][]string, 0, numSamples)
for i := int(0); i < numSamples; i++ {
row, err := r.Read()
if err == io.EOF {
break
}
dataToTest = append(dataToTest, row)
}
indices := make([]int, numFields)
for i := int(0); i < numFields; i++ {
indices[i] = i
}
pksFound := make([][]int, 0)
combinationsLengthsFromTo(indices, 1, maxLenPrimaryKeyList, func(combination []int) {
keys := make(map[string]bool, numSamples)
for _, row := range dataToTest {
key := makeKeyString(row, combination, "$&$")
if _, ok := keys[key]; ok {
return
}
keys[key] = true
}
// need to copy the combination because the emitter reuses its backing slice
pksFound = append(pksFound, append([]int{}, combination...))
})
return pksFound
}
// StringToValue takes a piece of data as a string and attempts to convert it to a types.Value of the appropriate types.NomsKind.
func StringToValue(s string, k types.NomsKind) (types.Value, error) {
switch k {
case types.FloatKind:
if s == "" {
return types.Float(float64(0)), nil
}
fval, err := strconv.ParseFloat(s, 64)
if err != nil {
return nil, fmt.Errorf("Could not parse '%s' into number (%s)", s, err)
}
return types.Float(fval), nil
case types.BoolKind:
// TODO: This should probably be configurable.
switch s {
case "true", "1", "y", "yes", "Y", "YES":
return types.Bool(true), nil
case "false", "0", "n", "no", "N", "NO", "":
return types.Bool(false), nil
default:
return nil, fmt.Errorf("Could not parse '%s' into bool", s)
}
case types.StringKind:
return types.String(s), nil
default:
d.Panic("Invalid column type kind:", k)
}
panic("not reached")
}
-351
View File
@@ -1,351 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package csv
import (
"fmt"
"testing"
"github.com/attic-labs/noms/go/types"
"github.com/stretchr/testify/assert"
)
func TestSchemaDetection(t *testing.T) {
assert := assert.New(t)
test := func(input [][]string, expect []KindSlice) {
options := newSchemaOptions(len(input[0]))
for _, values := range input {
options.Test(values)
}
assert.Equal(expect, options.ValidKinds())
}
test(
[][]string{
{"foo", "1", "5"},
{"bar", "0", "10"},
{"true", "1", "23"},
{"1", "1", "60"},
{"1.1", "false", "75"},
},
[]KindSlice{
{types.StringKind},
{types.BoolKind, types.StringKind},
{
types.FloatKind,
types.StringKind,
},
},
)
test(
[][]string{
{"foo"},
{"bar"},
{"true"},
{"1"},
{"1.1"},
},
[]KindSlice{
{types.StringKind},
},
)
test(
[][]string{
{"true"},
{"1"},
{"1.1"},
},
[]KindSlice{
{types.StringKind},
},
)
test(
[][]string{
{"true"},
{"false"},
{"True"},
{"False"},
{"TRUE"},
{"FALSE"},
{"1"},
{"0"},
},
[]KindSlice{
{types.BoolKind, types.StringKind},
},
)
test(
[][]string{
{"1"},
{"1.1"},
},
[]KindSlice{
{
types.FloatKind,
types.StringKind},
},
)
test(
[][]string{
{"1"},
{"1.1"},
{"4.940656458412465441765687928682213723651e-50"},
{"-4.940656458412465441765687928682213723651e-50"},
},
[]KindSlice{
{
types.FloatKind,
types.StringKind},
},
)
test(
[][]string{
{"1"},
{"1.1"},
{"1.797693134862315708145274237317043567981e+102"},
{"-1.797693134862315708145274237317043567981e+102"},
},
[]KindSlice{
{
types.FloatKind,
types.StringKind},
},
)
test(
[][]string{
{"1"},
{"1.1"},
{"1.797693134862315708145274237317043567981e+309"},
{"-1.797693134862315708145274237317043567981e+309"},
},
[]KindSlice{
{
types.StringKind},
},
)
test(
[][]string{
{"1"},
{"0"},
},
[]KindSlice{
{
types.FloatKind,
types.BoolKind,
types.StringKind},
},
)
test(
[][]string{
{"1"},
{"0"},
{"-1"},
},
[]KindSlice{
{
types.FloatKind,
types.StringKind},
},
)
test(
[][]string{
{"0"},
{"-0"},
},
[]KindSlice{
{
types.FloatKind,
types.StringKind},
},
)
test(
[][]string{
{"1"},
{"280"},
{"0"},
{"-1"},
},
[]KindSlice{
{
types.FloatKind,
types.StringKind},
},
)
test(
[][]string{
{"1"},
{"-180"},
{"0"},
{"-1"},
},
[]KindSlice{
{
types.FloatKind,
types.StringKind},
},
)
test(
[][]string{
{"1"},
{"33000"},
{"0"},
{"-1"},
},
[]KindSlice{
{
types.FloatKind,
types.StringKind},
},
)
test(
[][]string{
{"1"},
{"-44000"},
{"0"},
{"-1"},
},
[]KindSlice{
{
types.FloatKind,
types.StringKind},
},
)
test(
[][]string{
{"1"},
{"2547483648"},
{"0"},
{"-1"},
},
[]KindSlice{
{
types.FloatKind,
types.StringKind},
},
)
test(
[][]string{
{"1"},
{"-4347483648"},
{"0"},
{"-1"},
},
[]KindSlice{
{
types.FloatKind,
types.StringKind},
},
)
test(
[][]string{
{fmt.Sprintf("%d", uint64(1<<63))},
{fmt.Sprintf("%d", uint64(1<<63)+1)},
},
[]KindSlice{
{
types.FloatKind,
types.StringKind},
},
)
test(
[][]string{
{fmt.Sprintf("%d", uint64(1<<32))},
{fmt.Sprintf("%d", uint64(1<<32)+1)},
},
[]KindSlice{
{
types.FloatKind,
types.StringKind},
},
)
}
func TestCombinationsWithLength(t *testing.T) {
assert := assert.New(t)
test := func(input []int, length int, expect [][]int) {
combinations := make([][]int, 0)
combinationsWithLength(input, length, func(combination []int) {
combinations = append(combinations, append([]int{}, combination...))
})
assert.Equal(expect, combinations)
}
test([]int{0}, 1, [][]int{
{0},
})
test([]int{1}, 1, [][]int{
{1},
})
test([]int{0, 1}, 1, [][]int{
{0},
{1},
})
test([]int{0, 1}, 2, [][]int{
{0, 1},
})
test([]int{70, 80, 90, 100}, 1, [][]int{
{70},
{80},
{90},
{100},
})
test([]int{70, 80, 90, 100}, 2, [][]int{
{70, 80},
{70, 90},
{70, 100},
{80, 90},
{80, 100},
{90, 100},
})
test([]int{70, 80, 90, 100}, 3, [][]int{
{70, 80, 90},
{70, 80, 100},
{70, 90, 100},
{80, 90, 100},
})
}
func TestCombinationsWithLengthFromTo(t *testing.T) {
assert := assert.New(t)
test := func(input []int, smallestLength, largestLength int, expect [][]int) {
combinations := make([][]int, 0)
combinationsLengthsFromTo(input, smallestLength, largestLength, func(combination []int) {
combinations = append(combinations, append([]int{}, combination...))
})
assert.Equal(expect, combinations)
}
test([]int{0}, 1, 1, [][]int{
{0},
})
test([]int{1}, 1, 1, [][]int{
{1},
})
test([]int{0, 1}, 1, 2, [][]int{
{0},
{1},
{0, 1},
})
test([]int{0, 1}, 2, 2, [][]int{
{0, 1},
})
test([]int{70, 80, 90, 100}, 1, 3, [][]int{
{70},
{80},
{90},
{100},
{70, 80},
{70, 90},
{70, 100},
{80, 90},
{80, 100},
{90, 100},
{70, 80, 90},
{70, 80, 100},
{70, 90, 100},
{80, 90, 100},
})
}
-107
@@ -1,107 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package csv
import (
"encoding/csv"
"fmt"
"io"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/types"
)
func getElemDesc(s types.Collection, index int) types.StructDesc {
t := types.TypeOf(s).Desc.(types.CompoundDesc).ElemTypes[index]
if types.StructKind != t.TargetKind() {
d.Panic("Expected StructKind, found %s", t.Kind())
}
return t.Desc.(types.StructDesc)
}
// GetListElemDesc ensures that l is a types.List of structs and returns the types.StructDesc that describes its elements.
func GetListElemDesc(l types.List, vr types.ValueReader) types.StructDesc {
return getElemDesc(l, 0)
}
// GetMapElemDesc ensures that m is a types.Map of structs and returns the types.StructDesc that describes its values.
// If m is a nested types.Map of types.Maps, GetMapElemDesc descends the nesting until it reaches a types.Struct.
func GetMapElemDesc(m types.Map, vr types.ValueReader) types.StructDesc {
t := types.TypeOf(m).Desc.(types.CompoundDesc).ElemTypes[1]
if types.StructKind == t.TargetKind() {
return t.Desc.(types.StructDesc)
} else if types.MapKind == t.TargetKind() {
_, v := m.First()
return GetMapElemDesc(v.(types.Map), vr)
}
panic(fmt.Sprintf("Expected StructKind or MapKind, found %s", t.Kind().String()))
}
func writeValuesFromChan(structChan chan types.Struct, sd types.StructDesc, comma rune, output io.Writer) {
fieldNames := getFieldNamesFromStruct(sd)
csvWriter := csv.NewWriter(output)
csvWriter.Comma = comma
if csvWriter.Write(fieldNames) != nil {
d.Panic("Failed to write header %v", fieldNames)
}
record := make([]string, len(fieldNames))
for s := range structChan {
i := 0
s.WalkValues(func(v types.Value) {
record[i] = fmt.Sprintf("%v", v)
i++
})
if csvWriter.Write(record) != nil {
d.Panic("Failed to write record %v", record)
}
}
csvWriter.Flush()
if csvWriter.Error() != nil {
d.Panic("error flushing csv")
}
}
// WriteList takes a types.List l of structs (described by sd) and writes it to output as comma-delimited values.
func WriteList(l types.List, sd types.StructDesc, comma rune, output io.Writer) {
structChan := make(chan types.Struct, 1024)
go func() {
l.IterAll(func(v types.Value, index uint64) {
structChan <- v.(types.Struct)
})
close(structChan)
}()
writeValuesFromChan(structChan, sd, comma, output)
}
func sendMapValuesToChan(m types.Map, structChan chan<- types.Struct) {
m.IterAll(func(k, v types.Value) {
if subMap, ok := v.(types.Map); ok {
sendMapValuesToChan(subMap, structChan)
} else {
structChan <- v.(types.Struct)
}
})
}
// WriteMap takes a types.Map m of structs (described by sd) and writes it to output as comma-delimited values.
func WriteMap(m types.Map, sd types.StructDesc, comma rune, output io.Writer) {
structChan := make(chan types.Struct, 1024)
go func() {
sendMapValuesToChan(m, structChan)
close(structChan)
}()
writeValuesFromChan(structChan, sd, comma, output)
}
func getFieldNamesFromStruct(structDesc types.StructDesc) (fieldNames []string) {
structDesc.IterFields(func(name string, t *types.Type, optional bool) {
if !types.IsPrimitiveKind(t.TargetKind()) {
d.Panic("Expected primitive kind, found %s", t.TargetKind().String())
}
fieldNames = append(fieldNames, name)
})
return
}
-151
@@ -1,151 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package csv
import (
"bytes"
"encoding/csv"
"fmt"
"io"
"io/ioutil"
"os"
"strings"
"testing"
"github.com/attic-labs/noms/go/chunks"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/types"
"github.com/attic-labs/noms/go/util/clienttest"
"github.com/stretchr/testify/suite"
)
const (
TEST_ROW_STRUCT_NAME = "row"
TEST_ROW_FIELDS = "anid,month,rainfall,year"
TEST_DATA_SIZE = 200
TEST_YEAR = 2012
)
func TestCSVWrite(t *testing.T) {
suite.Run(t, &csvWriteTestSuite{})
}
type csvWriteTestSuite struct {
clienttest.ClientTestSuite
fieldTypes []*types.Type
rowStructDesc types.StructDesc
comma rune
tmpFileName string
}
func typesToKinds(ts []*types.Type) KindSlice {
kinds := make(KindSlice, len(ts))
for i, t := range ts {
kinds[i] = t.TargetKind()
}
return kinds
}
func (s *csvWriteTestSuite) SetupTest() {
input, err := ioutil.TempFile(s.TempDir, "")
d.Chk.NoError(err)
s.tmpFileName = input.Name()
defer input.Close()
fieldNames := strings.Split(TEST_ROW_FIELDS, ",")
s.fieldTypes = []*types.Type{types.StringType, types.FloaTType, types.FloaTType, types.FloaTType}
fields := make([]types.StructField, len(fieldNames))
for i, name := range fieldNames {
fields[i] = types.StructField{
Name: name,
Type: s.fieldTypes[i],
}
}
rowStructType := types.MakeStructType(TEST_ROW_STRUCT_NAME, fields...)
s.rowStructDesc = rowStructType.Desc.(types.StructDesc)
s.comma, _ = StringToRune(",")
createCsvTestExpectationFile(input)
}
func (s *csvWriteTestSuite) TearDownTest() {
os.Remove(s.tmpFileName)
}
func createCsvTestExpectationFile(w io.Writer) {
_, err := io.WriteString(w, TEST_ROW_FIELDS)
d.Chk.NoError(err)
_, err = io.WriteString(w, "\n")
d.Chk.NoError(err)
for i := 0; i < TEST_DATA_SIZE; i++ {
_, err = io.WriteString(w, fmt.Sprintf("a - %3d,%d,%d,%d\n", i, i%12, i%32, TEST_YEAR+i%4))
d.Chk.NoError(err)
}
}
func startReadingCsvTestExpectationFile(s *csvWriteTestSuite) (cr *csv.Reader, headers []string) {
res, err := os.Open(s.tmpFileName)
d.PanicIfError(err)
cr = NewCSVReader(res, s.comma)
headers, err = cr.Read()
d.PanicIfError(err)
return
}
func createTestList(s *csvWriteTestSuite) types.List {
storage := &chunks.MemoryStorage{}
db := datas.NewDatabase(storage.NewView())
cr, headers := startReadingCsvTestExpectationFile(s)
l := ReadToList(cr, TEST_ROW_STRUCT_NAME, headers, typesToKinds(s.fieldTypes), db, LIMIT)
return l
}
func createTestMap(s *csvWriteTestSuite) types.Map {
storage := &chunks.MemoryStorage{}
db := datas.NewDatabase(storage.NewView())
cr, headers := startReadingCsvTestExpectationFile(s)
return ReadToMap(cr, TEST_ROW_STRUCT_NAME, headers, []string{"anid"}, typesToKinds(s.fieldTypes), db, LIMIT)
}
func createTestNestedMap(s *csvWriteTestSuite) types.Map {
storage := &chunks.MemoryStorage{}
db := datas.NewDatabase(storage.NewView())
cr, headers := startReadingCsvTestExpectationFile(s)
return ReadToMap(cr, TEST_ROW_STRUCT_NAME, headers, []string{"anid", "year"}, typesToKinds(s.fieldTypes), db, LIMIT)
}
func verifyOutput(s *csvWriteTestSuite, r io.Reader) {
res, err := os.Open(s.tmpFileName)
d.PanicIfError(err)
actual, err := ioutil.ReadAll(r)
d.Chk.NoError(err)
expected, err := ioutil.ReadAll(res)
d.Chk.NoError(err)
s.True(string(expected) == string(actual), "csv files are different")
}
func (s *csvWriteTestSuite) TestCSVWriteList() {
l := createTestList(s)
w := new(bytes.Buffer)
s.True(TEST_DATA_SIZE == l.Len(), "list length")
WriteList(l, s.rowStructDesc, s.comma, w)
verifyOutput(s, w)
}
func (s *csvWriteTestSuite) TestCSVWriteMap() {
m := createTestMap(s)
w := new(bytes.Buffer)
s.True(TEST_DATA_SIZE == m.Len(), "map length")
WriteMap(m, s.rowStructDesc, s.comma, w)
verifyOutput(s, w)
}
func (s *csvWriteTestSuite) TestCSVWriteNestedMap() {
m := createTestNestedMap(s)
w := new(bytes.Buffer)
s.True(TEST_DATA_SIZE == m.Len(), "nested map length")
WriteMap(m, s.rowStructDesc, s.comma, w)
verifyOutput(s, w)
}
-9
@@ -1,9 +0,0 @@
# About
This directory contains two sample applications that demonstrate using Noms in a decentralized environment.
Both applications implement multiuser chat, using different strategies.
`p2p-chat` is the simplest possible example: a fully local noms replica is run on each node, and all nodes synchronize continuously with each other over HTTP.
`ipfs-chat` backs Noms with the [IPFS](https://ipfs.io/) network, so that nodes don't have to keep a full local replica of all data. However, because [Filecoin](http://filecoin.io/) doesn't yet exist, *some node* does have to keep a full replica; ipfs-chat therefore provides a `daemon` mode that lets you run a persistent node somewhere as the replica of last resort.
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
-47
@@ -1,47 +0,0 @@
// Copyright 2017 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package dbg
import (
"fmt"
"github.com/attic-labs/noms/go/d"
"log"
"os"
"strconv"
)
var (
Filepath = "/tmp/noms-dbg.log"
lg = NewLogger(Filepath)
)
func NewLogger(fp string) *log.Logger {
f, err := os.OpenFile(fp, os.O_RDWR|os.O_CREATE|os.O_APPEND, 0644)
d.PanicIfError(err)
pid := strconv.FormatInt(int64(os.Getpid()), 10)
return log.New(f, pid+": ", 0)
}
func GetLogger() *log.Logger {
return lg
}
func SetLogger(newLg *log.Logger) {
lg = newLg
}
func Debug(s string, args ...interface{}) {
s1 := fmt.Sprintf(s, args...)
lg.Println(s1)
}
func BoxF(s string, args ...interface{}) func() {
s1 := fmt.Sprintf(s, args...)
Debug("starting %s", s1)
f := func() {
Debug("finished %s", s1)
}
return f
}
-189
@@ -1,189 +0,0 @@
// Copyright 2017 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"fmt"
"log"
"os"
"os/signal"
"runtime"
"syscall"
"github.com/attic-labs/noms/go/chunks"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/ipfs"
"github.com/attic-labs/noms/go/spec"
"github.com/attic-labs/noms/samples/go/decent/dbg"
"github.com/attic-labs/noms/samples/go/decent/lib"
"github.com/ipfs/go-ipfs/core"
"github.com/jroimartin/gocui"
"gopkg.in/alecthomas/kingpin.v2"
)
func main() {
// allow short (-h) help
kingpin.CommandLine.HelpFlag.Short('h')
clientCmd := kingpin.Command("client", "runs the ipfs-chat client UI")
clientTopic := clientCmd.Flag("topic", "IPFS pubsub topic to publish and subscribe to").Default("ipfs-chat").String()
username := clientCmd.Flag("username", "username to sign in as").String()
nodeIdx := clientCmd.Flag("node-idx", "a single digit to be used as last digit in all port values: api, gateway and swarm (must be 0-9 inclusive)").Default("-1").Int()
clientDS := clientCmd.Arg("dataset", "the dataset spec to store chat data in").Required().String()
importCmd := kingpin.Command("import", "imports data into a chat")
importDir := importCmd.Flag("dir", "directory that contains data to import").Default("./data").ExistingDir()
importDS := importCmd.Arg("dataset", "the dataset spec to import chat data to").Required().String()
daemonCmd := kingpin.Command("daemon", "runs a daemon that simulates filecoin, eagerly storing all chunks for a chat")
daemonTopic := daemonCmd.Flag("topic", "IPFS pubsub topic to publish and subscribe to").Default("ipfs-chat").String()
daemonInterval := daemonCmd.Flag("interval", "amount of time to wait before publishing state to network").Default("5s").Duration()
daemonNodeIdx := daemonCmd.Flag("node-idx", "a single digit to be used as last digit in all port values: api, gateway and swarm (must be 0-9 inclusive)").Default("-1").Int()
daemonDS := daemonCmd.Arg("dataset", "the dataset spec indicating ipfs repo to use").Required().String()
kingpin.CommandLine.Help = "A demonstration of using Noms to build a scalable multiuser collaborative application."
expandRLimit()
switch kingpin.Parse() {
case "client":
cInfo := lib.ClientInfo{
Topic: *clientTopic,
Username: *username,
Idx: *nodeIdx,
IsDaemon: false,
Delegate: lib.IPFSEventDelegate{},
}
runClient(*clientDS, cInfo)
case "import":
lib.RunImport(*importDir, *importDS)
case "daemon":
cInfo := lib.ClientInfo{
Topic: *daemonTopic,
Username: "daemon",
Interval: *daemonInterval,
Idx: *daemonNodeIdx,
IsDaemon: true,
Delegate: lib.IPFSEventDelegate{},
}
runDaemon(*daemonDS, cInfo)
}
}
func runClient(ipfsSpec string, cInfo lib.ClientInfo) {
dbg.SetLogger(lib.NewLogger(cInfo.Username))
sp, err := spec.ForDataset(ipfsSpec)
d.CheckErrorNoUsage(err)
if !isIPFS(sp.Protocol) {
fmt.Println("ipfs-chat requires an 'ipfs' dataset")
os.Exit(1)
}
node, cs := initIPFSChunkStore(sp, cInfo.Idx)
db := datas.NewDatabase(cs)
// Get the head of specified dataset.
ds := db.GetDataset(sp.Path.Dataset)
ds, err = lib.InitDatabase(ds)
d.PanicIfError(err)
events := make(chan lib.ChatEvent, 1024)
t := lib.CreateTermUI(events)
defer t.Close()
d.PanicIfError(t.Layout())
t.ResetAuthors(ds)
t.UpdateMessages(ds, nil, nil)
go lib.ProcessChatEvents(node, ds, events, t, cInfo)
go lib.ReceiveMessages(node, events, cInfo)
if err := t.Gui.MainLoop(); err != nil && err != gocui.ErrQuit {
dbg.Debug("mainloop has exited, err: %v", err)
log.Panicln(err)
}
}
func runDaemon(ipfsSpec string, cInfo lib.ClientInfo) {
dbg.SetLogger(log.New(os.Stdout, "", 0))
sp, err := spec.ForDataset(ipfsSpec)
d.CheckErrorNoUsage(err)
if !isIPFS(sp.Protocol) {
fmt.Println("ipfs-chat requires an 'ipfs' dataset")
os.Exit(1)
}
// Create/Open a new network chunkstore
node, cs := initIPFSChunkStore(sp, cInfo.Idx)
db := datas.NewDatabase(cs)
// Get the head of specified dataset.
ds := db.GetDataset(sp.Path.Dataset)
ds, err = lib.InitDatabase(ds)
d.PanicIfError(err)
events := make(chan lib.ChatEvent, 1024)
handleSIGQUIT(events)
go lib.ReceiveMessages(node, events, cInfo)
lib.ProcessChatEvents(node, ds, events, nil, cInfo)
}
func handleSIGQUIT(events chan<- lib.ChatEvent) {
sigChan := make(chan os.Signal, 1)
go func() {
for range sigChan {
stacktrace := make([]byte, 1024*1024)
length := runtime.Stack(stacktrace, true)
dbg.Debug(string(stacktrace[:length]))
events <- lib.ChatEvent{EventType: lib.QuitEvent}
}
}()
signal.Notify(sigChan, os.Interrupt)
signal.Notify(sigChan, syscall.SIGQUIT)
}
// IPFS can use a lot of file descriptors. There are several bugs in the IPFS
// repo about this and plans to improve. For the time being, we bump the limits
// for this process.
func expandRLimit() {
var rLimit syscall.Rlimit
err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rLimit)
d.Chk.NoError(err, "Unable to query file rlimit: %s", err)
if rLimit.Cur < rLimit.Max {
rLimit.Max = 64000
rLimit.Cur = 64000
err = syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rLimit)
d.Chk.NoError(err, "Unable to increase number of open files limit: %s", err)
}
err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rLimit)
d.Chk.NoError(err)
err = syscall.Getrlimit(8, &rLimit)
d.Chk.NoError(err, "Unable to query thread rlimit: %s", err)
if rLimit.Cur < rLimit.Max {
rLimit.Max = 64000
rLimit.Cur = 64000
err = syscall.Setrlimit(8, &rLimit)
d.Chk.NoError(err, "Unable to increase number of threads limit: %s", err)
}
err = syscall.Getrlimit(8, &rLimit)
d.Chk.NoError(err)
}
func initIPFSChunkStore(sp spec.Spec, nodeIdx int) (*core.IpfsNode, chunks.ChunkStore) {
// recreate database so that we can have control of chunkstore's ipfs node
node := ipfs.OpenIPFSRepo(sp.DatabaseName, nodeIdx)
cs := ipfs.ChunkStoreFromIPFSNode(sp.DatabaseName, sp.Protocol == "ipfs-local", node, 1)
return node, cs
}
func isIPFS(protocol string) bool {
return protocol == "ipfs" || protocol == "ipfs-local"
}
-67
@@ -1,67 +0,0 @@
// Copyright 2017 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package lib
import (
"fmt"
"strings"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/marshal"
"github.com/attic-labs/noms/go/types"
)
type dataPager struct {
dataset datas.Dataset
msgKeyChan chan types.String
doneChan chan struct{}
msgMap types.Map
terms []string
}
func NewDataPager(ds datas.Dataset, mkChan chan types.String, doneChan chan struct{}, msgs types.Map, terms []string) *dataPager {
return &dataPager{
dataset: ds,
msgKeyChan: mkChan,
doneChan: doneChan,
msgMap: msgs,
terms: terms,
}
}
func (dp *dataPager) Close() {
dp.doneChan <- struct{}{}
}
func (dp *dataPager) Next() (string, bool) {
msgKey := <-dp.msgKeyChan
if msgKey == "" {
return "", false
}
nm := dp.msgMap.Get(msgKey)
var m Message
err := marshal.Unmarshal(nm, &m)
if err != nil {
return fmt.Sprintf("ERROR: %s", err.Error()), true
}
s1 := fmt.Sprintf("%s: %s", m.Author, m.Body)
s2 := highlightTerms(s1, dp.terms)
return s2, true
}
func (dp *dataPager) Prepend(lines []string, target int) ([]string, bool) {
new := []string{}
m, ok := dp.Next()
if !ok {
return lines, false
}
for ; ok && len(new) < target; m, ok = dp.Next() {
new1 := strings.Split(m, "\n")
new = append(new1, new...)
}
return append(new, lines...), true
}
-281
@@ -1,281 +0,0 @@
// Copyright 2017 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package lib
import (
"context"
"fmt"
"time"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/hash"
"github.com/attic-labs/noms/go/ipfs"
"github.com/attic-labs/noms/go/merge"
"github.com/attic-labs/noms/go/spec"
"github.com/attic-labs/noms/go/types"
"github.com/attic-labs/noms/go/util/math"
"github.com/attic-labs/noms/samples/go/decent/dbg"
"github.com/ipfs/go-ipfs/core"
)
const (
InputEvent ChatEventType = "input"
SearchEvent ChatEventType = "search"
SyncEvent ChatEventType = "sync"
QuitEvent ChatEventType = "quit"
)
type ClientInfo struct {
Topic string
Username string
Interval time.Duration
Idx int
IsDaemon bool
Dir string
Spec spec.Spec
Delegate EventDelegate
}
type ChatEventType string
type ChatEvent struct {
EventType ChatEventType
Event string
}
type EventDelegate interface {
PinBlocks(node *core.IpfsNode, sourceDB, sinkDB datas.Database, sourceCommit types.Value)
SourceCommitFromMsgData(db datas.Database, msgData string) (datas.Database, types.Value)
HashFromMsgData(msgData string) (hash.Hash, error)
GenMessageData(cInfo ClientInfo, h hash.Hash) string
}
// ProcessChatEvents reads events from the event channel and processes them
// sequentially. If ClientInfo.IsDaemon is true, it also publishes the current
// head of the dataset continuously.
func ProcessChatEvents(node *core.IpfsNode, ds datas.Dataset, events chan ChatEvent, t *TermUI, cInfo ClientInfo) {
stopChan := make(chan struct{})
if cInfo.IsDaemon {
go func() {
tickChan := time.NewTicker(cInfo.Interval).C
for {
select {
case <-stopChan:
break
case <-tickChan:
Publish(node, cInfo, ds.HeadRef().TargetHash())
}
}
}()
}
for event := range events {
switch event.EventType {
case SyncEvent:
ds = processHash(t, node, ds, event.Event, cInfo)
Publish(node, cInfo, ds.HeadRef().TargetHash())
case InputEvent:
ds = processInput(t, node, ds, event.Event, cInfo)
Publish(node, cInfo, ds.HeadRef().TargetHash())
case SearchEvent:
processSearch(t, node, ds, event.Event, cInfo)
case QuitEvent:
dbg.Debug("QuitEvent received, stopping program")
stopChan <- struct{}{}
return
}
}
}
// processHash processes msgs published by other chat nodes and does the work to
// integrate new data into this node's local database and display it as needed.
func processHash(t *TermUI, node *core.IpfsNode, ds datas.Dataset, msgData string, cInfo ClientInfo) datas.Dataset {
h, err := cInfo.Delegate.HashFromMsgData(msgData)
d.PanicIfError(err)
defer dbg.BoxF("processHash, msgData: %s, hash: %s, cid: %s", msgData, h, ipfs.NomsHashToCID(h))()
sinkDB := ds.Database()
d.PanicIfFalse(ds.HasHead())
headRef := ds.HeadRef()
if h == headRef.TargetHash() {
dbg.Debug("received hash same as current head, nothing to do")
return ds
}
dbg.Debug("reading value for hash: %s", h)
sourceDB, sourceCommit := cInfo.Delegate.SourceCommitFromMsgData(sinkDB, msgData)
if sourceCommit == nil {
dbg.Debug("FAILED to read value for hash: %s", h)
return ds
}
sourceRef := types.NewRef(sourceCommit)
_, isP2P := cInfo.Delegate.(P2PEventDelegate)
if cInfo.IsDaemon || isP2P {
cInfo.Delegate.PinBlocks(node, sourceDB, sinkDB, sourceCommit)
}
dbg.Debug("Finding common ancestor for merge, sourceRef: %s, headRef: %s", sourceRef.TargetHash(), headRef.TargetHash())
a, ok := datas.FindCommonAncestor(sourceRef, headRef, sinkDB)
if !ok {
dbg.Debug("no common ancestor, cannot merge update!")
return ds
}
dbg.Debug("Checking if source commit is ancestor")
if a.Equals(sourceRef) {
dbg.Debug("source commit was ancestor, nothing to do")
return ds
}
if a.Equals(headRef) {
dbg.Debug("fast-forward to source commit")
ds, err := sinkDB.SetHead(ds, sourceRef)
d.Chk.NoError(err)
if !cInfo.IsDaemon {
t.UpdateMessagesFromSync(ds)
}
return ds
}
dbg.Debug("We have a mergeable commit")
left := ds.HeadValue()
right := sourceCommit.(types.Struct).Get("value")
parent := a.TargetValue(sinkDB).(types.Struct).Get("value")
dbg.Debug("Starting three-way commit")
merged, err := merge.ThreeWay(left, right, parent, sinkDB, nil, nil)
if err != nil {
dbg.Debug("could not merge received data: " + err.Error())
return ds
}
dbg.Debug("setting new datasetHead on localDB")
newCommit := datas.NewCommit(merged, types.NewSet(sinkDB, ds.HeadRef(), sourceRef), types.EmptyStruct)
commitRef := sinkDB.WriteValue(newCommit)
dbg.Debug("wrote new commit: %s", commitRef.TargetHash())
ds, err = sinkDB.SetHead(ds, commitRef)
if err != nil {
dbg.Debug("call to db.SetHead failed, err: %s", err)
}
dbg.Debug("set new head ref: %s on ds.ID: %s", commitRef.TargetHash(), ds.ID())
newH := ds.HeadRef().TargetHash()
dbg.Debug("merged commit, dataset: %s, head: %s, cid: %s", ds.ID(), newH, ipfs.NomsHashToCID(newH))
if cInfo.IsDaemon {
cInfo.Delegate.PinBlocks(node, sourceDB, sinkDB, newCommit)
} else {
t.UpdateMessagesFromSync(ds)
}
return ds
}
// processInput adds a new msg (entered through the UI) and updates its dataset.
func processInput(t *TermUI, node *core.IpfsNode, ds datas.Dataset, msg string, cInfo ClientInfo) datas.Dataset {
defer dbg.BoxF("processInput, msg: %s", msg)()
t.InSearch = false
if msg != "" {
var err error
ds, err = AddMessage(msg, cInfo.Username, time.Now(), ds)
d.PanicIfError(err)
}
t.UpdateMessagesAsync(ds, nil, nil)
return ds
}
// processSearch updates the UI to display search results.
func processSearch(t *TermUI, node *core.IpfsNode, ds datas.Dataset, terms string, cInfo ClientInfo) {
defer dbg.BoxF("processSearch")()
if terms == "" {
return
}
t.InSearch = true
searchTerms := TermsFromString(terms)
searchIds := SearchIndex(ds, searchTerms)
t.UpdateMessagesAsync(ds, &searchIds, searchTerms)
return
}
// pinBlocks recurses over the chunks reachable from 'h' and pins them to the IPFS repo.
func pinBlocks(node *core.IpfsNode, h hash.Hash, db datas.Database, depth, cnt int) (maxDepth, newCnt int) {
maxDepth, newCnt = depth, cnt
cid := ipfs.NomsHashToCID(h)
_, pinned, err := node.Pinning.IsPinned(cid)
d.Chk.NoError(err)
if pinned {
return
}
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
v := db.ReadValue(h)
d.Chk.NotNil(v)
v.WalkRefs(func(r types.Ref) {
var newDepth int
newDepth, newCnt = pinBlocks(node, r.TargetHash(), db, depth+1, newCnt)
maxDepth = math.MaxInt(newDepth, maxDepth)
})
n, err := node.DAG.Get(ctx, cid)
d.Chk.NoError(err)
err = node.Pinning.Pin(ctx, n, false)
d.Chk.NoError(err)
newCnt++
return
}
type IPFSEventDelegate struct{}
func (d IPFSEventDelegate) PinBlocks(node *core.IpfsNode, sourceDB, sinkDB datas.Database, sourceCommit types.Value) {
h := sourceCommit.Hash()
dbg.Debug("Starting pinBlocks")
depth, cnt := pinBlocks(node, h, sinkDB, 0, 0)
dbg.Debug("Finished pinBlocks, depth: %d, cnt: %d", depth, cnt)
node.Pinning.Flush()
}
func (d IPFSEventDelegate) SourceCommitFromMsgData(db datas.Database, msgData string) (datas.Database, types.Value) {
h := hash.Parse(msgData)
v := db.ReadValue(h)
return db, v
}
func (d IPFSEventDelegate) HashFromMsgData(msgData string) (hash.Hash, error) {
var err error
h, ok := hash.MaybeParse(msgData)
if !ok {
err = fmt.Errorf("Failed to parse hash from msgData: %s", msgData)
}
return h, err
}
func (d IPFSEventDelegate) GenMessageData(cInfo ClientInfo, h hash.Hash) string {
return h.String()
}
type P2PEventDelegate struct{}
func (d P2PEventDelegate) PinBlocks(node *core.IpfsNode, sourceDB, sinkDB datas.Database, sourceCommit types.Value) {
sourceRef := types.NewRef(sourceCommit)
datas.Pull(sourceDB, sinkDB, sourceRef, nil)
}
func (d P2PEventDelegate) SourceCommitFromMsgData(db datas.Database, msgData string) (datas.Database, types.Value) {
sp, _ := spec.ForPath(msgData)
v := sp.GetValue()
return sp.GetDatabase(), v
}
func (d P2PEventDelegate) HashFromMsgData(msgData string) (hash.Hash, error) {
sp, err := spec.ForPath(msgData)
return sp.Path.Hash, err
}
func (d P2PEventDelegate) GenMessageData(cInfo ClientInfo, h hash.Hash) string {
return fmt.Sprintf("%s::#%s", cInfo.Spec, h)
}
-164
@@ -1,164 +0,0 @@
// Copyright 2017 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package lib
import (
"errors"
"fmt"
"os"
"path/filepath"
"regexp"
"sort"
"strings"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/marshal"
"github.com/attic-labs/noms/go/merge"
"github.com/attic-labs/noms/go/spec"
"github.com/attic-labs/noms/go/types"
"github.com/attic-labs/noms/go/util/datetime"
"golang.org/x/net/html"
)
var (
character = ""
msgs = []Message{}
)
func RunImport(dir, dsSpec string) error {
filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
if path == dir {
return nil
}
if !strings.HasSuffix(info.Name(), ".html") {
return nil
}
fmt.Println("importing:", path)
f, err := os.Open(path)
d.Chk.NoError(err)
n, err := html.Parse(f)
d.Chk.NoError(err)
extractDialog(n)
return nil
})
if len(msgs) == 0 {
return errors.New("Failed to import any data")
}
fmt.Println("Imported", len(msgs), "messages")
sp, err := spec.ForDataset(dsSpec)
d.CheckErrorNoUsage(err)
ds := sp.GetDataset()
ds, err = InitDatabase(ds)
d.PanicIfError(err)
db := ds.Database()
fmt.Println("Creating msg map")
kvPairs := []types.Value{}
for _, msg := range msgs {
kvPairs = append(kvPairs, types.String(msg.ID()), marshal.MustMarshal(db, msg))
}
m := types.NewMap(db, kvPairs...)
fmt.Println("Creating index")
ti := NewTermIndex(db, types.NewMap(db)).Edit()
for _, msg := range msgs {
terms := GetTerms(msg)
ti.InsertAll(terms, types.String(msg.ID()))
}
termDocs := ti.Value().TermDocs
fmt.Println("Creating users")
users := topUsers(msgs)
fmt.Println("Docs:", termDocs.Len(), "Users:", len(users))
root := Root{Messages: m, Index: termDocs, Users: users}
nroot := marshal.MustMarshal(db, root)
if ds.HasHead() {
left := ds.HeadValue()
parent := marshal.MustMarshal(db, Root{
Index: types.NewMap(db),
Messages: types.NewMap(db),
})
fmt.Println("Merging data")
nroot, err = merge.ThreeWay(left, nroot, parent, db, nil, nil)
fmt.Println("Merging complete")
d.Chk.NoError(err)
}
fmt.Println("Committing data")
_, err = db.CommitValue(ds, nroot)
return err
}
func extractDialog(n *html.Node) {
if c := characterName(n); c != "" {
//fmt.Println("Character:", character)
character = c
return
}
if character != "" && n.Type == html.TextNode {
//fmt.Println("Dialog:", strings.TrimSpace(n.Data))
msg := Message{
Ordinal: uint64(len(msgs)),
Author: character,
Body: strings.TrimSpace(n.Data),
ClientTime: datetime.Now(),
}
msgs = append(msgs, msg)
character = ""
}
for c := n.FirstChild; c != nil; c = c.NextSibling {
extractDialog(c)
}
}
func characterName(n *html.Node) string {
if n.Type != html.ElementNode ||
n.Data != "b" ||
n.FirstChild == nil {
return ""
}
if hasSpaces, _ := regexp.MatchString(`^\s+[^\s]`, n.FirstChild.Data); !hasSpaces {
return ""
}
return strings.TrimSpace(n.FirstChild.Data)
}
type cpair struct {
character string
cnt int
}
func topUsers(msgs []Message) []string {
userpat := regexp.MustCompile(`^[a-zA-Z][a-zA-Z\s]*\d*$`)
usermap := map[string]int{}
for _, msg := range msgs {
name := strings.TrimSpace(msg.Author)
if userpat.MatchString(name) {
usermap[name] += 1
}
}
pairs := []cpair{}
for name, cnt := range usermap {
if len(name) > 1 && !strings.HasPrefix(name, "ANOTHER") {
pairs = append(pairs, cpair{character: strings.ToLower(name), cnt: cnt})
}
}
// sort descending by cnt
sort.Slice(pairs, func(i, j int) bool {
return pairs[j].cnt < pairs[i].cnt
})
users := []string{}
for i, p := range pairs {
if i >= 30 {
break
}
users = append(users, p.character)
}
sort.Strings(users)
return users
}
-21
@@ -1,21 +0,0 @@
// Copyright 2017 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package lib
import (
"fmt"
"log"
"os"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/samples/go/decent/dbg"
)
func NewLogger(username string) *log.Logger {
f, err := os.OpenFile(dbg.Filepath, os.O_RDWR|os.O_CREATE|os.O_APPEND, 0644)
d.PanicIfError(err)
prefix := fmt.Sprintf("%d-%s: ", os.Getpid(), username)
return log.New(f, prefix, 0)
}
-181
@@ -1,181 +0,0 @@
// Copyright 2017 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package lib
import (
"fmt"
"regexp"
"strings"
"time"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/marshal"
"github.com/attic-labs/noms/go/types"
"github.com/attic-labs/noms/go/util/datetime"
"github.com/attic-labs/noms/samples/go/decent/dbg"
)
type Root struct {
// Map<Key, Message>
// Keys are strings like: <Ordinal>,<Author>
// This scheme allows:
// - map is naturally sorted in the right order
// - conflicts will generally be avoided
// - messages are editable
Messages types.Map
Index types.Map
Users []string `noms:",set"`
}
type Message struct {
Ordinal uint64
Author string
Body string
ClientTime datetime.DateTime
}
func (m Message) ID() string {
return fmt.Sprintf("%020x/%s", m.ClientTime.UnixNano(), m.Author)
}
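`Message.ID` above relies on `%020x` zero-padding: because every timestamp renders at a fixed width, plain lexicographic comparison of keys matches chronological order, which is what keeps the Messages map naturally sorted. A tiny sketch of the property (the `id` helper is illustrative):

```go
package main

import "fmt"

// id mirrors the key scheme: zero-padded hex nanosecond timestamp,
// then the author, so string order equals time order.
func id(unixNano int64, author string) string {
	return fmt.Sprintf("%020x/%s", unixNano, author)
}

func main() {
	earlier := id(1500000000000000000, "ada")
	later := id(1500000000000000001, "bob")
	// Fixed-width padding makes string comparison chronological.
	fmt.Println(earlier < later)
}
```

Without the padding, a shorter (older) hex timestamp could sort after a longer (newer) one, breaking the map's ordering guarantee.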
func AddMessage(body string, author string, clientTime time.Time, ds datas.Dataset) (datas.Dataset, error) {
defer dbg.BoxF("AddMessage, body: %s", body)()
root, err := getRoot(ds)
if err != nil {
return datas.Dataset{}, err
}
db := ds.Database()
nm := Message{
Author: author,
Body: body,
ClientTime: datetime.DateTime{clientTime},
Ordinal: root.Messages.Len(),
}
root.Messages = root.Messages.Edit().Set(types.String(nm.ID()), marshal.MustMarshal(db, nm)).Map()
IndexNewMessage(db, &root, nm)
newRoot := marshal.MustMarshal(db, root)
ds, err = db.CommitValue(ds, newRoot)
return ds, err
}
func InitDatabase(ds datas.Dataset) (datas.Dataset, error) {
if ds.HasHead() {
return ds, nil
}
db := ds.Database()
root := Root{
Index: types.NewMap(db),
Messages: types.NewMap(db),
}
return db.CommitValue(ds, marshal.MustMarshal(db, root))
}
func GetAuthors(ds datas.Dataset) []string {
r, err := getRoot(ds)
d.PanicIfError(err)
return r.Users
}
func IndexNewMessage(vrw types.ValueReadWriter, root *Root, m Message) {
defer dbg.BoxF("IndexNewMessage")()
ti := NewTermIndex(vrw, root.Index)
id := types.String(m.ID())
root.Index = ti.Edit().InsertAll(GetTerms(m), id).Value().TermDocs
root.Users = append(root.Users, m.Author)
}
func SearchIndex(ds datas.Dataset, search []string) types.Map {
root, err := getRoot(ds)
d.PanicIfError(err)
idx := root.Index
ti := NewTermIndex(ds.Database(), idx)
ids := ti.Search(search)
dbg.Debug("search for: %s, returned: %d", strings.Join(search, " "), ids.Len())
return ids
}
var (
punctPat = regexp.MustCompile("[[:punct:]]+")
wsPat = regexp.MustCompile("\\s+")
)
func TermsFromString(s string) []string {
s1 := punctPat.ReplaceAllString(strings.TrimSpace(s), " ")
terms := wsPat.Split(s1, -1)
clean := []string{}
for _, t := range terms {
if t == "" {
continue
}
clean = append(clean, strings.ToLower(t))
}
return clean
}
func GetTerms(m Message) []string {
terms := TermsFromString(m.Body)
terms = append(terms, TermsFromString(m.Author)...)
return terms
}
func ListMessages(ds datas.Dataset, searchIds *types.Map, doneChan chan struct{}) (msgMap types.Map, mc chan types.String, err error) {
//dbg.Debug("##### listMessages: entered")
root, err := getRoot(ds)
db := ds.Database()
if err != nil {
return types.NewMap(db), nil, err
}
msgMap = root.Messages
mc = make(chan types.String)
done := false
go func() {
<-doneChan
done = true
<-mc
//dbg.Debug("##### listMessages: exiting 'done' goroutine")
}()
go func() {
keyMap := msgMap
if searchIds != nil {
keyMap = *searchIds
}
i := uint64(0)
for ; i < keyMap.Len() && !done; i++ {
key, _ := keyMap.At(keyMap.Len() - i - 1)
mc <- key.(types.String)
}
//dbg.Debug("##### listMessages: exiting 'for loop' goroutine, examined: %d", i)
close(mc)
}()
return
}
func getRoot(ds datas.Dataset) (Root, error) {
defer dbg.BoxF("getRoot")()
db := ds.Database()
root := Root{
Messages: types.NewMap(db),
Index: types.NewMap(db),
}
// TODO: It would be nice if Dataset.MaybeHeadValue() or HeadValue()
// would return just <value>, and it would be nil if not there, so you
// could chain calls.
if !ds.HasHead() {
return root, nil
}
err := marshal.Unmarshal(ds.HeadValue(), &root)
if err != nil {
return Root{}, err
}
return root, nil
}
@@ -1,69 +0,0 @@
// Copyright 2017 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package lib
import (
"testing"
"time"
"github.com/attic-labs/noms/go/chunks"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/marshal"
"github.com/attic-labs/noms/go/util/datetime"
"github.com/stretchr/testify/assert"
)
func TestBasics(t *testing.T) {
a := assert.New(t)
db := datas.NewDatabase(chunks.NewMemoryStoreFactory().CreateStore(""))
ds := db.GetDataset("foo")
ml, err := getAllMessages(ds)
a.NoError(err)
a.Equal(0, len(ml))
ds, err = AddMessage("body1", "aa", time.Unix(0, 0), ds)
a.NoError(err)
ml, err = getAllMessages(ds)
a.NoError(err)
expected := []Message{
Message{
Author: "aa",
Body: "body1",
ClientTime: datetime.DateTime{time.Unix(0, 0)},
Ordinal: 0,
},
}
a.Equal(expected, ml)
ds, err = AddMessage("body2", "bob", time.Unix(1, 0), ds)
a.NoError(err)
ml, err = getAllMessages(ds)
expected = append(
[]Message{
Message{
Author: "bob",
Body: "body2",
ClientTime: datetime.DateTime{time.Unix(1, 0)},
Ordinal: 1,
},
},
expected...,
)
a.NoError(err)
a.Equal(expected, ml)
}
func getAllMessages(ds datas.Dataset) (r []Message, err error) {
doneChan := make(chan struct{})
mm, keys, _ := ListMessages(ds, nil, doneChan)
for k := range keys {
mv := mm.Get(k)
var m Message
marshal.MustUnmarshal(mv, &m)
r = append(r, m)
}
doneChan <- struct{}{}
return r, nil
}
@@ -1,90 +0,0 @@
// Copyright 2017 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package lib
import (
"context"
"encoding/json"
"sync"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/hash"
"github.com/attic-labs/noms/samples/go/decent/dbg"
"github.com/ipfs/go-ipfs/core"
"github.com/jbenet/go-base58"
)
var (
PubsubUser = "default"
seenHash = map[hash.Hash]bool{}
seenHashMutex = sync.Mutex{}
)
func lockSeenF() func() {
seenHashMutex.Lock()
return func() {
seenHashMutex.Unlock()
}
}
// ReceiveMessages listens for messages sent by other chat nodes. It filters out
// any msgs that have already been received and adds events to the events channel
// for any msgs that it hasn't seen yet.
func ReceiveMessages(node *core.IpfsNode, events chan ChatEvent, cInfo ClientInfo) {
sub, err := node.Floodsub.Subscribe(cInfo.Topic)
d.Chk.NoError(err)
listenForAndHandleMessage := func() {
msg, err := sub.Next(context.Background())
d.PanicIfError(err)
sender := base58.Encode(msg.From)
msgMap := map[string]string{}
err = json.Unmarshal(msg.Data, &msgMap)
if err != nil {
dbg.Debug("ReceiveMessages: received non-json msg: %s from: %s, error: %s", msg.Data, sender, err)
return
}
msgData := msgMap["data"]
h, err := cInfo.Delegate.HashFromMsgData(msgData)
if err != nil {
dbg.Debug("ReceiveMessages: received unknown msg: %s from: %s", msgData, sender)
return
}
defer lockSeenF()()
if !seenHash[h] {
events <- ChatEvent{EventType: SyncEvent, Event: msgData}
seenHash[h] = true
dbg.Debug("got msgData: %s from: %s(%s)", msgData, sender, msgMap["user"])
}
}
dbg.Debug("start listening for msgs on channel: %s", cInfo.Topic)
for {
listenForAndHandleMessage()
}
panic("unreachable")
}
// Publish asks the delegate to format a hash/ClientInfo into a suitable msg
// and publishes that using IPFS pubsub.
func Publish(node *core.IpfsNode, cInfo ClientInfo, h hash.Hash) {
defer func() {
if r := recover(); r != nil {
dbg.Debug("Publish failed, error: %s", r)
}
}()
msgData := cInfo.Delegate.GenMessageData(cInfo, h)
m, err := json.Marshal(map[string]string{"user": cInfo.Username, "data": msgData})
d.PanicIfError(err)
dbg.Debug("publishing to topic: %s, msg: %s", cInfo.Topic, m)
node.Floodsub.Publish(cInfo.Topic, append(m, []byte("\r\n")...))
defer lockSeenF()()
seenHash[h] = true
}
@@ -1,120 +0,0 @@
// Copyright 2017 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package lib
import (
"sync"
"github.com/attic-labs/noms/go/types"
)
type TermIndex struct {
TermDocs types.Map
vrw types.ValueReadWriter
}
func NewTermIndex(vrw types.ValueReadWriter, TermDocs types.Map) TermIndex {
return TermIndex{TermDocs, vrw}
}
func (ti TermIndex) Edit() *TermIndexEditor {
return &TermIndexEditor{ti.TermDocs.Edit(), ti.vrw}
}
func (ti TermIndex) Search(terms []string) types.Map {
seen := make(map[string]struct{}, len(terms))
iters := make([]types.SetIterator, 0, len(terms))
wg := sync.WaitGroup{}
idx := 0
for _, t := range terms {
if _, ok := seen[t]; ok {
continue
}
seen[t] = struct{}{}
iters = append(iters, nil)
i := idx
t := t
wg.Add(1)
go func() {
ts := ti.TermDocs.Get(types.String(t))
if ts != nil {
iter := ts.(types.Set).Iterator()
iters[i] = iter
}
wg.Done()
}()
idx++
}
wg.Wait()
var si types.SetIterator
for _, iter := range iters {
if iter == nil {
return types.NewMap(ti.vrw) // at least one term had no hits
}
if si == nil {
si = iter // first iter
continue
}
si = types.NewIntersectionIterator(si, iter)
}
ch := make(chan types.Value)
rch := types.NewStreamingMap(ti.vrw, ch)
for next := si.Next(); next != nil; next = si.Next() {
ch <- next
ch <- types.Bool(true)
}
close(ch)
return <-rch
}
type TermIndexEditor struct {
terms *types.MapEditor
vrw types.ValueReadWriter
}
// Builds a new TermIndex
func (te *TermIndexEditor) Value() TermIndex {
return TermIndex{te.terms.Map(), te.vrw}
}
// Indexes |v| by |term|
func (te *TermIndexEditor) Insert(term string, v types.Value) *TermIndexEditor {
tv := types.String(term)
hitSet := te.terms.Get(tv)
if hitSet == nil {
hitSet = types.NewSet(te.vrw)
}
hsEd, ok := hitSet.(*types.SetEditor)
if !ok {
hsEd = hitSet.(types.Set).Edit()
te.terms.Set(tv, hsEd)
}
hsEd.Insert(v)
return te
}
// Indexes |v| by each unique term in |terms| (tolerates duplicate terms)
func (te *TermIndexEditor) InsertAll(terms []string, v types.Value) *TermIndexEditor {
visited := map[string]struct{}{}
for _, term := range terms {
if _, ok := visited[term]; ok {
continue
}
visited[term] = struct{}{}
te.Insert(term, v)
}
return te
}
// TODO: te.Remove
@@ -1,57 +0,0 @@
// Copyright 2017 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package lib
import (
"strings"
"testing"
"github.com/attic-labs/noms/go/chunks"
"github.com/attic-labs/noms/go/types"
"github.com/stretchr/testify/assert"
)
func TestRun(t *testing.T) {
a := assert.New(t)
storage := &chunks.MemoryStorage{}
vs := types.NewValueStore(storage.NewView())
defer vs.Close()
docs := []struct {
terms string
id int
}{
{"foo bar baz", 1},
{"foo baz", 2},
{"baz bat boo", 3},
}
indexEditor := NewTermIndex(vs, types.NewMap(vs)).Edit()
for _, doc := range docs {
indexEditor.InsertAll(strings.Split(doc.terms, " "), types.Float(doc.id))
}
index := indexEditor.Value()
getMap := func(keys ...int) types.Map {
m := types.NewMap(vs).Edit()
for _, k := range keys {
m.Set(types.Float(k), types.Bool(true))
}
return m.Map()
}
test := func(search string, expect types.Map) {
actual := index.Search(strings.Split(search, " "))
a.True(expect.Equals(actual))
}
test("foo", getMap(1, 2))
test("baz", getMap(1, 2, 3))
test("bar baz", getMap(1))
test("boo", getMap(3))
test("blarg", getMap())
}
@@ -1,356 +0,0 @@
// Copyright 2017 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package lib
import (
"fmt"
"regexp"
"runtime"
"strings"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/types"
"github.com/attic-labs/noms/go/util/math"
"github.com/attic-labs/noms/samples/go/decent/dbg"
"github.com/jroimartin/gocui"
)
const (
allViews = ""
usersView = "users"
messageView = "messages"
inputView = "input"
linestofetch = 50
searchPrefix = "/s"
quitPrefix = "/q"
)
type TermUI struct {
Gui *gocui.Gui
InSearch bool
lines []string
dp *dataPager
}
var (
viewNames = []string{usersView, messageView, inputView}
firstLayout = true
)
func CreateTermUI(events chan ChatEvent) *TermUI {
g, err := gocui.NewGui(gocui.Output256)
d.PanicIfError(err)
g.Highlight = true
g.SelFgColor = gocui.ColorGreen
g.Cursor = true
relayout := func(g *gocui.Gui) error {
return layout(g)
}
g.SetManagerFunc(relayout)
termUI := new(TermUI)
termUI.Gui = g
d.PanicIfError(g.SetKeybinding(allViews, gocui.KeyF1, gocui.ModNone, debugInfo(termUI)))
d.PanicIfError(g.SetKeybinding(allViews, gocui.KeyCtrlC, gocui.ModNone, quit))
d.PanicIfError(g.SetKeybinding(allViews, gocui.KeyCtrlC, gocui.ModAlt, quitWithStack))
d.PanicIfError(g.SetKeybinding(allViews, gocui.KeyTab, gocui.ModNone, nextView))
d.PanicIfError(g.SetKeybinding(messageView, gocui.KeyArrowUp, gocui.ModNone, arrowUp(termUI)))
d.PanicIfError(g.SetKeybinding(messageView, gocui.KeyArrowDown, gocui.ModNone, arrowDown(termUI)))
d.PanicIfError(g.SetKeybinding(inputView, gocui.KeyEnter, gocui.ModNone, func(g *gocui.Gui, v *gocui.View) (err error) {
defer func() {
v.Clear()
v.SetCursor(0, 0)
msgView, err := g.View(messageView)
d.PanicIfError(err)
msgView.Title = "messages"
msgView.Autoscroll = true
}()
buf := strings.TrimSpace(v.Buffer())
if strings.HasPrefix(buf, searchPrefix) {
events <- ChatEvent{EventType: SearchEvent, Event: strings.TrimSpace(buf[len(searchPrefix):])}
return
}
if strings.HasPrefix(buf, quitPrefix) {
err = gocui.ErrQuit
return
}
events <- ChatEvent{EventType: InputEvent, Event: buf}
return
}))
return termUI
}
func (t *TermUI) Close() {
dbg.Debug("Closing gui")
t.Gui.Close()
}
func (t *TermUI) UpdateMessagesFromSync(ds datas.Dataset) {
if t.InSearch || !t.textScrolledToEnd() {
t.Gui.Execute(func(g *gocui.Gui) (err error) {
updateViewTitle(g, messageView, "messages (NEW!)")
return
})
} else {
t.UpdateMessagesAsync(ds, nil, nil)
}
}
func (t *TermUI) Layout() error {
return layout(t.Gui)
}
func layout(g *gocui.Gui) error {
maxX, maxY := g.Size()
if v, err := g.SetView(usersView, 0, 0, 25, maxY-1); err != nil {
if err != gocui.ErrUnknownView {
return err
}
v.Title = usersView
v.Wrap = false
v.Editable = false
}
if v, err := g.SetView(messageView, 25, 0, maxX-1, maxY-2-1); err != nil {
if err != gocui.ErrUnknownView {
return err
}
v.Title = messageView
v.Editable = false
v.Wrap = true
v.Autoscroll = true
return nil
}
if v, err := g.SetView(inputView, 25, maxY-2-1, maxX-1, maxY-1); err != nil {
if err != gocui.ErrUnknownView {
return err
}
v.Wrap = true
v.Editable = true
v.Autoscroll = true
}
if firstLayout {
firstLayout = false
g.SetCurrentView(inputView)
dbg.Debug("started up")
}
return nil
}
func (t *TermUI) UpdateMessages(ds datas.Dataset, filterIds *types.Map, terms []string) error {
defer dbg.BoxF("updateMessages")()
t.ResetAuthors(ds)
v, err := t.Gui.View(messageView)
d.PanicIfError(err)
v.Clear()
t.lines = []string{}
v.SetOrigin(0, 0)
_, winHeight := v.Size()
if t.dp != nil {
t.dp.Close()
}
doneChan := make(chan struct{})
msgMap, msgKeyChan, err := ListMessages(ds, filterIds, doneChan)
d.PanicIfError(err)
t.dp = NewDataPager(ds, msgKeyChan, doneChan, msgMap, terms)
t.lines, _ = t.dp.Prepend(t.lines, math.MaxInt(linestofetch, winHeight+10))
for _, s := range t.lines {
fmt.Fprintf(v, "%s\n", s)
}
return nil
}
func (t *TermUI) ResetAuthors(ds datas.Dataset) {
v, err := t.Gui.View(usersView)
d.PanicIfError(err)
v.Clear()
for _, u := range GetAuthors(ds) {
fmt.Fprintln(v, u)
}
}
func (t *TermUI) UpdateMessagesAsync(ds datas.Dataset, sids *types.Map, terms []string) {
t.Gui.Execute(func(_ *gocui.Gui) error {
err := t.UpdateMessages(ds, sids, terms)
d.PanicIfError(err)
return nil
})
}
func (t *TermUI) scrollView(v *gocui.View, dy int) {
// Get the size and position of the view.
lineCnt := len(t.lines)
_, windowHeight := v.Size()
ox, oy := v.Origin()
cx, cy := v.Cursor()
// maxCy will either be the height of the screen - 1, or, in the case that
// there aren't enough lines to fill the screen, it will be
// lineCnt - origin
newCy := cy + dy
maxCy := math.MinInt(lineCnt-oy, windowHeight-1)
// If the newCy doesn't require scrolling, then just move the cursor.
if newCy >= 0 && newCy < maxCy {
v.MoveCursor(cx, dy, false)
return
}
// If the cursor is already at the bottom of the screen and there are no
// lines left to scroll up, then we're at the bottom.
if newCy >= maxCy && oy >= lineCnt-windowHeight {
// Set autoscroll to normal again.
v.Autoscroll = true
} else {
// The cursor is already at the bottom or top of the screen so scroll
// the text
v.Autoscroll = false
v.SetOrigin(ox, oy+dy)
}
}
func quit(_ *gocui.Gui, _ *gocui.View) error {
dbg.Debug("QUITTING #####")
return gocui.ErrQuit
}
func quitWithStack(_ *gocui.Gui, _ *gocui.View) error {
dbg.Debug("QUITTING WITH STACK")
stacktrace := make([]byte, 1024*1024)
length := runtime.Stack(stacktrace, true)
dbg.Debug(string(stacktrace[:length]))
return gocui.ErrQuit
}
func arrowUp(t *TermUI) func(*gocui.Gui, *gocui.View) error {
return func(_ *gocui.Gui, v *gocui.View) error {
lineCnt := len(t.lines)
ox, oy := v.Origin()
if oy == 0 {
var ok bool
t.lines, ok = t.dp.Prepend(t.lines, linestofetch)
if ok {
v.Clear()
for _, s := range t.lines {
fmt.Fprintf(v, "%s\n", s)
}
c1 := len(t.lines)
v.SetOrigin(ox, c1-lineCnt)
}
}
t.scrollView(v, -1)
return nil
}
}
func arrowDown(t *TermUI) func(*gocui.Gui, *gocui.View) error {
return func(_ *gocui.Gui, v *gocui.View) error {
t.scrollView(v, 1)
return nil
}
}
func debugInfo(t *TermUI) func(*gocui.Gui, *gocui.View) error {
return func(g *gocui.Gui, _ *gocui.View) error {
msgView, _ := g.View(messageView)
w, h := msgView.Size()
dbg.Debug("info, window size:(%d, %d), lineCnt: %d", w, h, len(t.lines))
cx, cy := msgView.Cursor()
ox, oy := msgView.Origin()
dbg.Debug("info, origin: (%d,%d), cursor: (%d,%d)", ox, oy, cx, cy)
dbg.Debug("info, view buffer:\n%s", highlightTerms(viewBuffer(msgView), t.dp.terms))
return nil
}
}
func viewBuffer(v *gocui.View) string {
buf := strings.TrimSpace(v.ViewBuffer())
if len(buf) > 0 && buf[len(buf)-1] != byte('\n') {
buf = buf + "\n"
}
return buf
}
func nextView(g *gocui.Gui, v *gocui.View) (err error) {
nextName := nextViewName(v.Name())
if _, err = g.SetCurrentView(nextName); err != nil {
return
}
_, err = g.SetViewOnTop(nextName)
return
}
func nextViewName(currentView string) string {
for i, viewname := range viewNames {
if currentView == viewname {
return viewNames[(i+1)%len(viewNames)]
}
}
return viewNames[0]
}
func (t *TermUI) textScrolledToEnd() bool {
v, err := t.Gui.View(messageView)
if err != nil {
// doubt this will ever happen, if it does just assume we're at bottom
return true
}
_, oy := v.Origin()
_, h := v.Size()
lc := len(t.lines)
dbg.Debug("textScrolledToEnd, oy: %d, h: %d, lc: %d, lc-oy: %d, res: %t", oy, h, lc, lc-oy, lc-oy <= h)
return lc-oy <= h
}
func updateViewTitle(g *gocui.Gui, viewname, title string) (err error) {
v, err := g.View(viewname)
if err != nil {
return
}
v.Title = title
return
}
var bgColors, fgColors = genColors()
func genColors() ([]string, []string) {
bg, fg := []string{}, []string{}
for i := 1; i <= 9; i++ {
// skip dark blue & white
if i != 4 && i != 7 {
bg = append(bg, fmt.Sprintf("\x1b[48;5;%dm\x1b[30m%%s\x1b[0m", i))
fg = append(fg, fmt.Sprintf("\x1b[38;5;%dm%%s\x1b[0m", i))
}
}
return bg, fg
}
func colorTerm(color int, s string, background bool) string {
c := fgColors[color]
if background {
c = bgColors[color]
}
return fmt.Sprintf(c, s)
}
func highlightTerms(s string, terms []string) string {
for i, t := range terms {
color := i % len(fgColors)
re := regexp.MustCompile(fmt.Sprintf("(?i)%s", regexp.QuoteMeta(t)))
s = re.ReplaceAllStringFunc(s, func(s string) string {
return colorTerm(color, s, false)
})
}
return s
}
@@ -1,10 +0,0 @@
This demo application is the simplest p2p chat app you could build using Noms.
Basic idea:
- Every node runs a Noms HTTP server (port controlled by the --port flag)
- Every node broadcasts its current commit and IP/port continuously
- Every node continuously sync/merges with every other node
(note that due to content addressing, most of these syncs will immediately exit)
@@ -1,144 +0,0 @@
// Copyright 2017 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"fmt"
"log"
"net"
"os"
"os/signal"
"path"
"syscall"
"github.com/attic-labs/noms/go/config"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/ipfs"
"github.com/attic-labs/noms/go/spec"
"github.com/attic-labs/noms/go/util/profile"
"github.com/attic-labs/noms/samples/go/decent/dbg"
"github.com/attic-labs/noms/samples/go/decent/lib"
"github.com/jroimartin/gocui"
"gopkg.in/alecthomas/kingpin.v2"
)
func main() {
// allow short (-h) help
kingpin.CommandLine.HelpFlag.Short('h')
clientCmd := kingpin.Command("client", "runs the ipfs-chat client UI")
clientTopic := clientCmd.Flag("topic", "IPFS pubsub topic to publish and subscribe to").Default("noms-chat-p2p").String()
username := clientCmd.Flag("username", "username to sign in as").Required().String()
nodeIdx := clientCmd.Flag("node-idx", "a single digit to be used as last digit in all port values: api, gateway and swarm (must be 0-9 inclusive)").Default("-1").Int()
clientDir := clientCmd.Arg("path", "local directory to store data in").Required().ExistingDir()
importCmd := kingpin.Command("import", "imports data into a chat")
importSrc := importCmd.Flag("dir", "directory that contains data to import").Default("../data").ExistingDir()
importDir := importCmd.Arg("path", "local directory to store data in").Required().ExistingDir()
kingpin.CommandLine.Help = "A demonstration of using Noms to build a scalable multiuser collaborative application."
switch kingpin.Parse() {
case "client":
cInfo := lib.ClientInfo{
Topic: *clientTopic,
Username: *username,
Idx: *nodeIdx,
IsDaemon: false,
Dir: *clientDir,
Delegate: lib.P2PEventDelegate{},
}
runClient(cInfo)
case "import":
err := lib.RunImport(*importSrc, fmt.Sprintf("%s/noms::chat", *importDir))
d.PanicIfError(err)
}
}
func runClient(cInfo lib.ClientInfo) {
dbg.SetLogger(lib.NewLogger(cInfo.Username))
var err error
httpPort := 8000 + cInfo.Idx
sp, err := spec.ForDatabase(fmt.Sprintf("http://%s:%d", getIP(), httpPort))
d.PanicIfError(err)
cInfo.Spec = sp
<-runServer(path.Join(cInfo.Dir, "noms"), httpPort)
db := cInfo.Spec.GetDatabase()
ds := db.GetDataset("chat")
ds, err = lib.InitDatabase(ds)
d.PanicIfError(err)
node := ipfs.OpenIPFSRepo(path.Join(cInfo.Dir, "ipfs"), cInfo.Idx)
events := make(chan lib.ChatEvent, 1024)
t := lib.CreateTermUI(events)
defer t.Close()
d.PanicIfError(t.Layout())
t.ResetAuthors(ds)
t.UpdateMessages(ds, nil, nil)
go lib.ProcessChatEvents(node, ds, events, t, cInfo)
go lib.ReceiveMessages(node, events, cInfo)
if err := t.Gui.MainLoop(); err != nil && err != gocui.ErrQuit {
dbg.Debug("mainloop has exited, err: %v", err)
log.Panicln(err)
}
}
func getIP() string {
ifaces, err := net.Interfaces()
d.PanicIfError(err)
for _, i := range ifaces {
addrs, err := i.Addrs()
d.PanicIfError(err)
for _, addr := range addrs {
switch v := addr.(type) {
case *net.IPNet:
if !v.IP.IsLoopback() {
ip := v.IP.To4()
if ip != nil {
return v.IP.String()
}
}
}
}
}
d.Panic("notreached")
return ""
}
func runServer(atPath string, port int) (ready chan struct{}) {
ready = make(chan struct{})
_ = os.Mkdir(atPath, 0755)
cfg := config.NewResolver()
cs, err := cfg.GetChunkStore(atPath)
d.CheckError(err)
server := datas.NewRemoteDatabaseServer(cs, port)
server.Ready = func() {
ready <- struct{}{}
}
// Shutdown server gracefully so that profile may be written
c := make(chan os.Signal, 1)
signal.Notify(c, os.Interrupt)
signal.Notify(c, syscall.SIGTERM)
go func() {
<-c
server.Stop()
}()
go func() {
d.Try(func() {
defer profile.MaybeStartProfile().Stop()
server.Run()
})
}()
return
}
@@ -1 +0,0 @@
hr
@@ -1,12 +0,0 @@
# HR
This is a small command-line application that manages a very simple, hypothetical HR database.
## Usage
```shell
go build
./hr --ds /tmp/my-noms::hr add-person 42 Abigail Architect
./hr --ds /tmp/my-noms::hr add-person 43 Samuel "Chief Laser Operator"
./hr --ds /tmp/my-noms::hr list-persons
```
@@ -1,8 +0,0 @@
#!/bin/sh
if [ -d test-data ]; then
mv test-data test-data.bak
fi
./hr --ds test-data::hr add-person 7 "Aaron Boodman" "Chief Evangelism Officer"
./hr --ds test-data::hr add-person 13 "Samuel Boodman" "VP, Culture"
@@ -1,117 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"fmt"
"os"
"strconv"
"github.com/attic-labs/noms/go/config"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/marshal"
"github.com/attic-labs/noms/go/types"
"github.com/attic-labs/noms/go/util/verbose"
flag "github.com/juju/gnuflag"
)
func main() {
var dsStr = flag.String("ds", "", "noms dataset to read/write from")
flag.Usage = func() {
fmt.Fprintf(os.Stderr, "Usage: %s [flags] [command] [command-args]\n\n", os.Args[0])
fmt.Fprintln(os.Stderr, "Flags:")
flag.PrintDefaults()
fmt.Fprintln(os.Stderr, "\nCommands:")
fmt.Fprintln(os.Stderr, "\tadd-person <id> <name> <title>")
fmt.Fprintln(os.Stderr, "\tlist-persons")
}
verbose.RegisterVerboseFlags(flag.CommandLine)
flag.Parse(true)
if flag.NArg() == 0 {
fmt.Fprintln(os.Stderr, "Not enough arguments")
return
}
if *dsStr == "" {
fmt.Fprintln(os.Stderr, "Required flag '--ds' not set")
return
}
cfg := config.NewResolver()
db, ds, err := cfg.GetDataset(*dsStr)
if err != nil {
fmt.Fprintf(os.Stderr, "Could not create dataset: %s\n", err)
return
}
defer db.Close()
switch flag.Arg(0) {
case "add-person":
addPerson(db, ds)
case "list-persons":
listPersons(ds)
default:
fmt.Fprintf(os.Stderr, "Unknown command: %s\n", flag.Arg(0))
}
}
type Person struct {
Name, Title string
Id uint64
}
func addPerson(db datas.Database, ds datas.Dataset) {
if flag.NArg() != 4 {
fmt.Fprintln(os.Stderr, "Not enough arguments for command add-person")
return
}
id, err := strconv.ParseUint(flag.Arg(1), 10, 64)
if err != nil {
fmt.Fprintf(os.Stderr, "Invalid person-id: %s", flag.Arg(1))
return
}
np, err := marshal.Marshal(db, Person{flag.Arg(2), flag.Arg(3), id})
if err != nil {
fmt.Fprintln(os.Stderr, err)
return
}
_, err = db.CommitValue(ds, getPersons(ds).Edit().Set(types.Float(id), np).Map())
if err != nil {
fmt.Fprintf(os.Stderr, "Error committing: %s\n", err)
return
}
}
func listPersons(ds datas.Dataset) {
d := getPersons(ds)
if d.Empty() {
fmt.Println("No people found")
return
}
d.IterAll(func(k, v types.Value) {
var p Person
err := marshal.Unmarshal(v, &p)
if err != nil {
fmt.Fprintln(os.Stderr, err)
return
}
fmt.Printf("%s (id: %d, title: %s)\n", p.Name, p.Id, p.Title)
})
}
func getPersons(ds datas.Dataset) types.Map {
hv, ok := ds.MaybeHeadValue()
if ok {
return hv.(types.Map)
}
return types.NewMap(ds.Database())
}
@@ -1,61 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"path"
"runtime"
"testing"
"github.com/attic-labs/noms/go/spec"
"github.com/attic-labs/noms/go/util/clienttest"
"github.com/stretchr/testify/suite"
)
func TestBasics(t *testing.T) {
suite.Run(t, &testSuite{})
}
type testSuite struct {
clienttest.ClientTestSuite
}
func (s *testSuite) TestRoundTrip() {
spec := spec.CreateValueSpecString("nbs", s.DBDir, "hr")
stdout, stderr := s.MustRun(main, []string{"--ds", spec, "list-persons"})
s.Equal("No people found\n", stdout)
s.Equal("", stderr)
stdout, stderr = s.MustRun(main, []string{"--ds", spec, "add-person", "42", "Benjamin Kalman", "Programmer, Barista"})
s.Equal("", stdout)
s.Equal("", stderr)
stdout, stderr = s.MustRun(main, []string{"--ds", spec, "add-person", "43", "Abigail Boodman", "Chief Architect"})
s.Equal("", stdout)
s.Equal("", stderr)
stdout, stderr = s.MustRun(main, []string{"--ds", spec, "list-persons"})
s.Equal(`Benjamin Kalman (id: 42, title: Programmer, Barista)
Abigail Boodman (id: 43, title: Chief Architect)
`, stdout)
s.Equal("", stderr)
}
func (s *testSuite) TestReadCanned() {
_, p, _, _ := runtime.Caller(0)
p = path.Join(path.Dir(p), "test-data")
stdout, stderr := s.MustRun(main, []string{"--ds", spec.CreateValueSpecString("nbs", p, "hr"), "list-persons"})
s.Equal(`Aaron Boodman (id: 7, title: Chief Evangelism Officer)
Samuel Boodman (id: 13, title: VP, Culture)
`, stdout)
s.Equal("", stderr)
}
func (s *testSuite) TestInvalidDatasetSpec() {
// Should not crash
_, _ = s.MustRun(main, []string{"--ds", "invalid-dataset", "list-persons"})
}
@@ -1 +0,0 @@
4:7.18:8s92pdafhd4hkhav6r4748u1rjlosh1k:5b1e9knhol2orv0a8ej6tvelc46jp92l:bsvid54jt8pjto211lcdl14tbfd39jmn:2:998se5i5mf15fld7f318818i6ie0c8rr:2
@@ -1 +0,0 @@
json-import
@@ -1,98 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"encoding/json"
"errors"
"fmt"
"io"
"log"
"net/http"
"os"
"strings"
"time"
"github.com/attic-labs/noms/go/config"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/spec"
"github.com/attic-labs/noms/go/util/jsontonoms"
"github.com/attic-labs/noms/go/util/progressreader"
"github.com/attic-labs/noms/go/util/status"
"github.com/attic-labs/noms/go/util/verbose"
"github.com/dustin/go-humanize"
flag "github.com/juju/gnuflag"
)
func main() {
performCommit := flag.Bool("commit", true, "commit the data to head of the dataset (otherwise only write the data to the dataset)")
flag.Usage = func() {
fmt.Fprintf(os.Stderr, "usage: %s <url> <dataset>\n", os.Args[0])
flag.PrintDefaults()
}
spec.RegisterCommitMetaFlags(flag.CommandLine)
verbose.RegisterVerboseFlags(flag.CommandLine)
flag.Parse(true)
if len(flag.Args()) != 2 {
d.CheckError(errors.New("expected url and dataset flags"))
}
cfg := config.NewResolver()
db, ds, err := cfg.GetDataset(flag.Arg(1))
d.CheckError(err)
defer db.Close()
url := flag.Arg(0)
if url == "" {
flag.Usage()
}
var r io.Reader
if strings.HasPrefix(url, "http") {
res, err := http.Get(url)
if err != nil {
log.Fatalf("Error fetching %s: %+v\n", url, err)
} else if res.StatusCode != 200 {
log.Fatalf("Error fetching %s: %s\n", url, res.Status)
}
defer res.Body.Close()
r = res.Body
} else {
// assume it's a file
f, err := os.Open(url)
if err != nil {
log.Fatalf("Invalid URL %s - does not start with 'http' and isn't local file either. fopen error: %s", url, err)
}
r = f
}
var jsonObject interface{}
start := time.Now()
r = progressreader.New(r, func(seen uint64) {
elapsed := time.Since(start).Seconds()
rate := uint64(float64(seen) / elapsed)
status.Printf("%s decoded in %ds (%s/s)...", humanize.Bytes(seen), int(elapsed), humanize.Bytes(rate))
})
err = json.NewDecoder(r).Decode(&jsonObject)
if err != nil {
log.Fatalln("Error decoding JSON: ", err)
}
status.Done()
if *performCommit {
additionalMetaInfo := map[string]string{"url": url}
meta, err := spec.CreateCommitMetaStruct(ds.Database(), "", "", additionalMetaInfo, nil)
d.CheckErrorNoUsage(err)
_, err = db.Commit(ds, jsontonoms.NomsValueFromDecodedJSON(db, jsonObject, true), datas.CommitOptions{Meta: meta})
d.PanicIfError(err)
} else {
ref := db.WriteValue(jsontonoms.NomsValueFromDecodedJSON(db, jsonObject, true))
fmt.Fprintf(os.Stdout, "#%s\n", ref.TargetHash().String())
}
}
@@ -1,55 +0,0 @@
# Nomdex
Nomdex demonstrates how Noms maps can be used to index values in a database and provides a simple query language to search for objects.
## Description
This program experiments with using ordinary Noms Maps as indexes. It leverages the fact that Maps in Noms are implemented by prolly-trees, which are similar to B-Trees in many important ways that make them ideal for use as indexes: they are balanced, sorted, require relatively few accesses to find any leaf node, and are efficient to update.
### Building Indexes
Nomdex constructs indexes as Maps that are keyed by either Strings or Numbers. The values in the index are sets of objects. The following command can be used to build an index:
```shell
nomdex up --in-path <absolute noms path> --by <relative noms path> --out-ds <dataset name>
```
The ***'in-path'*** argument must be a ValueSpec (see [Spelling In Noms](../../../doc/spelling.md#spelling-values)) that designates the root of an object hierarchy to be scanned for "indexable" objects.
The ***'by'*** argument must be a relative path. Nomdex traverses every value reachable from 'in-path' and attempts to resolve this relative ***'by'*** path from it. Any value whose ***'by'*** path resolves to a String, Number, or Bool is indexed using the resolved value as its key.
The ***'out-ds'*** argument specifies a dataset name that will be used to store the new index.
In addition, there are arguments that allow values to be transformed with regular expressions before they are used as keys in the index. Consult the help text and code to see how those can be used.
### Queries in Nomdex
Once an index is built, it can be queried against using the nomdex find command. For example, given a database that contains structs of the following type representing cities:
```go
struct Row {
City: String,
State: String,
GeoPos: struct {
Latitude: Number,
Longitude: Number,
}
}
```
The following commands could be used to build indexes on the City, State, Latitude, and Longitude attributes.
```shell
nomdex up --in-path http://localhost:8000::cities --by .City --out-ds by-name
nomdex up --in-path http://localhost:8000::cities --by .State --out-ds by-state
nomdex up --in-path http://localhost:8000::cities --by .GeoPos.Latitude --out-ds by-lat
nomdex up --in-path http://localhost:8000::cities --by .GeoPos.Longitude --out-ds by-lon
```
Once these indexes are created, the following queries could be made using the find command:
```shell
// find all cities in California
nomdex find --db http://localhost:8000 'by-state = "California"'
// find all cities whose name begins with A, B, or C
nomdex find --db http://localhost:8000 'by-name >= "A" and by-name < "D"'
// Find all tropical cities whose name begins with A, B, or C
nomdex find --db http://localhost:8000 '(by-name >= "A" and by-name < "D") and (by-lat >= -23.5 and by-lat <= 23.5)'
```
The nomdex query language is simple: it consists of comparison expressions of the form '*indexName comparisonOperator constantValue*'. An index name is the dataset name given as the ***'out-ds'*** argument to the *up* command. Comparison operators can be one of: <, <=, >, >=, =, !=. Constants are either String values, which are quoted: "hi, I'm a string constant", or Numbers, which consist of digits with an optional decimal point and minus sign: 1, -1, 2.3, -3.2.
In addition, comparison expressions can be combined using "and" and "or". Parentheses can, and should, be used to make the order of evaluation explicit.
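Combining two comparison expressions on the same index with "and" amounts to intersecting key ranges. The following is a minimal sketch of that idea in plain Go (illustrative names, numeric half-open intervals only; the real implementation also tracks open/closed bounds and unbounded ranges):

```go
package main

import "fmt"

// interval is a half-open numeric key range [lo, hi).
type interval struct {
	lo, hi float64
}

// intersect computes the overlap of two intervals, i.e. the range of index
// keys that satisfies both comparison expressions of an 'and'. The second
// return value is false when the ranges do not overlap.
func intersect(a, b interval) (interval, bool) {
	lo, hi := a.lo, a.hi
	if b.lo > lo {
		lo = b.lo
	}
	if b.hi < hi {
		hi = b.hi
	}
	if lo >= hi {
		return interval{}, false
	}
	return interval{lo, hi}, true
}

func main() {
	// 'by-lat >= -23.5 and by-lat <= 23.5' combined with 'by-lat >= 0':
	r, ok := intersect(interval{-23.5, 23.5}, interval{0, 90})
	fmt.Println(r, ok) // prints: {0 23.5} true
}
```

An "or" is handled analogously by taking the union of ranges, which may produce more than one disjoint interval.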
Note: nomdex is not a complete query system. Its purpose is only to illustrate that Noms maps have all the properties necessary to be used as indexes. A complete query system would have many additional features and the ability to optimize queries intelligently.
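The point of the note above can be sketched with ordinary Go: a map from key to set of values, scanned in sorted key order, is all a range query needs. This is plain Go with illustrative names, not the Noms API:

```go
package main

import (
	"fmt"
	"sort"
)

// rangeQuery answers a query like 'by-name >= "A" and by-name < "D"' against
// a toy index mapping each key to the values filed under it. Scanning the
// keys in sorted order is what makes range predicates cheap.
func rangeQuery(index map[string][]string, lower, upper string) []string {
	keys := make([]string, 0, len(index))
	for k := range index {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var out []string
	for _, k := range keys {
		if k >= lower && k < upper {
			out = append(out, index[k]...)
		}
	}
	return out
}

func main() {
	byName := map[string][]string{
		"Austin":  {"city:Austin"},
		"Boston":  {"city:Boston"},
		"Chicago": {"city:Chicago"},
		"Denver":  {"city:Denver"},
	}
	fmt.Println(rangeQuery(byName, "A", "D"))
}
```

A Noms map gives the same sorted-iteration property, plus structural sharing and content addressing, which is why it can serve directly as an index.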
-205
View File
@@ -1,205 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"bytes"
"fmt"
"io"
"sort"
"github.com/attic-labs/noms/go/types"
)
type expr interface {
ranges() queryRangeSlice
dbgPrintTree(w io.Writer, level int)
indexName() string
iterator(im *indexManager) types.SetIterator
}
// logExpr represents a logical 'and' or 'or' expression between two other expressions.
// e.g. logExpr would represent the and/or expressions in this query:
// (index1 > 0 and index1 < 9) or (index1 > 100 and index < 109)
type logExpr struct {
op boolOp
expr1 expr
expr2 expr
idxName string
}
type compExpr struct {
idxName string
op compOp
v1 types.Value
}
func (le logExpr) indexName() string {
return le.idxName
}
func (le logExpr) iterator(im *indexManager) types.SetIterator {
if le.idxName != "" {
return unionizeIters(iteratorsFromRanges(im.indexes[le.idxName], le.ranges()))
}
i1 := le.expr1.iterator(im)
i2 := le.expr2.iterator(im)
var iter types.SetIterator
switch le.op {
case and:
if i1 == nil || i2 == nil {
return nil
}
iter = types.NewIntersectionIterator(i1, i2)
case or:
if i1 == nil {
return i2
}
if i2 == nil {
return i1
}
iter = types.NewUnionIterator(i1, i2)
}
return iter
}
func (le logExpr) ranges() (ranges queryRangeSlice) {
rslice1 := le.expr1.ranges()
rslice2 := le.expr2.ranges()
rslice := queryRangeSlice{}
switch le.op {
case and:
if len(rslice1) == 0 || len(rslice2) == 0 {
return rslice
}
for _, r1 := range rslice1 {
for _, r2 := range rslice2 {
rslice = append(rslice, r1.and(r2)...)
}
}
sort.Sort(rslice)
return rslice
case or:
if len(rslice1) == 0 {
return rslice2
}
if len(rslice2) == 0 {
return rslice1
}
for _, r1 := range rslice1 {
for _, r2 := range rslice2 {
rslice = append(rslice, r1.or(r2)...)
}
}
sort.Sort(rslice)
return rslice
}
return queryRangeSlice{}
}
func (le logExpr) dbgPrintTree(w io.Writer, level int) {
fmt.Fprintf(w, "%*s%s\n", 2*level, "", le.op)
if le.expr1 != nil {
le.expr1.dbgPrintTree(w, level+1)
}
if le.expr2 != nil {
le.expr2.dbgPrintTree(w, level+1)
}
}
func (re compExpr) indexName() string {
return re.idxName
}
func iteratorsFromRange(index types.Map, rd queryRange) []types.SetIterator {
first := true
iterators := []types.SetIterator{}
index.IterFrom(rd.lower.value, func(k, v types.Value) bool {
if first && rd.lower.value != nil && !rd.lower.include && rd.lower.value.Equals(k) {
return false
}
if rd.upper.value != nil {
if !rd.upper.include && rd.upper.value.Equals(k) {
return true
}
if rd.upper.value.Less(k) {
return true
}
}
s := v.(types.Set)
iterators = append(iterators, s.Iterator())
return false
})
return iterators
}
func iteratorsFromRanges(index types.Map, ranges queryRangeSlice) []types.SetIterator {
iterators := []types.SetIterator{}
for _, r := range ranges {
iterators = append(iterators, iteratorsFromRange(index, r)...)
}
return iterators
}
func unionizeIters(iters []types.SetIterator) types.SetIterator {
if len(iters) == 0 {
return nil
}
if len(iters) <= 1 {
return iters[0]
}
unionIters := []types.SetIterator{}
var iter0 types.SetIterator
for i, iter := range iters {
if i%2 == 0 {
iter0 = iter
} else {
unionIters = append(unionIters, types.NewUnionIterator(iter0, iter))
iter0 = nil
}
}
if iter0 != nil {
unionIters = append(unionIters, iter0)
}
return unionizeIters(unionIters)
}
func (re compExpr) iterator(im *indexManager) types.SetIterator {
index := im.indexes[re.idxName]
iters := iteratorsFromRanges(index, re.ranges())
return unionizeIters(iters)
}
func (re compExpr) ranges() (ranges queryRangeSlice) {
var r queryRange
switch re.op {
case equals:
e := bound{value: re.v1, include: true}
r = queryRange{lower: e, upper: e}
case gt:
r = queryRange{lower: bound{re.v1, false, 0}, upper: bound{nil, true, 1}}
case gte:
r = queryRange{lower: bound{re.v1, true, 0}, upper: bound{nil, true, 1}}
case lt:
r = queryRange{lower: bound{nil, true, -1}, upper: bound{re.v1, false, 0}}
case lte:
r = queryRange{lower: bound{nil, true, -1}, upper: bound{re.v1, true, 0}}
case ne:
return queryRangeSlice{
{lower: bound{nil, true, -1}, upper: bound{re.v1, false, 0}},
{lower: bound{re.v1, false, 0}, upper: bound{nil, true, 1}},
}
}
return queryRangeSlice{r}
}
func (re compExpr) dbgPrintTree(w io.Writer, level int) {
buf := bytes.Buffer{}
types.WriteEncodedValue(&buf, re.v1)
fmt.Fprintf(w, "%*s%s %s %s\n", 2*level, "", re.idxName, re.op, buf.String())
}
-83
View File
@@ -1,83 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"fmt"
"os"
"path"
"github.com/attic-labs/noms/cmd/util"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/util/exit"
flag "github.com/juju/gnuflag"
)
var commands = []*util.Command{
update,
find,
}
var usageLine = `Nomdex builds indexes to support fast data access.`
func main() {
progName := path.Base(os.Args[0])
util.InitHelp(progName, commands, usageLine)
flag.Usage = util.Usage
flag.Parse(false)
args := flag.Args()
if len(args) < 1 {
util.Usage()
return
}
if args[0] == "help" {
util.Help(args[1:])
return
}
for _, cmd := range commands {
if cmd.Name() == args[0] {
flags := cmd.Flags()
flags.Usage = cmd.Usage
flags.Parse(true, args[1:])
args = flags.Args()
if cmd.Nargs != 0 && len(args) < cmd.Nargs {
cmd.Usage()
}
exitCode := cmd.Run(args)
if exitCode != 0 {
exit.Exit(exitCode)
}
return
}
}
fmt.Fprintf(os.Stderr, "nomdex: unknown command %q\n", args[0])
util.Usage()
}
func printError(err error, msgAndArgs ...interface{}) bool {
if err != nil {
err := d.Unwrap(err)
switch len(msgAndArgs) {
case 0:
fmt.Fprintf(os.Stderr, "error: %s\n", err)
case 1:
fmt.Fprintf(os.Stderr, "%s%s\n", msgAndArgs[0], err)
default:
format, ok := msgAndArgs[0].(string)
if ok {
s1 := fmt.Sprintf(format, msgAndArgs[1:]...)
fmt.Fprintf(os.Stderr, "%s%s\n", s1, err)
} else {
fmt.Fprintf(os.Stderr, "error: %s\n", err)
}
}
}
return err != nil
}
-181
View File
@@ -1,181 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"fmt"
"io"
"os"
"github.com/attic-labs/noms/cmd/util"
"github.com/attic-labs/noms/go/config"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/types"
"github.com/attic-labs/noms/go/util/outputpager"
"github.com/attic-labs/noms/go/util/verbose"
flag "github.com/juju/gnuflag"
)
var longFindHelp = `'nomdex find' retrieves and prints objects that satisfy the 'query' argument.
Indexes are built using the 'nomdex up' command. For information about building
indexes, see: nomdex up -h
Objects that have been indexed can be quickly found using the nomdex query
language. For example, consider objects with the following type:
struct Person {
name String,
geopos struct GeoPos {
latitude Float,
longitude Float,
}
}
Objects of this type can be indexed on the name, latitude and longitude fields
with the following commands:
nomdex up --in-path ~/nomsdb::people.value --by .name --out-ds by-name
nomdex up --in-path ~/nomsdb::people.value --by .geopos.latitude --out-ds by-lat
nomdex up --in-path ~/nomsdb::people.value --by .geopos.longitude --out-ds by-lng
The following query could be used to find all people with an address near the
equator:
nomdex find 'by-lat >= -1.0 and by-lat <= 1.0'
We could also get a list of all people who live near the equator whose name begins with "A":
nomdex find '(by-name >= "A" and by-name < "B") and (by-lat >= -1.0 and by-lat <= 1.0)'
The query language is simple. It currently supports the following relational operators:
<, <=, >, >=, =, !=
Relational expressions are always of the form:
<index> <relational operator> <constant> e.g. personId >= 2000.
Indexes are the name given by the --out-ds argument in the 'nomdex up' command.
Constants are either "strings" (in quotes) or numbers (e.g. 3, 3000, -2, -2.5,
3.147, etc).
Relational expressions can be combined using the "and" and "or" operators.
Parentheses can (and should) be used to ensure that the evaluation is done in
the desired order.
`
var find = &util.Command{
Run: runFind,
UsageLine: "find --db <database spec> <query>",
Short: "Print objects in index that satisfy 'query'",
Long: longFindHelp,
Flags: setupFindFlags,
Nargs: 1,
}
var dbPath = ""
func setupFindFlags() *flag.FlagSet {
flagSet := flag.NewFlagSet("find", flag.ExitOnError)
flagSet.StringVar(&dbPath, "db", "", "database containing index")
outputpager.RegisterOutputpagerFlags(flagSet)
verbose.RegisterVerboseFlags(flagSet)
return flagSet
}
func runFind(args []string) int {
query := args[0]
if dbPath == "" {
fmt.Fprintf(os.Stderr, "Missing required 'db' arg\n")
flag.Usage()
return 1
}
cfg := config.NewResolver()
db, err := cfg.GetDatabase(dbPath)
if printError(err, "Unable to open database\n\terror: ") {
return 1
}
defer db.Close()
im := &indexManager{db: db, indexes: map[string]types.Map{}}
expr, err := parseQuery(query, im)
if err != nil {
fmt.Printf("err: %s\n", err)
return 1
}
pgr := outputpager.Start()
defer pgr.Stop()
iter := expr.iterator(im)
cnt := 0
if iter != nil {
for v := iter.Next(); v != nil; v = iter.Next() {
types.WriteEncodedValue(pgr.Writer, v)
fmt.Fprintf(pgr.Writer, "\n")
cnt++
}
}
fmt.Fprintf(pgr.Writer, "Found %d objects\n", cnt)
return 0
}
func printObjects(w io.Writer, index types.Map, ranges queryRangeSlice) {
cnt := 0
first := true
printObjectForRange := func(index types.Map, r queryRange) {
index.IterFrom(r.lower.value, func(k, v types.Value) bool {
if first && r.lower.value != nil && !r.lower.include && r.lower.value.Equals(k) {
return false
}
if r.upper.value != nil {
if !r.upper.include && r.upper.value.Equals(k) {
return true
}
if r.upper.value.Less(k) {
return true
}
}
s := v.(types.Set)
s.IterAll(func(v types.Value) {
types.WriteEncodedValue(w, v)
fmt.Fprintf(w, "\n")
cnt++
})
return false
})
}
for _, r := range ranges {
printObjectForRange(index, r)
}
fmt.Fprintf(w, "Found %d objects\n", cnt)
}
func openIndex(idxName string, im *indexManager) error {
if _, hasIndex := im.indexes[idxName]; hasIndex {
return nil
}
ds := im.db.GetDataset(idxName)
commit, ok := ds.MaybeHead()
if !ok {
return fmt.Errorf("index '%s' not found", idxName)
}
index, ok := commit.Get(datas.ValueField).(types.Map)
if !ok {
return fmt.Errorf("Value of commit at '%s' is not a valid index", idxName)
}
// Todo: make this type be Map<String | Float, Set<Value>> once Issue #2326 gets resolved and
// IsSubtype() returns the correct value.
typ := types.MakeMapType(
types.MakeUnionType(types.StringType, types.FloaTType),
types.ValueType)
if !types.IsValueSubtypeOf(index, typ) {
return fmt.Errorf("%s does not point to a suitable index type", idxName)
}
im.indexes[idxName] = index
return nil
}
-146
View File
@@ -1,146 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"regexp"
"testing"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/marshal"
"github.com/attic-labs/noms/go/nbs"
"github.com/attic-labs/noms/go/spec"
"github.com/attic-labs/noms/go/util/clienttest"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/suite"
)
type TestObj struct {
Key int
Fname string
Lname string
Gender string
Age int
}
type testSuite struct {
clienttest.ClientTestSuite
}
func TestNomdex(t *testing.T) {
suite.Run(t, &testSuite{})
}
func makeTestDb(s *testSuite, dsId string) datas.Database {
db := datas.NewDatabase(nbs.NewLocalStore(s.DBDir, clienttest.DefaultMemTableSize))
l1 := []TestObj{
{1, "will", "smith", "m", 40},
{2, "lana", "turner", "f", 91},
{3, "john", "wayne", "m", 86},
{4, "johnny", "depp", "m", 50},
{5, "merrill", "streep", "f", 60},
{6, "rob", "courdry", "m", 45},
{7, "bruce", "lee", "m", 72},
{8, "bruce", "willis", "m", 36},
{9, "luis", "bunuel", "m", 100},
{10, "andy", "sandberg", "m", 32},
{11, "walter", "coggins", "m", 28},
{12, "seth", "rogan", "m", 29},
}
m1 := map[string]TestObj{
"lg": {13, "lady", "gaga", "f", 39},
"ss": {14, "sam", "smith", "m", 28},
"rp": {15, "robert", "plant", "m", 69},
"ml": {16, "meat", "loaf", "m", 65},
"gf": {17, "glenn", "frey", "m", 60},
"jr": {18, "joey", "ramone", "m", 55},
"rc": {19, "ray", "charles", "m", 72},
"bk": {20, "bb", "king", "m", 77},
"b": {21, "beck", "", "m", 38},
"md": {22, "miles", "davis", "m", 82},
"rd": {23, "roger", "daltry", "m", 62},
"jf": {24, "john", "fogerty", "m", 60},
}
m := map[string]interface{}{"actors": l1, "musicians": m1}
v, err := marshal.Marshal(db, m)
s.NoError(err)
_, err = db.CommitValue(db.GetDataset(dsId), v)
s.NoError(err)
return db
}
func (s *testSuite) TestNomdex() {
dsId := "data"
db := makeTestDb(s, dsId)
s.NotNil(db)
db.Close()
fnameIdx := "fname-idx"
dataSpec := spec.CreateValueSpecString("nbs", s.DBDir, dsId)
dbSpec := spec.CreateDatabaseSpecString("nbs", s.DBDir)
stdout, stderr := s.MustRun(main, []string{"up", "--out-ds", fnameIdx, "--in-path", dataSpec, "--by", ".fname"})
s.Contains(stdout, "Indexed 24 objects")
s.Equal("", stderr)
genderIdx := "gender-idx"
stdout, stderr = s.MustRun(main, []string{"up", "--out-ds", genderIdx, "--in-path", dataSpec, "--by", ".gender"})
s.Contains(stdout, "Indexed 24 objects")
s.Equal("", stderr)
stdout, stderr = s.MustRun(main, []string{"find", "--db", dbSpec, `fname-idx = "lady"`})
s.Contains(stdout, "Found 1 objects")
s.Equal("", stderr)
stdout, stderr = s.MustRun(main, []string{"find", "--db", dbSpec, `fname-idx = "lady" and gender-idx = "f"`})
s.Contains(stdout, "Found 1 objects")
s.Equal("", stderr)
stdout, stderr = s.MustRun(main, []string{"find", "--db", dbSpec, `fname-idx != "lady" and gender-idx != "m"`})
s.Contains(stdout, "Found 2 objects")
s.Equal("", stderr)
stdout, stderr = s.MustRun(main, []string{"find", "--db", dbSpec, `fname-idx != "lady" and fname-idx != "john"`})
s.Contains(stdout, "Found 21 objects")
s.Equal("", stderr)
stdout, stderr = s.MustRun(main, []string{"find", "--db", dbSpec, `fname-idx != "lady" or gender-idx != "f"`})
s.Contains(stdout, "Found 23 objects")
s.Equal("", stderr)
}
func TestTransform(t *testing.T) {
assert := assert.New(t)
tcs := [][]string{
[]string{`"01/02/2003"`, "\"(\\d{2})/(\\d{2})/(\\d{4})\"", "$3/$2/$1", "2003/02/01"},
}
for _, tc := range tcs {
base, regex, replace, expected := tc[0], tc[1], tc[2], tc[3]
testRe := regexp.MustCompile(regex)
result := testRe.ReplaceAllString(base, replace)
assert.Equal(expected, result)
}
tcs = [][]string{
[]string{"343 STATE ST\nROCHESTER, NY 14650\n(43.161276, -77.619386)", "43.161276", "-77.619386"},
[]string{"TWO EMBARCADERO CENTER\nPROMENADE LEVEL SAN FRANCISCO, CA 94111\n", "", ""},
}
findLatRe := regexp.MustCompile("(?s)\\(([\\d.]+)")
findLngRe := regexp.MustCompile("(?s)(-?[\\d.]+)\\)")
for _, tc := range tcs {
base, expectedLat, expectedLng := tc[0], tc[1], tc[2]
lat := findLatRe.FindStringSubmatch(base)
assert.True(len(lat) == 0 && expectedLat == "" || (len(lat) == 2 && expectedLat == lat[1]))
lng := findLngRe.FindStringSubmatch(base)
assert.True(len(lng) == 0 && expectedLng == "" || (len(lng) == 2 && expectedLng == lng[1]))
}
}
-221
View File
@@ -1,221 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"fmt"
"os"
"regexp"
"strconv"
"sync"
"sync/atomic"
"github.com/attic-labs/noms/cmd/util"
"github.com/attic-labs/noms/go/config"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/hash"
"github.com/attic-labs/noms/go/types"
"github.com/attic-labs/noms/go/util/profile"
"github.com/attic-labs/noms/go/util/status"
"github.com/attic-labs/noms/go/util/verbose"
humanize "github.com/dustin/go-humanize"
flag "github.com/juju/gnuflag"
)
var (
inPathArg = ""
outDsArg = ""
relPathArg = ""
txRegexArg = ""
txReplaceArg = ""
txConvertArg = ""
)
var longUpHelp = `'nomdex up' builds indexes that are useful for rapidly accessing objects.
This sample tool can index objects based on any string or number attribute of that
object. The 'up' command works by scanning all the objects reachable from the --in-path
command line argument. It tests the object to determine if there is a string or number
value reachable by applying the --by path argument to the object. If so, the object is
added to the index under that value.
For example, if there are objects in the database that contain a personId and a
gender field, 'nomdex up' can scan all the objects in a given dataset and build
an index on the specified field with the following commands:
nomdex up --in-path <dsSpec>.value --by .gender --out-ds gender-index
nomdex up --in-path <dsSpec>.value --by .personId --out-ds personId-index
The previous commands can be understood as follows. The first command updates or
builds an index by scanning all the objects that are reachable from |in-path| that
have a string or number value reachable using |by| and stores the root of the
resulting index in a dataset specified by |out-ds|.
Notice that the --in-path argument has a value of '<dsSpec>.value'. The '.value'
is not strictly necessary but it's normally useful when indexing. Since datasets
generally point to Commit objects in Noms, they usually have parents which are
previous versions of the data. If you add .value to the end of the dataset, only
the most recent version of the data will be indexed. Without the '.value' all
objects in all previous commits will also be indexed which is most often not what
is expected.
There are three additional commands that can be useful for transforming the value
being indexed:
* tx-replace: used to modify behavior of tx-regex, see below
* tx-regex: the behavior of this argument depends on whether a tx-replace argument
is present. If so, the Go function regexp.ReplaceAllString() is called:
txRe := regexp.MustCompile(|tx-regex|)
txRe.ReplaceAllString(|index value|, |tx-replace|)
If tx-replace is not present then the following call is made on each value:
txRe := regexp.MustCompile(|tx-regex|)
txRe.FindStringSubmatch(|index value|)
* tx-convert: attempts to convert the index value to the type specified.
Currently the only value accepted for this arg is 'number'
The resulting indexes can be used by the 'nomdex find command' for help on that
see: nomdex find -h
`
var update = &util.Command{
Run: runUpdate,
UsageLine: "up --in-path <path> --out-ds <dspath> --by <relativepath>",
Short: "Build/Update an index",
Long: longUpHelp,
Flags: setupUpdateFlags,
Nargs: 0,
}
func setupUpdateFlags() *flag.FlagSet {
flagSet := flag.NewFlagSet("up", flag.ExitOnError)
flagSet.StringVar(&inPathArg, "in-path", "", "a value to search for items to index within ")
flagSet.StringVar(&outDsArg, "out-ds", "", "name of dataset to save the results to")
flagSet.StringVar(&relPathArg, "by", "", "a path relative to all the items in <in-path> to index by")
flagSet.StringVar(&txRegexArg, "tx-regex", "", "perform a string transformation on value before putting it in index")
flagSet.StringVar(&txReplaceArg, "tx-replace", "", "replace values matched by tx-regex")
flagSet.StringVar(&txConvertArg, "tx-convert", "", "convert the result of a tx regex/replace to this type (only does 'number' currently)")
verbose.RegisterVerboseFlags(flagSet)
profile.RegisterProfileFlags(flagSet)
return flagSet
}
type StreamingSetEntry struct {
valChan chan<- types.Value
setChan <-chan types.Set
}
type IndexMap map[types.Value]StreamingSetEntry
type Index struct {
m IndexMap
indexedCnt int64
seenCnt int64
mutex sync.Mutex
}
func runUpdate(args []string) int {
requiredArgs := map[string]string{"in-path": inPathArg, "out-ds": outDsArg, "by": relPathArg}
for argName, argValue := range requiredArgs {
if argValue == "" {
fmt.Fprintf(os.Stderr, "Missing required '%s' arg\n", argName)
flag.Usage()
return 1
}
}
defer profile.MaybeStartProfile().Stop()
cfg := config.NewResolver()
db, rootObject, err := cfg.GetPath(inPathArg)
d.Chk.NoError(err)
if rootObject == nil {
fmt.Printf("Object not found: %s\n", inPathArg)
return 1
}
outDs := db.GetDataset(outDsArg)
relPath, err := types.ParsePath(relPathArg)
if printError(err, "Error parsing -by value\n\t") {
return 1
}
gb := types.NewGraphBuilder(db, types.MapKind)
addElementsToGraphBuilder(gb, db, rootObject, relPath)
indexMap := gb.Build().(types.Map)
outDs, err = db.Commit(outDs, indexMap, datas.CommitOptions{})
d.Chk.NoError(err)
fmt.Printf("Committed index with %d entries to dataset: %s\n", indexMap.Len(), outDsArg)
return 0
}
func addElementsToGraphBuilder(gb *types.GraphBuilder, db datas.Database, rootObject types.Value, relPath types.Path) {
typeCacheMutex := sync.Mutex{}
typeCache := map[hash.Hash]bool{}
var txRe *regexp.Regexp
if txRegexArg != "" {
var err error
txRe, err = regexp.Compile(txRegexArg)
d.CheckError(err)
}
index := Index{m: IndexMap{}}
types.WalkValues(rootObject, db, func(v types.Value) bool {
typ := types.TypeOf(v)
typeCacheMutex.Lock()
hasPath, ok := typeCache[typ.Hash()]
typeCacheMutex.Unlock()
if !ok || hasPath {
pathResolved := false
tv := relPath.Resolve(v, db)
if tv != nil {
index.addToGraphBuilder(gb, tv, v, txRe)
pathResolved = true
}
if !ok {
typeCacheMutex.Lock()
typeCache[typ.Hash()] = pathResolved
typeCacheMutex.Unlock()
}
}
return false
})
status.Done()
}
func (idx *Index) addToGraphBuilder(gb *types.GraphBuilder, k, v types.Value, txRe *regexp.Regexp) {
atomic.AddInt64(&idx.seenCnt, 1)
if txRe != nil {
k1 := types.EncodedValue(k)
k2 := ""
if txReplaceArg != "" {
k2 = txRe.ReplaceAllString(string(k1), txReplaceArg)
} else {
matches := txRe.FindStringSubmatch(string(k1))
if len(matches) > 0 {
k2 = matches[len(matches)-1]
}
}
if txConvertArg == "number" {
if k2 == "" {
return
}
n, err := strconv.ParseFloat(k2, 64)
if err != nil {
fmt.Println("error converting to number: ", err)
return
}
k = types.Float(n)
} else {
k = types.String(k2)
}
}
atomic.AddInt64(&idx.indexedCnt, 1)
gb.SetInsert(types.ValueSlice{k}, v)
status.Printf("Found %s objects, Indexed %s objects", humanize.Comma(idx.seenCnt), humanize.Comma(idx.indexedCnt))
}
-263
View File
@@ -1,263 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"fmt"
"strconv"
"strings"
"text/scanner"
"unicode"
"github.com/attic-labs/noms/go/d"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/types"
)
/**** Query language BNF
query := expr
expr := expr boolOp compExpr | group
compExpr := indexToken compOp value
group := '(' expr ')' | compExpr
boolOp := 'and' | 'or'
compOp := '=' | '<' | '<=' | '>' | '>=' | '!='
value := "<string>" | number
number := '-' digits | digits
digits := int | float
*/
type compOp string
type boolOp string
type indexManager struct {
db datas.Database
indexes map[string]types.Map
}
const (
equals compOp = "="
gt compOp = ">"
gte compOp = ">="
lt compOp = "<"
lte compOp = "<="
ne compOp = "!="
openP = "("
closeP = ")"
and boolOp = "and"
or boolOp = "or"
)
var (
compOps = []compOp{equals, gt, gte, lt, lte, ne}
boolOps = []boolOp{and, or}
)
type qScanner struct {
s scanner.Scanner
peekedToken rune
peekedText string
peeked bool
}
func (qs *qScanner) Scan() rune {
var r rune
if qs.peeked {
r = qs.peekedToken
qs.peeked = false
} else {
r = qs.s.Scan()
}
return r
}
func (qs *qScanner) Peek() rune {
var r rune
if !qs.peeked {
qs.peekedToken = qs.s.Scan()
qs.peekedText = qs.s.TokenText()
qs.peeked = true
}
r = qs.peekedToken
return r
}
func (qs *qScanner) TokenText() string {
var text string
if qs.peeked {
text = qs.peekedText
} else {
text = qs.s.TokenText()
}
return text
}
func (qs *qScanner) Pos() scanner.Position {
return qs.s.Pos()
}
func parseQuery(q string, im *indexManager) (expr, error) {
s := NewQueryScanner(q)
var expr expr
err := d.Try(func() {
expr = s.parseExpr(0, im)
})
return expr, err
}
func NewQueryScanner(query string) *qScanner {
isIdentRune := func(r rune, i int) bool {
identChars := ":/.>=-"
startIdentChars := "!><"
if i == 0 {
return unicode.IsLetter(r) || strings.ContainsRune(startIdentChars, r)
}
return unicode.IsLetter(r) || unicode.IsDigit(r) || strings.ContainsRune(identChars, r)
}
errorFunc := func(s *scanner.Scanner, msg string) {
d.PanicIfError(fmt.Errorf("%s, pos: %s\n", msg, s.Pos()))
}
var s scanner.Scanner
s.Mode = scanner.ScanIdents | scanner.ScanFloats | scanner.ScanStrings | scanner.SkipComments
s.Init(strings.NewReader(query))
s.IsIdentRune = isIdentRune
s.Error = errorFunc
qs := qScanner{s: s}
return &qs
}
func (qs *qScanner) parseExpr(level int, im *indexManager) expr {
tok := qs.Scan()
switch tok {
case '(':
expr1 := qs.parseExpr(level+1, im)
tok := qs.Scan()
if tok != ')' {
d.PanicIfError(fmt.Errorf("missing ending paren for expr"))
} else {
tok = qs.Peek()
if tok == ')' {
return expr1
}
tok = qs.Scan()
text := qs.TokenText()
switch {
case tok == scanner.Ident && isBoolOp(text):
op := boolOp(text)
expr2 := qs.parseExpr(level+1, im)
return logExpr{op: op, expr1: expr1, expr2: expr2, idxName: idxNameIfSame(expr1, expr2)}
case tok == scanner.EOF:
return expr1
default:
d.PanicIfError(fmt.Errorf("extra text found at end of expr, tok: %d, text: %s", int(tok), qs.TokenText()))
}
}
case scanner.Ident:
err := openIndex(qs.TokenText(), im)
d.PanicIfError(err)
expr1 := qs.parseCompExpr(level+1, qs.TokenText(), im)
tok := qs.Peek()
switch tok {
case ')':
return expr1
case rune(scanner.Ident):
_ = qs.Scan()
text := qs.TokenText()
if isBoolOp(text) {
op := boolOp(text)
expr2 := qs.parseExpr(level+1, im)
return logExpr{op: op, expr1: expr1, expr2: expr2, idxName: idxNameIfSame(expr1, expr2)}
} else {
d.PanicIfError(fmt.Errorf("expected boolean op, found: %s, level: %d", text, level))
}
case rune(scanner.EOF):
return expr1
default:
_ = qs.Scan()
}
default:
d.PanicIfError(fmt.Errorf("unexpected token in expr: %s, %d", qs.TokenText(), tok))
}
return logExpr{}
}
func (qs *qScanner) parseCompExpr(level int, indexName string, im *indexManager) compExpr {
qs.Scan()
text := qs.TokenText()
if !isCompOp(text) {
d.PanicIfError(fmt.Errorf("expected relop token but found: '%s'", text))
}
op := compOp(text)
value := qs.parseValExpr()
return compExpr{indexName, op, value}
}
func (qs *qScanner) parseValExpr() types.Value {
tok := qs.Scan()
text := qs.TokenText()
isNeg := false
if tok == '-' {
isNeg = true
tok = qs.Scan()
text = qs.TokenText()
}
switch tok {
case scanner.String:
if isNeg {
d.PanicIfError(fmt.Errorf("expected number after '-', found string: %s", text))
}
return valueFromString(text)
case scanner.Float:
f, _ := strconv.ParseFloat(text, 64)
if isNeg {
f = -f
}
return types.Float(f)
case scanner.Int:
i, _ := strconv.ParseInt(text, 10, 64)
if isNeg {
i = -i
}
return types.Float(i)
}
d.PanicIfError(fmt.Errorf("expected value token, found: '%s'", text))
return nil // for compiler
}
func valueFromString(t string) types.Value {
l := len(t)
if l < 2 || t[0] != '"' || t[l-1] != '"' {
d.PanicIfError(fmt.Errorf("Unable to get value from token: %s", t))
}
return types.String(t[1 : l-1])
}
func isCompOp(s string) bool {
for _, op := range compOps {
if s == string(op) {
return true
}
}
return false
}
func isBoolOp(s string) bool {
for _, op := range boolOps {
if s == string(op) {
return true
}
}
return false
}
func idxNameIfSame(expr1, expr2 expr) string {
if expr1.indexName() == expr2.indexName() {
return expr1.indexName()
}
return ""
}
-139
View File
@@ -1,139 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"testing"
"text/scanner"
"github.com/attic-labs/noms/go/chunks"
"github.com/attic-labs/noms/go/datas"
"github.com/attic-labs/noms/go/types"
"github.com/stretchr/testify/assert"
)
type scannerResult struct {
tok int
text string
}
type parseResult struct {
query string
ex expr
}
func TestQueryScanner(t *testing.T) {
assert := assert.New(t)
s := NewQueryScanner(`9 (99.9) -9 0x7F "99.9" and or http://localhost:8000/cli-tour::yo <= >= < > = _ !=`)
scannerResults := []scannerResult{
{tok: scanner.Int, text: "9"},
{tok: int('('), text: "("},
{tok: scanner.Float, text: "99.9"},
{tok: int(')'), text: ")"},
{tok: '-', text: "-"},
{tok: scanner.Int, text: "9"},
{tok: scanner.Int, text: "0x7F"},
{tok: scanner.String, text: `"99.9"`},
{tok: scanner.Ident, text: "and"},
{tok: scanner.Ident, text: "or"},
{tok: scanner.Ident, text: "http://localhost:8000/cli-tour::yo"},
{tok: scanner.Ident, text: "<="},
{tok: scanner.Ident, text: ">="},
{tok: scanner.Ident, text: "<"},
{tok: scanner.Ident, text: ">"},
{tok: int('='), text: "="},
{tok: int('_'), text: "_"},
{tok: scanner.Ident, text: "!="},
}
for _, sr := range scannerResults {
tok := s.Scan()
assert.Equal(sr.tok, int(tok), "expected text: %s, found: %s, pos: %s", sr.text, s.TokenText(), s.Pos())
assert.Equal(sr.text, s.TokenText())
}
tok := s.Scan()
assert.Equal(scanner.EOF, int(tok))
}
func TestPeek(t *testing.T) {
assert := assert.New(t)
s := NewQueryScanner(`_ < "one"`)
scannerResults := []scannerResult{
{tok: int('_'), text: "_"},
{tok: scanner.Ident, text: "<"},
{tok: scanner.String, text: `"one"`},
{tok: scanner.EOF, text: ""},
}
for _, sr := range scannerResults {
assert.Equal(sr.tok, int(s.Peek()))
assert.Equal(sr.text, s.TokenText())
assert.Equal(sr.tok, int(s.Scan()))
assert.Equal(sr.text, s.TokenText())
}
}
func TestParsing(t *testing.T) {
assert := assert.New(t)
re1 := compExpr{"index1", equals, types.Float(2015)}
re2 := compExpr{"index1", gte, types.Float(2020)}
re3 := compExpr{"index1", lte, types.Float(2022)}
re4 := compExpr{"index1", lt, types.Float(-2030)}
re5 := compExpr{"index1", ne, types.Float(3.5)}
re6 := compExpr{"index1", ne, types.Float(-3500.4536632)}
re7 := compExpr{"index1", ne, types.String("whassup")}
queries := []parseResult{
{`index1 = 2015`, re1},
{`(index1 = 2015 )`, re1},
{`(((index1 = 2015 ) ))`, re1},
{`index1 = 2015 or index1 >= 2020`, logExpr{or, re1, re2, "index1"}},
{`(index1 = 2015) or index1 >= 2020`, logExpr{or, re1, re2, "index1"}},
{`index1 = 2015 or (index1 >= 2020)`, logExpr{or, re1, re2, "index1"}},
{`(index1 = 2015 or index1 >= 2020)`, logExpr{or, re1, re2, "index1"}},
{`(index1 = 2015 or index1 >= 2020) and index1 <= 2022`, logExpr{and, logExpr{or, re1, re2, "index1"}, re3, "index1"}},
{`index1 = 2015 or index1 >= 2020 and index1 <= 2022`, logExpr{or, re1, logExpr{and, re2, re3, "index1"}, "index1"}},
{`index1 = 2015 or index1 >= 2020 and index1 <= 2022 or index1 < -2030`, logExpr{or, re1, logExpr{and, re2, logExpr{or, re3, re4, "index1"}, "index1"}, "index1"}},
{`(index1 = 2015 or index1 >= 2020) and (index1 <= 2022 or index1 < -2030)`, logExpr{and, logExpr{or, re1, re2, "index1"}, logExpr{or, re3, re4, "index1"}, "index1"}},
{`index1 != 3.5`, re5},
{`index1 != -3500.4536632`, re6},
{`index1 != "whassup"`, re7},
}
storage := &chunks.MemoryStorage{}
db := datas.NewDatabase(storage.NewView())
_, err := db.CommitValue(db.GetDataset("index1"), types.NewMap(db, types.String("one"), types.NewSet(db, types.String("two"))))
assert.NoError(err)
im := &indexManager{db: db, indexes: map[string]types.Map{}}
for _, pr := range queries {
expr, err := parseQuery(pr.query, im)
assert.NoError(err)
assert.Equal(pr.ex, expr, "bad query: %s", pr.query)
}
badQueries := []string{
`sdfsd = 2015`,
`index1 = "unfinished string`,
`index1 and 2015`,
`index1 < `,
`index1 < 2015 and ()`,
`index1 < 2015 an index1 > 2016`,
`(index1 < 2015) what`,
`(index1< 2015`,
`(badIndexName < 2015)`,
}
im1 := &indexManager{db: db, indexes: map[string]types.Map{}}
for _, q := range badQueries {
expr, err := parseQuery(q, im1)
assert.Error(err)
assert.Nil(expr)
}
}
-162
View File
@@ -1,162 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"bytes"
"fmt"
"io"
"sort"
"github.com/attic-labs/noms/go/types"
)
type bound struct {
value types.Value
include bool // true when the endpoint itself is part of the range
infinity int8 // -1 = negative infinity, 0 = finite value, 1 = positive infinity
}
func (b bound) isLessThanOrEqual(o bound) (res bool) {
return b.equals(o) || b.isLessThan(o)
}
func (b bound) isLessThan(o bound) (res bool) {
if b.infinity < o.infinity {
return true
}
if b.infinity > o.infinity {
return false
}
// Equal, non-zero infinities compare as equal, never less.
if b.infinity == o.infinity && b.infinity != 0 {
return false
}
if b.value.Less(o.value) {
return true
}
if b.value.Equals(o.value) {
if !b.include && o.include {
return true
}
}
return false
}
func (b bound) isGreaterThanOrEqual(o bound) (res bool) {
return !b.isLessThan(o)
}
func (b bound) isGreaterThan(o bound) (res bool) {
// A bound is strictly greater when it is neither equal nor less.
return !b.equals(o) && !b.isLessThan(o)
}
func (b bound) equals(o bound) bool {
return b.infinity == o.infinity && b.include == o.include &&
(b.value == nil && o.value == nil || (b.value != nil && o.value != nil && b.value.Equals(o.value)))
}
func (b bound) String() string {
var s1 string
if b.value == nil {
s1 = "<nil>"
} else {
buf := bytes.Buffer{}
types.WriteEncodedValue(&buf, b.value)
s1 = buf.String()
}
return fmt.Sprintf("bound{v: %s, include: %t, infinity: %d}", s1, b.include, b.infinity)
}
func (b bound) minValue(o bound) (res bound) {
if b.isLessThan(o) {
return b
}
return o
}
func (b bound) maxValue(o bound) (res bound) {
if b.isLessThan(o) {
return o
}
return b
}
type queryRange struct {
lower bound
upper bound
}
func (r queryRange) and(o queryRange) (rangeDescs queryRangeSlice) {
if !r.intersects(o) {
return []queryRange{}
}
lower := r.lower.maxValue(o.lower)
upper := r.upper.minValue(o.upper)
return []queryRange{{lower, upper}}
}
func (r queryRange) or(o queryRange) (rSlice queryRangeSlice) {
if r.intersects(o) {
v1 := r.lower.minValue(o.lower)
v2 := r.upper.maxValue(o.upper)
return queryRangeSlice{queryRange{v1, v2}}
}
rSlice = queryRangeSlice{r, o}
sort.Sort(rSlice)
return rSlice
}
func (r queryRange) intersects(o queryRange) (res bool) {
if r.lower.isGreaterThanOrEqual(o.lower) && r.lower.isLessThanOrEqual(o.upper) {
return true
}
if r.upper.isGreaterThanOrEqual(o.lower) && r.upper.isLessThanOrEqual(o.upper) {
return true
}
if o.lower.isGreaterThanOrEqual(r.lower) && o.lower.isLessThanOrEqual(r.upper) {
return true
}
if o.upper.isGreaterThanOrEqual(r.lower) && o.upper.isLessThanOrEqual(r.upper) {
return true
}
return false
}
func (r queryRange) String() string {
return fmt.Sprintf("queryRange{lower: %s, upper: %s}", r.lower, r.upper)
}
// queryRangeSlice defines the sort.Interface. This implementation sorts queryRanges by
// the lower bound in ascending order.
type queryRangeSlice []queryRange
func (rSlice queryRangeSlice) Len() int {
return len(rSlice)
}
func (rSlice queryRangeSlice) Swap(i, j int) {
rSlice[i], rSlice[j] = rSlice[j], rSlice[i]
}
func (rSlice queryRangeSlice) Less(i, j int) bool {
return !rSlice[i].lower.equals(rSlice[j].lower) && rSlice[i].lower.isLessThanOrEqual(rSlice[j].lower)
}
func (rSlice queryRangeSlice) dbgPrint(w io.Writer) {
for i, rd := range rSlice {
if i == 0 {
fmt.Fprintf(w, "\n#################\n")
}
fmt.Fprintf(w, "queryRange %d: %s\n", i, rd)
}
if len(rSlice) > 0 {
fmt.Fprintf(w, "\n")
}
}
-150
View File
@@ -1,150 +0,0 @@
// Copyright 2016 Attic Labs, Inc. All rights reserved.
// Licensed under the Apache License, version 2.0:
// http://www.apache.org/licenses/LICENSE-2.0
package main
import (
"testing"
"github.com/attic-labs/noms/go/types"
"github.com/stretchr/testify/assert"
)
const nilHolder = -1000000
var (
r1 = qr(2, true, 5, true)
r2 = qr(0, true, 8, true)
r3 = qr(0, true, 3, true)
r4 = qr(3, true, 8, true)
r5 = qr(0, true, 1, true)
r6 = qr(6, true, 10, true)
r7 = qr(nilHolder, true, 10, true)
r8 = qr(3, true, nilHolder, true)
r10 = qr(2, true, 5, false)
r11 = qr(5, true, 10, true)
)
func newBound(i int, include bool, infinity int) bound {
var v types.Value
if i != nilHolder {
v = types.Float(i)
}
return bound{value: v, include: include, infinity: int8(infinity)}
}
func qr(lower int, lowerIncl bool, upper int, upperIncl bool) queryRange {
lowerInf := 0
if lower == nilHolder {
lowerInf = -1
}
upperInf := 0
if upper == nilHolder {
upperInf = 1
}
return queryRange{newBound(lower, lowerIncl, lowerInf), newBound(upper, upperIncl, upperInf)}
}
func TestRangeIntersects(t *testing.T) {
assert := assert.New(t)
assert.True(r1.intersects(r2))
assert.True(r1.intersects(r3))
assert.True(r1.intersects(r4))
assert.True(r2.intersects(r1))
assert.True(r1.intersects(r7))
assert.True(r1.intersects(r8))
assert.True(r3.intersects(r4))
assert.False(r1.intersects(r5))
assert.False(r1.intersects(r6))
assert.False(r10.intersects(r11))
}
func TestRangeAnd(t *testing.T) {
assert := assert.New(t)
assert.Empty(r1.and(r5))
assert.Empty(r1.and(r6))
assert.Equal(r1, r1.and(r2)[0])
assert.Equal(r1, r2.and(r1)[0])
expected := qr(3, true, 5, true)
assert.Equal(expected, r1.and(r4)[0])
}
func TestRangeOr(t *testing.T) {
assert := assert.New(t)
assert.Equal(r2, r1.or(r2)[0])
expected := qr(0, true, 5, true)
assert.Equal(expected, r1.or(r3)[0])
expectedSlice := queryRangeSlice{r5, r1}
assert.Equal(expectedSlice, r1.or(r5))
assert.Equal(expectedSlice, r5.or(r1))
}
func TestIsLessThan(t *testing.T) {
assert := assert.New(t)
assert.True(newBound(1, true, 0).isLessThanOrEqual(newBound(2, true, 0)))
assert.False(newBound(2, true, 0).isLessThanOrEqual(newBound(1, true, 0)))
assert.True(newBound(1, true, 0).isLessThanOrEqual(newBound(1, true, 0)))
assert.True(newBound(1, false, 0).isLessThanOrEqual(newBound(2, false, 0)))
assert.False(newBound(2, false, 0).isLessThanOrEqual(newBound(1, false, 0)))
assert.True(newBound(1, false, 0).isLessThanOrEqual(newBound(1, false, 0)))
assert.False(newBound(1, true, 0).isLessThanOrEqual(newBound(1, false, 0)))
assert.True(newBound(1, false, 0).isLessThanOrEqual(newBound(1, true, 0)))
assert.True(newBound(nilHolder, true, -1).isLessThanOrEqual(newBound(1, true, 0)))
assert.False(newBound(1, false, 0).isLessThanOrEqual(newBound(nilHolder, true, -1)))
}
func TestIsGreaterThan(t *testing.T) {
assert := assert.New(t)
assert.True(newBound(2, true, 0).isGreaterThanOrEqual(newBound(1, true, 0)))
assert.False(newBound(1, true, 0).isGreaterThanOrEqual(newBound(2, true, 0)))
assert.True(newBound(1, true, 0).isGreaterThanOrEqual(newBound(1, true, 0)))
assert.True(newBound(2, false, 0).isGreaterThanOrEqual(newBound(1, false, 0)))
assert.False(newBound(1, false, 0).isGreaterThanOrEqual(newBound(2, false, 0)))
assert.True(newBound(1, false, 0).isGreaterThanOrEqual(newBound(1, false, 0)))
assert.True(newBound(1, true, 0).isGreaterThanOrEqual(newBound(1, false, 0)))
assert.False(newBound(1, false, 0).isGreaterThanOrEqual(newBound(2, true, 0)))
assert.True(newBound(nilHolder, true, 1).isGreaterThanOrEqual(newBound(1, true, 0)))
assert.False(newBound(1, true, 0).isGreaterThanOrEqual(newBound(nilHolder, true, 1)))
}
func TestMinValue(t *testing.T) {
assert := assert.New(t)
ve1 := newBound(5, false, 0)
ve2 := newBound(5, true, 0)
ve3 := newBound(nilHolder, true, -1)
ve4 := newBound(nilHolder, true, 1)
assert.Equal(ve1, ve1.minValue(ve2))
assert.Equal(ve3, ve1.minValue(ve3))
assert.Equal(ve1, ve1.minValue(ve4))
}
func TestMaxValue(t *testing.T) {
assert := assert.New(t)
ve1 := newBound(5, false, 0)
ve2 := newBound(5, true, 0)
ve3 := newBound(nilHolder, true, -1)
ve4 := newBound(nilHolder, true, 1)
assert.Equal(ve2, ve1.maxValue(ve2))
assert.Equal(ve1, ve1.maxValue(ve3))
assert.Equal(ve4, ve1.maxValue(ve4))
}
-81
View File
@@ -1,81 +0,0 @@
package main
import (
"flag"
"fmt"
"github.com/attic-labs/noms/go/types"
"github.com/pkg/profile"
"log"
"time"
)
type NextEdit func() (types.Value, types.Value)
type MEBenchmark interface {
GetName() string
AddEdits(nextEdit NextEdit)
//SortEdits()
Map()
}
func main() {
profPath := flag.String("profpath", "./", "")
cpuProf := flag.Bool("cpuprof", false, "")
memProf := flag.Bool("memprof", false, "")
meBench := flag.Bool("me-bench", false, "")
count := flag.Int("n", 1000000, "")
flag.Parse()
if *cpuProf {
fmt.Println("cpu profiling enabled.")
fmt.Println("writing cpu prof to", *profPath)
defer profile.Start(profile.CPUProfile).Stop()
}
if *memProf {
fmt.Println("mem profiling enabled.")
fmt.Println("writing mem prof to", *profPath)
defer profile.Start(profile.MemProfile).Stop()
}
var toBench []MEBenchmark
if *meBench {
toBench = append(toBench, NewNomsMEBench())
}
log.Printf("Running each benchmark for %d items\n", *count)
tg := NewTupleGen(*count)
run(tg, toBench)
}
func benchmark(meb MEBenchmark, nextKVP NextEdit) {
startAdd := time.Now()
meb.AddEdits(nextKVP)
endAdd := time.Now()
addDelta := endAdd.Sub(startAdd)
log.Printf("%s - add time: %f\n", meb.GetName(), addDelta.Seconds())
/*startSort := time.Now()
meb.SortEdits()
endSort := time.Now()
sortDelta := endSort.Sub(startSort)
log.Printf("%s - sort time: %f\n", meb.GetName(), sortDelta.Seconds())*/
startMap := time.Now()
meb.Map()
endMap := time.Now()
mapDelta := endMap.Sub(startMap)
log.Printf("%s - map time: %f\n", meb.GetName(), mapDelta.Seconds())
}
func run(tg *TupleGen, toBench []MEBenchmark) {
for _, currBench := range toBench {
log.Println("Starting", currBench.GetName())
tg.Reset()
benchmark(currBench, tg.NextKVP)
log.Println(currBench.GetName(), "completed")
}
}
-36
View File
@@ -1,36 +0,0 @@
package main
import (
"context"
"github.com/attic-labs/noms/go/chunks"
"github.com/attic-labs/noms/go/types"
)
type NomsMEBench struct {
me *types.MapEditor
}
func NewNomsMEBench() *NomsMEBench {
ts := &chunks.TestStorage{}
vrw := types.NewValueStore(ts.NewView())
me := types.NewMap(context.Background(), vrw).Edit()
return &NomsMEBench{me}
}
func (nmeb *NomsMEBench) GetName() string {
return "noms map editor"
}
func (nmeb *NomsMEBench) AddEdits(nextEdit NextEdit) {
k, v := nextEdit()
for k != nil {
nmeb.me = nmeb.me.Set(k, v)
k, v = nextEdit()
}
}
func (nmeb *NomsMEBench) Map() {
nmeb.me.Map(context.Background())
}
-51
View File
@@ -1,51 +0,0 @@
package main
import (
"github.com/attic-labs/noms/go/types"
"github.com/google/uuid"
"math/rand"
)
type TupleGen struct {
keys []uint64
pos int
rng *rand.Rand
}
func NewTupleGen(count int) *TupleGen {
rng := rand.New(rand.NewSource(0))
keySet := make(map[uint64]struct{}, count)
for len(keySet) < count {
keySet[rng.Uint64()] = struct{}{}
}
keys := make([]uint64, 0, count)
for k := range keySet {
keys = append(keys, k)
}
return &TupleGen{keys, 0, rng}
}
func (tg *TupleGen) Reset() {
tg.pos = 0
}
func (tg *TupleGen) NextKVP() (types.Value, types.Value) {
if tg.pos >= len(tg.keys) {
return nil, nil
}
key := types.Uint(tg.keys[tg.pos])
val := types.NewTuple(
types.UUID(uuid.New()),
types.Int(tg.rng.Int63()),
types.Uint(tg.rng.Uint64()),
types.Float(tg.rng.Float64()),
types.String("test string"),
types.Bool(tg.rng.Int()%2 == 0),
types.NullValue)
tg.pos++
return key, val
}
-1
View File
@@ -1 +0,0 @@
nomsfs
-125
View File
@@ -1,125 +0,0 @@
# nomsfs
Nomsfs is a [FUSE](https://en.wikipedia.org/wiki/Filesystem_in_Userspace) filesystem built on Noms. To use it you'll need FUSE:
* *Linux* -- built-in; you should be good to go
* *Mac OS X* -- Install [FUSE for OS X](https://osxfuse.github.io/)
Development and testing have been done exclusively on Mac OS X using FUSE for OS X.
Nomsfs builds on the [Go FUSE implementation](https://github.com/hanwen/go-fuse) from Han-Wen Nienhuys.
## Usage
Make sure FUSE is installed. On Mac OS X remember to run `/Library/Filesystems/osxfuse.fs/Contents/Resources/load_osxfuse`.
Build with `go build` (or just run with `go run nomsfs.go`); test with `go test`.
Mount an existing or new dataset by executing `nomsfs`:
```shell
$ mkdir /var/tmp/mnt
$ go run nomsfs.go /var/tmp/nomsfs::fs /var/tmp/mnt
running...
```
Use ^C to stop `nomsfs`.
### Exploring The Data
1. Once you have a mount point and `nomsfs` is running, you can add, delete, and rename files and directories using the Finder or the command line, as you would with any other file system.
2. Stop `nomsfs` with ^C
3. Let's look around the dataset:
```shell
> noms ds /var/tmp/nomsfs
fs
> noms show /var/tmp/nomsfs::fs
struct Commit {
meta: struct {},
parents: Set<Ref<Cycle<Commit>>>,
value: struct Filesystem {
root: struct Inode {
attr: struct Attr {
ctime: Number,
gid: Number,
mode: Number,
mtime: Number,
uid: Number,
xattr: Map<String, Blob>,
},
contents: struct Directory {
entries: Map<String, Cycle<1>>,
} | struct Symlink {
targetPath: String,
} | struct File {
data: Ref<Blob>,
},
},
},
}({
meta: {},
parents: {
d6jn389ov693oa4b9vqhe3fmn2g49c2k,
},
value: Filesystem {
root: Inode {
attr: Attr {
ctime: 1.4703496225642643e+09,
gid: 20,
mode: 511,
mtime: 1.4703496225642643e+09,
uid: 501,
xattr: {},
},
contents: Directory {
entries: {
"file.txt": Inode {
attr: Attr {
ctime: 1.470349669044128e+09,
gid: 20,
mode: 420,
mtime: 1.465233596e+09,
uid: 501,
xattr: {
"com.apple.FinderInfo": 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // 32 B
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00,
},
},
contents: File {
data: hv6f7d07uajec3mebergu810v12gem83,
},
},
"noms_logo.png": Inode {
attr: Attr {
ctime: 1.4703496464136713e+09,
gid: 20,
mode: 420,
mtime: 1.470171468e+09,
uid: 501,
xattr: {
"com.apple.FinderInfo": 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // 32 B
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00,
"com.apple.quarantine": 30 30 30 32 3b 35 37 61 31 30 39 34 63 3b 50 72 // 22 B
65 76 69 65 77 3b,
},
},
contents: File {
data: higtjmhq7fo5m072vkmmldtmkn2vspkb,
},
},
...
```
## Limitations
Hard links are not supported at this time, but may be added in the future.
Mounting a dataset in multiple locations is not supported, but may be added in the future.
## Troubleshooting
`Mount failed: no FUSE devices found`
Make sure FUSE is installed. If you're on Mac OS X make sure the kernel module is loaded by executing `/Library/Filesystems/osxfuse.fs/Contents/Resources/load_osxfuse`.
## Contributing
Issues welcome; testing welcome; code welcome. Feel free to pitch in!
Some files were not shown because too many files have changed in this diff.