mirror of
https://github.com/opencloud-eu/opencloud.git
split into api, indexing and query language
60
docs/ocis/adr/0018-file-search-api.md
Normal file
@@ -0,0 +1,60 @@
---
title: "18. File Search API"
date: 2022-03-18T09:00:00+01:00
geekdocRepo: https://github.com/owncloud/ocis
geekdocEditPath: edit/master/docs/ocis/adr
geekdocFilePath: 0018-file-search-api.md
---

* Status: proposed
* Deciders: @butonic, @micbar, @dragotin, @C0rby
* Date: 2022-03-18

## Context and Problem Statement

The ability to find files based on certain search terms is a key requirement for a system that stores unstructured data on a large scale.

## Decision Drivers

* Have a simple yet powerful way of finding files in oCIS
* Be able to construct intelligent searches based on metadata
* Allow the user to filter the search queries based on metadata

## Considered Options

* [Libre Graph API](#libre-graph-api)
* [WebDAV API](#webdav-api)

## Decision Outcome

Chosen option: [WebDAV API](#webdav-api), because the current web UI is compatible with that API. We may use the Libre Graph API later in a second iteration.

### Positive Consequences

* Existing clients can continue to use the well-known API
* There are existing API tests which cover the basic behavior

### Negative Consequences

* We have no server-side result filtering capabilities

## Pros and Cons of the Options

### Libre Graph API

* Good, because we try to switch most of our HTTP requests to Libre Graph
* Good, because the Graph API supports scopes, sorting and a query language
* Good, because it supports server-side result filtering
* Bad, because there are currently no clients which support it

### WebDAV API

* Good, because WebDAV is a well-known and widely adopted standard
* Good, because existing clients continue to work without extra effort
* Bad, because the syntax is limited
* Bad, because we cannot do server-side result filtering

## Links

* [Search Indexing](0019-file-search-index.md)
* [Search Query Language](0020-file-search-query-language.md)
@@ -1,115 +0,0 @@
---
title: "8. oCIS Search Infrastructure"
date: 2021-06-08T09:00:00+01:00
geekdocRepo: https://github.com/owncloud/ocis
geekdocEditPath: edit/master/docs/ocis/adr
geekdocFilePath: 0008-search.md
---

* Status: proposed
* Deciders: @butonic, @micbar, @dragotin
* Date: 2021-06-08

Technical Story: oCIS Internal Services and APIs for File Search

## Context and Problem Statement

The ability to find files based on certain search terms is a key requirement for a system that stores unstructured data on a large scale. This ADR outlines the concepts to implement search in oCIS.

From the user's perspective, the interface to search is just a single entry field where the user enters one or more search terms. The minimum expectation is that the search returns file names and links to files that

- have a file name that contains at least one of the search terms
- contain at least one of the search terms in the file contents
- have metadata that equals or contains one of the search terms

More sophisticated search capabilities are expected and can be implemented, especially based on metadata.

Another assumption that this ADR makes is that the search operation is scoped to the file space. Each file space has its own search index, and the search query can be run in parallel per space.

## Decision Drivers

- Have a simple yet powerful way of finding files in oCIS
- Be able to construct intelligent searches based on metadata
- Allow the user to filter the search queries based on metadata

## Considered Options Query Notation

This lists the considered options for the search query notations.

### 1. Graph API

The search query adopts the Graph API to run search queries.

The [Libre Graph API](https://github.com/owncloud/libre-graph-api) is inspired by the
[Microsoft Graph API](https://developer.microsoft.com/en-us/graph), specifically the part that is [described here](https://docs.microsoft.com/en-us/graph/api/driveitem-search?view=graph-rest-1.0&tabs=http).

### 2. Keyword Query Language (KQL)

Implement a search based on the [Keyword Query Language (KQL)](https://github.com/SharePoint/sp-dev-docs/blob/master/docs/general-development/keyword-query-language-kql-syntax-reference.md), adopted from SharePoint.

### 3. Simplified Query

Implement a very simple search approach: Return all files which contain at least one of the keywords in their name, path, alias or selected metadata.

## Considered Options Result Listing

For options 1 and 2, the search request returns the result listing as described in the respective specification.

For option 3 (Simplified Query), the result is returned as an ordered list of file IDs and relative paths that match the search pattern. Note that the file IDs are all within one file space.

## Considered Options Indexing

To start the indexing of a file, the search service listens to create, update and delete events on the internal event bus of oCIS.

The events need to contain a valid reference that defines the file space and file ID of the file in question. The event must only be sent once the file operation (update, creation, removal) is finished.

### Setting dirty Flags

*To be discussed*

Once a file is changed, it would be beneficial to set a metadata flag on the file that indicates that the file was changed and that follow-up operations might have to happen, e.g. propagating the ETag and updating the search index.

There should be a flag for every operation that is needed, e.g. `user.dirty.etagpropagation` and `user.dirty.nameindex` as name, and the new ETag of the node as value.

The flags are set by each storage driver directly after the write has finished, within the write lock. The list of dirty flags to set needs to be pulled from a central method that lists them for all storage drivers.

### Multiple Indexes

*To be discussed*

For each space, it should be possible to have multiple indexes.

For now we can foresee at least an index for file and path segment names, and another for "simple" file metadata like times and permissions.

Benefits of having multiple indexes:
- Indexing can happen in parallel
- Querying can happen in parallel

## Decision Outcome

A search service is implemented as a separate microservice in oCIS. The service provides an API for search queries that deliver a list of results.

The indexing (create, update or remove) is only triggered asynchronously via events through the event bus. For that, the storage drivers need to send out the signals accordingly.
The search service provides an API with the following functionality:

### Query/Read

The search service provides a synchronous API to
- answer a search query based on search terms and storage space IDs. The reply is a list of nodes that provides references to files within the space.

The search service API operates on one storage space by default. If a list of storage spaces to search is provided as an API parameter, the search goes through the list sequentially.

### Implementation for oCIS 2.0 GA

For oCIS 2.0, the search service only supports query notation 3 (Simplified Query).

It supports multiple indexes per space, but queries them transparently. The caller cannot choose which index is queried.

The index is Bleve-based and saved as files via the CS3 API to the oCIS storage.

### Positive Consequences

### Negative Consequences

### Open Topics
78
docs/ocis/adr/0019-file-search-index.md
Normal file
@@ -0,0 +1,78 @@
---
title: "19. File Search Index"
date: 2022-03-18T09:00:00+01:00
geekdocRepo: https://github.com/owncloud/ocis
geekdocEditPath: edit/master/docs/ocis/adr
geekdocFilePath: 0019-file-search-index.md
---

* Status: proposed
* Deciders: @butonic, @micbar, @dragotin, @C0rby
* Date: 2022-03-18

## Context and Problem Statement

The ability to find files based on certain search terms is a key requirement for a system that stores unstructured data on a large scale.

More sophisticated search capabilities are expected and can be implemented, especially based on metadata.

To trigger the indexing of a file, the search service listens to create, update and delete events on the internal event bus of oCIS.

The events need to contain a valid reference that defines the file space and file ID of the file in question. The event must only be sent once the file operation (update, creation, removal) is finished.

Sharing adds more complexity because the index also needs to react to events that create, delete and modify shares. Sharing should not duplicate the indexed data, especially within spaces or group shares.
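
The event-driven indexing described in this section can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the oCIS implementation: the `Event` type, its fields and the in-memory `Index` are hypothetical, a Go channel stands in for the event bus, and share events are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// EventType distinguishes the file operations the indexer reacts to.
type EventType int

const (
	Created EventType = iota
	Updated
	Deleted
)

// Event is a hypothetical event payload. It carries the reference
// (space ID and file ID) that the ADR requires for every event.
type Event struct {
	Type    EventType
	SpaceID string
	FileID  string
	Name    string // file name after the operation finished; unused for Deleted
}

// Index keeps one name index per space: space ID -> file ID -> lower-cased name.
type Index map[string]map[string]string

// Apply updates the index of the affected space for a single event.
func (idx Index) Apply(ev Event) {
	switch ev.Type {
	case Created, Updated:
		if idx[ev.SpaceID] == nil {
			idx[ev.SpaceID] = make(map[string]string)
		}
		idx[ev.SpaceID][ev.FileID] = strings.ToLower(ev.Name)
	case Deleted:
		delete(idx[ev.SpaceID], ev.FileID)
	}
}

// Consume applies events until the channel is closed.
func Consume(events <-chan Event, idx Index) {
	for ev := range events {
		idx.Apply(ev)
	}
}

func main() {
	events := make(chan Event, 3)
	events <- Event{Type: Created, SpaceID: "space-1", FileID: "f1", Name: "Report.pdf"}
	events <- Event{Type: Created, SpaceID: "space-1", FileID: "f2", Name: "Notes.txt"}
	events <- Event{Type: Deleted, SpaceID: "space-1", FileID: "f2"}
	close(events)

	idx := Index{}
	Consume(events, idx)
	fmt.Println(idx["space-1"])
}
```

Keeping a separate map per space mirrors the idea of sharding the index by space: an event only ever touches the index of the space named in its reference.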

## Decision Drivers

* Have a simple yet powerful way of finding files in oCIS
* Be able to construct intelligent searches based on metadata
* Allow the user to filter the search queries based on metadata
* Basic file search needs to be implemented out of the box without external dependencies
* The search indexing service should be replaceable with more sophisticated technologies like Elasticsearch
* Make use of the spaces architecture to shard search indexes by space
* The search indexing service needs to deal with multiple users accessing the same resources due to shares

## Considered Options

* [Bleve Search](#bleve-search)
* [Elastic Search](#elastic-search)

## Decision Outcome

Chosen option: Bleve Search, because we can fulfill the MVP and include it in the single binary.

### Positive Consequences

* Basic file search works out of the box
* We do not need heavy external dependencies which need to be deployed alongside

### Negative Consequences

* We need to be aware of the scaling limits
* We need to find a way to work with shares and spaces
* It has a limited query language

## Pros and Cons of the Options

### Bleve Search

* Good, because it is written in Go and can be bundled into the single oCIS binary
* Good, because it is a lightweight but powerful solution which could fulfill a lot of use cases
* Bad, because we do not know exactly how we can represent shares in the index without duplicating data
* Bad, because it runs as a single process
* Bad, because the query language is limited

### Elastic Search

* Good, because it has become an industry standard
* Good, because it supports a rich query language
* Good, because it has built-in cluster support and scales well
* Good, because it has a permission system and supports multiple users and groups accessing the same resource
* Bad, because it is a heavy setup and needs extra effort and knowledge

## Links

* [Search API](0018-file-search-api.md)
* [Search Query Language](0020-file-search-query-language.md)
* [Bleve Search on GitHub](https://github.com/blevesearch/bleve)
* [Elasticsearch](https://www.elastic.co/elastic-stack/)
104
docs/ocis/adr/0020-file-search-query-language.md
Normal file
@@ -0,0 +1,104 @@
---
title: "20. File Search Query Language"
date: 2022-03-18T09:00:00+01:00
geekdocRepo: https://github.com/owncloud/ocis
geekdocEditPath: edit/master/docs/ocis/adr
geekdocFilePath: 0020-file-search-query-language.md
---

* Status: proposed
* Deciders: @butonic, @micbar, @dragotin, @C0rby
* Date: 2022-03-18

## Context and Problem Statement

From the user's perspective, the interface to search is just a single form field where the user enters one or more search terms. The minimum expectation is that the search returns file names and links to files that

* have a file name that contains at least one of the search terms
* contain at least one of the search terms in the file contents
* have metadata that equals or contains one of the search terms

## Decision Drivers

* The standard user should not be bothered by a query syntax
* The power user should also be able to narrow their search with an efficient and flexible syntax
* We need to consider different backend technologies which we need to access through an abstraction layer
* Using different indexing systems should lead to a slightly different feature set without changing the syntax completely

## Considered Options

* [Keyword Query Language](#keyword-query-language-kql)
* [Simplified Query](#simplified-query)
* [Lucene Query Language](#lucene-query-language)
* [Solr Query Language](#solr-query-language)
* [Elasticsearch Query Language](#elasticsearch-query-language)

## Decision Outcome

Chosen option: "[option 1]", because [justification. e.g., only option, which meets k.o. criterion decision driver | which resolves force force | … | comes out best (see below)].

### Positive Consequences

* [e.g., improvement of quality attribute satisfaction, follow-up decisions required, …]
* …

### Negative Consequences

* [e.g., compromising quality attribute, follow-up decisions required, …]
* …

## Pros and Cons of the Options

### Keyword Query Language (KQL)

Implement a search based on the Keyword Query Language (KQL), adopted from SharePoint.

* Good, because Microsoft already uses it together with the Graph API
* Bad, because there is no Go package

### Simplified Query

Implement a very simple search approach: Return all files which contain at least one of the keywords in their name, path, alias or selected metadata.

* Good, because that covers 80% of the users' needs
* Good, because it is very straightforward
* Bad, because it is below the industry standard
* Bad, because it only provides one search query
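
To make the simplified approach concrete, here is a small Go sketch of the "at least one keyword matches" rule. The `Resource` type, its fields and the case-insensitive substring matching are assumptions chosen for illustration; the ADR does not prescribe them.

```go
package main

import (
	"fmt"
	"strings"
)

// Resource is a hypothetical, minimal view of an indexed file.
type Resource struct {
	ID       string
	Name     string
	Path     string
	Alias    string
	Metadata map[string]string // the "selected metadata" considered for search
}

// matches reports whether at least one keyword occurs in the name, path,
// alias or any metadata value of the resource (case-insensitive).
func matches(r Resource, keywords []string) bool {
	fields := []string{r.Name, r.Path, r.Alias}
	for _, v := range r.Metadata {
		fields = append(fields, v)
	}
	for _, kw := range keywords {
		kw = strings.ToLower(kw)
		for _, f := range fields {
			if strings.Contains(strings.ToLower(f), kw) {
				return true
			}
		}
	}
	return false
}

// Search splits the query on whitespace and returns all matching resources
// in their original order. An empty query matches nothing.
func Search(resources []Resource, query string) []Resource {
	keywords := strings.Fields(query)
	var hits []Resource
	for _, r := range resources {
		if matches(r, keywords) {
			hits = append(hits, r)
		}
	}
	return hits
}

func main() {
	resources := []Resource{
		{ID: "1", Name: "Report.pdf", Path: "/docs/Report.pdf"},
		{ID: "2", Name: "photo.jpg", Path: "/img/photo.jpg", Metadata: map[string]string{"tag": "holiday"}},
		{ID: "3", Name: "notes.txt", Path: "/notes.txt"},
	}
	for _, hit := range Search(resources, "report holiday") {
		fmt.Println(hit.ID, hit.Path)
	}
}
```

The sketch also shows the limitation named above: there is only one kind of query, so a user cannot require all keywords to match or restrict the search to a single field.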

### Lucene Query Language

The Lucene Query Parser syntax supports advanced queries like term, phrase, wildcard, fuzzy search, proximity search, regular expressions, boosting, boolean operators and grouping. It is a well-known query syntax used by the Apache Lucene project. Popular platforms like Wikipedia are using Lucene or Solr, which is built on top of Lucene.

* Good, because it is a well-documented and powerful syntax
* Good, because it is very close to the Elasticsearch and the Solr syntax, which enhances compatibility
* Bad, because there is no powerful and well-tested query parser for Go available
* Bad, because it adds complexity and fulfilling all the different query use cases can be an "uphill battle"

### Solr Query Language

Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world's largest internet sites.

* Good, because it is a well-documented and powerful syntax
* Good, because it is very close to the Elasticsearch and the Lucene syntax, which enhances compatibility
* Good, because it has a strong community with large resources and knowledge
* Bad, because it adds complexity and fulfilling all the different query use cases can be an "uphill battle"

### Elasticsearch Query Language

Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries. Think of the Query DSL as an AST (Abstract Syntax Tree) of queries, consisting of two types of clauses: leaf query clauses and compound query clauses. Compound clauses combine multiple queries into one. Like Solr, Elasticsearch is built on top of Apache Lucene.

* Good, because it is a well-documented and powerful syntax
* Good, because it is very close to the Lucene and the Solr syntax, which enhances compatibility
* Good, because there is a stable and well-tested Go client which brings a query builder
* Good, because it could be used as the query language which supports different search backends by just implementing what is needed for our use case
* Bad, because it adds complexity and fulfilling all the different query use cases can be an "uphill battle"
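
The following Go sketch shows what a small Query DSL document looks like when built with the standard library only. It combines two leaf clauses (`match` and `term`) in one compound `bool` clause. The field names `name` and `space_id` and the helper `buildQuery` are hypothetical; a real mapping for oCIS is not defined in this ADR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildQuery returns a Query DSL document that matches a term in a
// hypothetical "name" field and filters on a hypothetical "space_id" field.
func buildQuery(term, spaceID string) ([]byte, error) {
	query := map[string]interface{}{
		"query": map[string]interface{}{
			// "bool" is a compound clause that combines other clauses.
			"bool": map[string]interface{}{
				// "match" is a leaf clause doing a full-text match.
				"must": []interface{}{
					map[string]interface{}{
						"match": map[string]interface{}{"name": term},
					},
				},
				// "term" is a leaf clause doing an exact-value match.
				"filter": []interface{}{
					map[string]interface{}{
						"term": map[string]interface{}{"space_id": spaceID},
					},
				},
			},
		},
	}
	return json.Marshal(query)
}

func main() {
	doc, err := buildQuery("report", "space-1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(doc))
}
```

Because the DSL is plain JSON, an abstraction layer could translate a subset of it into queries for another backend, which is the idea behind the fourth "Good" point above.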

## Links

* [Search API](0018-file-search-api.md)
* [Search Indexing](0019-file-search-index.md)
* [KQL](https://github.com/SharePoint/sp-dev-docs/blob/master/docs/general-development/keyword-query-language-kql-syntax-reference.md)
* [Apache Lucene](https://lucene.apache.org/)
* [Apache Solr](https://solr.apache.org/)
* [Elasticsearch](https://www.elastic.co/elastic-stack/)
* [Elasticsearch for Go](https://github.com/elastic/go-elasticsearch)