mirror of https://github.com/TecharoHQ/anubis.git (synced 2026-02-13 15:29:56 -06:00)
Split up AI filtering files (#592)
* Split up AI filtering files

  Create aggressive/moderate/permissive policies to allow administrators to choose their AI/LLM stance.

  Aggressive policy matches existing default in Anubis.

  Removes `Google-Extended` flag from `ai-robots-txt.yaml` as it doesn't exist in requests.

  Rename `ai-robots-txt.yaml` to `ai-catchall.yaml` as the file is no longer a copy of the source repo/file.

* chore: spelling
* chore: fix embeds
* chore: fix data includes
* chore: fix file name typo
* chore: Ignore READMEs in configs
* chore(lib/policy/config): go tool goimports -w

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
@@ -41,6 +41,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Added `--version` flag.
 - Added `anubis_proxied_requests_total` metric to count proxied requests.
 - Add `Applebot` as "good" web crawler
+- Reorganize AI/LLM crawler blocking into three separate stances, maintaining existing status quo as default.
+- Split out AI/LLM user agent blocking policies, adding documentation for each.
 
 ## v1.18.0: Varis zos Galvus
 

@@ -14,7 +14,7 @@ EG:
 {
   "bots": [
     {
-      "import": "(data)/bots/ai-robots-txt.yaml"
+      "import": "(data)/bots/ai-catchall.yaml"
     },
     {
       "import": "(data)/bots/cloudflare-workers.yaml"
@@ -29,8 +29,8 @@ EG:
 ```yaml
 bots:
   # Pathological bots to deny
-  - # This correlates to data/bots/ai-robots-txt.yaml in the source tree
-    import: (data)/bots/ai-robots-txt.yaml
+  - # This correlates to data/bots/ai-catchall.yaml in the source tree
+    import: (data)/bots/ai-catchall.yaml
   - import: (data)/bots/cloudflare-workers.yaml
 ```
 
@@ -46,7 +46,7 @@ Of note, a bot rule can either have inline bot configuration or import a bot con
 {
   "bots": [
     {
-      "import": "(data)/bots/ai-robots-txt.yaml",
+      "import": "(data)/bots/ai-catchall.yaml",
       "name": "generic-browser",
       "user_agent_regex": "Mozilla|Opera\n",
       "action": "CHALLENGE"
@@ -60,7 +60,7 @@ Of note, a bot rule can either have inline bot configuration or import a bot con
 
 ```yaml
 bots:
-  - import: (data)/bots/ai-robots-txt.yaml
+  - import: (data)/bots/ai-catchall.yaml
     name: generic-browser
     user_agent_regex: >
       Mozilla|Opera
@@ -167,7 +167,7 @@ static
 ├── botPolicies.json
 ├── botPolicies.yaml
 ├── bots
-│   ├── ai-robots-txt.yaml
+│   ├── ai-catchall.yaml
 │   ├── cloudflare-workers.yaml
 │   ├── headless-browsers.yaml
 │   └── us-ai-scraper.yaml