Split up AI filtering files (#592)

* Split up AI filtering files

Create aggressive/moderate/permissive policies to allow administrators to choose their AI/LLM stance.

Aggressive policy matches existing default in Anubis.

Removes `Google-Extended` flag from `ai-robots-txt.yaml` as it doesn't exist in requests.

Rename `ai-robots-txt.yaml` to `ai-catchall.yaml` as the file is no longer a copy of the source repo/file.

* chore: spelling

* chore: fix embeds

* chore: fix data includes

* chore: fix file name typo

* chore: Ignore READMEs in configs

* chore(lib/policy/config): go tool goimports -w

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Co-authored-by: Xe Iaso <me@xeiaso.net>
Corry Haines
2025-06-01 13:21:18 -07:00
committed by GitHub
parent 77e0bbbce9
commit de7dbfe6d6
19 changed files with 107 additions and 18 deletions


@@ -41,6 +41,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Added `--version` flag.
 - Added `anubis_proxied_requests_total` metric to count proxied requests.
 - Add `Applebot` as "good" web crawler
+- Reorganize AI/LLM crawler blocking into three separate stances, maintaining existing status quo as default.
+- Split out AI/LLM user agent blocking policies, adding documentation for each.

 ## v1.18.0: Varis zos Galvus


@@ -14,7 +14,7 @@ EG:
 {
   "bots": [
     {
-      "import": "(data)/bots/ai-robots-txt.yaml"
+      "import": "(data)/bots/ai-catchall.yaml"
     },
     {
       "import": "(data)/bots/cloudflare-workers.yaml"
@@ -29,8 +29,8 @@ EG:
 ```yaml
 bots:
   # Pathological bots to deny
-  - # This correlates to data/bots/ai-robots-txt.yaml in the source tree
-    import: (data)/bots/ai-robots-txt.yaml
+  - # This correlates to data/bots/ai-catchall.yaml in the source tree
+    import: (data)/bots/ai-catchall.yaml
   - import: (data)/bots/cloudflare-workers.yaml
 ```
@@ -46,7 +46,7 @@ Of note, a bot rule can either have inline bot configuration or import a bot con
 {
   "bots": [
     {
-      "import": "(data)/bots/ai-robots-txt.yaml",
+      "import": "(data)/bots/ai-catchall.yaml",
       "name": "generic-browser",
       "user_agent_regex": "Mozilla|Opera\n",
       "action": "CHALLENGE"
@@ -60,7 +60,7 @@ Of note, a bot rule can either have inline bot configuration or import a bot con
 ```yaml
 bots:
-  - import: (data)/bots/ai-robots-txt.yaml
+  - import: (data)/bots/ai-catchall.yaml
     name: generic-browser
     user_agent_regex: >
       Mozilla|Opera
@@ -167,7 +167,7 @@ static
 ├── botPolicies.json
 ├── botPolicies.yaml
 ├── bots
-│   ├── ai-robots-txt.yaml
+│   ├── ai-catchall.yaml
 │   ├── cloudflare-workers.yaml
 │   ├── headless-browsers.yaml
 │   └── us-ai-scraper.yaml