Haijo on Nostr:
robots.txt is a file that server admins or web developers can add to the root of a website.
it tells bots which parts of the website they are allowed to visit. those bots are usually web crawlers for things like search engines, or, more recently, scrapers that plagiarize text and images for fake neural networks.
under normal circumstances
User-agent: *
Disallow: /
should make it so no bots that obey robots.txt can access anything on the website, apart from robots.txt.
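a quick way to check this is python's standard-library robots.txt parser. this is only a sketch: the bot names and paths below are made up for illustration, and it shows how one well-behaved parser reads the two lines, not how any particular crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# the two-line robots.txt from the post
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# hypothetical bot names and paths, chosen only for illustration
bots = ["Googlebot", "ExampleBot"]
paths = ["/", "/blog/post.html", "/images/cat.png"]

for bot in bots:
    for path in paths:
        # can_fetch() returns False when the rules forbid this bot from this path
        print(bot, path, parser.can_fetch(bot, path))
```

every combination comes back as not allowed, because the `*` group applies to any bot that has no group of its own.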
but the file from this link has a lot of other rules that the wildcard rule above should already cover.
i get that people don't trust the authors of the bots to respect the rules, but adding more rules isn't going to help with that.
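to make the redundancy concrete, here is a rough comparison of the two-line file against a longer one that also names bots one by one. the longer file is a hypothetical stand-in, not the file from the link, and `GPTBot`, `CCBot` and `SomeOtherBot` are just example user-agent names picked for the sketch.

```python
from urllib.robotparser import RobotFileParser

def make_parser(text):
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

# the short file: one wildcard group that blocks everything
short_file = make_parser("User-agent: *\nDisallow: /\n")

# a hypothetical longer file that repeats the same rule per bot
long_file = make_parser("""\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /
""")

bots = ["GPTBot", "CCBot", "SomeOtherBot"]
paths = ["/", "/about", "/img/a.png"]

# True if both files give the same answer for every bot and path
same_answers = all(
    short_file.can_fetch(bot, path) == long_file.can_fetch(bot, path)
    for bot in bots
    for path in paths
)
print(same_answers)
```

in this sketch both files give identical answers. a named group only changes anything when it says something different from the `*` group, and a bot that ignores robots.txt ignores the long version just as easily as the short one.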
CC: Cory :prami_pride_demi: (npub1jey…u2lf)