Fabio Manganiello on Nostr:
nprofile1qy2hwumn8ghj7un9d3shjtnddaehgu3wwp6kyqpqjl0wt67um0zeynsj434uwxul8vrxlhm4xl3n832w60d9nyayxzhs35v5sj my doctrine is that abuse shouldn’t prevent legitimate use.
The vision behind the (original) Web 3.0 was to have a Web that was as machine-readable as it was human-readable. Giving up on that vision because of a small percentage of abusers is like giving up on hosting a website because it may be subject to DDoS attacks.
From a more practical perspective, as someone who has been hosting sites for a couple of decades (without even having Cloudflare and friends in front of them), I’ve noticed two main scraping abuse patterns:
Genuinely misconfigured scripts (it has also happened a few times with Fediverse instances). In those cases, it’s usually quite easy to find the culprit and urge them to fix their logic.
Malicious actors: in that case it has nothing to do with scraping (they don’t want your content, they just want to take your website down).
Some mitigation actions that I’ve found useful:
Cache your static content.
Provide your content in a machine-readable format too (RSS, JSON-LD, RDF, some API…). Legitimate scrapers much prefer data that is already machine-ready to implementing their own brittle HTML scraper. And non-HTML content is usually also much lighter in terms of payload size and server load (a minimal sketch of both points follows below).
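To make those two points concrete, here is a minimal sketch of a server that serves static pages with long-lived Cache-Control headers and exposes the same posts as a small JSON feed that scrapers can consume instead of parsing HTML. It uses only the Go standard library; the ./public directory, the /feed.json path, and the Post struct are illustrative placeholders, not a description of any actual setup.

```go
// Minimal sketch: cached static content plus a machine-readable feed.
// Paths, the Post struct, and ./public are illustrative assumptions.
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// Post is a hypothetical content item; shape it however your site stores articles.
type Post struct {
	Title     string    `json:"title"`
	URL       string    `json:"url"`
	Published time.Time `json:"published"`
}

var posts = []Post{
	{Title: "Example article", URL: "https://example.org/example-article", Published: time.Now()},
}

// withCache adds a long-lived Cache-Control header so browsers and any
// reverse proxy or CDN in front of the site can absorb repeat requests.
func withCache(h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", "public, max-age=86400")
		h.ServeHTTP(w, r)
	})
}

func main() {
	// Static HTML/CSS/images from ./public, served with cache headers.
	http.Handle("/", withCache(http.FileServer(http.Dir("./public"))))

	// The same content in machine-readable form: scrapers can poll this
	// instead of parsing HTML, and the payload is much smaller.
	http.HandleFunc("/feed.json", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Header().Set("Cache-Control", "public, max-age=300")
		if err := json.NewEncoder(w).Encode(posts); err != nil {
			log.Println("encode:", err)
		}
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

With something like this in place, caches soak up most of the repeat traffic, and well-behaved scrapers get a stable, lightweight endpoint to hit rather than hammering the HTML pages.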