Several websites now have #RSS feeds that seem to block bots - Reuters, ANSA and Dutch Review are among them.
I've noticed it because several feeds had become unavailable over the past few days on my Miniflux instance.
Apparently setting HTTP_CLIENT_USER_AGENT to a Firefox/Chrome user agent rather than a string that contains "Miniflux" is enough to bypass the block.
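For reference, this is roughly what the workaround looks like in the Miniflux configuration (the UA string below is just an example, any mainstream browser UA seems to do; whether you set it in the config file or as an environment variable depends on your deployment):

  HTTP_CLIENT_USER_AGENT=Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0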
This kind of stuff just baffles me.
1. How do they expect people to consume RSS feeds? From their browsers? That's a bummer, because neither Firefox nor Chrome renders RSS/Atom content types anymore, so you'd have to be quite fluent in reading raw XML to consume the content.
2. If for some reason they expect people to consume feeds from the browser, then how are they going to notify people that there's a feed available when they navigate to their page? Reuters doesn't even bother to use a <link> tag in the DOM, for instance, nor does it bother to tell folks about the feeds on the homepage.
3. If, realistically speaking, feeds can no longer be read in a browser in 2024 (sure, there are folks like me that use custom Firefox extensions, but realistically we're <0.1% of the traffic), then of course the only alternative is an offline aggregator. So what's the point of blocking bot user agents, if that's exactly the way things are intended to work?
4. How is a mechanism that simply throws a 403 if the request comes from a user agent containing e.g. "Miniflux" or "libcurl" supposed to be "bot protection", when I'm only one step away from spoofing my user agent? (See the one-liner right after this list.)
5. If these folks are really so hostile towards feeds, then why do they even bother to keep running them?
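To give an idea of how flimsy that "protection" is: something along these lines - with a made-up feed URL - is all it takes to turn the 403 into a 200:

  curl -A "Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0" https://example.com/feed.xml

No cookies, no JavaScript challenge, just a different string in a header that the client fully controls.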
My proposal: all large news outlets should have mandatory support for RSS/Atom feeds, properly advertised via a <link> tag and/or on the homepage, and with no barriers (especially barriers as dumb as a static user-agent check).
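For the record, advertising a feed is literally one line in the page's <head> (hypothetical URL):

  <link rel="alternate" type="application/rss+xml" title="News feed" href="https://example.com/feed.xml">

Every feed reader I know of understands this autodiscovery convention, so there's really no excuse for not shipping it.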
Being a large news outlet (especially when, as is often the case in Europe, it's a large publisher partly funded by public money) means that your information *must* be accessible even to users who don't or can't read your articles in a standard web browser - especially if we want to set up automatic alerts/notifications based on certain events.

Sure, I can technically bypass all the dumb barriers and all the pointless friction points that both browser manufacturers and news outlets add to discourage people from using feeds. But at some point I just run into technical fatigue.

Open feeds for large outlets that deliver critical news services, and that are often partly funded by our taxes, should be a mandatory requirement - not a war to be fought only by tech-savvy citizens on an individual level.