
I've tried making a full-text #RSS feed for the websites of ScienceX, the parent org for Phys.org and Tech Xplore.

The webpages are very straightforward, so the bridge (for #rssbridge) took just about 200 LoC. But! #CloudFlare is super zealous there.

Even with the following parameters:
- 3 feeds
- fetch every hour
- cache webpages for 7 days (= fetch each webpage only once, for all intents and purposes)
I already got 429'ed. I'll try fetching every 4 hours, I guess...
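
For reference, a rough sketch of the fetch policy described above: cache each page for 7 days, and stop hitting the site for a while after a 429. This is Python for illustration, not the actual bridge code (RSS-Bridge bridges are written in PHP), and `CachedFetcher`, `RateLimited` and the injected `fetch` callable are hypothetical names. It assumes the server may send a `Retry-After` header in seconds and falls back to the 4-hour interval otherwise.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_TTL = 7 * 24 * 3600      # keep each fetched page for 7 days
DEFAULT_BACKOFF = 4 * 3600     # assumed fallback pause after a 429: 4 hours

# fetch(url) -> (status_code, headers, body); injected so any HTTP client fits
FetchFn = Callable[[str], Tuple[int, Dict[str, str], str]]


class RateLimited(Exception):
    """Raised when the site answered 429 or we are still backing off."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry in {retry_after:.0f}s")
        self.retry_after = retry_after


class CachedFetcher:
    def __init__(self, fetch: FetchFn, now: Callable[[], float] = time.time):
        self._fetch = fetch
        self._now = now
        self._cache: Dict[str, Tuple[float, str]] = {}  # url -> (fetched_at, body)
        self._blocked_until = 0.0

    def get(self, url: str) -> str:
        now = self._now()

        # Serve from cache while the entry is younger than the TTL.
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < CACHE_TTL:
            return cached[1]

        # Still inside a backoff window: do not touch the network at all.
        if now < self._blocked_until:
            raise RateLimited(self._blocked_until - now)

        status, headers, body = self._fetch(url)
        if status == 429:
            delay = self._retry_after_seconds(headers)
            self._blocked_until = now + delay
            raise RateLimited(delay)
        if status != 200:
            raise RuntimeError(f"unexpected HTTP status {status} for {url}")

        self._cache[url] = (now, body)
        return body

    @staticmethod
    def _retry_after_seconds(headers: Dict[str, str]) -> float:
        # Only the "number of seconds" form is handled here; the HTTP-date
        # form and a missing header both fall back to DEFAULT_BACKOFF.
        lowered = {k.lower(): v for k, v in headers.items()}
        value = lowered.get("retry-after", "").strip()
        return float(value) if value.isdigit() else float(DEFAULT_BACKOFF)
```

With something like this, a 429 on one article pauses all further requests until the window passes, while pages already in the cache keep being served.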

W-why such extreme measures to prevent parsing? I'm sure #AI corps or whoever needs their data will just hire a bunch of people to solve CloudFlare's CAPTCHAs, but everyone else will be left behind.

Just give me the damn full-text RSS, I'd even pay for it... if I could sign up, that is. The signup form returns 503 for me.

Pavel Korytov :emacs:☮️ on Nostr: I've tried making a full-text #RSS feed for the websites of ScienceX, the parent org ...