Fabio Manganiello on Nostr: In theory, I wouldn’t mind big tech using #AI to scrape my content at all. From its ...
In theory, I wouldn’t mind big tech using #AI to scrape my content at all.
From its very inception, the Web was meant to be open and machine-accessible. In the absence of proper, widely adopted Web 3.0 semantic constructs (whose adoption big tech itself held back, since they would have democratized content extraction and removed the need for services like Google’s), scraping is an inalienable part of that freedom.
But do I have the same freedom to scrape as big tech has?
Can I scrape content from my own Facebook timeline to train my model (after jumping through all the client-side rendering hurdles that Facebook has put in place to prevent scraping) without getting my account locked/banned?
Can I do the same with Twitter or Reddit maybe?
Can I systematically scrape the results of a YouTube page without getting my Google account locked/banned or my IP blacklisted?
Can I freely access the predictions of Copilot, which was trained on billions of freely available lines of open-source code, without having to pay for a subscription?
Can I freely scrape results from the Amazon store to build my own pricing model?
As long as the answer to all of these questions is *no*, I’ll block the shit out of any user agent that contains the string *Bot*.
And that’s only because *they* have ruined the game for everyone by twisting the rules to their advantage.