Velocirooster adminensis :bc: /
npub1r3x…4a09
2024-06-13 15:10:17

:bc: Attention Beige Party-goers! :bc:

Recently it was brought to my attention that Maven, a new social network founded by former OpenAI Team Lead Ken Stanley, has automatically been scraping posts from the Fediverse to include as content on their network without the consent of posters, and without linking back to the original posts. It's also highly likely that this data was used as training data for its AI, since Maven's algorithm for serving posts is highly dependent on AI. Instead of following people, you follow "interests" and the AI decides which posts to show you. How convenient it must be to have a social network that does away with all those pesky "connections" between "human beings"!

Needless to say, this is rather troubling. The good news, for now, is that the CTO seems to have realized his mistake and has "paused" the scraping of Fediverse data, and he further claims to have deleted the data that was previously ingested. If, however, it was used to train the AI model, there's no real way to effectively claw that data back. I did some searching on Maven and didn't see any content from Beige Party; however, since the copied posts have been deleted it's hard to say if any of our posts made their way onto Maven. One advantage we have is that we're a relatively small instance, so we probably don't look like a rich source of data to potential scrapers.

In response to this, I have emailed the CTO requesting that he exclude Beige Party from any future scraping efforts and he has replied promising that he would do so. I have also preemptively defederated from Maven. This is the kind of behavior we were worried about from Meta, so to me it makes sense to defederate from Maven just as we did with Threads.

For an excellent overview of this issue, check out this article: https://wedistribute.org/2024/06/maven-mastodon-posts/

Please Note: Though the article claims that DMs were somehow able to be scraped, Eugen believes that the example in question was actually initially posted as a public post, deleted, then resent as a DM. I am inclined to believe him because there is no mechanism within Mastodon for sending private DMs to third-party instances, so if someone were able to scrape these posts it would imply a massive vulnerability that went entirely unnoticed until Maven somehow figured it out. Frankly, I doubt they are that good.

Please see Eugen's post about this here: https://mastodon.social/@Gargron/112608441965799612

One suggestion that often comes up when issues like this arise is to enable authorized fetch. This would mean disabling anonymous access so that only logged-in users would be able to view our posts. This sounds pretty good, but it can cause compatibility issues with instances that are not running Mastodon, and there are various ways to circumvent it, such as reading public feeds anonymously through the Mastodon API. There is currently no way to disallow unauthenticated API access without breaking the entire site for anyone not logged into an account on this instance.

You can find more information about authorized fetch here: https://hub.sunny.garden/2023/06/28/what-does-authorized_fetch-actually-do/
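
To make the API point above concrete, here is a rough sketch in Python of what an anonymous read of a public feed looks like. It assumes the standard Mastodon public timeline endpoint (`/api/v1/timelines/public`) with its `local` and `limit` parameters. The host name `example.social` and both function names are made up for illustration, and the check treats any HTTP error response as "anonymous reads refused", which is a simplification.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode, urlunsplit


def public_timeline_url(instance_host: str, local_only: bool = True, limit: int = 20) -> str:
    """Build the URL of an instance's public timeline (hypothetical helper)."""
    query = urlencode({"local": str(local_only).lower(), "limit": limit})
    return urlunsplit(("https", instance_host, "/api/v1/timelines/public", query, ""))


def allows_anonymous_read(instance_host: str, timeout: float = 10.0) -> bool:
    """Return True if the public timeline answers a request that carries no login token.

    Assumption: an instance that refuses anonymous API access replies with an
    HTTP error status, which urllib raises as HTTPError. Network failures
    (DNS errors, timeouts) are not caught and will propagate to the caller.
    """
    request = urllib.request.Request(
        public_timeline_url(instance_host, limit=1),
        headers={"Accept": "application/json"},  # note: no Authorization header
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made here.
    print(public_timeline_url("example.social"))
```

Calling `allows_anonymous_read("example.social")` would send a single request with no credentials attached. Authorized fetch is not involved in this path at all, which is why enabling it alone does not stop this kind of reading.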

Though there are technical reasons to avoid enabling authorized fetch, my reluctance to use it is more philosophical. I understand the concerns about corporations scraping our data, but I feel that if we adopt a bunker mentality then we are effectively ceding space in the Fediverse to these corporate actors. The point of a social network is to communicate and to connect, and I don't want to punish people trying to interact with us in good faith just because there are some bad actors out there. In my mind, the greatest threat to the Fediverse is corporate interests carving it up into their own proprietary walled gardens like they did with the World Wide Web in the late 90s and early 2000s, and the best defense against that is to have a robust decentralized network of mostly small instances engaging in open and vibrant communication, so that whatever corporations can offer in their proprietary spaces will always pale in comparison to what's available in the free Fediverse.

As always, I welcome your thoughts on this topic. I've done my best to consider all angles, but I will never claim to have all the answers. Whatever our policy is, I want to make sure it's something we can all understand and build a consensus around.

Thanks to 𝚃𝚑𝚊𝚝 𝙳𝚎𝚊𝚍 𝙶𝚞𝚢 (npub1c8e…56ne) and rothko ☕️ ♏️ (npub1hn2…lf40) for bringing this to my attention.

Beige-bless :bb:
Author Public Key
npub1r3xkyr2guqhh0fdp33qta3mnn0ef5eej8ml0wc06ec6zk43z2vwqc04a09