Mark T. Tomczak on Nostr: #ai #openai "AI is stealing from artists" is an argument that falls flat. This is not ...
#ai #openai
"AI is stealing from artists" is an argument that falls flat. This is not to say AI doesn't interfere with a traditional artist's ability to turn their labor into income; just that "stealing" is probably the wrong term.
Google (and other search engines) have been collecting, indexing, and making searchable the Internet for decades. They do that by exhaustively crawling and copying data from other people's websites and transforming it into a novel representation. Modulo some dissent, nobody really considers that crawl itself "stealing," nor the building of the index.
But notably, where people *do* get bent out of shape is when Google crawls the Internet into OneBox and provides zero-clickthrough factual answers to questions. *This* has generated some heat. But the heat is not grounded in a philosophical "theft," because it's not about how the data was harvested or indexed, but about how it was *used.* The search index *directs traffic to other sites;* the OneBox answers questions *without redirecting user attention to the source of the information.*
"Theft" is a bad term for this. It breaks down because we do not have a common-sense understanding of "theft" being "how the acquired property is used;" we have a common-sense understanding of it in the taking. OpenAI wasn't "stealing" from artists when it was crawling and indexing the world of images; it wasn't "stealing" when it was labeling data or building bridges to existing label caches; it wasn't "stealing" when applying training data to them or comprehending the training data. If they'd made a big art search engine, I doubt we'd be having a "stealing from artists" conversation. But use that dataset to synthesize novel images and *uh-oh!*
So there's probably a better term for what OpenAI does than "theft," but I don't have it. And I think some people don't want to find it, because moving the conversation to the use of the data opens the question "Was something actually taken from the source artists *that they deserve to have*?" That, I think, is an open question, one that many try to short-cut by falling back on an idea of "theft."