Eryk Salvaggio on Nostr:
"We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear. We refer to this effect as ‘model collapse’ and show that it can occur in LLMs as well as in variational autoencoders (VAEs) and Gaussian mixture models (GMMs)."
https://www.nature.com/articles/s41586-024-07566-y

Published at 2024-07-26 15:21:19
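The mechanism is easiest to see in the simplest setting the abstract mentions, a single Gaussian: each generation is fitted to samples drawn from the previous generation's fit, so estimation error compounds and the tails thin out first. Below is a minimal sketch of that loop, not the paper's LLM/VAE/GMM experiments; the sample size, generation count, and 2-sigma tail threshold are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100  # samples per generation (illustrative, not from the paper)

# Generation 0 trains on real data: a standard Gaussian.
data = rng.normal(0.0, 1.0, size=n)

for gen in range(501):
    mu, sigma = data.mean(), data.std(ddof=1)
    if gen % 100 == 0:
        # Fraction of mass beyond 2 sigma of the original distribution.
        tail = float(np.mean(np.abs(data) > 2.0))
        print(f"gen {gen:3d}  sigma={sigma:.3f}  P(|x|>2)={tail:.3f}")
    # Every later generation trains only on samples from the previous fit.
    data = rng.normal(mu, sigma, size=n)
```

Because the fitted scale follows a multiplicative random walk with slight downward drift, sigma tends toward zero over generations in this toy loop: the tail mass vanishes first, then the distribution collapses toward a point, echoing the "tails of the original content distribution disappear" effect the quote describes.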
Event JSON
{
  "id": "5b5b3d5b2aa4850097f09cb1d41cedb1fcdb7e538149eb950c9c26ff7c000581",
  "pubkey": "f51976a1e657071561702d34834d565e956a349df9cc6e25caa7cfda043d9b42",
  "created_at": 1722007279,
  "kind": 1,
  "tags": [
    ["proxy", "https://assemblag.es/@CyberneticForests/112853469095894678", "web"],
    ["proxy", "https://assemblag.es/users/CyberneticForests/statuses/112853469095894678", "activitypub"],
    ["L", "pink.momostr"],
    ["l", "pink.momostr.activitypub:https://assemblag.es/users/CyberneticForests/statuses/112853469095894678", "pink.momostr"],
    ["-"]
  ],
  "content": "\"We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear. We refer to this effect as ‘model collapse’ and show that it can occur in LLMs as well as in variational autoencoders (VAEs) and Gaussian mixture models (GMMs).\" \nhttps://www.nature.com/articles/s41586-024-07566-y",
  "sig": "2a2f4af930cb79f3e83d42fbc975367df086446b5be1d511b15dc48895f11178abcd2d177a68e1f85deb3988b7f94a1c3e3b6a38036335eb21e712ad165f3d25"
}
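For reference, the `id` field above is defined by Nostr's NIP-01: the SHA-256 of a canonical, whitespace-free JSON serialization of the event's other fields. A small sketch of that computation follows, using only the Python standard library; `json.dumps` with these options approximates NIP-01's escaping rules for typical content like this event's.

```python
import hashlib
import json

def nostr_event_id(pubkey: str, created_at: int, kind: int,
                   tags: list, content: str) -> str:
    """NIP-01 event id: SHA-256 of the UTF-8 serialization of
    [0, pubkey, created_at, kind, tags, content] with no whitespace
    and non-ASCII characters left unescaped."""
    payload = json.dumps(
        [0, pubkey, created_at, kind, tags, content],
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Feeding this the pubkey, created_at, kind, tags, and content from the event above should reproduce the `id` shown. The `sig` is a BIP-340 Schnorr signature over that id under the event's pubkey; verifying it requires an external secp256k1 library, which this sketch does not cover.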