ResearchBuzz / npub1edz…rj6j
2024-11-14 12:58:30

#AI #MachineLearning #OpenAccess #LLM #HuggingFace

"Many have claimed that training large language models requires copyrighted data, making truly open AI development impossible. Today, Pleias is proving otherwise with the release of Common Corpus...—the largest fully open multilingual dataset for training LLMs, containing over 2 trillion tokens of permissibly licensed content with provenance information (2,003,039,184,047 tokens)."

https://huggingface.co/blog/Pclanglais/two-trillion-tokens-open
Author Public Key
npub1edz6ysaqe6ratc29kzpqvgp33twzr0xefqv9ka7mdxjyacfy52mq42rj6j