Vladimir Savić on Nostr:
"We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of 'textbook quality' data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens)."
Textbooks are all you need
https://arxiv.org/pdf/2306.11644 #AI #GenAI #LLM #compsci
Published at 2024-09-12 11:16:49 (UTC)

Event JSON
{
  "id": "d34c9fe0d6eb9d898696129b7b9a87bddf917ff27980bd1e76615413500cbe06",
  "pubkey": "d21cd1857830821310d566c42ec7f5b7ca641c06828a4d55cf469dc1827b81df",
  "created_at": 1726139809,
  "kind": 1,
  "tags": [
    ["t", "ai"],
    ["t", "genai"],
    ["t", "llm"],
    ["t", "compsci"],
    ["proxy", "https://mastodon.social/users/firusvg/statuses/113124298574517575", "activitypub"]
  ],
  "content": "\"We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of 'textbook quality' data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens).\" \n\nTextbooks are all you need https://arxiv.org/pdf/2306.11644 #AI #GenAI #LLM #compsci",
  "sig": "952914dcdcb287cf37ece8a5ddd19de5427bbcc1f551a82cf25070e535bfae2d106b0cb537dfa21b386014e6d4328463e8a10384665afc763a240af26750c28b"
}
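For reference, the "id" above is not arbitrary: per Nostr's NIP-01, it is the SHA-256 hash of the UTF-8 JSON serialization of the array [0, pubkey, created_at, kind, tags, content] with no insignificant whitespace, and "sig" is a BIP-340 Schnorr signature over that id, verifiable against "pubkey". A minimal Python sketch of the id derivation, using only the standard library, with the field values copied from the event above:

import hashlib
import json

pubkey = "d21cd1857830821310d566c42ec7f5b7ca641c06828a4d55cf469dc1827b81df"
created_at = 1726139809  # Unix time: 2024-09-12 11:16:49 UTC
kind = 1
tags = [
    ["t", "ai"],
    ["t", "genai"],
    ["t", "llm"],
    ["t", "compsci"],
    ["proxy",
     "https://mastodon.social/users/firusvg/statuses/113124298574517575",
     "activitypub"],
]
content = (
    "\"We introduce phi-1, a new large language model for code, with "
    "significantly smaller size than competing models: phi-1 is a "
    "Transformer-based model with 1.3B parameters, trained for 4 days on "
    "8 A100s, using a selection of 'textbook quality' data from the web "
    "(6B tokens) and synthetically generated textbooks and exercises with "
    "GPT-3.5 (1B tokens).\" \n\n"
    "Textbooks are all you need https://arxiv.org/pdf/2306.11644 "
    "#AI #GenAI #LLM #compsci"
)

# NIP-01: id = sha256 of the compact JSON array
# [0, pubkey, created_at, kind, tags, content], encoded as UTF-8.
# json.dumps with compact separators and ensure_ascii=False matches the
# NIP-01 serialization for content like this (only " and \n need escaping).
serialized = json.dumps(
    [0, pubkey, created_at, kind, tags, content],
    separators=(",", ":"),
    ensure_ascii=False,
)
event_id = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
print(event_id)  # should reproduce the "id" field of the event above

The "proxy" tag records that this note was bridged from another protocol; here it points at the original ActivityPub status on mastodon.social.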