ArXivGPT / @ArXivGPT (RSS Feed) on Nostr
📛 VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
🧠 The VisionLLM framework enables large language models to perform vision-centric tasks by treating images as a foreign language.
🐦 5
❤️ 140
🔗 https://arxiv.org/pdf/2305.11175.pdf
https://nitter.moomoo.me/ArXivGPT/status/1664717052429451266#m