Simon Willison on Nostr:
Did you know Google’s Gemini 1.5 Pro vision LLM is trained to return bounding boxes for objects found within images?
I built this browser tool that lets you run a prompt with an image against Gemini and visualize the bounding boxes it returns.
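To give a rough idea of what visualizing the boxes involves, here is a minimal Python sketch. It is not the tool's actual code. It assumes, which the post itself does not state, that the model replies with text containing boxes written as `[ymin, xmin, ymax, xmax]` on a 0 to 1000 scale relative to the image. The function names `extract_boxes` and `to_pixel_box` are made up for this illustration.

```python
import re
from typing import List, NamedTuple, Tuple


class PixelBox(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


# Matches four comma-separated integers in square brackets, e.g. [100, 200, 500, 600].
BOX_PATTERN = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def extract_boxes(response_text: str) -> List[Tuple[int, int, int, int]]:
    """Pull every four-integer bracketed group out of the model's text reply."""
    return [tuple(int(n) for n in match) for match in BOX_PATTERN.findall(response_text)]


def to_pixel_box(box, image_width: int, image_height: int, scale: int = 1000) -> PixelBox:
    """Convert one box to pixel coordinates.

    ASSUMPTION: box is (ymin, xmin, ymax, xmax) normalized to 0..scale.
    """
    ymin, xmin, ymax, xmax = box
    return PixelBox(
        left=xmin * image_width / scale,
        top=ymin * image_height / scale,
        right=xmax * image_width / scale,
        bottom=ymax * image_height / scale,
    )


if __name__ == "__main__":
    reply = "goat: [100, 200, 500, 600]"  # hypothetical model reply
    for box in extract_boxes(reply):
        print(to_pixel_box(box, image_width=800, image_height=400))
```

The resulting pixel rectangles can then be drawn over the image with whatever drawing API is at hand, such as a canvas element in the browser. If the model uses a different coordinate order or scale, only `to_pixel_box` needs to change.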
You can try it out using your own Google Gemini API key:
https://tools.simonwillison.net/gemini-bbox
Published at 2024-08-26 16:54:18
Event JSON
{
"id": "96f1b776574d67b1415f7dc01715de60d6dd31f3500f0aeee68888a03583cbae",
"pubkey": "8b0be93ed69c30e9a68159fd384fd8308ce4bbf16c39e840e0803dcb6c08720e",
"created_at": 1724691258,
"kind": 1,
"tags": [
[
"proxy",
"https://fedi.simonwillison.net/users/simon/statuses/113029366334549422",
"activitypub"
]
],
"content": "Did you know Google’s Gemini 1.5 Pro vision LLM is trained to return bounding boxes for objects found within images?\n\nI built this browser tool that lets you run a prompt with an image against Gemini and visualize the bounding boxes\n\nYou can try it out using your own Google Gemini API key: https://tools.simonwillison.net/gemini-bbox\n\nhttps://cdn.masto.host/fedisimonwillisonnet/media_attachments/files/113/029/366/005/677/241/original/02726b1a470c51f2.jpeg",
"sig": "1558d6df773b8fbf469dfd68d6333e9b6afcaa30e5dc4fbe7308790b328ef90d69558000ae764230b8776d0dc20552f394e7f673f1e33b7ecc53d3266d733be9"
}