Semisol 👨💻 on Nostr:
tl;dr you feed images to an embedding model and get out embeddings (points in a high-dimensional space), which you later use for search
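a minimal sketch of the search step, assuming cosine similarity as the distance measure (the post doesn't name one). the random vectors stand in for the output of a real embedding model, and `normalize` and `search` are hypothetical helper names:

```python
import numpy as np

def normalize(v):
    # Scale vectors to unit length so a dot product equals cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def search(query, index, k=3):
    # Return the row indices of the k stored embeddings most similar to the query.
    scores = normalize(index) @ normalize(query)
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)

# Assumption: 100 images embedded into 64 dimensions by some embedding model.
image_embeddings = rng.normal(size=(100, 64))

# A query that lies very close to image 42 in embedding space.
query = image_embeddings[42] + 0.01 * rng.normal(size=64)

top = search(query, image_embeddings)
print(top)
```

the query was built to sit next to image 42, so that index should come back first.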
you then train another model to generate embeddings in the same space, but from captions instead of images
to do that, you usually take some images, get their embeddings, caption the images, and train the second model on the resulting (caption, embedding) pairs
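a toy sketch of that training setup, under loud assumptions: captions are already turned into numeric feature vectors, the caption model is a single linear map `W`, and the loss is mean squared error against the fixed image embeddings. none of that is specified in the post, and real systems use much larger models. all data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
n, text_dim, embed_dim = 512, 32, 16

# Assumption: each caption is represented by a 32-dim feature vector.
caption_features = rng.normal(size=(n, text_dim))

# Synthetic stand-in for the frozen image model: the embedding of each image
# is a hidden linear function of its caption features plus a little noise.
hidden_map = rng.normal(size=(text_dim, embed_dim))
image_embeddings = caption_features @ hidden_map + 0.05 * rng.normal(size=(n, embed_dim))

def mse(W):
    # Mean squared distance between predicted and target embeddings.
    return float(np.mean((caption_features @ W - image_embeddings) ** 2))

# The caption model: maps caption features into the image embedding space.
W = np.zeros((text_dim, embed_dim))
initial_loss = mse(W)

lr = 0.05
for _ in range(500):
    residual = caption_features @ W - image_embeddings
    # Gradient of the squared error averaged over pairs
    # (a constant factor away from the gradient of mse).
    grad = 2 * caption_features.T @ residual / n
    W -= lr * grad

final_loss = mse(W)

# Search by caption: embed caption 7 and find the closest image embedding.
query = caption_features[7] @ W
scores = (image_embeddings @ query) / (
    np.linalg.norm(image_embeddings, axis=1) * np.linalg.norm(query)
)
nearest = int(np.argmax(scores))
print(initial_loss, final_loss, nearest)
```

the image embeddings stay fixed here and only the caption model is trained, so once it has converged a caption lands near its own image in the shared space.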
Published at
2025-01-09 15:56:26
Event JSON
{
"id": "53d9cb8bb105a83904fa060eff2930fe175152b94efed14618839923ab96fe53",
"pubkey": "52b4a076bcbbbdc3a1aefa3735816cf74993b1b8db202b01c883c58be7fad8bd",
"created_at": 1736438186,
"kind": 1,
"tags": [
[
"e",
"8ab446cfaf948b32e4d774e9321bfb3a2160b4e257c5e1b2d440584a819e239e",
"",
"root"
],
[
"e",
"554300b55eb9dead5f4c51306d65b56d397b136833cf5f0fa35ad5f46f7932ee",
"",
"reply"
],
[
"p",
"b9e76546ba06456ed301d9e52bc49fa48e70a6bf2282be7a1ae72947612023dc"
],
[
"p",
"96a0b3e0738e7ff0838abc900fc48f61effef780d175a6bb2c0240246556bb3e"
]
],
"content": "tl;dr you feed images to an embedding model, get out embeddings (also known as points in high-dimensional space) and then later use those to search\n\nyou train another model to generate embeddings in the same space, but with captions instead of images\n\nyou usually take some images and get their embeddings, caption the images and use that to train the model",
"sig": "96157b066d1baca40b19a5ba7b492f287a071f5ec3ca11138336a45d1bcaf2a47ccacac2737856a2ee0635556318200b59f8d4a91eff434f975407c89f64b34e"
}