Abhinav Tushar on Nostr: I am playing with multi-speaker #voicebot and seems like #openai realtime API is not ...
I am playing with multi-speaker #voicebot and seems like #openai realtime API is not detecting speakers natively. Surprisingly it keeps saying "can't tell only based on text" as if it's biased more towards transcriptions than full spectrum audio signals.
Probably the best way to do this right now would be to input speaker-tagged 'text' mode item in the API and let it emit audio in response.
Published at
2024-10-12 09:26:31Event JSON
{
"id": "f32f04c72e47ce347c075a61721bb676131cdd46a9909c7cf67a2ad18bea7856",
"pubkey": "fad4c7742f14fb01c3ae6f10f2e99df3b91fbc827601a02fff9be1cd8049975e",
"created_at": 1728725191,
"kind": 1,
"tags": [
[
"t",
"openai"
],
[
"t",
"Voicebot"
],
[
"proxy",
"https://mathstodon.xyz/users/lepisma/statuses/113293734154126019",
"activitypub"
]
],
"content": "I am playing with multi-speaker #voicebot and seems like #openai realtime API is not detecting speakers natively. Surprisingly it keeps saying \"can't tell only based on text\" as if it's biased more towards transcriptions than full spectrum audio signals.\n\nProbably the best way to do this right now would be to input speaker-tagged 'text' mode item in the API and let it emit audio in response.",
"sig": "b6076231014ef2b79b82f573e7014d005aa04c69bbeb04194da3f486c707ebdfc9f5bba56cb49be5ff7bc16d8b43c4a9e42968f91848db9c1f62a42bd5edb304"
}