It took a bit of figuring out, but I managed to spin up our #emacsconf video processing pipeline and got the first uploaded video through the process and into our backstage area, complete with edited captions. I experimented with using the word-level timestamps from WhisperX, but merging them was a little tedious. I might go back to using the text output and then either aligning it with Aeneas or splitting it based on the word data from the WhisperX JSON. I could also try finding some other subtitle segmentation tool - maybe give lachesis another try, or check out recent research, or just go with something based on length+punctuation+gap...
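
The length+punctuation+gap idea could look something like this rough Python sketch. It assumes the word data from the WhisperX JSON has already been flattened into a list of dicts with `word`, `start`, and `end` keys (and that some words may be missing timestamps). The names `split_into_cues` and `make_cue` and the `max_chars` and `max_gap` thresholds are hypothetical, picked for illustration rather than taken from the actual pipeline.

```python
def make_cue(words):
    """Collapse a list of word dicts into one cue with start, end, and text."""
    starts = [w["start"] for w in words if "start" in w]
    ends = [w["end"] for w in words if "end" in w]
    return {
        "start": starts[0] if starts else None,
        "end": ends[-1] if ends else None,
        "text": " ".join(w["word"].strip() for w in words),
    }


def split_into_cues(words, max_chars=60, max_gap=0.7):
    """Start a new cue when the line would get too long, when the previous
    word ends a sentence, or when there is a long pause before the next word.

    max_chars and max_gap (seconds) are assumed thresholds, not tuned values.
    """
    groups = []
    current = []
    for word in words:
        text = word["word"].strip()
        if not text:
            continue
        if current:
            prev = current[-1]
            # Length of the cue text if this word were appended with a space.
            length = (
                sum(len(w["word"].strip()) for w in current)
                + len(current)
                + len(text)
            )
            # Only measure the pause when both timestamps are present.
            gap = 0.0
            if "start" in word and "end" in prev:
                gap = word["start"] - prev["end"]
            ends_sentence = prev["word"].strip().endswith((".", "?", "!"))
            if length > max_chars or gap > max_gap or ends_sentence:
                groups.append(current)
                current = []
        current.append(word)
    if current:
        groups.append(current)
    return [make_cue(group) for group in groups]
```

For example, calling `split_into_cues(words)` on a list of word dicts returns a list of cue dicts, each with `start`, `end`, and `text`, which could then be written out as WebVTT or SRT. Splitting only at word boundaries keeps the timestamps from the word data intact, at the cost of occasionally awkward line breaks that would still need hand editing.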

Sacha Chua on Nostr: It took a bit of figuring out, but I managed to spin up our #emacsconf video ...