Substitcher script to stitch up to 200 vtt, srt subs #1320
mrfragger
started this conversation in
Show and tell
Substitcher stitches up to 200 audio segments and their transcribed subtitles together, producing one single subtitle file.
Requirements:
pip3 install pysubs2 titlecase
sudo apt install jq kid3-cli rename ffmpeg
flatpak install flathub org.freac.freac
Works on Mac too:
brew install jq kid3-cli rename ffmpeg
It should also work on Windows.
The purpose is to identify hallucinations, repeating subs, stuck timecodes, and repeating timecodes.
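To make those checks concrete, here is a rough Python sketch of how repeating subs and stuck or repeating timecodes could be flagged. This is not Substitcher's actual code: the function name `find_suspects`, the `min_run` threshold, and the cue format (a list of `(start_ms, end_ms, text)` tuples) are all assumptions made for illustration.

```python
def find_suspects(cues, min_run=3):
    """Return (index, reason) pairs for cues that look suspicious.

    cues: list of (start_ms, end_ms, text) tuples in playback order.
    min_run: hypothetical threshold; how many identical lines in a row
             count as a repeating sub (a common sign of hallucination).
    """
    suspects = []
    run = 1  # length of the current run of identical text
    for i, (start, end, text) in enumerate(cues):
        # A cue that does not advance in time is treated as stuck.
        if end <= start:
            suspects.append((i, "stuck timecode"))
        if i > 0:
            p_start, p_end, p_text = cues[i - 1]
            # Same start and end as the previous cue.
            if (start, end) == (p_start, p_end):
                suspects.append((i, "repeating timecode"))
            same = text.strip().lower() == p_text.strip().lower()
            run = run + 1 if same else 1
            # Report each run once, at the moment it reaches the threshold.
            if run == min_run:
                suspects.append((i, "repeating sub"))
    return suspects


sample = [
    (0, 1000, "Hello"),
    (1000, 2000, "Thank you."),
    (2000, 3000, "Thank you."),
    (3000, 4000, "Thank you."),
    (4000, 4000, "Next"),
    (4000, 4000, "Again"),
]
for index, reason in find_suspects(sample):
    print(index, reason)
```

Flagged cues still need to be checked by ear; a speaker really can say the same line three times.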
The biggest difference I've noticed is that medium hallucinates less than medium.en and large (which is large-v2). I've also tried a quantized (q5) model, but its accuracy is as bad as small, so you might as well just use small in that case.
Substitcher comes with a sample LibriVox audiobook so you can quickly play around with the options. Put all your srt or vtt subs into the root directory along with the opus audio segments, then stitch them together. For the included audiobook, extract it by chapters and rename the files 001.opus, 002.opus, and so on (option h); those names then correspond to the vtt or srt files transcribed by whisper.cpp.
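The general idea of stitching can be sketched in a few lines of Python: each segment's cues are shifted by the total duration of the audio segments before it. This is only an illustration under stated assumptions, not the script itself. It uses the standard library only (no pysubs2), handles SRT-style `HH:MM:SS,mmm` timestamps, and expects the segment durations to be supplied by the caller; all function names here are hypothetical.

```python
import re

TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_ms(stamp):
    """Convert 'HH:MM:SS,mmm' (or with '.') to milliseconds."""
    h, m, s, ms = map(int, TIME_RE.fullmatch(stamp.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def to_stamp(ms):
    """Format milliseconds as an SRT timestamp."""
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def parse_srt(text):
    """Return a list of (start_ms, end_ms, text) cues from SRT text."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:  # timing line; the index line above is ignored
                start, end = line.split("-->")
                cues.append((to_ms(start), to_ms(end), "\n".join(lines[i + 1:])))
                break
    return cues


def stitch(segments):
    """segments: list of (srt_text, audio_duration_ms), in playback order."""
    out, offset = [], 0
    for srt_text, duration_ms in segments:
        for start, end, text in parse_srt(srt_text):
            out.append((start + offset, end + offset, text))
        offset += duration_ms  # the next segment starts where this audio ends
    return out


def render_srt(cues):
    """Renumber the cues from 1 and render them as SRT text."""
    blocks = [
        f"{n}\n{to_stamp(start)} --> {to_stamp(end)}\n{text}"
        for n, (start, end, text) in enumerate(cues, 1)
    ]
    return "\n\n".join(blocks) + "\n"


seg1 = "1\n00:00:01,000 --> 00:00:02,500\nChapter one\n"
seg2 = "1\n00:00:00,500 --> 00:00:03,000\nChapter two\n"
# Assumed durations: 001.opus is 60 s long, 002.opus is 45 s long.
print(render_srt(stitch([(seg1, 60_000), (seg2, 45_000)])))
```

The offsets come from the audio durations rather than from the last subtitle's end time, so silence at the end of a segment does not make the later subs drift.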
Play the audiobooks with subs over a black cover image. Players: SMPlayer on Linux, PotPlayer on Windows, IINA on Mac, mpv-android on Android, and nPlayer (paid) or Liquid Player on iOS. Best of all, though, is mpv with a plugin that searches all subs.
substitcher202310.zip