Substitcher script to stitch up to 200 vtt, srt subs #1320

mrfragger · 2023-09-25T02:01:24Z

Substitcher stitches up to 200 audio segments and their transcribed subtitles into a single subtitle file.

Requirements:
pip3 install pysubs2 titlecase

sudo apt install jq kid3-cli rename ffmpeg

flatpak install flathub org.freac.freac

Works on Mac too:
brew install jq kid3-cli rename ffmpeg
It should also work on Windows.

The purpose is to identify hallucinations, repeating subs, stuck timecodes, and repeating timecodes.
The biggest difference I've noticed is that medium hallucinates less than medium.en and large (which is large-v2). I've also tried a 5-bit quantized model, but its accuracy is as bad as small, so you might as well just use small in that case.

Substitcher comes with a sample LibriVox audiobook so you can quickly play around with the options. Put all your srt or vtt subs into the root directory along with the opus audio segments, then stitch them together. For the included audiobook, use option h) to extract the chapters and rename them 001.opus, 002.opus, and so on. Those names then correspond to the vtt or srt files transcribed by whisper.cpp.
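
To make the stitching idea concrete, here is a minimal Python sketch of one way it could work, not the script's actual code: each segment's cue times are shifted by the total duration of the audio segments before it. The function names (`to_ms`, `to_ts`, `stitch`) are hypothetical, and the audio durations are assumed to be known already (for example, read with ffprobe).

```python
import re

# Matches a WebVTT cue timing line such as "00:01:02.345 --> 00:01:05.000".
# The hour field is optional in WebVTT, so "01:02.345" is accepted too.
TIMING = re.compile(
    r"^((?:\d+:)?\d{2}:\d{2}\.\d{3}) --> ((?:\d+:)?\d{2}:\d{2}\.\d{3})(.*)$"
)


def to_ms(ts):
    """Convert '[hh:]mm:ss.mmm' to integer milliseconds."""
    clock, millis = ts.split(".")
    parts = [int(p) for p in clock.split(":")]
    while len(parts) < 3:  # pad a missing hour field with zero
        parts.insert(0, 0)
    h, m, s = parts
    return ((h * 60 + m) * 60 + s) * 1000 + int(millis)


def to_ts(ms):
    """Convert integer milliseconds to 'hh:mm:ss.mmm'."""
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def stitch(segments):
    """Join segments given as (vtt_text, audio_duration_ms) in playback order.

    Assumption: each duration is the length of the matching audio file.
    """
    out = ["WEBVTT"]
    offset = 0
    for vtt_text, duration_ms in segments:
        lines = vtt_text.strip().splitlines()
        if lines and lines[0].startswith("WEBVTT"):
            # Drop the header block, which ends at the first blank line.
            while lines and lines[0].strip():
                lines.pop(0)
        out.append("")  # blank line separating this segment's cues
        for line in lines:
            if not line.strip() and out[-1] == "":
                continue  # collapse runs of blank lines
            match = TIMING.match(line)
            if match:
                start, end, rest = match.groups()
                line = (
                    f"{to_ts(to_ms(start) + offset)} --> "
                    f"{to_ts(to_ms(end) + offset)}{rest}"
                )
            out.append(line)
        offset += duration_ms  # the next segment starts where this audio ends
    return "\n".join(out) + "\n"
```

Shifting by the audio duration rather than by the last cue's end time keeps the subs in sync even when a chapter ends in silence.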

Play the audiobooks with subs and a black cover image. Players: SMPlayer on Linux, PotPlayer on Windows, IINA on Mac, mpv-android on Android, and nPlayer (paid) or Liquid Player on iOS. Best of all, though, is mpv with a plugin to search all subs.

################ Encode Opus chaptered 16kbps audiobook ####################
a) Encode all mp3, opus, m4a, mp4, etc. to 32kbps opus audio chapters
b) freac flatpak encode opus 16kbps chaptered audiobook
c) Filesize, bitrate, duration, total chapters in opus audiobook in source/
d) Extract all chapters with names from one opus audiobook in source/
e) Remove underscores from opus filenames
f) Titlecase opus chapter filenames Lord Of The Rings --> Lord of the Rings
g) Set metadata title for each chapter based on opus filename
h) Extract all chapters from opus audiobook rename to 001.opus, 002.opus
i) Split into 30, 1h, 2h, 3h, 4h segments from opus audiobook in source/
j) Split into equal chunks of audio from opus audiobook in source/
k) Rename all chapters to 001.opus, 002.opus, 003.opus ... 200.opus
l) Convert all .srt subtitles in current dir to .vtt subtitles
m) Replace/Insert black.png cover image on opus audiobooks for subtitles
n) Remove hiss (must re-encode audiobook)
o) Remove silence from audio files (must re-encode audiobook)
p) Trim beginning / end of audio chapters (must re-encode audiobook)
r) Combine two s) three t) four different language vtt into one vtt/srt
################ SubStitcher #################################
1) Run SubStitcher on up to 200 opus / vtt files --> source/stitchedsubs.vtt
001.vtt, 002.vtt, 003.vtt and 001.opus, 002.opus, 003.opus ... 200.opus
2) Check for Stuck Timecodes on stitchedsubs.vtt
Example: StuckSub Line# 247971: 76:52 05
                StuckSub Line# 248592: 77:05 52
3) Check for 2x+ Repeating Timecodes only on stitchedsubs.vtt
Example: 3x 24:31:20.068 --> 24:31:20.068 Line#: 82487, 82490, 82493
4) Check for 15x+ Repeating Timecodes and Phrases on stitchedsubs.vtt
Example: 22x Now, that's not a question of the devil
154989, 154992, 154995, 154998, 155001, 155004, 155007, 155010, 155013,
5) Copy stitchedsubs.vtt --> Title of Audiobook.vtt and srt in source/
6) Set black cover image/metadata + filename audiobook in source/
7) Transcribe *.opus files whisper.cpp with medium multilingual model
8) Propernoun capitalization of vtt subs america America nancy Nancy
q) Quit SubStitcher
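
For anyone curious how checks like 3) and 4) might be done, here is a rough Python sketch, not the script's actual implementation. It groups identical timing lines and identical text lines and reports their line numbers. The function name `find_repeats` and its parameters are hypothetical. The default thresholds mirror the 2x+ and 15x+ in the menu, and it counts repeats anywhere in the file rather than only consecutive ones, which is an assumption.

```python
import re
from collections import defaultdict

# A cue timing line looks like "24:31:20.068 --> 24:31:20.068".
TIMING = re.compile(r"^\S+ --> \S+")


def find_repeats(vtt_text, min_timecodes=2, min_phrases=15):
    """Return (repeated_timecodes, repeated_phrases).

    Each result maps the repeated line to the 1-based line numbers
    where it occurs in the subtitle file.
    """
    timecodes = defaultdict(list)
    phrases = defaultdict(list)
    for number, line in enumerate(vtt_text.splitlines(), start=1):
        text = line.strip()
        if not text or text.startswith("WEBVTT"):
            continue  # skip blank lines and the file header
        bucket = timecodes if TIMING.match(text) else phrases
        bucket[text].append(number)
    repeated_tc = {k: v for k, v in timecodes.items() if len(v) >= min_timecodes}
    repeated_ph = {k: v for k, v in phrases.items() if len(v) >= min_phrases}
    return repeated_tc, repeated_ph


if __name__ == "__main__":
    # Assumes stitchedsubs.vtt is in the current directory.
    with open("stitchedsubs.vtt", encoding="utf-8") as handle:
        repeated_tc, repeated_ph = find_repeats(handle.read())
    for line, numbers in {**repeated_tc, **repeated_ph}.items():
        print(f"{len(numbers)}x {line} Line#: {', '.join(map(str, numbers))}")
```

A phrase that repeats many times on adjacent cues is a typical sign of a hallucination loop, so those sections are worth re-transcribing.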

substitcher202310.zip
