Inconsistency between direct source execution and command line execution for Chinese subtitle generation #167

Closed
sumanit opened this issue Dec 21, 2023 · 2 comments

Comments


sumanit commented Dec 21, 2023

When the subtitle generation logic is run directly from the source code, sentences are segmented correctly at punctuation and no spaces appear between characters, which is the expected behavior for Chinese text.
However, when the same logic is run through the command-line interface, the sentence segmentation is inaccurate and unexpected spaces appear between Chinese words.
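The behavior described for the source-code path (one cue per punctuation-delimited clause, with no spaces) can be illustrated with a minimal Python sketch. It assumes the TTS engine supplies word boundaries as (text, start, end) events in milliseconds, without punctuation, and that each clause's characters are covered exactly by consecutive tokens. `Word`, `Cue`, `cues_by_punctuation`, `srt_timestamp`, and `to_srt` are hypothetical names for illustration, not this project's API.

```python
import re
from dataclasses import dataclass

# Clause-ending punctuation, both full-width (Chinese) and ASCII.
# Inside a character class, "." and "?" are literal.
CLAUSE_BREAK = re.compile(r"[，。！？；：,.!?;:]")


@dataclass
class Word:
    """Hypothetical word-boundary event: token text plus timing in ms."""
    text: str
    start_ms: int
    end_ms: int


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def split_clauses(text: str) -> list[str]:
    """Split text at punctuation, dropping empty pieces."""
    return [p.strip() for p in CLAUSE_BREAK.split(text) if p.strip()]


def cues_by_punctuation(text: str, words: list[Word]) -> list[Cue]:
    """Emit one cue per clause, joining tokens with no spaces.

    Assumes tokens line up with clause boundaries: consumes tokens
    until their combined length covers the clause's characters.
    """
    cues: list[Cue] = []
    i = 0
    for clause in split_clauses(text):
        target_len = len(clause.replace(" ", ""))
        group: list[Word] = []
        covered = 0
        while i < len(words) and covered < target_len:
            group.append(words[i])
            covered += len(words[i].text)
            i += 1
        if group:
            cues.append(Cue(group[0].start_ms, group[-1].end_ms,
                            "".join(w.text for w in group)))
    return cues


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Render cues as numbered SRT blocks."""
    blocks = []
    for n, cue in enumerate(cues, start=1):
        blocks.append(f"{n}\n{srt_timestamp(cue.start_ms)} --> "
                      f"{srt_timestamp(cue.end_ms)}\n{cue.text}\n")
    return "\n".join(blocks)
```

Fed the issue's input text and the tokens that appear in the command-line output (with placeholder timings), this yields the three clauses 在忙碌和挑战中, 我们的内心有时会感到疲惫, and 尤其是当我们发现自己脱发时. Matching on character counts is deliberately simple; a real implementation would also need to handle tokens that straddle punctuation or normalize the text differently.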

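Input:
在忙碌和挑战中,我们的内心有时会感到疲惫。尤其是当我们发现自己脱发时,
(English: "In the midst of busyness and challenges, our hearts sometimes feel weary. Especially when we find that we are losing hair,")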

Output when run from source (SRT-style timestamps):

00:00:00,100 --> 00:00:01,538
在忙碌和挑战中

00:00:01,625 --> 00:00:04,237
我们的内心有时会感到疲惫

00:00:04,787 --> 00:00:07,225
尤其是当我们发现自己脱发时

Output when run from the command line (WebVTT-style timestamps):
00:00:00.100 --> 00:00:03.400
在 忙碌 和 挑战 中 我们 的 内心 有时 会

00:00:03.413 --> 00:00:07.875
感到 疲惫 尤其是 当 我们 发现 自己 脱发 时 心里
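Each command-line cue contains exactly ten space-separated tokens, which is consistent with the CLI grouping a fixed number of word boundaries per cue and joining them with spaces, without regard to punctuation. The Python sketch below reproduces the cue text from the tokens shown above. `words_per_cue=10` is inferred from this output rather than taken from the project's code, and `cues_by_word_count`, `join_tokens`, and `is_cjk` are hypothetical helpers. `join_tokens` shows one possible fix for the spacing: insert a space only where neither neighbouring character is CJK.

```python
# Simplified CJK detection (assumption): CJK punctuation and kana,
# CJK Unified Ideographs (incl. Extension A), and full-width forms.
_CJK_RANGES = (
    ("\u3000", "\u30ff"),  # CJK symbols/punctuation, hiragana, katakana
    ("\u3400", "\u4dbf"),  # CJK Unified Ideographs Extension A
    ("\u4e00", "\u9fff"),  # CJK Unified Ideographs
    ("\uff00", "\uffef"),  # Half-width and full-width forms
)


def is_cjk(ch: str) -> bool:
    """Return True if the character falls in one of the ranges above."""
    return any(lo <= ch <= hi for lo, hi in _CJK_RANGES)


def cues_by_word_count(tokens: list[str], words_per_cue: int = 10) -> list[str]:
    """Group a fixed number of tokens per cue and join them with spaces.

    Mirrors the command-line cue text; it ignores punctuation and
    assumes a space-separated script.
    """
    return [" ".join(tokens[i:i + words_per_cue])
            for i in range(0, len(tokens), words_per_cue)]


def join_tokens(tokens: list[str]) -> str:
    """Join tokens, inserting a space only between two non-CJK tokens."""
    out = ""
    for tok in tokens:
        if out and tok and not (is_cjk(out[-1]) or is_cjk(tok[0])):
            out += " "
        out += tok
    return out


if __name__ == "__main__":
    cli_tokens = ("在 忙碌 和 挑战 中 我们 的 内心 有时 会 "
                  "感到 疲惫 尤其是 当 我们 发现 自己 脱发 时 心里").split()
    for cue in cues_by_word_count(cli_tokens):
        print(cue)
    print(join_tokens(cli_tokens[:5]))  # 在忙碌和挑战中
```

The CJK ranges are a simplification; a fuller fix would likely also break cues at punctuation, as the source-code output does.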

rany2 (Owner) commented Apr 29, 2024

Does this still happen?

rany2 (Owner) commented Apr 29, 2024

Never mind, I see what you mean now. It's the same issue as #156.

rany2 closed this as not planned on May 18, 2024.