Bark tab:

  • History Settings:
    • Empty History:
      • Allows the model to generate audio without any "context" to start from.
    • Voice:
      • Allows you to select an original Bark voice to use for the model.
      • Language:
        • Different languages have different voices.
      • Speaker ID:
        • Allows you to select a specific speaker to use for the model.
      • Use V2:
        • The built-in voices were later updated, so each one now has a V2 version. V2 is presumably better (lower noise, closer adherence to the prompt, or higher accuracy).
    • Use Old Generation as History:
      • Uses the previous generation's audio as the history for the next generation, which results in a similar speaker and tone. Generations used this way also serve as the "custom" voices. (This setting might be renamed in the future.)
  • Max Length:
    • The maximum length, in seconds, of the audio the model produces in one generation. Long-form audio is produced by splitting the prompt across multiple generations. Values above 15 are not recommended.
    • A short maximum length lets you "preview" the audio, which reduces the time spent generating when the failure rate is high.
  • Split long Prompt:
    • For long-form audio, this setting splits the prompt into multiple generations. The model generates audio for each piece of the prompt, and the pieces are then concatenated.
    • Split prompt by lines:
      • Splits the prompt by lines.
  • Burn in Prompt (Optional):
    • This generates semantic tokens from the burn-in prompt and then uses those tokens as the semantic history for the model during the actual generation. This is useful for "priming" the model to generate audio in the style of the burn-in prompt. Semantic tokens are like a "script" for the model to follow.
  • Prompt:
    • The prompt is the text from which the model generates audio. The prompt is first converted into semantic tokens, so it is not an exact transcript of the generated audio. For example, a laugh or a question mark is acted out by the model, not spoken.
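
The Voice and Split long Prompt settings above can be sketched in a few lines of Python. This is a minimal illustration, not the application's actual code: `voice_preset`, `split_prompt_by_lines`, `generate_long_form`, and `fake_generate` are hypothetical names introduced here. The preset strings follow Bark's built-in naming (`en_speaker_6`, `v2/en_speaker_6`). The `generate` callback stands in for something like Bark's `generate_audio(text, history_prompt=...)`, which returns a NumPy array; plain lists are used here so the sketch runs without Bark installed.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical callback type: (text, history_prompt) -> audio samples.
GenerateFn = Callable[[str, Optional[str]], Sequence[float]]


def voice_preset(language: str, speaker_id: int, use_v2: bool = True) -> str:
    """Build a built-in Bark voice name from the Language, Speaker ID and Use V2 settings."""
    name = f"{language}_speaker_{speaker_id}"
    return f"v2/{name}" if use_v2 else name


def split_prompt_by_lines(prompt: str) -> List[str]:
    """Split the prompt at line breaks, dropping empty lines."""
    return [line.strip() for line in prompt.splitlines() if line.strip()]


def generate_long_form(
    prompt: str,
    generate: GenerateFn,
    history_prompt: Optional[str] = None,
) -> List[float]:
    """Generate audio for each line of the prompt and concatenate the results."""
    audio: List[float] = []
    for chunk in split_prompt_by_lines(prompt):
        audio.extend(generate(chunk, history_prompt))
    return audio


def fake_generate(text: str, history_prompt: Optional[str]) -> List[float]:
    """Stand-in for a real model call: one silent sample per character."""
    return [0.0] * len(text)


if __name__ == "__main__":
    voice = voice_preset("en", 6)
    samples = generate_long_form("Hello there.\nHow are you?", fake_generate, voice)
    print(voice, len(samples))
```

With Bark installed, the callback could wrap `generate_audio` and the pieces could be joined with `numpy.concatenate` instead of `list.extend`. Passing an "Empty History" corresponds to leaving `history_prompt` as `None`.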

Bark voice clone tab:

  • Tokenizer:
    • Tokenizers are language-specific. The tokenizer converts the prompt into semantic tokens, so it must match the language of the prompt. The "default" language is English.