v3: Add RawAudio class #682

Open: wants to merge 13 commits into v3

Th3G33k commented Apr 5, 2024

Following up on the discussion in #680.

The 'save to WAV' feature is my own simple implementation, written from the WAV file specification and by inspecting a generated WAV file in a hex viewer.
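For reference, the canonical 44-byte header such a writer produces can be sketched as follows. This is a minimal sketch, not the PR's code: `encodeWavHeader` is a hypothetical name, and it assumes 32-bit IEEE float samples (format code 3) in a plain RIFF layout. Strictly speaking, the spec expects a `fact` chunk for non-PCM formats, which many readers tolerate omitting.

```javascript
// Hypothetical helper: builds a 44-byte RIFF/WAVE header for 32-bit IEEE float samples.
// numFrames = samples per channel; all multi-byte fields are little-endian.
function encodeWavHeader(numFrames, sampleRate, numChannels = 1) {
  const bytesPerSample = 4; // Float32
  const blockAlign = numChannels * bytesPerSample; // bytes per frame
  const dataSize = numFrames * blockAlign;
  const buffer = new ArrayBuffer(44);
  const view = new DataView(buffer);
  const writeString = (offset, str) => {
    for (let i = 0; i < str.length; ++i) view.setUint8(offset + i, str.charCodeAt(i));
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);            // RIFF chunk size = file size - 8
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);                      // fmt chunk size
  view.setUint16(20, 3, true);                       // audio format 3 = IEEE float
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bytesPerSample * 8, true);      // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);                // size of the sample data
  return buffer;
}
```

The little-endian Float32 sample data follows immediately after byte 44.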

Changes:

  • added a RawAudio class with .save(path) (supports browser, web worker and Node.js)
  • modified some audio pipelines to return a RawAudio object
  • added isBrowserEnv and isWebworkerEnv properties to env
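One possible way to compute such flags is sketched below. The checks are assumptions, not the PR's actual code, and `isNodeEnv` is an extra hypothetical flag included for comparison.

```javascript
// Hypothetical runtime checks (not the PR's implementation).
// Browser page: a global window with a DOM document.
const isBrowserEnv =
  typeof window !== 'undefined' && typeof window.document !== 'undefined';

// Web worker: the global scope is an instance of WorkerGlobalScope.
const isWebworkerEnv =
  typeof self !== 'undefined' &&
  typeof WorkerGlobalScope !== 'undefined' &&
  self instanceof WorkerGlobalScope;

// Node.js: process.versions.node is defined.
const isNodeEnv =
  typeof process !== 'undefined' &&
  process.versions != null &&
  process.versions.node != null;
```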

Example usage:

const synthesizer = await pipeline('text-to-speech', 'Xenova/mms-tts-eng');
const output = await synthesizer('Hello, my dog is cute');
output.save("audio.wav");

Th3G33k closed this Apr 6, 2024
Th3G33k deleted the dev branch April 6, 2024 05:21
Th3G33k restored the dev branch April 6, 2024 05:23
Th3G33k reopened this Apr 6, 2024
xenova (Owner) left a comment

Thanks for this! Just some questions/suggestions.

I also think we should split this into a .save() function and a "convert to WAV" function, in case the user wants to convert to WAV without saving the file immediately.

Resolved review threads: src/env.js (1), src/pipelines.js (2), src/utils/audio.js (6)
Th3G33k and others added 2 commits April 10, 2024 22:03
Th3G33k (Author) commented Apr 11, 2024

Thank you @xenova for the review.

Here are the changes I have made:

  • split into two functions: toBlob() and save(path)
  • check types in constructor()
  • in save(), check the running environment before proceeding
  • reduce the memory footprint by using new Blob([wav_header, audio]) instead of allocating an additional TypedArray, new Uint8Array(buf_size + wav_header.length)
  • add saveBlob(path, blob) to utils/core.js and use it in RawAudio and RawImage to save blobs directly in the browser
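A helper like saveBlob(path, blob) might look roughly like this. It is a hypothetical sketch, not the code in utils/core.js: in a browser page it triggers a download through a temporary object URL, and in Node.js it writes the bytes with `fs/promises`. Web workers have no DOM to click a link in, so this sketch simply throws there.

```javascript
// Hypothetical sketch of saveBlob(path, blob); not the PR's implementation.
async function saveBlob(path, blob) {
  if (typeof window !== 'undefined' && typeof document !== 'undefined') {
    // Browser: download via a temporary <a download> link.
    const url = URL.createObjectURL(blob);
    const a = document.createElement('a');
    a.href = url;
    a.download = path; // used as the suggested file name
    a.click();
    URL.revokeObjectURL(url);
    return;
  }
  if (typeof process !== 'undefined' && process.versions?.node) {
    // Node.js: read the Blob's bytes and write them to disk.
    const { writeFile } = await import('node:fs/promises');
    await writeFile(path, new Uint8Array(await blob.arrayBuffer()));
    return;
  }
  throw new Error('saveBlob: unsupported environment');
}
```

Keeping toBlob() separate from saving means callers can still upload or play the Blob without touching the file system.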

xenova (Owner) commented Apr 12, 2024

Thanks! 🤗 Would you mind benchmarking your code against https://www.npmjs.com/package/audiobuffer-to-wav, which I used in a demo a few months ago? Also, at the moment we only support single-channel audio, but their code supports 2 channels + interleaving (see here), which might be good to include.

Other than that, I like the abstractions you introduced for the RawImage and RawAudio classes, and this will be perfect to merge into the v3 branch for a musicgen demo I'm working on 🔥

Th3G33k (Author) commented Apr 12, 2024

I have added support for 2-channel audio and interleaving.

interleave(keepOriginalValues) either allocates a new buffer of length * 2 (keeping the original channel data), or a new buffer of length * 1 (overwriting the original audio data).
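Interleaving two channels means alternating their samples in one buffer: [L0, R0, L1, R1, ...]. Below is a minimal sketch of the copying variant (comparable to keepOriginalValues = true), written as a hypothetical standalone `interleave(left, right)` function rather than the RawAudio method; the in-place variant is not shown.

```javascript
// Hypothetical sketch: interleave two equal-length channels into a new buffer.
// The input channels are left untouched.
function interleave(left, right) {
  if (left.length !== right.length) {
    throw new Error('channels must have the same length');
  }
  const out = new Float32Array(left.length * 2);
  for (let i = 0; i < left.length; ++i) {
    out[2 * i] = left[i];      // left sample of frame i
    out[2 * i + 1] = right[i]; // right sample of frame i
  }
  return out;
}
```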

Below is a quick benchmark comparing it with encodeWAV(samples), which is used in the demo.

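// Assumes encodeWAV (from the demo) and RawAudio (from this PR) are in scope.
function benchmark() {
    let i, input, output

    // Baseline: encodeWAV(samples), then wrap the result in a Blob
    console.time('encodeWAV')
    for (i = 0; i < 20000; i++) {
        input = new Float32Array(i).fill(i)   // i samples, each set to i
        output = encodeWAV(input)
        output = new Blob([output])
    }
    console.timeEnd('encodeWAV')

    // RawAudio: same input at 16 kHz, converted with toBlob()
    console.time('RawAudio')
    for (i = 0; i < 20000; i++) {
        input = new Float32Array(i).fill(i)
        output = new RawAudio(input, 16000)
        output = output.toBlob()
    }
    console.timeEnd('RawAudio')
}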

/*
encodeWAV: 3216.6669921875 ms
RawAudio: 2702.23291015625 ms
---
encodeWAV: 3296.2138671875 ms
RawAudio: 2768.235107421875 ms
*/

encodeWAV is slower because it copies every audio sample into a new buffer, one value at a time:

    for (let i = 0; i < samples.length; ++i, offset += 4) {
        view.setFloat32(offset, samples[i], true)
    }
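The difference can be sketched by building the same bytes both ways. Assumptions: a zero-filled placeholder stands in for the real 44-byte header, both function names are hypothetical, and approach (b) relies on the platform being little-endian, since a Float32Array is stored in native byte order while WAV requires little-endian. The Blob constructor still copies bytes internally, so what is saved is the intermediate buffer and the per-sample DataView loop, not the copy itself.

```javascript
// Placeholder header (hypothetical): only the data layout is compared here.
const header = new Uint8Array(44);

// (a) encodeWAV-style: allocate one buffer, copy each sample through a DataView.
function copyIntoBuffer(samples) {
  const buffer = new ArrayBuffer(header.length + samples.length * 4);
  new Uint8Array(buffer).set(header, 0);
  const view = new DataView(buffer);
  for (let i = 0, offset = header.length; i < samples.length; ++i, offset += 4) {
    view.setFloat32(offset, samples[i], true); // explicit little-endian
  }
  return new Blob([buffer]);
}

// (b) Blob-parts style: pass header and samples as separate parts,
// with no intermediate JS buffer (native byte order, see note above).
function blobParts(samples) {
  return new Blob([header, samples]);
}
```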

Unit test for interleave:

let audio = new RawAudio([new Float32Array([1,2,3,4,5]), new Float32Array([1,2,3,4,5])], 16000)
console.log(audio.interleave(true)[0].toString() == '1,1,2,2,3')

xenova (Owner) commented Apr 22, 2024

Thanks again! Just letting you know this PR is marked for the next release :)

xenova changed the base branch from main to v3 April 22, 2024 10:41
Th3G33k changed the title from "Add RawAudio class and 'save to wav'" to "Add RawAudio class" Apr 27, 2024
Th3G33k (Author) commented May 8, 2024

I have merged the v3 branch (#545) into this PR.

Th3G33k changed the title from "Add RawAudio class" to "v3: Add RawAudio class" May 9, 2024