Want to use AVCaptureSession buffers instead of AVAudioEngine #44

Open
cgfarmer4 opened this issue Mar 4, 2024 · 5 comments

@cgfarmer4
Contributor

Hey there!

First off, thanks so much for building this awesome library! It's a total pleasure to use and works great. Looking forward to the Metal update. In the meantime, I was curious whether you all would accept a PR to allow AVCaptureSession to be used in the AudioProcessor class instead of AVAudioEngine.

I was thinking of adding a way to pass in an alternate setupEngine function so that the captureOutput delegate could be used in place of installTap (a rough sketch of what I mean follows the questions below). The reason I want to do this is that it makes it easier to change the microphone in-app instead of relying on the system default.

  1. Would it make sense to allow for this in the AudioProcessor? If so, I'm happy to come up with a clean interface proposal.
  2. If not, perhaps there's a way to override the AudioProcessor class and provide an alternate setupEngine function?
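For concreteness, here is a minimal sketch of the kind of capture path I have in mind. None of this is WhisperKit code; the class name and wiring are placeholders, and the delegate callback is where buffers would be handed to the AudioProcessor instead of installTap's closure.

```swift
import AVFoundation

// Placeholder sketch, not WhisperKit code: an AVCaptureSession-based source whose
// delegate callback would feed the AudioProcessor, roughly where installTap's
// closure runs today.
final class CaptureSessionSource: NSObject, AVCaptureAudioDataOutputSampleBufferDelegate {
    private let session = AVCaptureSession()
    private let output = AVCaptureAudioDataOutput()
    private let queue = DispatchQueue(label: "audio.capture.queue")

    /// `device` is whichever microphone the app selected, not the system default.
    func start(with device: AVCaptureDevice) throws {
        let input = try AVCaptureDeviceInput(device: device)
        if session.canAddInput(input) { session.addInput(input) }
        if session.canAddOutput(output) { session.addOutput(output) }
        output.setSampleBufferDelegate(self, queue: queue)
        session.startRunning()
    }

    // Receives CMSampleBuffers from the session; this is where samples would be
    // forwarded into the processing pipeline instead of via AVAudioEngine.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // forward sampleBuffer to the AudioProcessor here
    }
}
```
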
@atiorh
Contributor

atiorh commented Mar 4, 2024

Thanks for the note @cgfarmer4! @ZachNagengast what do you think?

@cgfarmer4
Contributor Author

cgfarmer4 commented Mar 5, 2024

Ah, I just found this code ;) `// TODO: implement selecting input device`

Decided against using AVCaptureSession and instead just changed the device using Core Audio. This seems to work for the MacBook's built-in mic and the Continuity microphone, but I haven't figured out why it doesn't work for my external audio interface yet. Thoughts on this approach, assuming I can sort out the external-interface issue?

  1. New `assignMicrophoneInput` function:

```swift
func assignMicrophoneInput(inputNode: AVAudioInputNode, inputDeviceID: AudioDeviceID) {
    guard let audioUnit = inputNode.audioUnit else {
        Logging.error("Failed to access the audio unit of the input node.")
        return
    }

    var inputDeviceID = inputDeviceID

    // Point the input node's underlying audio unit at the requested Core Audio device.
    let error = AudioUnitSetProperty(
        audioUnit,
        kAudioOutputUnitProperty_CurrentDevice,
        kAudioUnitScope_Global,
        0,
        &inputDeviceID,
        UInt32(MemoryLayout<AudioDeviceID>.size)
    )

    if error != noErr {
        Logging.error("Error setting Audio Unit property: \(error)")
    } else {
        Logging.info("Successfully set input device.")
    }
}
```
  2. Update `setupEngine` to re-route the input node before installing the tap:

```swift
func setupEngine(inputDeviceID: AudioDeviceID? = nil) throws -> AVAudioEngine {
    let audioEngine = AVAudioEngine()
    let inputNode = audioEngine.inputNode
    let inputFormat = inputNode.outputFormat(forBus: 0)

    if let inputDeviceID = inputDeviceID {
        assignMicrophoneInput(inputNode: inputNode, inputDeviceID: inputDeviceID)
    }

    // ... install the tap using inputFormat, prepare and start the engine as before ...
    return audioEngine
}
```
  3. Update the start recording function to allow passing in the `AudioDeviceID`:

```swift
func startRecordingLive(inputDeviceID: AudioDeviceID? = nil, callback: (([Float]) -> Void)? = nil) throws {
    audioSamples = []
    audioEnergy = []

    audioEngine = try setupEngine(inputDeviceID: inputDeviceID)

    // Set the callback
    audioBufferCallback = callback
}
```
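
For reference, a hypothetical call site for the modified signature. This assumes `AudioProcessor` is instantiated directly; the Core Audio query below just grabs the default input for illustration, whereas a real app would pass the `AudioDeviceID` of whatever microphone its picker selected.

```swift
import AVFoundation
import CoreAudio
import WhisperKit // assumption: using the package's AudioProcessor directly

// Illustration only: look up the current default input device's ID. A real app
// would instead use the AudioDeviceID of the microphone the user picked.
var deviceID = AudioDeviceID(0)
var size = UInt32(MemoryLayout<AudioDeviceID>.size)
var address = AudioObjectPropertyAddress(
    mSelector: kAudioHardwarePropertyDefaultInputDevice,
    mScope: kAudioObjectPropertyScopeGlobal,
    mElement: kAudioObjectPropertyElementMain
)
AudioObjectGetPropertyData(AudioObjectID(kAudioObjectSystemObject), &address, 0, nil, &size, &deviceID)

let audioProcessor = AudioProcessor()
try audioProcessor.startRecordingLive(inputDeviceID: deviceID) { samples in
    print("Received \(samples.count) new samples")
}
```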

Going to see if I can try some tactics from this thread for my interface, but it seems hacky.

@ZachNagengast
Contributor

@cgfarmer4 thanks for the effort looking into this. This looks promising, although I would also support an additional method that uses AVCaptureSession to generate audioSamples, in case some folks already have easy access to their app's AVCaptureDevice. There is nothing specifically tied to AVAudioEngine in the protocol; we'd just need to make sure it has handling for the various platforms that don't have access to those APIs (watchOS, for example, doesn't support it). Curious to see how your tests go and would be happy to integrate these back into the AudioProcessor depending on the results.
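
To illustrate the platform handling, something along these lines would be needed. This is only a sketch, and `startCaptureSessionRecording()` is a made-up name for illustration, not an existing WhisperKit method.

```swift
import AVFoundation

// Sketch only: compile the AVCaptureSession path out on platforms that don't have
// the API (e.g. watchOS) and keep the existing AVAudioEngine path there.
func startCaptureSessionRecording() throws {
    #if os(watchOS)
    // No AVCaptureSession-based audio capture here; callers should fall back to
    // the AVAudioEngine path instead.
    throw NSError(
        domain: "AudioCapture",
        code: -1,
        userInfo: [NSLocalizedDescriptionKey: "AVCaptureSession capture is unavailable on this platform"]
    )
    #else
    let session = AVCaptureSession()
    // ... add the selected AVCaptureDeviceInput and an AVCaptureAudioDataOutput,
    // as in the capture sketch earlier in this thread ...
    session.startRunning()
    #endif
}
```
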

ZachNagengast added the enhancement and help wanted labels Mar 5, 2024
@cgfarmer4
Contributor Author

cgfarmer4 commented Mar 6, 2024

Decided against using AVCaptureSession, since there's quite a bit of buffer conversion involved that likely adds latency (loosely held hypothesis). This meets my needs because, on macOS, I can take the AVCaptureDevice selected via AVCaptureSession and get its AudioDeviceID. AVCaptureSession would give us a list of devices on the other OSes, but what it won't do is allow AVAudioEngine to have its audioUnit changed. To get a more comprehensive device list on those platforms, we'd need to figure out the CMSampleBuffer-to-AVAudioPCMBuffer conversion and keep it fast enough.

#51

```swift
static func getAudioDeviceID(for captureDevice: AVCaptureDevice) -> AudioDeviceID? {
    // Fetch the list of all Core Audio devices on the system.
    var propertySize: UInt32 = 0
    var address = AudioObjectPropertyAddress(
        mSelector: kAudioHardwarePropertyDevices,
        mScope: kAudioObjectPropertyScopeGlobal,
        mElement: kAudioObjectPropertyElementMain
    )

    AudioObjectGetPropertyDataSize(AudioObjectID(kAudioObjectSystemObject), &address, 0, nil, &propertySize)

    let deviceCount = Int(propertySize) / MemoryLayout<AudioDeviceID>.size
    var deviceIDs = [AudioDeviceID](repeating: 0, count: deviceCount)
    let status = AudioObjectGetPropertyData(AudioObjectID(kAudioObjectSystemObject), &address, 0, nil, &propertySize, &deviceIDs)

    if status == noErr {
        // Match each device's UID against the AVCaptureDevice's uniqueID.
        for id in deviceIDs {
            var uidSize: UInt32 = 0
            var uidAddress = AudioObjectPropertyAddress(
                mSelector: kAudioDevicePropertyDeviceUID,
                mScope: kAudioObjectPropertyScopeGlobal,
                mElement: kAudioObjectPropertyElementMain
            )

            AudioObjectGetPropertyDataSize(id, &uidAddress, 0, nil, &uidSize)

            var deviceUID: Unmanaged<CFString>?
            var uidPropertySize = UInt32(MemoryLayout.size(ofValue: deviceUID))

            let uidStatus = AudioObjectGetPropertyData(id, &uidAddress, 0, nil, &uidPropertySize, &deviceUID)

            if uidStatus == noErr, let deviceUID = deviceUID?.takeUnretainedValue() as String? {
                if captureDevice.uniqueID == deviceUID {
                    return id
                }
            } else {
                logger.error("Failed to get device UID with error: \(uidStatus)")
            }
        }
    } else {
        logger.error("Failed to get device IDs with error: \(status)")
    }

    return nil
}
```
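
Putting the pieces together, the call site would look roughly like this (assuming `getAudioDeviceID` ends up as a static on `AudioProcessor` and the `startRecordingLive` change above lands; macOS only):

```swift
import AVFoundation
import WhisperKit // assumption: AudioProcessor and the helpers above live here

func startRecording(with captureDevice: AVCaptureDevice, using audioProcessor: AudioProcessor) throws {
    // Map the user's AVCaptureDevice selection to a Core Audio device ID.
    guard let deviceID = AudioProcessor.getAudioDeviceID(for: captureDevice) else {
        // Fall back to the system default input if the lookup fails.
        try audioProcessor.startRecordingLive()
        return
    }

    try audioProcessor.startRecordingLive(inputDeviceID: deviceID) { samples in
        // Consume the newly captured samples.
        _ = samples
    }
}
```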

@cgfarmer4
Contributor Author

cgfarmer4 commented Mar 8, 2024

Previously, when I was working with SwiftWhisper, I could translate the format from CMSampleBuffers using the float conversion method here. That implementation was not great, but it was good enough for demos. I'm curious whether there's some conversion here that might be doing something similar?

https://gist.github.com/cgfarmer4/182d9d6d1cdf9d219ba0a4db6a23d745#file-capturedelegate-swift-L1-L46
https://gist.github.com/cgfarmer4/182d9d6d1cdf9d219ba0a4db6a23d745#file-audiosessionmanager-swift-L88-L111
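
For reference, a minimal sketch (not the gist code, and not an existing WhisperKit API) of one way to get from a CMSampleBuffer to an AVAudioPCMBuffer, whose floatChannelData could then feed the existing sample pipeline; error handling is reduced to returning nil:

```swift
import AVFoundation
import CoreMedia

// Sketch: copy a CMSampleBuffer's PCM data into an AVAudioPCMBuffer with the
// same stream format. Compressed formats and resampling are not handled here.
func pcmBuffer(from sampleBuffer: CMSampleBuffer) -> AVAudioPCMBuffer? {
    guard let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer),
          let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(formatDescription),
          let format = AVAudioFormat(streamDescription: asbd) else {
        return nil
    }

    let frameCount = AVAudioFrameCount(CMSampleBufferGetNumSamples(sampleBuffer))
    guard let pcmBuffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: frameCount) else {
        return nil
    }
    pcmBuffer.frameLength = frameCount

    // Copy the sample data directly into the buffer's underlying AudioBufferList.
    let status = CMSampleBufferCopyPCMDataIntoAudioBufferList(
        sampleBuffer,
        at: 0,
        frameCount: Int32(frameCount),
        into: pcmBuffer.mutableAudioBufferList
    )
    return status == noErr ? pcmBuffer : nil
}
```

If the capture format differs from what the transcriber expects, an AVAudioConverter pass would still be needed on top of this, which is where the latency concern above comes from.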
