Want to use AVCaptureSession buffers instead of AVAudioEngine #44

Open
cgfarmer4 opened this issue Mar 4, 2024 · 5 comments

@cgfarmer4
Contributor

Hey there!

First off, thanks so much for building this awesome library! It's a total pleasure to use and works great. Looking forward to the Metal update. In the meantime, I was curious whether you all would accept a PR to allow AVCaptureSession to be used in the AudioProcessor class instead of AVAudioEngine.

I was thinking of adding a way to pass in an alternate setupEngine function so that the captureOutput delegate could be used in place of installTap (a rough sketch of what I mean follows the questions below). The reason I want to do this is that it makes it easier to change the microphone in-app instead of relying on the system default.

  1. Would it make sense to allow for this in the AudioProcessor? If so, I'm happy to come up with a clean interface proposal.
  2. If not, perhaps there's a way to override the AudioProcessor class and provide an alternate setupEngine function?
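For concreteness, here is a minimal sketch of the kind of capture path I have in mind. None of this is WhisperKit code; the class name and wiring are placeholders, and the delegate callback is where buffers would be handed to the AudioProcessor instead of installTap's closure.

```swift
import AVFoundation

// Placeholder sketch, not WhisperKit code: an AVCaptureSession-based source whose
// delegate callback would feed the AudioProcessor, roughly where installTap's
// closure runs today.
final class CaptureSessionSource: NSObject, AVCaptureAudioDataOutputSampleBufferDelegate {
    private let session = AVCaptureSession()
    private let output = AVCaptureAudioDataOutput()
    private let queue = DispatchQueue(label: "audio.capture.queue")

    /// `device` is whichever microphone the app selected, not the system default.
    func start(with device: AVCaptureDevice) throws {
        let input = try AVCaptureDeviceInput(device: device)
        if session.canAddInput(input) { session.addInput(input) }
        if session.canAddOutput(output) { session.addOutput(output) }
        output.setSampleBufferDelegate(self, queue: queue)
        session.startRunning()
    }

    // Receives CMSampleBuffers from the session; this is where samples would be
    // forwarded into the processing pipeline instead of via AVAudioEngine.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // forward sampleBuffer to the AudioProcessor here
    }
}
```
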
@atiorh
Contributor

atiorh commented Mar 4, 2024

Thanks for the note @cgfarmer4! @ZachNagengast what do you think?

@cgfarmer4
Contributor Author

cgfarmer4 commented Mar 5, 2024

Ah, I just found this code ;) `// TODO: implement selecting input device`

Decided against using AVCaptureSession and instead just changed the device using Core Audio. This seems to work for the MacBook's built-in mic and the Continuity microphone, but I haven't figured out why it doesn't work for my external audio interface yet. Thoughts on this approach, assuming I can sort out the external-interface issue?

  1. New `assignMicrophoneInput` function:

```swift
func assignMicrophoneInput(inputNode: AVAudioInputNode, inputDeviceID: AudioDeviceID) {
    guard let audioUnit = inputNode.audioUnit else {
        Logging.error("Failed to access the audio unit of the input node.")
        return
    }

    var inputDeviceID = inputDeviceID

    // Point the input node's underlying audio unit at the requested Core Audio device.
    let error = AudioUnitSetProperty(
        audioUnit,
        kAudioOutputUnitProperty_CurrentDevice,
        kAudioUnitScope_Global,
        0,
        &inputDeviceID,
        UInt32(MemoryLayout<AudioDeviceID>.size)
    )

    if error != noErr {
        Logging.error("Error setting Audio Unit property: \(error)")
    } else {
        Logging.info("Successfully set input device.")
    }
}
```
  2. Update `setupEngine` to re-route the input node before installing the tap:

```swift
func setupEngine(inputDeviceID: AudioDeviceID? = nil) throws -> AVAudioEngine {
    let audioEngine = AVAudioEngine()
    let inputNode = audioEngine.inputNode
    let inputFormat = inputNode.outputFormat(forBus: 0)

    if let inputDeviceID = inputDeviceID {
        assignMicrophoneInput(inputNode: inputNode, inputDeviceID: inputDeviceID)
    }

    // ... install the tap using inputFormat, prepare and start the engine as before ...
    return audioEngine
}
```
  3. Update the start recording function to allow passing in the `AudioDeviceID`:

```swift
func startRecordingLive(inputDeviceID: AudioDeviceID? = nil, callback: (([Float]) -> Void)? = nil) throws {
    audioSamples = []
    audioEnergy = []

    audioEngine = try setupEngine(inputDeviceID: inputDeviceID)

    // Set the callback
    audioBufferCallback = callback
}
```
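
For reference, a hypothetical call site for the modified signature. This assumes `AudioProcessor` is instantiated directly; the Core Audio query below just grabs the default input for illustration, whereas a real app would pass the `AudioDeviceID` of whatever microphone its picker selected.

```swift
import AVFoundation
import CoreAudio
import WhisperKit // assumption: using the package's AudioProcessor directly

// Illustration only: look up the current default input device's ID. A real app
// would instead use the AudioDeviceID of the microphone the user picked.
var deviceID = AudioDeviceID(0)
var size = UInt32(MemoryLayout<AudioDeviceID>.size)
var address = AudioObjectPropertyAddress(
    mSelector: kAudioHardwarePropertyDefaultInputDevice,
    mScope: kAudioObjectPropertyScopeGlobal,
    mElement: kAudioObjectPropertyElementMain
)
AudioObjectGetPropertyData(AudioObjectID(kAudioObjectSystemObject), &address, 0, nil, &size, &deviceID)

let audioProcessor = AudioProcessor()
try audioProcessor.startRecordingLive(inputDeviceID: deviceID) { samples in
    print("Received \(samples.count) new samples")
}
```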

Going to see if I can try some tactics from this thread for my interface, but it seems hacky.

@ZachNagengast
Contributor

@cgfarmer4 thanks for the effort looking into this. This looks promising, although I would also support an additional method that uses AVCaptureSession to generate audioSamples, in case some folks already have easy access to their app's AVCaptureDevice. There is nothing specifically tied to AVAudioEngine in the protocol; we'd just need to make sure it has handling for the various platforms that don't have access to those APIs (watchOS, for example, doesn't support it). Curious to see how your tests go and would be happy to integrate these back into the AudioProcessor depending on the results.
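
To illustrate the platform handling, something along these lines would be needed. This is only a sketch, and `startCaptureSessionRecording()` is a made-up name for illustration, not an existing WhisperKit method.

```swift
import AVFoundation

// Sketch only: compile the AVCaptureSession path out on platforms that don't have
// the API (e.g. watchOS) and keep the existing AVAudioEngine path there.
func startCaptureSessionRecording() throws {
    #if os(watchOS)
    // No AVCaptureSession-based audio capture here; callers should fall back to
    // the AVAudioEngine path instead.
    throw NSError(
        domain: "AudioCapture",
        code: -1,
        userInfo: [NSLocalizedDescriptionKey: "AVCaptureSession capture is unavailable on this platform"]
    )
    #else
    let session = AVCaptureSession()
    // ... add the selected AVCaptureDeviceInput and an AVCaptureAudioDataOutput,
    // as in the capture sketch earlier in this thread ...
    session.startRunning()
    #endif
}
```
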

ZachNagengast added the enhancement and help wanted labels Mar 5, 2024
@cgfarmer4
Contributor Author

cgfarmer4 commented Mar 6, 2024

Decided against using AVCaptureSession, since there's quite a bit of buffer conversion involved that likely adds latency (loosely held hypothesis). This meets my needs because, on macOS, I can take the AVCaptureDevice selected via AVCaptureSession and get its AudioDeviceID. AVCaptureSession would give us a list of devices on the other OSes, but what it won't do is allow AVAudioEngine to have its audioUnit changed. To get a more comprehensive device list on those platforms, we'd need to figure out the CMSampleBuffer-to-AVAudioPCMBuffer conversion and keep it fast enough.

#51

```swift
static func getAudioDeviceID(for captureDevice: AVCaptureDevice) -> AudioDeviceID? {
    // Fetch the list of all Core Audio devices on the system.
    var propertySize: UInt32 = 0
    var address = AudioObjectPropertyAddress(
        mSelector: kAudioHardwarePropertyDevices,
        mScope: kAudioObjectPropertyScopeGlobal,
        mElement: kAudioObjectPropertyElementMain
    )

    AudioObjectGetPropertyDataSize(AudioObjectID(kAudioObjectSystemObject), &address, 0, nil, &propertySize)

    let deviceCount = Int(propertySize) / MemoryLayout<AudioDeviceID>.size
    var deviceIDs = [AudioDeviceID](repeating: 0, count: deviceCount)
    let status = AudioObjectGetPropertyData(AudioObjectID(kAudioObjectSystemObject), &address, 0, nil, &propertySize, &deviceIDs)

    if status == noErr {
        // Match each device's UID against the AVCaptureDevice's uniqueID.
        for id in deviceIDs {
            var uidSize: UInt32 = 0
            var uidAddress = AudioObjectPropertyAddress(
                mSelector: kAudioDevicePropertyDeviceUID,
                mScope: kAudioObjectPropertyScopeGlobal,
                mElement: kAudioObjectPropertyElementMain
            )

            AudioObjectGetPropertyDataSize(id, &uidAddress, 0, nil, &uidSize)

            var deviceUID: Unmanaged<CFString>?
            var uidPropertySize = UInt32(MemoryLayout.size(ofValue: deviceUID))

            let uidStatus = AudioObjectGetPropertyData(id, &uidAddress, 0, nil, &uidPropertySize, &deviceUID)

            if uidStatus == noErr, let deviceUID = deviceUID?.takeUnretainedValue() as String? {
                if captureDevice.uniqueID == deviceUID {
                    return id
                }
            } else {
                logger.error("Failed to get device UID with error: \(uidStatus)")
            }
        }
    } else {
        logger.error("Failed to get device IDs with error: \(status)")
    }

    return nil
}
```
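
Putting the pieces together, the call site would look roughly like this (assuming `getAudioDeviceID` ends up as a static on `AudioProcessor` and the `startRecordingLive` change above lands; macOS only):

```swift
import AVFoundation
import WhisperKit // assumption: AudioProcessor and the helpers above live here

func startRecording(with captureDevice: AVCaptureDevice, using audioProcessor: AudioProcessor) throws {
    // Map the user's AVCaptureDevice selection to a Core Audio device ID.
    guard let deviceID = AudioProcessor.getAudioDeviceID(for: captureDevice) else {
        // Fall back to the system default input if the lookup fails.
        try audioProcessor.startRecordingLive()
        return
    }

    try audioProcessor.startRecordingLive(inputDeviceID: deviceID) { samples in
        // Consume the newly captured samples.
        _ = samples
    }
}
```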

@cgfarmer4
Contributor Author

cgfarmer4 commented Mar 8, 2024

Previously, when I was working with SwiftWhisper, I could translate the format from CMSampleBuffers using the float conversion method here. That implementation was not great, but it was good enough for demos. I'm curious whether there's some conversion here that might be doing something similar?

https://gist.github.com/cgfarmer4/182d9d6d1cdf9d219ba0a4db6a23d745#file-capturedelegate-swift-L1-L46
https://gist.github.com/cgfarmer4/182d9d6d1cdf9d219ba0a4db6a23d745#file-audiosessionmanager-swift-L88-L111
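
For reference, a minimal sketch (not the gist code, and not an existing WhisperKit API) of one way to get from a CMSampleBuffer to an AVAudioPCMBuffer, whose floatChannelData could then feed the existing sample pipeline; error handling is reduced to returning nil:

```swift
import AVFoundation
import CoreMedia

// Sketch: copy a CMSampleBuffer's PCM data into an AVAudioPCMBuffer with the
// same stream format. Compressed formats and resampling are not handled here.
func pcmBuffer(from sampleBuffer: CMSampleBuffer) -> AVAudioPCMBuffer? {
    guard let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer),
          let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(formatDescription),
          let format = AVAudioFormat(streamDescription: asbd) else {
        return nil
    }

    let frameCount = AVAudioFrameCount(CMSampleBufferGetNumSamples(sampleBuffer))
    guard let pcmBuffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: frameCount) else {
        return nil
    }
    pcmBuffer.frameLength = frameCount

    // Copy the sample data directly into the buffer's underlying AudioBufferList.
    let status = CMSampleBufferCopyPCMDataIntoAudioBufferList(
        sampleBuffer,
        at: 0,
        frameCount: Int32(frameCount),
        into: pcmBuffer.mutableAudioBufferList
    )
    return status == noErr ? pcmBuffer : nil
}
```

If the capture format differs from what the transcriber expects, an AVAudioConverter pass would still be needed on top of this, which is where the latency concern above comes from.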
