A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal AI, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
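Verification toolkits like the one above typically score a trial by comparing fixed-dimensional speaker embeddings with cosine similarity; a minimal numpy sketch of that scoring step (the function names, embeddings, and threshold are illustrative assumptions, not any particular toolkit's API):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(enroll: np.ndarray, trial: np.ndarray, threshold: float = 0.7) -> bool:
    """Accept the trial as the enrolled speaker if similarity clears the threshold."""
    return cosine_similarity(enroll, trial) >= threshold
```

In practice the threshold is tuned on a held-out trial list (e.g. to the equal error rate) rather than fixed a priori.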
Speaker Diarization, Recognition and Language Identification. Scripts to generate ground truth (GT) using our WebApp and Praat software
A PyTorch-based Speech Toolkit
Segment speech sequences based on speaker transitions, using ML and DSP.
speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with actual speaker names
SA-toolkit: Speaker speech anonymization toolkit in Python
This project uses a variety of advanced voiceprint recognition models such as EcapaTdnn, ResNetSE, ERes2Net, and CAM++, and more models may be supported in the future. It also supports MelSpectrogram and Spectrogram data preprocessing methods
Introduction to Speech Processing
Python toolkit for speech processing
Final project for the Speaker Recognition course on Udemy, 机器之心, 深蓝学院 and 语音之家
Toolkit aimed at extracting character personalities from literary novels, with experiments organized in separate folders
Microphone audio playback
This project uses a variety of advanced voiceprint recognition models such as EcapaTdnn, ResNetSE, ERes2Net, and CAM++, and also supports multiple data preprocessing methods including MelSpectrogram, Spectrogram, MFCC, and Fbank
Voiceprint recognition implemented with TensorFlow
A voiceprint recognition model implemented with Keras
Investigating Layer-Specific Performance in Speaker Recognition with XLS-R Architecture
The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain, users can easily create speech processing systems, including speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.
A speaker recognition deep learning model based on Mel-Frequency Cepstral Coefficient (MFCC) feature extraction. Solution code for Signal Processing Cup 2024.
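MFCCs, as used in the entry above, project each frame's power spectrum onto a mel-spaced triangular filterbank and decorrelate the log filter energies with a DCT. A minimal from-scratch sketch in numpy (the sample rate, frame size, hop, and filter counts are illustrative assumptions, not the competition entry's settings):

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    # Frame the signal and apply a Hamming window to each frame.
    frames = np.array([
        signal[start:start + n_fft] * np.hamming(n_fft)
        for start in range(0, len(signal) - n_fft + 1, hop)
    ])
    # Per-frame power spectrum.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular filters at mel-spaced center frequencies.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    # Log filterbank energies (small floor avoids log(0)).
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the energies; keep the first n_ceps coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_mels)))
    return log_energy @ dct.T  # shape: (num_frames, n_ceps)
```

Libraries such as librosa or torchaudio provide tuned, vectorized versions of this pipeline; the sketch only shows the sequence of operations.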