
Temporal-Enhanced DeepEmoCluster

The temporal-enhanced DeepEmoCluster adds sentence-level temporal modeling to the DeepEmoCluster framework to further improve its emotion recognition performance. Two temporal modeling approaches are provided (a minimal sketch of the idea follows the framework figures below):

  1. Temporal Net: Temp-GRU, Temp-CNN and Temp-Trans
  2. Triplet loss: Temp-Triplet

NOTE: The experiments in the paper and the provided pretrained models are based on version 1.8 of the MSP-Podcast corpus.

[Figures: the Temp-Net DeepEmoCluster framework and the Temp-Triplet DeepEmoCluster framework]
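For orientation only (not the repository's exact implementation), the Temporal Net idea can be sketched as a recurrent head that summarizes the sequence of chunk-level DeepEmoCluster embeddings at the sentence level before regressing the emotional attribute; all layer names and sizes below are hypothetical.

import torch
import torch.nn as nn

class TempGRUHead(nn.Module):
    """Illustrative sentence-level temporal head (hypothetical sizes/names).

    Summarizes a sequence of chunk-level embeddings from the DeepEmoCluster
    encoder with a GRU before regressing the emotional attribute (Act/Dom/Val).
    """
    def __init__(self, feat_dim=256, hidden_dim=128):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.regressor = nn.Linear(hidden_dim, 1)

    def forward(self, chunk_embeddings):
        # chunk_embeddings: (batch, num_chunks, feat_dim)
        _, h_n = self.gru(chunk_embeddings)    # h_n: (1, batch, hidden_dim)
        return self.regressor(h_n.squeeze(0))  # (batch, 1) attribute score

# toy usage: 4 sentences, 11 chunks each, 256-dim chunk embeddings
print(TempGRUHead()(torch.randn(4, 11, 256)).shape)  # torch.Size([4, 1])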

Suggested Environment and Requirements

  1. Python 3.6+
  2. Ubuntu 18.04
  3. CUDA 10.0+
  4. PyTorch 1.4.0
  5. librosa 0.7.0
  6. faiss 1.6.0
  7. Common packages such as scipy, numpy and pandas
  8. The MSP-Podcast corpus (request access through the UTD MSP lab website)

Feature Extraction & Preparation

Use feat_extract.py to extract 128-mel spectrogram features for every speech segment in the corpus (remember to change the I/O paths in the .py file). Then, use norm_para.py to compute and save the normalization parameters used in our framework's pre-processing step; the parameters are saved in the generated 'NormTerm' folder. We provide the parameters for the v1.8 corpus in this repo.
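For orientation, the sketch below shows roughly how 128-mel features and corpus-level normalization parameters could be produced with librosa and numpy; the exact spectrogram settings, folder names and file names used by feat_extract.py and norm_para.py are assumptions and may differ.

import glob
import os
import numpy as np
import librosa

def extract_melspec(wav_path, sr=16000, n_mels=128):
    # rough 128-mel log-spectrogram extraction; window/hop settings are assumptions
    y, _ = librosa.load(wav_path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel).T  # (frames, 128)

# placeholder audio folder; replace with the corpus segment directory
feats = [extract_melspec(p) for p in sorted(glob.glob("Audios/*.wav"))]
stacked = np.concatenate(feats, axis=0)

# save corpus-level mean/std, analogous to the parameters stored in 'NormTerm'
os.makedirs("NormTerm", exist_ok=True)
np.save(os.path.join("NormTerm", "feat_mean.npy"), stacked.mean(axis=0))  # hypothetical file names
np.save(os.path.join("NormTerm", "feat_std.npy"), stacked.std(axis=0))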

Training from Scratch

  1. Prepare the 128-mel spectrogram features of the MSP-Podcast corpus (it can be any version)
  2. Change the data root and label paths (the 'labels_concensus.csv' file provided with the corpus) in main.py. The running args are:
    • -ep: number of epochs
    • -batch: batch size for training
    • -emo: emotional attribute (Act, Dom or Val)
    • -nc: number of clusters in the latent space for the cluster classifier
    • -mt: temporal modeling type (Temp-GRU, Temp-CNN, Temp-Trans or Temp-Triplet)
    • -unlabel: desired size of the unlabeled set for semi-supervised training (if no unlabeled data is available, set it to 0 to perform fully supervised learning only)
    • run in the terminal (a hypothetical sketch of the argument parsing follows this list):
python main.py -ep 30 -batch 64 -emo Val -nc 10 -mt Temp-Triplet -unlabel 15000
    • the trained models will be saved under the generated 'trained_models' folder
  3. Evaluate the trained models using online_testing.py. The results are based on the MSP-Podcast pre-defined test set:
    • run in the terminal
python online_testing.py -ep 30 -batch 64 -emo Val -nc 10 -mt Temp-Triplet -unlabel 15000
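For clarity, the flags above map onto a standard argparse setup; the sketch below is a hypothetical reconstruction of how main.py might parse them, not the repository's actual code.

import argparse

# hypothetical reconstruction of the documented flags
parser = argparse.ArgumentParser(description="Temporal-Enhanced DeepEmoCluster")
parser.add_argument("-ep", type=int, default=30, help="number of epochs")
parser.add_argument("-batch", type=int, default=64, help="batch size for training")
parser.add_argument("-emo", type=str, choices=["Act", "Dom", "Val"], help="emotional attribute")
parser.add_argument("-nc", type=int, default=10, help="number of clusters for the cluster classifier")
parser.add_argument("-mt", type=str,
                    choices=["Temp-GRU", "Temp-CNN", "Temp-Trans", "Temp-Triplet"],
                    help="temporal modeling type")
parser.add_argument("-unlabel", type=int, default=0,
                    help="size of the unlabeled set (0 = fully supervised)")
args = parser.parse_args()
print(args.emo, args.mt)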

Pre-trained models

We provide some pretrained models based on version 1.8 of the MSP-Podcast corpus in the 'trained_models' folder. The CCC performance of these models on the test set is shown in the following table. Note that the results differ slightly from the paper, since the paper reports averages over multiple trials for statistical testing.

Temporal Modeling Approach | Act (30 clusters) | Dom (30 clusters) | Val (10 clusters)
Temp-Trans (SSL)           | 0.5741            | 0.4839            | 0.1801

Users can reproduce these results by running online_testing.py with the corresponding args.
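For reference, the concordance correlation coefficient (CCC) reported above can be computed with the standard definition below; this is a generic implementation, not necessarily identical to the repository's evaluation code.

import numpy as np

def ccc(pred, true):
    # concordance correlation coefficient: 2*cov / (var_pred + var_true + (mean difference)^2)
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    cov = np.mean((pred - pred.mean()) * (true - true.mean()))
    return 2 * cov / (pred.var() + true.var() + (pred.mean() - true.mean()) ** 2)

print(ccc([0.1, 0.4, 0.35, 0.8], [0.0, 0.5, 0.3, 0.9]))  # ~0.95; close to 1 for well-aligned predictions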

End-to-End Emotion Prediction Process

We provide an end-to-end prediction process that allows users to directly make emotion predictions (i.e., arousal, dominance and valence) on their own dataset or any audio files (audio spec: WAV file, 16 kHz sampling rate, mono channel) using the provided pretrained models. Users only need to change the input folder path in prediction_process.py to run the predictions; the output results will be saved as a 'pred_result.csv' file under the same directory.
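Conceptually, the prediction loop reads each WAV file at 16 kHz mono, scores it with the pretrained models, and appends a row to 'pred_result.csv'. The sketch below illustrates only that I/O wrapper with a dummy scoring function; the folder path and column names are assumptions rather than the actual output format of prediction_process.py.

import glob
import os
import librosa
import pandas as pd

input_folder = "my_audios"  # placeholder: point this to your own WAV folder

def predict_attributes(waveform):
    # dummy stand-in for the pretrained Temporal-Enhanced DeepEmoCluster models
    return {"Act": 0.0, "Dom": 0.0, "Val": 0.0}

rows = []
for wav_path in sorted(glob.glob(os.path.join(input_folder, "*.wav"))):
    y, _ = librosa.load(wav_path, sr=16000, mono=True)  # enforce 16 kHz mono
    rows.append({"FileName": os.path.basename(wav_path), **predict_attributes(y)})

pd.DataFrame(rows).to_csv("pred_result.csv", index=False)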

Reference

If you use this code, please cite the following paper:

Wei-Cheng Lin and Carlos Busso, "Deep temporal clustering features for speech emotion recognition," Speech Communication, vol. 157, pp. 103027, February 2024.

@article{Lin_2024,
  author  = {W.-C. Lin and C. Busso},
  title   = {Deep temporal clustering features for speech emotion recognition},
  journal = {Speech Communication},
  volume  = {157},
  pages   = {103027},
  month   = {February},
  year    = {2024},
  doi     = {10.1016/j.specom.2023.103027},
}
