Chunk-Level Attention SER (PyTorch)

This is a PyTorch implementation of the chunk-level attention-based speech emotion recognition (SER) framework described in the paper below, evaluated on the MSP-Podcast corpus.

The Chunk-Level Attention SER Framework
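
As a quick illustration of the core idea, below is a minimal PyTorch sketch of attention-based temporal aggregation over chunk-level embeddings, loosely in the spirit of the SelfAttenVec variant. The class name, layer choices and dimensions are illustrative assumptions, not the exact layers defined in this repository.

```python
import torch
import torch.nn as nn

class ChunkAttentionPooling(nn.Module):
    """Aggregate C chunk-level embeddings into one utterance-level vector
    with a learned attention weight per chunk (illustrative sketch)."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.scorer = nn.Linear(hidden_dim, 1)   # one scalar score per chunk

    def forward(self, chunk_embs):
        # chunk_embs: (batch, C, hidden_dim), e.g., the LSTM state of each chunk
        scores = self.scorer(chunk_embs)                 # (batch, C, 1)
        weights = torch.softmax(scores, dim=1)           # attention over chunks
        utt_emb = (weights * chunk_embs).sum(dim=1)      # (batch, hidden_dim)
        return utt_emb, weights.squeeze(-1)

# usage: pool 11 chunk embeddings of size 256 into one utterance embedding
pool = ChunkAttentionPooling(hidden_dim=256)
utt, w = pool(torch.randn(8, 11, 256))   # utt: (8, 256), w: (8, 11)
```

Each chunk is first encoded over its frames (e.g., by the LSTM); the attention weights then decide how much each chunk contributes to the sentence-level prediction.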

Suggested Environment and Requirements

  1. Python 3.6+
  2. Ubuntu 18.04+
  3. torch version 1.4.0+
  4. CUDA 10.0+
  5. Common packages such as scipy, numpy and pandas
  6. The MSP-Podcast corpus (request to download from the UTD-MSP lab website)
  7. The IS13ComParE LLDs (acoustic features) extracted by openSMILE (users can refer to the opensmile-LLDs-extraction repository)

How to run

After extracting the IS13ComParE LLDs (e.g., XXX_llds/feat_mat/*.mat) for the MSP-Podcast corpus [whatever version], we use the 'labels_concensus.csv' file provided by the corpus as the default label input.

  1. change the data & label root paths in norm_para.py, then run it to obtain the z-normalization parameters (mean and std) computed on the Train set; a small sketch of this normalization step is given after this list. We also provide the parameters for the v1.6 corpus in the 'NormTerm' folder.

  2. change the data & label root paths in training.py for the LSTM model. The running args are:

    • -iter: maximum training iterations
    • -batch: batch size for training
    • -emo: emotion attributes (Act, Dom or Val)
    • -atten: type of chunk-level attention model (NonAtten, GatedVec, RnnAttenVec or SelfAttenVec)
    • run in the terminal
python training.py -iter 5000 -batch 128 -emo Dom -atten SelfAttenVec
  3. change the data & label & model root paths in testing.py to get the testing results on the MSP-Podcast test set,
    • run in the terminal
python testing.py -iter 5000 -batch 128 -emo Dom -atten SelfAttenVec
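
For reference, step 1 boils down to a per-dimension standardization of the LLD features with Train-set statistics. The sketch below shows the arithmetic; the function names and array layout are illustrative assumptions, not the exact format produced by norm_para.py or stored in the 'NormTerm' folder.

```python
import numpy as np

# train_feats: list of (frames, feat_num) LLD matrices from the Train set (assumed layout)
def compute_znorm_params(train_feats):
    stacked = np.concatenate(train_feats, axis=0)      # all Train-set frames
    return stacked.mean(axis=0), stacked.std(axis=0)   # per-dimension mean and std

def apply_znorm(feat, mean, std, eps=1e-8):
    # standardize one utterance's LLD matrix with the Train-set statistics
    return (feat - mean) / (std + eps)
```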

Pre-trained models

We provide some trained models based on version 1.6 of the MSP-Podcast corpus in the 'trained_model_v1.6' folder. These PyTorch models have been verified to show CCC performance trends similar to the original Keras implementation used in the paper.

| Model | Act | Val | Dom |
|---|---|---|---|
| LSTM-RnnAttenVec (Keras) | 0.6955 | 0.3006 | 0.6175 |
| LSTM-SelfAttenVec (Keras) | 0.6837 | 0.3337 | 0.6004 |
| LSTM-RnnAttenVec (PyTorch) | 0.6906 | 0.2747 | 0.6132 |
| LSTM-SelfAttenVec (PyTorch) | 0.7099 | 0.3206 | 0.6299 |

Users can reproduce these results by running testing.py with the corresponding args.
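
For context, the CCC numbers above are concordance correlation coefficients between predicted and ground-truth attribute scores. A minimal NumPy version of the metric (not necessarily the exact evaluation code in testing.py) looks like this:

```python
import numpy as np

def concordance_cc(pred, true):
    """Concordance correlation coefficient between two 1-D score arrays."""
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    cov = np.mean((pred - pred.mean()) * (true - true.mean()))
    return 2 * cov / (pred.var() + true.var() + (pred.mean() - true.mean()) ** 2)
```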

For general usage

The implementation targets the MSP-Podcast corpus; however, the framework can be applied to general speech-based sequence-to-one tasks (e.g., speaker recognition, gender detection, acoustic event classification or SER). If you want to apply the framework to your own task, the following important parameters need to be specified in the DynamicChunkSplitData functions in the utils.py file (see the sketch after this list):

  1. maximum duration in seconds of your corpus (i.e., Tmax)
  2. desired chunk window length in seconds (i.e., Wc)
  3. number of chunks split from a sentence (i.e., C = ceiling of Tmax/Wc)
  4. number of frames within a chunk (i.e., m)
  5. scaling factor to increase the number of split chunks (i.e., n = 1, 2 or 3 are suggested)
  6. remember to change the NN model dimensions: feat_num, time_step and C
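
As a rough illustration of how these parameters interact, the sketch below splits one LLD matrix into n*C chunks of m frames each, with chunk start indices spread evenly over the utterance. It is only a simplified assumption of the splitting logic; the actual DynamicChunkSplitData functions in utils.py may handle padding, overlap and batching differently.

```python
import math
import numpy as np

def split_into_chunks(feat, Tmax, Wc, m, n=1):
    """Split a (frames, feat_num) LLD matrix into n*C chunks of m frames each.
    Illustrative sketch: chunk start indices are spread evenly over the utterance."""
    C = math.ceil(Tmax / Wc)          # number of chunks per sentence
    num_chunks = n * C                # scaling factor n increases the chunk count
    T = feat.shape[0]
    if T < m:                         # pad utterances shorter than one chunk
        feat = np.pad(feat, ((0, m - T), (0, 0)))
        T = m
    starts = np.linspace(0, T - m, num_chunks).astype(int)
    return np.stack([feat[s:s + m] for s in starts])   # shape: (n*C, m, feat_num)
```

For example, with Tmax = 11 s and Wc = 1 s, each sentence is represented by C = 11 chunks; m then follows from Wc and the frame rate of the extracted LLDs.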

Reference

If you use this code, please cite the following paper:

Wei-Cheng Lin and Carlos Busso, "Chunk-Level Speech Emotion Recognition: A General Framework of Sequence-to-One Dynamic Temporal Modeling"

@article{Lin_2023_4,
  author={W.-C. Lin and C. Busso},
  title={Chunk-Level Speech Emotion Recognition: A General Framework of Sequence-to-One Dynamic Temporal Modeling},
  journal={IEEE Transactions on Affective Computing},
  volume={14},
  number={2},
  pages={1215-1227},
  year={2023},
  month={April-June},
  doi={10.1109/TAFFC.2021.3083821},
}
