Next week marks the beginning of the 2019 Interspeech Conference in Graz, Austria, where Cogito researchers will present a paper examining gender bias in speech emotion recognition. We propose a simple model training procedure that is both effective at mitigating bias and more stable during training than a highly cited baseline method.

We use a standard neural network model, based on 2D convolutional layers applied to Mel frequency coefficients, and train it to recognize emotional activation on a dataset of 33,000+ naturally occurring utterances from radio shows. With this setup, we demonstrate that model performance is better for male speakers than for female speakers. A popular de-biasing approach, previously proposed by researchers at Google and Stanford (see paper), is found to be effective at improving fairness across gender, but at the cost of highly unstable model training and reduced accuracy.
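To make the idea of a performance gap concrete, here is a minimal sketch of how one might compare a classifier's recall on the high-activation class across speaker groups. The choice of recall as the metric, the function name `recall_by_group`, and the toy labels are all assumptions made for illustration; they are not taken from the paper and do not reflect its results.

```python
from collections import defaultdict

def recall_by_group(y_true, y_pred, groups, positive=1):
    """Recall on the positive class, computed separately for each group."""
    hits = defaultdict(int)    # true positives per group
    totals = defaultdict(int)  # actual positives per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == positive:
            totals[group] += 1
            if pred == positive:
                hits[group] += 1
    return {g: hits[g] / totals[g] for g in totals}

# Toy labels, invented for illustration only (1 = high activation).
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
groups = ["m", "m", "m", "m", "m", "f", "f", "f", "f", "f"]

recalls = recall_by_group(y_true, y_pred, groups)
gap = recalls["m"] - recalls["f"]  # positive gap: model favors group "m"
print(recalls, gap)
```

A gap near zero would suggest the model treats both groups similarly on this metric, while a large gap in either direction is the kind of disparity a de-biasing procedure aims to reduce.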

To learn more about Cogito’s presence at Interspeech 2019, please check out this post on Medium.


Dr. John Kane

John comes to Cogito with a background in signal processing research and speech technology development. His experience in conversational analysis and in measuring social signals such as tone of voice, voice quality, and timing makes him integral to the development of next-generation vocal behavioral analytics at Cogito. John holds a master’s and a Ph.D. in signal processing and speech technology, and a bachelor’s in business and marketing.