Automatic speech recognition (ASR) is the process of automatically extracting the textual information carried by a raw audio signal, essentially converting speech to text. With the technological advances of the last few years, and especially with the advent of deep learning, machines can achieve remarkable results for the...
[Read More]
Speaker Role Recognition
and how to combine it with other speech processing tasks
Individuals assume distinct roles in different situations throughout their lives, and people who consistently adopt particular roles develop specific commonalities in behavior. As a result, roles can be defined in terms of observable tendencies and behavioral patterns that can be manifested through a wide range of modalities during a conversational...
[Read More]
Automating Behavioral Coding in Psychotherapy
bringing the machines into the game
Psychotherapy quality assessment is typically addressed by human raters who evaluate recorded sessions along specific behavioral codes, as defined by standard coding manuals. The recordings capture the complex series of interactions between the therapist and the client, and as such, they encode the active ingredients of therapy. However, the time...
[Read More]
Memory Augmented Networks for Continuous Speaker Identification
who spoke when? oh, let me remember
Speaker identification is the task of determining the identity of the person uttering a particular phrase, assuming a finite set of pre-enrolled speakers is given. Applying a continuous automatic speaker identification system to recorded meetings with multiple participants is the main problem I was working on during my 2019 summer...
[Read More]
Siamese CNNs for Speaker Change Detection
a fancy way to compare similarities
Speaker change detection is the task of dividing a speech signal into speaker-homogeneous segments. To achieve this, the original signal is partitioned into consecutive small windows, and we check how similar any two consecutive windows are. If they look similar enough, they are considered to belong...
[Read More]
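The window-comparison idea above can be sketched in a few lines. This is a minimal illustration, assuming fixed-size embeddings for each window are already available (random vectors would stand in for the Siamese CNN outputs); the function names and the 0.5 threshold are illustrative, not the post's actual setup.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect_changes(window_embeddings, threshold=0.5):
    """Flag a speaker change between consecutive windows whose
    embeddings are not similar enough (hypothetical threshold)."""
    changes = []
    for i in range(len(window_embeddings) - 1):
        sim = cosine_similarity(window_embeddings[i], window_embeddings[i + 1])
        if sim < threshold:
            changes.append(i + 1)  # change at the start of window i+1
    return changes
```

In the actual approach, the similarity function itself is what the Siamese network learns, rather than a fixed cosine distance.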
Automatic Sleep Staging Using HMMs
did you sleep well last night?
Human sleep can be divided into time periods with similar characteristics called sleep stages. Patterns found in biomedical signals, such as those generated by the cerebral cortex, the muscles of the face, and the movement of the eyes, are used to label small time windows in a procedure known as...
[Read More]
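Labeling small time windows with an HMM typically comes down to Viterbi decoding: given per-window emission likelihoods (e.g., from a classifier over EEG/EOG/EMG features) and stage-transition probabilities, find the most likely stage sequence. The sketch below is a toy illustration; the two-stage model and all probabilities are made up for the example.

```python
import numpy as np

def viterbi(log_emis, log_trans, log_init):
    """Most likely stage sequence for T windows over S stages.

    log_emis:  (T, S) per-window log-likelihoods
    log_trans: (S, S) log transition probabilities, row = previous stage
    log_init:  (S,)   log initial-stage probabilities
    """
    T, S = log_emis.shape
    dp = np.zeros((T, S))            # best log-score ending in each stage
    back = np.zeros((T, S), dtype=int)
    dp[0] = log_init + log_emis[0]
    for t in range(1, T):
        scores = dp[t - 1][:, None] + log_trans   # (prev, cur)
        back[t] = np.argmax(scores, axis=0)
        dp[t] = scores[back[t], np.arange(S)] + log_emis[t]
    # backtrack from the best final stage
    path = [int(np.argmax(dp[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With self-transitions favored (sleep stages persist across many windows), the decoder smooths out spurious per-window predictions.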
Acoustic Features for Robust Speech Recognition
no, they are not MFCCs
Sophisticated techniques for acoustic and language modeling have resulted in Automatic Speech Recognition (ASR) systems that can even beat human performance under clean conditions, with the speech signal typically represented by Mel-Frequency Cepstral Coefficients (MFCCs). However, when the speech signal is distorted by background noise or reverberation, there is...
[Read More]
Identifying Saliency for Movie Summarization
let's create some trailers!
Humans have a unique capability of quickly identifying points of interest in a visual scene. Being able to efficiently extract, through a computational process, such salient segments in a video would lead to high-quality automated movie summaries. Motivated by neurobiological and psychophysical evidence about the way the human brain performs...
[Read More]