Automatic speech recognition (ASR) is the process of automatically extracting the textual information carried by a raw audio signal, essentially converting speech to text. With the technological advances of the last few years, and especially with the advent of deep learning, machines can achieve remarkable results for the...
[Read More]
Speaker Role Recognition
and how to combine it with other speech processing tasks
Individuals assume distinct roles in different situations throughout their lives, and people who consistently adopt particular roles develop specific commonalities in behavior. As a result, roles can be defined in terms of observable tendencies and behavioral patterns that can be manifested through a wide range of modalities during a conversational...
[Read More]
Automating Behavioral Coding in Psychotherapy
bringing the machines into the game
Psychotherapy quality assessment is typically addressed by human raters who evaluate recorded sessions along specific behavioral codes, as defined by standard coding manuals. The recordings capture the complex series of interactions between the therapist and the client, and as such, they encode the active ingredients of therapy. However, the time...
[Read More]
Memory Augmented Networks for Continuous Speaker Identification
who spoke when? oh, let me remember
Speaker identification is the task of determining the identity of the person uttering a particular phrase, assuming a finite set of pre-enrolled speakers is given. Applying a continuous automatic speaker identification system to recorded meetings with multiple participants is the main problem I was working on during my 2019 summer...
[Read More]
Siamese CNNs for Speaker Change Detection
a fancy way to compare similarities
Speaker change detection is the task of dividing a speech signal into speaker-homogeneous segments. To achieve this, the original signal is partitioned into consecutive small windows, and we check how similar any two consecutive windows are. If they look similar enough, they are considered to belong...
[Read More]
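The window-comparison idea above can be sketched in a few lines. This is a minimal illustration, assuming fixed-size embeddings for each window are already available (random vectors would stand in for the Siamese CNN outputs); the function names and the 0.5 threshold are illustrative, not the post's actual setup.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect_changes(window_embeddings, threshold=0.5):
    """Flag a speaker change between consecutive windows whose
    embeddings are not similar enough (hypothetical threshold)."""
    changes = []
    for i in range(len(window_embeddings) - 1):
        sim = cosine_similarity(window_embeddings[i], window_embeddings[i + 1])
        if sim < threshold:
            changes.append(i + 1)  # change at the start of window i+1
    return changes
```

In the actual approach, the similarity function itself is what the Siamese network learns, rather than a fixed cosine distance.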
Automatic Sleep Staging Using HMMs
did you sleep well last night?
Human sleep can be divided into time periods with similar characteristics called sleep stages. Patterns found in biomedical signals, such as those generated by the cerebral cortex, the muscles of the face, and the movement of the eyes, are used to label small time windows in a procedure known as...
[Read More]
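Labeling small time windows with an HMM typically comes down to Viterbi decoding: given per-window emission likelihoods (e.g., from a classifier over EEG/EOG/EMG features) and stage-transition probabilities, find the most likely stage sequence. The sketch below is a toy illustration; the two-stage model and all probabilities are made up for the example.

```python
import numpy as np

def viterbi(log_emis, log_trans, log_init):
    """Most likely stage sequence for T windows over S stages.

    log_emis:  (T, S) per-window log-likelihoods
    log_trans: (S, S) log transition probabilities, row = previous stage
    log_init:  (S,)   log initial-stage probabilities
    """
    T, S = log_emis.shape
    dp = np.zeros((T, S))            # best log-score ending in each stage
    back = np.zeros((T, S), dtype=int)
    dp[0] = log_init + log_emis[0]
    for t in range(1, T):
        scores = dp[t - 1][:, None] + log_trans   # (prev, cur)
        back[t] = np.argmax(scores, axis=0)
        dp[t] = scores[back[t], np.arange(S)] + log_emis[t]
    # backtrack from the best final stage
    path = [int(np.argmax(dp[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With self-transitions favored (sleep stages persist across many windows), the decoder smooths out spurious per-window predictions.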
Acoustic Features for Robust Speech Recognition
no, they are not MFCCs
Sophisticated techniques for acoustic and language modeling have resulted in Automatic Speech Recognition (ASR) systems that can even beat human performance under clean conditions, with the speech signal typically represented by Mel-Frequency Cepstral Coefficients (MFCCs). However, when the speech signal is distorted by background noise or reverberation, there is...
[Read More]
Identifying Saliency for Movie Summarization
let's create some trailers!
Humans have a unique capability of quickly identifying points of interest in a visual scene. Being able to efficiently extract, through a computational process, such salient segments in a video would lead to high-quality automated movie summaries. Motivated by neurobiological and psychophysical evidence about the way the human brain performs...
[Read More]