Technical Programme

Programme at a glance

Registration, technical sessions, lunch and coffee breaks will take place at The Wave, Sheffield.

Wednesday 14 June

Thursday 15 June

Detailed Programme

Keynote A

Speaker
Dr. Kate Knill

Title
Foundation Models in Spoken Language Processing: Time to go home or make hay?

Poster Session A

(A-Z by the Titles)

A Commonsense-enhanced and Flexible Task-Based Dialogue Manager for Socially Assistive Robots. Carl Strathearn

A Low-Resource Pipeline for Text-to-Speech from Found Data with Application to Scottish Gaelic. Dan Wells, Korin Richmond, William Lamb

Acoustic-to-Articulatory Inversion for Pronunciation Feedback. Charles G McGhee, Mark Gales, Katherine M Knill

An Adaptive Autoregressive Pre-Whitener for Speech and Acoustic Signals Based on Parametric NMF. Alfredo Esquivel Jaramillo

An Scéalaí – the Intelligent-CALL Platform. Neasa Ní Chiaráin

Artificial Voice Design for a Social Robot: An Empirical Investigation. Guanyu Huang

Beat-Based Scoring Systems of Rhythmicity of Poem and Oratory Belonging to Stress and Syllable-Timed Languages. Bader M Alotaibi

Comparison of New Curriculum Criteria for End-to-End ASR. Georgios Karakasidis, Mikko Kurimo, Peter Bell, Tamas Grosz

Distant Alignment between Utterances using Multi-distance N-pair Loss. Chanho Park, Thomas Hain

Do Dialogue Representations Align with Perception? An Empirical Study. Sarenne Wallbridge, Peter Bell, Catherine Lai

Do We Hyperarticulate on Zoom? Sam O'Connor Russell, Ayushi Pandey, Naomi Harte

End-To-End Spoken Language Understanding with Tree-Constrained Pointer Generator. Guangzhi Sun, Chao Zhang, Phil Woodland

Exploring Catastrophic Forgetting for Multi-Lingual Automatic Speech Recognition. Ed Storey, Naomi Harte

Identifying People with Mild Cognitive Impairment at Risk of Developing Dementia Using Speech Analysis. Bahman Mirheidari

Imaginary Mask Estimation in Complex Masking for Speech Enhancement. Georgiana-Elena Sfeclis, Ben Milner

Mol an Óige – Development of Irish Phonological Awareness and Early Literacy. Ailbhe Ni Chasaide

OverFlow: Fusing Neural HMMs and Normalising Flows for Probabilistic TTS. Shivam Mehta, Ambika D Kirkland, Harm Lameris, Jonas Beskow, Eva Szekely, Gustav Eje Henter

PAMGAN+/-: Improving Phase-Aware Speech Enhancement Performance via Expanded Discriminator Training. George L Close, Stefan Goetze, Thomas Hain

Simulation of Teacher-Learner Interaction in English Language Pronunciation Learning. Elaf Islam, Thomas Hain

Singing Voice Banking and Conversion for Transgender Singers. Cliodhna Hughes, Ning Ma, Guy Brown

Speaker-based Information Retrieval in the Wild. Erfan Loweimi

Speaking Style Analysis on Conversational Speech Corpora. Adaeze Adigwe

Spectral Clustering-Aware Learning of Embeddings for Speaker Diarisation. Evonne Lee, Guangzhi Sun, Chao Zhang, Phil Woodland

Synthesising and Assessing Dramatic Speech. Emily Lau, Brechtje Post, Katherine M Knill

Synthetic Voices for an Endangered Language Community: the Irish Experience. Ailbhe Ni Chasaide, Andy Murphy

TAME Pain: Trustworthy AssessMEnt of Pain from Speech and Audio. Beatrice Pakenham-Walsh, Jennifer Williams

The Clarity & Cadenza Challenges: Breaking Barriers to Stimulate Progress in Signal Processing for Those with a Hearing Loss. Gerardo Roa Dabike, Jon Barker, Trevor Cox, William Whitmer, Bruno Fazenda, Alinka Greasley, Rebecca R Vos, Scott Bannister, Michael Akeroyd, Jennifer Firth, Simone Graetzer, Graham Naylor, John F Culling

Unveiling Acoustic Embedding Space: Decomposing Word Embeddings into Subword Embeddings. Amit Meghanani, Thomas Hain

Ursa: Benefits of Scaling Self-Supervised Learning for Automatic Speech Recognition. Bethan J Thomas, Benedetta Cevoli, Jamie Dougherty

Using a Large Language Model to Control Speaking Style for Expressive TTS. Atli Thor Sigurgeirsson, Simon King

VOCEX: Voice Frame-Level and Utterance-Level Attribute Extraction for Speech Synthesis. Christoph D Minixhofer, Ondrej Klejch, Peter Bell

Keynote B

Speaker
Dr. Neasa Ní Chiaráin

Title
Speech Technology for Irish: the ABAIR initiative

Oral Session A

An Objective Measure of Lipsync Quality with Non-Aligned Speech Input
Oscar Saz, Luca McArthur, James Parr-Burman and Jan Medvesek
A Multi-Label Speech Emotion Recognition for Cross-Cultural Communication
Tassadaq Hussain, Islam H Nassar, Zhixi Cai, Hamid Rezatofighi, Munawar Hayat and Nicholas Cummins
Identifying Voices and Events from Audio: A Forensic and Law Enforcement Perspective
Anil Alexander and Finnian Kelly

Keynote C

Speaker
Dr. Enzo De Sena

Title
Speech Auralisation

Poster Session B

(A-Z by the Titles)

A Diagnostic for Quantifying Dialect Bias in Balanced Corpora: An Irish Case Study. Liam Lonergan, Mengjie Qian, Neasa Ní Chiaráin, Christer Gobl, Ailbhe Ni Chasaide

A Study of Various Encoders for Language Identification Models. Jeffrey Josanne Michael, Toby Godwin, Oscar Saz, Salil Deena

Adaptable End-to-End ASR Models using Replaceable Internal LMs and Residual Softmax. Keqi Deng, Phil Woodland

An Exploration into Social Attention for Visual Context Modelling in Active Speaker Detection. Jason Clarke, Yoshi Gotoh, Stefan Goetze

An Initial Empirical Analysis of the Effect of Sampling Variability in a Forensic Voice Comparison System. Phil Weber

Analysis of the Communication Rate Gap for Users of Augmentative and Alternative Communication (AAC) Systems. Hussein S A Yusufali, Stefan Goetze, Roger Moore

Assessing Early-stage Schizophrenia based on Paralinguistic Analysis of Speech. Julianna Olah, Kelly Diederen, Maite Arribas, Thomas Spencer, Nicholas Cummins

Assisting Human Detection of Audio Deepfakes. Thomas Cutts, Jennifer Williams, Sebastian Stein

CITED: Ciphered Text Data Augmentation for Low-Resource Acoustic Model Training. Muhammad Umar Farooq, Thomas Hain

Classification of Cognitive Status using Acoustic Features Extracted from Voice Assistant Commands. Melanie Jouaiti, Ravi Vaidyanathan

CognoSpeak: a Cognitive Health Assessment Tool (CcHAT). Nathan Daniel Pevy, Heidi Christensen, Daniel Blackburn

Detecting Vocal Pathologies from Speech Using Transfer Learning. Mary L Paterson, Luisa Cutillo, James Moor

Evaluating Adversarial Networks for Unsupervised Speech Recognition. Mattias George Cross, Anton Ragni

Experiments in Self-Training an ASR System for Irish. Neimhin Robinson Gunning

Fairness in Speech Processing with an Emphasis on Medical Applications. Hend ElGhazaly, Nafise Sadat Moosavi, Heidi Christensen

Geabaire (AAC) – a Voice for Those Without. Ailbhe Ni Chasaide

Investigate Privacy Risks in Speech Depression Detection: An Experimental Study on Demographic Information Leakage. Basmah M Alsenani, Tanaya Guha, Alessandro Vinciarelli

Investigating Confounding Variables Effect in Speech Models for Depression. Stefano Gloria, Nicholas Cummins

Large Vocabulary Continuous Speech Recognition of MP3 Call Centre Data. Kris Y Hong, Dmitry Sityaev

Learnable Frontends That Do Not Learn: Quantifying Sensitivity to Filterbank Initialisation. Mark Anderson, Tomi H Kinnunen, Naomi Harte

Modelling the Growth of Vocabulary in Textual Documents. Martin J Tunnicliffe, Gordon Hunter

N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space. Rao Ma, Mark Gales, Katherine M Knill, Mengjie Qian

Prosody in Referential Communication with a Human or a Computer Partner. Iona Gessinger, Benjamin Cowan

Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces. Oli D Liu, Hao Tang, Sharon Goldwater

Social Robot Nonverbal Vocalization System with Large Language Model. Chuang Yu, Guanyu Huang, Shuang Wu

Speech Audio Corrector: Using Speech from Non-Target Speakers for One-off Correction of Mispronunciations in Grapheme-Input Text-To-Speech. Jason K Y Fong

Speech Emotion Recognition Based on Hierarchical Classification Using Different Modalities in Different Levels. Nawal Alqurashi, Yuhua Li, Kirill Sidorov, David Marshall

Spontaneous TTS with Prosody Control Using Neural HMMs. Harm Lameris, Shivam Mehta, Gustav Eje Henter, Joakim Gustafson, Eva Szekely

The Drill Without the Kill: the Irregular Verb Bot. Neasa Ní Chiaráin

The Effects of Reverberation on Paralinguistic Feature Extraction in Healthy Controls: First Steps Towards Robust Mobile Health Assessments. Judith Dineley, Ewan Car, Faith Matcham, Johnny Downs, Richard Dobson, Thomas Quatieri, Nicholas Cummins

Topic Retrieval for System Development in the Wild. Mengjie Qian, Erfan Loweimi, Mark Gales

Using Artificial Intelligence to Assist in the Understanding of Speech that Has Been Affected By Neurological Damage of People Living with Parkinson’s Disease. Paul Gadd

Vocal Changes and Language Use Associated with Frequent Cannabis Use. Julianna Olah, Kelly Diederen, Thomas Spencer, Nicholas Cummins

Poster Session C

(A-Z by the Titles)

A Study on Microphone Array Position Calibration for Hearing Aids. Shengchang Cao, Stefan Goetze, Jon Barker

Adapting an Unadaptable ASR System. Mengjie Qian, Rao Ma, Mark Gales, Katherine M Knill

Adapting Pretrained Models for Adult To Child Voice Conversion. Protima Nomo Sudro, Anton Ragni, Thomas Hain

Adversarial Learning of Neural User Simulators for Dialogue Policy Optimization. Simon Keizer, Caroline Dockes, Norbert Braunschweiler, Svetlana Stoyanchev, Rama S Doddipatla

Analysis of Speech Datasets for Communication Scenarios for Hearing Aid Users. Robert Sutherland, Stefan Goetze, Jon Barker

Assessment of L2 Oral Proficiency Using Self-Supervised Speech Representation Learning. Stefano Bannò, Katherine M Knill, Marco Matassoni, Vyas Raina, Mark Gales

Automatic Assessment of Conversational Speaking Tests. Simon W McKnight, Arda Civelekoglu, Mark Gales, Katherine M Knill

Corpus Collection Considerations in the Minority Language Context. Neasa Ní Chiaráin

Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra. Zhengjun Yue, Erfan Loweimi, Zoran Cvetkovic

Efficient Intelligibility Evaluation Using Keyword Spotting: A Study on Audio-Visual Speech Enhancement. Andrea L Aldana, Cassia Valentini, Ondrej Klejch, Peter Bell

Ensemble Prosody Prediction for Expressive Speech Synthesis. Zack Hodari, Vivian Hu

Exploring the Use of Self-Supervised Learning for Spoken Language Identification. Sam McNulty, Salil Deena

Humans to Machines: Representing Exemplars. Rhiannon Mogridge, Anton Ragni

Incremental Training Changes to Improve Synthesis Quality. Alexandra Torresquintero, Tomás Gómez Ibarrondo, Christopher G. R. Wallis, Vivian Hu, James Leoni, Devang Savita Ram Mohan, Zack Hodari

Inner Speech Decoding with Bimodal fMRI-EEG. Scott DL Wellington

Investigating Sequence-Level Normalization For CTC-Like End-to-End ASR. Zeyu Zhao, Peter Bell

Language Proficiency Influences Intonational Convergence in L2 English Speech Imitation. Zheng Yuan, Alessandro D’Ausilio

Learning Dependencies of Discrete Speech Representations with Neural Hidden Markov Models. Sung-Lin Yeh, Hao Tang

Leveraging Cross-Utterance Context for ASR Decoding. Robert J Flynn, Anton Ragni

Matching Acoustic and Perceptual Measures of Phonation Assessment in Disordered Speech - A Case Study. Melanie Jouaiti, Pippa Kirby, Ravi Vaidyanathan

Multimodal Dyadic Impression Recognition via Listener Adaptive Cross-Domain Fusion. Yuanchao Li, Peter Bell, Catherine Lai

On Data Sampling Strategies for Training Neural Network Speech Separation Models. William Ravenscroft, Stefan Goetze, Thomas Hain

On the (In)Efficiency of Acoustic Feature Extractors for Self-Supervised Speech Representation Learning. Titouan Parcollet, Shucong Zhang, Rogier C. van Dalen, Alberto Gil C. P. Ramos, Sourav Bhattacharya

Query Based Acoustic Summarization for Podcasts. Samantha Kotey, Rozenn Dahyot, Naomi Harte

Real-Time Personalised Speech Enhancement Transformers with Dynamic Cross-attended Speaker Representations. Shucong Zhang, Malcolm Chadwick, Alberto Gil C. P. Ramos, Titouan Parcollet, Rogier C. van Dalen, Sourav Bhattacharya

Residual Energy-Based Models for Speech Synthesis. Wanli Sun, Zehai Tu, Anton Ragni

The ABAIR Suite of Irish Speech Technology and Applications: an Overview. Ailbhe Ni Chasaide

The Importance of Phonemization Accuracy for TTS Acoustic Modeling. Zack Hodari, Tomás Ibarrondo

Towards Articulatory Control of Speech Synthesis based on Optimal Control Theory. Zihang Peng

Understanding the Behavior of Automatic Speaker Recognition Systems for Application in Forensic Casework. Poppy Welch, Vincent Hughes, Jessica Wormald, Chenzi Xu, Paul Foulkes, Philip Harrison, Finnian Kelly, David van der Vloed

What Do the Measures of Utterance Fluency Employed in Automatic Speech Evaluation (ASE) Tell Us About Second Language Oral Proficiency? Zoe Handley

Why Say Anything? Roger Moore

Oral Session B

Exploring Agreement between Language Identity and Matrix Language in Code-Switched Speech
Olga Iakovenko and Thomas Hain
Efficient Control of Prosody Using Sparse Human Input
Dan Andrei Iliescu, Devang Savita Ram Mohan, Tian Huey Teh and Zack Hodari
Speech Technology in Manufacturing
Lindsay Lee
