Technical Programme
Programme at a glance
Registration, technical sessions, lunch and coffee breaks will take place at The Wave, Sheffield.
Wednesday 14 June
Thursday 15 June
Detailed Programme
Keynote A
Speaker:
Dr. Kate Knill
Title:
Foundation Models in Spoken Language Processing: Time to go home or make hay?
Poster Session A
(A-Z by the Titles)
A Commonsense-enhanced and Flexible Task-Based Dialogue Manager for Socially Assistive Robots. Carl Strathearn
A Low-Resource Pipeline for Text-to-Speech from Found Data with Application to Scottish Gaelic. Dan Wells, Korin Richmond, William Lamb
Acoustic-to-Articulatory Inversion for Pronunciation Feedback. Charles G McGhee, Mark Gales, Katherine M Knill
An Adaptive Autoregressive Pre-Whitener for Speech and Acoustic Signals Based on Parametric NMF. Alfredo Esquivel Jaramillo
An Scéalaí – the Intelligent-CALL Platform. Neasa Ní Chiaráin
Artificial Voice Design for a Social Robot: An Empirical Investigation. Guanyu Huang
Beat-Based Scoring Systems of Rhythmicity of Poem and Oratory Belonging to Stress and Syllable-Timed Languages. Bader M Alotaibi
Comparison of New Curriculum Criteria for End-to-End ASR. Georgios Karakasidis, Mikko Kurimo, Peter Bell, Tamas Grosz
Distant Alignment between Utterances using Multi-distance N-pair Loss. Chanho Park, Thomas Hain
Do Dialogue Representations Align with Perception? An Empirical Study. Sarenne Wallbridge, Peter Bell, Catherine Lai
Do We Hyperarticulate on Zoom? Sam O'Connor Russell, Ayushi Pandey, Naomi Harte
End-To-End Spoken Language Understanding with Tree-Constrained Pointer Generator. Guangzhi Sun, Chao Zhang, Phil Woodland
Exploring Catastrophic Forgetting for Multi-Lingual Automatic Speech Recognition. Ed Storey, Naomi Harte
Identifying People with Mild Cognitive Impairment at Risk of Developing Dementia Using Speech Analysis. Bahman Mirheidari
Imaginary Mask Estimation in Complex Masking for Speech Enhancement. Georgiana-Elena Sfeclis, Ben Milner
Mol an Óige – Development of Irish Phonological Awareness and Early Literacy. Ailbhe Ni Chasaide
OverFlow: Fusing Neural HMMs and Normalising Flows for Probabilistic TTS. Shivam Mehta, Ambika D Kirkland, Harm Lameris, Jonas Beskow, Eva Szekely, Gustav Eje Henter
PAMGAN+/-: Improving Phase-Aware Speech Enhancement Performance via Expanded Discriminator Training. George L Close, Stefan Goetze, Thomas Hain
Simulation of Teacher-Learner Interaction in English Language Pronunciation Learning. Elaf Islam, Thomas Hain
Singing Voice Banking and Conversion for Transgender Singers. Cliodhna Hughes, Ning Ma, Guy Brown
Speaker-based Information Retrieval in the Wild. Erfan Loweimi
Speaking Style Analysis on Conversational Speech Corpora. Adaeze Adigwe
Spectral Clustering-Aware Learning of Embeddings for Speaker Diarisation. Evonne Lee, Guangzhi Sun, Chao Zhang, Phil Woodland
Synthesising and Assessing Dramatic Speech. Emily Lau, Brechtje Post, Katherine M Knill
Synthetic Voices for an Endangered Language Community: the Irish Experience. Ailbhe Ni Chasaide, Andy Murphy
TAME Pain: Trustworthy AssessMEnt of Pain from Speech and Audio. Beatrice Pakenham-Walsh, Jennifer Williams
The Clarity & Cadenza Challenges: Breaking Barriers to Stimulate Progress in Signal Processing for Those with a Hearing Loss. Gerardo Roa Dabike, Jon Barker, Trevor Cox, William Whitmer, Bruno Fazenda, Alinka Greasley, Rebecca R Vos, Scott Bannister, Michael Akeroyd, Jennifer Firth, Simone Graetzer, Graham Naylor, John F Culling
Unveiling Acoustic Embedding Space: Decomposing Word Embeddings into Subword Embeddings. Amit Meghanani, Thomas Hain
Ursa: Benefits of Scaling Self-Supervised Learning for Automatic Speech Recognition. Bethan J Thomas, Benedetta Cevoli, Jamie Dougherty
Using a Large Language Model to Control Speaking Style for Expressive TTS. Atli Thor Sigurgeirsson, Simon King
VOCEX: Voice Frame-Level and Utterance-Level Attribute Extraction for Speech Synthesis. Christoph D Minixhofer, Ondrej Klejch, Peter Bell
Keynote B
Speaker:
Dr. Neasa Ní Chiaráin
Title:
Speech Technology for Irish: the ABAIR initiative
Oral Session A
An Objective Measure of Lipsync Quality with Non-Aligned Speech Input
Oscar Saz, Luca McArthur, James Parr-Burman and Jan Medvesek
A Multi-Label Speech Emotion Recognition for Cross-Cultural Communication
Tassadaq Hussain, Islam H Nassar, Zhixi Cai, Hamid Rezatofighi, Munawar Hayat and Nicholas Cummins
Identifying Voices and Events from Audio: A Forensic and Law Enforcement Perspective
Anil Alexander and Finnian Kelly
Keynote C
Speaker:
Dr. Enzo De Sena
Title:
Speech Auralisation
Poster Session B
(A-Z by the Titles)
A Diagnostic for Quantifying Dialect Bias in Balanced Corpora: An Irish Case Study. Liam Lonergan, Mengjie Qian, Neasa Ní Chiaráin, Christer Gobl, Ailbhe Ni Chasaide
A Study of Various Encoders for Language Identification Models. Jeffrey Josanne Michael, Toby Godwin, Oscar Saz, Salil Deena
Adaptable End-to-End ASR Models using Replaceable Internal LMs and Residual Softmax. Keqi Deng, Phil Woodland
An Exploration into Social Attention for Visual Context Modelling in Active Speaker Detection. Jason Clarke, Yoshi Gotoh, Stefan Goetze
An Initial Empirical Analysis of the Effect of Sampling Variability in a Forensic Voice Comparison System. Phil Weber
Analysis of the Communication Rate Gap for Users of Augmentative and Alternative Communication (AAC) Systems. Hussein S A Yusufali, Stefan Goetze, Roger Moore
Assessing Early-stage Schizophrenia based on Paralinguistic Analysis of Speech. Julianna Olah, Kelly Diederen, Maite Arribas, Thomas Spencer, Nicholas Cummins
Assisting Human Detection of Audio Deepfakes. Thomas Cutts, Jennifer Williams, Sebastian Stein
CITED: Ciphered Text Data Augmentation for Low-Resource Acoustic Model Training. Muhammad Umar Farooq, Thomas Hain
Classification of Cognitive Status using Acoustic Features Extracted from Voice Assistant Commands. Melanie Jouaiti, Ravi Vaidyanathan
CognoSpeak: a Cognitive Health Assessment Tool (CcHAT). Nathan Daniel Pevy, Heidi Christensen, Daniel Blackburn
Detecting Vocal Pathologies from Speech Using Transfer Learning. Mary L Paterson, Luisa Cutillo, James Moor
Evaluating Adversarial Networks for Unsupervised Speech Recognition. Mattias George Cross, Anton Ragni
Experiments in Self-Training an ASR System for Irish. Neimhin Robinson Gunning
Fairness in Speech Processing with an Emphasis on Medical Applications. Hend ElGhazaly, Nafise Sadat Moosavi, Heidi Christensen
Geabaire (AAC) – a Voice for Those Without. Ailbhe Ni Chasaide
Investigate Privacy Risks in Speech Depression Detection: An Experimental Study on Demographic Information Leakage. Basmah M Alsenani, Tanaya Guha, Alessandro Vinciarelli
Investigating Confounding Variables Effect in Speech Models for Depression. Stefano Gloria, Nicholas Cummins
Large Vocabulary Continuous Speech Recognition of MP3 Call Centre Data. Kris Y Hong, Dmitry Sityaev
Learnable Frontends That Do Not Learn: Quantifying Sensitivity to Filterbank Initialisation. Mark Anderson, Tomi H Kinnunen, Naomi Harte
Modelling the Growth of Vocabulary in Textual Documents. Martin J Tunnicliffe, Gordon Hunter
N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space. Rao Ma, Mark Gales, Katherine M Knill, Mengjie Qian
Prosody in Referential Communication with a Human or a Computer Partner. Iona Gessinger, Benjamin Cowan
Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces. Oli D Liu, Hao Tang, Sharon Goldwater
Social Robot Nonverbal Vocalization System with Large Language Model. Chuang Yu, Guanyu Huang, Shuang Wu
Speech Audio Corrector: Using Speech from Non-Target Speakers for One-off Correction of Mispronunciations in Grapheme-Input Text-To-Speech. Jason K Y Fong
Speech Emotion Recognition Based on Hierarchical Classification Using Different Modalities in Different Levels. Nawal Alqurashi, Yuhua Li, Kirill Sidorov, David Marshall
Spontaneous TTS with Prosody Control Using Neural HMMs. Harm Lameris, Shivam Mehta, Gustav Eje Henter, Joakim Gustafson, Eva Szekely
The Drill Without the Kill: the Irregular Verb Bot. Neasa Ní Chiaráin
The Effects of Reverberation on Paralinguistic Feature Extraction in Healthy Controls: First Steps Towards Robust Mobile Health Assessments. Judith Dineley, Ewan Carr, Faith Matcham, Johnny Downs, Richard Dobson, Thomas Quatieri, Nicholas Cummins
Topic Retrieval for System Development in the Wild. Mengjie Qian, Erfan Loweimi, Mark Gales
Using Artificial Intelligence to Assist in the Understanding of Speech that Has Been Affected By Neurological Damage of People Living with Parkinson’s Disease. Paul Gadd
Vocal Changes and Language Use Associated with Frequent Cannabis Use. Julianna Olah, Kelly Diederen, Thomas Spencer, Nicholas Cummins
Poster Session C
(A-Z by the Titles)
A Study on Microphone Array Position Calibration for Hearing Aids. Shengchang Cao, Stefan Goetze, Jon Barker
Adapting an Unadaptable ASR System. Mengjie Qian, Rao Ma, Mark Gales, Katherine M Knill
Adapting Pretrained Models for Adult To Child Voice Conversion. Protima Nomo Sudro, Anton Ragni, Thomas Hain
Adversarial Learning of Neural User Simulators for Dialogue Policy Optimization. Simon Keizer, Caroline Dockes, Norbert Braunschweiler, Svetlana Stoyanchev, Rama S Doddipatla
Analysis of Speech Datasets for Communication Scenarios for Hearing Aid Users. Robert Sutherland, Stefan Goetze, Jon Barker
Assessment of L2 Oral Proficiency Using Self-Supervised Speech Representation Learning. Stefano Bannò, Katherine M Knill, Marco Matassoni, Vyas Raina, Mark Gales
Automatic Assessment of Conversational Speaking Tests. Simon W McKnight, Arda Civelekoglu, Mark Gales, Katherine M Knill
Corpus Collection Considerations in the Minority Language Context. Neasa Ní Chiaráin
Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra. Zhengjun Yue, Erfan Loweimi, Zoran Cvetkovic
Efficient Intelligibility Evaluation Using Keyword Spotting: A Study on Audio-Visual Speech Enhancement. Andrea L Aldana, Cassia Valentini, Ondrej Klejch, Peter Bell
Ensemble Prosody Prediction for Expressive Speech Synthesis. Zack Hodari, Vivian Hu
Exploring the Use of Self-Supervised Learning for Spoken Language Identification. Sam McNulty, Salil Deena
Humans to Machines: Representing Exemplars. Rhiannon Mogridge, Anton Ragni
Incremental Training Changes to Improve Synthesis Quality. Alexandra Torresquintero, Tomás Gómez Ibarrondo, Christopher G. R. Wallis, Vivian Hu, James Leoni, Devang Savita Ram Mohan, Zack Hodari
Inner Speech Decoding with Bimodal fMRI-EEG. Scott DL Wellington
Investigating Sequence-Level Normalization For CTC-Like End-to-End ASR. Zeyu Zhao, Peter Bell
Language Proficiency Influences Intonational Convergence in L2 English Speech Imitation. Zheng Yuan, Alessandro D’Ausilio
Learning Dependencies of Discrete Speech Representations with Neural Hidden Markov Models. Sung-Lin Yeh, Hao Tang
Leveraging Cross-Utterance Context for ASR Decoding. Robert J Flynn, Anton Ragni
Matching Acoustic and Perceptual Measures of Phonation Assessment in Disordered Speech - A Case Study. Melanie Jouaiti, Pippa Kirby, Ravi Vaidyanathan
Multimodal Dyadic Impression Recognition via Listener Adaptive Cross-Domain Fusion. Yuanchao Li, Peter Bell, Catherine Lai
On Data Sampling Strategies for Training Neural Network Speech Separation Models. William Ravenscroft, Stefan Goetze, Thomas Hain
On the (In)Efficiency of Acoustic Feature Extractors for Self-Supervised Speech Representation Learning. Titouan Parcollet, Shucong Zhang, Rogier C. van Dalen, Alberto Gil C. P. Ramos, Sourav Bhattacharya
Query Based Acoustic Summarization for Podcasts. Samantha Kotey, Rozenn Dahyot, Naomi Harte
Real-Time Personalised Speech Enhancement Transformers with Dynamic Cross-attended Speaker Representations. Shucong Zhang, Malcolm Chadwick, Alberto Gil C. P. Ramos, Titouan Parcollet, Rogier C. van Dalen, Sourav Bhattacharya
Residual Energy-Based Models for Speech Synthesis. Wanli Sun, Zehai Tu, Anton Ragni
The ABAIR Suite of Irish Speech Technology and Applications: an Overview. Ailbhe Ni Chasaide
The Importance of Phonemization Accuracy for TTS Acoustic Modeling. Zack Hodari, Tomás Ibarrondo
Towards Articulatory Control of Speech Synthesis based on Optimal Control Theory. Zihang Peng
Understanding the Behavior of Automatic Speaker Recognition Systems for Application in Forensic Casework. Poppy Welch, Vincent Hughes, Jessica Wormald, Chenzi Xu, Paul Foulkes, Philip Harrison, Finnian Kelly, David van der Vloed
What Do the Measures of Utterance Fluency Employed in Automatic Speech Evaluation (ASE) Tell Us About Second Language Oral Proficiency? Zoe Handley
Why Say Anything? Roger Moore
Oral Session B
Exploring Agreement between Language Identity and Matrix Language in Code-Switched Speech
Olga Iakovenko and Thomas Hain
Efficient Control of Prosody Using Sparse Human Input
Dan Andrei Iliescu, Devang Savita Ram Mohan, Tian Huey Teh and Zack Hodari
Speech Technology in Manufacturing
Lindsay Lee