Dr. Kate Knill

University of Cambridge, UK

Foundation Models in Spoken Language Processing: Time to go home or make hay?

Observations from automated language learning and assessment

For many years the engineering approach to spoken language processing (SLP) was to train task-specific models, primarily using supervised training on labelled data. There has been a paradigm shift in recent years with the release of models with billions of parameters, pre-trained (mostly by self- or semi-supervised learning) on vast quantities of data. These 'foundation models' can be adapted to serve as the basis of solutions to many different downstream tasks. They can operate in a variety of modes: from zero-shot learning, where no task data is required, through few-shot learning, which builds on a limited number of examples, to more standard fine-tuning. In each case the resulting performance is highly impressive. For (academic) researchers it can sometimes be hard to see how we can contribute without access to the training materials, compute and human resources of the groups producing these foundation models. Alternatively, we can view this as an opportunity to solve many previously challenging, often data-limited, tasks. This talk will use examples from automated language learning and assessment to show why now is the time for us to make hay while the foundation model sun shines, rather than pack up and go home.

Dr. Neasa Chiaráin

School of Linguistic, Speech and Communication Sciences
Trinity College Dublin, Ireland

Speech Technology for Irish: the ABAIR initiative

This talk gives a short overview of the development of speech technologies for the Irish language by the ABAIR initiative at Trinity College Dublin. It discusses the importance of speech technology for the documentation, maintenance and promotion of minority and endangered languages such as Irish, along with the many considerations that need to be borne in mind, which are quite different from those that would drive technology development in a language such as English. Apart from the lack of generally suitable, readily available corpora, a major challenge is posed by the diversity of dialects and the absence of a spoken standard. In our experience, the ABAIR enterprise has evolved as a partnership with the language community: the community not only provides the required speech corpora but also plays an essential role in setting development priorities, both for the core technologies of synthesis and recognition and for the applications, of which its members are the end users. The range of ABAIR technology and applications is outlined, and one application in particular, An Scéalaí, an Irish-language educational platform, is presented in some detail. In this UNESCO International Decade of Indigenous Languages (2022–2032), the ABAIR research group hopes to make common cause with other groups who are involved, or would like to be involved, in building speech technology resources for their languages.

Dr. Enzo De Sena

Institute of Sound Recording
Department of Music and Media
University of Surrey, UK

Speech Auralisation

When we perceive someone’s voice, the sound produced by their mouth reaches our ears after travelling through the air, reflecting off the surfaces of the room, and being diffracted by our heads. If we want to replicate this experience over headphones, it becomes necessary to accurately model all of these physical phenomena. This process is called “auralisation”, the audio equivalent of “visualisation”. Auralisation finds application in a variety of fields, including soundscape design, virtual and augmented reality, gaming, product design, architectural acoustics, and speech and hearing research. This keynote will provide an overview of the main components of an auralisation system, including the modelling of head diffraction and room acoustics, and will discuss key challenges and open questions in the field.