Multiaccent EMG-to-Speech Optimized Transduction with PerFL and MAML Adaptations

Shan Ullah, Deok Hwan Kim

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Silent speech voicing enables individuals with speech impairments to communicate solely through facial muscle movements, bypassing the need for vocalization. Typically, electromyography (EMG) is used in conjunction with voice signals from individuals with normal speech for training purposes. Existing studies target a single accent using a single acquisition device, ignoring the multiple accents of diverse ethnic backgrounds, which can hinder the development of generalized and adaptive solutions. To address this, we propose a comprehensive approach consisting of the following: 1) a multiaccent EMG-to-speech silent voicing dataset; 2) an optimized transduction model (EMG to speech features); 3) a model-agnostic meta-learning (MAML) approach to adapt across cross-accented data; and 4) a personalized federated learning (PerFL) solution that uses MAML initialization to improve global model convergence. Our novel transduction model incorporates three key elements: 1) convolution layers with a Squeeze-and-Excitation network that enhances channel-wise interdependencies (feature recalibration); 2) a gating multilayer perceptron that improves global context awareness via linear projections along the channel dimension; and 3) transformers that learn temporal features across the EMG time series. We validated the algorithm on publicly available and proprietary (from our research laboratory) datasets. To simulate real-world conditions, the proprietary dataset was collected with three different biosignal devices, yielding heterogeneous data comprising 1370 utterances from eight subjects with three distinct accents. Our proposed transduction model outperformed traditional methods, improving the word error rate (WER) by 1.3%-3.5% on the public dataset. Moreover, we studied two MAML variants and their impact on PerFL initialization.
Detailed results, covering performance metrics such as confusability, accuracy, character error rate (CER), and WER, are presented for both the public and proprietary datasets.
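The channel-wise feature recalibration mentioned in the abstract (a Squeeze-and-Excitation block applied to multi-channel EMG) can be sketched as follows. This is a minimal NumPy illustration of the general SE mechanism, not the authors' implementation; the channel count, window length, and reduction ratio are assumptions for the toy example.

```python
import numpy as np

def squeeze_excite(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation recalibration of multi-channel EMG features.

    x: array of shape (channels, time) -- one EMG window.
    w1/b1, w2/b2: weights of the two-layer bottleneck MLP.
    Returns x with each channel rescaled by a learned gate in (0, 1).
    """
    # Squeeze: global average pool over time -> one descriptor per channel.
    z = x.mean(axis=1)                          # shape (channels,)
    # Excite: bottleneck MLP (ReLU), then sigmoid gates per channel.
    h = np.maximum(0.0, w1 @ z + b1)            # shape (channels // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))    # shape (channels,)
    # Recalibrate: scale each channel by its gate.
    return x * s[:, None]

# Toy example: 8 EMG channels, 100 samples, reduction ratio r = 2.
rng = np.random.default_rng(0)
channels, r, T = 8, 2, 100
x = rng.standard_normal((channels, T))
w1 = rng.standard_normal((channels // r, channels)) * 0.1
b1 = np.zeros(channels // r)
w2 = rng.standard_normal((channels, channels // r)) * 0.1
b2 = np.zeros(channels)

y = squeeze_excite(x, w1, b1, w2, b2)
assert y.shape == x.shape
```

In the paper's pipeline these gated features would then feed the gating MLP and transformer stages; the sigmoid gates let the network emphasize informative EMG channels and suppress noisy ones per utterance.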

Original language: English
Article number: 2528317
Journal: IEEE Transactions on Instrumentation and Measurement
Volume: 73
DOIs
State: Published - 2024

Bibliographical note

Publisher Copyright:
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.

Keywords

  • Electromyography (EMG)
  • EMG-to-speech
  • meta-learning
  • personalized federated learning (PerFL)
  • silent speech interface
  • transformers
  • voice synthesis
