Talking human face generation: A survey

Mukhiddin Toshpulatov, Wookey Lee, Suan Lee

Research output: Contribution to journal, Review article, peer-review

25 Scopus citations

Abstract

Talking human face generation aims to synthesize a natural human face that talks in correspondence with a given text or audio sequence. Recently developed Deep Learning (DL) methods for data generation, such as Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), and Neural Radiance Fields (NeRF), have attracted significant research interest from academia and industry, as has their application to talking human face generation. These methods have been explored and exploited recently to address several problems in image processing and computer vision. Notwithstanding notable advancements, applying them to real-world problems such as talking human face generation remains challenging. Deepfakes generated by the abovementioned methods could greatly promote many fascinating applications, including augmented reality, virtual reality, computer games, teleconferencing, virtual try-on, special movie effects, and avatars. This survey reviews and discusses DL-related methods, including CNNs, GANs, and NeRF, and their application to talking human face generation. We analyze existing approaches with respect to their use in talking face generation, investigate the related general problems, and highlight open research issues. We also provide quantitative and qualitative evaluations of existing approaches in the field.

Original language: English
Article number: 119678
Journal: Expert Systems with Applications
Volume: 219
State: Published - 1 Jun 2023

Bibliographical note

Publisher Copyright:
© 2023

Keywords

  • 3D face generation
  • Autoencoder
  • Datasets
  • Deep generative model
  • Evaluation metrics
  • Mel spectrogram
  • Neural networks
  • Neural radiance field
  • Talking human face animation
  • Unsupervised learning
