Towards anti-spoofing biometrics: 3D talking face


Biometric technology recognizes a person on the basis of the unique features of their face, fingerprint, signature, DNA or iris pattern, thereby providing a secure and convenient method of authentication. The quest for privacy has driven a revolution in security technologies and systems that can be deployed in many settings, including buildings, cash machines, personal devices and software. Of the many unique human biological representations that can be adopted for security purposes, the human face is one of the most discriminative. Additionally, face recognition is non-contact and thus convenient for wide deployment in access control and information security. Face biometrics have achieved remarkable performance over the past decades, but spoofing of static faces poses a threat to information security. As such, there has been an upsurge in demand for stable and discriminative biological modalities that are hard to mimic or deceive. Speech-driven 3D facial motion is one such distinctive and measurable behavioral signature that is promising for biometrics.

Therefore, the development of a spoofing-proof system that also overcomes the aforementioned shortfalls would be highly desirable. In this view, researchers from the School of Informatics, University of Edinburgh, UK: Dr. Jie Zhang (currently at Beijing Technology and Business University) and Professor Robert Fisher proposed a novel 3D behaviometrics framework based on a "3D visual passcode" derived from speech-driven 3D facial dynamics. In other words, they aspired to achieve person recognition with a 3D, individual-specific bio-modality that is less likely to be deceived or mimicked and remains robust against head pose variations. Their work is published in the research journal Signal Processing.

Their research began with a 29-dimensional joint representation of 3D facial dynamics, combining 3D-keypoint-based measurements and 3D shape patch features extracted from both static and speech-driven dynamic facial regions. An ensemble of subject-specific classifiers was then trained over selected discriminative features, yielding a discriminant representation of speech-driven 3D facial dynamics. Additionally, the two researchers constructed the first publicly available Speech-driven 3D Facial Motion dataset (S3DFM), comprising 1,030 2D-3D face videos with synchronized audio from 77 international participants.
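The ensemble-of-subject-specific-classifiers idea can be illustrated with a minimal sketch: each subject gets its own one-vs-rest scorer over the 29-dimensional feature vectors, and identification picks the subject whose classifier responds most strongly. Note this is an assumption-laden toy, not the authors' actual pipeline: the data below is synthetic, and a nearest-centroid scorer stands in for whatever classifiers the paper trains.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SUBJECTS, N_SAMPLES, N_FEATURES = 5, 20, 29  # 29-D joint dynamics representation

# Synthetic stand-in data: one well-separated cluster per subject in 29-D space
centers = rng.normal(size=(N_SUBJECTS, N_FEATURES)) * 3.0
X = np.vstack([c + rng.normal(scale=0.5, size=(N_SAMPLES, N_FEATURES)) for c in centers])
y = np.repeat(np.arange(N_SUBJECTS), N_SAMPLES)

class SubjectClassifier:
    """One-vs-rest scorer for a single subject (nearest-centroid stand-in)."""
    def __init__(self, subject_id):
        self.subject_id = subject_id
    def fit(self, X, y):
        # Model a subject by the mean of their training feature vectors
        self.centroid = X[y == self.subject_id].mean(axis=0)
        return self
    def score(self, X):
        # Higher score = closer to this subject's centroid
        return -np.linalg.norm(X - self.centroid, axis=1)

# The ensemble: one classifier per enrolled subject
ensemble = [SubjectClassifier(s).fit(X, y) for s in range(N_SUBJECTS)]

def identify(x):
    """Return the subject whose classifier gives the highest score."""
    scores = [clf.score(x[None, :])[0] for clf in ensemble]
    return int(np.argmax(scores))

preds = np.array([identify(x) for x in X])
print(f"identification rate: {np.mean(preds == y):.1%}")
```

In practice each per-subject classifier would be trained on the selected discriminative speech-driven dynamics features rather than raw synthetic vectors, but the max-score identification step is the same.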

The authors reported that experiments on the new dataset verified that the speech-driven dynamic face signatures are repeatable and distinctive. Specifically, the 29 statistical features achieved 100% separability, and the 77 subjects represented by the full 29 features achieved 98.7% separability. In addition, comparisons with other feature sets further demonstrated the effectiveness of adding dynamic information to the static 3D descriptors.

In summary, the study presented a 3D behaviometric pipeline based on speech-driven 3D facial dynamics as a "3D visual passcode". Experiments on the S3DFM dataset showed that the proposed pipeline achieved a face identification rate of 96.1%. In an interview with Advances in Engineering, Dr. Jie Zhang pointed out that the framework and its speaking-dynamics features generalize to any spoken passcode set by the user and are invariant to speaking speed; the uniqueness of the "3D visual passcode" therefore originates from both the subject-specific facial motion and the privacy of the passcode. Moreover, she emphasized that the proposed idea could be applied in any biometric system where a 3D video scanner is installed facing the user.


About the author

Jie Zhang is currently an Associate Professor in the School of Computer and Information Engineering at Beijing Technology and Business University, China. Before joining BTBU, she received her PhD from Beihang University, Beijing, China in June 2018, advised by Prof. Junhua Sun. During 2016-2018, she was a visiting PhD student at The University of Edinburgh, UK under the supervision of Prof. Robert B. Fisher, where she investigated 3D lip motion and its applications to behaviometrics. Her research interests lie in computer vision, pattern recognition, and 3D vision measurement. She was named an Excellent Graduate of Beijing (2014) and awarded the National Fellowship of China (2014), a CSC Fellowship (2016), and National Natural Science Foundation of China funding (2019).

About the author

Prof. Robert Fisher has been an academic in the School of Informatics (originally in the former Department of Artificial Intelligence) at the University of Edinburgh since 1984 and a full Professor since 2003. He received his PhD from the University of Edinburgh (1987), investigating computer vision in the former Department of Artificial Intelligence. His previous degrees are a BS with honors (Mathematics) from the California Institute of Technology (1974) and an MS (Computer Science) from Stanford University (1978).

He worked as a software engineer for 5 years before returning to study for his PhD. He has been researching 3D scene understanding since 1982, and has worked on model-based object recognition, range image analysis, parallel vision algorithms, and related topics. He is a Fellow of the BMVA and the IAPR.


Jie Zhang, Robert B. Fisher. 3D Visual passcode: Speech-driven 3D facial dynamics for behaviometrics. Signal Processing, vol. 160, 2019, pp. 164–177.


Jie Zhang, Korin Richmond, Robert B. Fisher. Dual-modality Talking-metrics: 3D Visual-Audio Integrated Behaviometric Cues from Speakers. 2018 International Conference on Pattern Recognition (ICPR), Beijing, August 20–24, 2018, pp. 3144–3149.
