
Deep Boltzmann machines for i-Vector based audio-visual person identification

Alam, M., Bennamoun, M., Togneri, R. and Sohel, F. (2015) Deep Boltzmann machines for i-Vector based audio-visual person identification. Lecture Notes in Computer Science, 9431, pp. 631-641.


We propose an approach that uses DBM-DNNs for i-vector based audio-visual person identification. Two Deep Boltzmann Machines, DBMspeech and DBMface, are trained without supervision on unlabeled audio and visual data from a set of background subjects. The DBMs are then used to initialize two corresponding DNNs for classification, referred to in this paper as DBM-DNNspeech and DBM-DNNface. The DBM-DNNs are discriminatively fine-tuned using back-propagation on a set of training data and evaluated on a set of test data from the target subjects. We compared their performance with that of cosine distance (cosDist) scoring and the state-of-the-art DBN-DNN classifier. We also tested three different configurations of the DBM-DNNs. We show that DBM-DNNs with two hidden layers and 800 units in each hidden layer achieved the best identification performance with 400-dimensional i-vectors as input. Our experiments were carried out on the challenging MOBIO dataset.
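
The cosine distance (cosDist) baseline named in the abstract can be illustrated with a minimal sketch. The abstract does not describe the paper's exact scoring procedure, so the following assumes the simplest form: each target subject is represented by a single enrolled i-vector, and a test i-vector is assigned to the subject with the highest cosine similarity. The names `cosine_similarity`, `identify`, and the toy three-dimensional vectors are illustrative assumptions, not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def identify(test_ivector, enrolled):
    """Return the enrolled subject whose i-vector is closest in angle.

    `enrolled` maps a subject label to one i-vector (an assumption;
    the abstract does not say how target models are built).
    """
    return max(enrolled, key=lambda s: cosine_similarity(test_ivector, enrolled[s]))


# Toy 3-dimensional vectors stand in for 400-dimensional i-vectors.
enrolled = {
    "subject_a": [1.0, 0.0, 0.0],
    "subject_b": [0.0, 1.0, 0.0],
}
best = identify([0.9, 0.1, 0.0], enrolled)
```

Cosine scoring of this kind needs no training on the target subjects, which is why it serves as a natural baseline against the discriminatively fine-tuned DBM-DNN classifiers.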

Publication Type: Journal Article
Murdoch Affiliation: School of Engineering and Information Technology
Publisher: Springer Verlag
Copyright: 2016 Springer International Publishing Switzerland
Other Information: Book Title: Image and Video Technology: 7th Pacific Rim Symposium on Image and Video Technology (PSIVT) 2015, Auckland, New Zealand, 23-27 November 2015, Revised Selected Papers