Deep Boltzmann machines for i-Vector based audio-visual person identification
Alam, M., Bennamoun, M., Togneri, R. and Sohel, F. (2015) Deep Boltzmann machines for i-Vector based audio-visual person identification. Lecture Notes in Computer Science, 9431. pp. 631-641.
We propose an approach that uses DBM-DNNs for i-vector based audio-visual person identification. Two Deep Boltzmann Machines, DBMspeech and DBMface, are trained without supervision on unlabeled audio and visual data from a set of background subjects. The DBMs are then used to initialize two corresponding DNNs for classification, referred to in this paper as DBM-DNNspeech and DBM-DNNface. The DBM-DNNs are discriminatively fine-tuned using back-propagation on a set of training data and evaluated on a set of test data from the target subjects. We compared their performance with the cosine distance (cosDist) classifier and the state-of-the-art DBN-DNN classifier. We also tested three different configurations of the DBM-DNNs. We show that DBM-DNNs with two hidden layers and 800 units in each hidden layer achieved the best identification performance with 400-dimensional i-vectors as input. Our experiments were carried out on the challenging MOBIO dataset.
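The abstract names cosine distance (cosDist) as a baseline but does not spell out the scoring procedure. The following is a minimal sketch of how such a baseline could identify a subject, assuming one enrolled i-vector per subject and a decision rule that picks the highest cosine similarity. The function names, subject labels, and 4-dimensional toy vectors are hypothetical stand-ins for the 400-dimensional i-vectors described above, not the authors' implementation.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(test_ivector, enrolled):
    """Return the enrolled subject whose i-vector is most similar to the probe.

    Assumption: `enrolled` maps a subject label to a single i-vector.
    """
    return max(enrolled, key=lambda subject: cosine_score(test_ivector, enrolled[subject]))


# Toy 4-dimensional vectors with made-up values (real i-vectors here would be 400-dimensional).
enrolled = {
    "subject_a": [1.0, 0.0, 0.5, 0.0],
    "subject_b": [0.0, 1.0, 0.0, 0.5],
}
probe = [0.9, 0.1, 0.4, 0.0]
best = identify(probe, enrolled)
print(best)
```

Because cosine scoring ignores vector length, it needs no training on the target subjects, which is presumably why it serves as a simple reference point against the fine-tuned DBM-DNN classifiers.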
Publication Type: Journal Article
Murdoch Affiliation: School of Engineering and Information Technology
Copyright: 2016 Springer International Publishing Switzerland
Other Information: Book Title: Image and Video Technology: 7th Pacific Rim Symposium on Image and Video Technology (PSIVT) 2015, Auckland, New Zealand, 23 - 27 November 2015, Revised Selected Papers