
Speaker Recognition Using Artificial Neural Networks Based on Vowel Phonemes

Research Authors
E.F.M.F. Badran, H. Selim
Research Member
Research Department
Research Year
2000
Research Journal
16th World Computer Congress, Beijing, China
Research Rank
3
Research Abstract

Speaker recognition systems attempt to recognize a speaker by his or her voice by measuring the individual characteristics of that voice. Among transformations of LPC parameters, the adaptive component weighted (ACW) cepstrum has been shown to be less susceptible to channel effects than others. Text-independent and text-dependent speaker recognition systems suitable for verification and identification (open set and closed set) are presented. The system is based on locating the vowel phonemes of the test utterance. Preprocessing is applied to the speech signal. The centers of the vowel phonemes are then located and identified as speech events using a three-step vowel phoneme locating process: (1) average magnitude function calculation; (2) vowel phoneme candidate location; and (3) ripple rejection. For each vowel phoneme (20 ms), 10 ACW cepstrum coefficients are calculated and used as inputs to neural networks, whose outputs are accumulated and averaged. The hardware requirements are a microphone and a sound card. The software is written in C++ for Windows. The system was tested with a population of 10 speakers (7 male and 3 female), and the following results were obtained: 95.67% for text-dependent verification, 93% for text-dependent identification, 92.2% for text-independent verification, and 88.95% for text-independent identification. These tests were done with utterances of one word containing one vowel phoneme (20 ms used for recognizing the speaker). A vowel phoneme recognition application is also presented: a limited-vocabulary word recognition system is developed using the vowel phonemes in the vocabulary. The feature vector calculation is the same as in the speaker recognition system; the only differences are in the neural network training and size. This system achieves 97.5% word recognition.
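The abstract names the three steps of the vowel phoneme locating process but gives no parameters. The following is a minimal Python sketch of one plausible reading of those steps, not the authors' C++ implementation. The function names, the 8 kHz sampling rate, the 10 ms frame length, the relative threshold, and the minimum peak spacing are all assumptions made for illustration.

```python
import math

def average_magnitude(signal, frame_len):
    """Step 1: mean absolute amplitude over non-overlapping frames."""
    n_frames = len(signal) // frame_len
    return [
        sum(abs(x) for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]

def candidate_peaks(amf, threshold):
    """Step 2: local maxima of the magnitude contour above a threshold."""
    peaks = []
    for i in range(1, len(amf) - 1):
        if amf[i] >= threshold and amf[i] > amf[i - 1] and amf[i] >= amf[i + 1]:
            peaks.append(i)
    return peaks

def reject_ripples(peaks, amf, min_gap):
    """Step 3: of candidates closer than min_gap frames, keep the strongest."""
    kept = []
    for p in peaks:
        if kept and p - kept[-1] < min_gap:
            if amf[p] > amf[kept[-1]]:
                kept[-1] = p  # stronger neighbour replaces the earlier candidate
        else:
            kept.append(p)
    return kept

def locate_vowel_centers(signal, frame_len=80, rel_threshold=0.5, min_gap=5):
    """Return estimated vowel-center positions as sample indices.

    All default values are assumed, not taken from the paper.
    """
    amf = average_magnitude(signal, frame_len)
    if not amf:
        return []
    threshold = rel_threshold * max(amf)
    kept = reject_ripples(candidate_peaks(amf, threshold), amf, min_gap)
    return [p * frame_len + frame_len // 2 for p in kept]

# Synthetic check: a 200 Hz tone at an assumed 8 kHz sampling rate with two
# energy bumps standing in for two vowels, centered at samples 1000 and 3000.
signal = [
    math.sin(2 * math.pi * n / 40)
    * (math.exp(-((n - 1000) / 200) ** 2) + 0.8 * math.exp(-((n - 3000) / 200) ** 2))
    for n in range(4000)
]
centers = locate_vowel_centers(signal)
print(centers)
```

In a system like the one described, a 20 ms window around each returned center (160 samples at the assumed 8 kHz) would then be passed on to the ACW cepstrum computation.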