Speech Recognition

SPEECH PROCESSING FOR INDIAN LANGUAGES

In an era of up-to-the-minute technology, where information is exchanged almost instantly, we still rely on typing to enter text. A complete Speech-To-Speech system can be realised only once a complete Machine Translation system has been designed. The Speech Processing System provides a speech interface between the user and the computer. At our Resource Centre the system focuses on Hindi, Oriya and Bangla. Broadly, the system is divided into the following sections:

I-Speech Synthesis i.e., Text To Speech Conversion

II-Speech Recognition i.e., Speech To Text Conversion

III-Speaker Identification and Accent Analysis

IV-Speech Corpora development

Our objective is to design a system and algorithms that work efficiently, produce natural-sounding output and use as little memory as possible. Our achievements include the development of TTS systems for Oriya, Hindi and Bangla, an Oriya speech recognition system, and the integration of Optical Character Recognition (OCR) with our indigenously developed Text To Speech system.

As the name signifies, the Text To Speech system provides an interface through which a user enters a text or document, and the software we have developed reads it aloud as naturally as a human would. The basic approach is first to analyse the document (language, font, etc.), then to extract the words from the text and parse each word into its vowels and consonants. The “.wav” files corresponding to these vowels and consonants, stored beforehand in a database, are then concatenated and played. As the Oriya language is character based, we have designed character-based concatenation for the synthesis of Oriya speech. For Hindi and Bangla, synthesis uses syllable-based concatenation. We are working on the accent component of Hindi and Bangla speech to make the concatenation error free. The rules of Paninian philology are very effective for incorporating prosody and intonation into the output.

Speech processing is a pattern recognition problem. The recognition of speech is defined as an activity whereby a speech sample is attributed to a person on the basis of its phonetic-acoustic or perceptual properties. In our approach we study how words are spoken by different speakers. Word boundaries are detected in a continuous sentence, and the utterance of individual consonants and vowels is marked in order to study their behaviour for a particular speaker. Based on this, we have designed a reader system that works in command mode and allows partially blind people to operate computers.
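The concatenative synthesis described in this section (look up a stored “.wav” clip for each unit of a word and join the clips) can be sketched roughly as follows. This is only an illustration under stated assumptions, not our actual implementation: the names make_clip, split_into_units, synthesize and unit_db are hypothetical, the clips are tiny generated signals rather than recorded speech, and each character of a romanised word is treated as one unit.

```python
import io
import wave


def make_clip(n_frames, sample_value, rate=8000):
    """Build a tiny mono 16-bit WAV in memory, standing in for a recorded unit."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        frame = sample_value.to_bytes(2, "little", signed=True)
        w.writeframes(frame * n_frames)
    return buf.getvalue()


def split_into_units(word):
    """Hypothetical parser: each character of a romanised word is one unit."""
    return list(word)


def synthesize(word, unit_db):
    """Concatenate the stored clips for the units of `word` into one WAV (bytes)."""
    frames = []
    fmt = None
    for unit in split_into_units(word):
        with wave.open(io.BytesIO(unit_db[unit]), "rb") as clip:
            clip_fmt = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
            if fmt is None:
                fmt = clip_fmt
            elif clip_fmt != fmt:
                # Clips can only be joined directly if their formats match.
                raise ValueError(f"clip for {unit!r} has a different audio format")
            frames.append(clip.readframes(clip.getnframes()))
    if fmt is None:
        raise ValueError("no units to synthesize")

    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setnchannels(fmt[0])
        w.setsampwidth(fmt[1])
        w.setframerate(fmt[2])
        w.writeframes(b"".join(frames))
    return out.getvalue()


# Assumed unit database: unit symbol -> stored WAV bytes.
unit_db = {
    "k": make_clip(80, 1000),
    "a": make_clip(160, -2000),
    "m": make_clip(120, 500),
}
wav_bytes = synthesize("kam", unit_db)
```

Plain concatenation like this gives audible joins between units, which is why accent, prosody and intonation have to be handled separately, as noted above.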
Using the OCR system developed in our laboratory, a blind person can operate the computer by giving voice commands and listen to any document through the TTS software. A voice-enabled telephone directory system is under development.

We have obtained eighteen frequency domain parameters and four time domain parameters for recognising both the speaker and the particular phoneme. These parameters are also used in training to obtain synthetic speech.

In our laboratory we are working on the technological development of Indian languages, which includes Optical Character Recognition, Speech Processing and Natural Language Processing. By bringing all these technologies together, Vision 2020, the dream of our Honourable President Dr. A. P. J. Abdul Kalam, can be turned into reality, when there will be no language barrier between the Indian provinces.
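To illustrate what time domain parameters and word-boundary detection can look like in practice, here is a minimal sketch. The text above does not name the four time domain parameters we use, so short-time energy and zero-crossing rate are assumed here purely as two common examples, and simple energy thresholding is assumed as one basic way to locate word boundaries. All function names, the frame length and the threshold are hypothetical.

```python
def frame_signal(samples, frame_len):
    """Split a list of samples into consecutive non-overlapping frames."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (frame needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def find_word_boundaries(samples, frame_len, energy_threshold):
    """Return (start_sample, end_sample) pairs for runs of frames above the threshold."""
    segments = []
    start = None
    frames = frame_signal(samples, frame_len)
    for idx, frame in enumerate(frames):
        active = short_time_energy(frame) > energy_threshold
        if active and start is None:
            start = idx * frame_len          # a word begins at this frame
        elif not active and start is not None:
            segments.append((start, idx * frame_len))  # the word ended before this frame
            start = None
    if start is not None:
        # The signal ended while a word was still active.
        segments.append((start, len(frames) * frame_len))
    return segments


# Synthetic "utterance": silence, a loud burst, silence, a quieter burst, silence.
signal = [0] * 100 + [1000, -1000] * 50 + [0] * 100 + [500, -500] * 25 + [0] * 50
boundaries = find_word_boundaries(signal, frame_len=50, energy_threshold=1000)
```

In real speech a fixed threshold is fragile in the presence of noise, so per-frame measures such as these are usually combined with frequency domain parameters before any speaker or phoneme decision is made.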