
SPEECH PROCESSING FOR INDIAN LANGUAGES

In an era of up-to-the-minute technology, where information is exchanged at the click of a button, we still rely on typing our text input. A complete Speech-To-Speech system can be envisaged only once a complete Machine Translation system has been designed. A speech processing system provides a speech interface between the user and the computer. At our Resource Centre, the system focuses on Hindi, Oriya and Bangla. Broadly, the system is divided into the following sections:

I. Speech Synthesis, i.e., Text-To-Speech Conversion

II. Speech Recognition, i.e., Speech-To-Text Conversion

III. Speaker Identification and Accent Analysis

IV. Speech Corpora Development

Our objective is to design systems and algorithms that work efficiently, produce natural-sounding speech and use as little memory as possible. Our achievements include the development of TTS systems for Oriya, Hindi and Bangla, an Oriya speech recognition system, and the integration of Optical Character Recognition (OCR) with the indigenously developed Text To Speech system.

As the name signifies, the TTS system provides an interface through which a user enters a text or document, which our software then reads aloud as naturally as a human would. The basic approach is first to analyse the document (language, font, etc.), then to extract the words from the text and parse each word into its vowels and consonants. The “.wav” files corresponding to these vowels and consonants, previously stored in the database, are then concatenated and played. As the Oriya language is character based, we have designed character-based concatenation for the synthesis of Oriya speech. For Hindi and Bangla, synthesis is done by syllable-based concatenation. We are working on the accent aspects of Hindi and Bangla speech to make the concatenation error-free. The rules of Paninian philology are very effective for incorporating prosody and intonation into the output.

Speech processing is a pattern recognition problem. The recognition of speech is defined here as an activity whereby a speech sample is attributed to a person on the basis of its phonetic-acoustic or perceptual properties. In our approach we study how different speakers utter words. Word boundaries are detected in a continuous sentence, and the utterances of individual consonants and vowels are marked so that their behaviour for a particular speaker can be studied. Based on this, we have designed a reader system that works in command mode, enabling partially blind people to operate computers.
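
The word-boundary step described above can be sketched as energy-based endpoint detection. This is a minimal illustration, not the method used in our system: the function name `detect_words`, the 20 ms frame length, the -30 dB threshold and the minimum pause and word durations are all assumptions chosen for the example.

```python
import numpy as np


def detect_words(x, rate, frame_ms=20, threshold_db=-30.0,
                 min_gap_ms=100, min_word_ms=60):
    """Return (start, end) sample indices of word-like segments in x.

    Illustrative defaults, not tuned values: a frame counts as speech if its
    short-time energy is within |threshold_db| dB of the loudest frame.
    """
    x = np.asarray(x, dtype=float)
    flen = int(rate * frame_ms / 1000)           # samples per frame
    n = len(x) // flen
    if n == 0:
        return []
    # Short-time energy of non-overlapping frames, in dB relative to the loudest.
    energy = np.sum(x[: n * flen].reshape(n, flen) ** 2, axis=1)
    level_db = 10 * np.log10(energy / (energy.max() + 1e-12) + 1e-12)
    active = level_db > threshold_db

    min_gap = max(1, min_gap_ms // frame_ms)     # pause (frames) that separates words
    min_word = max(1, min_word_ms // frame_ms)   # shortest segment kept as a word
    segments = []                                # [start_frame, end_frame) pairs
    for i in map(int, np.flatnonzero(active)):
        if segments and i - segments[-1][1] < min_gap:
            segments[-1][1] = i + 1              # bridge a short pause inside a word
        else:
            segments.append([i, i + 1])          # start a new word
    return [(s * flen, e * flen) for s, e in segments if e - s >= min_word]


if __name__ == "__main__":
    rate = 8000
    t = np.arange(2400) / rate
    word1 = 0.5 * np.sin(2 * np.pi * 200 * t)          # 0.3 s stand-in "word"
    word2 = 0.3 * np.sin(2 * np.pi * 200 * t[:1600])   # 0.2 s stand-in "word"
    signal = np.concatenate([np.zeros(1600), word1, np.zeros(2400),
                             word2, np.zeros(1600)])
    print(detect_words(signal, rate))  # [(1600, 4000), (6400, 8000)]
```

Bridging pauses shorter than `min_gap_ms` keeps the silent closure of a stop consonant from splitting one word into two. In fluent continuous speech, words are often not separated by pauses at all, so an energy detector like this is only a first approximation.
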
Using the OCR system developed in our laboratory together with the TTS software, a blind person can operate the computer by voice commands and listen to any document. A voice-based telephone directory system is under development.

We have obtained eighteen frequency-domain parameters and four time-domain parameters for recognizing both the speaker and individual phonemes. These parameters are also used in training to obtain synthetic speech.

Our laboratory is working on the technological development of Indian languages, including Optical Character Recognition, Speech Processing and Natural Language Processing. By combining all these technologies, Vision 2020, the dream of our Honourable President Dr. A. P. J. Abdul Kalam, can become a reality, with no language barrier between the Indian provinces.
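
The page does not say which eighteen frequency-domain and four time-domain parameters are used for speaker and phoneme recognition. Purely to illustrate what such a per-frame feature vector can look like, the sketch below computes four common time-domain measures (short-time energy, zero-crossing rate, average magnitude, peak amplitude) and eighteen log band energies from an FFT. The choice of features, the frame sizes and the function names are assumptions, not the laboratory's actual parameter set.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])


def time_features(frame):
    """Four illustrative time-domain parameters for one frame."""
    energy = np.sum(frame ** 2)                                      # short-time energy
    zcr = np.mean(np.signbit(frame[1:]) != np.signbit(frame[:-1]))  # zero-crossing rate
    mean_abs = np.mean(np.abs(frame))                                # average magnitude
    peak = np.max(np.abs(frame))                                     # peak amplitude
    return [energy, zcr, mean_abs, peak]


def band_features(frame, n_bands=18):
    """Eighteen illustrative frequency-domain parameters: log band energies."""
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    return [np.log(np.sum(band) + 1e-10) for band in np.array_split(power, n_bands)]


def extract_features(x, frame_len=256, hop=128):
    """Return an (n_frames, 22) matrix: 4 time-domain + 18 frequency-domain values."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    return np.array([time_features(f) + band_features(f) for f in frames])


if __name__ == "__main__":
    rate = 8000
    x = np.sin(2 * np.pi * 440 * np.arange(rate) / rate)  # 1 s, 440 Hz test tone
    print(extract_features(x).shape)  # (61, 22)
```

Each row of the result is one frame's parameter vector; a recognizer would compare sequences of such rows against stored speaker or phoneme models.
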

Our Products:
TEXT-TO-SPEECH FOR ORIYA LANGUAGE:

We have designed a TTS system that can read any Oriya document aloud without ambiguity. The speech synthesizer for Oriya is built with the character-based concatenation technique. In this method words are parsed into characters, and each character is separated into a pure consonant and a vowel. For example, we get ‘ka’ from the word ‘kaTaka’ (କଟକ), and this is further separated into the pure consonant ‘k’ (କ୍) and the vowel ‘a’ (ଅ). Likewise, all the pure consonants and vowels are stored in ‘.wav’ format. As vowels dominate the utterance, each vowel is stored in several durations, according to how it occurs in the word.
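
The decomposition of a word into characters, and of each character into a pure consonant and a vowel, can be sketched on the romanized notation used above (‘kaTaka’). The symbol inventories, the position tags on vowel units and all function names are hypothetical; a real system would work on Oriya script (Unicode or a legacy font encoding) with a complete character inventory.

```python
# Hypothetical inventories in the page's romanized notation (e.g. 'kaTaka'),
# where a capital letter marks a retroflex consonant such as 'T'.
# Longer symbols come first so that 'kh' is matched before 'k'.
VOWELS = sorted(["a", "aa", "i", "ii", "u", "uu", "e", "ai", "o", "au"],
                key=len, reverse=True)
CONSONANTS = sorted(["k", "kh", "g", "gh", "c", "ch", "j", "jh", "T", "Th",
                     "D", "Dh", "N", "t", "th", "d", "dh", "n", "p", "ph",
                     "b", "bh", "m", "y", "r", "l", "v", "s", "h"],
                    key=len, reverse=True)


def match(word, pos, inventory):
    """Return the longest inventory symbol starting at word[pos], or None."""
    for symbol in inventory:
        if word.startswith(symbol, pos):
            return symbol
    return None


def to_characters(word):
    """Parse a romanized word into characters as (consonant, vowel) pairs.

    A pure consonant (no following vowel) gets vowel=None; an independent
    vowel gets consonant=None.
    """
    chars, pos = [], 0
    while pos < len(word):
        cons = match(word, pos, CONSONANTS)
        pos += len(cons) if cons else 0
        vowel = match(word, pos, VOWELS)
        pos += len(vowel) if vowel else 0
        if cons is None and vowel is None:
            raise ValueError(f"unknown symbol at position {pos} in {word!r}")
        chars.append((cons, vowel))
    return chars


def to_units(word):
    """Map a word to the names of stored pure-consonant and vowel units.

    Vowel units carry a hypothetical position tag, because vowels are stored
    with different durations depending on where they occur in the word.
    """
    chars = to_characters(word)
    units = []
    for i, (cons, vowel) in enumerate(chars):
        if cons:
            units.append(cons)
        if vowel:
            tag = "final" if i == len(chars) - 1 else "nonfinal"
            units.append(f"{vowel}_{tag}")
    return units


if __name__ == "__main__":
    print(to_characters("kaTaka"))  # [('k', 'a'), ('T', 'a'), ('k', 'a')]
    print(to_units("kaTaka"))  # ['k', 'a_nonfinal', 'T', 'a_nonfinal', 'k', 'a_final']
```

The unit names returned by `to_units` would then be looked up in the database of stored ‘.wav’ recordings and concatenated in order.
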

The synthesis of sound is obtained using Artificial Intelligence techniques. Further, the transitions between pairs of characters are stored, with the help of Paninian philology, to give the output a natural shape. These rules help us assign different pitch levels to incorporate prosody into the output. We have also studied the classification of silence and unvoiced regions so that these can be placed correctly.
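
One common way to separate silence, unvoiced and voiced regions is to combine short-time energy with the zero-crossing rate: silence has little energy, unvoiced sounds such as fricatives are noise-like and cross zero often, and voiced sounds are periodic with fewer crossings. The sketch below assumes that approach; the function name, thresholds and frame sizes are illustrative guesses that would need tuning on real recordings, and this is not necessarily the classification used in our system.

```python
import numpy as np


def classify_frames(x, frame_len=256, hop=256, silence_db=-30.0, zcr_thresh=0.3):
    """Label each frame of x as 'silence', 'unvoiced' or 'voiced'.

    Illustrative thresholds: a frame more than |silence_db| dB below the
    loudest frame is silence; otherwise a high zero-crossing rate
    (noise-like) marks it unvoiced, and a low one marks it voiced.
    """
    x = np.asarray(x, dtype=float)
    n = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    level_db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    zcr = np.mean(np.signbit(frames[:, 1:]) != np.signbit(frames[:, :-1]), axis=1)
    labels = np.where(level_db < silence_db, "silence",
                      np.where(zcr > zcr_thresh, "unvoiced", "voiced"))
    return labels.tolist()


if __name__ == "__main__":
    rate = 8000
    t = np.arange(2048) / rate
    signal = np.concatenate([
        np.zeros(2048),                                     # silence
        0.5 * np.sin(2 * np.pi * 150 * t),                  # periodic, voiced-like
        np.random.default_rng(0).uniform(-0.2, 0.2, 2048),  # noise, unvoiced-like
    ])
    # 8 frames of 'silence', then 8 of 'voiced', then 8 of 'unvoiced'
    print(classify_frames(signal))
```
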

Intonation modelling is also an important aspect of the synthesis system. The accuracy of the Oriya synthesis system is more than 75%, as tested by STQC, Bangalore. We received IPR for this product from the Ministry of HRD, Govt. of India, in 2002. In addition to Oriya, we are developing a TTS system for Hindi using the syllable-based concatenation technique, in which we apply the ENSOLA algorithm to make the system robust.
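
The details of ENSOLA are not given on this page, so the sketch below shows only the basic idea of smoothing the joins between concatenated units: a plain linear cross-fade over a fixed number of samples. This is a simpler, swapped-in technique, not ENSOLA; the function name and the overlap length are assumptions.

```python
import numpy as np


def crossfade_concat(units, overlap):
    """Concatenate unit waveforms, linearly cross-fading `overlap` samples at each join.

    A plain cross-fade for illustration only; it is not the ENSOLA algorithm.
    """
    if not units:
        raise ValueError("no units to join")
    out = np.asarray(units[0], dtype=float)
    if overlap == 0:
        return np.concatenate([out] + [np.asarray(u, dtype=float) for u in units[1:]])
    fade_in = np.linspace(0.0, 1.0, overlap)  # weight given to the incoming unit
    for unit in units[1:]:
        unit = np.asarray(unit, dtype=float)
        if overlap > min(len(out), len(unit)):
            raise ValueError("overlap longer than a unit")
        # Fade the tail of the output out while fading the head of the next unit in.
        blended = out[-overlap:] * (1.0 - fade_in) + unit[:overlap] * fade_in
        out = np.concatenate([out[:-overlap], blended, unit[overlap:]])
    return out


if __name__ == "__main__":
    k = np.ones(400)   # stand-ins for stored unit recordings
    a = np.ones(1600)
    joined = crossfade_concat([k, a, k, a], overlap=80)
    print(len(joined))  # 3760, i.e. 4000 samples minus 3 joins of 80
```

Each join shortens the output by `overlap` samples. A cross-fade hides abrupt waveform discontinuities at unit boundaries but does not align pitch periods or match energy across units.
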


ORI-STT (Oriya Speech-To-Text System)

The STT system designed by us can recognise more than two hundred words, since it is trained on phones, diphones and triphones. We are currently designing a telephone directory system based on the Oriya character recognition system.
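
Training on phones, diphones and triphones requires knowing which context units occur in the training transcriptions and how often. A minimal sketch, assuming each training utterance is already transcribed as a list of phone symbols (here in the romanized notation used earlier) and using a hypothetical ‘sil’ boundary marker; the function names are illustrative.

```python
from collections import Counter


def ngram_units(phones, n):
    """Return the length-n context units in a phone sequence, e.g. 'k-a'."""
    return ["-".join(phones[i : i + n]) for i in range(len(phones) - n + 1)]


def count_training_units(utterances):
    """Count phone, diphone and triphone occurrences over phone transcriptions.

    Each utterance is padded with a hypothetical 'sil' marker so that units
    at the utterance edges (silence-to-phone transitions) are counted too.
    """
    counts = {1: Counter(), 2: Counter(), 3: Counter()}
    for phones in utterances:
        padded = ["sil"] + list(phones) + ["sil"]
        for n, counter in counts.items():
            counter.update(ngram_units(padded, n))
    return counts


if __name__ == "__main__":
    counts = count_training_units([["k", "a", "T", "a", "k", "a"]])
    print(counts[2].most_common(1))  # [('k-a', 2)]
```

Rare diphones and triphones have few training examples; counts like these help decide which context-dependent units can be trained reliably and which should fall back to simpler units.
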

 

Phoneme classification has already been completed. We are also incorporating Hidden Markov Model (HMM) tools to build a speaker-independent recognition system. We have applied to the MHRD, Govt. of India, for IPR for this product.
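
For isolated-word recognition with HMMs, one model is typically trained per word, and an utterance is assigned to the word whose model gives it the highest likelihood, computed with the forward algorithm. The sketch assumes discrete observations (for example, feature vectors quantized to codebook indices) and already-trained parameters; training (Baum-Welch), continuous-density emissions and the function and model names are assumptions outside what this page describes.

```python
import numpy as np


def log_forward(log_pi, log_A, log_B, obs):
    """Log-likelihood of a discrete observation sequence under one HMM (forward algorithm).

    log_pi: (N,) initial state log-probabilities
    log_A:  (N, N) transition log-probabilities, log_A[i, j] = log P(state j | state i)
    log_B:  (N, M) emission log-probabilities over M codebook symbols
    obs:    sequence of symbol indices
    """
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # alpha_j(t) = logsumexp_i(alpha_i(t-1) + log A[i, j]) + log B[j, o_t]
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return float(np.logaddexp.reduce(alpha))


def recognize(word_models, obs):
    """Return the word whose HMM assigns obs the highest log-likelihood."""
    scores = {word: log_forward(*params, obs) for word, params in word_models.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Two toy 2-state left-to-right word models over a 2-symbol codebook.
    log_pi = np.log([0.99, 0.01])
    log_A = np.log([[0.7, 0.3], [0.01, 0.99]])
    models = {
        "word_A": (log_pi, log_A, np.log([[0.9, 0.1], [0.8, 0.2]])),
        "word_B": (log_pi, log_A, np.log([[0.1, 0.9], [0.2, 0.8]])),
    }
    print(recognize(models, [0, 0, 1, 0]))  # word_A
```

Working in the log domain with `np.logaddexp` avoids the numerical underflow that multiplying many small probabilities would cause on long utterances.
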

 
Speech Tech Group (STG)
RC-ILTS-ORIYA
P.G. Dept. of Comp. Science
Utkal University
Bhubaneswar, Orissa
India

 

 

 

Copyright © 2003 Department of Computer Science & Application, Utkal University, Vanivihar. All rights reserved.