Speech Recognition
SPEECH PROCESSING FOR INDIAN LANGUAGE
In an era of up-to-the-minute technology, where information is exchanged at the click of a button, we still rely on typing to enter text. The dream of a complete Speech-To-Speech system can be realised only when a complete Machine Translation system has been designed. A Speech Processing System is an approach to providing a speech interface between the user and the computer. At our Resource Centre the system focuses on Hindi, Oriya and Bangla. Broadly, the system is divided into the following sections.
I-Speech Synthesis i.e., Text To Speech Conversion
II-Speech Recognition i.e., Speech To Text Conversion
III-Speaker Identification and Accent Analysis
IV-Speech Corpora development
Our objective is to design a system/algorithm that works
efficiently, produces natural-sounding output and uses as little
memory as possible.
Our achievements include the development of TTS systems for Oriya,
Hindi and Bangla, an Oriya speech recognition system, and the
integration of Optical Character Recognition (OCR) with the
indigenously developed Text To Speech system.
As the name signifies, the Text To Speech system provides an
interface through which a user enters some text or a document, and
the software developed by us reads it aloud as naturally as a human
would. The basic approach is first to analyse the document
(language, font etc.), then to extract the words from the text and
parse each word into its vowels and consonants. The “.wav” files
corresponding to these vowels and consonants, previously stored in
the database, are then concatenated and played. As the Oriya
language is character based, we have designed character-based
concatenation for the synthesis of Oriya speech. For Hindi and
Bangla, the synthesis is done by the syllable-based concatenation
method. We are working on the accent part of Hindi and Bangla
speech to make the concatenation error free. The rules of Paninian
Philology are very efficient for incorporating prosody and
intonation in the output.
Speech processing is a pattern recognition problem. The
recognition of speech is defined as an activity whereby a speech
sample is attributed to a person on the basis of its
phonetic-acoustic or perceptual properties. In our approach we
study the nature of words spoken by different speakers. The word
boundaries in a continuous sentence are detected, and the way
individual consonants and vowels are uttered is marked to study
their behavior for a particular speaker. Based on this, we have
designed a reader system that works in command mode so that
partially blind people can operate computers. Using the OCR system
developed in our laboratory, a blind person can operate the
computer by giving voice commands and listen to any document
through the TTS software. A voiced telephone directory system is
under development.
We
have obtained eighteen frequency-domain parameters and four
time-domain parameters for recognising the speaker as well as the
particular phoneme. These parameters are also used in training to
obtain synthetic speech.
In
our laboratory we are working on the technological development of
Indian languages, which includes Optical Character Recognition,
Speech Processing and Natural Language Processing. By bringing
all these technologies together, Vision 2020, a dream of our
Honourable President Dr. A. P. J. Abdul Kalam, can be turned into
reality, with no language barrier remaining within the Indian
provinces.
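The concatenation approach described for the Text To Speech system can be illustrated with a minimal Python sketch. It is not our production software. The unit labels, the sampling rate, the greedy longest-match segmentation and the generated tones that stand in for stored “.wav” recordings are all assumptions made for illustration only.

```python
import io
import math
import struct
import wave

RATE = 8000  # samples per second (assumed for this sketch)

def make_unit(freq_hz, ms=120):
    """Stand-in for a stored unit recording: a short 16-bit mono tone."""
    n = RATE * ms // 1000
    return b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq_hz * i / RATE)))
        for i in range(n)
    )

# Hypothetical unit database: unit label -> raw PCM frames.
UNIT_DB = {"ka": make_unit(300), "ma": make_unit(400), "la": make_unit(500)}

def split_units(word):
    """Toy segmentation: greedy longest match against the database keys."""
    units, i = [], 0
    while i < len(word):
        for size in range(len(word) - i, 0, -1):
            if word[i:i + size] in UNIT_DB:
                units.append(word[i:i + size])
                i += size
                break
        else:
            raise KeyError(f"no stored unit for {word[i:]!r}")
    return units

def synthesize(text):
    """Concatenate the stored units of every word into one WAV byte string."""
    silence = b"\x00\x00" * (RATE // 10)  # 100 ms pause between words
    pcm = silence.join(
        b"".join(UNIT_DB[u] for u in split_units(w)) for w in text.split()
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(RATE)
        out.writeframes(pcm)
    return buf.getvalue()

wav_bytes = synthesize("kamala maka")
```

Writing `wav_bytes` to a file gives a playable recording. A character-based system such as the one for Oriya would key the database by characters, and a syllable-based system such as the one for Hindi and Bangla would key it by syllables. The concatenation step itself stays the same.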
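The detection of word boundaries in a continuous sentence, mentioned in the recognition work above, can be sketched in the same way. The text does not state which detection method is used, so the short-time energy thresholding below is an assumed stand-in. The frame length, the threshold, the minimum gap and the synthetic test signal are likewise hypothetical.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def find_words(samples, frame_len=80, threshold=0.01, min_gap=3):
    """Return (start, end) sample indices of regions with energy >= threshold.

    A region is closed only after min_gap consecutive quiet frames, so
    shorter pauses inside a word do not split it.
    """
    words, start, quiet = [], None, 0
    energies = frame_energies(samples, frame_len)
    for idx, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = idx
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_gap:
                # The last loud frame was idx - quiet; the end is exclusive.
                words.append((start * frame_len, (idx - quiet + 1) * frame_len))
                start, quiet = None, 0
    if start is not None:
        words.append((start * frame_len, (len(energies) - quiet) * frame_len))
    return words

def tone(n, freq=200, rate=8000):
    """Synthetic 'voiced' segment used in place of recorded speech."""
    return [0.5 * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

# Two tone bursts separated by silence imitate two spoken words.
signal = [0.0] * 800 + tone(1600) + [0.0] * 800 + tone(1200) + [0.0] * 400
boundaries = find_words(signal)
```

For this synthetic signal `boundaries` is `[(800, 2400), (3200, 4400)]`, which matches the positions of the two bursts. Real speech would need a threshold tuned to the recording conditions, since quiet consonants can fall below a fixed energy level.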