[Sugar-devel] Fwd: Summer of Code Proposal: Furthering Speech Recognition in Sugar.

Sun Mar 22 18:43:28 EDT 2009

Greetings Satya,

In the early 1990s I did some tests for a speech recognition system
and I found a shortcut for reliably acquiring and classifying words and
even phonemes: asking the subject to click the mouse or press the
spacebar to mark the boundary between words. Phonemes were more
difficult, but some subjects did manage to become very proficient in
demarcating boundaries. I seem to remember reading that most of the
world's languages draw from a pool of only 50-60 phonemes. Of course,
machines were many times less powerful then than they are today, but
this little shortcut simplified realtime processing considerably.
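
A minimal sketch of the boundary-marking idea above, assuming the
recording is already available as a sequence of samples and that each
click or keypress has been logged as a time in seconds. The function
name split_at_marks and its parameters are hypothetical, chosen for
illustration only; this is not the code used in the original tests.

```python
from typing import List, Sequence


def split_at_marks(samples: Sequence[float],
                   sample_rate: int,
                   mark_times: Sequence[float]) -> List[Sequence[float]]:
    """Cut a recording into segments at user-marked boundary times (seconds)."""
    total = len(samples)
    # Convert each mark to a sample index, clamp it to the recording,
    # and drop duplicate marks.
    cuts = sorted({min(max(int(round(t * sample_rate)), 0), total)
                   for t in mark_times})
    edges = [0] + cuts + [total]
    # Keep only the non-empty segments between consecutive edges.
    return [samples[a:b] for a, b in zip(edges, edges[1:]) if b > a]


if __name__ == "__main__":
    recording = list(range(10))  # stand-in for 10 audio samples
    words = split_at_marks(recording, sample_rate=10, mark_times=[0.3, 0.7])
    print([len(w) for w in words])
```

Each returned segment could then be passed to a recognizer on its own,
so the recognizer never has to search for word boundaries itself.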

As a former audio engineer I had also explored digitally-controlled
analog audio processing, in particular equalization, as another method
of reducing processing power requirements. However, my results were
inconclusive, due, I believe, to the slow response time of the
equipment I had available. I recall finally resorting to digitizing
first to capture the speech, then converting to analog for shaping,
then converting to digital again for analysis. Modern realtime plugin
effects modules for recording platforms such as Pro Tools would
certainly do the job better and in a single step, but the best of
these are expensive, proprietary, closed-source products (example:
http://www.sonnoxplugins.com).

I did find recognition rates varied very widely with the model of
microphone used. I remember obtaining interesting results from the
proximity effect of a common studio dynamic mic, the Shure SM57; the
"colored" analog sound allowed faster transformations (this is what
led me to shaping sound in the analog domain). Recording very close in
also eliminated ambient room noise. The stumbling block I encountered
there was encouraging subjects to speak right into the mic, which some
found intrusive. I did not test with cheapo soundcard mics as
interfacing these with pro sound equipment for testing was a headache.

It's a fascinating subject.

Sean

On Sun, Mar 22, 2009 at 11:00 PM, Edward Cherlin <echerlin at gmail.com> wrote:
> On Sun, Mar 22, 2009 at 2:46 PM, satya komaragiri
> <satya.komaragiri at gmail.com> wrote:
>> Hello,
>>
>> I am a final year student from India. I wish to apply to GSoC this
>> year by building upon my current work. I had discussed the feasibility
>> and advantages of having Speech Recognition for an OLPC
>
> +1
>
> Which languages to start with?
>
>> with the devel
>> list [1] in September 2008 and have been working on it since then as a
>> part of the Sarai Fellowship. My progress on that project can be
>> tracked on its wiki page [2]. I am also working on a dictation
>> activity under it though it might take some time as I am currently
>> gathering the speech corpus spoken by children.
>
> Are you looking for published data, or researching this yourself? I
> see publications on a wide range of languages.
>
>> This summer, I would like to implement a generic speech library that
>> could be used by any activity so that children can interact with Sugar
>> using voice rather than typing. I spoke to Mr. Assim Deodia (cc'ed in
>> this mail)
>
> That cc fell out somewhere along the line.
>
>> who developed an activity called 'Listen and Spell' in GSoC
>> last year and is interested in this idea.
>>
>> I can showcase one of its potential uses by integrating speech
>> capabilities into the 'Listen and Spell' activity, where the child can
>> spell out the word verbally. I want to let the children speak out the
>> spelling rather than type it out. The alphabet of any language is
>> limited (26 letters in the case of English), so extension to any
>> language would just mean getting a few people to read out the alphabet
>> of that language.
>>
>> Having a generic library will make system-wide integration easier by
>> abstracting the interactions with the speech engine via D-Bus etc. All
>> the activities can use the speech capability as they see fit (spoken
>> commands to control the activity are the most straightforward
>> application that strikes me).
>
> I am interested in applications to foreign language and second
> language learning, including accent reduction.
>
>> It would be really nice if the community could provide us with some
>> feedback on this proposal. :)
>>
>> Regards
>> Satya Komaragiri
>>
>> [1] Discussion on the devel list:
>> http://lists.laptop.org/pipermail/devel/2008-September/019136.html
>> [2] Speech recognition project page: http://wiki.laptop.org/go/Speech_to_Text
>>
>>
>> P.S. I apologize if the list gets this mail twice. My first mail bounced.
>> _______________________________________________
>> Sugar-devel mailing list
>> Sugar-devel at lists.sugarlabs.org
>> http://lists.sugarlabs.org/listinfo/sugar-devel
>>
>
>
>
> --
> Silent Thunder (默雷/धर्ममेघशब्दगर्ज/دھرممیگھشبدگر ج) is my name
> And Children are my nation.
> The Cosmos is my dwelling place, The Truth my destination.
> http://earthtreasury.net/ (Edward Mokurai Cherlin)
>