[Sugar-devel] GSOC 2010: Speech Recognition in Sugar

chirag jain chiragjain1989 at gmail.com
Sun Apr 4 03:49:01 EDT 2010


Hi,



On Sat, Apr 3, 2010 at 7:37 AM, Benjamin M. Schwartz
<bmschwar at fas.harvard.edu> wrote:

> I think your proposal is very interesting.  It contains a number of
> different ideas.  One major division is between Voice Commands and Speech
> Recognition.  Each of these contains many other possibilities. My biggest
> suggestion is to specify further which possibilities you want to work on.
>  I recommend you schedule the _easiest_ thing first, before moving on to
> the hard things.  Most GSoC students are too ambitious and never produce
> anything useful.
>
Thanks, Benjamin, for the quick reply and for the very useful
suggestions.


> Some specific ideas:
>
> Voice Commands:
>  - integrate with a text-command system like Gnome Do [1], so that the
> commands are accessible through the keyboard as well as microphone.  Also
> look at Perlbox [2].  (Note that neither Gnome Do nor Perlbox can be used
> directly.)
>  - integrate with GnomeVoiceControl [3], which already uses PocketSphinx
> and should be highly compatible with Sugar.   This could allow voice
> control of unmodified Activities.
>
I have already gone through Gnome Voice Control, which I think is the best
option for integrating into Sugar. The reason is that it uses PocketSphinx,
which is lightweight and should therefore be compatible with devices like
the XO-1.0. The run-time memory requirement of PocketSphinx is up to 20 MB.
Over the next few days I will be testing PocketSphinx in Sugar and
familiarizing myself further with Gnome Voice Control.


> Speech Recognition:
>  - supply text to any unmodified activity
>  - control input language easily for multilingual users
>
> [1] http://do.davebsd.com/index.shtml
> [2] http://perlbox.sourceforge.net/
> [3] http://live.gnome.org/GnomeVoiceControl
>
I have broken the proposal into the following parts, to be done in
sequence:

a) My first priority this summer is to enable "Sugar Voice Control". This
includes:

1. Testing PocketSphinx on Sugar.
2. Studying Gnome Voice Control in more depth.
3. Sugarizing Gnome Voice Control.
4. Writing a command-line interface that starts speech recognition in the
background and begins accepting "Speech Commands".
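
To make item 4 a little more concrete, here is a rough Python sketch of a
background listener that maps recognized phrases to actions. It is only an
illustration under assumptions: the recognize callable stands in for
whatever PocketSphinx or Gnome Voice Control would actually provide, and
the class name, the command phrases and the handlers are all hypothetical.

```python
import threading


class VoiceCommandListener:
    """Polls a recognizer in a background thread and dispatches commands."""

    def __init__(self, recognize, commands):
        # recognize: hypothetical callable that returns the next recognized
        # phrase as a string, or None once the audio stream has ended.
        self._recognize = recognize
        # commands: dict mapping a lower-case phrase to a zero-argument handler.
        self._commands = commands
        self.unrecognized = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def join(self):
        self._thread.join()

    def _run(self):
        while True:
            phrase = self._recognize()
            if phrase is None:
                break
            handler = self._commands.get(phrase.strip().lower())
            if handler is None:
                # Keep phrases that match no command so they can be inspected.
                self.unrecognized.append(phrase)
            else:
                handler()


def demo():
    # A canned list of phrases replaces a real microphone and recognizer.
    phrases = iter(["open write", "mumble", "Close Activity"])
    log = []
    commands = {
        "open write": lambda: log.append("launch Write"),
        "close activity": lambda: log.append("close current activity"),
    }
    listener = VoiceCommandListener(lambda: next(phrases, None), commands)
    listener.start()
    listener.join()
    return log, listener.unrecognized
```

Keeping the recognizer behind a plain callable means the dispatch logic can
be tried out on Sugar before the real speech engine is wired in.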

b) Once Sugar Voice Control is working, we can look into supplying
recognized text to unmodified Sugar activities, so that activities like
Write can take their input either from the keyboard or from the
microphone. This includes:

1. Providing a speech recognition button in the Sugar frame (for example
at the top right) which, when clicked, starts recognizing speech in the
background. Clicking the same button again stops the recognition process.

2. A keyboard shortcut, such as Alt+S, for starting speech recognition.

3. A speech recognition control panel for adjusting the various
parameters.
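
The start/stop behaviour of the button in item 1 could look roughly like
the Python sketch below. The names RecognitionToggle and deliver are
hypothetical, as are the callbacks; nothing here is an existing Sugar API.
The idea is that the frame button and the keyboard shortcut would both call
the same toggle() method, and recognized text is passed on only while
recognition is switched on.

```python
class RecognitionToggle:
    """Tracks whether speech recognition is currently switched on."""

    def __init__(self, on_start, on_stop):
        # on_start / on_stop: hypothetical callbacks that would start and
        # stop the background recognizer.
        self.active = False
        self._on_start = on_start
        self._on_stop = on_stop

    def toggle(self):
        # Flip the state, run the matching callback, return the new state.
        self.active = not self.active
        if self.active:
            self._on_start()
        else:
            self._on_stop()
        return self.active


def deliver(toggle, text, sink):
    """Append recognized text to sink only while recognition is on."""
    if toggle.active:
        sink.append(text)
        return True
    return False
```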

c) The last part would be to create an API that gives activity developers
easy access to speech recognition.

My aim is to achieve at least part a) this summer, and if time permits I
would also like to implement part b). Part c) can be taken care of later.

Regards
-- 
Chirag Jain

Undergraduate Student
Netaji Subhas Institute of Technology
New Delhi