[Sugar-devel] Summer of Code Proposal: Furthering Speech Recognition in Sugar.

Benjamin M. Schwartz bmschwar at fas.harvard.edu
Wed Mar 25 12:31:29 EDT 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

satya komaragiri wrote:
> I can showcase one of its potential usages by integrating speech
> capabilities to the 'Listen and Spell' activity where the child can
> spell out the word verbally. I want to let the children speak out the
> spelling rather than type it out. The alphabet of any language is
> limited (26 letters in the case of English), so extending this to
> another language would just mean getting a few people to read out
> the alphabet of that language.

I think this is a very good idea.  General speech recognition is
error-prone and computationally intensive, but recognizing letter-names is
a much easier problem.  It also fits very well with our emphasis on young
children who may still be learning to spell.

I must admit that I cannot say exactly what this is "useful" for.  In my
experience, there is at most a very narrow age window in which children
can spell, but not type.  It would certainly be a very nifty demo, and
might help us to "engage" users.  Your proposal will have a better chance
of being accepted if you can give a compelling use case example.

> Having a generic library will make system-wide integration easier by
> abstracting the interactions with the speech engine via DBUS etc.  All
> the activities can use the speech capability as they see fit  (spoken
> commands to control the activity is the most straightforward
> application that strikes me).

My advice is to focus on letters, not commands.  Letters are universal,
and can be applied in any application that involves typing.  Commands
would have to be different for every activity, requiring endless new
training data.  (You could, however, include words like "Control" and
"Shift", which would allow users to access commands by speaking the
shortcuts.  Commands to Sugar itself, like "Frame" or "Neighborhood view"
would also be appropriate.)  I suggest you work in two stages:

1. Take an existing activity (perhaps Listen and Spell) and add the
ability to recognize spoken letter names.  This does not require any
DBUS magic.  In this way, you can show that your speech recognition is
actually working (preferably on an XO, which can be provided to you).

2.  Convert this into a system service that listens to the microphone and
synthesizes keystrokes via XInput.  (The effect, then, is just as if the
letter had been typed on the keyboard.)  Add a device to the Sugar frame
to activate and deactivate this service.  (This frame device might also
have to mute the speakers, to avoid interference.)
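
As a rough illustration of stage 1, here is a minimal Python sketch of how
an activity might check a spoken spelling.  It assumes the recognizer hands
back one token per utterance (for example "bee" or "b").  The names
LETTER_NAMES, to_letter and check_spelling are made up for this example;
they are not part of Listen and Spell or of any speech engine.

```python
import string

# Hypothetical mapping from spoken letter names to letters.  A real
# recognizer would define its own vocabulary; only a few names are listed.
LETTER_NAMES = {"ay": "a", "bee": "b", "see": "c", "dee": "d", "ee": "e", "tee": "t"}


def to_letter(token):
    """Map one recognizer token to a lowercase letter, or None if unknown."""
    token = token.strip().lower()
    if len(token) == 1 and token in string.ascii_lowercase:
        return token
    return LETTER_NAMES.get(token)


def check_spelling(target, tokens):
    """Return (is_correct, spelled) for a sequence of recognizer tokens.

    Tokens that cannot be mapped to a letter show up as "?" in `spelled`.
    """
    letters = [to_letter(t) for t in tokens]
    spelled = "".join(l if l is not None else "?" for l in letters)
    return spelled == target.lower(), spelled
```

The activity would call check_spelling with the word it just read out and
the tokens heard so far, and could use the "?" positions to ask the child
to repeat a letter.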

Note that, apart from the switch in the Sugar frame, this system would be
applicable to any Linux (or even Linux-like) desktop.
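
The core of the stage 2 service could look roughly like the sketch below.
It is only an outline under stated assumptions: the class VoiceKeyboard and
the send_key callback are hypothetical, and the callback stands in for
whatever mechanism actually synthesizes the key event.  Spoken modifiers
such as "Control" and "Shift" are held until the next letter arrives.

```python
class VoiceKeyboard:
    """Turns recognized tokens into key events while the service is active."""

    MODIFIERS = {"control", "shift"}

    def __init__(self, send_key):
        # send_key(key, modifiers) is a hypothetical callback; a real
        # service would synthesize the keystroke here.
        self.send_key = send_key
        self.active = False      # toggled by the frame device
        self.pending = set()     # modifiers spoken since the last letter

    def toggle(self):
        """Switch the service on or off and forget any pending modifiers."""
        self.active = not self.active
        self.pending.clear()
        return self.active

    def handle(self, token):
        """Process one recognized token; return True if it was consumed."""
        if not self.active:
            return False
        token = token.lower()
        if token in self.MODIFIERS:
            self.pending.add(token)
            return True
        if len(token) == 1 and token.isalpha():
            self.send_key(token, frozenset(self.pending))
            self.pending.clear()
            return True
        return False
```

Keeping the key-sending step behind a callback means the same logic can be
exercised without a display, and only that one function is tied to Sugar
or to a particular desktop.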

Extra credit:
3. Provide an interface for users to record a new set of voice samples
for their own alphabet and language.

- --Ben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)

iEYEARECAAYFAknKXGEACgkQUJT6e6HFtqTnPwCeOCa5PoFoNlpRdQ/lTl2x9CDn
tY8AnjMCOXjWXZsSfHZwLGpMn32gVk/n
=rbua
-----END PGP SIGNATURE-----