[Sugar-devel] [DESIGN] Design Team agenda item: TTS, e-speak and voices

Chris Leonard cjlhomeaddress at gmail.com
Wed Jul 18 13:11:59 EDT 2012


Agenda item: TTS, e-speak and voices

Summary:
We now have text-to-speech everywhere via e-speak.  Before we get too
far into speech engine technology "lock-in", we should thoughtfully
assess festival (or other options).  In particular, we need to think
about how we can improve upstream voices to cover more of our users'
languages.

Issues:

1) e-speak currently has a limited repertoire of languages that it
"understands":

http://espeak.sourceforge.net/languages.html

The e-speak voice list is far shorter than the list of languages that
Sugar currently supports, making this in part an i18n/L10n issue.

2) However, e-speak is flexible and can be "taught" new languages by
means of creating new voice files.

http://espeak.sourceforge.net/voices.html

AFAIK, none of us at Sugar Labs or OLPC have ever created a new voice
file for e-speak before, so I can't give you a reliable estimate of
the challenges involved.  That process should be investigated
systematically by a small team (to include at least a developer and a
localizer).

3) There are alternative speech engines and voice file formats (e.g.
festival) that may have better quality or features, but this
almost certainly comes with trade-offs in size-on-disk and performance
(particularly on the large XO-1 installed base), making this partly a
UI developer issue of backwards compatibility.

4) Text-to-speech is an important feature for visually-impaired users,
making this in part an accessibility (a11y) issue.

5) AFAIK, e-speak is not currently packaged for Debian/Ubuntu.  This
raises issues for non-Fedora-based Sugar users that could be addressed
by collaborating on packaging e-speak, making this in part a
developer/packager issue.
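To gauge the size of the gap described in issue 1, the comparison could be scripted.  Below is a minimal Python sketch, assuming the column layout printed by `espeak --voices` (priority, language code, gender, voice name, file).  The sample output and the list of Sugar UI locales are hypothetical placeholders, not real data.  It matches on the base language only, so a regional voice such as es-la counts as covering every Spanish locale.

```python
from typing import Iterable, List, Set

# Hypothetical sample of `espeak --voices` output (illustration only).
SAMPLE_VOICES_OUTPUT = """\
Pty Language Age/Gender VoiceName       File        Other Languages
 5  af             M  afrikaans         af
 5  en             M  default           default
 5  es             M  spanish           es
 5  es-la          M  spanish-latin-am  es-la
 5  pt-pt          M  portugal          pt-pt
"""

# Hypothetical subset of Sugar UI locales (illustration only).
SUGAR_LOCALES = ["af", "es", "pt_BR", "qu", "ay", "rw"]


def parse_voice_languages(voices_output: str) -> Set[str]:
    """Collect the Language column (2nd field) from an espeak voice listing."""
    langs = set()
    for line in voices_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2:
            langs.add(fields[1].lower())
    return langs


def base_language(code: str) -> str:
    """Reduce 'pt_BR' or 'pt-br' to the base language 'pt'."""
    return code.replace("_", "-").lower().split("-")[0]


def missing_voices(ui_locales: Iterable[str], voice_langs: Iterable[str]) -> List[str]:
    """Return UI locales whose base language has no voice at all."""
    covered = {base_language(code) for code in voice_langs}
    return sorted(loc for loc in ui_locales if base_language(loc) not in covered)


if __name__ == "__main__":
    voices = parse_voice_languages(SAMPLE_VOICES_OUTPUT)
    print(missing_voices(SUGAR_LOCALES, voices))  # ['ay', 'qu', 'rw']
```

On a machine with e-speak installed, the real listing could be passed in via `subprocess.run(["espeak", "--voices"], capture_output=True, text=True).stdout`; Sugar's actual locale list would have to be supplied separately.  A stricter check would compare full codes so that, for example, pt_BR is not treated as covered by a pt-pt voice.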

Conclusion:

TTS touches many aspects of UI design, has many stakeholders and
requires careful investigation before proceeding further along our
current path.

Suggested courses of action:

1) Identify someone willing to package / maintain e-speak for Ubuntu
(dnarvaez?)

2) Investigate other speech engines and their collections of voices,
the complexity of creating new voices, and their performance
characteristics on OLPC hardware.

3) Investigate (and importantly, document) the process of developing
new voices or improving existing voices for e-speak (on the assumption
that we will stick with it as our speech engine).  A good pilot effort
might be to fine-tune English (en_rp) for Australia (see OLPC_OZ
tickets linked below) or Spanish, Latin America (es_la) voices for a
national variant like Peru.  The main goal of such an effort should be
not so much the voice itself (though either of these would be great to
have) as understanding how e-speak voice development is accomplished
and building internal expertise in that process, so that those experts
can team up with other Sugar localizers as needed to develop
additional voices for our other supported languages.

4) Long term, establish regular developer and L10n community
collaboration to upstream new voices for the selected speech engine.
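For the performance side of action 2, a simple timing harness run on an XO-1 would give comparable numbers across engines.  This is a Python 3 sketch under stated assumptions: the `espeak -w FILE TEXT` invocation and festival's `text2wave -o FILE` (reading text from stdin) should be checked against the installed versions, and the sample sentence is arbitrary.  It measures wall-clock time including process start-up, which is part of what a user waiting for speech actually experiences.

```python
import os
import shutil
import statistics
import subprocess
import sys
import tempfile
import time
from typing import Dict, List, Optional, Sequence, Tuple

SAMPLE_TEXT = "The quick brown fox jumps over the lazy dog."


def engine_commands(out_dir: str) -> Dict[str, Tuple[List[str], Optional[str]]]:
    """Assumed invocations (argv, stdin text); verify flags locally."""
    return {
        # espeak: text as an argument, WAV written with -w
        "espeak": (["espeak", "-w", os.path.join(out_dir, "espeak.wav"), SAMPLE_TEXT], None),
        # festival: text2wave assumed to read the text from stdin
        "festival": (["text2wave", "-o", os.path.join(out_dir, "festival.wav")], SAMPLE_TEXT),
    }


def median_runtime(argv: Sequence[str], stdin_text: Optional[str] = None, runs: int = 5) -> float:
    """Run argv `runs` times; return the median wall-clock time in seconds."""
    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, input=stdin_text, text=True, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def main() -> None:
    with tempfile.TemporaryDirectory() as out_dir:
        for name, (argv, stdin_text) in engine_commands(out_dir).items():
            if shutil.which(argv[0]) is None:
                print(f"{name}: {argv[0]} not installed, skipped", file=sys.stderr)
                continue
            print(f"{name}: {median_runtime(argv, stdin_text):.3f} s median")


if __name__ == "__main__":
    main()
```

Using the median rather than the mean keeps one slow, cold-cache run from dominating the result.  Disk footprint, the other trade-off raised in issue 3, would need a separate measurement.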

Goal of raising this at Design Team meeting:

1) Find e-speak packager(s) for other distros.
2) Refine the issues involved.
3) Identify stakeholders willing to work on a short-term voice
development project to assess process, ideally willing to become
long-term, in-house e-speak gurus.
4) Identify other UI issues raised by TTS (e.g. touch-hover to speak
for a11y) to be developed as new agenda items for future
meetings.

References:

E-speak site:
http://espeak.sourceforge.net/index.html

Festival site:
http://www.cstr.ed.ac.uk/projects/festival/

OLPC Australia tickets on this topic:
https://dev.laptop.org.au/issues/653
https://dev.laptop.org.au/issues/731

