[Sugar-devel] [DESIGN] Design Team agenda item: TTS, e-speak and voices

Gonzalo Odiard gonzalo at laptop.org
Thu Jul 19 21:21:05 EDT 2012


Thanks, Chris, for putting this topic on the table.
Overall, I am not sure this is a topic for the Design Team;
anyway, I will add a few comments below...

On Wed, Jul 18, 2012 at 2:11 PM, Chris Leonard <cjlhomeaddress at gmail.com> wrote:
> Agenda item: TTS, e-speak and voices
>
> Summary:
> We now have text-to-speech everywhere via e-speak. Before we get too
> far into speech engine technology "lock-in", we should thoughtfully
> assess festival (or other options).  In particular, we need to think
> about how we can improve upstream voices to cover more of our users'
> languages.
>

+1
In particular, it would be great to have the option to use better voices.
In general, better voices mean bigger data files, but that is OK if each
local deployment can decide whether to use them or not.
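To make the "local deployment decides" idea concrete, here is a minimal sketch of voice selection, assuming a single deployment-level switch that prefers a larger mbrola voice when one is installed and otherwise falls back from a regional espeak voice to the bare language and finally to a default. The function name, the MBROLA_VOICES table and the fallback order are hypothetical illustrations, not existing Sugar code; only the voice-name style ("es-la", "en-us", and mbrola voices such as "mb-en1") follows espeak's own naming.

```python
# Hypothetical sketch: choose_voice, MBROLA_VOICES and DEFAULT_VOICE are
# illustrative names, not part of Sugar or espeak.

DEFAULT_VOICE = "en"

# Hypothetical per-language table of mbrola voices a deployment might prefer.
MBROLA_VOICES = {"en": "mb-en1", "es": "mb-es1"}


def choose_voice(locale, installed, prefer_mbrola=False):
    """Pick an espeak voice name for a locale such as 'es_PE.UTF-8'.

    installed     -- set of voice names available on this machine
    prefer_mbrola -- deployment-level switch for bigger, better voices
    """
    # Drop encoding and modifier, then split language and region.
    base = locale.split(".")[0].split("@")[0]
    lang, _, region = base.partition("_")
    lang = lang.lower()

    # Use the larger voice only if the deployment asked for it AND it is installed.
    if prefer_mbrola:
        mb = MBROLA_VOICES.get(lang)
        if mb in installed:
            return mb

    # Most specific name first ("es-pe"), then the bare language ("es").
    candidates = []
    if region:
        candidates.append(f"{lang}-{region.lower()}")
    candidates.append(lang)
    for name in candidates:
        if name in installed:
            return name

    # No voice for this language at all (the coverage gap discussed below).
    return DEFAULT_VOICE
```

For example, with voices {"en", "es", "es-la", "mb-es1"} installed, "es_PE" gets "es" by default and "mb-es1" when the deployment turns the switch on.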

We need to research:
* Can we use mbrola voices? (http://espeak.sourceforge.net/mbrola.html)

* Gnome-speech: When I implemented the TTS feature in Sugar,
I tried to use gnome-speech as a layer that could use either espeak or
festival voices,
but when I tested it, the festival voices were not recognized,
so I decided to use espeak only. We can change that if we find a better solution.
We should test again (a quick test in F17 showed only one festival voice).

* Move speech to sugar-toolkit: we have a lot of code copy-pasted in
the activities
to implement TTS. Having a central implementation should solve a lot
of problems.
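As a rough illustration of both points (one shared implementation, with a layer that can use either espeak or festival), a central module could look something like the sketch below. The class and function names are hypothetical and this is not Sugar's current code; it also shells out to the command-line tools instead of going through GStreamer, purely to keep the example small. The flags used are the standard CLI ones: espeak's `-v` (voice), `-s` (speed in words per minute) and `--stdin`, and festival's `--tts`, which speaks text read from stdin.

```python
import shutil
import subprocess


class EspeakBackend:
    """Hypothetical wrapper around the espeak command-line tool."""

    def __init__(self, voice="en", rate=175):
        self.voice = voice  # espeak voice name, e.g. "es-la" or "mb-en1"
        self.rate = rate    # speed in words per minute

    def command(self):
        # Text is fed on stdin, so a leading "-" is never parsed as a flag.
        return ["espeak", "-v", self.voice, "-s", str(self.rate), "--stdin"]

    def speak(self, text):
        subprocess.run(self.command(), input=text.encode("utf-8"), check=True)


class FestivalBackend:
    """Hypothetical wrapper; festival --tts speaks text read from stdin.

    Voice selection is left out: festival chooses voices differently
    (via its Scheme configuration), which this sketch does not cover.
    """

    def command(self):
        return ["festival", "--tts"]

    def speak(self, text):
        subprocess.run(self.command(), input=text.encode("utf-8"), check=True)


def get_backend(preferred=("espeak", "festival")):
    """Return the first backend whose executable is on PATH, else None."""
    factories = {"espeak": EspeakBackend, "festival": FestivalBackend}
    for name in preferred:
        if name in factories and shutil.which(name):
            return factories[name]()
    return None
```

Activities would then call something like `get_backend().speak(text)` instead of each carrying its own copy; a deployment could reorder `preferred` to try festival first once its voices are usable.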

> Issues:
>
> 1) e-speak currently has a limited repertoire of languages that it
> "understands":
>
> http://espeak.sourceforge.net/languages.html
>
> The e-speak voice list is far shorter than the list of languages that
> Sugar currently supports, making this in part an i18n/L10n issue.
>
> 2) However, e-speak is flexible and can be "taught" new languages by
> means of creating new voice files.
>
> http://espeak.sourceforge.net/voices.html
>
> AFAIK, none of us at Sugar Labs or OLPC have ever created a new voice
> file for e-speak before, so I can't give you a reliable estimate of
> the challenges involved.  That process should be investigated
> systematically by a small team (to include at least a developer and a
> localizer).
>
> 3) There are alternative speech engines and voice file formats (e.g.
> festival) that may have better quality or features, but this
> almost certainly comes with trade-offs in size-on-disk and performance
> (particularly on the large XO-1 installed base), making this partly a
> UI developer issue of backwards compatibility.
>
> 4) Text-to-speech is an important feature for visually-impaired users,
> making this in part an accessibility (a11y) issue.
>
> 5) AFAIK, e-speak is not currently packaged for Debian/Ubuntu, which
> raises issues for non-Fedora-based Sugar users that could be addressed
> by collaborating on packaging e-speak, which is a developer/packager
> issue.
>

I think the problem is not espeak, but the gstreamer espeak plugin.


> Conclusion:
>
> TTS touches many aspects of UI design, has many stakeholders and
> requires careful investigation before proceeding further along our
> current path.
>
> Suggested courses of action:
>
> 1) Identify someone willing to package / maintain e-speak for Ubuntu
> (dnarvaez?)
>

We should check which package is missing.

> 2) Investigate other speech engines and their collections of voices,
> the complexity of creating new voices and hardware performance
> characteristics on OLPC hardware.
>

+1

> 3) Investigate (and importantly, document) the process of developing
> new voices or improving existing voices for e-speak (on the assumption
> that we will stick with it as our speech engine).  A good pilot effort
> might be to fine-tune English (en_rp) for Australia (see OLPC_OZ
> tickets linked below) or Spanish, Latin America (es_la) voices for a
> national variant like Peru.  The main goal of such an effort should be
> not so much the voice itself (though either of these could be great to
> have), but understanding the process by which e-speak voice
> development is accomplished and developing internal expertise in this
> process that can be teamed up with other Sugar localizers as needed to
> develop additional voices for our other supported languages.
>

I don't know how difficult this would be, but quidam said
it is very difficult to create good voice files.

I think we need to find other groups of experts on the topic
and try to build alliances. Many of the voices are created by universities.

There is a long but useful HOWTO about using different voices in Festival
(it is old and Ubuntu-specific, but still OK):
http://ubuntuforums.org/showthread.php?t=751169

Gonzalo
