[Sugar-devel] [IAEP] More 'human' voice synth (TTS)

James Simmons nicestep at gmail.com
Tue Jun 21 10:10:22 EDT 2011

I've had some experience with TTS from developing Read Etexts.
Originally, I used speech-dispatcher, which provided a way of writing
apps that used TTS without knowing what TTS engine was being used
underneath.  There were a couple of problems with that.  First, since more
than one engine was supported, the RPM for speech-dispatcher needed to
have them all installed.  Second, you needed to configure it.

Aleksey Lim came up with a gstreamer plugin for espeak that needed no
configuration, and that's what we've been using ever since.

One problem we have with TTS is highlighting.  An XO laptop is
not fast enough to keep the highlighted word in step with the word
being spoken.  (The gstreamer plugin issues a callback just before it
speaks each word, and these callbacks are used to highlight the words.)
A slightly faster computer is enough to resolve the problem.  If
Festival needed more horsepower to run than espeak, it would make a bad
situation worse.
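
For anyone who hasn't seen callback-driven highlighting, here is a
minimal sketch of the idea in plain Python.  It does not use gstreamer
at all: the on_word(offset, length) callback and the Highlighter class
are hypothetical names made up for illustration, and the real plugin's
message format may differ.  The loop at the bottom simply stands in for
the TTS engine firing one callback per word.

```python
import re


def word_spans(text):
    """Return an (offset, length) pair for each whitespace-delimited word."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"\S+", text)]


class Highlighter:
    """Remembers which span of the text should currently be highlighted."""

    def __init__(self, text):
        self.text = text
        self.current = None  # (start, end) of the highlighted word, or None

    def on_word(self, offset, length):
        # Hypothetical callback: assumed to fire just before a word is spoken.
        # A real activity would move the text-view selection here.
        self.current = (offset, offset + length)
        return self.text[offset:offset + length]


text = "The quick brown fox"
hl = Highlighter(text)

# Stand-in for the TTS engine: fire one callback per word, in order.
spoken = [hl.on_word(offset, length) for offset, length in word_spans(text)]
print(spoken)
```

The callback itself does very little work, so any lag has to come from
elsewhere, such as synthesis and redrawing competing for the same CPU.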

James Simmons

On Tue, Jun 21, 2011 at 8:43 AM, Paul Fox <pgf at laptop.org> wrote:
> sridhar wrote:
>  > I'm wondering if there's anything we can do to make TTS sound more
>  > 'human'. We'd like to be able to use the XOs to teach English
>  > literacy, but the espeak voices are very robotic.
>  >
>  > My understanding is that espeak is optimised for low-power devices
>  > (great for XOs) and clear (if robotic) speech. Would it be feasible to
>  > switch to something else, like festival?
> i've run festival as part of my home automation system for many many
> years, including the last 3 or so on an XO-1 (debxo) which acts as my
> current HA server.
> the first secret is to run it in client/server mode, to avoid the
> server startup latency on every enunciation.  but even after that, i
> think the latency will be too high for your application.  i just
> tested it:  given a moderate english sentence, it took 3 seconds to
> produce output.  (i hide this on my system by caching utterances --
> that's more feasible in a menuing system than when teaching literacy.)
>    http://dev.laptop.org/~pgf/junk/festival_out.wav   (5 seconds on XO-1)
> flite is a lower cost version of festival that might be appropriate.
> it seems to reduce the conversion time to about half a second.
> but the quality suffers as well.
>    http://dev.laptop.org/~pgf/junk/flite_out.wav   (.5 seconds on XO-1)
> fyi, current festival server process footprint:
> root       999  0.0  9.4  26668 20004 ?        Ss   Jun06  10:03 /usr/bin/festival --server /usr/local/etc/nosil.scm
> i haven't used espeak -- i suspect there are API interfaces that are
> far richer than what i'm doing from the shell commandline.  i don't
> know how one might access festival at that level.
> paul
>  >
>  > This is some food for thought:
>  > http://braille.uwo.ca/pipermail/speakup/2008-July/046755.html
>  >
>  > Sridhar
>  >
>  >
>  > Sridhar Dhanapalan
>  > Technical Manager
>  > One Laptop per Child Australia
>  > M: +61 425 239 701
>  > E: sridhar at laptop.org.au
>  > A: G.P.O. Box 731
>  >      Sydney, NSW 2001
>  > W: www.laptop.org.au
> =---------------------
>  paul fox, pgf at laptop.org
