[Sugar-devel] [Sugar] Implement text to speech in Sugar feature - v2

Sascha Silbe silbe at activitycentral.com
Mon Jan 23 11:08:54 EST 2012


Excerpts from godiard's message of 2012-01-17 22:49:31 +0100:

> Added controls to pause/stop and addressed suggestions
> in the review process.

Please put the changelog below the marker line, i.e. below this one:

> ---

Please add a description. Besides other things I'm missing a clear
mention that the (currently undocumented) dependency on the OLPC Speech
Server [10] gets replaced by a new dependency [1] on gst-plugins-espeak.
The latter still isn't packaged for many distros, including Debian.
Given that the OLPC Speech Server isn't packaged either, it's no reason
to reject the patch, but it's definitely worth mentioning.


[data/sugar.schemas.in]
> +    <schema>
> +      <key>/schemas/desktop/sugar/speech/pitch</key>
> +      <applyto>/desktop/sugar/speech/pitch</applyto>
> +      <owner>sugar</owner>
> +      <type>int</type>
> +      <default>50</default>
> +      <locale name="C">
> +        <short>Default pitch to the speech sugar service</short>

Probably a typo: s/to/for/. 

As can be seen from the code, we retain user changes over a restart of
Sugar, so we should drop the mention of "default".


> +        <long>Pitch value used by the speech service in Sugar,
> +        can be changed by the user, with controls in a icon
> +        in the frame</long>

If we mention UI details here, we'd need to update them whenever the UI
changes. Please state only what the setting affects.


[...]
> +      <key>/schemas/desktop/sugar/speech/rate</key>
[...]

Same as above.


[extensions/deviceicon/speech.py]
> +from gettext import gettext as _
> +import gconf

gconf isn't part of the standard Python libraries, so it should be moved
to the following block of third-party module imports:

> +import glib
> +import gtk


[...]
> +class SpeechDeviceView(TrayIcon):
> +
> +    FRAME_POSITION_RELATIVE = 105

Please consider using a number that leaves us free to add more Device
Icons in between without changing existing code. 150 might be a good
choice.


> +    def __init__(self):
> +        client = gconf.client_get_default()
> +        self._color = XoColor(client.get_string('/desktop/sugar/user/color'))
> +
> +        TrayIcon.__init__(self, icon_name=_ICON_NAME, xo_color=self._color)
> +
> +        self.set_palette_invoker(FrameWidgetInvoker(self))
> +
> +        self._manager = speech.get_speech_manager()
> +
> +        self.connect('expose-event', self.__expose_event_cb)
> +
> +        self._icon_widget.connect('button-release-event',
> +                                  self.__button_release_event_cb)

Please don't spread the code this much using blank lines. In fact the
above code would be at least as readable even without any blank line.


[...]
> +    def __button_release_event_cb(self, widget, event):
> +        if event.button != 1:
> +            return False

As the Palette doesn't have a default action, left-click should open the
Palette, too.


> +    def __expose_event_cb(self, *args):
> +        self._update_info()

Where does _update_info() get defined? What's the purpose of the expose
callback?


> +class SpeechPalette(Palette):
> +
> +    def __init__(self, primary_text, manager):
> +        Palette.__init__(self, label=primary_text)
> +
> +
> +

Please use pep8 and pylint [2] to check your code prior to submission
[3,4].


[...]
> +        """
> +        self._play_item = MenuItem('Say selected text')
> +        self._play_icon = Icon(icon_name='player_play',
> +                icon_size=gtk.ICON_SIZE_MENU)
> +        self._pause_icon = Icon(icon_name='player_pause',
> +                icon_size=gtk.ICON_SIZE_MENU)
> +        self._play_item.set_image(self._play_icon)
> +        self.menu.append(self._play_item)
> +        self._play_item.show()
> +        self._play_item.connect('activate', self.__play_clicked_cb)
> +
> +        self._stop_item = MenuItem('Stop talking')
> +        self._stop_icon = Icon(icon_name='player_stop',
> +                icon_size=gtk.ICON_SIZE_MENU)
> +        self._stop_item.set_image(self._stop_icon)
> +        self.menu.append(self._stop_item)
> +        self._stop_item.show()
> +        self._stop_item.connect('activate', self.__stop_clicked_cb)
> +        self._stop_item.set_sensitive(False)
> +        """

If you don't intend to execute code, please remove it entirely (applies
to several other parts of your patch as well). Also docstrings are a
particularly bad way of marking code to not get executed: without syntax
highlighting (e.g. in emails and a lot of editors) it's hard to notice
that the above, long code block doesn't get executed. Using comments
makes it obvious as each line is marked individually. Some text editors
even have shortcuts to comment in/out the currently selected code block
to make this trivial.


[...]
> +        pitch_step = 10
> +        self._adj_pitch = gtk.Adjustment(value=self._manager.get_pitch(),
> +                                          lower=self._manager.MIN_PITCH,
> +                                          upper=self._manager.MAX_PITCH,
> +                                          step_incr=pitch_step,
> +                                          page_incr=pitch_step,
> +                                          page_size=pitch_step)

Why do you disallow coarser steps when using the keyboard to change the
value?


> +        self._hscale_pitch = gtk.HScale(self._adj_pitch)
> +        self._hscale_pitch.set_digits(0)

> +        self._hscale_pitch.set_draw_value(False)

This is up to the Design Team to decide, but I'd imagine it to be useful
to see the current value as a number, so the learner can reproducibly
set the same value (both as before and as other learners).

[Rate adjustment]

Same comments as for the pitch adjustment.


[...]
> +    def __play_clicked_cb(self, widget):
> +        if self._manager.is_paused:
> +            self._manager.restart()
> +        else:
> +            if not self._manager.is_playing:
> +                self._manager.say_selected_text()
> +            else:
> +                self._manager.pause()

This could be streamlined by using elif.


> +    def __stop_clicked_cb(self, widget):
> +        self._manager.stop()

Why do we need a Stop button in addition to Play/Pause? And where does
this callback get invoked? AFAICT the hook-up is commented out.


[extensions/globalkey/speech.py]
[...]
> +BOUND_KEYS = ['<alt>s']

This will collide with shortcuts from existing applications and
activities, especially since GTK uses <Alt> for mnemonics [7]. To give a
concrete example of a non-mnemonic case: the Scan activity [8] uses
<Alt>+S to start or stop scanning [9].

Why didn't you keep the existing shortcut (<Alt><Shift>S)?


[src/jarabe/model/speech.py]
> +import gconf

Like in extensions/deviceicon/speech.py, the gconf import should be
moved to the block of third-party module imports.

> +
> +import gst
> +import gtk
> +import gobject
> +
> +import os
> +from gettext import gettext as _
> +import logging

Standard Python library modules should be listed first, before
third-party modules (PEP-8 section "Imports", second item).


> +# TRANS: The language pitch (range [0 - 99], default 50 for English)
> +# Look at http://espeak.sourceforge.net/commands.html for details
> +DEFAULT_PITCH = int(_('50'))
> +
> +
> +# TRANS: The diction speed, in average words per minute (range [80 - 390],
> +# default 170 for English).
> +# Look at http://espeak.sourceforge.net/commands.html for details
> +DEFAULT_RATE = int(_('170'))

Why did you choose a different default for rate than espeak did (170 vs.
175)?

Do these values really need to be localised? AFAICT from the document
you referenced, the upstream default is always the same and espeak
voices will interpret the value as needed.


[...]
[class SpeechManager]
> +    def __init__(self, **kwargs):
> +        gobject.GObject.__init__(self, **kwargs)
> +        self._player = AudioGrabGst()
> +        self._player.connect('play', self._emit_signal, 'play')
> +        self._player.connect('stop', self._emit_signal, 'stop')
> +        self._player.connect('pause', self._emit_signal, 'pause')
> +        logging.debug('SpeechManager setting default parameters')

Does this log message actually aid in debugging? The values always get
set on instantiation and before instantiation there's no SpeechManager
to access.


[...]
> +        try:
> +            self._loading = True
> +            self.restore()
> +            self._loading = False
> +        except:
> +            pass

Please don't catch exceptions unconditionally except in very specific
cases. I'm not even sure we should catch exceptions here at all, as
anything that could happen in self.restore() would be the result of a
severe problem and should be reported to the user, rather than ignored
silently.

You're leaving self._loading set if restoring the settings fails, which
will also prevent changes to be saved later on. But you can get rid of
self._loading completely if you set the internal variables (self._pitch
and self._rate) in self.restore() directly rather than using the
accessors.


> +    def _emit_signal(self, player, signal):
> +        self._is_playing = (signal == 'play')
> +        self._is_paused = (signal == 'pause')
> +        self.emit(signal)

This method does more than just emitting a signal. Something like
_update_state() would be a better name.


[...]
> +    def save(self):
> +        client = gconf.client_get_default()
> +        client.set_int('/desktop/sugar/speech/pitch', self.get_pitch())
> +        client.set_int('/desktop/sugar/speech/rate',
> +                self.get_rate())

You can access the private variables directly here; that would get rid
of the line break.

Also, if we save on every change, we should listen for gconf updates so
that outside changes (e.g. using gconf-editor or gconftool-2) will be
used by Sugar instead of getting overwritten on the next invocation of
save().


> +class AudioGrabGst(gobject.GObject):

What's the purpose of this class? What does it grab? Please consider
renaming the class and / or providing a useful docstring.

> +
> +    def __init__(self):
> +        gobject.GObject.__init__(self)
> +        self._pipeline = None
> +        self._quiet = True

What do we need self._quiet for? AFAICT it's only set, never read.


> +    def restart_sound_device(self):
> +        if self._pipeline is None:
> +            return

Calling restart_sound_device() shouldn't be allowed to be called if
there's no pipeline set up. The UI should deactivate the buttons while
there's no text to be spoken. restart_sound_device() et al. should
throw an error when called out-of-order.


[...]
> +    def make_pipeline(self, cmd):

Re. "cmd": Please don't abbreviate unless you have a good reason to do
so [6].


[...]
> +    def __pipe_message_cb(self, bus, message):
> +        logging.error('Gst message %s' % message)

Why do you mark every GStreamer message as error? AFAICT you will
receive these messages as part of normal operation.

Also please pass parameters to the logging function instead of doing the
formatting yourself. To quote [5]: "This is more robust and formatting
only happens if we actually log something."


> +        if message.structure.get_name() == 'espeak-mark' and \
> +                message.structure['mark'] == 'end':
> +            self.emit('stop')

Do we need to set the pipeline state to gst.STATE_NULL like in
self.stop_sound_device() or does that happen automatically?


> +    def speak(self, pitch, rate, voice_name, text):

Why do we need to pass the parameters in every time, rather than
settings the properties a) on init and b) when the gconf / UI ?


> +        # XXX workaround for http://bugs.sugarlabs.org/ticket/1801

Please get rid of the XXX, it does not add any value. If we intend to
remove this some time after SL#1801 gets fixed, we should mark it as
TODO.


> +        if not [i for i in text if i.isalnum()]:
> +            return

The user will be rather puzzled why saying some texts works and others
simply get ignored. We should filter out the individual characters that
gst-plugins-espeak doesn't like, rather than bailing out completely.


[...]
> +    def get_all_voices(self):

This should probably be a private method (i.e. be called
_get_all_voices()).


> +        all_voices = {}
> +        for i in gst.element_factory_make('espeak').props.voices:
> +            name, language, dialect = i

        for voice in gst.element_factory_make('espeak').props.voices:

Please consider using "voice" or a similar speaking name in place of
"i". The latter is traditionally used for _i_ntegers (or _i_ndexes,
which are also integers).


> +            #if name in ('en-rhotic','english_rp','english_wmids'):
> +                # these voices don't produce sound
> +             #   continue

Please remove dead code.


> +            all_voices[language] = name

Are the voices in ...props.voices ordered such that the preferred voice
for a given language is listed last? Can the user influence the
ordering? If not, we should make the voice configurable (which is easy,
given that we already pass it through SpeechManager), even if just as a
gconf setting with no UI in Sugar.


[...]
> +    def get_default_voice(self):
> +        """Try to figure out the default voice, from the current locale ($LANG)
> +           Fall back to espeak's voice called Default."""
> +        voices = self.get_all_voices()
> +
> +        try:
> +            lang = os.environ['LANG']
> +            if lang.find('.') > --1:
> +                lang = lang[0:lang.find('.')]
> +                lang = lang.replace('_', '-').lower()
> +        except:
> +            lang = ""
> +
> +        best = "default"
> +
> +        try:
> +            best = voices[lang]
> +        except:
> +            try:
> +                lang = lang[0:lang.find('-')]
> +                best = voices[lang]
> +            except:
> +                pass

The following would be more robust, not masking coding mistakes (like
"--1") by catching exceptions unconditionally. Not to mention that it's
a lot less code:

        locale = os.environ.get('LANG', '')
        language_location = locale.split('.', 1)[0].replace('_', '-').lower()
        language = language_location.split('-')[0]
        return voices.get(language_location) or voices.get(language) or 'default'


> +def get_speech_manager():
> +    global _speech_manager
> +
> +    if _speech_manager == None:

PEP8, "Programming Recommendations", second item: "Comparisons to
singletons like None should always be done with 'is' or 'is not', never
the equality operators."


Sascha

[1] https://wiki.sugarlabs.org/go/Development_Team/Code_Review#Dependencies
[2] https://wiki.sugarlabs.org/go/Development_Team/Code_guidelines#Tools
[3] https://wiki.sugarlabs.org/go/Development_Team/Code_Review#Proposal
[4] https://wiki.sugarlabs.org/go/Development_Team/Code_guidelines
[5] https://wiki.sugarlabs.org/go/Development_Team/Code_guidelines#Miscellaneous
[6] https://wiki.sugarlabs.org/go/Development_Team/Code_guidelines#Code_style
[7] http://www.gtk.org/api/2.6/gtk/GtkButton.html#gtk-button-new-with-mnemonic
[8] http://git.sugarlabs.org/scan-activity
[9] http://git.sugarlabs.org/scan-activity/scan-activity/blobs/master/scan.py#line655
[10] http://wiki.laptop.org/go/Speech_Server
-- 
http://sascha.silbe.org/
http://www.infra-silbe.de/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 490 bytes
Desc: not available
URL: <http://lists.sugarlabs.org/archive/sugar-devel/attachments/20120123/daae667d/attachment.pgp>


More information about the Sugar-devel mailing list