[Sugar-devel] GSoC Translation Server Proposal

Erik Price erik.price16 at gmail.com
Thu Apr 25 13:42:51 EDT 2013


Hey Aneesh,

Thanks a lot for going through the proposal. I'll try to answer your
questions and clarify a bit for anyone else who's interested.

Comments / criticisms from others on the mailing list are very welcome!

I do apologize for the length of my messages; I wanted to be as specific
as possible.

> It would be good if you could add the expected time you'll require to
> complete each of these phases below. Also, leave a buffer of 3 days
> between phases for review + feedback + other additions.
>

Okay, will do. I think I'll have to wait until I figure out exactly what
will be required in order to make an informed analysis of the time, but
I expect the server / translation backend components to occupy the first
half of the allocated time, finishing up some time around the midterm
assessments. The client API should be a much simpler process, and will
hopefully take no longer than 2 weeks.

Again, I'll refine this to be far more specific when some more decisions
are finalized.

>>
>> ...snip...
>>
>> The first of these plug-ins would be using Apertium, the FOSS project
>> already used by Sugar through the #meeting-es irc channel on
>> freenode. Next, Bing Translate will likely be added, due to it being
>> one of the major web translators that provides a free API key.
>>
>
> Just FYI: Bing only provides free service for up to 2 million
> characters per month. (
> https://datamarket.azure.com/dataset/1899a118-d202-492c-aa16-ba21c33c06cb)
>

Yeah, and the cost for the next 2 million ($40 USD) is pretty
prohibitive in my view. Still, 2 million characters per month is not
terrible if it's being used by a relatively small distribution of
XOs / students.

I do agree, though, that if there is a "more free" service, it should
take precedence.
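
To make that ordering concrete, here is a minimal sketch of how the
server might rank its plug-ins. Everything in it (the Backend class, the
"free" flag, the stand-in translate functions) is hypothetical and only
illustrates the idea; it does not reflect the real Apertium or Bing APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Backend:
    """A hypothetical translation plug-in (illustrative only)."""
    name: str
    free: bool  # True if the service has no fees or character quota
    # Returns the translated text, or None if the pair is unsupported.
    translate: Callable[[str, str, str], Optional[str]]


def order_backends(backends: List[Backend]) -> List[Backend]:
    # Free services first. sorted() is stable, so registration order
    # is kept within the free group and within the non-free group.
    return sorted(backends, key=lambda b: not b.free)


def translate(text: str, source: str, target: str,
              backends: List[Backend]) -> Tuple[str, str]:
    """Try each backend in preference order; return (name, result)."""
    for backend in order_backends(backends):
        result = backend.translate(text, source, target)
        if result is not None:
            return backend.name, result
    raise LookupError("no backend could translate %s -> %s"
                      % (source, target))


# Stand-in backends so the sketch runs without any network access.
def fake_apertium(text, source, target):
    # Pretend only Spanish <-> English is installed.
    if (source, target) in {("es", "en"), ("en", "es")}:
        return "[apertium] " + text
    return None


def fake_bing(text, source, target):
    return "[bing] " + text


backends = [
    Backend("bing", False, fake_bing),
    Backend("apertium", True, fake_apertium),
]

print(translate("hola", "es", "en", backends))
print(translate("hallo", "de", "en", backends))
```

With this ordering the quota-limited service is only touched when no
free plug-in supports the requested language pair.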

> How about Bablefish? They don't have an API, but there is nothing which
> prevents you from creating one. And it seems like 20 lines of python code
> to me.
>

I'm assuming you mean babelfish.com, since there appear to be a couple
of services named Babelfish.

I didn't notice anything in their terms of use that prohibits screen
scraping, so this may well be a service worth looking into. I'm not at
all familiar with the quality of the translations, though.

I'm also a little wary of the fact that there appears to be so little
information on them as a service. It seems that for all intents and
purposes, the site barely exists.

There's also babelfish.de which looks somewhat promising, but prohibits
using automated scripts to grab data without the owner's permission.

>> Google Translate is another high priority service due to its quality,
>> but will not be added initially because its API has no free tier for
>> usage.
>>
>
> Do their terms and condition state that we can't make more than "n"
> requests? I read some threads on SO where people mentioned that they used
> some PHP code to make post requests to the google translate server. I just
> want to know that will this be illegal or will it void some of their terms
> and conditions?
>

Google deactivated their free Translate API a few years back, so there's
no official way to get a free translation from the service any more. It
is possible (and very easy) to screen scrape Google Translate, but it's
explicitly prohibited by the terms of service. Going against that would
look bad for Sugar Labs, especially considering that this would be a
Google Summer of Code project.

I personally am very much against the idea of going around any terms of
service agreement, but if someone really wanted to, there's nothing
stopping someone from developing a third-party plug-in for the server
independently of this project.

>> - How do the clients become aware of the server? Is it configured, or
>> is there some kind of auto-detection?
>>
>
> I'd say we setup a public domain and hardcode it in the code!
>

I'm not sure I understand your intent here. Do you mean having only one
server globally? In my mind, that would undermine the goal of having a
server in the first place.

Sure, having a "default" server run by Sugar Labs (or whoever) would
make things more convenient, but I don't think it should be the only
service. Part of the idea of this project is to allow users to
create their own servers customized to what they need to do.

Obviously, having every activity that uses the API rediscover the
translation server is far from ideal, and would result in a lot of
duplication. I'm not sure what the answer is here; perhaps discovery
could somehow be tied into the jabber server the XO is connected to?
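
One possible lookup order, sketched under heavy assumptions: an
explicitly configured server wins, then a server guessed from the jabber
host, then a shared default. The URLs, port, and path are made up for
illustration, and whether the jabber server is a sensible discovery hook
is still an open question.

```python
# Hypothetical default instance; no such server exists yet.
DEFAULT_SERVER = "http://translate.example.org/translate"

# Assumed port and path for a translation service running on the same
# machine as the jabber server. Purely illustrative.
GUESSED_PORT = 8080
GUESSED_PATH = "/translate"


def resolve_server(configured=None, jabber_host=None,
                   default=DEFAULT_SERVER):
    """Pick a translation server URL once, for all activities to share."""
    if configured:
        # An explicit setting always takes priority.
        return configured
    if jabber_host:
        # Assumption: the school server that runs jabber also runs the
        # translation service.
        return "http://%s:%d%s" % (jabber_host, GUESSED_PORT, GUESSED_PATH)
    return default


print(resolve_server(configured="http://10.0.0.5:9000/translate"))
print(resolve_server(jabber_host="schoolserver.local"))
print(resolve_server())
```

If something like this lived in one shared place rather than in each
activity, the lookup would only have to happen once per session.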

>> - Is it reasonable to establish large servers with more resources to
>> be used by XO users who may not have access to a server or the
>> technical abilities to manage one? How would abuse be prevented?
>>
>
> What abuse?

By abuse, I mean someone taking advantage of the server to get
unlimited translations. This is essentially only a problem when the
server operator uses non-free translation services that are limited by
number of characters or are rate-limited.

You also obviously don't want malicious random users to try to translate
several MB of text at a time just to overwhelm the server or hit an API
limit.
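
A minimal sketch of both safeguards, assuming the server counts the
characters it forwards to a quota-limited backend: a per-request size
cap and a monthly character budget. The class and parameter names are
hypothetical; the 2 million figure is the Bing free-tier limit mentioned
above, and the 5000-character request cap is an arbitrary example value.

```python
class QuotaExceeded(Exception):
    """Raised when a request would break one of the limits."""


class CharacterBudget:
    """Tracks characters sent to a quota-limited translation service."""

    def __init__(self, max_request_chars, monthly_limit):
        self.max_request_chars = max_request_chars
        self.monthly_limit = monthly_limit
        self.used = 0

    def charge(self, text):
        """Record a request; return the characters left this month."""
        size = len(text)
        if size > self.max_request_chars:
            # Reject oversized requests before they reach the backend.
            raise QuotaExceeded("request of %d characters is too large"
                                % size)
        if self.used + size > self.monthly_limit:
            raise QuotaExceeded("monthly character limit reached")
        self.used += size
        return self.monthly_limit - self.used

    def reset(self):
        """Call at the start of each billing month."""
        self.used = 0


budget = CharacterBudget(max_request_chars=5000, monthly_limit=2000000)
print(budget.charge("Hola, ¿cómo estás?"))

try:
    budget.charge("x" * 6000)
except QuotaExceeded as error:
    print("rejected:", error)
```

A real server would probably also want per-client limits, so that one
heavy user cannot use up the whole month's budget for everyone else.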

> and what do you mean by "XO users who may not have access to a
> server or the technical abilities to manage one?" if they don't have access
> to the server they can't translate the message.

Here, I mean that many schools or organizations that may wish to use
this may not have a person on staff who is able to set up and monitor
the server. Having a default server instance available for anyone to use
would probably be suitable for this case.


- Erik Price