Translating Twitter

Note: This article was originally published at Planet PHP on 5 January 2011.

London, UK Wednesday, January 5th 2011, 09:04 GMT

As the author of Xdebug I am interested in finding out what people think of it, and whether they have problems or compliments. I've set up a Twitter account for Xdebug, @xdebug, and my Twitter client, Haunt, also shows me all tweets matching the search term xdebug.

However, sometimes I get tweets in a language I can't read; for example Brazilian Portuguese:

Debugando aplicações PHP com Xdebug e Eclipse PDT: http://bit.ly/ffJC4G

junichi_y

or Japanese:

@pomu0325 aaSaOaaa"a-aaaia"aXdebugaaaa-aaaaa'eaaaYa"aaaia"a"aauaaaaa"aaaa

Ken

Once in a while, I would send these tweets through Google's language tools, but then my friend Elizabeth tweeted:

Hey Lazyweb, is there a twitter client that lets me filter tweets by language?

Elizabeth Naramore

Instead of manually copying and pasting tweets into the language tools, I thought it'd be nice to embed translation directly into the client, so that it happens whenever it is requested.

Sadly, tweets don't have a language associated with them, so the first step is to find out which language a tweet is in. Google provides a web service called "Language Detect". To use this service, you only have to query a specific URL containing the text whose language you want to guess, and parse the returned JSON structure. The Google website has an example which basically boils down to requesting the following URL: https://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=Hola,%20mi%20amigo
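As a rough illustration of how such a detection URL can be assembled, here is a minimal Python sketch. The helper name build_detect_url is hypothetical, and the sketch only builds the URL string; it does not send a request. The endpoint and the v and q parameters are taken from the example URL above.

```python
from urllib.parse import urlencode, quote

# Endpoint as quoted in the article's example URL.
DETECT_ENDPOINT = "https://ajax.googleapis.com/ajax/services/language/detect"


def build_detect_url(text, version="1.0"):
    """Return the detection URL for `text` (hypothetical helper, no request is made)."""
    # quote() encodes spaces as %20; keeping "," unescaped mirrors the article's URL.
    query = urlencode({"v": version, "q": text}, safe=",", quote_via=quote)
    return f"{DETECT_ENDPOINT}?{query}"


print(build_detect_url("Hola, mi amigo"))
```

For the sample text, this produces the same URL as the one shown above.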

Requesting that URL returns the following JSON struct:

{ "responseData": { "language":"es", "isReliable":false, "confidence":0.08829542 }, "responseDetails": null, "responseStatus": 200 }

If the responseStatus is 200, then it worked. responseData->language contains the detected language, and responseData->isReliable and responseData->confidence describe how sure Google is that the detected language is actually correct. The longer the text, the easier detection becomes, of course. In this case, although the confidence is low, the language is guessed correctly: es, for Spanish.
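The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: parse_detect_response is a hypothetical helper name, and the input is the sample response quoted above rather than a live reply from the service.

```python
import json

# The sample response from the article, used here instead of a live request.
SAMPLE = (
    '{ "responseData": { "language":"es", "isReliable":false, '
    '"confidence":0.08829542 }, "responseDetails": null, "responseStatus": 200 }'
)


def parse_detect_response(body):
    """Return (language, is_reliable, confidence), or None if the status is not 200."""
    payload = json.loads(body)
    if payload.get("responseStatus") != 200:
        return None
    data = payload["responseData"]
    return data["language"], data["isReliable"], data["confidence"]


print(parse_detect_response(SAMPLE))
```

Returning None for any non-200 status keeps the caller's decision simple: if there is no result, leave the tweet untranslated.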

Now that we have the language, we can use another web service from Google to translate the text from the guessed language to our target language, which in my case is English. This Translate service wants the text and a language pair for translations. Google suggests you add a key and a userip, but this is not strictly necessary. The language pair has the format source-language-code|destination-language-code, which in our case is es|en. The service is again very simple to use, as you can see in this example. It again boils down to requesting a URL, such as: https://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q=Hola,%20mi%20amigo!&langpair=es%7Cen
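To show how the language pair ends up in the URL, here is a minimal Python sketch that assembles the translate request. The helper name build_translate_url is hypothetical, the optional key and userip parameters are left out, and no request is sent. The endpoint and parameter names come from the example URL above.

```python
from urllib.parse import urlencode, quote

# Endpoint as quoted in the article's example URL.
TRANSLATE_ENDPOINT = "https://ajax.googleapis.com/ajax/services/language/translate"


def build_translate_url(text, source, target, version="1.0"):
    """Return the translate URL for `text` (hypothetical helper, no request is made)."""
    # The pair is "source|target"; the "|" is percent-encoded as %7C in the URL.
    langpair = f"{source}|{target}"
    query = urlencode(
        {"v": version, "q": text, "langpair": langpair},
        safe=",!",  # leave "," and "!" unescaped, mirroring the article's URL
        quote_via=quote,
    )
    return f"{TRANSLATE_ENDPOINT}?{query}"


print(build_translate_url("Hola, mi amigo!", "es", "en"))
```

The source code passed in here would be whatever the detection step reported, in this case es.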

The translate request returns the following JSON struct:

{ "responseData": { "transla

Truncated by Planet PHP, read more at the original (another 764 bytes)