Wednesday, June 17, 2009

Year: 2009 Week: 29 Number: 108

  • [Privacy violation?] - There was a bit of drama on the Foundation mailing lists when a KnowPrivacy study, a research project by the School of Information at the University of California, Berkeley, hit the German media. It reported that Wikipedia, among many other websites, used "web bugs" such as Google Analytics on its pages. Further investigation revealed that this finding was triggered by content on vlswiki and huwiki. The former added to the confusion over whether this was a privacy violation, while the Hungarian Wikipedia case turned out to be just a false alarm (it was a private stats generator).

... that Google has released a very handy tool for translating Wikipedia articles?

Google has long offered a free automatic translation service, and it has now released Google Translator Toolkit, which builds on that service. Each user gets a personal overview page, a "desk", where you can upload text from files or websites.

For text from the English Wikipedia there is a special interface designed for translating Wikipedia content. The source text must be in English; the choice of target languages is limited, but still a fair collection.

The basic idea is that the toolkit automatically translates the text for you, and you then correct it. This not only helps you get the translation done, it also lets Google Translate learn from your corrections and keep getting better. A translation in progress can be shared with other people so they can work on it too.

The quality of the automatic translation will probably vary from language to language. Translations from English to Arabic should be fair, since Google explicitly used that language pair as an example in its press release.

This tool can make it very easy to translate an article from the huge English Wikipedia into one of the many small wikis in other languages. Used correctly, the toolkit may change the way Wikipedia grows in other languages. It can also be used to import a lot of crappy new articles into a wiki.

Google Translator Toolkit could also be useful for translating Wikizine, if anyone is interested in doing so.


  1. Walter, as a longtime and grateful reader of Wikizine, I know that this is your own publication, not a Wikipedia article subject to NPOV, so you are entitled to your own opinions here.

    But still I find it very strange that you ridicule the privacy concerns of Michael Snow, Tim Bartels, Domas Mituzas, David Gerard and other senior Wikimedia figures as "drama". And it is not correct that the case of the Hungarian Wikipedia "was just a false alarm". An individual set up a private server surveilling all reading and editing activity on that project for years, without the consent or even the knowledge of the Wikimedia Foundation.
    The web bug was removed by a steward as "violation of privacy policy" [1] and apparently has not been reinstated.

    [1] http://hu.wikipedia.org/w/index.php?title=MediaWiki:Lastmodifiedat&oldid=4493139

  2. Hey Anwech, thank you for your comment. I actually wrote that piece, so Walter asked me to comment here.

    The word "drama" was not necessarily intended to belittle the concerns, but rather to express the fact that there was quite a bit of controversy and conflicting opinion on the matter. If you read foundation-l, you'd see that there were also some messages from hu.wp people who were like "you shouldn't have done this without talking to us first... :-|". You are probably right that it was still a violation of the privacy policy, but in the context of the study it was a false alarm. The study attempted to search for web bugs (third-party companies) and misidentified the hu.wp stats generator as "Doubleclick" (which it wasn't).

    Also, I wouldn't ridicule the privacy concerns of those people considering I know many of them and work with them on a day-to-day basis. :-)

    I hope this clears it up; maybe "false alarm" and "drama" weren't the best choice of words. Would you be interested in helping us write Wikizine? If you subscribe to the editors list (low traffic), you can get updates on the status of new publications. You can also help copyedit and review them before they're published, which might help us catch confusing wording like this in the future.

