KUALA LUMPUR: Google is urging Malaysians to publish more content in their native languages to help improve Google Translate, a free service that translates a section of text, a document or a webpage into another language.
Google research scientist Ashish Venugopal admitted that the system is not perfect and said that one way people can help improve it is to publish more high-quality content in their native languages, giving the translation models more data to work with.
“It got to a point where it worked so well on average that our expectations changed and we expect it to be perfect all the time. But by putting more content online, especially bilingual documents, it enriches the source data and allows the system to learn,” he said.
The service relies on statistical machine translation, which involves matching patterns across documents that already have translations hosted online.
Ashish added that because the process is automated, the machine cannot tell minor errors from serious ones, even when the difference would be glaring to a human reader.
The challenges in ensuring accurate translations include the complexity of the language, the amount of material available and the ease of finding corresponding words between documents.
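As a rough illustration of the pattern matching described above, the following Python sketch guesses word correspondences from a handful of translated sentence pairs. It is a toy under stated assumptions, not Google's method: the four English-Malay sentence pairs are made up for the example, the function names `dice_table` and `best_translation` are hypothetical, and the Dice coefficient stands in for the far more sophisticated models a production system would use.

```python
from collections import Counter
from itertools import product

# Hypothetical toy parallel corpus: (English, Malay) sentence pairs.
PARALLEL = [
    ("the house", "rumah itu"),
    ("the book", "buku itu"),
    ("a house", "sebuah rumah"),
    ("a book", "sebuah buku"),
]


def dice_table(pairs):
    """Score every (source word, target word) pair by how often they co-occur."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)   # sentences containing each source word
        tgt_count.update(t_words)   # sentences containing each target word
        joint.update(product(s_words, t_words))  # sentence pairs containing both
    # Dice coefficient: 2 * joint / (source count + target count), in [0, 1].
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint.items()
    }


def best_translation(word, table):
    """Return the target word with the highest score, or None if unseen."""
    candidates = {t: score for (s, t), score in table.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


if __name__ == "__main__":
    table = dice_table(PARALLEL)
    print(best_translation("house", table))  # rumah
```

With only four sentence pairs, "house" is matched to "rumah" because the two words always appear together. With fewer or noisier pairs the scores become ambiguous, which is why more bilingual content online gives such a system more to learn from.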
Ashish said that traditional translation methods have not been able to keep pace with the explosion of content and languages on the Internet, a problem Google Translate aims to address through automated translation.
“It’s not a question of which is better, one just scales up better than the other,” he said, adding that the service currently supports 64 languages.
While Google Translate is not intended to replace quality translations done by people, its mission is to “break down the language barriers” online.
“We want to be able to translate the easy stuff so that professionals can focus on the high value translations. There is a big difference in translating what was said and being able to convey nuances such as sentiment and emotion,” he said, adding that a toolkit has also been released for professional translators and linguists.
Asked how explosive the growth of non-English content has been, Ashish said that in 2010, Chinese-language content accounted for 22% of web content with growth of 277%, while Arabic accounted for 3.3% of web content but had experienced an explosive 2,000% growth.