How Google Translate Works, and Why It Doesn’t Measure Up
Posted by Transparent Language on Sep 2, 2015 in Archived Posts
With over 200 million daily translations, there’s no denying that Google Translate is a wildly popular translation service. Indeed, machine translation has come very far since its infancy in the 1950s. Instead of translating words at face value, machine translators now rely on complex algorithms to deliver more accurate translations, and some even take into account colloquial language and idioms. Still, the very nature of machine translators prevents them from ever doing a human’s job. Let’s take a look at how machine translators (such as Google Translate) work, what their limitations are, and why they can’t replace the quintessential human touch.
How machine translation works
Google Translate, like other machine translators, operates on statistics rather than rules. That is, these systems look for patterns in hundreds of millions of documents that human translators have already translated. Google Translate makes special use of UN documents, which are translated into all six official UN languages and thus provide ample linguistic data. This way, machine translators can weigh the many renderings that different (human) translations offer for a phrase, and make an educated guess by selecting the one that occurs most frequently. For example, they detect that the Spanish phrase “darse cuenta” is usually translated into English as “realize”. Therefore, based on statistics, Google Translate will correctly translate the phrase as “realize”, rather than producing a word-for-word translation, which would look more like “give account”.
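For readers curious about the mechanics, here is a toy Python sketch of the frequency-based choice described above. The phrase pairs and their counts are invented, and `build_phrase_table` and `most_frequent_translation` are hypothetical names. Real statistical systems also weigh alignment and language-model scores. This sketch only illustrates the "pick the most common human rendering" idea and does not show how Google Translate is actually built.

```python
from collections import Counter, defaultdict

# Hypothetical toy "corpus" of phrase pairs taken from human translations.
# Real systems extract such pairs from millions of documents; these are invented.
ALIGNED_PHRASES = [
    ("darse cuenta", "realize"),
    ("darse cuenta", "realize"),
    ("darse cuenta", "notice"),
    ("darse cuenta", "realize"),
    ("darse cuenta", "become aware"),
]

def build_phrase_table(pairs):
    """Count how often each source phrase was rendered as each target phrase."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table

def most_frequent_translation(table, phrase):
    """Return the most frequent rendering, or None if the phrase was never seen."""
    counts = table.get(phrase)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

table = build_phrase_table(ALIGNED_PHRASES)
print(most_frequent_translation(table, "darse cuenta"))  # realize
```

Because the choice is driven purely by counts, the literal "give account" never wins unless human translators actually used it most often.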
Finding linguistic data large enough to support legitimate statistical analyses is no easy feat. Because more documents are available in English than in any other language, Google Translate almost always uses English as an intermediary step when translating between two languages that aren’t English. For example, when translating from Russian to Spanish, Google Translate will first translate the text from Russian into English, and then from English into Spanish. As a result, translations between two non-English languages actually involve two rounds of translation.
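The two-step route can be pictured as simple function composition. In this sketch, the tiny Russian-to-English and English-to-Spanish word lists are made up and stand in for full statistical models. The word-by-word lookup is a deliberate simplification. The point is only that the Spanish output is produced from the English version and never from the Russian original.

```python
# Hypothetical toy dictionaries standing in for two separate translation models.
RU_EN = {"привет": "hello", "мир": "world"}
EN_ES = {"hello": "hola", "world": "mundo"}

def translate(words, table):
    """Word-by-word lookup; unknown words pass through unchanged."""
    return [table.get(w, w) for w in words]

def pivot_translate(words, *tables):
    """Chain translations through each intermediate language in turn."""
    for table in tables:
        words = translate(words, table)
    return words

print(pivot_translate(["привет", "мир"], RU_EN, EN_ES))  # ['hola', 'mundo']
```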
In fact, some language pairs involve even more steps. If you want to translate some text from, say, Catalan to Japanese, Google will first translate it into Spanish, as most existing Catalan translations are in Spanish. Then, this Spanish version of the original Catalan text will be translated into English. Finally, the English version of the Spanish version of the Catalan text will make it to Japanese, and if you’re lucky, it will still bear some resemblance to the original meaning.
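One way to think about these longer chains is as a route-finding problem over the language pairs that have enough parallel data. The sketch below runs a breadth-first search to find the shortest chain of direct translations. The set of "direct" pairs is an assumption chosen to mirror the Catalan example. It is not a list of what Google actually supported, and `pivot_route` is a hypothetical helper.

```python
from collections import deque

# Hypothetical language pairs assumed to have enough parallel data to translate directly.
DIRECT_PAIRS = {
    ("ca", "es"), ("es", "en"), ("en", "ja"), ("ru", "en"), ("en", "es"),
}

def pivot_route(source, target, pairs):
    """Breadth-first search for the shortest chain of direct translations."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, []).append(b)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of available pairs connects the two languages

print(pivot_route("ca", "ja", DIRECT_PAIRS))  # ['ca', 'es', 'en', 'ja']
```

Each hop in the returned route is one more round of translation, and therefore one more chance for the meaning to drift.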
Why it doesn’t make the cut
Google Translate does a good job with very basic translations — especially those whose target language is English — and now even offers alternative interpretations for certain words and phrases. However, the very methodology upon which Google Translate is based prevents it from ever competing with human translators. Here’s why:
Statistics don’t have feelings. Google Translate is based on statistics: it chooses the “best” translation based on how certain words and phrases have been translated in other documents. In other words, machine translators choose the most probable translation, not the most interesting or poetic one. As a result, even when their translations are accurate (which they often aren’t), they adopt a robotic, lifeless tone. It takes a human translator, with feelings and creativity, to reproduce the tone, color, and vibrancy of the original text.
Machine translations struggle with complex grammar. Language is based on rules, and as a result, a statistics-based translator like Google will struggle with complex grammatical concepts, such as the difference between the imperfect and preterite past tenses in Romance languages. This is especially true given that Google almost always uses English — a language that does not grammatically distinguish between preterite and imperfect tenses — as an intermediary step when translating into Romance languages. Therefore, Google Translate often incorrectly translates the imperfect past as the preterite past (and vice versa), making ongoing or habitual acts seem like one-time, completed events.
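The tense problem is easy to see in miniature. In this hypothetical sketch, Spanish "hablaba" (imperfect: was speaking, used to speak) and "habló" (preterite: spoke, once) both become plain English "spoke". That leaves the English-to-Portuguese step with nothing to tell them apart. The lookup tables, the `via_english` helper, and the assumption that the preterite wins are all invented for illustration.

```python
# Hypothetical toy lookup tables (not real model data).
# Spanish distinguishes imperfect "hablaba" from preterite "habló";
# plain English "spoke" covers both, so the distinction is lost here.
ES_TO_EN = {"hablaba": "spoke", "habló": "spoke"}

# With no tense clue left in "spoke", only one Portuguese form can be chosen;
# we assume the more frequent preterite "falou" wins.
EN_TO_PT = {"spoke": "falou"}

# What a human translator would write, keeping the imperfect/preterite contrast.
ES_TO_PT_HUMAN = {"hablaba": "falava", "habló": "falou"}

def via_english(word):
    """Spanish -> English -> Portuguese, one lookup per step."""
    return EN_TO_PT[ES_TO_EN[word]]

for word in ("hablaba", "habló"):
    print(word, via_english(word), ES_TO_PT_HUMAN[word])
# hablaba falou falava
# habló falou falou
```

The habitual "falava" never appears in the pivoted output, which is exactly the imperfect-to-preterite slip described above.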
Google can’t write for an audience. Every translator knows that you need to tailor your work to whom you’re writing for. For example, if this article were written for a casual blog, my use of the word “whom” in the previous sentence might come off as overly formal. However, given that this article appears on a language interest blog, grammarians and language experts may applaud my correct distinction between “who” and “whom” (though they may scoff at my decision to end a sentence with a preposition). Machines cannot make such judgment calls: Google cannot take into account who the intended audience is for the article it translates. Only a human translator can make that kind of decision.
Google Translate vs. a human being
To illustrate the difference between Google Translate and a living, breathing human translator, I will employ both to translate the following Spanish text, which appears on a website selling Argentine wine, into English. Try to guess which translation was written by a human, and which was produced by a machine (spoiler alert: it won’t be hard).
Después de una excelente cosecha como la que le precedió, la cosecha 2009 muestra sus virtudes en este vino base Cabernet Sauvignon, mas el ensamble de tres variedades de gran personalidad que encontraron en San Rafael el terruño ideal para la expresión de sus mejores cualidades. Vino aun de color rojo violáceo intenso a pesar de los años en botella, ya en la copa se nos muestra intenso y seductor con aromas especiados que se entremezclan con nítidos y frescos aromas a frutas de ciruelas, cerezas negras y moras, mientras que se van desprendiendo lentamente los aromas tostados que recuerdan a granos de café molidos.
After an excellent harvest like the one that preceded it, the 2009 vintage shows its virtues in this Cabernet Sauvignon base wine, plus the assembly of three varieties of great personality that found in San Rafael the ideal terroir for the expression of its best qualities. It was still intense violet red despite years in the bottle, already in the glass it is intense and seductive with spicy aromas that intermingle with clear and fresh aromas of plums, black cherries and blackberries, while leaving slowly releasing toasted aromas reminiscent of ground coffee beans.
After a great harvest like the one that preceded it, the 2009 harvest shows its virtues in this cuvée Cabernet Sauvignon. It’s an expressive mixture, articulated by three varieties of great personality that are found in San Rafael, the perfect region to bring out its best qualities. The wine still preserves a strong purplish-red color, in spite of the amount of years gone by since it was bottled. Once poured into the glass, it remains intense and seductive. Its spiced scents mix together with clear and fresh fruit aromas of plum, black cherry and blackberry, while its toasted scents release slowly, reminiscent of the delicious smell of ground coffee beans.
You probably guessed right: the first translation was done by Google; the second, by a professionally trained bilingual translator. As you can see, the machine translation is comprehensible, but inelegant and sometimes confusing. On the other hand, the human translation flows smoothly, is organized coherently, and matches the elegant tone of the original text.
Machine translation isn’t without its perks. It can be a life-saver in a pinch, when you need to get a rough idea of what a certain phrase means, or when you need to decipher a street sign while traveling abroad. However, when it comes to translating important documents, the limitations of machine translation prevent it from being a viable option. As the above examples demonstrate, a translation that is based on statistical patterns will never match the quality of one created by a professional, who understands the rules and nuances of language. Indeed, machine translation has come a long way, but it’s still far from replacing the human touch.
This post is from Paul, an English teacher who lives in Argentina. Paul writes on behalf of Language Trainers, a language teaching service that offers foreign language movie reviews as well as other free language-learning resources on its website. Check out their Facebook page or send an email to paul(at)languagetrainers.com for more information.