Translation technology (and its impact on English’s dominance)

By Brian Flaherty
February 11, 2024

Today we’re exploring translation technology.

With over 7,000 languages worldwide, overcoming the language barrier is one of the oldest problems in human history.

For decades, English has been the world’s “glue,” providing the foundation for trade, tourism, and science.

But now that translation technology is getting scarily good, will this affect English’s role as the global lingua franca?

Let’s go 👇


Solving the language barrier

Language barriers have been a problem since writing was invented 5,400 years ago.

Logically, there are only two ways to overcome a language barrier:

Get all parties to use a common tongue (even if it’s not their primary one), or

Have everything translated

Translation used to be labor-intensive and costly. It just wasn’t practical for most people. So establishing common languages became the best way to break through the barrier.

But now we have some astounding recent developments in translation technology. So this dynamic may be starting to change.

And if it does, the implications for the future of language and communication are profound.

While the famous trilingual Rosetta Stone is over 2,000 years old, the earliest known bilingual tablets are twice as old. Image: Stefan von Imhof, from The Debate over Artifact Ownership

To understand how language barriers might be overcome in the future, let’s start with how they’ve been overcome in the recent past — through English.

How English became the world’s language

The world has settled on English as the common language.

It’s the most popular language in the world, with 300 million more speakers than the next runner-up, Mandarin.

Worldwide speakers per language. English is the global “lingua franca” – which is a bit ironic since that phrase roughly means “language of the French.” Image: Statista

The vast majority of English speakers learn it as a second language.

English has only about 400 million native speakers (behind both Mandarin and Spanish), implying that over a billion non-native speakers have learned it.

Duolingo’s 2023 language report shows that in countries where English is not the native language, English is usually the most popular choice. But is Duolingo’s business model at risk? Is this why they recently added math and music? Image: Duolingo

English’s dominance is best exemplified by the fact that the European Commission operates in English, even though Britain left the EU in 2020 and English is an official language in only two relatively small member states (Ireland and Malta).

And English’s reach spreads far beyond government. It’s the global standard for both international business and technological innovation:

Business: Multinational companies have basically all adopted English as their official inter-company language, regardless of where they’re based. Examples include Nokia (Finland), Airbus (France), and Samsung (Korea).

Science: Every one of the world’s 100 most influential scientific journals is published in English. Non-English papers have significantly lower citation rates.

Air travel: English is literally the language of the skies. It’s the official language of international air traffic control.

Programming: The world’s most common programming languages are all English-based (though there are exceptions).

It’s easy to forget that most of the world’s programming languages are based on English. Developers around the world need to learn English in addition to JavaScript, Python, etc.

So how did English become the global standard?

Network effects

The value of learning a language is strongly impacted by the number of people who already speak it.

This is the economic idea of agglomeration, or network effects. It constantly pops up: Social media companies, venture capital, ski resorts, etc.

Whichever language has a slight head start in total use can quickly snowball into a global standard.
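To see how a small lead can snowball, here is a rough Python sketch. Everything in it is an assumption made for illustration: two competing languages, a speaker pool that grows 5% per round, and newcomers who favor the bigger language in proportion to the square of its share. It is a toy model, not an estimate of real language adoption.

```python
def simulate_adoption(share_a: float, rounds: int = 50, newcomers: float = 0.05) -> float:
    """Return language A's share of all speakers after `rounds` of new learners.

    Hypothetical model: each round the pool grows by `newcomers` (a fraction
    of its current size), and each newcomer picks language A with probability
    share_a**2 / (share_a**2 + share_b**2). Squaring the shares is an assumed
    stand-in for "bigger networks are more attractive".
    """
    a, b = share_a, 1.0 - share_a
    for _ in range(rounds):
        total = a + b
        sa, sb = a / total, b / total
        p_a = sa**2 / (sa**2 + sb**2)   # chance a newcomer chooses A
        joining = newcomers * total     # number of newcomers this round
        a += joining * p_a
        b += joining * (1.0 - p_a)
    return a / (a + b)


if __name__ == "__main__":
    print(simulate_adoption(0.50))  # a dead heat stays a dead heat
    print(simulate_adoption(0.52))  # a slight head start keeps widening
```

In this toy model, the language that starts with 52% of speakers ends with a larger share than it began with, and the gap keeps growing the longer you run it.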

English got its head start with the rise of the British Empire, which at its peak in 1921 ruled over nearly a quarter of the world (in both land area and population).

Britain may have peaked as far as being a world superpower goes. But English-speaking countries punch way above their weight. (Too bad you can’t patent a language!) Image: Nationmaster

Languages in the same family are easier to learn

One underrated fact about English is that it’s an Indo-European language.

Languages in the same family tend to have similar alphabets and structures, making them easier to learn as a secondary language. Today, about half the world speaks a language in this Indo-European family.

This is why Mandarin is unlikely to replace English as a global language. As a member of the Sino-Tibetan language family, Mandarin is fundamentally harder for Indo-European speakers to learn.

Unlike Mandarin, English is part of the Indo-European language family (colored here in green). Image: Alumnum

But there is one area where English is less dominant than you might expect — media.

Is English’s grip on media weakening?

English is considered the language of the internet. But in reality, the online world is actually pretty multilingual.

Studies of the top languages on Twitter find that only half of tweets are in English. A plurality, sure. But not exactly total dominance.

English is hardly the only language on Twitter, with about 19% of tweets in Japanese and 9% in Portuguese. This data mirrors what we see with content sites. Image: PARC

Moreover, foreign-language media has been having something of a renaissance in recent years, including:

Films and shows like Money Heist and Parasite

Musical artists like Bad Bunny and BTS

Anime, which is now reaching a more mainstream audience through apps like Crunchyroll

English is an economic solution: it’s dominant in practical areas like coding, research, and communicating with foreign travelers. But it’s less dominant in cultural domains.

In fact, culture is becoming tailored for a global audience, undermining the old assumption that international consumers must adapt to English:

The rise of moviegoers in China has created an incentive for Hollywood to make movies that appeal to Chinese audiences as well as Western ones… You get a lot of movies…which dominated the Chinese box office but barely registered in the US. -Conor Sen, Bloomberg (2018)

English’s dominance is mostly about economics

To get things done, the world needs to overcome language barriers.

We need a tool, and English is simply the most cost-effective tool to do so.

But here’s the thing — if English is really nothing more than a cost-effective solution to overcome language barriers, it’s liable to be replaced by an even better solution.

And in recent years, one viable alternative to global English domination has emerged: universal translation technology.

A quick history of machine translation

Machine translation has been around for a while, but it’s only recently that computers have been able to translate anywhere close to human level.

The ultimate goal is a universal translator, a device commonly found in sci-fi that allows instantaneous communication across any language barrier.

The universal translator is a common trope in science fiction, appearing in Doctor Who, Men in Black, and Star Trek. Image: TravisTranslator

How far has machine translation come, and how far is there left to go?

The ‘rules-based’ approach

One of the earliest attempts at machine translation was done by SYSTRAN, founded in California in 1968.

In another example of how America’s national security industry launched Silicon Valley, SYSTRAN’s earliest customer was the US military, who wanted to translate Russian documents.

SYSTRAN was an early innovator of the rules-based approach to machine translation.

It’s a three-step process:

Deconstruct the sentence into individual words (and their role in the sentence)

Translate each word into the target language, one by one

Reconstruct the sentence using the rules of the target language

Sounds good in theory, right?

Unfortunately, language is too messy for this rules-based approach to work well.

First, words can have multiple definitions. “Book” can translate to libro or reservar in Spanish, depending on the context.

Second, rules-based systems struggle with input that doesn’t follow the formal rules of a language (which is often the case with transcribed speech).
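To make the three steps (and the “book” problem) concrete, here is a minimal Python sketch. The four-word lexicon, the role tags, and the single reordering rule are all invented for illustration; this is not how SYSTRAN was actually implemented.

```python
# Toy English-to-Spanish rules-based translator (hypothetical lexicon and rule).

# One fixed translation and one grammatical role per English word.
LEXICON = {
    "the":   ("la",    "DET"),
    "red":   ("roja",  "ADJ"),
    "house": ("casa",  "NOUN"),
    "book":  ("libro", "NOUN"),  # a one-entry-per-word table can hold only one sense
}


def analyze(sentence):
    """Step 1: split the sentence into words and tag each with its role."""
    return [(word, LEXICON[word][1]) for word in sentence.lower().split()]


def translate_words(tagged):
    """Step 2: translate each word on its own, one by one."""
    return [(LEXICON[word][0], role) for word, role in tagged]


def reorder(tagged):
    """Step 3: apply a target-language rule (adjectives follow nouns)."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def translate(sentence):
    return " ".join(word for word, _ in reorder(translate_words(analyze(sentence))))


if __name__ == "__main__":
    print(translate("the red house"))   # la casa roja
    print(translate("book the house"))  # libro la casa
```

The first sentence comes out fine. The second shows the failure described above: the system has no way of knowing that “book” is a verb here, so it outputs the noun libro where a human translator would reach for a form of reservar.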

The rules-based approach was considered cutting-edge until someone realized that translation is less about linguistics, and more about statistics.

Statistical Machine Translation (SMT)

The ideas behind statistical machine translation (SMT) were known as far back as the 1950s, but it wasn’t until IBM started experimenting with statistics in the 1980s that people began taking the approach seriously.

Frederick Jelinek led the IBM team working on SMT. He’s credited with the famous quote: “Every time I fire a linguist, the performance of the speech recognizer goes up,” a reminder that translation isn’t strictly a linguistics problem. Image: Johns Hopkins

Statistical approaches are more contextual than rules-based ones: they translate whole phrases rather than isolated words.

As the name implies, these translations are based on statistical probabilities, with likelihoods generated by learning from underlying data sets.

While a rules-based approach would understand “the curry restaurant” as three separate words, a statistical approach recognizes that it’s all part of the same phrase. Image: Kantanmtblog
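Here is a small Python sketch of the statistical idea. The aligned phrase pairs are made up and stand in for a large bilingual data set, and the greedy longest-phrase lookup is a simplification chosen for brevity. Real SMT systems score many candidate translations and also weigh how natural the output sounds.

```python
from collections import Counter, defaultdict

# Hypothetical aligned phrase pairs, standing in for a large bilingual corpus.
ALIGNED = [
    ("book", "libro"), ("book", "libro"), ("book", "libro"),
    ("book", "reservar"),
    ("book a table", "reservar una mesa"),
    ("book a table", "reservar una mesa"),
    ("a table", "una mesa"),
]


def build_phrase_table(pairs):
    """Estimate P(target | source) as a relative frequency in the data."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src][tgt] += 1
    return {
        src: {tgt: n / sum(tgts.values()) for tgt, n in tgts.items()}
        for src, tgts in counts.items()
    }


def translate(sentence, table, max_len=3):
    """Greedily match the longest known phrase, then pick its likeliest translation."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in table:
                out.append(max(table[phrase], key=table[phrase].get))
                i += n
                break
        else:
            out.append(words[i])  # unknown word: pass it through unchanged
            i += 1
    return " ".join(out)


TABLE = build_phrase_table(ALIGNED)

if __name__ == "__main__":
    print(TABLE["book"])                     # libro: 0.75, reservar: 0.25
    print(translate("book", TABLE))          # libro
    print(translate("book a table", TABLE))  # reservar una mesa
```

On its own, “book” still comes out as libro because that is the most frequent pairing in the data. But “book a table” is matched as a single phrase, so the verb sense wins without anyone writing a grammar rule.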

Despite some drawbacks, SMT is very powerful. For years, this was the technology powering Google Translate.

But in 2016, Google swapped statistical models for an even better approach: neural machine translation.

The modern standard: Neural machine translation (NMT)

In translation, context is everything.

Just as SMT started translating phrases instead of words, NMT started translating not just phrases, but entire sentences.

NMT models rely on artificial neural networks. The first major NMT system was introduced by Chinese company Baidu in 2015. A year later, Google followed suit with their own model, dramatically improving accuracy. Image: Google

NMT models have come closer to matching human performance in translation than any other technology.

Upon introduction in 2016, Google’s GNMT performed much better than PBMT (a statistical model) — although both still lagged behind human performance. Image: Google

NMT models have also opened the door to a fascinating new capability: being able to translate between language pairs for which the model has no bilingual training data! (This is called zero-shot translation.)

Google’s old SMT model had an “intermediary” translation to English. But this seriously hampered its accuracy, and NMT models don’t need it.

Here’s how zero-shot translation works:

An NMT model learns to translate between a primary pair with plenty of data (like Korean to English).

The model also learns to translate between a secondary pair (like English to Japanese).

Having fine-tuned its parameters, the model can spontaneously translate between Korean and Japanese — despite never having seen any Korean-to-Japanese translation.
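One way to picture the steps above: in Google’s published description of its multilingual system, a single shared model is told which language to produce by a special token placed at the start of the input. The Python sketch below only builds that input and checks whether a request is zero-shot. The helper names and the set of trained pairs are hypothetical, and the token format is shown for illustration only.

```python
# Language pairs the (hypothetical) model saw during training.
TRAINED_PAIRS = {("ko", "en"), ("en", "ja")}


def tag_for_target(text: str, target_lang: str) -> str:
    """Prefix the source sentence with a token naming the desired output language."""
    return f"<2{target_lang}> {text}"


def is_zero_shot(src: str, tgt: str) -> bool:
    """True if the model knows both languages but never saw this exact pair."""
    seen_sources = {s for s, _ in TRAINED_PAIRS}
    seen_targets = {t for _, t in TRAINED_PAIRS}
    return (
        (src, tgt) not in TRAINED_PAIRS
        and src in seen_sources
        and tgt in seen_targets
    )


if __name__ == "__main__":
    print(tag_for_target("안녕하세요", "ja"))  # <2ja> 안녕하세요
    print(is_zero_shot("ko", "en"))            # False: trained directly
    print(is_zero_shot("ko", "ja"))            # True: a new combination of known parts
```

From the model’s point of view, Korean-to-Japanese is not an entirely new task. It is a source language it has read before combined with a target token it has written for before.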

Google’s previous SMT approach worked like a giant game of ‘telephone’. Translating from Japanese to Korean meant first translating from Japanese to English, then English to Korean. In contrast, zero-shot translation is direct.
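A back-of-the-envelope way to see why the “telephone” approach hurts: if each hop has some chance of garbling a sentence, those chances multiply. The Python sketch below assumes errors at each hop are independent, and the 90% figures are invented purely for illustration. They are not measurements of any real system, and a direct zero-shot translation is not guaranteed to match single-hop accuracy either.

```python
import math


def pivot_accuracy(*hop_accuracies: float) -> float:
    """Chance a sentence survives every hop intact, assuming independent errors."""
    return math.prod(hop_accuracies)


if __name__ == "__main__":
    # Hypothetical numbers: each hop preserves the meaning 90% of the time.
    print(pivot_accuracy(0.9))       # one direct hop
    print(pivot_accuracy(0.9, 0.9))  # Japanese -> English -> Korean
```

With these made-up numbers, routing through English drops the success rate from 90% to roughly 81%. Every extra hop compounds the loss.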

Similarly, NMT models have made direct speech-to-speech translation possible, without first having to transcribe audio into text.

Google and Meta have both successfully experimented with this approach, which could help retain non-linguistic aspects of conversations (like timing, pauses, and laughter) during real-time translations.

The future of translation

English has historically been used as a common language to overcome language barriers.

But now, translation technology is finally good enough to be a cost-effective replacement for English.

So, what does this mean for English? What does the future of translation look like? And what companies are building that future?

The most exciting translation companies

Unsurprisingly, Big Tech is still the biggest player in translation, with Google, Amazon, Microsoft, and Meta all boasting expansive research in the space.

But here are some startups around the world to pay attention to:


Author

Brian Flaherty

Brian's interest in finance started from an early age, when he used money saved from working summer jobs to purchase his first mutual fund at 15. He went on to pursue the field in school, eventually graduating from the University of Virginia with a Bachelor's degree in Economics. After graduation, Brian put his expertise to work advising institutions and high-net-worth investors as a strategist at a wealth management firm. Recently, Brian transitioned to pursue a career as a financial writer, where he leverages his writing skills and his financial knowledge to help investors uncover the best opportunities and make intelligent use of their capital.
