A Random Walk through the English Language

Jordan Ellenberg

Here’s a game Claude Shannon, the founder of information theory, invented in 1948. He was trying to model the English language as a random process. Go to your bookshelf, pick up a random book, open it and point to a random spot on the page, and mark the first two letters you see. Say they’re I and N. Write down those two letters on your page.

Now, take another random book off the shelf and look through it until you find the letters I and N in succession. Whatever the character *following *“IN” is—say, for instance, it’s a space—that’s the next letter of your book. And now you take down yet another book and look for an N followed by a space, and once you find one, mark down what character comes next. Repeat until you have a paragraph

“IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID

PONDENOME OF DEMONSTURES OF THE REPTAGIN IS

REGOACTIONA OF CRE”

That isn’t English, but it kind of *looks* like English.

Shannon was interested in the “entropy” of the English language, a measure, in his new framework, of how much information a string of English text contains. The Shannon game is a Markov chain; that is, it’s a random process where the next step you take depends only on the current state of the process. Once you’re at LA, the “IN NO IST” doesn’t matter; the chance that the next letter is, say, a B is the probability that a randomly chosen instance of “LA” in your library is followed by a B.

And as the name suggests, the method wasn’t original to him; it was almost a half-century older, and it came from, of all things, a vicious mathematical/theological beef in late-czarist Russian math.

There’s almost nothing I think of as more inherently intellectually sterile than verbal warfare between true religious believers and movement atheists. And yet, this one time at least, it led to a major mathematical advance, whose echoes have been bouncing around ever since. One main player, in Moscow, was Pavel Alekseevich Nekrasov, who had originally trained as an Orthodox theologian before turning to mathematics. His opposite number, in St. Petersburg, was his contemporary Andrei Andreyevich Markov, an atheist and a bitter enemy of the church. He wrote a lot of angry letters to the newspapers on social matters and was widely known as Neistovyj Andrei, “Andrei the Furious.”

The details are a bit much to go into here, but the gist is this: Nekrasov thought he had found a mathematical proof of free will, ratifying the beliefs of the church. To Markov, this was mystical nonsense. Worse, it was mystical nonsense wearing mathematical clothes. He invented the Markov chain as an example of random behavior that could be generated purely mechanically, but which displayed the same features Nekrasov thought guaranteed free will.

A simple example of a Markov chain: a spider walking on a triangle with corners labeled 1, 2, 3. At each tick of the clock, the spider moves from its present perch to one of the other two corners it’s connected to, chosen at random. So, the spider’s path would be a string of numbers

1, 2, 1, 3, 2, 1, 2, 3, 2, 3, 2, 1 …

Markov started with abstract examples like this, but later (perhaps inspiring Shannon?) applied this idea to strings of text, among them Alexander Pushkin’s poem *Eugene Onegin*. Markov thought of the poem, for the sake of math, as a string of consonants and vowels, which he laboriously cataloged by hand. Letters after consonants are 66.3 percent vowels and 33.7 percent consonants, while letters following vowels are only 12.8 percent vowels and 87.2 percent consonants.

So, you can produce “fake Pushkin” just as Shannon produced fake English; if the current letter is a vowel, the next letter is a vowel with probability 12.8 percent, and if the current letter is a consonant, the next one is a vowel with probability 66.3 percent. The results are not going to be very poetic; but, Markov discovered, they can be distinguished from the Markovized output of other Russian writers. Something of their style is captured by the chain.

Nowadays, the Markov chain is a fundamental tool for exploring spaces of conceptual entities much more general than poems. It’s how election reformers identify which legislative maps are brutally gerrymandered, and it’s how Google figures out which Web sites are most important (the key is a Markov chain where at each step you’re at a certain Web site, and the next step is to follow a random link from that site). What a neural net like GPT-3 learns—what allows it to produce uncanny imitation of human-written text—is a gigantic Markov chain that counsels it how to pick the next word after a sequence of 500, instead of the next letter after a sequence of two. All you need is a rule that tells you what probabilities govern the next step in the chain, given what the last step was.

You can train your Markov chain on your home library, or on *Eugene Onegin,* or on the huge textual corpus to which GPT-3 has access; you can train it on anything, and the chain will imitate that thing! You can train it on baby names from 1971, and get:

Kendi, Jeane, Abby, Fleureemaira, Jean, Starlo, Caming, Bettilia …

Or on baby names from 2017:

Anaki, Emalee, Chan, Jalee, Elif, Branshi, Naaviel, Corby, Luxton, Naftalene, Rayerson, Alahna …

Or from 1917:

Vensie, Adelle, Allwood, Walter, Wandeliottlie, Kathryn, Fran, Earnet, Carlus, Hazellia, Oberta …

The Markov chain, simple as it is, somehow captures something of the *style* of naming practices of different eras. One almost experiences it as creative. Some of these names aren’t bad! You can imagine a kid in elementary school named “Jalee,” or, for a retro feel, “Vensie.”

Maybe not “Naftalene,” though. Even Markov nods.

Go to Source

Author: Jordan Ellenberg {authorlink}

https://static.scientificamerican.com/sciam/assets/Image/newsletter/salogo.png Scientific American Content: Global http://rss.sciam.com/ScientificAmerican-Global