Automating History’s First Draft  

As 2019 draws to a close, prepare for endless roundups of the year’s most important news stories. But few of those stories may be remembered by 2039: new research shows the difficulty of predicting which events will make the history books.

Philosopher Arthur Danto argued in 1965 that even the most informed person, an “ideal chronicler,” cannot judge a recent event’s ultimate significance because it depends on chain reactions that have not happened yet. Duncan Watts, a computational social scientist at the University of Pennsylvania, had long wanted to test Danto’s idea. He got his chance when Columbia University historian Matthew Connelly suggested analyzing a set of two million declassified State Department cables sent between 1973 and 1979, along with a compendium of the 0.1 percent of them that turned out to be the most historically important (compiled by historians decades after their transmission).

Connelly, Watts and their colleagues first scored each cable’s “perceived contemporaneous importance” (PCI), based on metadata such as how urgent or secret it had been rated. This score corresponded only weakly with inclusion in the later compendium, they reported in September in Nature Human Behaviour: the highest-scoring cables were only four percentage points more likely to be included than the lowest-scoring ones. The most common prediction errors were false positives—cables that got high scores but later proved unimportant. “I do think there’s a kind of narcissism of the present,” Connelly says. “I’ve been struck by how many times sports fans say, ‘That’s one for the history books.’”

Next, Watts says, to approximate an ideal chronicler, the scientists decided to “build the beefiest, fanciest machine-learning model we could and throw everything into it—all the metadata, all the text.” The resulting AI algorithm significantly outperformed humans’ contemporaneous judgment. In one statistical measure of its ability to pick out cables later deemed significant, where 1 denotes no incorrect inclusions or exclusions, it scored 0.14, whereas the PCI scored 0.05. Although the algorithm’s performance was far from perfect, the researchers suggest that such an “artificial archivist” could help to narrow the field of events to highlight for posterity. When tuned for this purpose, their model weeded out 96 percent of the cables while retaining 80 percent of those that wound up in the compendium.
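The measure described above—1 meaning no incorrect inclusions (false positives) and no incorrect exclusions (false negatives)—reads like the standard F1 score, the harmonic mean of precision and recall. As a minimal sketch (not the authors’ code; the counts below are illustrative, built only from the article’s round numbers):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 score from raw counts: true positives, false positives,
    false negatives. Equals 1.0 only when there are no incorrect
    inclusions (fp == 0) and no incorrect exclusions (fn == 0)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Perfect selection: every compendium cable picked, nothing extra.
print(f1_score(tp=2000, fp=0, fn=0))  # 1.0

# Illustrative numbers for the "artificial archivist" tuning:
# 2,000,000 cables, of which 0.1% (2,000) made the compendium.
# Keeping 4% of all cables (80,000) while retaining 80% of the
# compendium (1,600) gives high recall but low precision.
kept_total = 80_000
kept_important = 1_600
print(round(f1_score(tp=kept_important,
                     fp=kept_total - kept_important,
                     fn=2_000 - kept_important), 3))
```

This also shows why the filtering result and the 0.14 score are not the same tuning: sweeping in 96 percent fewer cables while keeping 80 percent of the important ones optimizes recall at the cost of precision, whereas F1 weighs both.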

Emily Erikson, a sociologist at Yale University, who was not involved in the new research, says that despite its use of imperfect data—compendium inclusion was up to the subjective judgment of a few historians, for example—the study offers both a practical tool and a test of Danto’s hypothesis. “To see a machine-learning empirical test of this conceptual puzzle is really exciting,” she says, “and just kind of fun to think through.”