Intro to Textual Indices: Ngrams & Newspapers

Over the past decade or so, a lot of popular papers have made use of textual analysis. A fun one is “The Mainstreaming of Marx” by Magness & Makovi. They use Google Ngram to analyze the popularity of people mentioned in books and determine when Karl Marx became popular. “Measuring Economic Policy Uncertainty” by Baker, Bloom, & Davis is one of my favorites. They use set theory to detect terms in newspapers that denote economic policy uncertainty. In this post, I’m just going to describe the practical differences between the two data sources and how their interpretations differ.

Ngram

Ngram takes a term and measures how popular that term is in its corpus of book text, which covers about 6% of all books ever written (in English, anyway). Because popularity is expressed as a percent, we can directly compare popularity levels across words. For example, take “cafe” and “coffee shop”. In the figure below, we can see that the word “cafe” was more popular in books until very recently.
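To make the "percent" idea concrete, here is a minimal sketch of how a term's share could be computed from raw text. It assumes the share is the term's count divided by the number of same-length word sequences in that year's text. The toy corpus, the `percent_share` helper, and the whitespace tokenization are all invented for illustration; Google's actual tokenization and normalization are more involved.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def percent_share(text, term):
    """Percent of the text's n-grams equal to `term` (n = number of words in term)."""
    tokens = text.lower().split()          # naive whitespace tokenization (assumption)
    target = tuple(term.lower().split())
    grams = ngrams(tokens, len(target))
    if not grams:
        return 0.0
    return 100.0 * Counter(grams)[target] / len(grams)


# Hypothetical toy corpus: one short "book" per year, made up for this example.
toy_corpus = {
    1950: "we met at the cafe and the cafe was full",
    2010: "we met at the coffee shop near the cafe",
}

# Share of each term in each year, expressed as a percent.
shares = {
    year: {term: percent_share(text, term) for term in ("cafe", "coffee shop")}
    for year, text in toy_corpus.items()
}

for year, by_term in shares.items():
    print(year, by_term)
```

Because both terms are scaled by the size of the text they come from, the resulting numbers sit on the same scale and can be compared across terms and across years, which is what makes the level comparison in the paragraph above possible.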
