There have been a lot of popular papers in the past decade or so that make use of textual analysis. A fun one is “The Mainstreaming of Marx” by Magness & Makovi. They use Google Ngram to analyze the popularity of people mentioned in books and determine when Karl Marx became popular. “Measuring Economic Policy Uncertainty” by Baker, Bloom, & Davis is one of my favorites. They use set logic to flag newspaper articles that denote economic policy uncertainty: an article counts only if it contains at least one term from each of three sets (economy, policy, and uncertainty terms). In this post, I’m just going to describe the practical differences between the two data sources and how their interpretations differ.
Ngram
Ngram takes a term and measures how popular that term is in its corpus of book text, which covers about 6% of all books ever written (in English, anyway). Because popularity is expressed as a percentage of all words in a given year (strictly, of all n-grams of the same length), we can compare popularity levels directly across words. For example, take “cafe” and “coffee shop”. In the figure below, we can see that “cafe” was more popular in books until very recently.
What does it mean? Well, in this case, the difference is a bit trivial. The corpus doesn’t include periodicals or any measure of how many people read these words. But because books take a lot of money and time to write, produce, and distribute (non-digital books, anyway), the terms they contain reflect meanings that run deeper in the culture and are not merely fleeting. Further, academics write non-fiction, which means that specialist ideas are detected too (assuming they appear in at least 40 books).
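To make the “percent of all n-grams” measure concrete, here is a minimal Python sketch of how an Ngram-style relative frequency could be computed. The helper `relative_frequency` and the toy corpus are hypothetical, and tokenization is assumed to be a simple whitespace split; Google computes these shares on its own scanned corpus, so this only illustrates the arithmetic.

```python
from __future__ import annotations

from collections import Counter


def relative_frequency(phrase: str, tokens_by_year: dict[int, list[str]]) -> dict[int, float]:
    """Hypothetical helper: percent of all n-grams (n = words in phrase) equal to the phrase, per year."""
    target = tuple(phrase.lower().split())
    n = len(target)
    shares = {}
    for year, tokens in tokens_by_year.items():
        lowered = [t.lower() for t in tokens]
        # Every run of n consecutive tokens is one n-gram.
        ngrams = [tuple(lowered[i:i + n]) for i in range(len(lowered) - n + 1)]
        counts = Counter(ngrams)
        shares[year] = 100 * counts[target] / len(ngrams) if ngrams else 0.0
    return shares


# Toy corpus (made up): whitespace-split text for a single year.
toy = {1900: "the cafe by the coffee shop".split()}
print(relative_frequency("cafe", toy))         # share of all unigrams
print(relative_frequency("coffee shop", toy))  # share of all bigrams
```

Because each share is a percent of the same year’s total, the two numbers sit on the same scale, which is what makes Ngram levels directly comparable across terms.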

Newspapers
Services such as newspapers.com and ProQuest do something similar for periodicals (omitting books). For newspapers, authors tend to measure the number of articles that contain certain terms. But because newspapers differ in audience, specialization, and total article count, an index must be created for each set of terms, and the levels of different indices are not comparable with one another.* What we get instead are series that can be compared in terms of percent change. We can’t measure the popularity of ideas, but we can measure the relative change in their popularity. (Again, we’re missing the number of eyeballs on each article.) Most indices are constructed at the monthly level. Below are two indices comparing the relative change in “café” and “coffee shop” over time.
What does it mean? It’s well documented that newspapers adjust their content to fit the interests and preferences of their readers. These tend to be entertainment, real shocks, policy, and financial news. It’s also well documented that newspaper salience can be pinned down to within a day or two. So newspaper mentions tend to measure more immediate salience. Except for entertainment, this salience has clear economic motivations for consumer interest. What we see in the figure below is that something happened around 1920 (*cough-cough*, Prohibition, *cough*) that caused a substantial rise in interest in, and use of, the term “coffee shop”, and that it began a decades-long rise in the term’s relevance. That crash in values in 1933? That’s when Prohibition ended. (“Cafe” also rose at the beginning of Prohibition.)

Note that Ngram can tell us that “café” was more popular than “coffee shop” for most of the period. But newspapers can give us much finer detail, being monthly rather than annual data. Even the above figure uses a 3-month moving average to reduce the noise. That difference in frequency reflects the different meanings of the sources. We read newspapers today or this week. We dwell on the ideas in books for eras.
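A 3-month moving average like the one in the figure can be sketched in a few lines of Python. Whether the original figure used a trailing or a centered window isn’t stated, so this sketch assumes a trailing window; the function name and the monthly values are hypothetical.

```python
from __future__ import annotations


def moving_average(values: list[float], window: int = 3) -> list[float]:
    """Trailing moving average; the first window - 1 periods are dropped for lack of a full window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


monthly = [3.0, 6.0, 9.0, 12.0]  # hypothetical monthly index values
print(moving_average(monthly))   # [6.0, 9.0]
```

Smoothing trades timing precision for readability: each point now blends three months, which partly gives up the day-to-week precision that makes newspaper data attractive in the first place.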
* The procedure is to count the proportion of articles that satisfy the search criteria in each period, then standardize the results by newspaper, average across newspapers, and finally rescale the index so that its average equals an arbitrary number, say 100. One could instead force the indices to the same value in a particular base period.
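The footnote’s procedure can be sketched end to end in Python. This is an illustration under stated assumptions, not the exact method of any particular paper: the helper names and article data are hypothetical, matching is plain lowercase substring search (so “cafe” would also match “cafeteria”), and “standardize” is read as dividing each newspaper’s series by its own standard deviation, which keeps levels positive so the final rescaling to 100 is well defined. The term-set rule mirrors the set logic mentioned at the top: an article counts only if it hits every set.

```python
from __future__ import annotations

import statistics


def matches(text: str, term_sets: list[set[str]]) -> bool:
    """True if the text contains at least one term from every set (plain substring search)."""
    lowered = text.lower()
    return all(any(term in lowered for term in terms) for terms in term_sets)


def share_by_period(articles: list[list[str]], term_sets: list[set[str]]) -> list[float]:
    """articles[t] holds one newspaper's article texts for period t; returns the matching share per period."""
    return [
        sum(matches(text, term_sets) for text in period) / len(period) if period else 0.0
        for period in articles
    ]


def build_index(shares_by_paper: dict[str, list[float]], target_mean: float = 100.0) -> list[float]:
    """Scale each paper to unit standard deviation, average across papers, rescale to target_mean."""
    scaled = []
    for paper, shares in shares_by_paper.items():
        sd = statistics.pstdev(shares)
        if sd == 0:
            raise ValueError(f"{paper}: series has no variation to standardize")
        scaled.append([s / sd for s in shares])
    # Average across newspapers, period by period (all series must cover the same periods).
    averaged = [statistics.fmean(values) for values in zip(*scaled)]
    # Multiplicative rescaling: levels become arbitrary, percent changes are preserved.
    factor = target_mean / statistics.fmean(averaged)
    return [a * factor for a in averaged]


# Hypothetical data: two newspapers, three periods, a single term set.
terms = [{"coffee shop"}]
paper_a = [["a new coffee shop", "city news"], ["coffee shop opens", "coffee shop closes"], ["weather"]]
paper_b = [["sports", "coffee shop review", "politics"], ["coffee shop", "coffee shop", "jazz"], ["markets", "crime", "coffee shop"]]
shares = {"A": share_by_period(paper_a, terms), "B": share_by_period(paper_b, terms)}
print(build_index(shares))
```

Because the last step only multiplies by a constant, the period-to-period percent changes of the index are the same whatever target is chosen, which is why indices built this way are compared by percent change rather than by level.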