I’m going to teach text mining in the upcoming week. Most of my students have never heard of it. We have spent the semester talking about what do to with structured data, which includes some of the basic concepts from traditional statistics.
I often ask them to think about what computers can do. We talk about why “data analytics” classes are happening in 2020 and did not happen in 1990. Hardware and software innovations have expanded the boundaries of what computers can do for us.
The gritty details of how text mining works can make for a boring lecture, so I’m going to use the following narrative to get intellectually curious students on board. It always helps to start with fighting Nazis. Alan Turing helps defeat the Nazis by using a proto-computer to crack codes. The same brilliant Turing was smart enough to realize that computer could play chess someday (acknowledgement for me knowing that trivia: Average is Over). Turing didn’t live to see computers beat humans in chess but, in a sense, it didn’t take very long. Only about 50 years later, computers beat humans at chess.
Maybe chess is exactly the kind of thing that is hard for humans and easy for computers. When we discuss basic data mining, I tell students to think about how computers can do simple calculations much faster than humans can. It’s their comparative advantage.
Could Turing ever have imagined that a human seeking customer service from a bank could chat with a bot? Maybe text mining is a big advance over chess, but it only took about one decade longer for a computer (developed by IBM) to beat a human in Jeopardy. Winning Jeopardy requires the computer to get meaning from a sentence of words. Computers have already moved way beyond playing a game show to natural language processing.
How computers make sense of words starts with following simple rules, just as computer do to perform data mining on a spreadsheet of numbers. As I explain those rules to my students this week, I’m hoping that starting off the lecture with fighting Nazis will help them persevere through the algorithms.