Book Review: Big Data Demystified

Last year, our economics department launched a data analytics minor program. The first class is a simple two-credit course called Foundations of Data Analytics. Originally, the idea was that liberal arts majors would take it and that this class would be a soft, non-technical intro to terminology and history.

However, it turned out that liberal arts majors didn’t take the class and that the most common feedback was that the class lacked technical challenge. I’m prepping to teach the class and it will have two components. The first is a Python training component where students simply learn Python. We won’t do super complicated things, but they will use Python extensively in future classes. The second component is still in the vein of the old version of the course.

I’ll have the students read and discuss “Big Data Demystified” by David Stephenson. He spends 12 brief chapters introducing the reader to the importance of modern big data management and analytics, and to how they fit into an organization’s key performance indicators. It reads like it’s for business majors, but any type of medium-to-large organization would find it useful.

Stephenson starts with some flashy stories that illustrate the potential of data-driven business strategies. For example, Target used predictive analytics to advertise baby and pregnancy products to mothers who didn’t even know that they were pregnant yet. He whets the appetite of the reader by noting that the supercomputers that could play chess and Go relied on fundamentally different technologies.

The first several chapters of the book excite the reader with thoughts of unexploited potentialities. This is what I want to impress upon the students. I want them to know the difference between artificial intelligence (AI) and machine learning (ML). I want them to recognize which tool is better for the challenges that they might face and to see clear applications (and limitations).

AI uses brute force, iterating through possible next steps. There are multiple online tic-tac-toe AIs that keep a running record of wins, losses, and draws. If a student can play the optimal set of strategies 8 games in a row, then they can get the general idea behind testing a large variety of statistical models and explanatory variables, then choosing the best.
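Since the students will be learning Python anyway, here is a minimal sketch of what "iterating through possible next steps" could look like for tic-tac-toe. It is not from Stephenson's book, and it is not how any particular online tic-tac-toe AI works; it is a plain exhaustive (minimax) search, and the names `winner`, `value`, and `best_move` are my own illustrative choices. The board is assumed to be a 9-character string read row by row, with a space for an empty square.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board whose squares are indexed 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, otherwise None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)  # remember positions already searched
def value(board, player):
    """Outcome for X under best play by both sides: +1 win, 0 draw, -1 loss.

    `player` is whoever moves next. Every legal continuation is tried.
    """
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0  # full board with no winner is a draw
    other = "O" if player == "X" else "X"
    outcomes = [value(board[:i] + player + board[i + 1:], other)
                for i, cell in enumerate(board) if cell == " "]
    # X picks the best outcome for X; O picks the worst outcome for X.
    return max(outcomes) if player == "X" else min(outcomes)


def best_move(board, player):
    """Return the index of an optimal move for `player` on `board`."""
    other = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == " "]

    def score(i):
        return value(board[:i] + player + board[i + 1:], other)

    return max(moves, key=score) if player == "X" else min(moves, key=score)
```

Calling `value(" " * 9, "X")` searches the whole game from an empty board, and `best_move("XX OO    ", "X")` picks the square that completes the top row. The parallel to model selection is loose but useful: try every candidate, score each one, keep the best.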

But ML is responsive to new data, according to what worked best on previous training data. There are multiple YouTubers out there who have used ML to beat Super Mario Brothers. Programmers identify an objective function and the ML program is off to the races. It tries a few things on a level, and then uses the training rounds to perform quite well on new levels that it has never encountered before.

There are a couple of chapters in the middle of the book that didn’t appeal to me. They discuss the question of how big data should inform a firm’s strategy and how data projects should be implemented. These chapters read like they are written for MBAs or for management. They were boring for me. But that’s ok, given that Stephenson is trying to appeal to a broad audience.

The final chapters are great. They describe the limitations of big data endeavors. Big data is not a panacea and projects can fail for a variety of very human reasons.

Stephenson emphasizes the importance of transaction costs (though he doesn’t say it that way). Medium-sized companies should outsource to experts who can succeed (or fail) quickly so that big capital investments or labor costs can be avoided. Or, if in-house staff will be hired instead, he discusses the trade-offs between using open source software, getting locked in, and reinventing the wheel. These are a great few chapters that remind the reader that data scientists and analysts are not magicians. They are people who specialize and can waste their time just as well as anyone else.

Overall, I strongly recommend this book. I kinda sorta knew what machine learning and artificial intelligence were prior to reading, but this book provides a very accessible introduction to big data environments, their possible uses, and organizational features that matter for success. Mid- and upper-level managers should read this book so that they can engage with these ideas prudently. Those with a passing interest in programming should read it for greater clarity and to get a better handle on the various sub-fields. Hopefully, my students will read it and feel inspired to be on one side or the other of the manager/data-analyst divide with greater confidence, understanding, and a little less hubris.

Are Special Elections Special?

While the United States does have its problems with democracy, one area where we shine is direct democracy. Rare at the federal level, at the state and local level direct democracy is quite common in the US, much more so than most other democracies (Switzerland also stands out). Almost half the states have some form of citizen initiative or referendum process, and it is used frequently in most of those states. But even more direct democracy takes place at the local level.

And much of that direct democracy at the local level takes place through what are called special elections. I’m not talking about elections to fill unexpected vacancies in office — though of course those do happen. I’m talking about actual voting on issues. Many of these issues revolve around questions of public finance: whether to raise a local sales tax, to approve a property tax millage, or to issue bonds for a capital project.

One very relevant example for me is an upcoming special election in my city of Conway, Arkansas. Citizens are being asked to approve the issuing of bonds to construct a community center, pool, soccer fields, and some other amenities. The bonds would be secured by a tax on restaurants. The tax already exists — city councils can put these in place without a public vote. But to issue bonds, the citizens must be asked. I wrote an op-ed about it in my local paper (if that is gated, try this blog post).

The key is that this is a special election. There are no other issues on the ballot. It takes place on February 8th, probably not a date that stands out in voters’ minds as an election date. What will this special election mean for voter turnout? A lot of academic research, including a paper that I wrote (currently under review, but summarized here), finds clear evidence that voter turnout will be much lower. Will the result be different? Again, a lot of evidence suggests yes. For example, property tax elections in Louisiana were less likely to pass with higher turnout, and less likely to pass in a general election (my research finds a similar result for sales tax elections in Arkansas).

But why are tax increases more likely to pass in special elections? On this question there are many theories, but they are hard to test. Is it because different kinds of voters show up at special elections, representing a different sample of the population? Possibly, but evidence is hard to find.

A new paper just published in the American Political Science Review sheds some light on these questions.
