Book Review: Big Data Demystified

Last year, our economics department launched a data analytics minor. The first class is a simple two-credit course called Foundations of Data Analytics. Originally, the idea was that liberal arts majors would take it and that it would be a soft, non-technical introduction to the field’s terminology and history.

However, it turned out that liberal arts majors didn’t take the class, and the most common feedback was that it lacked technical challenge. I’m prepping to teach the class, and it will have two components. The first is a Python training component where students simply learn Python. We won’t do anything super complicated, but they will use Python extensively in future classes. The second component is still in the vein of the old version of the course.

I’ll have the students read and discuss “Big Data Demystified” by David Stephenson. In 12 brief chapters, he introduces the reader to the importance of modern big data management and analytics, and to how they fit into an organization’s key performance indicators. It reads like it’s written for business majors, but any medium-to-large organization would find it useful.

Stephenson starts with some flashy stories that illustrate the potential of data-driven business strategies. For example, Target Corporation used predictive analytics to advertise baby and pregnancy products to mothers who didn’t even know that they were pregnant yet. He whets the reader’s appetite by noting that the supercomputers that could play chess and Go relied on fundamentally different technologies.

The first several chapters of the book excite the reader with thoughts of untapped potential. This is what I want to impress upon the students. I want them to know the difference between artificial intelligence (AI) and machine learning (ML). I want them to recognize which tool is better suited to the challenges that they might face, and to see clear applications (and limitations).

AI uses brute force, iterating through possible next steps. There are several online tic-tac-toe AIs that keep a record of their wins and losses. If a student can play the optimal strategy eight games in a row, then they get the general idea behind testing a large variety of statistical models and explanatory variables and then choosing the best one.
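To make “iterating through possible next steps” concrete, here is a minimal sketch of exhaustive game-tree search (minimax) for tic-tac-toe. It is one standard way a program can play tic-tac-toe perfectly; I’m not claiming that any particular online tic-tac-toe AI works this way. The board encoding (a 9-character string with `.` for empty cells) and the function names are my own, chosen for illustration.

```python
from functools import lru_cache
from typing import Optional

# The eight winning lines on a 3x3 board; cells are indexed 0-8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: str) -> Optional[str]:
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board: str, player: str) -> int:
    """Score a position from X's point of view (+1 win, 0 draw, -1 loss)
    by trying every remaining sequence of moves."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0  # board full with no winner: a draw
    nxt = "O" if player == "X" else "X"
    scores = [minimax(board[:i] + player + board[i + 1:], nxt)
              for i, cell in enumerate(board) if cell == "."]
    # X picks the highest score, O picks the lowest.
    return max(scores) if player == "X" else min(scores)


def best_move(board: str, player: str) -> int:
    """Brute force: score every legal move and keep the best one."""
    nxt = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == "."]
    choose = max if player == "X" else min
    return choose(moves, key=lambda i: minimax(board[:i] + player + board[i + 1:], nxt))


print(minimax("." * 9, "X"))  # perfect play by both sides ends in a draw
```

The same “try everything, keep the best” logic is what you’d be doing when you loop over many candidate regressions and keep the one with the best fit; the search space is just different.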

ML, by contrast, responds to new data according to what worked best on previous training data. Several YouTubers have used ML to beat Super Mario Bros. The programmers specify an objective function, and the ML program is off to the races. It tries a few things on a level and then uses those training rounds to perform quite well on new levels that it has never encountered before.
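The “objective function, then training, then new data” loop can be shown at a much smaller scale than Mario. The sketch below is a deliberately simple stand-in, not what the YouTubers used: it fits a straight line by gradient descent on mean squared error (the objective) using made-up training data, then predicts at a point it never saw. The function name, learning rate, and step count are my own choices.

```python
def fit_line(xs, ys, lr=0.01, steps=5000):
    """Fit y ~ w*x + b by gradient descent on mean squared error.

    Mean squared error is the objective function the program tries to minimize.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of (1/n) * sum(error^2) with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical training data that happens to follow y = 2x + 1.
train_x = [0, 1, 2, 3, 4]
train_y = [1, 3, 5, 7, 9]
w, b = fit_line(train_x, train_y)

# Predict at x = 10, a point the model never saw during training (about 21).
print(round(w * 10 + b, 3))
```

The model never “looks up” x = 10; it generalizes from whatever minimized the objective on the training rounds, which is the key contrast with brute-force search.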

A couple of chapters in the middle of the book didn’t appeal to me. They discuss how big data should inform a firm’s strategy and how data projects should be implemented. These chapters read like they were written for MBAs or managers, and I found them boring. But that’s OK, given that Stephenson is trying to appeal to a broad audience.

The final chapters are great. They describe the limitations of big data endeavors. Big data is not a panacea, and projects can fail for a variety of very human reasons.

Stephenson emphasizes the importance of transaction costs (though he doesn’t put it that way). Medium-sized companies should outsource to experts who can succeed (or fail) quickly so that big capital investments or labor costs can be avoided. Or, if a firm hires in-house staff instead, he discusses the trade-offs among using open-source software, getting locked in, and reinventing the wheel. These are a great few chapters that remind the reader that data scientists and analysts are not magicians. They are specialists, and they can waste their time just as easily as anyone else.

Overall, I strongly recommend this book. I kinda sorta knew what machine learning and artificial intelligence were before reading it, but this book provides a very accessible introduction to big data environments, their possible uses, and the organizational features that matter for success. Mid- and upper-level managers should read this book so that they can engage with these ideas prudently. Those with a passing interest in programming should read it for greater clarity and to get a better handle on the various sub-fields. Hopefully, my students will read it and feel inspired to stand on one side or the other of the manager-data analyst divide with greater confidence, greater understanding, and a little less hubris.

Everyone’s an Expert: Easy Data Maps in Excel

I love data, I love maps, and I love data visualizations.

While we tend not to remember entire data sets, we often remember some patterns related to rank. Speaking for myself, anyway, I usually remember a handful of values that are pertinent to me. If I have a list of data by state, then I might take special note of the relative ranking of Florida (where I live), the populous states, Kentucky (where my parents’ families live), and Virginia (where my wife’s family lives). I might also take special note of the top rank and the bottom rank. See the table below of liquor taxes by state. You can easily find any state that you care about because the states are listed alphabetically.

A ranking is useful. It helps readers organize the data in their minds. But rankings are ordinal. It’s cool that Florida has a lower liquor tax than Virginia and Kentucky, but I really care about the actual tax rates. Is the difference big or small? Like, should I be buying my liquor in one of the other southeastern states instead of Florida? Without knowing the tax rates, I can’t make the economic calculation of whether the extra stop in Georgia is worth the time and hassle. So the most useful small data sets will have both the ranking and the raw data. Maybe we’re more interested in the rankings, as in the table below.

But tables take time to digest. A reader might immediately take note of the bottom and top values. And even though the data is not in alphabetical order, they might be able to quickly pick out the state that they’re accustomed to seeing in print. Otherwise, it will be difficult to scan the list for particular values of interest.
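For readers who build these tables in code rather than by hand, here is a small sketch of a table that keeps both the ordinal rank and the raw value in each row. The function name is hypothetical, and the states and rates in the usage lines are placeholders, not real liquor-tax figures.

```python
def rank_with_values(data, descending=True):
    """Sort (name, value) pairs and number them, keeping each raw value next to its rank.

    Ties get consecutive ranks in their original order; a published table
    might prefer to give tied entries a shared rank instead.
    """
    ordered = sorted(data.items(), key=lambda item: item[1], reverse=descending)
    return [(rank, name, value) for rank, (name, value) in enumerate(ordered, start=1)]


# Placeholder states and rates, NOT real liquor-tax figures.
example = {"State A": 2.5, "State B": 6.0, "State C": 4.0}
for rank, name, value in rank_with_values(example):
    print(f"{rank:>2}  {name:<8} {value:>5.2f}")
```

Keeping the value column is what lets a reader answer “is the difference big or small?” instead of only “who is ahead?”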

Continue reading

Inflation for Thee, But Temporarily Not for FL

On May 6, 2022, the governor of Florida, Ron DeSantis, signed House Bill 7071. The bill was touted as a tax-relief package to ease the pain that inflation has caused Floridians. In total, the bill forgoes $1.2 billion in tax revenue by temporarily suspending sales taxes on a variety of items that tug at one’s heartstrings. Below is the list of affected products.

A minor political point that I want to make first is that the children’s items are getting a lot of press, but they account for only about 18.4% of the tax expenditures. The tax break on hurricane windows and doors accounts for 37% of the funds, and gasoline for another 16.7%. There is also about $150 million in additional sales, corporate, and ad valorem tax exemptions. Looking at the table, it seems that producers of hurricane windows and doors might be the biggest beneficiaries and that the children’s items are there to make the bill politically palatable. Regardless, this is probably not the best use of $1.2 billion.
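Converting the shares above into dollars makes the comparison easier to feel. This is just arithmetic on the figures stated in this post; because the percentages are rounded, the dollar amounts are approximate.

```python
TOTAL = 1.2e9  # total forgone revenue in HB 7071, as stated in the post

# Shares of the total, as reported in the post (rounded, so results are approximate).
shares = {
    "children's items": 0.184,
    "hurricane windows and doors": 0.37,
    "gasoline": 0.167,
}

for item, share in shares.items():
    print(f"{item:<28} ${share * TOTAL / 1e6:,.1f} million")

# Work the other direction for the ~$150 million in other exemptions.
other = 150e6
print(f"{'other exemptions':<28} {other / TOTAL:.1%} of the total")
```

That is roughly $221 million for children’s items, $444 million for hurricane windows and doors, and $200 million for gasoline, with the other exemptions at 12.5%. Together these four pieces cover about 85% of the $1.2 billion.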


There are at least three economic points worth making.

Continue reading

Not so Great Expectations

People have expectations about the world. When those expectations are violated, they usually change their behavior in order to account for the new information (on the margin at least). Does unexpected inflation affect people’s behavior? Of course. William Phillips thought so (the famous version of the Phillips Curve assumes constant inflation expectations).

Macroeconomists often separate the world into reals and nominals. Sometimes we produce more and other times we produce less. Those are the reals. The prices that we pay and the money that we spend are the nominals. There is what’s sometimes called a ‘loose joint’ between reals and nominals. That is, they do not move in tandem, nor are they entirely independent. If the Fed suddenly slows the growth of the money supply, then the growth of economic activity might also slow, but not by the same amount. In the long run, reals and nominals are largely independent. Whether we have 2% or 3% annual inflation over the course of a decade is probably not important for our real output at the end of that decade.
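The nominal side of that 2% vs 3% comparison is easy to quantify by compounding. The sketch below only computes price levels; it says nothing about real output, which is the part of the claim that rests on the long-run independence argument. The starting index of 100 is an arbitrary choice.

```python
def price_level(rate, years, start=100.0):
    """Price level after `years` of constant annual inflation `rate`, starting from `start`."""
    return start * (1 + rate) ** years


low = price_level(0.02, 10)   # about 121.9
high = price_level(0.03, 10)  # about 134.4
print(f"2% path: {low:.1f}, 3% path: {high:.1f}, gap: {high / low - 1:.1%}")
```

So after a decade the price level on the 3% path ends up roughly 10% higher than on the 2% path: a sizable nominal difference, even if real output ends up in about the same place.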

It Takes Two to Tango.

It is often said that the Fed can achieve any amount of total spending in the economy that it prefers; that is, it can achieve any level of nominal GDP (NGDP). But the Fed doesn’t control NGDP by fiat. The Fed changes interest rates and the money supply in order to change total spending in our economy. Importantly, the effect of Fed policy changes is contingent on how the public reacts. After all, the Fed can increase the money supply, but we are the ones who decide how much to spend.
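The equation of exchange, M × V = P × Y = NGDP, is a compact way to see why it takes two to tango. The Fed has the most direct influence over M, while velocity V reflects how quickly the public spends its money. The numbers below are hypothetical, and treating V as a free choice of “the public” is a simplification, since in practice V is often measured residually as NGDP / M.

```python
def ngdp(money, velocity):
    """Equation of exchange: nominal spending = money supply * velocity (M * V = P * Y)."""
    return money * velocity


# Hypothetical starting point: M = 100, V = 2, so NGDP = 200.
base = ngdp(100, 2.0)

# The Fed raises M by 10%. What happens to spending depends on V, i.e. on the public.
spend_as_usual = ngdp(110, 2.0)  # V unchanged: NGDP rises 10%
hold_the_cash = ngdp(110, 1.8)   # public holds more money, V falls: NGDP slips slightly
print(base, spend_as_usual, round(hold_the_cash, 2))
```

The same 10% increase in M produces either a 10% rise or a 1% fall in total spending, depending entirely on how the public responds.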

Continue reading

Post Pandemic Vacation Arbitrage

My wife traveled to Ireland with a friend after she earned her bachelor’s degree. She had lived in Europe as a child and had traveled on mission trips. But traveling to the Irish Republic as a young adult, for the sole purpose of celebration and leisure, made a big impact on my future wife, and she recounted it for years.

Remember pre-Covid, when life was so easy? Many of us had planned trips, for business and leisure, that were interrupted. By now, the vast majority of people are back to ‘normal’ (I think?). Classes are in person, masks are largely optional, and the line no longer stretches down the sidewalk outside Trader Joe’s. With all this normalcy, one might ask:

Where’s your next vacation?

Continue reading

The Economics of Good Gift Giving

This post was co-authored with a recent AMU economics graduate, Michael Maynard (LinkedIn). It is based on his senior thesis, “The Highest Virtue: Re-examining Gift Giving and Deadweight Loss.”

When my older sister was in middle school, she received a book of baby animal stories. She loved that book and read it every day. A couple of years later, my mother accidentally donated it, and my sister was heartbroken. We went to the thrift store repeatedly that week hoping to find it before it sold, but we never did. Years later, our father scoured the internet trying to find the lost book, to no avail.

Years after that, I stumbled onto the exact same copy of the book in the for-sale corner of a nearby library. For a single dollar and negligible effort, I purchased the book that had long frustrated my family’s searching. Shortly before the birth of her first child, I gave the book to my sister for Christmas. It was one of the best Christmas gifts she had ever received.

Economic theory typically assumes that individuals have perfect information about their own preferences. Therefore, they are best suited to purchase their own gifts. That’s what motivates the not-so-romantic economist’s prescription to give a gift card or cash for birthdays, Christmas, graduations, etc. The theory states that if we do not intimately know the receiver’s preferences, then we have incomplete information, and it’s better to give money than to give a gift from which the receiver would enjoy less additional utility.
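The deadweight-loss logic can be written as one subtraction: compare what the giver paid with what the gift is worth to the recipient in dollars. Cash of the same amount is worth exactly its face value, so a gift valued below its price is a loss relative to cash. The function name and all valuations below are hypothetical, for illustration only.

```python
def gift_surplus(price_paid, recipient_value):
    """Recipient's dollar value of a gift minus what the giver paid.

    Negative: a deadweight loss; cash of the same amount would have been worth more.
    Positive: the giver's knowledge (or search effort) delivered value that the
    same cash would not have, e.g. an item the recipient could not find on their own.
    """
    return recipient_value - price_paid


# Hypothetical valuations, for illustration only.
print(gift_surplus(price_paid=30, recipient_value=20))  # an unwanted sweater: -10
print(gift_surplus(price_paid=1, recipient_value=50))   # a long-lost book: 49
```

The book story above is the second case: a one-dollar purchase that no amount of cash had been able to buy, because the giver had information the recipient lacked.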

Continue reading

Human Capital is Socially Contingent

The Deaf community is interesting.

Before I did any research, I thought that deaf people simply could not hear. After seeing the Spider-Man episodes that featured Daredevil, I believed it was plausible, even likely, that deaf people had some sort of compensatory cognitive or sensory skill.

But it wasn’t until recently that I learned of the field of Deaf Studies, an entire field dedicated to studying deaf people. It’s related to, but not the same as, Disability Studies. In fact, there are some sharp divisions between the two fields.

Continue reading

Children Are Not 3/55ths of a Person

In the past several years, pronatalist policies have received increasing attention and support. Several people have turned to the federal income tax code, which already includes some incentives related to children. The Child Tax Credit (CTC), which lowers a person’s tax liability dollar for dollar, is the most obvious provision that addresses children. The other is the tax credit for child care expenses, but I won’t be focusing on that here.

Below are the 2021 marginal tax brackets and standard deductions. The standard deduction reduces taxable income, and then the tax rates are applied.

After the tax liability is calculated, it’s reduced by any tax credits, such as the CTC. In 2021, households earned a credit of $3,600 for every child under age 6 and $3,000 for every child aged 6 to 17. Median household income in 2020 was $67,521. That means that, for a median-income household, each child under 6 reduced tax liability by an amount equal to 5.3%, or roughly 3/55ths, of gross income. But I have a problem with that.
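Here is a sketch of the calculation described above: subtract the standard deduction, apply the marginal rates, then subtract the CTC. Two assumptions of mine: the median household files jointly, and it has one child under 6. The bracket thresholds and the $25,100 standard deduction are the 2021 married-filing-jointly figures; the function names are hypothetical, and the CTC income phase-outs are ignored.

```python
# 2021 federal income tax brackets for married couples filing jointly:
# (upper bound of the bracket, marginal rate). Filing status is an assumption.
BRACKETS_2021_MFJ = [
    (19_900, 0.10),
    (81_050, 0.12),
    (172_750, 0.22),
    (329_850, 0.24),
    (418_850, 0.32),
    (628_300, 0.35),
    (float("inf"), 0.37),
]
STANDARD_DEDUCTION_2021_MFJ = 25_100


def income_tax(gross, brackets=BRACKETS_2021_MFJ, deduction=STANDARD_DEDUCTION_2021_MFJ):
    """Subtract the standard deduction, then tax each slice of income at its bracket's rate."""
    taxable = max(gross - deduction, 0)
    tax, lower = 0.0, 0
    for upper, rate in brackets:
        if taxable <= lower:
            break
        tax += (min(taxable, upper) - lower) * rate
        lower = upper
    return tax


def child_tax_credit_2021(under_6, age_6_to_17):
    """2021 CTC: $3,600 per child under 6 and $3,000 per child aged 6-17 (phase-outs ignored)."""
    return 3_600 * under_6 + 3_000 * age_6_to_17


median_income = 67_521  # 2020 median household income, from the post
tax_before = income_tax(median_income)
credit = child_tax_credit_2021(under_6=1, age_6_to_17=0)
# No floor at zero: the 2021 CTC was fully refundable.
tax_after = tax_before - credit
print(f"tax before credit: ${tax_before:,.2f}")
print(f"tax after credit:  ${tax_after:,.2f}")
print(f"credit / gross income: {credit / median_income:.1%}")
```

Under these assumptions, taxable income is $42,421, the tax bill is about $4,693 before the credit and about $1,093 after it, and the credit equals about 5.3% of gross income (3/55 is about 5.5%, so the fraction is an approximation).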

Continue reading

Covid-19 Didn’t Break the Supply Chains. You Did.

This is my last post in a series that uses the AS-AD model to describe US consumption during and after the Covid-19 recession. I’ve already written about the broad categories of US consumption, services, and non-durables. This last one addresses durable consumption.

During the week of Thanksgiving in 2020, our thirteen-year-old microwave bit the dust. NBD, I thought. Microwaves are cheap, and I’m willing to spend a little more to get one that I think will be of better quality (GE, *cough*-*cough*). So I filtered through the models on multiple websites and found the right size, brand, and wattage. No matter the retailer or the price, I learned at checkout that I’d be waiting a good two months for my new, entirely standard, and unexceptional microwave oven to arrive. I’d have to wait until the end of January 2021.

How ridiculous!

Continue reading

AS-AD: From Levels to Percent

The aggregate supply and aggregate demand (AS-AD) model is nice because it’s flexible and clear. Professors often teach it in levels; that is, with the level of output on one axis and the price level on the other. This presentation pairs well with the equation of exchange (MV = PY), which can be rearranged to show that, for a given level of nominal spending, aggregate demand (AD) is a hyperbola in (Y, P) space. Graphed below is the AD curve in 2019Q4 and in 2020Q2, using real GDP, NGDP, and the GDP price deflator.
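The hyperbola follows directly from holding nominal spending fixed: if P × Y = NGDP, then P = NGDP / Y, so every point on the AD curve has the same product of price level and output. The NGDP value below is hypothetical and is not the 2019Q4 or 2020Q2 data graphed in the post. (With a deflator indexed to 100, P = 100 × NGDP / real GDP, which changes the units but not the shape.)

```python
def ad_price_level(real_output, ngdp):
    """AD in levels: with nominal spending fixed, P * Y = NGDP, so P = NGDP / Y."""
    return ngdp / real_output


# Hypothetical NGDP of 20 (say, trillions of dollars); not the post's data.
NGDP = 20.0
for y in [16.0, 18.0, 20.0, 22.0]:
    p = ad_price_level(y, NGDP)
    print(f"Y = {y:5.1f}  P = {p:.3f}  P*Y = {p * y:.1f}")
```

Output rises as the price level falls, and the last column stays at 20.0: a downward-sloping rectangular hyperbola. A fall in nominal spending shifts the whole curve toward the origin.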

The textbook that I use for Principles of Macroeconomics instead places inflation (π) on the vertical axis while keeping the level of output on the horizontal axis. The authors motivate the downward slope by asserting that there is a policy reaction function for the Federal Reserve. When people observe high rates of inflation, state the authors, they know that the Fed will increase interest rates and reduce output. Personally, I find this reasoning inadequate because it makes a fundamental feature of the AS-AD model, downward-sloping demand, contingent on the policy context.

At the same time, I do think that it can be useful to put inflation on the vertical axis. After all, individuals are forward looking. We expect positive inflation because that’s what has happened previously, and we tend to be correct. So I tell my students that, “for our purposes,” placing inflation on the vertical axis is fine. I also tell them that when they take intermediate macro, they’ll want to express both axes as rates of change. I usually say this and then go about my business of teaching principles.

But, what does it look like when we do graph in percent-change space?

Continue reading