Data Analytics with R Textbook

For an advanced undergraduate analytics class for business school students, I use a textbook by Saltz and Stanton called

Data Science for Business with R (Amazon link)

This textbook teaches R and analytics at the same time. The professor does not have to provide a separate R curriculum or require students to buy a second book.

The running example in the textbook is an airline business scenario that is interesting and builds with the complexity of the subject matter. The authors provide the dataset that students can work with for the airline case study. Many examples in the textbook use data that is available online and therefore can be imported to R with just a few lines of code.

One semester is not enough time to cover every chapter in the book. I emphasize predictive analytics, so I skip the chapters on maps and shiny apps.

I do some supplemental lectures on concepts in predictive analytics before students reach the chapters on regression and decision trees. For example, overfitting is a new concept to undergraduates. I want them to have a more intuitive grasp of that subject before learning the R code to separate data into training and validation sets.
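A rough illustration of the idea, written in Python rather than the R used in the textbook (this is not the book’s code). The data are synthetic, and the helper names (split_train_validation, knn_predict, mse) are made up for this sketch. A 1-nearest-neighbor model memorizes the training rows, so its training error is zero, yet it still makes errors on the held-out validation rows. That gap is what overfitting looks like.

```python
import random


def split_train_validation(rows, validation_share=0.3, seed=42):
    """Shuffle a copy of the rows and hold out a share for validation."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_valid = round(len(shuffled) * validation_share)
    return shuffled[n_valid:], shuffled[:n_valid]


def knn_predict(train, x, k):
    """Average the y values of the k training rows whose x is closest."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, rows, k):
    """Mean squared error of the k-nearest-neighbor model on the given rows."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in rows) / len(rows)


# Synthetic data (an assumption of this sketch): a straight line plus noise.
noise = random.Random(0)
data = [(float(x), 2.0 * x + noise.gauss(0, 3.0)) for x in range(100)]

train, valid = split_train_validation(data)

# Small k follows the training rows very closely; larger k smooths more.
for k in (1, 5, 15):
    print(k, round(mse(train, train, k), 2), round(mse(train, valid, k), 2))
```

Comparing the training and validation columns across values of k gives students something concrete to look at before the formal treatment.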

Note that these students have already taken what has traditionally been called Business Statistics, so they already understand basic descriptive statistics and graphing. The book is no substitute for that primary class.

There are free supplementary materials online for learning R. Students find message boards especially helpful in pinpointing answers for questions that come up while coding.

A Gauche Gift

We’ve written about gifts before.

  1. We’ve written posts recommending Christmas gifts (here and here and here and here).
  2. I wrote that gifts might be good for the macroeconomy.
  3. Joy wrote about birthday presents at school parties.
  4. James wrote about considering supply chain status when ordering a gift.

Michael Maynard and I wrote about giving a good gift. A good gift is one in which the giver has an information advantage. Gifting an object or a service can provide a consumption bundle to the recipient that they didn’t know was even possible or that they didn’t know that they would prefer. They would have chosen the items themselves, if only they had known about them. Giving a gift card can be similar if the recipient did not know about the vendor previously. Cash is a good gift when the giver does not have an information advantage over the recipient.

In our previous post, we showed diagrammatically that ‘better off’ was indicated by the higher utility. But this spurs an important question:

Can good gifts cost the giver zero dollars?


Market Concentration & Inflation

We are living in volatile times. With Covid-19, big federal legislation packages, and the Russo-Ukrainian conflict’s disruptions to grain, seed oils, and crude oil, relative prices are reflecting sudden, drastic swings in supply and demand. I want to make a small but enlightening point that I’ve made in my classes, though I’m not sure that I’ve made it here.

Economists often get a bad rap for being heartless or unempathetic. Sometimes, they are painted as ideologues who just disguise their pre-existing opinions in painfully specific terminology and statistics. Let’s do a litmus test.

Consider two alternative markets. One is a perfect monopoly, the other has perfect competition. All details concerning marginal costs to firms and marginal benefits to consumers are the same. In an erratic world, which market structure will result in greater price volatility for consumers? Try to answer for yourself before you read below. More importantly, what’s your reasoning?

Extreme Market Power

A distinguishing difference between a competitive market and a monopoly concerns prices. While firms maximize profits in both cases, the price that consumers face in a competitive market is equal to the marginal cost that the firms face. There is no profit earned on that last unit produced. In the case of monopoly, the price is above the marginal cost. Profits can be positive or negative, but the consumer will pay a price that is greater than the cost of producing the last unit.

Below are two graphs. Given identical marginal costs of production and benefits that the consumers enjoy, we can see that:

  1. The monopoly price is higher.
  2. The monopoly quantity produced is lower.

But static models only go so far. What about when there is volatility in the world?

Volatile Costs

Oil and gasoline are important inputs for producing many (most?) physical goods. Not only that, they are short-lived, meaning that they disappear once they are used, making them intermediate goods. Therefore, changes in the price of oil constitute a change in the marginal cost for many firms. If the price of oil rises, or is volatile otherwise, then which type of market will experience greater price and quantity volatility?

Below are two figures that illustrate the same change in the marginal cost. We can see that:

  1. Monopoly price volatility is lower (in absolute terms and percent).
  2. Monopoly quantity produced volatility is lower (in absolute terms, though no different as a percent).

The take-away: While monopoly does constrict supply and elevate prices, monopoly also reduces price and output volatility when there are changes in the marginal cost.
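The cost-shock comparison can be checked numerically. The sketch below assumes linear inverse demand P = a - b*Q and linear marginal cost MC = c + k*Q; the functional forms, the numbers, and the function names are illustrative assumptions, not taken from the figures. The shock is a parallel upward shift of marginal cost.

```python
def equilibria(a, b, c, k):
    """Price and quantity under each market structure.

    Assumed for illustration: inverse demand P = a - b*Q and
    marginal cost MC = c + k*Q, both linear.
    """
    q_comp = (a - c) / (b + k)       # competition: P = MC
    q_mono = (a - c) / (2 * b + k)   # monopoly: MR = a - 2*b*Q = MC
    return {
        "competition": {"price": a - b * q_comp, "quantity": q_comp},
        "monopoly": {"price": a - b * q_mono, "quantity": q_mono},
    }


def changes(before, after):
    """Absolute and percent change in price and quantity for each market."""
    out = {}
    for market in before:
        out[market] = {}
        for key in ("price", "quantity"):
            old, new = before[market][key], after[market][key]
            out[market][key] = {"abs": new - old, "pct": 100 * (new - old) / old}
    return out


# Hypothetical numbers: the intercept of marginal cost rises from 10 to 25.
before = equilibria(a=100, b=1, c=10, k=1)
after = equilibria(a=100, b=1, c=25, k=1)
cost_shock = changes(before, after)

for market, result in cost_shock.items():
    print(market, result)
```

With these numbers the monopoly price moves less in both absolute and percent terms, and the monopoly quantity moves less in absolute terms but by the same percent, which matches the two points listed above.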

Volatile Demand

That covers the costs. But what about volatile demand? A large part of the Covid-19 recession was the huge reallocation of demand away from in-person services and to remote services and goods. What is the effect of market power when people suddenly increase or decrease their demand for goods?

Below are two figures that illustrate the same change in demand. We can see that:

  1. Monopoly price volatility is higher (in absolute terms, though no different as a percent).
  2. Monopoly quantity produced volatility is lower (in absolute terms, though no different as a percent).
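The demand-shock comparison can be checked the same way. This sketch assumes linear inverse demand P = a - b*Q, a parallel shift in demand (a rises), and a marginal cost curve through the origin, MC = k*Q. Those are assumptions chosen for illustration, as are the numbers and function names. With a positive intercept on marginal cost, the percent changes in price would no longer match exactly.

```python
def price_and_quantity(a, b, k, monopoly):
    """Equilibrium with inverse demand P = a - b*Q and marginal cost MC = k*Q."""
    # A monopolist equates MR = a - 2*b*Q to MC; competitive firms equate P to MC.
    demand_side_slope = 2 * b if monopoly else b
    q = a / (demand_side_slope + k)
    return a - b * q, q


def pct_change(old, new):
    """Percent change from old to new."""
    return 100 * (new - old) / old


# Hypothetical numbers: the demand intercept rises from 90 to 120.
demand_shock = {}
for label, is_monopoly in (("competition", False), ("monopoly", True)):
    p0, q0 = price_and_quantity(90, 1, 1, is_monopoly)
    p1, q1 = price_and_quantity(120, 1, 1, is_monopoly)
    demand_shock[label] = {
        "price_abs": p1 - p0,
        "price_pct": pct_change(p0, p1),
        "quantity_abs": q1 - q0,
        "quantity_pct": pct_change(q0, q1),
    }

for label, result in demand_shock.items():
    print(label, result)
```

Under these assumptions the monopoly price rises by more in absolute terms and the monopoly quantity by less, while all four percent changes coincide.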

Monopolies Don’t Cause Inflation

Economists know that inflation can’t very well be blamed on greed (does less greed beget deflation?). Another problematic story is that market concentration contributes to inflation. But the above illustrations demonstrate that this narrative is also a bit silly. Monopolistic markets cause the price level to be higher, it’s true. But inflation is the change in prices. Changing market concentration might be a long-term phenomenon, but it can’t explain acute price growth. If demand suddenly rises, monopolies result in no more price growth than perfectly competitive markets. If the marginal cost of production suddenly rises, monopolies result in less price growth.

All of this analysis entirely ignores welfare. Also, no market is perfectly competitive or perfectly monopolistic. They are the extreme cases and particular markets lie somewhere in between.

Did you guess or reason correctly? Many econ students have a bias that monopolies are bad. So, in any side-by-side comparison, students think that “monopolies-bad, competition-good” is a safe mantra. But the above illustrations (which can be demonstrated mathematically) reveal that economic reasoning helps to reveal truths about the world. Economists are not simply a hearty band of kool-aid drinking academics.

Three Tips for More Effective Learning, from Andrew Watson

I just ran across a short article [1] summarizing a talk with some techniques on learning more efficiently, which seemed worth sharing here. It may be something for professors to pass along to their students.

The speaker was Andrew Watson, who is an expert on learning and the brain, and currently a teacher at the Loomis Chaffee School in Connecticut. He noted three key ways that students (and adults) can work with the ways the brain learns information. The last two points are good but well known, while the first point was not something I have seen emphasized much:

( 1 ) Retrieve information while studying:

To study better, students should focus on the idea of retrieval rather than review. Trying to recall information before looking back at it produces more remembering than simply reading it through again. He suggested creating flash cards and using visual hints and clues as effective retrieval techniques.

( 2 ) Change the environment to avoid distractions:

The environment in which someone studies also affects how well they retain information because the human brain works best when it focuses on one activity at a time.

(My comment: That is absolutely true for me, I can’t stand any distraction when I am studying or writing, but I know people who claim they study more effectively with a TV show or music going in the background…I wonder what academic studies show about that.)

( 3 ) Bolster your health:

The brain, like the rest of the body, benefits from a healthy lifestyle, including eating well and exercising regularly. Ample sleep helps the brain to process and solidify information absorbed during the day. If homework is everything that helps a person learn and if sleep helps you learn, then sleep is a part of homework.

[1] “Brain Hacks for Brainiacs” in the Loomis Chaffee Magazine, Spring 2022, page 13.

Teaching with ACS regional data

If you are teaching a quantitative college course, then you have probably thought about where to get data that students can practice with.

Public Use Microdata Areas (PUMAs) are non-overlapping, statistical geographic areas that partition each state or equivalent entity into geographic areas containing no fewer than 100,000 people each. The image here shows PUMAs around Birmingham, AL. I created a dataset for my students that includes demographic data from the American Community Survey (ACS) for the region around our university.

For just about any topic you would teach in stats, I can create a mini assignment using data on the people around us. Any American metro area has clusters of high-income households and clusters of low-income households. One example of an exercise is to create summary statistics on income by PUMA. Students will be surprised to learn the facts about their own city.
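A minimal sketch of that kind of exercise, in Python with only the standard library. The column names PUMA and HHINCOME follow IPUMS naming, but whether a given extract includes them is an assumption here. The PUMA codes and incomes below are invented placeholders, not real ACS values. The sketch also ignores survey weights (HHWT in IPUMS), which a careful estimate would use.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up household records; codes and incomes are placeholders, not ACS data.
households = [
    {"PUMA": 1301, "HHINCOME": 38000},
    {"PUMA": 1301, "HHINCOME": 52000},
    {"PUMA": 1301, "HHINCOME": 45000},
    {"PUMA": 1302, "HHINCOME": 96000},
    {"PUMA": 1302, "HHINCOME": 150000},
    {"PUMA": 1302, "HHINCOME": 120000},
]


def income_summary_by_puma(rows):
    """Group household incomes by PUMA and compute unweighted summary statistics."""
    incomes = defaultdict(list)
    for row in rows:
        incomes[row["PUMA"]].append(row["HHINCOME"])
    return {
        puma: {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
        for puma, values in sorted(incomes.items())
    }


summary = income_summary_by_puma(households)
for puma, stats in summary.items():
    print(puma, stats)
```

Students can then compare the rows and discuss why the mean and median differ within an area.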

Zachary has blogged about how great IPUMS is. The way I obtained the data was to make a free account with IPUMS. If you ask for data on every American, you’ll end up with an unwieldy, large file. The trick is to filter out all but a handful of PUMAs. I also recommend restricting it to just one year unless you are teaching time series techniques.
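If a file has already been downloaded with more rows than needed, the same filter can be applied locally. The sketch below is one way to do it in Python, assuming a CSV with IPUMS-style columns YEAR, STATEFIP, and PUMA. The inline sample stands in for a real extract, and every value in it is invented. PUMA codes repeat across states, so the state code is part of the filter.

```python
import csv
import io

# Stand-in for a downloaded extract; every value here is invented.
raw_extract = io.StringIO(
    "YEAR,STATEFIP,PUMA,HHINCOME\n"
    "2019,1,1301,38000\n"
    "2019,1,1302,96000\n"
    "2018,1,1301,41000\n"
    "2019,13,1301,57000\n"
    "2019,1,2400,63000\n"
)


def keep_local_rows(lines, year, state_fips, pumas):
    """Keep rows for one survey year, one state, and a chosen set of PUMAs."""
    reader = csv.DictReader(lines)
    return [
        row
        for row in reader
        if int(row["YEAR"]) == year
        and int(row["STATEFIP"]) == state_fips
        and int(row["PUMA"]) in pumas
    ]


# Hypothetical choice: two PUMAs in the state with FIPS code 1, for 2019 only.
local_rows = keep_local_rows(raw_extract, year=2019, state_fips=1, pumas={1301, 1302})
for row in local_rows:
    print(row)
```

For a real file, the StringIO object would be replaced by an open file handle.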

I originally got the idea from Matt Holian. Matt wrote a fantastic book called Data and the American Dream. The book has data and R code that allow you to reproduce the findings from several interesting econ papers that all use ACS data. I’m not teaching material that overlaps perfectly with Matt’s book, so I couldn’t assign it to my students, but I did borrow some elements of his idea and even (with his permission) some of his code.

Book Review: Big Data Demystified

Last year, our economics department launched a data analytics minor program. The first class is a simple 2 credit course called Foundations of Data Analytics. Originally, the idea was that liberal arts majors would take it and that this class would be a soft, non-technical intro of terminology and history.

However, it turned out that liberal arts majors didn’t take the class and that the most common feedback was that the class lacked technical challenge. I’m prepping to teach the class, and it will have two components. The first is a Python training component where students simply learn Python. We won’t do super complicated things, but they will use Python extensively in future classes. The second component is still in the vein of the old version of the course.

I’ll have the students read and discuss “Big Data Demystified” by David Stephenson. He spends 12 brief chapters introducing the reader to the importance of modern big data management, analytics, and how it fits into an organization’s key performance indicators. It reads like it’s for business majors, but any type of medium-to-large organization would find it useful.

Stephenson starts with some flashy stories that illustrate the potential of data-driven business strategies. For example, Target Corporation used predictive analytics to advertise baby and pregnancy products to mothers who didn’t even know that they were pregnant yet. He whets the appetite of the reader by noting that the supercomputers that could play chess or Go relied on fundamentally different technologies.

The first several chapters of the book excite the reader with thoughts of unexploited potentialities. This is what I want to impress upon the students. I want them to know the difference between artificial intelligence (AI) and machine learning (ML). I want them to recognize which tool is better for the challenges that they might face and to see clear applications (and limitations).

AI uses brute force, iterating through possible next steps. There are multiple online tic-tac-toe AI that keep track records. If a student can play the optimal set of strategies 8 games in a row, then they can get the general idea behind testing a large variety of statistical models and explanatory variables, then choosing the best.
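One way to picture the brute-force search described above is a minimax player for tic-tac-toe. The sketch below is my own illustration in Python, not code from the book or from any of the online games. It tries every possible continuation and keeps the best result for whoever is moving. The names LINES, winner, and best_outcome are made up for this example.

```python
from functools import lru_cache

# The eight winning lines on a board stored as a 9-character string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for i, j, k in LINES:
        if board[i] != " " and board[i] == board[j] == board[k]:
            return board[i]
    return None


@lru_cache(maxsize=None)
def best_outcome(board, player):
    """Return (score, move) with both sides playing optimally.

    Score is from X's point of view: +1 X wins, 0 draw, -1 O wins.
    Move is None when the game is already over.
    """
    won = winner(board)
    if won:
        return (1 if won == "X" else -1), None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None
    other = "O" if player == "X" else "X"
    results = []
    for move in moves:
        child = board[:move] + player + board[move + 1:]
        score, _ = best_outcome(child, other)
        results.append((score, move))
    # X picks the highest score, O the lowest.
    return max(results) if player == "X" else min(results)


empty_board = " " * 9
print(best_outcome(empty_board, "X")[0])       # value of the game from the start
print(best_outcome("XX " + "OO " + "   ", "X"))  # X can complete the top row
```

The search confirms the familiar result that tic-tac-toe is a draw under optimal play, which is why a student who plays optimally can at best tie the computer.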

But ML is responsive to new data, according to what worked best on previous training data. There are multiple YouTubers out there who have used ML to beat Super Mario Brothers. Programmers identify an objective function and the ML program is off to the races. It tries a few things on a level, and then uses the training rounds to perform quite well on new levels that it has never encountered before.

There are a couple of chapters in the middle of the book that didn’t appeal to me. They discuss the question of how big data should inform a firm’s strategy and how data projects should be implemented. These chapters read like they are written for MBAs or for management. They were boring for me. But that’s ok, given that Stephenson is trying to appeal to a broad audience.

The final chapters are great. They describe the limitations of big data endeavors. Big data is not a panacea and projects can fail for a variety of what are very human reasons.

Stephenson emphasizes the importance of transaction costs (though he doesn’t say it that way). Medium-sized companies should outsource to experts who can achieve (or fail) quickly such that big capital investments or labor costs can be avoided. Or, if in-house staff will be hired instead, he discusses the trade-offs between using open source software, getting locked in, and reinventing the wheel. These are a great few chapters that remind the reader that data scientists and analysts are not magicians. They are people who specialize and can waste their time just as well as anyone else.

Overall, I strongly recommend this book. I kinda sorta knew what machine learning and artificial intelligence were prior to reading, but this book provides a very accessible introduction to big data environments, their possible uses, and organizational features that matter for success. Mid- and upper-level managers should read this book so that they can interact with these ideas prudently. Those with a passing interest in programming should read it for greater clarity and to get a better handle on the various sub-fields. Hopefully, my students will read it and feel inspired to be on one side or the other of the manager-data analyst divide with greater confidence, understanding, and a little less hubris.

Everyone’s an Expert: Easy Data Maps in Excel

I love data, I love maps, and I love data visualizations.

While we tend not to remember entire data sets, we often remember some patterns related to rank. Speaking for myself anyway, I usually remember a handful of values that are pertinent to me. If I have a list of data by state, then I might take special note of the relative ranking of Florida (where I live), the populous states, Kentucky (where my parents’ families live), and Virginia (where my wife’s family lives). I might also take special note of the top rank and the bottom rank. See the below table of liquor taxes by State. You can easily find any state that you care about because the states are listed alphabetically.

A ranking is useful. It helps the reader to organize the data in their mind. But rankings are ordinal. It’s cool that Florida has a lower liquor tax than Virginia and Kentucky, but I really care about the actual tax rates. Is the difference big or small? Like, should I be buying my liquor in one of the other states in the southeast instead of Florida? Without knowing the tax rates, I can’t make the economic calculation of whether the extra stop in Georgia is worth the time and hassle. So, the most useful small data sets will have both the ranking and the raw data. Maybe we’re more interested in the rankings, such as in the below table.

But tables take time to consume. A reader might immediately take note of the bottom and top values. And given that the data is not in alphabetical order, they might be able to quickly pick out the state that they’re accustomed to seeing in print. But otherwise, it will be difficult to scan the list for particular values of interest.


Nudging Students to Choose a Major

In one sense, it seems like advice does not work. Advice is often ignored and sometimes even resented. People are going to just do what they want.

And yet, many people were in fact influenced by advice at some point in some situation. Many people can tell you about a mentor they spoke with or a book they read. Somehow, we do indeed need to learn about our environments and make choices about career and health and relationships. So, advice does work, sometimes.

A trivial example is why I stopped putting sugar in my coffee. A random anonymous message board post said that you should stop putting sugar in your coffee and your taste will adjust. “You won’t even miss it,” the anonymous poster told me. From that day forward, I stopped putting sugar in my coffee. I’m healthier and I don’t miss it. I was “nudged”. I was also predisposed to make this healthy decision, and I had sought out advice.

We might overestimate the effectiveness of advice because when people bother to talk about it, they mention the one time it affected them. First, they fail to mention the thousands of messages that had no effect (personally I still eat all kinds of junk food that contain sugar despite getting warnings to stop). And secondly, some decisions (perhaps including my coffee-sugar example) would have been made eventually without the advice event. Even recognizing those limitations, I still believe that messaging works sometimes.

It is tempting to think that, at almost zero cost, you could nudge people into making different decisions, just by sending them messages. There is a growing literature on this topic. Economists like myself are collecting data on whether it works.

One of these papers was just published:

Halim, Daniel, Elizabeth T. Powers, and Rebecca Thornton. 2022. “Gender Differences in Economics Course-Taking and Majoring: Findings from an RCT.” AEA Papers and Proceedings, 112: 597-602.

We implemented an RCT among undergraduate students enrolled in large introductory economics courses at the University of Illinois at Urbana-Champaign. Two treatment arms provided encouragement to major in economics. A “prosocial” treatment provided information emphasizing the wide variety of career options and personal benefits associated with the major, while an “earnings” treatment provided information on financial returns. We evaluate the effects of the two treatments on subsequent choices to take another economics course and declaration of the economics major by the end of the student’s junior year using student-level matched administrative data. … Our primary aim is to evaluate whether women can be “nudged” into a major with low-cost, theoretically grounded, encouragement/information interventions.

Our primary sample consists of 1,976 students who were freshmen or sophomores during the focal course.

We find that the average male student receiving either treatment is more likely to take at least one more economics course after the focal course, but there is little evidence of increased majoring. The average woman appears unresponsive to either treatment.

Treated women with better-than-expected focal-course performance are nudged to take an additional economics course. The likelihood that a woman takes another course in response to treatment increases by 5.6-5.9%-points with a favorable one-third-grade “surprise”. The hypothesis of treatment effects on women’s majoring, mediated or not, is rejected. Men’s susceptibility to treatment is invariant with respect to focal course performance.

Women did not demonstrate a bias towards a pro-social framing, and men did not demonstrate a bias towards a pro-earnings framing.

The pile of null results for messaging, when it is randomly assigned, is growing. It’s good to see null results get published though.

One of my current projects is related, but with a focus on computer programming instead of majoring in economics.

The Economics of Good Gift Giving

This post was co-authored with a recent AMU economics graduate, Michael Maynard (LinkedIn here). It is based on his senior thesis, entitled “The Highest Virtue: Re-examining Gift Giving and Deadweight Loss.”

When my older sister was in middle school, she received a book of baby animal stories. She loved that book and read it every day. A couple of years later my mother accidentally donated it, and my sister was heartbroken. We went to the thrift store repeatedly that week hoping to encounter it before it sold, but we never found it. Years later, our father scoured the internet trying to find the lost book – to no avail.

Years after that, I stumbled onto the exact same copy of the book in the for-sale corner of a nearby library. For a single dollar and negligible effort, I purchased the book that had long frustrated my family’s searching. Shortly before the birth of her first child, I gave the book to my sister for Christmas. It was one of the best Christmas gifts she had ever received.

Economic theory typically assumes that individuals have perfect information. Therefore, they are best suited to purchase their own gifts. That’s what motivates the not-so-romantic economist prescription to give a gift card or cash for birthdays, Christmas, graduations, etc. The theory states that, if we do not intimately know the receiver’s preferences, then we have incomplete information and it’s better to give a money-gift rather than to give a gift from which the receiver would enjoy less additional utility.


Human Capital is Socially Contingent

The Deaf community is interesting.

Before I did research, I thought that deaf people simply could not hear. After seeing the Spiderman episodes that featured Daredevil, I believed that it was plausible and likely that deaf people had some sort of cognitive or sensory compensatory skill.

But it wasn’t until recently that I learned of the Deaf Studies field. There is an entire field that’s dedicated to studying deaf people. It’s related to, but not the same as, Disability Studies. In fact, there are some sharp divisions between the two fields.
