data analytics

My students recently assembled a list of podcasts about data analytics. They are a click away if the topic interests you.

The show they pulled from the most was In Machines We Trust, produced by MIT Technology Review.

“Attention Shoppers, You’re Being Tracked”

“Hired by an Algorithm” (“I would recommend this podcast to anyone who will be applying for a job in the near future.”)

“Encore: When an Algorithm Gets It Wrong”

“Can AI Keep Guns out of Schools”

“How retail is using AI to prevent fraud”

Other episodes, not from In Machines We Trust:

More or Less Behind the Statistics: Can we use maths to beat the robots? (Might be of interest to the folks who like to debate “new math” in schools.)

“How Data Science Enables Better Decisions at Merck”

Data Science at Home: State of Artificial Intelligence 2022

Emoji as a Predictor – Data Skeptic

Data Skeptic: Data Science Hiring Process

“True Machine Intelligence just like the human brain” (Ep. 155)

“How to Thrive as an Early-Career Data Scientist” – Super Data Science

No matter how you feel about intelligent machines, you’ll be talking to them soon.

Talking to voice assistants right now feels stilted because it's slow, inaccurate, and you can't interrupt.

Widely-available high-quality fast ASR and TTS, paired with LLMs, is coming soon and will enable much more natural conversations. https://t.co/8UQP5YRZ5l
— Nat Friedman (@natfriedman) September 4, 2022

They are delivering food already.

This restaurant has several robots delivering food and drinks to tables. It’s strange at first but easy to get used to. The robots can tolerate people walking in front of them and mild harassment from children. pic.twitter.com/F47IcPqruU
— Joy Buchanan (@aboutJoy) September 3, 2022

Last year, our economics department launched a data analytics minor program. The first class is a simple two-credit course called Foundations of Data Analytics. Originally, the idea was that liberal arts majors would take it and that the class would be a soft, non-technical intro to terminology and history.

However, it turned out that liberal arts majors didn’t take the class, and the most common feedback was that it lacked technical challenge. I’m prepping to teach the class, and it will have two components. The first is a Python training component in which students simply learn Python. We won’t do super complicated things, but they will use Python extensively in future classes. The second component is still in the vein of the old version of the course.

I’ll have the students read and discuss “Big Data Demystified” by David Stephenson. He spends 12 brief chapters introducing the reader to modern big data management and analytics, and to how they fit into an organization’s key performance indicators. It reads like it’s for business majors, but any type of medium-to-large organization would find it useful.

Stephenson starts with some flashy stories that illustrate the potential of data-driven business strategies. For example, Target Corporation used predictive analytics to advertise baby and pregnancy products to mothers who didn’t even know that they were pregnant yet. He whets the appetite of the reader by noting that the supercomputers that could play chess or Go relied on fundamentally different technologies.

The first several chapters of the book excite the reader with thoughts of unexploited potentialities. This is what I want to impress upon the students. I want them to know the difference between artificial intelligence (AI) and machine learning (ML). I want them to recognize which tool is better for the challenges that they might face and to see clear applications (and limitations).

AI uses brute force, iterating through possible next steps. There are multiple online tic-tac-toe AIs that keep win-loss records. If a student can play the optimal set of strategies 8 games in a row, then they can get the general idea behind testing a large variety of statistical models and explanatory variables and then choosing the best.
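To make the brute-force idea concrete, here is a minimal sketch of a tic-tac-toe player that iterates through every possible next step and scores the outcomes. It is my own illustration, not code from the book or from any of the online games. The function names (`winner`, `minimax`, `best_move`) and the 9-character string used for the board are assumptions made for the example.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board whose cells are indexed 0..8.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board, player):
    """Score a position for X (+1 win, 0 draw, -1 loss) by trying every move."""
    won = winner(board)
    if won is not None:
        return 1 if won == "X" else -1
    if " " not in board:
        return 0  # board is full with no winner: a draw
    other = "O" if player == "X" else "X"
    scores = [minimax(board[:i] + player + board[i + 1:], other)
              for i, cell in enumerate(board) if cell == " "]
    # X picks the highest score, O picks the lowest.
    return max(scores) if player == "X" else min(scores)


def best_move(board, player):
    """Return the index of the empty cell with the best brute-force score."""
    other = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == " "]
    pick = max if player == "X" else min
    return pick(moves, key=lambda i: minimax(board[:i] + player + board[i + 1:], other))


print(minimax(" " * 9, "X"))  # prints 0: perfect play from an empty board is a draw
```

A student who never loses to a player like this has, in effect, found the optimal strategy by the same exhaustive search: try every option, score each one, keep the best.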

ML, by contrast, responds to new data according to what worked best on previous training data. There are multiple YouTubers out there who have used ML to beat Super Mario Brothers. Programmers identify an objective function and the ML program is off to the races. It tries a few things on a level, and then uses the training rounds to perform quite well on new levels that it has never encountered before.
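The same pattern (pick an objective function, run training rounds, then face inputs never seen before) can be sketched in a few lines. This is a deliberately tiny stand-in, not the technique used in the Mario videos: it fits a straight line by gradient descent, with mean squared error as the objective. The data, the function names, and the settings for `rounds` and `learning_rate` are all made up for illustration.

```python
def objective(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, rounds=5000, learning_rate=0.02):
    """Lower the objective a little each round using gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(rounds):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up training data that follows y = 2x + 1.
train_x = [0, 1, 2, 3, 4]
train_y = [1, 3, 5, 7, 9]
w, b = train(train_x, train_y)

# Inputs the program never saw during training.
new_x = [10, 20]
predictions = [w * x + b for x in new_x]

print(round(objective(w, b, train_x, train_y), 6))
print([round(p, 2) for p in predictions])
```

Nothing in the loop is told that the answer is "2x + 1". The program only sees the objective falling, and the predictions for the new inputs should land close to 21 and 41.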

There are a couple of chapters in the middle of the book that didn’t appeal to me. They discuss how big data should inform a firm’s strategy and how data projects should be implemented. These chapters read like they are written for MBAs or for management. They were boring for me. But that’s OK, given that Stephenson is trying to appeal to a broad audience.

The final chapters are great. They describe the limitations of big data endeavors. Big data is not a panacea, and projects can fail for a variety of very human reasons.

Stephenson emphasizes the importance of transaction costs (though he doesn’t say it that way). Medium-sized companies should outsource to experts who can succeed (or fail) quickly, so that big capital investments or labor costs can be avoided. Or, if in-house staff will be hired instead, he discusses the trade-offs between using open source software, getting locked in, and reinventing the wheel. These are a few great chapters that remind the reader that data scientists and analysts are not magicians. They are people who specialize and can waste their time just as well as anyone else.

Overall, I strongly recommend this book. I kinda sorta knew what machine learning and artificial intelligence were prior to reading, but this book provides a very accessible introduction to big data environments, their possible uses, and the organizational features that matter for success. Mid- and upper-level managers should read this book so that they can interact with these ideas prudently. Those with a passing interest in programming should read it for greater clarity and to get a better handle on the various sub-fields. Hopefully, my students will read it and feel inspired to be on one side or the other of the manager/data analyst divide with greater confidence, understanding, and a little less hubris.