Introducing Students to Text Mining II

In the Fall of 2020, I blogged about how I introduce students to text mining, as part of a data analytics class.

Could Turing ever have imagined that a human seeking customer service from a bank could chat with a bot? Maybe text mining is a big advance over chess, but it only took about one decade longer for a computer (developed by IBM) to beat a human in Jeopardy. Winning Jeopardy requires the computer to get meaning from a sentence of words. Computers have already moved way beyond playing a game show to natural language processing.

https://economistwritingeveryday.com/2020/11/07/introducing-students-to-text-mining/

I told the students that “chat bots” are getting better and NLP is advancing. By July 2020, OpenAI had released a beta API playground to external developers to play with GPT-3, but I did not sign up to use it myself.

In April of 2022, I added some slides inspired by Alex’s post about the Turing Test that included output from Google’s Pathways Language Model (PaLM). According to Alex, “It seems obvious that the computer is reasoning.”

This week in class, I did something that few people could have imagined 5 years ago: I signed into the free new ChatGPT service and typed in questions from my students.

We started with questions that we assumed would be easy to answer:

Then we were surprised that it answered a question we had thought would be difficult:

And then we asked two questions that prompted the program to hedge, although for different reasons.

It seems like the model is smarter than it lets on. For now, the creators are trying hard not to offend anyone or get in the way of Google’s advertising business. Overall, the quality of the answers is high.

Because of when I was born, I believe that something I have published will make it into the training data for these models. Will that turn out to be more significant than any human readers we can attract?

Of course, GPT can still make mistakes. I’m horrified by this mischaracterization of my tweets:

Fight for $15? $25? $40?

Remember the “Fight for $15”? It’s a 10-year-old movement to raise the federal minimum wage to $15 per hour. While there hasn’t been any increase in the federal minimum wage since the movement began in 2012, plenty of states and localities have raised their own minimum wages.

I won’t rehash the entire debate on the minimum wage here, but I will point you to this post from Joy on large minimum wage changes, and here are several other posts on this blog on the same topic. But lately I have seen an increasing call for even larger minimum wage increases, well beyond $15.

A prominent recent call for a higher wage comes from the SEIU, the second largest labor union in the nation. They are calling for a $25 minimum wage in Chicago, where the legal minimum wage crossed $15 just last year. Again, without getting into the detailed debates about the economics of the minimum wage, we can recognize that this would be a massively high minimum wage, given that the median hourly wage for the Chicago MSA was $22.74 in May 2021. It’s certainly a bit higher in 2022, and the median for the city of Chicago is probably a bit higher than for the entire MSA. Still, we are talking about a minimum wage that would cover roughly half the workforce. Well, at least half the current workforce. The negative employment effects would potentially be large.

Here I will dabble a little bit in the minimum wage literature. One of the most famous recent papers suggesting that increasing the minimum wage doesn’t have large negative employment effects is a 2019 paper by Cengiz et al. This paper only looks at legal minimum wages up to 59% of the median market wage, which is the highest that minimum wages have been pushed so far. By contrast, that $25 minimum wage in Chicago would be somewhere around 100% (!) of the local median market wage. That’s huge, and goes far beyond what even the most sympathetic-to-the-minimum-wage research has looked at.
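The comparison above boils down to one ratio: the minimum wage divided by the local median wage. Here is a minimal sketch of that arithmetic, using only the figures quoted in the post. The 10% uplift to the median is a hypothetical stand-in for 2022 wage growth and the city-versus-MSA gap, not a measured number.

```python
# Minimum wage as a share of the local median wage.
CHICAGO_MSA_MEDIAN_2021 = 22.74  # USD/hour, Chicago MSA median, May 2021 (from the post)
CENGIZ_MAX_RATIO = 0.59          # highest minimum-to-median ratio studied by Cengiz et al. (2019)
HYPOTHETICAL_UPLIFT = 0.10       # assumption: median 10% higher for 2022 and for the city itself


def min_to_median(minimum: float, median: float) -> float:
    """Return the minimum wage as a fraction of the median wage."""
    return minimum / median


proposal = 25.00
ratio_2021 = min_to_median(proposal, CHICAGO_MSA_MEDIAN_2021)
ratio_adjusted = min_to_median(proposal, CHICAGO_MSA_MEDIAN_2021 * (1 + HYPOTHETICAL_UPLIFT))

# The highest minimum wage that would stay inside the studied range, at the 2021 median.
max_studied_wage = CENGIZ_MAX_RATIO * CHICAGO_MSA_MEDIAN_2021

print(f"$25 vs 2021 MSA median: {ratio_2021:.0%}")
print(f"$25 vs hypothetically adjusted median: {ratio_adjusted:.0%}")
print(f"59% of the 2021 median: ${max_studied_wage:.2f}/hour")
```

With these inputs, $25 works out to about 110% of the 2021 MSA median and about 100% of the hypothetically adjusted median, while the top of the Cengiz et al. range corresponds to roughly $13.42 an hour.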

But here’s the most recent minimum wage call that really takes the cake: over $40 per hour in Hawaii. That comes from, in a way, a Tweet from Hal Singer:

Now in fairness, he doesn’t exactly call for a $40 minimum wage in Hawaii, but he does say we should use the minimum wage as a tool to address homelessness, and then points to a study showing that you would need to earn $40/hour in Hawaii to afford a two-bedroom apartment. That’s pretty close. The median wage in Hawaii? About $23 in May 2021. In fact, the 75th percentile wage in Hawaii was $36.50 in 2021! So, depending on exactly how much wage growth there has been in Hawaii since May 2021, we are likely talking about a $40 minimum wage covering 75% of the workforce! That would likely have some “bite,” as economists say.
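The Hawaii numbers can be checked the same way. This sketch uses only the figures in the paragraph above (the $23 median is the post’s “about $23”, so treat it as approximate) and asks two questions: how far above the median $40 sits, and how much wage growth since 2021 would be needed for the 75th percentile to catch up to $40.

```python
HAWAII_MEDIAN_2021 = 23.00  # USD/hour, "about $23" in May 2021 (approximate, from the post)
HAWAII_P75_2021 = 36.50     # USD/hour, 75th percentile wage in 2021 (from the post)
HOUSING_WAGE = 40.00        # USD/hour needed to afford a two-bedroom apartment, per the cited study

# How large $40 is relative to the typical worker's wage.
ratio_to_median = HOUSING_WAGE / HAWAII_MEDIAN_2021

# Wage growth at the 75th percentile needed for that percentile to reach $40.
# If actual growth since 2021 is smaller, $40 sits above the 75th percentile,
# meaning more than 75% of workers would currently earn less than $40.
growth_needed = HOUSING_WAGE / HAWAII_P75_2021 - 1

print(f"$40 is {ratio_to_median:.0%} of the 2021 median")
print(f"75th percentile must grow {growth_needed:.1%} to reach $40")
```

Roughly: $40 is about 174% of the 2021 median, and wages at the 75th percentile would need to have risen nearly 10% since 2021 just for $40 to sit at that percentile.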

Thanksgiving Dinner is Once Again More Expensive (But Not the Most Expensive Ever)

Last year inflation hadn’t quite hit the levels we would see in 2022, but it was already rising. When Thanksgiving rolled around, many media sources reported that it was the “most expensive Thanksgiving ever.” In nominal terms that was true, though in nominal terms that isn’t very surprising. In a post last year, I compared the prices of Thanksgiving dinners (using the same data from the Farm Bureau) to median earnings going back to 1986. While 2021 was more expensive than 2020, it turned out that, relative to median earnings, it was still the second lowest it had been since 1986.

As you might expect, this year’s Thanksgiving dinner is even more expensive than last year in nominal terms. It’s up about 20% from last year, or over $10, according to the Farm Bureau. That’s certainly more than the overall rate of inflation (7.7% in the past 12 months) and more than inflation for groceries (12.4% in the past 12 months). But how does that compare with median wages? Comparing the 3rd quarter of this year with the same quarter in 2021, median wages are only up about 7%, certainly not enough to keep up with those rising turkey prices.

When we add 2022 to the historical chart, here’s what it looks like.

The spike in the last 2 years is clear in the chart, but notice that at about 6% of median weekly earnings, we have essentially returned to the average level of the entire series. From 2017 to 2021, we could be thankful that the price of our Thanksgiving dinner had dropped below that 6% level. We’ll have to find something else to be thankful for this year.
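The logic of the chart is a ratio: dinner cost divided by median weekly earnings. When the numerator grows 20% and the denominator 7%, the ratio grows by their quotient, not their difference. A minimal sketch using the rounded growth rates from the post (all inputs are the post’s “about” figures, so the outputs are approximate too):

```python
DINNER_GROWTH = 0.20  # nominal change in Farm Bureau dinner cost, 2021 to 2022 ("about 20%")
WAGE_GROWTH = 0.07    # change in median weekly earnings, Q3 2021 to Q3 2022 ("about 7%")
CPI_GROWTH = 0.077    # overall CPI inflation over the past 12 months
SHARE_2022 = 0.06     # dinner cost as a share of median weekly earnings ("about 6%")

# Growth in the dinner-to-earnings ratio: divide the growth factors.
relative_to_wages = (1 + DINNER_GROWTH) / (1 + WAGE_GROWTH) - 1

# Growth in the dinner's cost after adjusting for overall inflation.
real_increase = (1 + DINNER_GROWTH) / (1 + CPI_GROWTH) - 1

# Back out the 2021 share implied by the 2022 share and the ratio's growth.
implied_share_2021 = SHARE_2022 / (1 + relative_to_wages)

print(f"Dinner cost relative to median earnings: +{relative_to_wages:.1%}")
print(f"Dinner cost after overall inflation: +{real_increase:.1%}")
print(f"Implied 2021 share of weekly earnings: {implied_share_2021:.2%}")
```

On these rounded inputs, the dinner got about 12% more expensive relative to median earnings and about 11% more expensive after overall inflation, which implies a 2021 share of roughly 5.3%, consistent with the below-6% years in the chart.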

The Unimportance of Inflation: Stocks & Flows

One of my specializations in graduate school at George Mason University was monetary theory. It included two classes taught by Larry White, who specializes in free banking, Austrian macroeconomics, and monetary regimes. Separately, my dad was a libertarian and I’ve attended multiple Students for Liberty events. Right now, I’m writing from my hotel room at a Catholic/Crypto conference, where I learned that the deepest trench in Dante’s Inferno includes money debasers.

Everything about my pedigree suggests that I should have a disdain for the Federal Reserve and cast a wistful gaze toward the perpetually falling value of the US dollar. But I don’t. I certainly do have opinions about what the Fed should be doing and how our monetary system could work. But I’m not excited by the long-run depreciation of the dollar.

Let me tell you why.

Learning a little bit of theory is a dangerous thing. Monetary theory is especially hard because we examine the non-good side of the transaction: the medium of exchange. In frantic excitement, enthusiasts often point out that the dollar has lost much of its value over the past 100 years. They describe that loss by pointing to the smaller quantity of goods a dollar can purchase now compared with what it could purchase historically. That information is encapsulated in the price of a good. The price of a good is the number of dollars that one must exchange in order to purchase the good. Similarly, the price of a dollar is the number of goods that one must give up in order to purchase the dollar.

We can consider a variety of goods. Below is a graph of the price of the dollar measured in quantities of CPI basket units, gold, and housing. Over the 35 years from 1986 to 2021, a single dollar came to purchase 60% less of the consumer basket, 74% fewer houses (not quality adjusted), and 76% less gold.
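Because the price of a dollar (goods per dollar) is the reciprocal of a good’s dollar price (dollars per good), a percentage drop in what a dollar buys translates directly into a multiple on the good’s price. A minimal sketch, using only the three declines and the 35-year window from the paragraph above:

```python
YEARS = 35  # 1986 to 2021

# Share of purchasing power lost, measured in each good (from the post).
declines = {"CPI basket": 0.60, "housing": 0.74, "gold": 0.76}


def price_multiple(decline: float) -> float:
    """Dollars per good now, relative to 1986: the reciprocal of goods per dollar."""
    remaining = 1 - decline  # goods per dollar now, relative to 1986
    return 1 / remaining


def annual_growth(decline: float, years: int) -> float:
    """Compound annual growth rate of the good's dollar price."""
    return price_multiple(decline) ** (1 / years) - 1


for good, decline in declines.items():
    print(f"{good}: price x{price_multiple(decline):.2f}, "
          f"{annual_growth(decline, YEARS):.2%} per year")
```

For example, a dollar buying 60% less of the CPI basket means the basket costs 2.5 times as many dollars, which works out to price growth of roughly 2.65% per year over 35 years.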


New Data: State Regulatory Procedures

Released this April, but I just heard about it today. Researchers did the painstaking work of going through all 50 states to determine which steps must be taken in each state before new regulations can take effect. For instance, it turns out half of states require economic analysis for new regulations, and half don’t. The paper is here: https://www.mercatus.org/publications/regulation/50-state-review-regulatory-procedures

The Price of Food: Farm to the Table

If you’re like me, then you are very fond of food. What determines the price of food? Supply and demand of course!

We can consider food as a commodity because just about anyone can buy and sell it. Almost all foods have partial substitutes. Therefore, the long-run price in the competitive market for food is largely dictated by the marginal cost. Demand has an impact on the price only in the short run.

The costs that food producers face are a long-run driver of food prices. The US Bureau of Labor Statistics divides the Producer Price Index into multiple categories that are relevant for a variety of sectors and points within the production process. Below is a table of the most fundamental, relatively unprocessed farm products and their weight among all farm products in December 2021. Cotton is a relatively large component of farm products even though it’s not a food, and I include it for completeness. Fruits, veggies, and nuts make up the overwhelming proportion of the cost of farm products. I was at first surprised that grains composed such a small proportion. But since grains are dirt cheap, it makes sense.
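Weights like those in the table matter because a category’s contribution to the aggregate price change is roughly its weight times its own price change. The sketch below shows that weighted aggregation with hypothetical placeholder weights and price changes (they are NOT the BLS December 2021 values, which are in the table) to illustrate how a low-weight category can move a lot without dominating the total.

```python
# HYPOTHETICAL relative-importance weights; not the BLS December 2021 figures.
weights = {
    "fruits, vegetables, nuts": 0.50,
    "grains": 0.10,
    "cotton": 0.10,
    "other farm products": 0.30,
}

# HYPOTHETICAL 12-month price changes for each category.
changes = {
    "fruits, vegetables, nuts": 0.05,
    "grains": 0.30,
    "cotton": 0.20,
    "other farm products": 0.10,
}


def contributions(weights: dict, changes: dict) -> dict:
    """Each category's contribution: normalized weight times its price change."""
    total = sum(weights.values())
    return {name: weights[name] / total * changes[name] for name in weights}


def weighted_change(weights: dict, changes: dict) -> float:
    """Approximate aggregate price change as the sum of contributions."""
    return sum(contributions(weights, changes).values())


for name, c in contributions(weights, changes).items():
    print(f"{name}: {c:.1%} of the aggregate change")
print(f"Aggregate: {weighted_change(weights, changes):.1%}")
```

In this made-up example, grains rise 30% but contribute only 3 percentage points because of their small weight. This is a simplified fixed-weight approximation, not a replication of BLS index methodology.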

We all know that inflation has been in the news. It’s been elevated since the second quarter of 2021. Consumer prices tend to lag producer prices, so one indicator of where food prices will be in the near future is where producer prices are now. Below is a graph that displays the seasonally adjusted prices of the farm products above since the start of 2021*.


Two Types of News: Elections vs Crashes

Some events are like elections: it was obvious that some big political news would break on Election Day; we just had to wait to find out what exactly would happen. Others are like market crashes: you might know in principle that they can happen, but you don’t really expect any particular day to be the day one happens, so they seem to come out of the blue. As it turns out, for one of the largest crypto exchanges the day of the crash also happened to be Election Day.

FTX.com is facing a bank run sparked by competitor Binance tanking the price of the token that backed some of its assets. Customers are having issues withdrawing their money, Binance has withdrawn its offer to bail out FTX by taking it over, and bankruptcy seems likely. Supposedly this doesn’t affect Americans using FTX US, but I’d be nervous about any funds I had there, or indeed with funds in any centralized crypto exchange or stablecoin (Tether and even USDC seem to be having issues holding their pegs). All this was especially shocking because many considered FTX founder Sam Bankman-Fried one of the most trustworthy people in the often sketchy world of crypto. He was always meeting with US regulators and lawmakers, and seemed not to be motivated by greed; he had already begun to give away his fortune at scale.

After any surprising event like this, some people claim it was actually obvious and they saw it coming (despite usually never having said so beforehand), while others start looking back for warning signs they missed. The most interesting one is something that shocked me when I first heard it in March, but I never considered the risk it implied for FTX until the crash:

Going forward, red flags to watch out for seem to be topping a list of youngest billionaires (as Elizabeth Holmes also did) and buying naming rights to a stadium.

In contrast to this crash, the election happened right when we all expected, and at least largely how I expected. Like markets, I underestimated Democrats a bit; polls overall were impressively accurate this year, though they of course missed on some particular races. Votes are still being counted, and as of now we don’t even know for sure which party will control Congress (PredictIt currently gives Democrats a 90% chance in the Senate and a 20% chance in the House). But here are some early attempts to assess forecast accuracy. As I said, some polls were quite good:

Some polls weren’t so good, which means it’s important to weight better pollsters more heavily when you aggregate them. Some attempts at that were also quite good:
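Weighting better pollsters more heavily is just a weighted average. A minimal sketch, where the pollster names, margins, and quality weights are all hypothetical; real aggregators derive weights from track records, sample sizes, and recency in ways this does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    pollster: str  # hypothetical pollster name
    margin: float  # Republican minus Democratic share, in percentage points
    weight: float  # hypothetical quality weight; higher means a more trusted pollster


def weighted_average(polls: list) -> float:
    """Average poll margins, giving more trusted pollsters more influence."""
    total = sum(p.weight for p in polls)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(p.margin * p.weight for p in polls) / total


polls = [
    Poll("Pollster A", 2.0, 3.0),  # strong track record
    Poll("Pollster B", 5.0, 1.0),  # weaker track record
    Poll("Pollster C", 1.0, 2.0),
]

simple = sum(p.margin for p in polls) / len(polls)
print(f"Simple average: R+{simple:.2f}")
print(f"Weighted average: R+{weighted_average(polls):.2f}")
```

Here the weaker pollster’s outlier (R+5) pulls the simple average to about R+2.67, while the weighted average stays near R+2.17.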

Oddly, some no-money (Metaculus) and play-money (Manifold Markets) forecasting sites seem to have done better than the real-money prediction sites:
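The post doesn’t say which scoring rule these accuracy comparisons used. One standard choice for probability forecasts of yes/no events is the Brier score: the mean squared difference between the forecast probability and the outcome (1 if it happened, 0 if not), where lower is better. A minimal sketch with hypothetical forecasts and outcomes:

```python
def brier_score(forecasts: list, outcomes: list) -> float:
    """Mean squared error between probabilities and 0/1 outcomes; lower is better."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty lists")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# HYPOTHETICAL races: 1 if the event (e.g., a Republican win) happened, else 0.
outcomes = [1, 0, 1]

# HYPOTHETICAL probabilities from two forecasting sites for the same races.
site_a = [0.9, 0.2, 0.6]
site_b = [0.7, 0.4, 0.4]

print(f"Site A Brier score: {brier_score(site_a, outcomes):.3f}")
print(f"Site B Brier score: {brier_score(site_b, outcomes):.3f}")
```

Site A scores 0.070 and Site B about 0.203, so Site A was more accurate on these made-up races. A real comparison would need both sites to forecast the same set of questions at comparable times.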

The Sins of TikTok, Part 1: Extreme Privacy Theft by China-Based Company

Social media apps are nosy by nature; it is no secret that their main business model is to snoop out information about you, the user, and package and sell that information to advertisers who can target you. But there is one wildly popular app which goes beyond the norms of intrusiveness and privacy invasion AND is targeted largely at children and adolescents AND is based in China and thus is subject to Big Brother’s request for any and all data. That app is TikTok.

To avoid a bunch of re-wording, I will largely share excerpts from “The Privacy Risks of TikTok – Why This Invasive App is So Dangerous” by Priscilla Sherman at VPNOverview. Other articles echo her concerns with TikTok:

TikTok is an extremely popular social media video app owned by the Chinese tech company ByteDance. On TikTok, users can create and share short-form videos using a variety of filters and effects. The platform is full of dancing, comedy, and other entertaining videos….

Several agencies and news outlets are now sounding the alarm and reporting on the many problems that have surfaced. ByteDance claims to want to break away from its Chinese background in order to serve a global audience and says it will never share data with the Chinese government. This claim, however, seems impossible now that new security laws have been introduced in Hong Kong.

TikTok’s user base mostly consists of children and adolescents, which many consider to be vulnerable groups. This is a main reason for different authorities to express their worries. However, it isn’t just the youth that might be in danger from TikTok. From December 2019 onwards, U.S. military personnel were no longer allowed to use TikTok, as the app was considered a ‘cyber threat’…

[Hacker group] Anonymous has published a video listing the many dangers of TikTok. They quote a source that has done extensive research on TikTok: “Calling it an advertising platform is an understatement. TikTok is essentially malware that is targeting children. Don’t use TikTok. Don’t let your friends and family use it. Delete TikTok now […] If you know someone that is using it, explain to them that it is essentially malware operated by the Chinese government running a massive spying operation.”

These claims fit in with the recent developments surrounding TikTok. For example, Apple researchers announced that TikTok deliberately spies on users.

Claims keep piling up, showing that TikTok is a very invasive application that poses a substantial privacy risk. It seems that the data collection at TikTok goes much further than other social platforms such as Facebook or Instagram. This is surprising, since both of these companies have already faced backlash for the way they’ve dealt with user privacy. TikTok seems to collect data on a much larger scale than other social media platforms do. This, combined with TikTok’s origins makes it quite plausible that the Chinese government has insight into all of this collected data…..

Research from a German data protection website has revealed that TikTok installs browser trackers on your device. These track all your activities on the internet. According to ByteDance, these trackers were put in place to recognize and prevent “malicious browser behavior”. However, they also enable TikTok to use fingerprinting techniques, which give users a unique ID. This enables TikTok to link data to user profiles in a very targeted way.

Unfortunately, this happens with a great disregard of privacy – perhaps intentionally so. The German researchers indicate, for example, that IP addresses aren’t anonymized when TikTok uses Google Analytics, meaning your online behavior is directly linked to your IP address. An IP address provides information about your location and, indirectly, about your identity…

A user on Reddit used reverse engineering to figure out more about TikTok. Anonymous quoted the results in the video we mentioned earlier. The Reddit user discovered that TikTok collects all kinds of information:

  • Your smartphone’s hardware (CPU type, hardware IDs, screen size, dpi, memory usage, storage space, etc.);
  • Other apps installed on your device;
  • Network information (IP, local IP, your router’s MAC address, your device’s MAC address, the name of your Wi-Fi network);
  • Whether your device was rooted/jailbroken;
  • Location data, through an option that’s turned on automatically when you give a post a location tag (only happens on some versions of TikTok);

Additionally, the app creates a local proxy server on your device, which is officially used for “transcoding media”. However, this is done without any form of authentication, making it susceptible to misuse….

We asked investigative journalist and writer Maria Genova about her vision on TikTok. … Genova says: “There’s a reason several countries have banned it. It’s unbelievable how much information an app like that pulls from your phone”…

TikTok needs access to your camera and microphone in order to work properly… However, there aren’t any specifications explaining how exactly these permissions are used. Therefore, TikTok could theoretically record conversations and sounds using your microphone, even when you aren’t filming a TikTok video.
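The two tracking ideas in the excerpt, fingerprinting and unanonymized IP addresses, can be illustrated in a few lines. The sketch below is a generic illustration, not TikTok’s or Google’s code: it hashes a set of device attributes into a stable ID (the attribute names and values are hypothetical), and shows the kind of truncation an IP-anonymization setting performs. The prefix lengths (keep 24 bits of IPv4, 48 bits of IPv6) follow our understanding of Google Analytics’ documented IP anonymization and should be treated as an assumption.

```python
import hashlib
import ipaddress


def fingerprint(attributes: dict) -> str:
    """Hash device attributes into a stable ID; same attributes give the same ID."""
    # Sort keys so attribute order does not change the result.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def anonymize_ip(ip: str) -> str:
    """Zero the host part: last octet for IPv4, last 80 bits for IPv6 (assumed scheme)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


# HYPOTHETICAL device attributes of the kinds listed in the excerpt.
device = {
    "cpu": "arm64",
    "screen": "1170x2532",
    "wifi_name": "HomeNetwork",
    "rooted": "false",
}

print(fingerprint(device))             # stable across sessions and apps
print(anonymize_ip("203.0.113.57"))    # 203.0.113.0
```

The point of the combination: a fingerprint links activity across sessions without any login, and a full IP address adds approximate location, which is why skipping the anonymization step matters.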

We could go on and on with the technical details here, but you get the point. The fact that “IP addresses aren’t anonymized“ is really a big, bad deal. The article concludes:

The current findings and concerns surrounding TikTok are reason enough for us [the staff at VPNOverview] to remove the app from our devices. Whether TikTok’s main target group – young people between 14 and 25 – is sensitive to the privacy concerns that have come to light, remains to be seen.

Indeed.

One more quote, from Brendan Carr of the U.S. Federal Communications Commission (FCC), regarding the reliability of TikTok’s claims that they do not share data with the Chinese government:

“China has a national security law that compels every entity within its jurisdiction to aid its espionage and what they view as their national security efforts,” Carr said earlier this year, alluding to the fact that Chinese companies must make all the data they collect available to the Chinese Communist Party (CCP).

Stay tuned for Part 2, dealing with some larger market ramifications of TikTok’s evasion of Apple and Android privacy protections.

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

This just in from BuzzFeed (added to original post here):

“Leaked Audio From 80 Internal TikTok Meetings Shows That US User Data Has Been Repeatedly Accessed From China”

For years, TikTok has responded to data privacy concerns by promising that information gathered about users in the United States is stored in the United States, rather than China, where ByteDance, the video platform’s parent company, is located. But according to leaked audio from more than 80 internal TikTok meetings, China-based employees of ByteDance have repeatedly accessed nonpublic data about US TikTok users — exactly the type of behavior that inspired former president Donald Trump to threaten to ban the app in the United States.

The recordings, which were reviewed by BuzzFeed News, contain 14 statements from nine different TikTok employees indicating that engineers in China had access to US data between September 2021 and January 2022, at the very least. Despite a TikTok executive’s sworn testimony in an October 2021 Senate hearing that a “world-renowned, US-based security team” decides who gets access to this data, nine statements by eight different employees describe situations where US employees had to turn to their colleagues in China to determine how US user data was flowing. US staff did not have permission or knowledge of how to access the data on their own, according to the tapes.

“Everything is seen in China,” said a member of TikTok’s Trust and Safety department in a September 2021 meeting.

A Dragonfly’s View of Election Day 2022

This is my last post before the US midterm elections on Tuesday, so I’ll leave you with a prediction for what’s coming.

Who is the best predictor of elections? Nate Silver at FiveThirtyEight has had a pretty good run since 2008 using weighted polls. Ray Fair, an economics professor at Yale, has a venerable and well-credentialed model based on fundamentals. I typically favor prediction markets, because they incorporate a wide range of views weighted by how willing people are to put their money where their mouth is, and traders are able to incorporate other sources of information (including predictors like FiveThirtyEight). But which prediction market should we trust? There are now many large prediction markets, and the odds often differ substantially between them.

When there are many reasonable ways of answering a question or looking at a problem, it can be hard to choose which is best. Often the best answer is not to choose; instead, take all the reasonable answers and average them. Dan Gardner and Philip Tetlock call this approach Dragonfly Eye forecasting, since a dragonfly’s eyes see through many lenses. So what does the dragonfly see here?

Let’s start with the US House, since everyone covers it.

  • FiveThirtyEight’s latest forecast shows that Republicans have an 85% chance of taking the House; it shows a range of possible outcomes, but on average predicts that Republicans win the popular vote by 4.3% and take 231 House seats (substantially over the 218 needed for a majority)
  • The Fair Model predicts that Democrats will win 46.6% of the two-party vote share (leaving Republicans with 53.4%). This has Republicans winning the popular vote by 6.8%, a moderately bigger margin than FiveThirtyEight. The reasoning is interesting; the economy is roughly neutral since “the negative inflation effect almost exactly offsets the positive output effect”, so this is mainly from the typical negative effect of having an incumbent party in the White House.
  • Prediction markets: PredictIt currently gives Republicans a 90% chance to take the House. Polymarket gives them 87%. Insight Prediction also gives them 87%. Kalshi doesn’t have a standard market on this, but its contest (free to enter, $100k prize) predicts 232 Republican seats.

It’s a bit tricky to average all these since they don’t all report on the same outcome in the same way. But the overall picture is clear: Republicans are likely to do well in the House, with an ~87% chance to win a majority, expected to win the popular vote by ~5.55% and take ~232 seats.
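The dragonfly average for the House is a simple mean over whichever sources report each outcome. A minimal sketch using only the numbers in the list above; the Fair Model margin is computed from its 46.6% Democratic two-party share.

```python
from statistics import mean

# Chance Republicans win a House majority, by source (from the post).
house_majority_prob = {
    "FiveThirtyEight": 0.85,
    "PredictIt": 0.90,
    "Polymarket": 0.87,
    "Insight Prediction": 0.87,
}

# Republican popular-vote margin in percentage points.
fair_dem_share = 46.6
house_margin = {
    "FiveThirtyEight": 4.3,
    "Fair Model": (100 - fair_dem_share) - fair_dem_share,  # 53.4 - 46.6 = 6.8
}

# Expected Republican House seats.
house_seats = {"FiveThirtyEight": 231, "Kalshi contest": 232}


def dragonfly(estimates: dict) -> float:
    """Equal-weight average of every available estimate of the same outcome."""
    return mean(estimates.values())


print(f"Majority probability: {dragonfly(house_majority_prob):.1%}")
print(f"Popular-vote margin: R+{dragonfly(house_margin):.2f}")
print(f"Seats: {dragonfly(house_seats):.1f}")
```

This gives about an 87% chance, an R+5.55 margin, and 231.5 seats (rounded to ~232 in the text). Equal weights are the simplest choice; one could instead weight sources by past accuracy, at the cost of having to decide how.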

The Senate is closer to a coin flip and harder to evaluate.

  • FiveThirtyEight gives Republicans a 53% chance to win a majority (51+ seats for them; Democrats effectively win if the Senate stays 50-50 since a Democratic Vice President breaks ties for at least 2 more years). The most likely seat counts are 50-50 or 51-49, but confidence intervals are pretty wide and 54-46 either direction isn’t ruled out.
  • The Fair Model doesn’t make Senate predictions, only House and Presidential predictions.
  • Prediction markets: PredictIt gives Republicans a 70% chance to win a Senate majority, probably with 52-54 seats. Polymarket gives Republicans a 65% chance, as does Insight Prediction. Kalshi predicts 53 Republican seats.

Overall, we see much higher variance in Senate predictions: a 17pp gap between the highest (70%) and lowest (53%) estimates of Republican chances, vs just a 5pp gap for the House (90% to 85%). This shows up with the seat counts too; everyone agrees there’s a substantial chance Republicans lose the Senate, but if they do win, it will probably be by more than one seat. The average estimate is ~52 Republican seats. FiveThirtyEight and PredictIt agree that the closest Senate races will be Georgia, Pennsylvania, Arizona, Nevada, and New Hampshire (though they rank order them differently), so those are the races to watch.
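The disagreement comparison is the range (highest minus lowest) across sources. A minimal sketch using the probabilities from the two lists above; for seat counts, taking the midpoint of a stated range (50.5 for FiveThirtyEight’s “50-50 or 51-49”, 53 for PredictIt’s “52-54”) is our assumption, not something the sources report.

```python
from statistics import mean

senate_prob = {"FiveThirtyEight": 0.53, "PredictIt": 0.70, "Polymarket": 0.65, "Insight Prediction": 0.65}
house_prob = {"FiveThirtyEight": 0.85, "PredictIt": 0.90, "Polymarket": 0.87, "Insight Prediction": 0.87}

# Expected Republican Senate seats; range midpoints are an assumption.
senate_seats = {"FiveThirtyEight": 50.5, "PredictIt": 53.0, "Kalshi": 53.0}


def spread_pp(estimates: dict) -> float:
    """Gap between the highest and lowest probability, in percentage points."""
    values = estimates.values()
    return round((max(values) - min(values)) * 100, 1)


print(f"Senate spread: {spread_pp(senate_prob)}pp, mean {mean(senate_prob.values()):.1%}")
print(f"House spread: {spread_pp(house_prob)}pp, mean {mean(house_prob.values()):.1%}")
print(f"Average Senate seats: {mean(senate_seats.values()):.1f}")
```

The Senate spread comes out to 17pp against 5pp for the House, and the seat average is about 52, matching the figures in the text.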

Forecasts for governors aren’t as comprehensive, but FiveThirtyEight predicts we’ll get about 28 Republican (22 Democratic) governors, while PredictIt expects 31+ Republicans; I’ll split the difference at 30. Everyone agrees that Oregon is surprisingly competitive because of an independent drawing Democratic votes. The biggest difference I see is on New York, where PredictIt gives Republican challenger Lee Zeldin a real chance (26%) but FiveThirtyEight doesn’t (3%).

Overall forecast: moderate red wave, Republicans take the House and most governorships, probably the Senate too. But if they lose anything it is almost certainly the Senate.

These forecasts seem about right to me. Democrats are weighed down by an unpopular (-11) President and the highest inflation in 40 years. This would lead to a huge red wave, but Republicans have their own weaknesses: an unpopular former President lurking in the background, and the Supreme Court making a big unpopular change voters blame them for. This shrinks the red wave, but I don’t think it’s enough to eliminate it. The effect of Roe repeal is fading with time, and the unpopular Biden is more salient than the unpopular Trump; Biden is the one in office and is more prominent in media coverage. Facebook and recently acquired Twitter may be doing Republicans a favor by keeping Trump banned through Election Day. But if he drags Republicans down anywhere, it will be the Senate, where candidate quality (not just party affiliation) is crucial and his endorsements pushed some weak/weird/extreme candidates through primaries. We’ll also see this “extremist” Trump effect (abetted by cynical Democratic donations to extreme-right candidates) dragging down Republicans in some key governor’s races like Pennsylvania, where Democrats are now 90/10 favorites.