Update on Game Theory Teaching

I wrote at the end of the summer about some changes that I would make to my Game Theory course. You can go back and read the post. Here, I’m going to evaluate the effectiveness of the changes.

First, some history.

I’ve taught GT a total of five times. Below are my average student course evaluations for “I would recommend this class to others” and “I would consider this instructor excellent”. Although the general trend has been improvement, both in the ratings and in the course itself, some more context would be helpful. In 2019, my expectations for math were too high. Shame on me. It was also my first time teaching GT, so I had a shaky start. In 2020, I smoothed out a lot of the wrinkles, but I hadn’t yet made it a great class.

In 2021, I had a stellar crop of students. There was not a single student who failed to learn. The class dynamic was perfect and I administered the course even more smoothly. They were comfortable with one another, and we applied the ideas openly. In 2022, things went south. There were too many students enrolled in the section, too many students who weren’t prepared for the course, and too many students who skated by without learning the content. Finally, in 2023, the year of my changes, I had a small class with a nice symmetrical set of student abilities.  

Historically, I would often advertise this class, but after the disappointing 2022 performance, and given that I knew that I would be making changes, I didn’t advertise for the 2023 section. That part worked out perfectly. Clearly, there is a lot of random stuff that happens that I can’t control. But, my job is to get students to learn, help the capable students to excel, and to not make students *too* miserable in the process – no matter who is sitting in front of me.


Do People Trust ChatGPT Writing?

My new working paper with Will Hickman is up on SSRN: Do People Trust Humans More Than ChatGPT?

We study whether people will pay for a fact-check on AI writing. ChatGPT can be very useful, but human readers should not trust every fact that it reports. Yesterday’s post was about ChatGPT writing false things that look real.

The reason participants in our experiment might pay for a fact-check is that they earn bonus payments based on whether they correctly identify errors in a paragraph. If participants believe that the paragraph does not contain any errors, they should not pay for a fact-check. However, if they have doubts, it is rational to pay for a fact-check and earn a smaller bonus, for certain.
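The incentive structure can be sketched as a simple expected-value comparison. This is an illustrative sketch only; the payoff numbers below are hypothetical, not the actual parameters from our experiment:

```python
def should_fact_check(p_correct, full_bonus, checked_bonus):
    """Pay for a fact-check iff the certain (smaller) bonus beats
    the expected value of relying on your own judgment."""
    expected_guess = p_correct * full_bonus
    return checked_bonus > expected_guess

# A participant 60% sure of their own judgment, choosing between
# a $1.00 bonus for a correct guess and a guaranteed $0.70:
print(should_fact_check(0.60, 1.00, 0.70))  # True: 0.70 > 0.60
# A participant 90% sure should skip the fact-check:
print(should_fact_check(0.90, 1.00, 0.70))  # False: 0.70 < 0.90
```

The less confident a participant is in their own accuracy, the more attractive the smaller-but-certain bonus becomes.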

Abstract: We explore whether people trust the accuracy of statements produced by large language models (LLMs) versus those written by humans. While LLMs have showcased impressive capabilities in generating text, concerns have been raised regarding the potential for misinformation, bias, or false responses. In this experiment, participants rate the accuracy of statements under different information conditions. Participants who are not explicitly informed of authorship tend to trust statements they believe are human-written more than those attributed to ChatGPT. However, when informed about authorship, participants show equal skepticism towards both human and AI writers. There is an increase in the rate of costly fact-checking by participants who are explicitly informed. These outcomes suggest that trust in AI-generated content is context-dependent.

Our original hypothesis was that people would be more trusting of human writers. That turned out to be only partially true. Participants who are not explicitly informed of authorship tend to trust statements they believe are human-written more than those attributed to ChatGPT.

We presented information to participants in different ways. Sometimes we explicitly told them about authorship (informed treatment) and sometimes we asked them to guess about authorship (uninformed treatment).

This graph (figure 5 in our paper) shows that the overall rate of fact-checking increased when subjects were given more explicit information. Something about being told that a paragraph was written by a human might have aroused suspicion in our participants. (The kids today would say it is “sus.”) They became less confident in their own ability to rate accuracy and therefore more willing to pay for a fact-check. This effect is independent of whether participants trust humans more than AI.

In the context of our previous work on ChatGPT hallucinations, we think of fact-checking as often a good thing. So, one policy implication is that certain types of labels can cause readers to think critically. For example, Twitter labels automated accounts so that readers know when content has been chosen or created by a bot.

Our working paper is currently trending on SSRN top ten lists such as this one.

Suggested Citation:
Buchanan, Joy and Hickman, William, Do People Trust Humans More Than ChatGPT? (November 16, 2023). GMU Working Paper in Economics No. 23-38, Available at SSRN: https://ssrn.com/abstract=4635674

GPT-4 Generates Fake Citations

I am happy to share my latest publication at The American Economist: ChatGPT Hallucinates Non-existent Citations: Evidence from Economics

Citation: Buchanan, J., Hill, S., & Shapoval, O. (2024). ChatGPT Hallucinates Non-existent Citations: Evidence from Economics. The American Economist. 69(1), 80-87  https://doi.org/10.1177/05694345231218454

Blog followers will know that we reported this issue earlier with the free version of ChatGPT using GPT-3.5 (covered in the WSJ). We have updated this new article by running the same prompts through the paid version using GPT-4. Did the problems go away with the more powerful LLM?

The error rate went down slightly, but our two main results held up. It’s striking that any fake citations at all are presented as real. The proportion of nonexistent citations was over 30% with GPT-3.5, and it is over 20% with our trial of GPT-4 several months later. See figure 2 from our paper below for the average accuracy rates. The proportion of real citations is always under 90%. GPT-4, when asked about a very specific narrow topic, hallucinates almost half of the citations (57% are real for level 3, as shown in the graph).

The second result from our study is that the error rate of the LLM increases significantly when the prompt is more specific. If you ask GPT-4 about a niche topic for which there is less training data, then a higher proportion of the citations it produces are false. (This has been replicated in different domains, such as knowledge of geography.)

What does Joy Buchanan really think?: I expect that this problem with the fake citations will be solved quickly. It’s very brazen. When people understand this problem, they are shocked. Just… fake citations? Like… it printed out reference for papers that do not actually exist? Yes, it really did that. We were the only ones who quantified and reported it, but the phenomenon was noticed by millions of researchers around the world who experimented with ChatGPT in 2023. These errors are so easy to catch that I expect ChatGPT will clean up its own mess on this particular issue quickly. However, that does not mean that the more general issue of hallucinations is going away.

Not only can ChatGPT make mistakes, as any human worker can, but it can make a different kind of mistake without meaning to. Hallucinations are not intentional lies (which is not to say that an LLM cannot lie). This paper will serve as bright, clear evidence that GPT can hallucinate in ways that detract from the quality of the output or even pose safety concerns in some use cases. This generalizes far beyond academic citations. The error rate might decrease to the point where hallucinations are less of a problem than the errors that humans are prone to make; however, the errors made by LLMs will always be of a different quality than the errors made by a human. A human research assistant would not cite nonexistent papers. LLM doctors are going to make a type of mistake that would not be made by human doctors. We should be on the lookout for those mistakes.

ChatGPT is great for some of the inputs to research, but it is not as helpful for original scientific writing. As prolific writer Noah Smith says, “I still can’t use ChatGPT for writing, even with GPT-4, because the risk of inserting even a small number of fake facts… “

Follow-Up Research: Will Hickman and I have an incentivized experiment on trust that you can read on SSRN: Do People Trust Humans More Than ChatGPT?

@IMurtazashvili has pointed me to a great resource for AI-era literature review work. “AI-Based Literature Review Tools” from Texas A&M

OpenAI, IZA, and The Limits of Formal Power

Companies and non-profit organizations tend to be managed day-to-day by a CEO, but are officially run by a board with the legal power to replace the CEO and make all manner of changes to the company. But last week saw two striking demonstrations that corporate boards’ actual power can be much weaker than it is on paper.

The big headlines, as well as our coverage, focused on the bizarre episode where OpenAI, one of the hottest companies (technically, non-profits) of the year, fired their CEO Sam Altman. They said it was because he was not “consistently candid with the board”, but refused to elaborate on what they meant; they named a few things it was not, but never what really motivated them.

Technically it is their call and they don’t have to convince anyone else, but in practice their workers and other partners can all walk away if they dislike the board’s decisions enough, leaving the board in charge of an empty shell. This was starting to happen, with the vast majority of workers threatening to walk out if the board didn’t reverse their decision, and their partner Microsoft ready to poach Sam Altman and anyone else who left.

After burning through two interim CEOs who lasted two days each, the board brought back ousted CEO Sam Altman. Formally, the big change was board member Ilya Sutskever switching sides, but the blowback was enough to get several board members to resign and agree to being replaced by new members more favored by the workers (including, oddly, economist Larry Summers).

A similar story played out at IZA last week, though it mostly went under the radar outside of economics circles. IZA (aka the Institute for Labor Economics) is a German non-profit that runs the world’s largest organization of labor economists. While they have a few dozen direct employees, what makes them stand out is their network of affiliated researchers around the world, which I had hoped to join someday:

Our global research network is the largest in labor economics. It consists of more than 2,000 experienced Research Fellows and young Research Affiliates from more than 450 research institutions in the field.

But as with OpenAI, the IZA board decided to get rid of their well-liked CEO. Here at least some of their reasons were clear: they lost their major funding source and so decided to merge IZA with another German research institute, briq. Their big misstep was choosing for the combined entity to be run by the much-disliked head of the smaller, newer merger partner briq (Armin Falk), instead of the well-liked head of the larger partner IZA (Simon Jaeger). Like with OpenAI, hundreds of members of the organization (though in this case external affiliates not employees, and not a majority) threatened to quit if the board went through with their decision. Like with OpenAI, this informal power won out as Armin Falk backed off of his plan to become IZA CEO.

Each story has many important details I won’t go into, and many potential lessons. But I see three common lessons between them. First is the limits to formal power; the board rules the company, but a company is nothing without its people, and they can leave if they dislike the board enough. Second, and following directly from this, is that having a good board is important. Finally, workers can organize very rapidly in the internet age. At OpenAI nearly all its employees signed onto the resignation threat within two days, because the organizers could simply email everyone a Google Doc with the letter. Organizers of the IZA letter were able to get hundreds of affiliates to sign on the same way despite the affiliates being scattered all across the world. In both cases there was no formal union threatening a strike; it was the simple but powerful use of informal power: the voice and threatened exit of the people, organized and amplified through the internet.

Axios Survey of Americans on AI Regulation

Axios just surveyed over 2,000 U.S. adults to find that “Americans rank the importance of regulating AI below government shutdowns, health care, gun reform…” Without pressure from the public to pass new legislation, Congress might do nothing for now, which will lead to the rapid advance of LLM chatbots into life and work.

The participants seem more worried about AI taking jobs than they are excited about AI making life better. There is some concern about misinformation.** So, they don’t think AI will have no impact on society, but they also don’t see enacting regulation as a top priority for government.

At my university, the policy realm I know best, we will probably not be “regulating” AI. We have had task forces talking about it, so it’s not because no one has considered it.

The Axios poll revealed gender gaps in attitudes toward AI. Women said they would be less permissive than men about kids using AI. Also, “Parents in urban areas were far more open to their children using AI than parents in the suburbs or rural areas.” Despite those gaps in what people say, I expect that the gaps in what their children are currently doing with technology are smaller. Experimental economists are suspicious of self-reported data.

**Our results did not change much when we ran the exact same prompts through GPT-4. A version of my paper on AI errors that I blogged about before is up on SSRN, but a new manuscript incorporating GPT-4 is under review: Buchanan, Joy, Stephen Hill, and Olga Shapoval (2023). “ChatGPT Hallucinates Nonexistent Citations: Evidence from Economics”. Working Paper.

Fear of the Unknown and Fear of the Known

Alfred Hitchcock’s ‘Psycho’ famously omits graphic violence. You never see the bad guy stab anyone – though it’s heavily implied. Some say that this accounts for the impact of the film. The most thrilling parts are left to the viewer’s imagination. And a person’s imagination can be pretty terrifying. The delight of the unseen was especially appropriate at a time of 13 inch televisions and black-and-white movies. If the graphics on the screen couldn’t carry the movie, then the graphics in a person’s mind would do the trick.

Fast forward to ‘Burn Notice’. I don’t watch this show, but my in-laws do. They have a huge TV with a super high resolution. The TV has a diagonal span that almost surpasses my height. I’m short, but not that short. This is a big TV.  I’ve only seen Burn Notice at their house. It strikes me as poorly acted, poorly written, and self-serious to the point of absurdity. I keep expecting that self-referential nod to the open secret that the show is ridiculous, but it never comes. It’s a bad show. From all that I can see in high definition, there’s nothing worth seeing.

What is so good that I watch? Although I’m seven years late, I’ve recently been watching Marvel’s Luke Cage. Being a superhero show, some of the standards are lowered. The script is weak at times, the acting is OK, and the plot has some credibility holes. But the point of the show is to explore a world in which superheroes exist, and one of them happens to live in Harlem. Luke Cage is part of the earlier Marvel cadre of post-acquisition-by-Disney shows that also includes Iron Fist, Daredevil, & Jessica Jones. These shows are less tongue-in-cheek and comedic than the later shows like Loki, Wandavision, or Moon Knight. I enjoy watching Luke Cage on a small 40 inch television, and occasionally on my phone.  

Then I stayed at an Airbnb last weekend that had a HUGE TV. This thing easily had a diagonal measure that surpassed my height. After getting the kids down and answering emails, I sat down to enjoy my current go-to show before hitting the hay. And dang it if I wasn’t distracted the entire time. On this massive screen I could see every pore on everyone’s face and every blank stare parading as acting. I could see each and every glare of poor lighting and every character’s ill-timed reply and change of expression.  Most of the show is one big charade.

Much to my dismay, I had discovered that I was watching ‘bad tv’. Let me be clear. I’m not supposed to watch bad tv. That’s the realm of those other people. But me? I have enlightened preferences and a refined palate. I’m not a person who watches bad tv. But that grandiose self-conception has been dashed by this serendipitous visit to a nice Airbnb.

I’ve had some time to dwell on my new revelation and this is what I’ve settled on. First, I’m going to keep watching Luke Cage on my small TV and I’m going to enjoy it. There is little that I can do now about the nagging knowledge that, given a higher resolution, it’s not a good show. You can’t unknow things. Second, maybe Burn Notice isn’t a bad show. Maybe it’s just a bad show when I can see too much detail, such as on my in-law’s TV. Maybe I would enjoy it on a TV with lower resolution. Regardless, I’m not going to watch it.

Third, now I have a new margin of preference over shows and movies. Now I consider whether a show or movie would be helped or hurt by more visual detail. Quick-paced, big-budget action shows like Jack Ryan are probably better in greater detail. Game of Thrones is probably better as a 4k experience. But shows in which the comedy or the drama unfolds by virtue of the circumstances, rather than the visual spectacle, are probably best watched at a lower resolution. When the audience experience hinges on implications and connections that occur in the viewer’s mind, that’s probably a better show at a lower resolution. Luke Cage is a ‘good’ show in low-res. In high-res, I’m afraid that I see too much.

When Hitchcock omitted visual detail, he leaned on the mind’s eye to fill in the gaps. He was guiding the brain toward conjuring the unnerving scenes that he could not as easily mimic on screen. Advances in home entertainment have moved the goalpost. A more detailed viewing experience changes the type of shows that we are willing to watch because we have a new criterion for fitness. The supply side response on the part of studios is that shows lacking visual stimulation will need to lean more on the mind’s eye and our interpretations of social interactions in order for audiences to experience the best version of the show. Because the best version won’t be in front of us. We know too much.

Is the repair revolution coming?

Every sentence in this article is fascinating, since I have been writing about fast fashion.* Anything I put in quote form comes from The Guardian.

The word “revolution” in the title of this article is minor clickbait. Perhaps it would be more accurate to say: “Clothes repaired in workshop, 19 people employed” That wouldn’t get any clicks. However, I am an idealist, and I am going to stay a bit on board with the revolution. I, too, have pondered and grieved over the amount of waste heading into landfills. There could be some kind of revolution ahead, whether it is of the repair type or not.

The communal garden and bespoke textile art lend a creative startup feel, and the slogan “repair is the new cool” appears everywhere. But what’s happening here is far from ordinary startup stuff. At United Repair Centre (URC), newcomers to the Netherlands from across the world, many of them former refugees, are using their tailoring skills to mend clothes on behalf of some of the world’s biggest brands. 

Immigrants are sewing, but no Dickensian horrors here. This place “has a laid-back Dutch vibe.”

Ambrose, who greets me, mans the front desk. He’s a 20-year-old Palestinian fashion fan, who was born in Syria and lived in Abu Dhabi before moving to the Netherlands in May; he is working in parallel with studying for a fashion and design diploma. Ambrose started at URC in May and loves it: the way he gets to work in collaboration with the tailors, giving advice and learning from their years of experience. “It’s really easy, fun, chill … “

The verdict is in. Work is fun.

Repair might be cool, but is it new? Consider Jo March from “Little Women” who was an American bouncing around between rich and poor status in the 1860s. American GDP per capita in the 1860s was less than $3,000. That would be considered very poor today. Since manufactured goods were expensive and Jo March had a low opportunity cost of time, she spent lots of time mending clothes. Her passion was writing but she had no choice – that was how she contributed to her household production. Very few families at that time, even in the upper class, could afford to regularly buy new clothes from a shop.

Don Boudreaux explained that even modern rich people “recycle” clothes when it’s in one’s selfish interest. Washing and “re-use” of clothes, typically, is beneficial enough to outweigh the cost of maintaining and storing them. Sometimes we go above and beyond by donating them or maintaining them specifically because we are trying not to “waste” something, but that comes at an individual cost to us.

The author of the article writes:

I take a taxi from the station to URC because I’m running late, but I’m taken aback when en route the driver points out the many conveniently located stations and tram stops I could use for my return journey.

This is a perfect encapsulation of why rich people do not repair clothes. They are zipping around to high-productivity work meetings. The opportunity cost of time has gone up. Taking the bus is costly in terms of time, the scarcest resource of the rich.

Where I see hope for the repair “revolution” is in artificial intelligence (AI). AI can make up for our scarce time and attention. If AI can make repairs less costly in terms of time, then rich people might do it. If it doesn’t make economic sense, then it won’t scale the way the author is hoping.
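The underlying trade-off can be sketched as a simple cost comparison. All of the numbers below are hypothetical, chosen only to illustrate the opportunity-cost argument:

```python
def repair_pays_off(hours_needed, hourly_wage, materials_cost, replacement_price):
    """Repair is worthwhile only if time plus materials cost less than buying new.
    A high opportunity cost of time (hourly_wage) pushes toward replacement;
    anything (like AI assistance) that cuts hours_needed pushes toward repair."""
    repair_cost = hours_needed * hourly_wage + materials_cost
    return repair_cost < replacement_price

# A $50/hour professional facing a 2-hour mend vs. a $40 fast-fashion replacement:
print(repair_pays_off(2, 50, 5, 40))     # False: $105 repair vs. $40 new
# The same mend if better tools cut hands-on time to 15 minutes:
print(repair_pays_off(0.25, 50, 5, 40))  # True: $17.50 repair vs. $40 new
```

This is why the same repair that made sense for Jo March does not make sense for a busy professional, and why cutting the time cost of repair could change the calculation.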

Currently, the “revolution” is employing 19 people full-time. By the year 2027, all they are hoping for is to expand to 140 tailors. Hardly a revolution on the jobs front. But that’s the hopeful scenario. If it’s labor-intensive, then it won’t work. (See my ADAMSMITHWORKS post on cloth production and labor.)

Is repair reaching a tipping point?

There’s one unlikely scenario in which expensive repairs will get paid for. What rich people resoundingly want is kitchen renovations and new clothes, partly because it confers status. Could it become cool to live with those outdated cabinets and wear that repaired Patagonia vest for the next two decades? … could it? Vision: “Wow. I see that you guys have outdated ugly countertops. Nice. You resisted the desire to renovate your kitchen even though it’s within your budget.”

Even changing status markers are unlikely to tip the scale in the case of broken equipment or torn clothes. AI might allow us to repair a refrigerator instead of trash it.

URC tracks repairs using software initially developed by Patagonia, which it has built on and uses for the other brands involved.

There it is. Software makes the dream work.

Shein and the like are out there, churning out, in dizzying volumes, fast fashion that can’t be repaired.

In my conversations with Americans, many do not know what “fast fashion” is. That’s fast fashion. The 19-140 tailors are currently no match for Shein.

There isn’t always much common language – operational manager Hans says they resort to Google Translate quite a bit – but there’s plenty of laughter.

The AI, again! We are living in the globalized AI-powered future.

Lastly, the article was brought to my attention on Twitter (X) by Bronwyn Williams and Anna Gat.

* I’m going to have a fashion article coming soon in this series: https://www.cato.org/defending-globalization

Video for new ChatGPT users

Have you not gotten around to trying ChatGPT for yourself yet?

Ethan and Lilach Mollick have released a series of YouTube videos, aimed at beginners and posted on Aug. 1, 2023, that encapsulate some current insights. The series covers ChatGPT, Bing, and Bard. Everyday free users are using these tools.

Practical AI for Instructors and Students Part 2: Large Language Models (LLMs)

If you are already using ChatGPT, then this video will probably feel too slow. However, they do have some tips that amateurs could learn from even if they have already experimented. E. Mollick says of LLMs “they are not sentient,” but it might be helpful to treat them as if they are. He also recommends thinking of ChatGPT like an “intern” which is also how Mike formulated his suggestion back in April.

  • I used GPT-3.5 a few times this week for routine work tasks. I am not a heavy user, but if any of our readers are still on the fence, I’d encourage you to watch this video and give it a try. Be a “complement” to ChatGPT.
  • I’ll be posting new updates about my own ChatGPT research soon – the errors paper and also a new survey on trust in AI.
  • I hear regular complaints from my colleagues all over the country about poor attempts by college students to get GPT to do their course work. The experiment is being run.
  • Ethan Mollick has been a good Twitter(X) follow for the past year, if you want to keep up with the evolution and study of Large Language Models. https://twitter.com/emollick/status/1709379365883019525
  • Scott wrote this great recent tutorial on the theory behind the tools: Generative AI Nano-Tutorial
  • It was only back in December 2022 that I did a live ChatGPT demonstration in class, and figured that I was giving my students their first ever look at LLMs. Today, I’d assume that all my students have tried it for themselves.
  • In my paper on who will train for tech jobs, I conclude that the labor supply of programmers would increase if more people enjoyed the work. LLMs might make tech jobs less tedious and therefore more fun. If labor supply shifts out, then quantity should increase and wages should fall – good news for innovative businesses.
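That last comparative-statics claim, an outward supply shift lowering wages and raising quantity, can be checked with a toy linear supply-and-demand model. The curve parameters here are illustrative, not estimates:

```python
def equilibrium(supply_intercept, supply_slope, demand_intercept, demand_slope):
    """Solve Qs = supply_intercept + supply_slope * w  (labor supply)
       and   Qd = demand_intercept - demand_slope * w  (labor demand)
       for the equilibrium (wage, quantity)."""
    w = (demand_intercept - supply_intercept) / (supply_slope + demand_slope)
    q = supply_intercept + supply_slope * w
    return w, q

w0, q0 = equilibrium(0, 1, 100, 1)    # baseline: wage 50, quantity 50
w1, q1 = equilibrium(20, 1, 100, 1)   # supply shifts out (more willing workers)
print(w1 < w0 and q1 > q0)  # True: wages fall, quantity rises
```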

The Internet Knows EVERYTHING: Stopping My Car Alarm from Randomly Triggering

I have an oldish Honda that still runs smoothly. It is true that the cruise control does not work, and the left front fender is held on by a large binder clip, and I had to patch over a big rust hole in a rear wheel well, but as I said, it runs.

I sometimes park it down at the end of the street, under some shade trees, to get it out of the hot summer sun. A couple of times, for no reason, the antitheft system kicked on, so the car was honking and honking for hours on end because we didn’t hear it down there. Some neighbors down there finally figured out who it was and came and told us. They were nice about it, but I heard some other folks down there were pretty irritated.

That happened again two weeks ago, so I decided to keep it in front of our house all the time where we could keep an ear on it. Supposedly the alarm is triggered when the car thinks that a door or the trunk or the front hood has been opened without a legitimate unlocking by a key or a fob. Therefore, I opened and closed all four doors, and the trunk and the hood, and locked the car and hoped all would go well. But a few hours later there it was: honk, honk, honk….

As a temporary measure, I simply left it unlocked, so the system would not arm. But that’s not a long-term fix. So, I rolled up my sleeves and went to the internet to see what help I could find there. One common suggestion was to find the fuse that controls the alarm system and just pull it out of the fuse box. That would be great, but I checked multiple fuse diagrams for my model, and it does not seem to be a fuse that controls just the alarm system.

Other websites mentioned that the sensor on the front hood latch is a common failure point. The sensor there can start giving spurious signals when it gets old. If you are sure that’s the problem, you can have a garage replace it for labor plus maybe 100 bucks for the replacement latch.

Alternatively, you can just pull apart the connector that connects the hood latch sensor to the alarm system. That connection is in plain sight near the latch. If the latch is the problem, disconnecting that sensor should make the alarm system think the latch is always firmly closed, so it will not trigger an armed system.

But what if the hood latch is not a problem? What if the problem is the common but elusive damage to wiring caused by rodents gnawing on the insulation which contains soybean derivatives??  After sifting through about 10 links that were thrown up by my DuckDuckGo search on the subject, I finally found a useful discussion on the “civicsforum.com”.

A certain “andrickjm” wrote that he had disconnected that wire junction, and his car alarm was still randomly going off. Some savant going by the moniker “ezone” wrote that what you needed to do then is to insert a little wire jumper between the two sockets of the connector that go to the alarm system. That will make the alarm system think the hood is always raised, never closed, and this will keep the system from ever arming.

So I cut a 1-inch piece of wire, stripped the insulation from the two ends, bent it into a U-shape, jammed the two bare wire ends into the two holes in the connector socket, and sealed it all up with duct tape.


The alarm has not sounded since. Victory at last, thanks to the distributed intelligence of the internet, resting on the efforts of millions of good-hearted souls who share their problems and solutions in all areas of life.

Solving the Participation Pickle with Pick.al

Joy: This post was written by my friend and fellow econ professor Cameron Hardwick.

One of my biggest ongoing teaching challenges is keeping students engaged during lectures.

Sure, there are ways to add interactivity here and there, but sometimes there’s just no way around an old-fashioned lecture.

There are a few ways of dealing with this, and I haven’t been satisfied with any.

  1. It’s their grade, if they zone out that’s on them. In terms of the incentives, sure, the externalities are all internalized. But as a macroeconomist, I also know: if time-inconsistency problems are hard for policymakers, how much more for students! We shouldn’t be surprised when students do poorly if the main feedback they get from paying attention or not comes a week later with the homework grade.
  2. Posing questions and waiting for answers. Either you get a minute of awkward silence, or you get the same two engaged students answering everything while everyone else keeps zoning out.
  3. Cold calling. I started doing this a few years into teaching. The advantage is that it keeps students on their toes and paying attention. But a few problems left me unsatisfied:
    1. “How about you in the red shirt”. Hard to catch a student’s attention that way, and in a class of 40 or more, learning names takes a good chunk of the semester.
    2. I had no systematic way of keeping track of participation. Every semester I’d look at the roster and still have a few names I couldn’t put a face to.
    3. Humans are really bad at making random choices! Much as I tried, I couldn’t guarantee I wasn’t biased toward or against (say) the corners of the room, or students whose names I knew.
  4. LMS software. These can offer a lot of great student participation tools. But students have to pay for them – which isn’t worth it if you’re just looking for one feature. On top of that, you’re locked into an ecosystem.

So, I made an app myself. It does one thing and does it well.

Pick.al (pronounced Pickle) picks students at random from a roster and keeps track of participation points. I can now pose a question in class, ask “what do you think…”, pull out my phone and hit a button, and have a name.

I can also record the quality of their answers:

  • ✓: 1 point, good attempt! (Since this is for participation points, I record ✓ whether right or wrong, as long as they give it a good shot)
  • ?: 0.5 points, if they ask “wait, what was the question?”
  • ×: 0 points, if they’re not there or don’t respond at all.

There’s also a 1-5 scale option, for those who want a more fine-grained evaluation.

This has a lot of benefits in the classroom:

  • Since I can call on students by name, I learn names more quickly.
  • Pick.al chooses randomly from the pool of students who have been called on the least so far. So, I know my participation points are as fair as possible.
  • Students know they can get called on at any time, so they pay attention more in class, and then do better on the homeworks and tests.
  • Students appreciate being brought in more frequently. One noted on the evaluations the first semester I piloted it: “something specific I like is he got the class involved by calling people out which forced them to test their knowledge which is something teachers need to do more of.”
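The selection rule described above, a uniform random draw from the pool of least-called students, can be sketched like this. This is a minimal reimplementation for illustration, not Pick.al’s actual code:

```python
import random

def pick_student(call_counts):
    """Draw uniformly at random from the students with the fewest
    calls so far, keeping participation as balanced as possible."""
    fewest = min(call_counts.values())
    pool = [name for name, n in call_counts.items() if n == fewest]
    return random.choice(pool)

# The three-symbol scoring scheme from the list above:
SCORES = {"✓": 1.0, "?": 0.5, "×": 0.0}

counts = {"Ana": 2, "Ben": 1, "Cam": 1}
picked = pick_student(counts)  # always Ben or Cam, never Ana
counts[picked] += SCORES["✓"] and 1  # record the call
```

Restricting the draw to the least-called pool is what makes the point totals converge, even if it makes the next pick slightly predictable once everyone but one student has been called.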

Using Pick.al is as simple as registering (with an email address or an OrcID), uploading a roster, and then hitting a button during class. You can also swipe through the history and edit or undo participation events, and go back in the admin interface and add, edit, and remove participation events after the fact if necessary.

Pick.al is secure and password-protected, and has a number of handy features:

  • You can set excused absences if a student lets you know beforehand, so their name doesn’t come up until a certain date.
  • You can select specific students from the roster in a sidebar, if you want to give credit to – say – a student who raises his hand unbidden.
  • If you’d like to use the classroom computer instead of pulling out a phone, you can use it with full keyboard navigation.
  • Scores can be downloaded as a CSV to be put in your own gradebook.
  • Private notes can be added to students to show up when their names are selected, e.g. “sits in the back corner”

If you use it and find a bug or have an idea that would make it more useful to you, feel free to let me know. It’s been a great tool in my own classes, and I hope it’ll be useful for other teachers to keep students engaged too.