When we lead our writing workshop with schools, the room gets quiet when I say “AI detection software doesn’t work and shouldn’t be used by K-12 schools”. In this post, I want to unpack the research that supports that claim, the companies that are lying by omission about that research, and what we should do with K-12 writing instruction moving forward instead of AI detection.
Best AI detectors: Only right 80% of the time
A growing body of peer-reviewed research shows that AI detection is unreliable and easily fooled. Dr. Debora Weber-Wulff at HTW Berlin University of Applied Sciences, alongside researchers across the world, published a landmark 2023 study showing that leading AI detection software is largely inaccurate. Figure 1 below, taken from the study, shows that even the best AI detection software is only correct about 80% of the time. Put another way, it will incorrectly label more than one in five (!!) pieces of writing.
Paraphrasing makes AI detectors even worse
If being wrong 1/5th of the time wasn’t bad enough, other studies have shown that this number drops dramatically with simple paraphrasing.
AI detection software works by looking for the distinct probabilistic patterns that emerge when large language models string pieces of words (i.e. tokens) together. To break up those patterns, all you need to do is change some words around. Using exactly this approach, researchers in the University of Maryland's Computer Science department dropped AI detection rates on passages 300 tokens in length from 99.3% to 9.7%.
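To make the pattern idea concrete, here is a toy Python sketch of the perplexity heuristic that underlies many detectors, using GPT-2 via the Hugging Face transformers library as a stand-in. This is a minimal illustration, not how any commercial product works (those are proprietary classifiers), and the threshold is invented purely for demonstration. The intuition: text a language model finds highly predictable scores low perplexity and looks "machine-like," and swapping a few words tends to push that score up.

```python
# Toy stand-in for an AI detector: score text by GPT-2 perplexity.
# Lower perplexity = the text hews closely to the model's probabilistic
# patterns (more "machine-like"). NOTE: the threshold is invented for
# illustration; commercial detectors use proprietary classifiers.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average per-token surprise under GPT-2 (lower = more predictable)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return torch.exp(loss).item()

THRESHOLD = 25.0  # hypothetical cutoff: below this, flag as "AI-generated"

original = ("The rapid advancement of artificial intelligence has "
            "fundamentally transformed numerous industries worldwide.")
paraphrased = ("AI's quick rise has shaken up a whole bunch of "
               "industries all over the globe.")

for label, text in [("original", original), ("paraphrased", paraphrased)]:
    ppl = perplexity(text)
    verdict = "flagged as AI" if ppl < THRESHOLD else "passes as human"
    print(f"{label}: perplexity = {ppl:.1f} ({verdict})")
```

A few synonym swaps are often enough to move a passage across a cutoff like this, which is exactly the fragility the Maryland team exploited at scale.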
So a student needs only to paraphrase a few chunks of a larger AI output to drop the chances of detection by roughly 90%. This study joins a body of research (see here, here, and here) that echoes the conclusion offered by Dr. Weber-Wulff and her team: “Our conclusion is that the systems we tested should not be used in academic settings.”
AI detectors biased against non-native English speakers
Stanford researchers also recently came to the same conclusion. A study out of the Stanford Computer Science Department found that AI detection tools disproportionately flag non-native English speakers’ writing. “Our findings reveal that these detectors consistently misclassify non-native English writing samples as AI-generated, whereas native writing samples are accurately identified. Furthermore, we demonstrate that simple prompting strategies can not only mitigate this bias but also effectively bypass GPT detectors, suggesting that GPT detectors may unintentionally penalize writers with constrained linguistic expressions.”
Universities — and even OpenAI — decry AI detectors
This finding, along with all the others, has led many leading universities to effectively ban AI detection. Alabama, Yale, MIT, Syracuse, UC Berkeley, The University of Central Florida, Colorado State, Missouri, Northwestern, Southern Methodist University, Vanderbilt, and countless others have either banned AI detection tools or advised their faculty against using them.
If you don’t trust peer-reviewed third-party research teams or dozens of universities that span the political spectrum, then take it directly from the developers of the world’s leading AI tools. OpenAI, the company behind ChatGPT and arguably the world’s leading collection of AI researchers and developers, shut down its effort to develop an AI detection tool in 2023. Why? “As of July 20, 2023, the AI classifier is no longer available due to its low rate of accuracy.” OpenAI directly addresses educators in the post: “We recognize that identifying AI-written text has been an important point of discussion among educators,” they say, “and equally important is recognizing the limits and impacts of AI generated text classifiers in the classroom.” Put another way: This stuff doesn’t work, and the impact of using a tool that doesn’t work might be just as bad as not being able to identify AI in student writing.
The dishonest AI detection industry
It seems the only people claiming that AI detection software works are the people selling it.
TurnItIn, the largest and most influential of the AI detection companies, recently published a whitepaper in which they claim to show that, “Turnitin’s AI writing detection system has been independently shown to have high effectiveness in correctly identifying AI-generated content.” When I dug into this claim, I found it was partly supported by the following research: “Weber-Wulff (2023) showed that Turnitin’s AI writing detection system outperformed all other AI writing detection solutions on the market in accurately detecting AI writing across a diverse set of evaluations.”
That’s the same Weber-Wulff study from above: the one showing that TurnItIn’s AI detection software was wrong over 20% of the time, the one that concludes “these systems should not be used in academic settings.” And yet TurnItIn cited that study as evidence in support of their product.
The most charitable conclusion I can come to is that TurnItIn assumed no one would read the research they cited. The less charitable conclusion is that TurnItIn is knowingly lying, by omission, to teachers.
I called out TurnItIn’s AI detector
I’m not a confrontational person. At all. I grew up in a small town in the Midwest, and I apologize to other people when they bump into me on the sidewalk. But when I see a big company lying to schools and teachers for profit? That makes me more confrontational.
So, a few months ago, I tweeted about this. And it got picked up.
Journalists reached out. I did an interview with Edutopia. All of a sudden, I was becoming “the anti-AI detection guy.” To be honest, I don’t want to be the anti-AI detection guy. And as much as it upsets me that a large company is fooling schools into wasting resources, I also recognize that it does no good for educators to just rant and rave about it.
So I reached out to TurnItIn.
The face-to-face meeting with TurnItIn
I was at the AI AIR Show in San Diego in April, where TurnItIn had a booth and a keynote focused on their AI detection software. I (nervously) went up to their booth and asked if they would put me in touch with the folks who wrote their recent white paper (curiously, the authors were not listed).
To TurnItIn’s credit, they were open to talking. They put me in contact with Patti Smith-West, their customer engagement lead. I thought it was interesting that, when asked to clarify which white paper I was referring to, Patti’s response was, “I suspected this might be the one you meant, and my team does not own this one. You would need to reach out to Eric Wang, our VP of AI.” So I reached out to Eric and, again to his credit, he was open to meeting with me.
Over the course of a roughly 30-minute Zoom call, we actually had an amazing conversation. I appreciated Eric’s candor and willingness to engage with me, especially given that we started the call with a terse, “I saw your Twitter thread.”
Eric was direct: “I think the skepticism that exists is healthy, but false positives cannot go to zero. That’s not how statistics work.” He argued that failures in using TurnItIn are often failures of school policy. “TurnItIn can’t be a policy and you shouldn’t buy us as that… You should never give a kid a zero based on TurnItIn results alone.” He also argued, as do many teachers who continue to insist on using AI detection, that TurnItIn should instead be used as a “healthy conversation starter”: if you suspect students of using AI, run the work through TurnItIn, then use those results to start a conversation with the student about whether and how they violated academic integrity expectations.
In addressing the Weber-Wulff study, Eric made analogies to breast cancer detection and weather forecasts. “When doctors use cancer screenings, they have an X% false positive rate. But that doesn’t prevent them from using the screening and it doesn’t make the screening a bad thing.” Eric continued by arguing that the weather forecast isn’t always right either, but there is still utility in forecast apps.
I deeply appreciate Eric’s willingness to talk, especially given he knew I was a skeptic. At a time when it feels like education is becoming more polarized (see Jo Boaler and the Algebra wars in California, Dan Meyer & the AI ChatBot wars, or the ongoing science of reading wars), I think it’s vital that we be open to this sort of dialogue with people we disagree with. Dialogue and empathy will lead to solutions that better serve teachers. That said, here’s why I still think Eric’s arguments are misguided.
Why the AI detection juice ain’t worth the squeeze
The point Eric made repeatedly in our conversation is the same point I’ve seen teachers make (see comments here) in defense of AI detection: it’s a conversation starter for when teachers suspect AI. The problem is that, given how unreliable these tools are, those conversations with students will too often start with a baseless accusation.
I don’t know any relationship experts who advise baseless accusations as a meaningful avenue for relationship building. In fact, therapists advise omitting blame entirely and starting tough conversations from a place of empathy. This is particularly important in K-12 education because the relationship between a teacher and student is one of the most important factors in determining student learning outcomes. Given how important those relationships are, using a tool that has a >20% chance of introducing distrust or harm into a teacher/student relationship just isn’t worth it.
This leads to the second issue: The chance for misuse is too high.
Some teachers may use AI detection as a conversation starter. Many others (as you will see in the comments on my recent Facebook post) will simply use it to hand out zeros, even though the people creating this software agree it should “never be used to give a kid a zero.” The risk of teachers reaching for AI detection as the “easy button” that they can use when they suspect improper AI use is too high, especially given the aforementioned research showing that these tools will disproportionately flag non-native English speakers’ writing.
There are better uses of school funding
If the risk of damaging relationships with students isn’t enough, think about the opportunity cost. K-12 funding is always tight, but with ESSER funding running out this year, schools are making mission-critical decisions about what to cut or keep. It pains me to think that broken AI detection software might be prioritized over the other amazing uses we could allocate that funding towards.
It’s not listed on their site (at least not that I could find), but I believe TurnItIn costs anywhere from $3 to $5 per student (that link is dated; the price may have gone up). The 100 largest US school districts each have at least 40,000 students. So district-wide access to TurnItIn might cost anywhere from $120K to $200K for our largest districts (again, it’s probably higher than this).
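For what it’s worth, here’s that back-of-envelope arithmetic as a quick sketch. The per-student figures are the dated public estimates above, not quoted prices, and I’m assuming a simple annual per-student license:

```python
# Back-of-envelope district cost, using the dated per-student estimates
# above. Assumes an annual per-student license; actual quotes vary by
# district and are likely higher.
price_low, price_high = 3, 5   # USD per student (estimated range)
students = 40_000              # floor for the 100 largest US districts

print(f"${price_low * students:,} to ${price_high * students:,}")
# -> $120,000 to $200,000
```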
Think of the infrastructure, training, support services, extracurriculars, safety measures, library resources, early childhood programs, arts and music opportunities, STEM boosts, etc. that this money might be used for instead of software that has (at best!!) a 20% chance of damaging a relationship with a student.
Better alternatives than AI detectors
What makes this all particularly galling is that there are free alternatives that are better at monitoring AI influence on student work.
Simply set an expectation with your students up front that they must write their paper in the same Google Doc or Microsoft Word document from start to finish, and have them use a free extension like Draftback or Revision History. These are much better conversation starters because they play back a video of the student working through the document. You could even use them to build metacognitive reflection into the writing process when you don’t suspect AI usage at all. We discuss this in-depth in our writing workshop.
AI detectors reveal a bigger issue in instruction
This speaks to the larger issue here: AI detection leads us down the wrong path in adjusting instruction and assessment post-AI. It represents the easy-button option – “I don’t have to change my instruction and assessment, I’ll just tell kids not to use AI and use detection software”.
As we head into a world in which AI can increasingly do almost any task we throw at a student and in which it will be almost impossible to monitor AI’s influence if and when work leaves the classroom, we must instead make much larger, systemic changes to our instruction and assessment. Part of this means accepting that students can, will, and (in specific instances) should use AI to help them write.
The trick becomes teaching students how to use AI to write with them rather than for them. This is an incredibly difficult balance to pull off. Students often lack the motivation, metacognition, and self-regulation skills necessary to strike it. We lead entire workshops and are building a platform focused on finding this balance, but we’ll address at least a few points on the path forward in the next section.
Before doing so, I want to reiterate: Please do not use AI detection. The risk of damaging relationships and opportunity costs associated with this software just aren’t worth it. Your spidey senses should be tingling when the only folks arguing that this software works are the people selling it.
The path forward
So how do we move forward in a world without AI detection software? Again, that’s a huge question that we lead entire workshops on, but here are a few entry points.
1. Set clear classroom policies for academic integrity that allow for some AI use
The first step in finding balance in AI’s influence on student writing lies in setting clear classroom policies and expectations. Given that AI detection software doesn’t work and that AI will more than likely influence student work once it leaves the classroom, I agree with those advocating for thinking about AI through the lens of tiered levels of influence. Assignments should be labeled in advance as to whether or not AI can be used to complete them. A simple “red, yellow, green” framework is a great place to start:
- Red = No AI
- Yellow = Some AI (with specific constraints)
- Green = Full AI
Leon Furze, with his AI Assessment Scale, and Adrian Cotterell are doing great work on this. Dr. Sonja Bjelobaba’s work on reframing academic integrity is also fantastic.
2. Bring more writing instruction and assessment into the classroom and make it cross-curricular
In-class writing allows for much more control over how much influence AI has on the writing process: not through any technical process, but simply through classroom management. It’s partly why we’re building Short Answer. If we’re going to bring writing into the classroom, it will need to be bite-sized, engaging, and motivating for students. For us, that means making writing a social experience.
I also believe that we’re going to need to find ways to make up for the lost practice repetitions that will result from AI being embedded in almost every piece of technology our students use. That means writing instruction can no longer just fall on the English teachers. It needs to be, as has been advocated for decades, embedded in every subject. And if we’re going to ask the math, science, social studies, and electives teachers to embed writing in their classes, it better be easy. Because teachers already have WAY too much on their plates. To us, that again means making writing bite-sized and used as a means to teach content. We’re building a platform for that 😊.
3. Teach students to use AI effectively in their writing
This is a huge topic, and one that is particularly scary given: 1. the risks inherent in AI use, and 2. the still-limited peer-reviewed research on how best to go about it. That doesn’t mean we should sit on our hands, though.
The dynamic of any effective AI use in the classroom needs to be AI in the loop, teacher in charge. If we don’t set clear classroom policies, if we turn kids loose on AI we haven’t trained them on, and/or if we allow too much influence over student thinking, the dynamic can too quickly become teacher in the loop, AI in charge. While we wait for a more robust research base, try using well-established instructional frameworks for introducing students to new content. For example, in our workshops we’ve repurposed Dr. Douglas Fisher and Dr. Nancy Frey’s gradual release of responsibility framework and paired it with a variety of new AI tools to show teachers how they might effectively introduce AI use in their content area.
If you teach in the core subjects, AI literacy doesn’t need to be your central focus. That said, I do think you need to coach kids through, at a basic level, how AI can be used effectively in your content area. In the context of ELA, that might mean showing students how to write effective prompts, reorganizing and paraphrasing the generic writing produced by AI, comparing and contrasting human- vs. AI-written content, using AI as a brainstorming partner, debating an AI chatbot for persuasive writing, etc. This is a huge topic that many people are doing amazing work on.
I really love Anna Mills’ work on this if you’re looking for inspiration. English teachers Meg Lamont and Kristina Zarlengo also have an inspiring post on how they’re going about this at Stanford Online High School. We also discuss this in-depth in our workshops.
4. Center more personal reflection and/or defense in writing practice
Building in a reflection or defense element to writing assessments can be an important step towards developing students’ writing ability while also ensuring we don’t lose the benefits of going through the writing process. Reflections and defenses aid in personal growth and encourage critical thinking. They also require students to explore personal feelings, experiences, and perceptions in ways that are difficult for AI to replicate (i.e. it’s hard to use ChatGPT to “cheat” on this).
Include a video reflection requirement: have students record a reflection on their phones and submit it through Google Forms, or use Padlet to hold class discussions around those reflections (I would have said Flip, but…). I also recently began exploring Mirror Talk and like what I’ve experienced, but haven’t used it enough to fully vouch for it yet. We also embed reflection in every writing activity in Short Answer.
5. Be intentional about the “why”
“Why should I learn how to write when AI can do it for me?” is an incredibly complicated question that deserves to be addressed directly, both as a staff in professional development and in conversations with our students when assigning writing tasks.
The simple answer here is that literacy is one of the most important predictors of health and well-being as students go through their lives. It is the means through which they will (or will not) be able to communicate their thoughts and ideas with the world. Going through the process of writing literally transforms thought. Being explicit about this and having an open dialogue with students about it is a start. More importantly, embedding writing practices that allow students to experience the vital skill that is writing should be a focus moving forward.
Stop chasing easy solutions to complex problems
Notice that none of these steps involve AI detection software. Rather than wasting time and money on these tools, we should focus on more effective strategies that promote academic integrity, enhance in-class writing instruction, and teach students to use AI responsibly.
Ultimately, we need to stop chasing easy solutions in an attempt to maintain the status quo and start accepting that AI requires fundamental shifts in the way we approach K-12 writing instruction and assessment. Our resources should be invested towards those ends rather than the dead end of AI detection software.