Love the article, it's a unique solution even if we can still debate whether it's fully the right one. So many of us, including myself, are ready to throw in the towel. That said I think you're on the right track here.
The main problem I see is the coordination problem, particularly with respect to university administrations, which are conflict-averse and frankly don't care that much about academic integrity.
I think we should try to persuade some of our administrators, especially those who negotiate with software vendors. But I know that's tough with all the teaching and other responsibilities. Or maybe offer up our classes to computer scientists and programmers who want to run trials of these sorts of provenance-establishing technologies...
I'm with you; I'm just sort of jaded because of my experience with some administrators. For example, I had strong evidence, though not 100% proof (since that's not yet possible), that a student had used AI last semester. The student denied it, of course, and complained to the dean. I suggested to the dean that we ask the student to write the assignment in the test proctoring center, where we could verify conclusively that he wasn't using AI. But the dean gave a bunch of reasons why this would violate this or that policy. So many are conflict-averse and stuck in red tape.
I saw a bunch of discussion of watermarking back in 2023 (before ordinary writing professors were talking much about LLMs), but I think it died out because open-source models that wouldn’t follow the guidelines would be too easy to come by (even if Llama and DeepSeek cooperated, it wouldn’t be hard for some unscrupulous company to jailbreak one of them).
I’ve been working with some people here at Irvine to see if we can get an old computer lab designated as a “writing lab” that would be proctored and open 12 hours a day, where students would have computers with access to all their class materials, but no AI, so that writing assignments could require being written there.
Yes, there's been a lot of effort in that direction from my colleagues here as well and I like this development a lot. My CS colleagues started this: https://acelab.berkeley.edu/cbtf/, a computer-based testing facility and I am hoping to use it for my classes in the fall. It has a learning curve.
That looks really interesting!
The one difference with the one I am hoping to get set up here is that I want ours to be a space where students can go spend four hours writing an essay, rather than a space for extra quizzes. (And I think UCI is putting together something like that one as well, given what we heard about the proposed use of one of the computer labs we were looking at.)
So glad to see a new post! I'm skeptical for two reasons, both of which you address.
The first is the coordinated action required of higher education, along with a regulatory framework that enforces rules stringently enough that all AI companies would accept them. The current situation with Grok, and the way it is galvanizing new legislation to limit kids' access to cultural technology, seems relevant. The outcome of that particular fight, and the public's attitude toward more regulation, could determine whether something like what you are proposing becomes possible in a few years.
The second is how students actually use LLMs to complete academic work. This would "work" to limit the direct submission of LLM outputs, but it seems to me that most academic uses involve manipulating the outputs before submission. Watermarking would force some more complex processes, but those, as you say, might just be running the output through various models, or through software that checks for watermarks and reports when they are no longer detectable.
We already have a problem of unknown dimensions in "agents" that are allowed by LMS providers to complete and submit academic work like discussion posts and short quizzes. Solving the collective action problem of forcing Canvas, Blackboard, and the other providers to introduce countermeasures against that sort of cheating would be an important first step, one addressed to edtech companies that are beholden to educational institutions.
My view is that there are a host of practices that need to change. LLMs are forcing that change, or at least drawing new attention to the need for change. "In-class (handwritten) exams, with group projects and presentations in the viva voce format" are a start, but reforming how and why we assess students means rethinking large course formats and exam structures all the way down to the credit unit, the semester/quarter calendar, and the weird grading system of F-A. The structures of higher learning that came into existence in the early 20th century no longer work effectively to support student learning.
Hi Rob, first of all, thanks, as always, for scrupulously reading the entire thing and the comment!
I don't think we disagree that much actually: for instance, I wouldn't say that I am against instructors and schools "rethinking large course formats and exam structures ..."; I think this would be quite compatible with watermarking and indeed, instructors and schools should keep experimenting.
But I think there is a clear difference in the two ambitions. All I want is a system for reliably detecting AI-generated text (or reliably detecting human-generated text, if you will). Yes, it's difficult, but it's a clear goal. I am not even sure what the goal is when it comes to "rethinking large course formats and exam structures all the way down to the credit unit, the semester/quarter calendar, and the weird grading system of F-A"! This is perhaps a brief for experimentation and with that I agree. In a great essay, Richard Rorty distinguished between "campaigns" and "movements" (https://dissentmagazine.org/online_articles/campaigns-and-movements/) and I think the difference is that I am proposing a campaign (a clear practical goal) and you are proposing a movement (something that is built more around identity and a sense of the ineffable).
In fact, this movement already has a name; David Labaree calls it pedagogical progressivism in his book "The Trouble with Ed Schools." For pedagogical progressives, the problem has always been that the school has not kept up with the rhythm of learning outside schools. George Siemens wrote a viral essay back in 2006 (https://en.wikipedia.org/wiki/Connectivism) where he argued that the internet and online communities had made the kind of learning done in schools obsolete and schools must be reinvented. Now you say the large course and the analytical paper are obsolete because of AI. Which is all fine; I think it's important for schools to evolve.
But that shouldn't invalidate the need for some kind of reliable AI detection, which, I think, is quite compatible with pedagogical progressivism because after all, even if there is watermarking, no one can force instructors to use it or to create assignments that require students to use AI.
The fact that we're so close on most things makes it interesting when you post something I disagree with. It forces me to think more carefully!
I think Rorty's distinction is spot on. I'm talking about a movement sparked by the disruptions of AI to rework the entire apparatus of teaching in higher ed and you want a campaign to fix a problem caused by AI. Experimentation is precisely where I think we should start, and a technical solution aimed at preserving the analytic essay is an interesting experiment. Like Rorty, I'm a pragmatist, though I use the term "process philosophy" these days to avoid the confusion over James's poorly chosen term. So, I am open to different approaches and believe there are multiple paths.
In any case, glad you found time to post an essay. There is so little external reward for academics writing on the internet for free, but I'm convinced that online essays about our instructional practices are one way to build the movement I'm looking for.
The concept of watermarking LLM output, in spite of high-profile supporters and papers on arXiv, etc., just doesn’t make sense.
It’s trivial to bypass any watermarking. Selectively edit the text, and you break the pattern.
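To see why, here’s a toy sketch in the spirit of the “green list” schemes in the watermarking literature (e.g., Kirchenbauer et al., 2023); the hash-based token partition and the z-score detector below are my own illustrative assumptions, not any vendor’s actual method:

```python
import hashlib
import math

def is_green(prev_token: str, token: str, gamma: float = 0.5) -> bool:
    """Hash the (previous token, next token) pair to a number in [0, 1);
    the lowest `gamma` fraction of that range counts as 'green'."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < gamma

def watermark_z_score(tokens: list[str], gamma: float = 0.5) -> float:
    """z-score of the observed green-token count against the binomial
    baseline that ordinary, unwatermarked text would produce by chance."""
    n = len(tokens) - 1  # number of (previous, next) pairs scored
    greens = sum(is_green(tokens[i], tokens[i + 1], gamma) for i in range(n))
    return (greens - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

A watermarking generator preferentially samples “green” continuations, so intact output yields a large z-score, while every token a student swaps, deletes, or rephrases pulls the count back toward chance. The published schemes claim some robustness to light edits, but the signal only degrades from there.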
The scenario of hunting down rogue LLMs built without watermarking is a totalitarian nightmare. Do people not understand that LLMs are everywhere, and that free and open source LLMs can easily be installed locally by anyone with a sufficiently powerful GPU on their computer?
Sam Altman experimenting in a lab with watermarking ChatGPT output is very 2023. Promises of watermarked LLM output need to be set alongside promises like “2025 will be the year of Agentic AI; corporations will hire AI agents in 2025” and “we will have AGI by 2027.”
Thanks for the comment. A colleague of mine made this objection as well, that in the future, everyone would be running open-weight models on their laptops. I think this is an empirical question. I'd be curious to know if there are any good representative surveys about how people/students use AI and how many people use Gemini versus an open-weight version.
I am skeptical though. My students can barely do pivot tables in Excel. I find the idea of these students downloading open-weight models and running them on their computers somewhat hard to imagine. And when it comes to watermarking for academic integrity, it just has to deter most students, even if it doesn't deter all of them.
It's also not how open source has worked out historically. Email runs on open protocols, yet how many people run their own email servers (though that was the idea among free-software advocates, who wanted to be free of corporate proprietary regimes)? They don't, because Gmail is just more convenient (not to mention full of useful features), and even as setting up an email server has become easier and easier, most people stick with Gmail.
But this is a good point which merits consideration; if you know of any good studies on who uses these open-weight models, I'd love to see them.
I appreciate the engagement. I am one person who has implemented local LLMs on a relatively affordable M4 Mac mini, right here on my desktop. It’s ridiculously easy, it’s getting easier, and more and more powerful models will run locally, outside the direct supervision of large corporate IT departments, as time goes on. Obviously I’m not most students, or most (ex-)professors.

A more salient issue is that students work by going to Google, searching for what they need, and implementing the solutions they find there. They are not limited to the big foundation models served up for profit by the corporations; there are already countless options for accessing free writing based on independent implementations of open-source LLMs beyond the big five (Google, OpenAI, Anthropic, X, and Meta). Rather than involving college faculties in a legislative struggle against not only an industry but a still-spreading, still-developing technology, I would encourage involving them in a dual strategy: staying abreast of the technology (both as threat and as utility), and organizing against/around AI through the policies and practices of universities and colleges.
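For what it’s worth, here is roughly what “ridiculously easy” looks like: a minimal sketch using the Hugging Face transformers library, with a small open-weight checkpoint named purely as an example (any comparable model would do), downloaded once and then run entirely on the local machine:

```python
# Minimal local-generation sketch. Assumes `pip install transformers accelerate`
# and a one-time weight download; the model name is an example of a small
# open-weight checkpoint, not a recommendation.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # example open-weight model (assumption)
    device_map="auto",                   # uses a local GPU / Apple Silicon if available
)

result = generator(
    "Write a short essay paragraph on the causes of the First World War.",
    max_new_tokens=200,
)
print(result[0]["generated_text"])
```

No corporate account, no API key, and nothing for a watermarking mandate to attach to once the weights are on the student’s machine.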