AI Agents Will Transform Work. They'll Set Back Learning.

There's a new wave in AI right now, and it's not about chatbots. Over the past several months, something shifted. AI systems went from answering questions to doing things: writing entire codebases, managing complex projects, executing multi-step tasks with minimal human oversight. Tools like Clawdbot, an open-source agent framework that blew up almost overnight, let people run AI locally on their own machines, connected to their own apps, acting on their behalf. Claude Code is changing how software gets built.
McKay Wrigley, a developer whose judgment I trust, called Opus 4.5 "to agents what Sonnet 3.5 was to code" — the inflection point. METR, an independent evaluation org, found that the time horizon over which AI can work autonomously is doubling every four to seven months. Engineers at Brisk are handing over nearly all of their coding to AI and spending their time reviewing, directing, and deciding.
This isn't hype. The agent paradigm works for knowledge work, and it's going to change how a lot of industries operate.
Why the Agent Paradigm Fails Students
A reporter asked me recently what this means for K-12 education. I've been thinking about this for a while, and my honest answer is: probably a lot of smoke. Not because agents aren't powerful, but because the logic that makes them work for software engineering is almost exactly the logic that makes them harmful for learning. I don't think enough people in education are making this distinction clearly.
The easiest way to see it is to ask: what's the goal?
In software, the goal is the output. Ship the feature, fix the bug, deploy the system. If an AI agent writes the code for you, better and faster than you could have, that's the point. The agent doing the work is the product. An engineer's job increasingly becomes reviewing, directing, and deciding while the AI handles execution.
In school, the goal is the opposite. The output (the essay, the problem set, the lab report) is almost beside the point. What matters is what happened in the student's brain while they produced it.
A student who uses an AI agent to complete an assignment has not been helped. They've been robbed of the only thing that mattered. The fact that the output looks better is, if anything, worse: it creates the illusion of learning where none occurred.
We now have hard evidence for this. A randomized controlled trial involving nearly a thousand high school students in Turkey, published in PNAS, tested exactly this scenario. Students were split into three groups: no AI access, a standard ChatGPT-like tool, and a carefully guardrailed AI tutor designed with input from their teachers.
Students using the standard AI tool performed 48% better on practice problems. But once the AI was taken away, they performed 17% worse on the exam. Most troublingly, they didn't know it.
They believed they had learned more than the control group. The OECD, in its 2026 Digital Education Outlook published just weeks ago, has a term for this: metacognitive laziness. Students offload not just the work but the awareness of whether they're actually learning.
The guardrailed tutor group did somewhat better on practice (a 127% improvement), but on the unassisted exam they did no better than students who never had AI at all. Even the most thoughtfully designed student-facing AI tool produced zero measurable learning gains on the assessment that actually mattered.
This isn't just about students being lazy. It's something structural about what agents are and what K-12 education is.
Three Structural Mismatches
Mismatch 1: The Expert-in-the-Loop Problem
Agents are built for a specific kind of work. They're good at open-ended, novel problems where the human in the loop is skilled enough to evaluate the output. An engineer using Claude Code can read the pull request, spot the error, understand the tradeoff, and redirect. The agent proposes; the expert disposes. This works because the human has enough domain knowledge to be a real check on the agent's work.
A ninth-grader can't do this. When an AI tutor walks a student through a math problem or drafts a paragraph for their essay, the student lacks the very knowledge they're supposed to be acquiring. They can't evaluate whether the AI's reasoning is sound, whether a step was skipped, or whether the explanation, while fluent, is subtly wrong.
The human-in-the-loop model breaks down precisely when the human is the one who is supposed to be learning.
This isn't something better prompt engineering or more sophisticated guardrails can fix. It's structural. The student is, by definition, not yet the expert.
Mismatch 2: Routine Work vs. Novel Reasoning
There's a second mismatch. Agents are designed for novel, generalizable tasks, the kind that requires flexible reasoning across unfamiliar territory. But K-12 education isn't novel. It's 180 days of structured, sequenced curriculum. A third-grade teacher in Houston and a third-grade teacher in Memphis are teaching roughly the same standards in roughly the same order. The work is routine in the best sense: carefully designed to build knowledge incrementally, each lesson resting on the one before. You don't need an AI that reasons from scratch every time. You need something that executes reliably against a known scope and sequence. The agent paradigm is pointed at the wrong problem.
Mismatch 3: Productive vs. Unproductive Struggle
Then there's the deepest issue, one that good teachers understand intuitively but that's almost entirely absent from the AI-in-education discourse: the difference between productive and unproductive struggle.
Not all difficulty is the same. When a student wrestles with a challenging math problem and feels that friction of not quite getting it — that's productive struggle. That's where learning happens. But when a teacher spends forty minutes reformatting a lesson plan across three different platforms, or manually differentiating a reading passage for five reading levels, or hunting through an LMS for the right standard to tag — that's unproductive struggle. It produces no learning for anyone. It's pure overhead.
A general-purpose AI agent can't tell the difference between productive and unproductive struggle. Its default is to collapse both.
When you point an agent at a student, it eliminates the productive struggle — the part where learning lives — right alongside the busywork. That's what the Turkey study showed. The AI made the work feel easier. The students felt like they were learning. But the ease was the problem.
What Actually Works for Students
So if autonomous agents are the wrong model for students, does that mean AI should never touch a student? No. But the direction of the AI has to be completely different from the agent paradigm, and the teacher has to stay in control of what the AI is driving toward.
Think about two different designs for a student-facing AI activity. In the first, a student opens a general-purpose AI and says "help me write an essay about the causes of the Civil War." The agent obliges. It drafts an outline, suggests a thesis, offers paragraphs to refine. The student ends up with a polished essay and a vague sense of having participated. This is the agent model applied to learning, and it's what the Turkey study measured. It doesn't work.
In the second design, a teacher creates an activity grounded in this week's unit. The AI's job isn't to help the student complete the task. It's to drive the student toward demonstrating a specific learning outcome. The student can't advance by asking the AI to do the thinking. The AI asks questions, pushes back, requires the student to articulate their reasoning. The only way through is to show you understand. The teacher defines the outcome; the AI enforces the path.
In the first model, the AI is pointed at the task, and its success metric is task completion. In the second, the AI is pointed at the student, and its success metric is whether the student demonstrated understanding.
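To make the contrast concrete, here is a minimal sketch of the two loops. This is hypothetical illustration only, not any real product's implementation: `model` stands in for an LLM call, and `rubric_check` stands in for however the teacher-defined outcome gets assessed.

```python
# Hypothetical sketch: two ways to point AI at a student assignment.
# `model` is a stub for an LLM call; `rubric_check` stands in for
# whatever assesses the teacher-defined outcome. Neither is a real API.

def task_completion_agent(task, model):
    # Design 1: the AI is pointed at the task.
    # Success metric: a finished artifact. The student's thinking is optional.
    return model(f"Complete this task: {task}")

def outcome_gated_tutor(objective, student_answer, rubric_check, model):
    # Design 2: the AI is pointed at the student.
    # Success metric: the student demonstrates the learning outcome.
    if rubric_check(student_answer, objective):
        return {"advance": True, "feedback": "Objective demonstrated."}
    # Refuse to do the work; respond with a probing question instead.
    probe = model(
        f"Learning objective not yet demonstrated: {objective}. "
        f"Student's answer so far: {student_answer!r}. "
        "Ask one question that makes the student articulate their "
        "reasoning. Do not reveal the answer."
    )
    return {"advance": False, "feedback": probe}

if __name__ == "__main__":
    stub_model = lambda prompt: "What evidence supports that claim?"
    # Toy rubric for illustration; a real one would be far richer.
    names_a_cause = lambda answer, obj: "slavery" in answer.lower()

    print(task_completion_agent("essay on causes of the Civil War", stub_model))
    print(outcome_gated_tutor(
        "identify one cause of the Civil War", "", names_a_cause, stub_model))
```

The structural point sits in the return values: the first function's output is the artifact itself, while the second's is only permission to advance, plus a question that pushes the thinking back onto the student.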
That distinction sounds small, but it changes everything. One is an agent. The other is an instrument of the teacher's pedagogy. This is what we've built with Brisk Boost: student-facing AI activities where the teacher sets the learning objective, the AI is grounded in the actual curriculum, and the student has to do the thinking. The AI's job is to make the productive struggle unavoidable, not to eliminate it.
The issue was never "should AI touch students." It's: who is in control, and what is the AI optimizing for? A general-purpose agent optimizing for task completion will always tend to do the work for the student. That's what agents are built to do. When the teacher defines the learning outcome and the AI is constrained to drive toward it, the dynamic flips. The AI becomes a tool for enforcing rigor, not bypassing it.
Where Agents Do Belong: The Teacher Workflow
The teacher side is equally important, and it's where the agent paradigm actually does fit.
Teachers are drowning, but not in the work of teaching. In the work around teaching. The average teacher touches half a dozen platforms daily: their LMS, their SIS, their curriculum portal, Google Docs, email, grading tools. The overhead of navigating these systems, reformatting content between them, and manually adapting materials for different student needs is enormous. Data shows that teachers using AI weekly save an average of nearly six hours per week. Over a school year, that's six full weeks of time back.
The tasks that eat teachers alive are exactly the kind agents do well: differentiate this unit for three reading levels, align this assessment to the district's standards, generate a parent communication about this week's objectives, build a small-group activity grounded in the lesson's scope and sequence. These are constrained, repeatable, reviewable workflows. They're not open-ended. They're not novel. They're the same types of tasks, executed hundreds of times a year, against a known curriculum.
And the teacher is the expert in the loop. They review the AI's output before it reaches a student. They catch the error, adjust the tone, redirect the approach. The human-in-the-loop model works here because the human has the domain knowledge to evaluate what the agent produces.
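That teacher-side loop can be sketched in a few lines. The names here are hypothetical, illustrating the pattern rather than any specific product's code: the agent drafts against a known, repeatable task, and nothing reaches students until the expert in the loop signs off.

```python
from dataclasses import dataclass

@dataclass
class Draft:
    task: str
    content: str

def teacher_workflow(task, agent, teacher_review, publish):
    # A constrained, repeatable task (e.g. "differentiate this passage for
    # three reading levels") executed by the agent against known curriculum.
    draft = Draft(task, agent(task))
    # Expert in the loop: the teacher approves, edits, or rejects the draft.
    decision = teacher_review(draft)
    if decision is None:
        return None                 # Rejected: nothing reaches students.
    draft.content = decision        # The teacher's (possibly edited) version.
    publish(draft)
    return draft

if __name__ == "__main__":
    agent = lambda task: f"[AI draft for: {task}]"
    review = lambda draft: draft.content + " (teacher-approved)"
    published = []
    teacher_workflow("differentiate unit 3 for three reading levels",
                     agent, review, published.append)
    print(published[0].content)
```

The design choice worth noticing is that `publish` is unreachable except through `teacher_review`: the agent can execute the workflow hundreds of times a year, but the teacher's judgment remains the gate.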
This is the kind of AI that K-12 actually needs: not an autonomous agent reasoning from first principles about what a student should learn next, but a system that executes reliably against a district's actual curriculum: their scope and sequence, their adopted materials, their standards and pacing guides. The intelligence isn't in the general reasoning. It's in the retrieval, the grounding, the fidelity to what this district, in this grade, in this unit, is actually teaching this week.
The fact that AI agents are transforming software engineering doesn't mean they'll transform everything. The logic of code is not the logic of learning. In code, you want the AI to do the work. In school, the student doing the work is the entire point. Treating learning like another knowledge-work task to be automated is a mistake that could set back a generation of students while making everyone feel like progress is being made.
The technology is powerful. The question is whether we point it at the work, or at the people who are supposed to be learning from the work.
The opportunity in K-12 is real and it's significant, but it's in putting teachers in control. Of the curriculum the AI draws from. Of the learning outcomes it drives toward. Of the workflows it executes on their behalf. Not in handing students a general-purpose agent and hoping for the best.
Less busywork. More impact.