Agentic Everything

How the latest set of models changes things.

By Teddy Svoronos

I have been dancing around this post for the past few weeks. My sabbatical has been well-timed, in that it has coincided with the most significant shift in the capabilities of AI tools since the advent of ChatGPT in 2022. I’m referring here to the power of new agentic AI models and tools, like Claude Opus 4.6 and Claude Code. After much ballyhooing about 2025 being the year of agents, 2026 has brought with it mainstream coverage of the destabilization of the software engineering profession and the introduction of non-coding agentic tools. There’s also a real split in my circles between folks on a kind of adrenaline high as they automate their lives and folks wondering what the heck they’re talking about.

I confess to being in that first group; the timing of my sabbatical means I have been able to dive very deeply into what’s now possible versus what was possible before, and I genuinely do believe that November marked the start of a categorically different thing. Rather than web-based chatbots, my AI agents are working with my files, my calendar, and passively created metadata as I go about my day. Rather than helping me with a step of the process, they are mapping out entire plans, tracking their own progress, sending other agents to complete sub-tasks so they don’t get distracted, and making complex decisions for me. And they’re quite good at it.

This post isn’t going to be about how insane and potentially destabilizing this all is (but it really, really is). If you’re in academia and are interested in knowing what this looks like for a social science researcher, I highly recommend Scott Cunningham’s Claude Code series on his Substack. It is much more focused on research than teaching, but he is effectively leveraging the power of these tools to rethink how he does his work in a way that is worthwhile. He occasionally makes screencasts where you can just watch him use Claude Code for a while, which is a great way to get a feel for it. I also recommend his sobering post on faculty adoption of agentic AI, which I can’t help but agree with.

The goal of this post is to instead reflect on what I think all of this means for the content that we teach our students. It has changed my thinking substantially, in a way I didn’t expect. But first, a little more on agents.

An easy way to understand agents

The term agent is thrown around a lot to mean a lot of things. I personally subscribe to Simon Willison’s definition of agents as an LLM/AI model that runs tools in a loop to achieve a goal. I’ll take each part of this in turn, using Deep Research as a simple example of agentic behavior:

  • Tools: LLMs predict next words, which is why we use them to write e-mails and reports. But they can also predict whether or not to invoke some other functionality; this “tool” then performs an action and returns its result to the LLM. So, when an LLM does a web search, it is predicting a search query and seeing what the web search gives back to it. After you tell an AI with Deep Research what you need, it does a web search based on your question to gather data. There are many more tools (sometimes called connectors), but let’s stick with web search for now.
  • Loop: When vanilla ChatGPT does a web search for you and answers based on the results, its work is over until your next prompt. An agentic model takes the results of that tool call, considers them, and runs the tool again with new information. You can see this in the reasoning trail of Deep Research tools; they do a web search, and based on what they find, they do more web searches, on and on.
  • Goal: At some point the agent decides that it has run its tools enough to have what it needs to accomplish its goal. An AI with Deep Research decides it has gathered enough information to write a nicely formatted report. At that point it stops, writes the report, and presents it to you.

So that’s all an agent is.

Deep Research runs web searches in a loop to write a report.

Claude Code/ChatGPT Codex executes code in a loop to build software.

Claude Cowork runs commands on your local machine in a loop to accomplish whatever task you give it.
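The tools-in-a-loop structure described above can be sketched in a few lines of Python. Everything here is a stand-in of my own invention: `fake_model` plays the role of the LLM with a canned script of decisions, and `search_tool` returns fake results, since the point is the tool/loop/goal skeleton rather than any real API.

```python
# Sketch of "an LLM that runs tools in a loop to achieve a goal".
# All names and behaviors are illustrative stand-ins, not a real agent.

def fake_model(history):
    """Stand-in for an LLM call: decides whether to use a tool or finish."""
    searches_done = sum(1 for turn in history if turn[0] == "tool_result")
    if searches_done == 0:
        return ("tool", "search", "city hotline complaint categories")
    if searches_done == 1:
        return ("tool", "search", "complaint sentiment analysis methods")
    return ("final", "Report: complaints cluster into noise, trash, and permits.")

def search_tool(query):
    """Stand-in for a web-search tool: returns canned results."""
    return f"results for '{query}'"

def run_agent(goal, max_steps=10):
    history = [("goal", goal)]
    for _ in range(max_steps):           # the loop
        action = fake_model(history)     # the model predicts the next step
        if action[0] == "final":         # the goal: the model decides it's done
            return action[1], history
        _, tool_name, tool_input = action
        if tool_name == "search":        # the tool call
            history.append(("tool_result", search_tool(tool_input)))
    return "gave up", history

report, history = run_agent("summarize hotline complaints")
```

The “explosion in capability” below is not a change to this skeleton; it is the quality of the decisions the model makes at each turn of the loop.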

So what happened in the past few months that led to this explosion in capability? In short, the models got very, very good. The first Deep Research functionality was bundled with OpenAI’s o3, its flagship reasoning model at the time, which was great at this task. In late 2025 Anthropic and OpenAI released models that were very good at this iterative process for a much wider variety of tasks: Claude Opus 4.5 and ChatGPT 5.2 could plan, reason, iterate, and judge on complex, ambiguous goals.

Why this is different

As someone who has spent the past several years studying and using generative AI, I find it hard to describe how much of a shift this is. In practice, these tools feel qualitatively different from what came before them. While I used to be very focused on effective prompting and being very careful with the setup of my chatbots, I can now treat the AI that lives in my terminal as an entity that can “own” tasks that I give it. I regularly tell Claude Code to build a set of scripts that I don’t know how to build; to make a plan for my day based on my schedule, my upcoming deadlines, my previous day’s notes, and my priorities; to synthesize content across multiple presentations into decks for a different audience, complete with adapted presenter notes and tweaked layouts. It will work on these problems for a while, sometimes 25-30 minutes as it loops and loops and iterates and iterates. Perhaps most significantly for me, it will write tests to see if what it’s doing is accomplishing what I asked it to, run those tests and, based on the results, adapt its work.

In short, these tools can already do a lot of the things that I do in my job, and they can handle work that a research assistant would need a lot more guidance from me to complete. I can feel the details of my day-to-day shifting, and shifting quickly. For the first time, I really do feel like I am supervising a fleet of researchers. And, reader, I suspect you will start to feel this way sooner rather than later.

Oh right, Course Corrections

I’m already getting more philosophical than I’d like, so let’s get back to what this means for teaching. The big shift is this: while we’ve spent a lot of time worrying about AI being able to do all our assignments, we now have to think seriously about AI being able to do the jobs that we have been training students for. Whereas before I envisioned my students using ChatGPT to do a quick analysis, or evaluate the quality of claims from a report they receive, I now see a future where AI agents make my students capable of work that simply isn’t on the table in their jobs today.

Instead of writing a report on complaints the city receives from a hotline, someone in a mayor’s office can produce high quality transcriptions of each call; turn them into a structured dataset categorizing the complaints by subject area, locality, sentiment, and urgency; and produce an interactive visualization to better understand how the mayor’s office should respond.

And here’s the kicker: I believe this kind of work will no longer be reserved for an “analyst”; anyone with knowledge of the context and facility with AI will be able to “commission” it by delegating it to agents. They will not need to write any code, doing little more than guiding the agents, spot-checking their work, and thinking about how best to assess the robustness of results.
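To make the idea concrete, here is a toy sketch of the middle step of that hotline pipeline: turning call transcripts into a structured dataset. The transcripts, categories, and keyword rules are all invented for illustration; a real agent would use an LLM for transcription and classification rather than keyword matching.

```python
# Toy version of "transcripts -> structured dataset". Everything here is
# made up; in practice an agent would classify with an LLM, not keywords.

TRANSCRIPTS = [
    "There is construction noise on Elm Street every night after 10pm.",
    "Trash has not been collected on Oak Avenue for two weeks.",
]

KEYWORDS = {
    "noise": "noise complaint",
    "trash": "sanitation",
}

def classify(transcript):
    """Assign a subject area by simple keyword match (stand-in for an LLM)."""
    text = transcript.lower()
    for keyword, category in KEYWORDS.items():
        if keyword in text:
            return category
    return "other"

def build_dataset(transcripts):
    """Produce one structured record per call."""
    return [
        {"call_id": i, "transcript": t, "subject_area": classify(t)}
        for i, t in enumerate(transcripts)
    ]

dataset = build_dataset(TRANSCRIPTS)
```

The point of the sketch is the shape of the output, a row per call with labeled fields, because that is what the commissioner of the work has to know how to inspect and spot-check.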

How can I prepare my students to be able to do that?

Alone Together

In my last post (which I quite liked!), I talked about the tension between deepening expertise and developing AI skills. I ended that post arguing that both things are real and important, though what “AI skills” means remains difficult for me to pin down.

However, this agentic shift has changed my thinking somewhat. I had previously (read: two weeks ago) assumed that the right way to teach these two sets of things is to interleave them with one another: learn content, learn how to do it, learn how to do it with AI, repeat. But I’m starting to feel that subject expertise and AI skills need to be taught as separate, discrete units.

As I reflect on how I am currently using these tools, two things stick out:

  1. My expertise is more useful than before, not less: As my agents do the work I assign them, I really do feel like a supervisor. I need to give high-level guidance and ask the right questions to make sure they are working as I want them to. As I drift further into vibe-coding things I do not have a technical background in, I am less able to do this, and it shows in the final result. I’ve gone down unhelpful rabbit holes because I didn’t understand the nature of the problem, and failed to think of possible errors because I wasn’t familiar enough with the underlying software libraries being used. This is true despite the fact that I am not touching 99.9% of the code that is written.
  2. Getting good at these agents takes practice: When I rave to my partner and friends about all the things I had Claude do today, some of them try to do the same and come back to me with very disappointing results. Presentations look terrible because Claude’s pptx skill isn’t loaded; asking ChatGPT how to change its behavior leaves them struggling for hours (once again, DO NOT do this unless you know that the tool has this capability built into it, such as Claude Skills). This is an unfortunate realization, because the tools that we are using in six months may have a completely different set of capabilities that we will need to learn, but I do not see a way around learning the nitty-gritty, even if it’s transient.

So why do I think it’s a bad idea to teach students statistics and teach them how to use agentic tools to do statistics in the same course? To start, subject-matter expertise remains extremely important, and the risk of dulling the development of that expertise with automation is real. As I watch my agents work, I review the steps that they’re taking and map them onto my mental model of what should be happening. Developing those strong mental models is an essential part of directing these agents, as is developing an intuition for when things seem to be going wrong. I think that developing this kind of intuition is hard unless it’s grounded in lots of productive struggle, encountering errors in the wild and correcting them.

Now, this isn’t to say that what we teach as subject-matter expertise should remain static. I do believe there is a fair amount of content that I teach that can now be abstracted in order to make room for higher-level thinking. For example, I am less convinced of the value of teaching the details of standard error calculations, which were simplified in our course to begin with, when my agent can bootstrap SEs on its own. Bootstrapping is currently missing from my curriculum, and I think that’s a problem. On the other hand, I don’t think I can fully abstract the coding components of what I teach. The notion of reproducible analyses will remain important, and having documentation of each step of an analysis and a loose ability to follow what happened isn’t going to go away. This also isn’t to say that AI doesn’t have a role in developing that expertise, but it will have to be through carefully crafted, structured uses with specific goals in mind.
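For readers who haven’t met it, the bootstrap mentioned above is simple to sketch: resample the data with replacement many times, compute the statistic on each resample, and take the standard deviation of those replicates as the standard error. A minimal Python version, with made-up data:

```python
# Minimal bootstrap standard error. The data values are invented
# for illustration; any numeric sample would do.
import random
import statistics

def bootstrap_se(data, stat=statistics.mean, n_boot=2000, seed=0):
    """Estimate the standard error of `stat` by resampling with replacement."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]  # same n, with replacement
        replicates.append(stat(resample))
    return statistics.stdev(replicates)

data = [4.1, 5.3, 2.2, 6.8, 3.9, 5.0, 4.4, 7.1, 3.3, 5.6]
se = bootstrap_se(data)
```

For the mean, the result should land close to the analytic s/√n, which is exactly the kind of sanity check a supervisor can ask of an agent’s output without redoing the work by hand.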

On the other hand, I do think that practice with these tools is absolutely essential to using them well. In some ways it is like learning how to be a supervisor; in other ways it’s learning how to talk to an alien. But either way, I do not want my students to make the many (many) mistakes I have made getting to know these tools when they’re actually on the job. And, while students are probably much more exposed to generative AI than their faculty, I wager that they are using these tools in ways that do not necessarily make them better able to discern good AI output from bad.

Here come the tradeoffs

So what does this look like? Here are three possibilities:

Solution A: Make my statistics course shorter, or significantly redesign its content, to make room for a discrete AI module. That will require a lot of difficult decisions about what’s essential and what isn’t.

Solution B: Tack on an additional requirement to the student curriculum. I recognize that the typical faculty member’s solution to any problem is “MORE SCHOOL!” and I do not want to fall into that trap.

Solution C: Run “AI Lab Sessions” the same way that I have traditionally run Stata or R sessions. This certainly is in the spirit of what I’m imagining; more hands-on and project-based. But the goal of these types of sessions when I was a student was to practice the ins and outs of code. This needs to have a higher-level component as well, as students work through what it means to manage multiple agents accomplishing tasks for them in practice.

At the moment I am leaning toward a version of Solution A, where the goals of the course remain the same, but the nature of the expertise I aim to develop in my students is different. This means a stronger focus on judgment and evaluating analyses, and less emphasis on the traditional building blocks of calculating posterior probabilities or selecting the right estimator for a hypothesis test. I would then reserve the last several weeks of the course for learning how to direct agentic tools effectively, and what a policymaker’s next steps should be after the production of a directed analysis.

There’s a lot of risk here of course. My students will not all have access to tools as powerful as Claude Code in their jobs (for financial, enterprise agreement, or policy reasons), and there’s immense variability in what AI tools can do at the moment (my friends stuck with Copilot are understandably incredulous when we compare notes). Abstracting away too much of the details could make them more likely to just accept results blindly. And the puck continues to defy the laws of physics, so I may be moving my students toward a reality that only materializes briefly. The balance will be very difficult to strike.

Of course, I could just have my AI avatar teach my students how to use AI in one-on-one sessions.

2026 is shaping up to be a hell of a year.