Thoughts on AI — 2026 edition
It’s been a couple of years since I wrote about AI[1] and things have been changing fast. In 2024, AI had started to impinge on my day-to-day work, but 2025 felt like the year in which it changed things dramatically. While I would still characterize a lot of my work with AI as "arguing with LLMs", there were definitely times when they produced results of acceptable quality after a few rounds of revision, feedback, and adjustment, and the net time and effort required felt like a relative win compared to doing it all myself.
New tools
The biggest change came in the form of the arrival of Claude Code. Instead[2] of just chatting with an LLM (mostly using my Shellbot fork), I could now delegate to it as an agent without having to abandon my beloved $EDITOR[3]. What began as experimentation (figuring out what this thing can do) has since turned into an integral part of my workflow: even for changes I could very quickly carry out myself, I will instead turn to Claude and ask it to make the change, unless it is utterly trivial (ie. the threshold for cutting over to Claude is the point at which I can no longer manipulate the text myself faster than I could describe the change to it).
New capabilities
2025 brought customization mechanisms, Model Context Protocol[4], subagents, skills, and custom slash commands, among other things. From my point of view, these all have the same goal, namely, equipping agents with:
- Specialized knowledge that enables them to obtain the information they need; and:
- The means to carry out necessary actions in service of an objective; while simultaneously:
- Not overflowing the context window with garbage that obscures things and prevents the agent from producing a correct result. (A small example of what I mean follows this list.)
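To make that last point concrete, here’s a hypothetical example of a project-level custom slash command for Claude Code. The directory follows Claude Code’s convention for project commands; the command name and its contents are my own invention, meant only to illustrate the idea of pre-packaging an instruction that keeps noisy output out of the context window.

```bash
# Hypothetical custom slash command, invoked in a session as /test-summary.
# Claude Code picks up Markdown files in .claude/commands/ as project commands;
# the name and the prompt below are made up for the sake of the example.
mkdir -p .claude/commands
cat > .claude/commands/test-summary.md <<'EOF'
Run the project's test suite. Report only the names of failing tests and the
first line of each failure message; do not paste full logs into the conversation.
EOF
```

The specific command doesn’t matter; the point is that each of these mechanisms lets you package up instructions or capabilities once, instead of having the agent rediscover (and re-ingest) them on every run.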
Collectively, these are probably more important and useful than improvements to the models themselves. Speaking of which…
New models
2025 brought model sycophancy to the forefront, and Claude was no exception. Around mid-year, Claude’s "You’re absolutely right!" was ringing in the ears of users across the world in an almost continuous chorus. Thankfully, it seems to have subsided a bit now.
I didn’t follow the whole model benchmarking question very closely, and am in general only interested in how much the models improve my experience in my daily work. Overall, subjectively, I’d say that the models improved significantly over the last year, but as I said before, I believe that it’s the tooling around the models that had the greater impact.
Use cases
In my last post, I said that LLMs were good for "low-stakes stuff like Bash and Zsh scripts for local development", "React components", "Dream interpretation", and "Writing tests". In 2025, I used them for a lot more than that. I used them for fixing bugs, adding features, working across multiple languages and services, and for explaining foreign codebases to me.
Where they shine:
- Places where there are a lot of guard rails in place to provide them with clear feedback about success (eg. working in a strongly typed language like Rust, or in an environment where you can perform automated verification in the form of linting or tests; a sketch of what I mean follows this list).
- They’re also great in places where you may not even know where to start, but their ability to rapidly search large corpuses and repos can surface leads for you to follow.
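As a sketch of the kind of guard rail I mean, here’s what a check script might look like for a Rust project. The specific commands are illustrative, not a prescription; what matters is that it fails loudly and immediately, which is exactly the sort of unambiguous feedback an agent can act on.

```bash
#!/usr/bin/env bash
# Hypothetical guard-rail script for a Rust project: the agent (or a human) runs
# this after every change and gets a clear pass/fail signal.
set -euo pipefail

cargo fmt -- --check         # formatting drift fails the run
cargo clippy -- -D warnings  # lints are treated as errors
cargo test --quiet           # and the tests get the final say
```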
Places where they still leave much to be desired:
- Things where the non-determinism of their output means that you can’t trust the quality of their results. For example, say you have a change that you want to uniformly make across a few hundred call sites in a repo. Your first instinct might be to say, "This is a repetitious change, one that should be amenable to automation, and if the LLM can be given clear instructions that allow it to do it correctly in one place, then it should be able to do it quickly and correctly in all of the others". Sadly, this could not be further from the truth. LLMs are inherently non-deterministic, and that means there’s always a random chance that they’ll do something different on the 19th, 77th, or 82nd time. You will have to check every single modification they make, and you may be far better off getting the LLM to create a separate, deterministic tool to carry out the work. And if you want to throw caution to the wind and have the LLM make all the changes for you anyway, you’re probably better off firing off the agent in a loop, with a clean context for every iteration and a clearly stated mechanism for verifying the correctness of each change, than expecting a single agent to carry out any significant amount of work serially (a sketch of what I mean follows this list).
- Anything that can’t be trivially described with a minimum of context. This is a conclusion that I’ve recently come to. In the past, I thought that bigger context windows would endow the models with the ability to solve fuzzier problems, the kinds that humans are particularly good at (with their ability to take into account disparate sources of information scattered across time and place). But my experience with even relatively small amounts of context (ie. far less than 200K tokens) is that models can easily "overlook" salient information that’s "buried" in the context, even when it’s not that large. Failure modes include things like telling the model to look at a series of commits, and then observing how it "forgets" something critical in the first of the series; it proposes a change that looks like it only attended to the most recent tokens in its context window, and often ends up contradicting itself, or reimplementing a decision that it previously reverted. My suspicion is that when we have models that have 10-million-token context windows, we’ll still get the best results when we distill down everything we want them to "know" into the first few thousand tokens.
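For what it’s worth, this is roughly what I mean by "firing off the agent in a loop". It’s a hypothetical sketch that assumes Claude Code’s non-interactive -p mode with whatever tool permissions you’ve chosen to grant it; the old_api/new_api names, the prompt, and the use of cargo check as the verification step are all invented for the example.

```bash
# Hypothetical sketch: one fresh agent invocation per file, with a deterministic
# verification step after each change. old_api/new_api and the prompt are made up.
rg -lF 'old_api(' src | while read -r file; do
  claude -p "In $file, replace every call to old_api() with new_api(), keeping the argument order unchanged."
  if ! cargo check --quiet; then
    echo "verification failed for $file; reverting" >&2
    git checkout -- "$file"
  fi
done
```

Even then, you still have to review the result; the loop just bounds the blast radius of each invocation and gives you a checkpoint after every file.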
Job security
In 2024 I said that I wasn’t worried about AI taking my job in the near term, but that things could change quickly, and I advised to "judiciously use AI to get your job done faster". In 2026, AI has clearly gotten to the point where it is making real waves in tech workplaces. Not only is AI making it possible for people to ship more code, and faster, than before, there is also considerable business pressure to make use of it in the name of maximizing productivity. Unfortunately, the signal here is very noisy: our corporate overlords can mandate the use of these tools and monitor their use, but I don’t think we have reliable evidence yet on how much of this is unalloyed value, and how much of it is technical debt, latent regressions, and noise masquerading as productivity.
Now more than ever it seems important not only to use the machines to deliver useful work, but also to focus on the places where I as a human can still deliver value where a mere next-token-predictor cannot. The pressure on both of those fronts is only going to increase. I’d say that my feeling of "precariousness" is quite a bit stronger now than it was two years ago, and I’m not looking forward to seeing that trend continue, although I feel that it surely must.
In terms of job satisfaction, I’ve observed an inverse correlation: the more my job consists of me prompting the AI to do things for me, the less intrinsically satisfied I feel. This was one of the reasons why I had so looked forward to Advent of Code in December; I was itching to do some significant work with my own two hands. I look now towards the future with some dread, but also with a determination to not go gently into that good night: no matter what happens, I want to commit to finding things to take authentic pride in beyond "how I got a swarm of agents running in parallel to implement some set of inscrutable artifacts".
The impact on the world more generally
So far, I’ve been talking about how AI has affected my job. But "Gen AI", in particular, is having the expected effects on the wider world. Deep fakes, AI slop, and bot activity more generally are flooding YouTube, Twitter, and anywhere content is shared[5]. It seems that we’re already well on the way into a "post-truth" world, where our ability to distinguish fact from falsehood has been devastatingly damaged, with no prospect of putting the genie back in the bottle, given the inevitably increasing capabilities of AI systems to produce this stuff at ever higher levels of quality.
One can hold out, clinging to reliable information sources, but in the end it seems unavoidable that these will be islands of truth surrounded by oceans of worthless, endlessly self-referential fabrication. I shudder to imagine what this looks like when you fast-forward it a hundred years.
I wrote that piece in March 2024, so just a couple of months shy of two years ago, to be precise. ↩︎
Maybe I should say "as well as" rather than "instead" because I still do chat with the LLM a fair bit when I want to ask it general questions about something in the world; but when doing almost anything related to coding, I almost exclusively do that via Claude Code. ↩︎
Technically, I am "abandoning" it in the sense of switching focus to another tmux pane, but Neovim continues running and I can dip in and out of it whenever I want. ↩︎
MCP nominally arrived in 2024, but as it required folks to actually build MCP servers, I think it’s fair to say that it "arrived" in a tangible way in 2025. ↩︎
I almost wrote "where humans share content", but that’s already appallingly misleading. ↩︎