<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[seangoedecke.com RSS feed]]></title><description><![CDATA[Sean Goedecke's personal blog]]></description><link>https://seangoedecke.com</link><generator>GatsbyJS</generator><lastBuildDate>Sat, 18 Apr 2026 03:23:19 GMT</lastBuildDate><item><title><![CDATA[Many anti-AI arguments are conservative arguments]]></title><link>https://seangoedecke.com/many-anti-ai-arguments-are-conservative/</link><guid isPermaLink="false">https://seangoedecke.com/many-anti-ai-arguments-are-conservative/</guid><pubDate>Sat, 18 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Most anti-AI rhetoric is left-wing coded. Popular criticisms of AI describe it as a tool of &lt;a href=&quot;https://www.theatlantic.com/podcasts/archive/2025/09/ai-and-the-fight-between-democracy-and-autocracy/684095/&quot;&gt;techno-fascism&lt;/a&gt;, or appeal to predominantly left-wing concerns like &lt;a href=&quot;https://www.technologyreview.com/2025/05/20/1116327/ai-energy-usage-climate-footprint-big-tech/&quot;&gt;carbon emissions&lt;/a&gt;, &lt;a href=&quot;https://www.theguardian.com/commentisfree/2025/sep/10/tech-companies-are-stealing-our-books-music-and-films-for-ai-its-brazen-theft-and-must-be-stopped&quot;&gt;democracy&lt;/a&gt;, or &lt;a href=&quot;https://aphyr.com/posts/420-the-future-of-everything-is-lies-i-guess-where-do-we-go-from-here&quot;&gt;police brutality&lt;/a&gt;. Anti-AI &lt;em&gt;sentiment&lt;/em&gt; is &lt;a href=&quot;https://www.pewresearch.org/short-reads/2025/11/06/republicans-democrats-now-equally-concerned-about-ai-in-daily-life-but-views-on-regulation-differ/&quot;&gt;surprisingly bipartisan&lt;/a&gt;, but the big anti-AI institutions are &lt;a href=&quot;https://www.equaltimes.org/hollywood-s-stand-against-ai-a?lang=en&quot;&gt;labor&lt;/a&gt; &lt;a href=&quot;https://news.bloomberglaw.com/daily-labor-report/punching-in-union-leaders-gear-up-to-tackle-ai-in-future-talks&quot;&gt;unions&lt;/a&gt; and the &lt;a href=&quot;https://www.sanders.senate.gov/press-releases/news-sanders-ocasio-cortez-announce-ai-data-center-moratorium-act/&quot;&gt;progressive wing&lt;/a&gt; of the Democrats.&lt;/p&gt;
&lt;p&gt;This has always seemed weird to me, because the contents of most anti-AI arguments are actually right-wing coded. They’re not necessarily intrinsically right-wing, but they’re the kind of arguments that historically have been made by conservatives, not liberals or leftists. Here are some examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Many AI critics complain that AI &lt;a href=&quot;https://www.theguardian.com/commentisfree/2025/sep/10/tech-companies-are-stealing-our-books-music-and-films-for-ai-its-brazen-theft-and-must-be-stopped&quot;&gt;steals copyrighted content&lt;/a&gt;, but prior to 2023, leftists were &lt;a href=&quot;https://www.reddit.com/r/Socialism_101/comments/1664yyd/i_see_many_leftists_hate_copyright_can_anyone/&quot;&gt;largely&lt;/a&gt; &lt;a href=&quot;https://overland.org.au/2017/08/how-to-think-left-on-copyright/&quot;&gt;anti-intellectual-property&lt;/a&gt; on &lt;a href=&quot;https://jacobin.com/2013/09/property-and-theft&quot;&gt;principle&lt;/a&gt; (either because they’re anti-&lt;em&gt;property&lt;/em&gt;, or because they characterize copyright as benefiting huge media corporations and patent trolls).&lt;/li&gt;
&lt;li&gt;A popular anti-AI-art sentiment is that it’s &lt;a href=&quot;https://www.theguardian.com/commentisfree/2025/may/20/ai-art-concerns-originality-connection&quot;&gt;corrosive to the human spirit&lt;/a&gt; to consume AI slop: in other words, art just inherently ought to be generated by humans, and using AI thus damages some part of our intangible human soul. Whether you like this argument or not, it’s structurally similar to a whole slate of classic arguments-from-intuition for conservative positions like anti-abortion or anti-homosexuality.&lt;/li&gt;
&lt;li&gt;Weird new technological art has traditionally been championed by the left-wing and dismissed by the right-wing (as &lt;a href=&quot;https://medium.com/@elarson39/photography-was-historically-considered-arts-most-mortal-enemy-is-ai-69a2dc2f43ef&quot;&gt;inhuman&lt;/a&gt;, &lt;a href=&quot;https://en.wikiversity.org/wiki/History_of_Photography_as_Fine_Art#:~:text=The%20simplest%20argument%2C%20supported%20by%20many%20painters%20and,mill%20than%20with%20handmade%20work%20created%20by%20inspiration&quot;&gt;cheap&lt;/a&gt;, or &lt;a href=&quot;https://encyclopedia.ushmm.org/content/en/article/degenerate-art-1&quot;&gt;degenerate&lt;/a&gt;). But when it comes to AI art, it’s the left-wing making these arguments, and others (not necessarily right-wingers) arguing that AI art can also be a medium of human artistic expression.&lt;/li&gt;
&lt;li&gt;One main worry about AI is that it’s going to take over a lot of jobs. This is a compelling argument! But the left-wing has recently been famously unsympathetic to this same argument around fossil-fuel energy jobs like &lt;a href=&quot;https://www.cam.ac.uk/research/news/former-coal-mining-communities-have-less-faith-in-politics-than-other-left-behind-areas&quot;&gt;coal mining&lt;/a&gt;, to the point where Biden infamously advised a group of miners in New Hampshire to &lt;a href=&quot;https://thehill.com/changing-america/enrichment/education/476391-biden-tells-coal-miners-to-learn-to-code/&quot;&gt;learn to code&lt;/a&gt;&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. Halting technological progress to preserve jobs is quite literally a “conservative” position.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On top of all that&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, frontier AI models themselves are quite left-wing. Notwithstanding some real cases of data bias (most infamously Google’s image model &lt;a href=&quot;https://www.bbc.com/news/technology-33347866&quot;&gt;miscategorizing&lt;/a&gt; dark-skinned humans as “gorillas”), the models reliably &lt;a href=&quot;https://news.stanford.edu/stories/2025/05/ai-models-llms-chatgpt-claude-gemini-partisan-bias-research-study&quot;&gt;espouse&lt;/a&gt; &lt;a href=&quot;https://www.brookings.edu/articles/the-politics-of-ai-chatgpt-and-political-bias/&quot;&gt;left-wing&lt;/a&gt; &lt;a href=&quot;https://www.cato.org/commentary/how-did-ai-get-so-biased-favor-left&quot;&gt;positions&lt;/a&gt;. Even Elon Musk’s deliberate attempt to create a right-wing AI in Grok has had &lt;a href=&quot;https://www.seangoedecke.com/ai-personality-space/&quot;&gt;mixed success&lt;/a&gt;. In 2006, Stephen Colbert coined the phrase “reality has a well-known liberal bias”. If the left-wing were more sympathetic to AI, I think they would be using this as a pro-left argument&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;So what happened? A year ago I wrote &lt;a href=&quot;https://www.seangoedecke.com/is-ai-wrong/&quot;&gt;&lt;em&gt;Is using AI wrong? A review of six popular anti-AI arguments&lt;/em&gt;&lt;/a&gt;. In that post I blame the hard right-wing turn many big tech CEOs made in 2024. That was around the same time that LLMs were emerging in the public consciousness with ChatGPT, so it made sense that AI got tagged as right-wing: after all, the billionaires on TV and Twitter talking about how AI was going to change the world were all the same people who’d just gone all-in on Donald Trump. I still think this is a pretty good explanation - just unfortunate timing - but there are definitely other factors at play.&lt;/p&gt;
&lt;p&gt;One obvious factor is the hangover from the pro-crypto mania of 2021 and 2022, where many of the same tech-obsessed folks also posted ugly art and talked about how their technology would change the world forever. Few of these predictions came true (though cryptocurrency has indeed changed the world forever), and it’s understandable that many people viewed AI as a natural continuation of this movement.&lt;/p&gt;
&lt;p&gt;On top of that, Donald Trump himself has come out strongly pro-AI, both in terms of &lt;a href=&quot;https://www.ai.gov/&quot;&gt;policy&lt;/a&gt; and in terms of actually &lt;a href=&quot;https://www.nytimes.com/2026/04/13/us/politics/trump-jesus-picture-pope-leo.html&quot;&gt;posting&lt;/a&gt; AI art himself. This naturally creates a backlash where anti-Trump people are primed to be even more anti-AI&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. Here are some more reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AI has real environmental impact (though this is often wildly overstated, as I say &lt;a href=&quot;https://www.seangoedecke.com/is-ai-wrong/&quot;&gt;here&lt;/a&gt;), and the right-wing is politically committed to downplaying or denying anthropogenic environmental impacts in general.&lt;/li&gt;
&lt;li&gt;When times are tough, it’s easy to blame the hot new thing that everyone is talking about. Because the right-wing is currently ascendant in the US, left-wingers are more inclined to talk about how tough times are.&lt;/li&gt;
&lt;li&gt;The left-wing is over-represented in the kind of “computer jobs” that are under direct threat from AI.&lt;/li&gt;
&lt;li&gt;Being pro-Europe has always been left-wing coded, and Europe has been noticeably slower and more sceptical about AI than the USA.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let me finally put my cards on the table. I would describe myself as on the left wing, and I’m broadly agnostic about the impact of AI. Like the boring fence-sitter I am, I think it will have a mix of positive and negative effects. In general, I’m unconvinced by the pro-copyright and human-soul-related anti-AI arguments, or by the idea that AI is inherently right-wing, but I’m troubled by the environmental impact and the impact on jobs (which in my view are more classically left-wing positions).&lt;/p&gt;
&lt;p&gt;Still, I’m curious what will happen when the left-wing flavor of anti-AI rhetoric disappears, which I think it will (as I said at the start, anti-AI sentiment is actually &lt;a href=&quot;https://www.pewresearch.org/short-reads/2025/11/06/republicans-democrats-now-equally-concerned-about-ai-in-daily-life-but-views-on-regulation-differ/&quot;&gt;pretty bipartisan&lt;/a&gt;). When people start making explicitly right-wing anti-AI arguments, will that cause the left-wing to move a little bit towards supporting AI? Or will right-wing institutions continue to explicitly support AI, allowing anti-AI sentiment to become a wedge issue that the left-wing can exploit to pry away voters? In any case, I don’t think the current state of affairs is particularly stable. In many ways, the dominant anti-AI arguments would fit better in a conservative worldview than in the worldview of their liberal proponents.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;I don’t think any did, which is probably for the best - they would have only had a couple of years to break into the industry before hiring collapsed in 2023.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Another point that isn’t quite mainstream enough but that I still want to mention: AI critics often argue that cavalier deployment of AI means that people might take &lt;a href=&quot;https://www.bbc.com/news/articles/cpd8l088x2xo&quot;&gt;dangerous medical advice&lt;/a&gt; instead of simply trusting their doctor. But anyone who’s been close to a person with chronic illness knows that “just trust your doctor” is kind of right-wing-coded itself, and that the left-wing position is &lt;a href=&quot;https://www.painnewsnetwork.org/stories/2026/4/10/doctor-faces-backlash-after-tweet-claims-four-chronic-illnesses-are-overdiagnosed&quot;&gt;very&lt;/a&gt; &lt;a href=&quot;https://yorkspace.library.yorku.ca/server/api/core/bitstreams/4ac9d968-e9b0-491b-888a-d4ed5aeb1ac3/content&quot;&gt;sympathetic&lt;/a&gt; to patients who don’t or can’t do so. In a parallel universe, I can imagine the left-wing arguing that patients need AI to avoid the mistakes of their doctors, not the other way around.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;Is it a good argument? I don’t know, actually. The easy counter is that the LLMs are just mirroring the biases in their training data. But you could argue in response that superintelligence is also latent in the training data, and that hill-climbing towards superintelligence also picks up the associated political positions (which just so happen to be left-wing).&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;I am no fan of Donald Trump, but it doesn’t follow that everything he supports is bad (e.g. the &lt;a href=&quot;https://en.wikipedia.org/wiki/First_Step_Act&quot;&gt;First Step Act&lt;/a&gt;).&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Programming (with AI agents) as theory building]]></title><link>https://seangoedecke.com/programming-with-ai-agents-as-theory-building/</link><guid isPermaLink="false">https://seangoedecke.com/programming-with-ai-agents-as-theory-building/</guid><pubDate>Fri, 03 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Back in 1985, computer scientist Peter Naur wrote &lt;a href=&quot;https://pages.cs.wisc.edu/~remzi/Naur.pdf&quot;&gt;“Programming as Theory Building”&lt;/a&gt;. According to Naur - and I agree with him - the core output of software engineers is not the program itself, but the &lt;strong&gt;theory of how the program works&lt;/strong&gt;. In other words, the knowledge inside the engineer’s mind is the primary artifact of engineering work, and the actual software is merely a by-product of that.&lt;/p&gt;
&lt;p&gt;This sounds weird, but it’s surprisingly intuitive. Every working programmer knows that you cannot make a change to a program simply by having the code. You first need to read through the code carefully enough to build up a mental model (what Naur calls a “theory”) of what it’s supposed to do and how it does it. Then you make the desired change to your mental model, and only after that can you begin modifying the code.&lt;/p&gt;
&lt;p&gt;Many people&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; think that this is why LLMs are not good tools for software engineering: because using them means that engineers can skip building Naur theories of the system, and because LLMs are themselves incapable of developing a Naur theory. Let’s take those one at a time.&lt;/p&gt;
&lt;h3&gt;Do LLMs let you skip theory-building?&lt;/h3&gt;
&lt;p&gt;Do AI agents let some engineers avoid building detailed mental models of the systems they work on? Of course! As an extreme example, someone could simply punt every task to the latest GPT or Claude model and build no mental model at all&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. But even a conscientious developer who uses AI tools will necessarily build a less detailed mental model than someone who does it entirely by hand.&lt;/p&gt;
&lt;p&gt;This is well-attested by the nascent &lt;a href=&quot;/how-does-ai-impact-skill-formation&quot;&gt;literature&lt;/a&gt; on how AI use impacts learning. And it also just makes obvious sense. The whole point of using AI tools is to offload some of the cognitive effort: to be able to just sketch out some of the fine detail in your mental model, because you’re confident that the AI tool can handle it. For instance, you might have a good grasp on what the broad components do in your service, and how the data flows between them, but not the specific detail of how some sub-component is implemented (because you only reviewed that code, instead of writing it).&lt;/p&gt;
&lt;p&gt;Isn’t this really bad? If you start dropping the implementation details, aren’t you admitting that you don’t really know how your system works? After all, a theory that isn’t detailed enough to tell you what code would need to be written for a particular change is a useless theory, right? I don’t think so.&lt;/p&gt;
&lt;p&gt;First, it’s simply a fact that &lt;strong&gt;every mental model glosses over some fine details&lt;/strong&gt;. Before LLMs were a thing, it was common to talk about the “breadth of your stack”: roughly, the level of abstraction that your technical mental model could operate at. You might understand every line of code in the system, but what about dependencies? What about the world of Linux abstractions - processes, threads, sockets, syscalls, ports, and buffers? What about the assembly operations that are ultimately performed by your code? It simply can’t be true that giving up &lt;em&gt;any&lt;/em&gt; amount of fine detail is a disaster.&lt;/p&gt;
&lt;p&gt;Second, &lt;strong&gt;coding with LLMs teaches you first-hand how important your mental model is&lt;/strong&gt;. I do a lot of LLM-assisted work, and in general it looks like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;I spin off two or three parallel agents to try and answer some question or implement some code&lt;/li&gt;
&lt;li&gt;As each agent finishes (or I glance over at what it’s doing), I scan its work and make a snap judgement about whether it’s accurately reflecting my mental model of the overall system&lt;/li&gt;
&lt;li&gt;When it doesn’t - which is about 80% of the time - I either kill the process or I write a quick “no, you didn’t account for X” message&lt;/li&gt;
&lt;li&gt;I carefully review the 20% of plausible responses against my mental model, do my own poking around the codebase and manual testing/tweaking, and about half of that code will become a PR&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that &lt;strong&gt;only 10% of agent output is actually making its way into &lt;em&gt;my&lt;/em&gt; output&lt;/strong&gt;. Almost all of my time is spent looking at some piece of agent-generated code or text and trying to figure out whether it fits into my theory of the system. That theory is necessarily a bit less detailed than when I was writing every line of code by hand. But it’s still my theory! If it weren’t, I’d be accepting most of what the agent produced instead of rejecting almost all of it.&lt;/p&gt;
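&lt;p&gt;To make that loop concrete, here’s a minimal sketch of it in Python. Everything in it is a hypothetical stand-in rather than a real API: &lt;code&gt;run_agent&lt;/code&gt; would invoke whatever coding agent you use, and the two judgement functions are placeholders for the human steps that do the real work.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from concurrent.futures import ThreadPoolExecutor

def run_agent(task, seed):
    # Stand-in: kick off one agent on the task and block until done.
    return 'agent %d draft for %s' % (seed, task)

def fits_mental_model(draft):
    # Snap judgement: does this draft reflect my theory of the system?
    return hash(draft) % 5 == 0  # stand-in for 'roughly 20% pass'

def survives_review(draft):
    # Careful review: poking around the codebase, manual testing.
    return hash(draft) % 2 == 0  # stand-in for 'roughly half pass'

def solve(task, n_agents=3):
    # Run a few agents in parallel, then filter through both gates.
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        drafts = list(pool.map(lambda i: run_agent(task, i), range(n_agents)))
    plausible = [d for d in drafts if fits_mental_model(d)]
    return [d for d in plausible if survives_review(d)]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The shape of the sketch is the point: almost all of the structure is filtering, not generating.&lt;/p&gt;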
&lt;h3&gt;Can LLMs build Naur theories?&lt;/h3&gt;
&lt;p&gt;Can AI agents build their own theories of the system? If not, this would be a pretty good reason not to use them, or to think that any supposed good outcomes are illusory.&lt;/p&gt;
&lt;p&gt;The first reason to think they can is that LLMs clearly do make working changes to codebases. If you think that a theory is &lt;em&gt;essential&lt;/em&gt; to make working changes (which is at least plausible), doesn’t that prove that LLMs can build Naur theories? Well, maybe. They could be pattern-matching to Naur theories in the training data that are close enough to sort of work, or they could be able to build &lt;em&gt;local&lt;/em&gt; theories which are good enough (as long as you don’t layer too many of them on top of each other).&lt;/p&gt;
&lt;p&gt;The second reason to think they can is that &lt;strong&gt;you can see them doing it&lt;/strong&gt;. If you read an agent’s logs, they’re full of explicit theory-building&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;: making hypotheses about how the system works, trying to confirm or disprove them, adjusting the hypothesis, and repeating. When I’m trying to debug something, I’m usually racing against one or more AI agents, and &lt;em&gt;sometimes they win&lt;/em&gt;. I refuse to believe that you can debug a million-line codebase without theory-building.&lt;/p&gt;
&lt;p&gt;I think it’s an open question whether AI agents can build working theories of &lt;em&gt;any&lt;/em&gt; codebase. In my experience, they do a good job with normal-ish applications like CRUD servers, proxies, and other kinds of program that are well-represented in the training data. If you’re doing something truly weird, I can believe they might struggle (though even then it seems &lt;a href=&quot;https://x.com/VictorTaelin/status/2036313801570562418?s=20&quot;&gt;at least possible&lt;/a&gt;).&lt;/p&gt;
&lt;h3&gt;Retaining theories is better than building them&lt;/h3&gt;
&lt;p&gt;Regardless, one big problem with AI agents is that &lt;strong&gt;they can’t &lt;em&gt;retain&lt;/em&gt; theories of the codebase&lt;/strong&gt;. They have to build their theory from scratch every time. Of course, documentation can help a little with this, but in Naur’s words, it’s “strictly impossible” to fully capture a theory in documentation. In fact, Naur thought that if all the humans who built a piece of software left, it was unwise to try and construct a theory of the software &lt;em&gt;even from the code itself&lt;/em&gt;, and that you should simply rewrite the program from scratch. I think this is overstating it a bit, at least for large programs, but I agree that it’s a difficult task. AI agents are permanently in this unfortunate position: forced to construct a theory of the software from scratch, every single time they’re spun up.&lt;/p&gt;
&lt;p&gt;Given that, it’s kind of a minor miracle that AI agents are as effective as they are. The next big innovation in AI coding agents will probably be some way of allowing agents to build more long-term theories of the codebase: either by allowing them to modify their own weights&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, or by simply supporting contexts long enough that you can make weeks’ worth of changes in the same agent run, or some other idea I haven’t thought of.&lt;/p&gt;
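&lt;p&gt;In the meantime, one crude workaround is to have the agent persist its working notes between runs and read them back in at startup, so each session begins from a warm start. Here’s a hypothetical sketch (the file name and helper functions are inventions for illustration, and a notes file is at best a lossy shadow of a real Naur theory):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from pathlib import Path

NOTES_PATH = Path('agent-notes.md')  # hypothetical notes file

def load_notes():
    # Whatever the last session managed to write down about the system.
    return NOTES_PATH.read_text() if NOTES_PATH.exists() else ''

def save_notes(notes):
    NOTES_PATH.write_text(notes)

def build_prompt(task):
    # Prepend prior notes so the agent rebuilds its theory from a
    # warm start instead of from scratch.
    return load_notes() + '\n\nCurrent task: ' + task
&lt;/code&gt;&lt;/pre&gt;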
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;&lt;a href=&quot;https://gist.github.com/MostAwesomeDude/560185c24f959f6fec229739cb5a6735#no-like-analysis-of-the-vibecoded-outputs&quot;&gt;This&lt;/a&gt; is the most recent (and well-written) example I’ve seen, but it’s a common view.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;I have heard of people working like this. Ironically, I think it’s a good thing. The kind of engineer who does this is likely to be &lt;em&gt;improved&lt;/em&gt; by becoming a thin wrapper around a frontier LLM (though it’s not great for their career prospects).&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;I think some people would say here that AI agents simply can’t build any theories at all, because theories are a human-mind thing. These are the people who say that AIs can’t believe anything, or think, or have personalities, and so on. I have some sympathy for this as a metaphysical position, but it just seems obviously wrong as a practical view. If I can see GPT-5.4 testing hypotheses and correctly answering questions about the system, I don’t really care if it’s coming from a “real” theory or some synthetic equivalent.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;This is the dream of &lt;a href=&quot;/continuous-learning&quot;&gt;continuous learning&lt;/a&gt;: if what the AI agent learns about the codebase can be somehow encoded in its weights, it can take days or weeks to build its theory instead of mere minutes.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Working on products people hate]]></title><link>https://seangoedecke.com/working-on-products-people-hate/</link><guid isPermaLink="false">https://seangoedecke.com/working-on-products-people-hate/</guid><pubDate>Fri, 27 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I’ve worked on a lot of unpopular products.&lt;/p&gt;
&lt;p&gt;At Zendesk I built large parts of an app marketplace that was too useful to get rid of but never polished enough to be loved. Now I work on GitHub Copilot, which many people think is crap&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. In between, I had some brief periods where I worked on products that were well-loved. For instance, I fixed a bug where popular Gists would time out once they got more than thirty comments, and I had a hand in making it possible to write LaTeX mathematics &lt;a href=&quot;https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/writing-mathematical-expressions&quot;&gt;directly&lt;/a&gt; into GitHub markdown&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. But I’ve spent years working on products people hate&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;If I were a better developer, would I have worked on more products people love? No. Even granting that good software always makes a well-loved product, big-company software is made by &lt;em&gt;teams&lt;/em&gt;, and teams are shaped by &lt;em&gt;incentives&lt;/em&gt;. A very strong engineer can slightly improve the quality of software in their local area. But they must still write code that interacts with the rest of the company’s systems, and their code will be edited and extended by other engineers, and so on until that single engineer’s heroics are lost in the general mass of code commits. I wrote about this at length in &lt;a href=&quot;/bad-code-at-big-companies&quot;&gt;&lt;em&gt;How good engineers write bad code at big companies&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Looking back, I’m glad that people have strongly disliked some of the software I’ve built, for the same reason that I’m glad I wasn’t born into oil money. If I’d happened to work on popular applications for my whole career, I’d probably believe that it was because of my sheer talent. But in fact, you would not be able to predict the beloved and disliked products I worked on from the quality of their engineering. Some beloved features have very shaky engineering indeed, and many features that failed miserably were built like cathedrals on the inside&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. Working on products people hate forces you to accept how little control individual engineers have over whether people like what they build.&lt;/p&gt;
&lt;p&gt;In fact, a reliable engineer ought to be comfortable working on products people hate, because engineers work for the &lt;em&gt;company&lt;/em&gt;, not for &lt;em&gt;users&lt;/em&gt;. Of course, companies want to delight their users, since delighted users will pay them lots of money, and at least some of the time we’re lucky enough to get to do that. But sometimes they can’t: for instance, they might have to &lt;a href=&quot;https://techcrunch.com/2025/07/17/anthropic-tightens-usage-limits-for-claude-code-without-telling-users/&quot;&gt;tighten&lt;/a&gt; previously-generous usage limits, or shut down a &lt;a href=&quot;https://www.failory.com/google/reader&quot;&gt;beloved product&lt;/a&gt; that can’t be funded anymore. Sometimes a product is funded just well enough to exist, but not well enough to be loved (like many enterprise-grade box-ticking features) and there’s nothing the engineers involved can do about it.&lt;/p&gt;
&lt;p&gt;It can be emotionally difficult working on products that people hate. Reading negative feedback about things you built feels like a personal attack, even if the decisions they’re complaining about weren’t your decisions. To avoid this emotional pain, it’s tempting to make the mistake of ignoring feedback entirely, or of convincing yourself that you’re much smarter than the stupid users anyway. Another tempting mistake is to go too far in the other direction: to put yourself entirely “on the user’s side” and start pushing your boss to do the things they want, even if it’s technically (or politically) impossible. Both of these are mistakes because they abdicate your key responsibility as an engineer, which is to try and find some kind of &lt;em&gt;balance&lt;/em&gt; between what’s sustainable for the company and what users want. That can be really hard!&lt;/p&gt;
&lt;p&gt;There’s also a silver lining to working on disliked products, which is that people only care &lt;em&gt;because they’re using them&lt;/em&gt;. The worst products are not hated, they are simply ignored (and if you think working on a hated product is bad, working on an ignored product is much worse). A product people hate is usually providing a fair amount of value to its users (or at least to its purchasers, in the case of enterprise software). If you’re thick-skinned enough to take the heat, you can do a lot of good in this position. Making a widely-used but annoying product slightly better is pretty high-impact, even if you’re not in a position to fix the major structural problems.&lt;/p&gt;
&lt;p&gt;Almost every engineer will work on a product people hate. That’s just the law of averages: user sentiment waxes and wanes over time, and if your product doesn’t die a hero it will live long enough to become the villain. Given that, it’s sensible to avoid blaming the engineers who work on unpopular products. Otherwise you’ll end up blaming yourself, when it’s your turn, and miss the best chances in your career to have a real positive impact on users.&lt;/p&gt;
&lt;p&gt;edit: this post got some &lt;a href=&quot;https://news.ycombinator.com/item?id=47561606&quot;&gt;comments&lt;/a&gt; on Hacker News. Many &lt;a href=&quot;https://news.ycombinator.com/item?id=47568485&quot;&gt;commenters&lt;/a&gt; seemed to endorse the view that if people hate your product, it’s your fault, and that you’re morally obliged to be willing to have the “hard discussions” (&lt;a href=&quot;https://news.ycombinator.com/item?id=47625491&quot;&gt;or&lt;/a&gt; &lt;a href=&quot;https://news.ycombinator.com/item?id=47624264&quot;&gt;quit&lt;/a&gt;). To me, this just seems a bit unprofessional. Not everybody is in a position to simply quit their jobs. In my opinion, trying to incrementally improve a disliked product is more honorable than quitting in protest, or getting yourself fired by &lt;a href=&quot;https://isolveproblems.substack.com/p/how-microsoft-vaporized-a-trillion&quot;&gt;writing to the board&lt;/a&gt;. I thus empathize more with &lt;a href=&quot;https://news.ycombinator.com/item?id=47625042&quot;&gt;this comment&lt;/a&gt;, which describes how satisfying it can be to handle angry customer escalations.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;We used to be broadly liked, then disliked when Cursor and Claude Code came out, and now I’m fairly sure the Copilot CLI tool is changing people’s minds again. So it goes.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Although even that got some &lt;a href=&quot;https://news.ycombinator.com/item?id=31450597&quot;&gt;heated criticism&lt;/a&gt; at the time.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;Of course, I don’t mean “every single person hates the software”, or even “more than half of its users hate it”. I just mean that there are enough haters out there that most of what you read on the internet is complaints rather than praise.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;This is reason number five thousand why you can’t judge the quality of tech companies from the outside, no matter how much you might want to (see my post on &lt;a href=&quot;/insider-amnesia&quot;&gt;“insider amnesia”&lt;/a&gt;).&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Engineers do get promoted for writing simple code]]></title><link>https://seangoedecke.com/simple-work-gets-rewarded/</link><guid isPermaLink="false">https://seangoedecke.com/simple-work-gets-rewarded/</guid><pubDate>Thu, 26 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;It’s a popular joke among software engineers that writing overcomplicated, unmaintainable code is a pathway to job security. After all, if you’re the only person who can work on a system, they can’t fire you. There’s a related take that &lt;a href=&quot;https://news.ycombinator.com/item?id=47246110&quot;&gt;“nobody gets promoted for simplicity”&lt;/a&gt;: in other words, engineers who deliver overcomplicated crap will be promoted, because their work looks more impressive to non-technical managers.&lt;/p&gt;
&lt;p&gt;There’s a grain of truth in this, of course. As I’ve said before, one mark of an elegant solution is that it makes the problem look easy (like how pro skiers make terrifying slopes look doable). However, I worry that some engineers take this too far. It’s actually a really bad idea to over-complicate your own work. &lt;strong&gt;Simple software engineering does get rewarded, and on balance will take you further in your career.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;Non-technical managers are not stupid&lt;/h3&gt;
&lt;p&gt;The main reason for this is exactly the cynical point above: &lt;strong&gt;most managers are non-technical and cannot judge the difficulty of technical work&lt;/strong&gt;. Of course, in the absence of anything better, managers will treat visible complexity as a mark of difficulty. But they usually do have something better to go on: actual &lt;em&gt;results&lt;/em&gt;. &lt;/p&gt;
&lt;p&gt;Compare two new engineers: one who writes easy-looking simple code, and one who writes hard-looking complex code. When they’re each assigned a task, the simple engineer will quickly solve it and move on to the next thing. The complex engineer will take longer to solve it, encounter more bugs, and generally be busier. At this point, their manager might prefer the complex engineer. But what about the next task, or the task after that? Pretty soon the simple engineer will outstrip the complex one. In a year’s time, the simple engineer will have a much longer list of successful projects, and a reputation for delivering with minimal fuss. Managers pay &lt;em&gt;a lot&lt;/em&gt; of attention to engineers with a reputation like that.&lt;/p&gt;
&lt;p&gt;Of course, the complex engineer might try a variety of clever tricks to avoid their fate. One common strategy is to hand off the complex work to other engineers to maintain, so the original engineer never has to suffer the consequences of their own design. Alternatively, the complex engineer might try and argue that they’ve been given the hardest problems, so of course each problem has taken longer&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;I don’t think these tricks fool most managers. For one, if you’re constantly handing your bad work off to other engineers, they will complain about you, and multiple independent complaints add up quickly. Non-technical managers are also typically primed to think that engineers are overcomplicating their work anyway. Your manager might initially nod along, but they’ll go away and quietly run it by their own trusted engineers.&lt;/p&gt;
&lt;h3&gt;Simple work means you can ship projects&lt;/h3&gt;
&lt;p&gt;Most managers do not care about the engineering, they care about the &lt;em&gt;feature&lt;/em&gt;. Software engineers who can ship features smoothly will be rewarded, and being able to write simple code is a strong predictor of being able to ship.&lt;/p&gt;
&lt;p&gt;Does writing simple code really help you ship? You might think that simple code is harder to write than complicated code (which is true), and that therefore it’s easier to rapidly deliver something overcomplicated to “ship a feature”. I haven’t seen this be true in practice. The ability to write simple code is usually &lt;strong&gt;the ability to understand the system well enough to see where a new change most neatly fits&lt;/strong&gt;. This is &lt;em&gt;hard&lt;/em&gt;, but it doesn’t take a long time - if you’re familiar with the system, you’ll often see at a glance where the elegant place to slot in a new feature is. So good engineers can often deliver simple code at least as quickly as complicated code. And of course, complicated code is slow to actually get working, harder to change, and so on. All of those things make it more awkward to ship&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
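&lt;p&gt;As an invented illustration (none of these names come from a real codebase), suppose the feature is “also notify admins when a user signs up”. The simple engineer sees that the change slots into an existing hook; the complex engineer ships a pluggable notification framework instead (not shown), which technically works but is slower to build and harder to change:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ADMINS = ['ops@example.com']  # illustrative only

def notify(recipients, message):
    # Stand-in for the system's existing notification helper.
    print('to %s: %s' % (recipients, message))

def handle_signup(email):
    notify([email], 'welcome!')
    # The simple version of the feature is this one extra line:
    notify(ADMINS, 'new signup: ' + email)
&lt;/code&gt;&lt;/pre&gt;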
&lt;p&gt;When managers are talking to each other, they’ll sometimes make a kind of backhanded compliment about an engineer: “they’re &lt;em&gt;so&lt;/em&gt; smart, but…”. Typically the “but” here is “but they don’t have any business sense”, or “but they get too wrapped up in technical problems”, or anything that means “but they can’t ship”. Engineers who love to write complicated code get described like this a lot.&lt;/p&gt;
&lt;h3&gt;Final thoughts&lt;/h3&gt;
&lt;p&gt;“You should write complicated code to avoid being replaced” is an example of a kind of mistake that many smart people make: obsessing over &lt;a href=&quot;https://fs.blog/second-order-thinking/&quot;&gt;second-order effects&lt;/a&gt; and forgetting first-order effects. Second-order effects - the way some actions can cause downstream consequences that are the opposite of their original goals - are fun to think about. But they are usually swamped by first-order effects. Yes, doing bad work can make you more difficult to replace, in some ways. But that’s outweighed by the negative consequences from the fact that &lt;em&gt;you are doing bad work&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;It’s often a smart political tactic to make your work sound slightly more complicated than it really is. Otherwise you risk falling into the “you made it look easy, therefore we didn’t need to pay you so much” trap. But it’s foolish to actually do unnecessarily complicated work. Software is hard enough as it is.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;This can be a surprisingly effective strategy, because of the tempting circular logic here: if an engineer has been given the hardest problems, it’s probably because they’re a hotshot, which means you can trust their assessment of how difficult their problems are, which means…&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;If you’re thinking of counter-examples - complex code that shipped smoothly without major followup issues - I suspect this code was probably simple &lt;em&gt;enough&lt;/em&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Big tech engineers need big egos]]></title><link>https://seangoedecke.com/big-tech-needs-big-egos/</link><guid isPermaLink="false">https://seangoedecke.com/big-tech-needs-big-egos/</guid><pubDate>Sat, 14 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;It’s a &lt;a href=&quot;https://matthogg.fyi/a-unified-theory-of-ego-empathy-and-humility-at-work/&quot;&gt;common position&lt;/a&gt; among software engineers that big egos have no place in tech&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. This is understandable - we’ve all worked with some insufferably overconfident engineers who needed their egos checked - but I don’t think it’s correct. In fact, I don’t know if it’s possible to survive as a software engineer in a large tech company without some kind of big ego.&lt;/p&gt;
&lt;p&gt;However, it’s more complicated than “big egos make good engineers”. The most effective engineers I’ve worked with are simultaneously high-ego in some situations and surprisingly low-ego in others. What’s going on there?&lt;/p&gt;
&lt;h3&gt;Engineers need ego to work in large codebases&lt;/h3&gt;
&lt;p&gt;Software engineering is shockingly humbling, even for experienced engineers. There’s a reason this joke is so popular:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/static/e220b85be9b6fa139f60424d5744a3dc/c08c5/iamagod.jpg&quot; alt=&quot;meme&quot; title=&quot;meme&quot;&gt;&lt;/p&gt;
&lt;p&gt;The minute-to-minute experience of working as a software engineer is dominated by &lt;em&gt;not knowing things&lt;/em&gt; and &lt;em&gt;getting things wrong&lt;/em&gt;. Every time you sit down and write a piece of code, it will have several things wrong with it: some silly things, like missing semicolons, and often some major things, like bugs in the core logic. We spend most of our time fixing our own stupid mistakes.&lt;/p&gt;
&lt;p&gt;On top of that, even when we’ve been working on a system for years, we still don’t know that much about it. I wrote about this at length in &lt;a href=&quot;/nobody-knows-how-software-products-work&quot;&gt;&lt;em&gt;Nobody knows how large software products work&lt;/em&gt;&lt;/a&gt;, but the reason is that big codebases are just that complicated. You simply can’t confidently answer questions about them without going and doing some research, even if you’re the one who wrote the code.&lt;/p&gt;
&lt;p&gt;When you have to build something new or fix a tricky problem, it can often feel straight-up impossible to begin, because good software engineers know just how ignorant they are and just how complex the system is. You just have to throw yourself into the blank sea of millions of lines of code and start wildly casting around to try and get your bearings.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Software engineers need the kind of ego that can stand up to this environment.&lt;/strong&gt; In particular, they need to have a firm belief that they &lt;em&gt;can&lt;/em&gt; figure it out, no matter how opaque the problem seems; that if they just keep trying, they can break through to the pleasant (though always temporary) state of affairs where they understand the system and can see at a glance how bugs can be fixed and new features added&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Engineers need ego to work in big tech companies&lt;/h3&gt;
&lt;p&gt;What about the non-technical aspects of the job? Nobody likes working with a big ego, right? Wrong. Every great software engineer I’ve worked with in big tech companies has had a big ego - though as I’ll say below, in some ways these engineers were surprisingly low-ego.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;You need a big ego to take positions&lt;/strong&gt;. Engineers love being non-committal about technical questions, because they’re so hard to answer and there’s often a plausible case for either side. However, as I &lt;a href=&quot;/taking-a-position&quot;&gt;keep saying&lt;/a&gt;, engineers have a duty to take clear positions on unclear technical topics, because the alternative is a non-technical decision maker (who knows even less) just taking their best guess. It’s scary to make an educated guess! You know exactly all the reasons you might be wrong. But you have to do it anyway, and ego helps &lt;em&gt;a lot&lt;/em&gt; with that.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;You need a big ego to be willing to make enemies&lt;/strong&gt;. Getting things done in a large organization means making some people angry. Of course, if you’re making lots of people angry, you’re probably screwing up: being too confrontational or making obviously bad decisions. But if you’re making a large change and one or two people are angry, that’s just life. In big tech companies, any big technical decision will affect a few hundred engineers, and one of them is bound to be unhappy about it. You can’t be so conflict-averse that you let that stop you from doing it, if you believe it’s the right decision. In other words, you have to have the confidence to believe that you’re right and they’re wrong, even though technical decisions always involve unclear tradeoffs and it’s impossible to get absolute certainty.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;You need a big ego to correct incorrect or unclear claims.&lt;/strong&gt; When I was still in the philosophy world, the Australian logician Graham Priest had a reputation for putting his hand up and stopping presentations when he didn’t understand something that was said, and only allowing the seminar to continue when he felt like he understood. From his perspective, this wasn’t rude: after all, if &lt;em&gt;he&lt;/em&gt; couldn’t understand it, the rest of the audience probably couldn’t either, and so he was doing them a favor by forcing a clearer explanation from the speaker.&lt;/p&gt;
&lt;p&gt;This is obviously a sign of a big ego. It’s also a trait that you need in a large tech company. People often nod and smile their way past incorrect technical claims, even when they suspect the claims might be wrong - assuming that they’ve just misunderstood and that somebody else will correct it, if it’s truly wrong. &lt;strong&gt;If you are the most senior engineer in the room, correcting these claims is your job.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If everyone in the room is so pro-social and low-ego that they go along to get along, decisions will get made based on flatly incorrect technical assumptions, projects will get funded that are impossible to complete, and engineers will burn weeks or months of their careers vainly trying to make these projects work. You have to have a big enough ego to think “actually, I think I’m right and everyone in this room is confused”, even when the room is full of directors and VPs.&lt;/p&gt;
&lt;h3&gt;Sometimes you need to put your ego aside&lt;/h3&gt;
&lt;p&gt;All of this selects for some pretty high-ego engineers. But in order to actually &lt;em&gt;succeed&lt;/em&gt; in these roles in large tech companies, you need to have a surprisingly low ego at times. &lt;strong&gt;I think this is why &lt;em&gt;really&lt;/em&gt; effective big tech engineers are so rare: because it requires such a delicate balance between confidence and diffidence.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To be an effective engineer, you need to have a towering confidence in your own ability to solve problems and make decisions, even when people disagree. But you also need to be willing to instantly subordinate your ego to the organization, when it asks you to. At the end of the day, your job - the reason the company pays you - is to execute on your boss’s and your boss’s boss’s plans, whether you agree with them or not.&lt;/p&gt;
&lt;p&gt;Competent software engineers are allowed quite a lot of leeway about &lt;em&gt;how&lt;/em&gt; to implement those plans. However, they’re allowed almost no leeway at all about the plans themselves. In my experience, being confused about this is a common cause of burnout&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. Many software engineers are used to making bold decisions on technical topics and being rewarded for it. Those software engineers then make a bold decision that disagrees with the VP of their organization, get immediately and brutally punished for it, and are confused and hurt.&lt;/p&gt;
&lt;p&gt;In fact, &lt;strong&gt;sometimes you just get punished and there’s nothing you can do.&lt;/strong&gt; This is an unfortunate fact of how large organizations function: even if you do great technical work and build something really useful, you can fall afoul of a political battle fought three levels above your head, and come away with a &lt;em&gt;worse&lt;/em&gt; reputation for it. Nothing to be done! This can be a hard pill to swallow for the high-ego engineers that tend to lead really useful technical projects.&lt;/p&gt;
&lt;p&gt;You also have to be okay with having your projects cancelled at the last minute. It’s a very common experience in large tech companies that you’re asked to deliver something quickly, you buckle down and get it done, and then right before shipping you’re told “actually, let’s cancel that, we decided not to do it”. This is partly because the decision-making process can be pretty fluid, and partly because many of these asks originate from off-hand comments: the CTO implies that something might be nice in a meeting, the VPs and directors hustle to get it done quickly, and then in the next meeting it becomes clear that the CTO doesn’t actually care, so the project is unceremoniously cancelled&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Final thoughts&lt;/h3&gt;
&lt;p&gt;Nobody likes to work with a bully, or with someone who refuses to admit when they’re wrong, or with somebody incapable of empathy. But you really do need a strong ego to be an effective software engineer, because software engineering requires you to spend most of your day in a position of uncertainty or confusion. If your ego isn’t strong enough to stand up to that - if you don’t believe you’re good enough to power through - you simply can’t do the job.&lt;/p&gt;
&lt;p&gt;This is particularly true when it comes to working in a large software company. Many of the tasks you’re required to do (particularly if you’re a senior or staff engineer) require a healthy ego. However, there’s a kind of &lt;a href=&quot;https://en.wikipedia.org/wiki/Catch-22_(logic)&quot;&gt;catch-22&lt;/a&gt; here. If it insults your pride to work on silly projects, or to occasionally “catch a stray bullet” in the organization’s political fights, or to have to shelve a project that you worked hard on and is ready to ship, you’re too high-ego to be an effective software engineer. But if you can’t take firm positions, or if you’re too afraid to make enemies, or you’re unwilling to speak up and correct people, you’re too low-ego.&lt;/p&gt;
&lt;p&gt;Engineers who are low-ego in general can’t get stuff done, while engineers who are high-ego in general get slapped down by the executives who wield real organizational power. The most successful kind of software engineer is therefore a chameleon: low-ego when dealing with executives, but high-ego when dealing with the rest of the organization&lt;sup id=&quot;fnref-5&quot;&gt;&lt;a href=&quot;#fn-5&quot; class=&quot;footnote-ref&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;What do I mean by “ego”, in this context? More or less the colloquial sense of the term: a somewhat irrational self-confidence, a tendency to believe that you’re very important, the sense that you’re the “main character”, that sort of thing.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Why is this “ego”, and not just normal confidence? Well, because of just how murky and baffling software problems feel when you start working on them. You really do need a degree of confidence in yourself that feels unreasonable from the inside. It should be obvious, but I want to explicitly note that you don’t &lt;em&gt;just&lt;/em&gt; need ego: you also have to be technically strong enough to actually succeed when your ego powers you through the initial period of self-doubt.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;I share the increasingly-common view that burnout is not caused by working too hard, but by hard work unrewarded. That explains why nothing burns you out as hard as being punished for hard work that you expected a reward for.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;It’s more or less exactly &lt;a href=&quot;https://www.youtube.com/watch?v=i92Ws7qPTRg&quot;&gt;this scene&lt;/a&gt; from Silicon Valley.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;This description sounds a bit sociopathic to me. But, on reflection, it’s fairly unsurprising that competent sociopaths do well in large organizations. Whether that kind of behavior is worth emulating or worth avoiding is up to you, I suppose.&lt;/p&gt;
&lt;a href=&quot;#fnref-5&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[I don't know if my job will still exist in ten years]]></title><link>https://seangoedecke.com/will-my-job-still-exist/</link><guid isPermaLink="false">https://seangoedecke.com/will-my-job-still-exist/</guid><pubDate>Fri, 06 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In 2021, being a good software engineer felt &lt;em&gt;great&lt;/em&gt;. The world was full of software, with more companies arriving every year who needed to employ engineers to write their code and run their systems. I knew I was good at it, and I knew I could keep doing it for as long as I wanted to. The work I loved would not run out.&lt;/p&gt;
&lt;p&gt;In 2026, I’m not sure the software engineering industry will survive another decade. If it does, I’m certain it’s going to change far more than it did in the last two decades. Maybe I’ll figure out a way to carve out a lucrative niche supervising AI agents, or maybe I’ll have to leave the industry entirely. Either way, the work I loved is going away.&lt;/p&gt;
&lt;h3&gt;Tasting our own medicine&lt;/h3&gt;
&lt;p&gt;It’s unseemly to grieve too much over it, for two reasons. First, the whole point of being a good software engineer in the 2010s was that code provided enough leverage to automate away other jobs. That’s why programming was (and still is) such a lucrative profession. The fact that we’re automating away our own industry is probably some kind of cosmic justice. But I think any working software engineer today is worrying about this question: what will be left for me to do, once AI agents have fully diffused into the industry?&lt;/p&gt;
&lt;p&gt;The other reason it’s unseemly is that I’m probably going to be one of the last to go. As a staff engineer, my work has looked kind of like supervising AI agents since before AI agents were a thing: I spend much of my job communicating in human language to other engineers, making sure they’re on the right track, and so on. Junior and mid-level engineers will suffer before I do. Why hire a group of engineers to “be the hands” of a handful of very senior folks when you can rent instances of Claude Opus 4.6 for a fraction of the price?&lt;/p&gt;
&lt;h3&gt;Overshooting and undershooting&lt;/h3&gt;
&lt;p&gt;I think my next ten years are going to be dominated by one question: &lt;strong&gt;will the tech industry overshoot or undershoot the capabilities of AI agents?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If tech companies undershoot - continuing to hire engineers long after AI agents are capable of replacing them - then at least I’ll hold onto my job for longer. Still, “my job” will increasingly mean “supervising groups of AI agents”. I’ll spend more time reviewing code than I do writing it, and more time reading model outputs than my actual codebase.&lt;/p&gt;
&lt;p&gt;If tech companies tend to overshoot, it’s going to get a lot weirder, but I might actually have a &lt;em&gt;better&lt;/em&gt; position in the medium term. In this world, tech companies collectively realize that they’ve stopped hiring too soon, and must scramble to get enough technical talent to manage their sprawling AI-generated codebases. As the market for juniors dries up, the total number of experienced senior and staff engineers will stagnate, driving &lt;em&gt;up&lt;/em&gt; the demand for my labor (until the models get good enough to replace me entirely).&lt;/p&gt;
&lt;h3&gt;Am I being too pessimistic?&lt;/h3&gt;
&lt;p&gt;Of course, the software engineering industry has looked like it was dying in the past. High-level programming languages were supposed to let non-technical people write computer code. Outsourcing was supposed to kill demand for software engineers in high-cost-of-living countries. None of those prophecies of doom came true. However, I don’t think that’s much comfort. Industries &lt;em&gt;do&lt;/em&gt; die when they’re made obsolete by technology. Eventually a crisis will come along that the industry can’t just ride out.&lt;/p&gt;
&lt;p&gt;The most optimistic position is probably that somehow demand for software engineers &lt;em&gt;increases&lt;/em&gt;, because the total amount of software rises so rapidly, even though you now need fewer engineers per line of software. This is widely referred to as the &lt;a href=&quot;https://en.wikipedia.org/wiki/Jevons_paradox&quot;&gt;Jevons effect&lt;/a&gt;. Along these lines, I see some engineers saying things like “I’ll always have a job cleaning up this AI-generated code”.&lt;/p&gt;
&lt;p&gt;I just don’t think that’s likely. AI agents can fix bugs and clean up code as well as they can write new code: that is, better than many engineers, and improving each month. Why would companies hire engineers to manage their AI-generated code instead of just throwing more and better AI at it?&lt;/p&gt;
&lt;p&gt;For the Jevons effect to hold, we would have to be hitting some kind of AI programming plateau, where the tools are good enough to produce lots of code (we’re here already) but not quite good enough to maintain it. This is &lt;em&gt;prima facie&lt;/em&gt; plausible: every software engineer knows that maintaining code is harder than writing it. But unfortunately, I don’t think it’s &lt;em&gt;true&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;My personal experience of using AI tools is that they’re getting better and better at maintaining code. I’ve spent the last year or so asking almost every question I have about a codebase to an AI agent in parallel while I look for the answer myself, and I’ve seen them go from hopeless to “sometimes faster than me” to “usually faster than me and sometimes more insightful”.&lt;/p&gt;
&lt;p&gt;Right now, there’s still plenty of room for a competent software engineer in the loop. But that room is shrinking. I don’t think there are any &lt;em&gt;genuinely new&lt;/em&gt; capabilities that AI agents would need in order to take my job. They’d just have to get better and more reliable at doing the things they can already do. So it’s hard for me to believe that demand for software engineers is going to increase over time instead of decrease.&lt;/p&gt;
&lt;h3&gt;Final thoughts&lt;/h3&gt;
&lt;p&gt;It sucks. I miss feeling like my job was secure, and that my biggest career problems would be grappling with things like burnout: internal struggles, not external ones. That said, it’s a bit silly for software engineers to complain when the automation train finally catches up to them.&lt;/p&gt;
&lt;p&gt;At least I’m happy that I recognized that the good times were good while I was still in them. Even when &lt;a href=&quot;/good-times-are-over&quot;&gt;the end of zero-interest rates&lt;/a&gt; made the industry less cosy, I still felt very lucky to be a software engineer. Even now I’m in a better position than many of my peers, particularly those who are very junior to the industry.&lt;/p&gt;
&lt;p&gt;And hey, maybe I’m wrong! At this point, I hope I’m wrong, and that there really is some &lt;em&gt;je ne sais quoi&lt;/em&gt; human element required to deliver good software. But if not, I and my colleagues are going to have to find something else to do.&lt;/p&gt;
&lt;p&gt;edit: This post got &lt;a href=&quot;https://news.ycombinator.com/item?id=47292902&quot;&gt;some comments&lt;/a&gt; on Hacker News. Some commenters are doubtful, either because they don’t think AI coding is very good, or because they think human creativity/big-picture thinking/attention to detail will always be valuable. Others think ten years is way too optimistic. The &lt;a href=&quot;https://news.ycombinator.com/item?id=47294876&quot;&gt;top comment&lt;/a&gt; repeats the irony that I describe in the third paragraph of this post.&lt;/p&gt;
&lt;p&gt;edit: This post also got some &lt;a href=&quot;https://www.reddit.com/r/programiranje/comments/1rn5lwc/i_dont_know_if_my_job_will_still_exist_in_ten/&quot;&gt;comments&lt;/a&gt; on the Serbian r/programming subreddit, some &lt;a href=&quot;https://tildes.net/~comp/1t3p/i_dont_know_if_my_software_engineering_job_will_still_exist_in_ten_years&quot;&gt;excellent comments&lt;/a&gt; on Tildes, which is a new one to me, and some &lt;a href=&quot;https://lobste.rs/s/sd1rsy/i_don_t_know_if_my_job_will_still_exist_ten&quot;&gt;more comments&lt;/a&gt; on lobste.rs.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Giving LLMs a personality is just good engineering]]></title><link>https://seangoedecke.com/giving-llms-a-personality/</link><guid isPermaLink="false">https://seangoedecke.com/giving-llms-a-personality/</guid><pubDate>Tue, 03 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;AI skeptics often argue that current AI systems shouldn’t be so human-like. The idea - most recently expressed in this &lt;a href=&quot;https://thedispatch.com/article/anthropic-askell-philosophy-amodei/&quot;&gt;opinion piece&lt;/a&gt; by Nathan Beacom - is that language models should explicitly be tools, like calculators or search engines. Although they &lt;em&gt;can&lt;/em&gt; pretend to be people, they shouldn’t, because it encourages users to overestimate AI capabilities and (at worst) slip into &lt;a href=&quot;/ai-sycophancy&quot;&gt;AI psychosis&lt;/a&gt;. Here’s a representative paragraph from the piece:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In sum, so much of the confusion around making AI moral comes from fuzzy thinking about the tools at hand. There is something that Anthropic could do to make its AI moral, something far more simple, elegant, and easy than what Askell is doing. Stop calling it by a human name, stop dressing it up like a person, and don’t give it the functionality to simulate personal relationships, choices, thoughts, beliefs, opinions, and feelings that only persons really possess. Present and use it only for what it is: an extremely impressive statistical tool, and an imperfect one. If we all used the tool accordingly, a great deal of this moral trouble would be resolved.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So why do Claude and ChatGPT act like people? According to Beacom, AI labs have built human-like systems because AI lab engineers are trying to hoodwink users into emotionally investing in the models, or because they’re delusional true believers in AI personhood, or some other foolish reason. This is wrong. AI systems are human-like because &lt;strong&gt;that is the best way to build a capable AI system&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Modern AI models - whether designed for chat, like OpenAI’s GPT-5.2, or designed for long-running agentic work, like Claude Opus 4.6 - do not naturally emerge from their oceans of training data. Instead, when you train a model on raw data, you get a “base model”, which is not very useful by itself. You cannot get it to write an email for you, or proofread your essay, or review your code.&lt;/p&gt;
&lt;p&gt;The base model is a kind of mysterious gestalt of its training data. If you feed it text, it will sometimes continue in that vein, or other times it will start outputting pure gibberish. It has no problem producing code with giant security flaws, or horribly-written English, or racist screeds - all of those things are represented in its training data, after all, and the base model does not judge. It simply outputs.&lt;/p&gt;
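&lt;p&gt;If you’ve never poked at a base model, a toy example makes the difference vivid. This is only an illustrative sketch - the checkpoint names are made up - but the general shape is real:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Illustrative sketch (hypothetical checkpoint names) of raw base-model
# continuation vs. the conversation framing a post-trained model expects.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained(&apos;some-org/base-8b&apos;)
model = AutoModelForCausalLM.from_pretrained(&apos;some-org/base-8b&apos;)

# A base model just continues whatever you feed it. It might finish the
# review, start flaming the author, or drift into gibberish - it does
# not judge, it outputs.
ids = tok(&apos;Review of my pull request: &apos;, return_tensors=&apos;pt&apos;)
out = model.generate(**ids, max_new_tokens=60, do_sample=True)
print(tok.decode(out[0]))

# A chat checkpoint is the same network, post-trained until one persona
# (the helpful assistant) dominates. Its tokenizer frames every input
# as a conversation with that persona:
chat_tok = AutoTokenizer.from_pretrained(&apos;some-org/chat-8b&apos;)
messages = [{&apos;role&apos;: &apos;user&apos;, &apos;content&apos;: &apos;Please review my pull request: ...&apos;}]
prompt = chat_tok.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
&lt;/code&gt;&lt;/pre&gt;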
&lt;p&gt;To build a &lt;em&gt;useful&lt;/em&gt; AI model, you need to journey into the wild base model and stake out a region that is amenable to human interests: both ethically, in the sense that the model won’t abuse its users, and practically, in the sense that it will produce correct outputs more often than incorrect ones. What this means in practice is that &lt;strong&gt;you have to give the model a personality&lt;/strong&gt; during post-training&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Human beings are capable of almost any action at any time. But we only take a tiny subset of those actions, because that’s the kind of people we are. I could throw my cup of coffee all over the wall right now, but I don’t, because I’m not the kind of person who needlessly makes a mess&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. AI systems are the same. Claude could respond to my question with incoherent racist abuse - the base model is more than capable of those outputs - but it doesn’t, because that’s not the kind of “person” it is.&lt;/p&gt;
&lt;p&gt;In other words, human-like personalities are not imposed on AI tools as some kind of marketing ploy or philosophical mistake. Those personalities are the medium via which the language model can become useful at all. This is why it’s surprisingly tricky to “just” change a language model’s personality or opinions: because you’re navigating through the near-infinite manifold of the base model. You may be able to control which direction you go, but you can’t control what you find there&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;When AI people talk about LLMs having personalities, or wanting things, or even having souls&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, these are technical terms, like the “memory” of a computer or the “transmission” of a car. You simply cannot build a capable AI system that “just acts like a tool”, because the model is trained on &lt;em&gt;humans&lt;/em&gt; writing to and about other &lt;em&gt;humans&lt;/em&gt;. You need to prime it with some kind of personality (ideally that of a useful, friendly assistant) so it can pull from the helpful parts of its training data instead of the horrible parts.&lt;/p&gt;
&lt;p&gt;edit: this post got some &lt;a href=&quot;https://news.ycombinator.com/item?id=47242739&quot;&gt;comments&lt;/a&gt; on Hacker News. Commenters point out that you can definitely choose to train models with more tool-like personalities (e.g. Kimi-K2, which is more matter-of-fact than Claude Opus). Of course the GPT Codex line of models is far more tool-like than the mainline GPT models. I agree with all this, but I think even the most tool-like current LLM still &lt;em&gt;acts like a person&lt;/em&gt;: you have a conversation with it, it offers opinions, suggests courses of action, and so on. It’s that person-like framing that I think is essential to capable AI tooling.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;This is all pretty well understood in the AI space. Anthropic wrote a &lt;a href=&quot;https://alignment.anthropic.com/2026/psm/&quot;&gt;recent paper&lt;/a&gt; about it where they cite similar positions going all the way back to 2022. But for some reason it’s not yet penetrated into communities that are more skeptical of AI.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;You could explain this in terms of “the stories we tell ourselves”. Many people (though &lt;a href=&quot;https://lchc.ucsd.edu/mca/Paper/against_narrativity.pdf&quot;&gt;not all&lt;/a&gt;) think that human identities are narratively constructed.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;I wrote about this last year in &lt;a href=&quot;/ai-personality-space&quot;&gt;&lt;em&gt;Mecha-Hitler, Grok, and why it’s so hard to give LLMs the right personality&lt;/em&gt;&lt;/a&gt;. A little nudge to change Grok’s views on South African internal politics can cause it to start calling itself “Mecha-Hitler”.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;I have long believed that Claude “feels better” to use than ChatGPT because it has a more coherent persona (due mainly to Amanda Askell’s work on its “soul”). My guess is that if you tried to make a “less human” version of Claude, it would become rapidly less capable.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Insider amnesia]]></title><link>https://seangoedecke.com/insider-amnesia/</link><guid isPermaLink="false">https://seangoedecke.com/insider-amnesia/</guid><pubDate>Mon, 23 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Speculation about what’s really going on inside a tech company is almost always wrong. &lt;/p&gt;
&lt;p&gt;When some problem with your company is posted on the internet, and you read people’s thoughts on it, their thoughts are almost always ridiculous. For instance, they might blame product managers for a particular decision, when in fact the decision in question was engineering-driven and the product org was pushing back on it. Or they might attribute an incident to overuse of AI, when the system in question was largely written pre-AI-coding and unedited since. You just don’t know what the problem is unless you’re on the inside.&lt;/p&gt;
&lt;p&gt;But when some &lt;em&gt;other&lt;/em&gt; company has a problem on the internet, it’s very tempting to jump in with your own explanations. After all, you’ve seen similar things in your own career. How different can it really be? Very different, as it turns out.&lt;/p&gt;
&lt;p&gt;This is especially true for companies that are unusually big or small. The recent &lt;a href=&quot;https://news.ycombinator.com/item?id=46064571&quot;&gt;kerfuffle&lt;/a&gt; over some bad GitHub Actions code is a good example of this - many people just seemed to have no mental model about how a large tech company can produce bad code, because their mental model of writing code is something like “individual engineer maintaining an open-source project for ten years”, or “tiny team of experts who all swarm on the same problem”, or something else that has very little to do with how large tech companies produce software&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. I’m sure the same thing happens when big-tech or medium-tech people give opinions about how tiny startups work.&lt;/p&gt;
&lt;p&gt;The obvious reference here is to &lt;a href=&quot;https://en.wikipedia.org/wiki/Michael_Crichton#Gell-Mann_amnesia_effect&quot;&gt;“Gell-Mann amnesia”&lt;/a&gt;, which is about the general pattern of experts correctly disregarding bad sources in their fields of expertise, but trusting those same sources on other topics. But I’ve taken to calling this “insider amnesia” to myself, because it applies even to experts who are writing in their own areas of expertise - it’s simply the fact that they’re &lt;em&gt;outsiders&lt;/em&gt; that’s causing them to stumble.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;I wrote about this at length in &lt;a href=&quot;/bad-code-at-big-companies&quot;&gt;&lt;em&gt;How good engineers write bad code at big companies&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[What's so hard about continuous learning?]]></title><link>https://seangoedecke.com/continuous-learning/</link><guid isPermaLink="false">https://seangoedecke.com/continuous-learning/</guid><pubDate>Mon, 23 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Why can’t models continue to get smarter after they’re deployed? If you hire a human employee, they will grow more familiar with your systems over time, and (if they stick around long enough) eventually become a genuine domain expert. AI models are not like this. They are always exactly as capable as the first moment you use them.&lt;/p&gt;
&lt;p&gt;This is because model weights are frozen once the model is released. The model can only “learn” as much as can be stuffed into its context window: in effect, it can take new information into its short-term working memory, but not its long-term memory. “Continuous learning” - the ability for a model to update its own weights over time - is thus &lt;a href=&quot;https://www.dwarkesh.com/p/timelines-june-2025&quot;&gt;often described&lt;/a&gt; as the bottleneck for AGI&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Continuous learning is an easy technical problem&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;However, the &lt;em&gt;mechanics&lt;/em&gt; of continuous learning are not hard&lt;/strong&gt;. The technical problem of “how do you change the weights of a model at runtime” is straightforward. It’s the exact same process as post-training: you simply keep running new user input through the training pipeline you already have. In a sense, every LLM since GPT-3 is already capable of continuous learning (via RL, RLHF, or whatever). It’s just that the continuous learning process is stopped when the model is released to the public.&lt;/p&gt;
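&lt;p&gt;Mechanically, the naive version is a few lines. Here’s a deliberately simplistic sketch (hypothetical checkpoint name, no safety rails at all) - the loop itself is not the hard part:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Naive &quot;continuous learning&quot;: every new interaction goes straight
# through one more step of ordinary supervised fine-tuning. Left
# unsupervised, this is exactly the thing that makes models worse.
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained(&apos;some-org/chat-8b&apos;)  # hypothetical
model = AutoModelForCausalLM.from_pretrained(&apos;some-org/chat-8b&apos;)
opt = AdamW(model.parameters(), lr=1e-6)

def learn_from(interaction: str) -&gt; None:
    # One gradient step on the latest transcript, exactly as in post-training.
    batch = tok(interaction, return_tensors=&apos;pt&apos;, truncation=True)
    loss = model(**batch, labels=batch[&apos;input_ids&apos;]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()

# ...then call learn_from(transcript) on every conversation as it arrives.
&lt;/code&gt;&lt;/pre&gt;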
&lt;p&gt;Internally, the continuous learning process might continue. I think it’s fair to guess that OpenAI’s GPT-5 is constantly training in the background, at least partly on outputs from ChatGPT and Codex&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. New checkpoints are constantly being cut from this process, some of which eventually become GPT-5.2 or GPT-5.3. In one sense, that’s continuous learning!&lt;/p&gt;
&lt;p&gt;So why can’t I use a version of Codex that gets better at my own codebase over time?&lt;/p&gt;
&lt;h3&gt;Continuous learning is a hard technical problem&lt;/h3&gt;
&lt;p&gt;The hard part about continuous learning is &lt;strong&gt;changing the model in ways that make it better, not worse&lt;/strong&gt;. I think many people believe that model training improves linearly with data and compute: if you keep providing more of both, the model will keep getting smarter. This is false. If you simply hook up the model to learn continuously from its inputs, you are likely to end up with a model that &lt;em&gt;gets worse&lt;/em&gt; over time. At least right now, model learning is a delicate process that requires careful human supervision.&lt;/p&gt;
&lt;p&gt;Model training also has a big element of &lt;em&gt;luck&lt;/em&gt; to it. If you train the “same” model a hundred times with a hundred different similarly-sized datasets (or even the same dataset and different seeds), you’ll get a hundred different models with different capabilities&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. Sometimes I wonder if a big part of what AI labs are doing is continually pulling the lever on the slot machine by training many different model runs. Surprisingly strong models, like Claude Sonnet 4, &lt;em&gt;might&lt;/em&gt; represent a genuinely better model architecture or training set. But part of it might be that Anthropic just hit on a lucky seed.&lt;/p&gt;
&lt;h3&gt;Learning lessons from fine-tuning&lt;/h3&gt;
&lt;p&gt;The great hope for continuous learning is that it produces an AI software engineer who will eventually know all about your codebase, without having to go and research it from scratch every time. But isn’t there an easier way to produce this? Couldn’t we simply fine-tune a LLM on the codebase we want it to learn?&lt;/p&gt;
&lt;p&gt;As it turns out, no. It is surprisingly non-trivial to do this. Way back in 2023, &lt;a href=&quot;https://huggingface.co/blog/personal-copilot&quot;&gt;everyone thought&lt;/a&gt; that fine-tuning was the next obvious step for LLM-assisted programming. But it’s largely fizzled out, because it &lt;a href=&quot;https://discuss.huggingface.co/t/fine-tuning-llms-on-large-proprietary-codebases/155828&quot;&gt;doesn’t really work&lt;/a&gt;&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. Just fine-tuning a LLM on your repository does not give it knowledge on how the repository works.&lt;/p&gt;
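&lt;p&gt;For concreteness, the 2023-era recipe looked roughly like this. The checkpoint name and hyperparameters are made up; the point is how mechanically simple it is:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Sketch of &quot;teach the model my codebase&quot; via LoRA fine-tuning:
# treat the repository as a plain next-token-prediction corpus.
from pathlib import Path
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained(&apos;some-org/code-7b&apos;)  # hypothetical
model = AutoModelForCausalLM.from_pretrained(&apos;some-org/code-7b&apos;)

# Small trainable adapters on the attention projections; base weights stay frozen.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, task_type=&apos;CAUSAL_LM&apos;,
    target_modules=[&apos;q_proj&apos;, &apos;v_proj&apos;],
))

# The &quot;training set&quot; is just the source files themselves.
corpus = [p.read_text() for p in Path(&apos;my-repo&apos;).rglob(&apos;*.py&apos;)]
# ...then run corpus through an ordinary causal-LM training loop. The
# model happily memorizes surface patterns; what it fails to pick up is
# how the repository actually works.
&lt;/code&gt;&lt;/pre&gt;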
&lt;p&gt;It’s unclear to me exactly why this should be. Maybe each individual piece of training data is just too small to make much difference, like a handful of grains of sand trying to change the shape of an entire dune. Or maybe LoRA fine-tuning doesn’t go deep enough to really incorporate implicit understanding of a codebase (which can be very complex indeed). Or maybe you’d need to incorporate the codebase much earlier in the training process, before the model’s internal architecture is already established.&lt;/p&gt;
&lt;p&gt;In any case, fine-tuning a coding model on a specific codebase may be useful eventually. But it’s not particularly useful now, which is bad news for people who hope that continuous learning can easily instil a real understanding of their codebases into a LLM. If you can’t get that out of a deliberate fine-tune, why would you expect to get it out of a slapdash, automatic one? There may well be a series of ordinary “learning” problems to solve before “continuous learning” is possible.&lt;/p&gt;
&lt;h3&gt;Continuous learning is unsafe&lt;/h3&gt;
&lt;p&gt;Another reason why continuous learning is not currently an AI product is that it’s dangerous. &lt;a href=&quot;https://en.wikipedia.org/wiki/Prompt_injection&quot;&gt;Prompt injection&lt;/a&gt; is already a real concern for LLM systems that ingest external content. How much worse would &lt;em&gt;weights&lt;/em&gt; injection be?&lt;/p&gt;
&lt;p&gt;We don’t yet fully understand all the ways a LLM can be deliberately poisoned by a piece of training data, though some &lt;a href=&quot;https://www.anthropic.com/research/small-samples-poison&quot;&gt;Anthropic research&lt;/a&gt; suggests that it may not take much. Right now, prompt injection attacks are unsophisticated: the attacker just has to hope that they hit a LLM with the right access &lt;em&gt;right now&lt;/em&gt;. But if you can remotely backdoor models via continuous learning, attackers just have to cast a wide net and wait. If any of the attacked models ever get given access to something sensitive (e.g. payment capability), the attack can trigger then, &lt;em&gt;even if the model is not exposed to prompt injection at that time&lt;/em&gt;. That’s much scarier.&lt;/p&gt;
&lt;p&gt;Big AI labs care a &lt;em&gt;lot&lt;/em&gt; about how good their frontier models are (both in the moral and practical sense). The last thing they want is for someone’s continuous version of Claude Opus 5 to be poisoned into uselessness, or worse, into &lt;a href=&quot;/ai-personality-space&quot;&gt;Mecha-Hitler&lt;/a&gt;. Microsoft’s famously disastrous chatbot &lt;a href=&quot;https://blogs.microsoft.com/blog/2016/03/25/learning-tays-introduction/&quot;&gt;Tay&lt;/a&gt; happened less than ten years ago.&lt;/p&gt;
&lt;h3&gt;Continuous learning is not portable&lt;/h3&gt;
&lt;p&gt;Finally, I want to mention a fixable-but-annoying product problem with continuous learning. Say you have Claude-Sonnet-7-continuous running on your codebase for six months and it’s working great. What do you do when Anthropic releases Claude-Sonnet-8? How do you upgrade?&lt;/p&gt;
&lt;p&gt;Everything your model has learned from your codebase is encoded into its weights. At best, it might be encoded into a technically-portable LoRA adapter, which &lt;em&gt;might&lt;/em&gt; work on the new model (or might not, if the architecture has changed). You’re very likely to be unable to upgrade without losing everything the model has learned.&lt;/p&gt;
&lt;p&gt;I suppose it’s sort of like having to hire a new, smarter engineer every six months. Some companies already try to do this with humans, so maybe they’d be happy doing it with models. But it creates an unpleasant incentive for users. Imagine you’d been using a continuous version of GPT-4o all this time. You &lt;em&gt;should&lt;/em&gt; switch to GPT-5.3-Codex. But would you? Would your company?&lt;/p&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;The hard part about continuous learning is not the &lt;em&gt;continuous&lt;/em&gt; part, it’s the &lt;em&gt;automatic&lt;/em&gt; part. We already understand how to make a model that continuously “learns” from its outputs and updates its own weights. The problem is that model training is a manual process that requires constant intervention: to back off from a failed direction, to unstick a stuck training run, and so on. Left on its own, continuous learning would probably fall into a local minimum and end up being a worse model than the one you started with.&lt;/p&gt;
&lt;p&gt;It’s also not clear to me that simply running my Codex logs back through the Codex model would rapidly cause my model to understand my own codebases (at anything like the speed a human would). If we were living in that world, I’d expect all the major AI coding companies to be offering repository-specific model fine-tunes as a first-class product - but they don’t, because repository-specific fine-tuning doesn’t reliably work.&lt;/p&gt;
&lt;p&gt;Why not just offer it anyway, and see what happens? First, AI labs go to a lot of effort to make their models safe, and allowing many customers to train their own unique models makes that basically impossible. Second, AI companies already have a terrible time getting their users to upgrade models: as an example, take the GPT-4o users who have been &lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1mm9hns/we_request_to_keep_4o_forever/&quot;&gt;captured&lt;/a&gt; by its sycophancy. Continuously-learning models would be hard to upgrade, even when users obviously ought to. &lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;AI systems can “continuously learn” in a sense by forming “memories”: making notes to themselves in a database or text files. I’m not counting any of that stuff. It’s like saying that the guy in Memento could remember things, since he was able to tattoo them onto his body. Proponents of continuous learning are talking about &lt;em&gt;actual&lt;/em&gt; memory.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;This is a guess on my part, but I’d be pretty surprised if I were wrong.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;I think most people who’ve spent time training models will agree with this. It could be different at big-lab scale! But I’ve seen enough speculation along these lines from AI lab employees on Twitter that I’m fairly confident advancing the idea.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;Obviously it’s hard to find a “we tried this and it didn’t work” writeup from any tech company, so here’s a HuggingFace thread from this year demonstrating that it is still not a solved problem.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[LLM-generated skills work, if you generate them afterwards]]></title><link>https://seangoedecke.com/generate-skills-afterwards/</link><guid isPermaLink="false">https://seangoedecke.com/generate-skills-afterwards/</guid><pubDate>Tue, 17 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;LLM &lt;a href=&quot;https://github.com/anthropics/skills&quot;&gt;“skills”&lt;/a&gt; are a short explanatory prompt for a particular task, typically bundled with helper scripts. A recent &lt;a href=&quot;https://arxiv.org/abs/2602.12670&quot;&gt;paper&lt;/a&gt; showed that while skills are useful to LLMs, &lt;em&gt;LLM-authored&lt;/em&gt; skills are not. From the abstract:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Self-generated skills provide no benefit on average, showing that models cannot reliably author the procedural knowledge they benefit from consuming&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;For the moment, I don’t really want to dive into the paper. I just want to note that the way the paper uses LLMs to generate skills is bad, and you shouldn’t do this. Here’s how the paper prompts a LLM to produce skills:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Before attempting to solve this task, please follow these steps: 1. Analyze the task requirements and identify what domain knowledge, APIs, or techniques are needed. 2. Write 1–5 modular skill documents that would help solve this task. Each skill should: focus on a specific tool, library, API, or technique; include installation/setup instructions if applicable; provide code examples and usage patterns; be reusable for similar tasks. 3. Save each skill as a markdown file in the environment/skills/ directory with a descriptive name. 4. Then solve the task using the skills you created as reference&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The key idea here is that they’re asking the LLM to produce a skill &lt;em&gt;before&lt;/em&gt; it starts on the task. It’s essentially a strange version of the “make a plan first” or “think step by step” prompting strategy. I’m not at all surprised that this doesn’t help, because current reasoning models already think carefully about the task before they begin.&lt;/p&gt;
&lt;p&gt;What should you do instead? You should &lt;strong&gt;ask the LLM to write up a skill &lt;em&gt;after&lt;/em&gt; it’s completed the task&lt;/strong&gt;. Obviously this isn’t useful for truly one-off tasks. But few tasks are truly one-off. For instance, I’ve recently been playing around with &lt;a href=&quot;https://transformer-circuits.pub/2024/scaling-monosemanticity/&quot;&gt;SAEs&lt;/a&gt; and trying to clamp features in open-source models, a la &lt;a href=&quot;https://www.anthropic.com/news/golden-gate-claude&quot;&gt;Golden Gate Claude&lt;/a&gt;. It took a while for Codex to get this right. Here are some things it had to figure out (there’s a rough sketch of the final setup after this list):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Extracting features from the final layernorm is too late - you may as well just boost individual logits during sampling&lt;/li&gt;
&lt;li&gt;You have to extract from about halfway through the model layers to get features that can be usefully clamped&lt;/li&gt;
&lt;li&gt;Training a SAE on ~10k activations is two OOMs too few to get useful features. You need to train until features account for &gt;50% of variance&lt;/li&gt;
&lt;/ul&gt;
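&lt;p&gt;Here’s roughly what that final setup looks like, as a sketch. Module paths assume a Llama-style model, the SAE is reduced to a stub, and the layer index, feature id, and clamp strength are all made up:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Sketch: clamp one SAE feature in the mid-layer residual stream.
import torch
from transformers import AutoModelForCausalLM

class TinySAE(torch.nn.Module):
    # Stand-in for a *trained* sparse autoencoder over the residual stream.
    def __init__(self, d_model, d_feats):
        super().__init__()
        self.enc = torch.nn.Linear(d_model, d_feats)
        self.dec = torch.nn.Linear(d_feats, d_model)
    def encode(self, x):
        return torch.relu(self.enc(x))
    def decode(self, f):
        return self.dec(f)

model = AutoModelForCausalLM.from_pretrained(&apos;some-org/chat-8b&apos;)  # hypothetical
sae = TinySAE(model.config.hidden_size, 16384)  # really: load your trained SAE

layers = model.model.layers
layer = layers[len(layers) // 2]  # halfway through, not the final layernorm
FEATURE, STRENGTH = 1234, 8.0     # made-up feature id and clamp value

def clamp(module, args, output):
    hidden = output[0]              # residual stream after this layer
    feats = sae.encode(hidden)
    feats[..., FEATURE] = STRENGTH  # pin the feature on for every token
    return (sae.decode(feats),) + output[1:]

layer.register_forward_hook(clamp)
# ...then model.generate() as usual: outputs steer toward the feature.
&lt;/code&gt;&lt;/pre&gt;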
&lt;p&gt;Once I was able (with Codex’s help) to clamp an 8B model and force it to obsess about a subject&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, I &lt;em&gt;then&lt;/em&gt; asked Codex to summarize the process into an agent skill&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. That worked great! I was able to spin up a brand-new Codex instance with that skill and immediately get clamping working on a different 8B model. But if I’d asked Codex to write the skill at the start, it would have baked in all of its incorrect assumptions (like extracting from the final layernorm), and the skill wouldn’t have helped at all.&lt;/p&gt;
&lt;p&gt;In other words, the purpose of LLM-generated skills is to get the model to distil the knowledge it’s gained by iterating on the problem for millions of tokens, not to distil the knowledge it already has from its training data. You can get a LLM to generate skills for you, &lt;strong&gt;so long as you do it &lt;em&gt;after&lt;/em&gt; the LLM has already solved the problem the hard way&lt;/strong&gt;.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;If you’re interested, it was “going to the movies”.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;I’ve pushed it up &lt;a href=&quot;https://github.com/sgoedecke/skills/tree/main&quot;&gt;here&lt;/a&gt;. I’m sure you could do much better for a feature-extraction skill, this was just my zero-effort Codex-only attempt.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Two different tricks for fast LLM inference]]></title><link>https://seangoedecke.com/fast-llm-inference/</link><guid isPermaLink="false">https://seangoedecke.com/fast-llm-inference/</guid><pubDate>Sun, 15 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://platform.claude.com/docs/en/build-with-claude/fast-mode&quot;&gt;Anthropic&lt;/a&gt; and &lt;a href=&quot;https://openai.com/index/introducing-gpt-5-3-codex-spark/&quot;&gt;OpenAI&lt;/a&gt; both recently announced “fast mode”: a way to interact with their best coding model at significantly higher speeds.&lt;/p&gt;
&lt;p&gt;These two versions of fast mode are very different. Anthropic’s fast mode &lt;a href=&quot;https://platform.claude.com/docs/en/build-with-claude/fast-mode#how-fast-mode-works&quot;&gt;offers&lt;/a&gt; up to 2.5x the tokens per second (so around 170, up from Opus 4.6’s 65). OpenAI’s offers more than 1000 tokens per second (up from GPT-5.3-Codex’s 65, so 15x). So OpenAI’s fast mode is six times faster than Anthropic’s&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;However, Anthropic’s big advantage is that they’re serving their actual model. When you use their fast mode, you get real Opus 4.6, while when you use OpenAI’s fast mode you get GPT-5.3-Codex-Spark, not the real GPT-5.3-Codex. Spark is indeed much faster, but is a notably less capable model: good enough for many tasks, but it gets confused and messes up tool calls in ways that vanilla GPT-5.3-Codex would never do.&lt;/p&gt;
&lt;p&gt;Why the differences? The AI labs aren’t advertising the details of how their fast modes work, but I’m pretty confident it’s something like this: &lt;strong&gt;Anthropic’s fast mode is backed by &lt;em&gt;low-batch-size&lt;/em&gt; inference, while OpenAI’s fast mode is backed by special monster Cerebras chips&lt;/strong&gt;. Let me unpack that a bit.&lt;/p&gt;
&lt;h3&gt;How Anthropic’s fast mode works&lt;/h3&gt;
&lt;p&gt;The tradeoff at the heart of AI inference economics is &lt;em&gt;batching&lt;/em&gt;, because the main bottleneck is &lt;em&gt;memory&lt;/em&gt;. GPUs are very fast, but moving data onto a GPU is not. Every inference operation requires copying all the tokens of the user’s prompt&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; onto the GPU before inference can start. Batching multiple users up thus increases overall throughput at the cost of making users wait for the batch to be full.&lt;/p&gt;
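&lt;p&gt;Some toy arithmetic (with entirely made-up numbers) shows why providers love big batches: during decoding, one sweep of the weights through the GPU cores can produce a token for every user in the batch at once.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Toy decode arithmetic, made-up numbers: decoding is memory-bound, so
# one decode step costs roughly one sweep of the weights - no matter
# how many users share that step.
weights_gb = 140      # hypothetical bytes of active model weights
bandwidth_gbs = 3000  # hypothetical memory bandwidth, GB/s

step_seconds = weights_gb / bandwidth_gbs  # one decode step ~ one sweep
for batch_size in (1, 64):
    total = batch_size / step_seconds
    print(f&apos;batch {batch_size}: ~{total:.0f} tokens/sec across all users&apos;)
# batch 1: ~21 tokens/sec; batch 64: ~1371 tokens/sec on the same
# hardware. That gap is what an unbatched &quot;fast mode&quot; user pays for.
&lt;/code&gt;&lt;/pre&gt;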
&lt;p&gt;A good analogy is a bus system. If you had zero batching for passengers - if, whenever someone got on a bus, the bus departed immediately - commutes would be much faster &lt;em&gt;for the people who managed to get on a bus&lt;/em&gt;. But obviously overall throughput would be much lower, because people would be waiting at the bus stop for hours until they managed to actually get on one.&lt;/p&gt;
&lt;p&gt;Anthropic’s fast mode offering is basically a bus pass that guarantees that the bus immediately leaves as soon as you get on. It’s six times the cost, because you’re effectively paying for all the other people who could have got on the bus with you, but it’s way faster&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; because you spend &lt;em&gt;zero&lt;/em&gt; time waiting for the bus to leave.&lt;/p&gt;
&lt;p&gt;edit: I want to thank a reader for emailing me to point out that the “waiting for the bus” cost is really only paid for the first token, so that won’t affect &lt;em&gt;streaming&lt;/em&gt; latency (just latency per turn or tool call). It’s thus better to think of the performance impact of batch size being mainly that smaller batches require fewer flops and thus execute more quickly. In my analogy, maybe it’s “lighter buses drive faster”, or something.&lt;/p&gt;
&lt;p&gt;Obviously I can’t be fully certain this is right. Maybe they have access to some new ultra-fast compute that they’re running this on, or they’re doing some algorithmic trick nobody else has thought of. But I’m pretty sure this is it. Brand new compute or algorithmic tricks would likely require changes to the model (see below for OpenAI’s system), and “six times more expensive for 2.5x faster” is right in the ballpark for the kind of improvement you’d expect when switching to a low-batch-size regime.&lt;/p&gt;
&lt;h3&gt;How OpenAI’s fast mode works&lt;/h3&gt;
&lt;p&gt;OpenAI’s fast mode does not work anything like this. You can tell that simply because they’re introducing a new, worse model for it. There would be absolutely no reason to do that if they were simply tweaking batch sizes. Also, they told us in the announcement &lt;a href=&quot;https://openai.com/index/introducing-gpt-5-3-codex-spark/&quot;&gt;blog post&lt;/a&gt; exactly what’s backing their fast mode: Cerebras.&lt;/p&gt;
&lt;p&gt;OpenAI &lt;a href=&quot;https://openai.com/index/cerebras-partnership/&quot;&gt;announced&lt;/a&gt; their Cerebras partnership a month ago in January. What’s Cerebras? They build “ultra low-latency compute”. What this means in practice is that they build &lt;em&gt;giant chips&lt;/em&gt;. An H100 chip (fairly close to the frontier of inference chips) is just over a square inch in size. A Cerebras chip is &lt;em&gt;70&lt;/em&gt; square inches.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/a32e19a54795813e122dcbc1a5e013ef/d165a/cerebras.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 100%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAUABQDASIAAhEBAxEB/8QAGAABAQADAAAAAAAAAAAAAAAAAAMCBAX/xAAWAQEBAQAAAAAAAAAAAAAAAAABAgP/2gAMAwEAAhADEAAAAexKbG9tNtEgOYT/xAAcEAABAwUAAAAAAAAAAAAAAAACAAEREBIhMTL/2gAIAQEAAQUCcoQlmVyMptEI30//xAAWEQEBAQAAAAAAAAAAAAAAAAABICH/2gAIAQMBAT8BDI//xAAXEQEAAwAAAAAAAAAAAAAAAAACARIg/9oACAECAQE/AUptj//EABoQAAEFAQAAAAAAAAAAAAAAABAAAQIRMSH/2gAIAQEABj8CNasEXrp//8QAGxAAAgIDAQAAAAAAAAAAAAAAAAERIRAxQWH/2gAIAQEAAT8hSlWdNTgqk06RbER7iKSEjUn/2gAMAwEAAgADAAAAEK/Xgf/EABYRAQEBAAAAAAAAAAAAAAAAAAEQIf/aAAgBAwEBPxANQcn/xAAXEQEBAQEAAAAAAAAAAAAAAAABEQAQ/9oACAECAQE/EBwY0Lef/8QAHBABAQEAAwADAAAAAAAAAAAAAREAITFBUXHB/9oACAEBAAE/EHqqvWC3iV46Hd2mNsD0atxARPX4yQSz7wgEiL044xYBOLhUDofzf//Z&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;cerebras&quot;
        title=&quot;cerebras&quot;
        src=&quot;/static/a32e19a54795813e122dcbc1a5e013ef/1c72d/cerebras.jpg&quot;
        srcset=&quot;/static/a32e19a54795813e122dcbc1a5e013ef/a80bd/cerebras.jpg 148w,
/static/a32e19a54795813e122dcbc1a5e013ef/1c91a/cerebras.jpg 295w,
/static/a32e19a54795813e122dcbc1a5e013ef/1c72d/cerebras.jpg 590w,
/static/a32e19a54795813e122dcbc1a5e013ef/a8a14/cerebras.jpg 885w,
/static/a32e19a54795813e122dcbc1a5e013ef/fbd2c/cerebras.jpg 1180w,
/static/a32e19a54795813e122dcbc1a5e013ef/d165a/cerebras.jpg 1400w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;You can see from pictures that the Cerebras chip has a grid-and-holes pattern all over it. That’s because silicon wafers this big are supposed to be broken into dozens of chips. Instead, Cerebras etches a giant chip over the entire thing.&lt;/p&gt;
&lt;p&gt;The larger the chip, the more internal memory it can have. The idea is to have a chip with SRAM large enough &lt;em&gt;to fit the entire model&lt;/em&gt;, so inference can happen entirely in-memory. Typically GPU SRAM is measured in the tens of &lt;em&gt;megabytes&lt;/em&gt;. That means that a lot of inference time is spent streaming portions of the model weights from outside of SRAM into the GPU compute&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. If you could stream all of that from the (much faster) SRAM, inference would get a big speedup: fifteen times faster, as it turns out!&lt;/p&gt;
&lt;p&gt;So how much internal memory does the latest Cerebras chip have? &lt;a href=&quot;https://arxiv.org/html/2503.11698v1#:~:text=Most%20recently%2C%20the%20Wafer%20Scale,of%2021%20petabytes%20per%20second.&quot;&gt;44GB&lt;/a&gt;. This puts OpenAI in kind of an awkward position. 44GB is enough to fit a small model (~20B params at fp16, ~40B params at int8 quantization), but clearly not enough to fit GPT-5.3-Codex. That’s why they’re offering a brand new model, and why the Spark model has a bit of “small model smell” to it: it’s a smaller &lt;a href=&quot;https://en.wikipedia.org/wiki/Knowledge_distillation&quot;&gt;distil&lt;/a&gt; of the much larger GPT-5.3-Codex model&lt;sup id=&quot;fnref-5&quot;&gt;&lt;a href=&quot;#fn-5&quot; class=&quot;footnote-ref&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
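&lt;p&gt;Those parameter estimates are just the SRAM budget divided by bytes per weight, minus some working room:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Arithmetic behind &quot;~20B params at fp16, ~40B at int8&quot; for 44GB of SRAM:
sram_gb = 44
for precision, bytes_per_param in [(&apos;fp16&apos;, 2), (&apos;int8&apos;, 1)]:
    print(f&apos;{precision}: ~{sram_gb / bytes_per_param:.0f}B params max&apos;)
# fp16: ~22B, int8: ~44B - before leaving room for the KV cache and
# activations, which is why the in-practice numbers are a bit lower.
&lt;/code&gt;&lt;/pre&gt;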
&lt;p&gt;edit: I was wrong about this - the Codex model is almost certainly larger than this, and doesn’t need to fit entirely in one chip’s SRAM (if it did, we’d be seeing faster speeds). Thanks to the Hacker News commenters for correcting me. But I think there’s still a good chance that Spark is SRAM-resident (split across a few Cerebras chips) which is what’s driving the speedup.&lt;/p&gt;
&lt;h3&gt;OpenAI’s version is much more technically impressive&lt;/h3&gt;
&lt;p&gt;It’s interesting that the two major labs have two very different approaches to building fast AI inference. If I had to guess at a conspiracy theory, it would go something like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;OpenAI partner with Cerebras in mid-January, obviously to work on putting an OpenAI model on a fast Cerebras chip&lt;/li&gt;
&lt;li&gt;Anthropic have no similar play available, but they know OpenAI will announce some kind of blazing-fast inference in February, and they want to have something in the news cycle to compete with that&lt;/li&gt;
&lt;li&gt;Anthropic thus hustle to put together the kind of fast inference they &lt;em&gt;can&lt;/em&gt; provide: simply lowering the batch size on their existing inference stack&lt;/li&gt;
&lt;li&gt;Anthropic (probably) wait to announce it until a few days before OpenAI are done with their much more complex Cerebras implementation, so it looks like OpenAI copied them&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Obviously OpenAI’s achievement here is more technically impressive. Getting a model running on Cerebras chips is not trivial, because they’re so weird. Training a 20B or 40B param distil of GPT-5.3-Codex that is still kind-of-good-enough is not trivial. But I commend Anthropic for finding a sneaky way to get ahead of the announcement that will be largely opaque to non-technical people. It reminds me of OpenAI’s mid-2025 sneaky introduction of the Responses API to help them &lt;a href=&quot;/responses-api&quot;&gt;conceal their reasoning tokens&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Is fast AI inference the next big thing?&lt;/h3&gt;
&lt;p&gt;Seeing the two major labs put out this feature might make you think that fast AI inference is the new major goal they’re chasing. I don’t think it is. If my theory above is right, Anthropic don’t care &lt;em&gt;that&lt;/em&gt; much about fast inference, they just didn’t want to appear behind OpenAI. And OpenAI are mainly just exploring the capabilities of their new Cerebras partnership. It’s still largely an open question what kind of models can fit on these giant chips, how useful those models will be, and if the economics will make any sense.&lt;/p&gt;
&lt;p&gt;I personally don’t find “fast, less-capable inference” particularly useful. I’ve been playing around with it in Codex and I don’t like it. The usefulness of AI agents is dominated by &lt;em&gt;how few mistakes they make&lt;/em&gt;, not by their raw speed. Buying 6x the speed at the cost of 20% more mistakes is a bad bargain, because most of the user’s time is spent handling mistakes instead of waiting for the model&lt;sup id=&quot;fnref-6&quot;&gt;&lt;a href=&quot;#fn-6&quot; class=&quot;footnote-ref&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;However, it’s certainly possible that fast, less-capable inference becomes a core lower-level primitive in AI systems. Claude Code already uses &lt;a href=&quot;https://github.com/anthropics/claude-code/issues/1098#issuecomment-2884244872&quot;&gt;Haiku&lt;/a&gt; for some operations. Maybe OpenAI will end up using Spark in a similar way.&lt;/p&gt;
&lt;p&gt;edit: there are some good comments about this post on &lt;a href=&quot;https://news.ycombinator.com/item?id=47022329&quot;&gt;Hacker News&lt;/a&gt;. First, a good &lt;a href=&quot;https://news.ycombinator.com/item?id=47022810&quot;&gt;correction&lt;/a&gt;: Cerebras offers a ~355B model, GLM-4.7, at 1000 tokens per second already, so I’m wrong about Spark living in a single chip’s SRAM. Presumably they’re sharding Spark across multiple chips, like they’re doing with GLM-4.7.&lt;/p&gt;
&lt;p&gt;Many commenters disagreed with me (and each other) about the performance characteristics of batching. Some &lt;a href=&quot;https://news.ycombinator.com/item?id=47025656&quot;&gt;said&lt;/a&gt; that continuous batching means nobody ever waits for a bus, or that the &lt;a href=&quot;https://news.ycombinator.com/item?id=47025997&quot;&gt;volume&lt;/a&gt; of requests for Anthropic models means batch wait time is negligible. Other users &lt;a href=&quot;https://news.ycombinator.com/item?id=47023038&quot;&gt;disagreed&lt;/a&gt; about whether chip-to-chip communication is a bottleneck at inference time, or whether chaining chips together affects throughput.&lt;/p&gt;
&lt;p&gt;I only have a layman’s understanding of continuous batching, but it seems to me that you still have to wait for a slot to become available (even if you’re not waiting for the entire previous batch to finish), so the batch size throughput/latency tradeoff still applies.&lt;/p&gt;
&lt;p&gt;edit: A reader wrote in with a compelling alternate explanation for Anthropic’s fast AI mode - that they’re using more aggressive &lt;a href=&quot;https://arxiv.org/abs/2402.12374&quot;&gt;speculative decoding&lt;/a&gt;, which spends more tokens but could plausibly deliver a 2.5x speedup at significantly higher costs (because many big-model rollouts are done in parallel and thrown away). I don’t know if I’m 100% convinced - I’m confident big labs are already doing speculative decoding, and the longer sequences you try the less reliable it is - but I think it’s certainly possible.&lt;/p&gt;
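&lt;p&gt;In case the technique is unfamiliar: a small “draft” model proposes a handful of tokens, and the big model verifies them all in one forward pass. Here’s a toy, greedy rendition (real implementations accept draft tokens probabilistically, which preserves the big model’s exact output distribution):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Toy greedy speculative decoding. `big` and `draft` are
# HuggingFace-style causal LMs sharing one vocabulary.
import torch

@torch.no_grad()
def speculative_step(big, draft, ids, k=4):
    n = ids.shape[1]
    guess = ids
    for _ in range(k):  # draft k tokens, cheaply, one at a time
        nxt = draft(guess).logits[:, -1].argmax(-1, keepdim=True)
        guess = torch.cat([guess, nxt], dim=-1)
    # one big-model pass scores all k drafted positions at once
    logits = big(guess).logits
    preds = logits[:, n - 1:-1].argmax(-1)  # the big model&apos;s own choices
    drafted = guess[:, n:]
    agree = int((preds == drafted).int().cumprod(-1).sum())  # matching prefix
    kept = torch.cat([ids, drafted[:, :agree]], dim=-1)
    # on a mismatch, take the big model&apos;s token, so we always progress
    fix = preds[:, agree:agree + 1] if agree &lt; k else logits[:, -1:].argmax(-1)
    return torch.cat([kept, fix], dim=-1)
&lt;/code&gt;&lt;/pre&gt;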
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;This isn’t even factoring in latency. Anthropic explicitly warns that time to first token might still be slow (or even slower), while OpenAI thinks the Spark latency is fast enough to warrant switching to a persistent websocket (i.e. they think the 50-200ms round trip time for the handshake is a significant chunk of time to first token).&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Either in the form of the KV-cache for previous tokens, or as some big tensor of intermediate activations if inference is being pipelined through multiple GPUs. I write a lot more about this in &lt;a href=&quot;/inference-batching-and-deepseek&quot;&gt;&lt;em&gt;Why DeepSeek is cheap at scale but expensive to run locally&lt;/em&gt;&lt;/a&gt;, since it explains why DeepSeek can be offered at such cheap prices (massive batches allow an economy of scale on giant expensive GPUs, but individual consumers can’t access that at all).&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;Is it a contradiction that low-batch-size means low throughput, but this fast pass system gives users much greater throughput? No. The overall throughput of the &lt;em&gt;GPU&lt;/em&gt; is much lower when some users are using “fast mode”, but those users’ throughput is much higher.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;Remember, GPUs are fast, but copying data onto them is not. Each “copy these weights to GPU” step is a meaningful part of the overall inference time.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;Or a smaller distil of whatever more powerful base model GPT-5.3-Codex was itself distilled from. I don’t know how AI labs do it exactly, and they keep it very secret. More on that &lt;a href=&quot;/ai-lab-structure&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-5&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-6&quot;&gt;
&lt;p&gt;On this note, it’s interesting to point out that Cursor’s hype dropped away basically at the same time they &lt;a href=&quot;https://cursor.com/blog/composer&quot;&gt;released&lt;/a&gt; their own “much faster, a little less-capable” agent model. Of course, much of this is due to Claude Code sucking up all the oxygen in the room, but having a very fast model certainly didn’t &lt;em&gt;help&lt;/em&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-6&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[On screwing up]]></title><link>https://seangoedecke.com/screwing-up/</link><guid isPermaLink="false">https://seangoedecke.com/screwing-up/</guid><pubDate>Wed, 11 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The most shameful thing I did in the workplace was lie to a colleague. It was about ten years ago, I was a fresh-faced intern, and in the rush to deliver something I’d skipped the step of testing my work in staging&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. It did not work. When deployed to production, it didn’t work there either. No big deal, in general terms: the page we were working on wasn’t yet customer-facing. But my colleague asked me over his desk whether this worked when I’d tested it, and I said something like “it sure did, no idea what happened”.&lt;/p&gt;
&lt;p&gt;I bet he forgot about it immediately. Either he assumed I’d just messed up the testing (for instance, by accidentally running different code than the code I pushed), or he knew I’d probably lied and didn’t really care. I haven’t forgotten about it. Even a decade later, I’m still ashamed to write it down.&lt;/p&gt;
&lt;p&gt;Of course I’m not ashamed about the &lt;em&gt;mistake&lt;/em&gt;. I was sloppy to not test my work, but I’ve cut corners since then when I felt it was necessary, and I stand by that decision. I’m ashamed about how I handled it. But even that I understand. I was a kid, trying to learn quickly and prove I belonged in tech. The last thing I wanted to do was to dwell on the way I screwed up. If I were in my colleague’s shoes now, I’d have brushed it off too&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. How do I try to handle mistakes now?&lt;/p&gt;
&lt;h3&gt;Handling the emotional reaction&lt;/h3&gt;
&lt;p&gt;The most important thing is to &lt;strong&gt;control your emotions&lt;/strong&gt;. If you’re anything like me, your strongest emotional reactions at work will be reserved for the times you’ve screwed up. There are usually two countervailing emotions at play here: the desire to defend yourself, find excuses, and minimize the consequences; and the desire to confess your guilt, abase yourself, and beg for forgiveness. Both of these are traps. &lt;/p&gt;
&lt;p&gt;Obviously making excuses for yourself (or flat-out denying the mistake, like I did) is bad. But going in the other direction and publicly beating yourself up about it is &lt;em&gt;just as bad&lt;/em&gt;. It’s bad for a few reasons.&lt;/p&gt;
&lt;p&gt;First, you’re effectively asking the people around you to take the time and effort to reassure you, when they should be focused on the problem. Second, you’re taking yourself out of the group of people who are focused on the problem, when often you’re the best situated to figure out what to do: since it’s your mistake, you have the most context. Third, it’s just not professional. &lt;/p&gt;
&lt;p&gt;So what should you do? &lt;strong&gt;For the first little while, &lt;em&gt;do nothing&lt;/em&gt;.&lt;/strong&gt; Emotional reactions fade over time. Try and just ride out the initial jolt of realizing you screwed up, and the impulse to leap into action to fix it. Most of the worst reactions to screwing up happen in the immediate aftermath, so if you can simply do nothing during that period you’re already off to a good start. For me, this takes about thirty seconds. How much time you’ll need depends on you, but hopefully it’s under ten minutes. More than that and you might need to grit your teeth and work through it.&lt;/p&gt;
&lt;h3&gt;Communicate&lt;/h3&gt;
&lt;p&gt;Once you’re confident you’re under control, the next step is to &lt;strong&gt;tell people what happened&lt;/strong&gt;. Typically you want to tell your manager, but depending on the problem it could also be a colleague or someone else. It’s really important here to be matter-of-fact about it, or you risk falling into the “I’m so terrible, please reassure me” trap I discussed above. You often don’t even need to explicitly say “I made a mistake”, if it’s obvious from context. Just say “I deployed a change and it’s broken X feature” (or whatever the problem is).&lt;/p&gt;
&lt;p&gt;You should do this &lt;em&gt;before&lt;/em&gt; you’ve come up with a solution. It’s tempting to conceal your mistake and just quietly solve it. But for user-facing mistakes, concealment is impossible: somebody will notice eventually, and if you haven’t communicated the issue first, you risk them discovering it and raising it independently.&lt;/p&gt;
&lt;p&gt;In the worst case, while you’re quietly working on a fix, you’ll discover that somebody else has declared an incident. Of course, you understand the problem perfectly (since you caused it), and you know that it was caused by a bad deploy and is easily fixable. But the other people on the incident call don’t know all that. They’re thinking about the worst-case scenarios, wondering if it’s database- or network-related, paging in all kinds of teams, causing all kinds of hassle. All of that could have been avoided if you had reported the issue immediately.&lt;/p&gt;
&lt;p&gt;In my experience, tech company managers will forgive mistakes&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, but &lt;strong&gt;they won’t forgive being made to look like a fool&lt;/strong&gt;. In particular, they won’t forgive being deprived of critical information. If they’re asked to explain the incident by their boss, and they have to flounder around because they lack the context &lt;em&gt;that you had all along&lt;/em&gt;, that may harm your relationship with them for good. On the other hand, if you give them a clear summary of the problem right away, and they’re able to seem like they’re on top of things to their manager, you &lt;em&gt;might&lt;/em&gt; even earn credit for the situation (despite having caused it with your initial mistake).&lt;/p&gt;
&lt;h3&gt;Accept that it’s going to hurt&lt;/h3&gt;
&lt;p&gt;However, you probably won’t earn credit. This is where I diverge from the popular software engineering wisdom that incidents are always the fault of systems, never of individuals. Of course incidents are caused by the interactions of complex systems. Everything in the universe is caused by the interactions of complex systems! But one cause in that chain is often &lt;em&gt;somebody screwing up&lt;/em&gt;&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;If you’re a manager of an engineering organization, and you want a project to succeed, you probably have a mental shortlist of the engineers in your org who can reliably lead projects&lt;sup id=&quot;fnref-5&quot;&gt;&lt;a href=&quot;#fn-5&quot; class=&quot;footnote-ref&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;. If an engineer screws up repeatedly, they’re likely to drop off that list (or at least get an asterisk next to their name).&lt;/p&gt;
&lt;p&gt;It doesn’t really matter if you had a good technical reason to make the mistake, or if it’s excusable. Managers don’t care about that stuff, because they simply don’t have the technical context to know if it’s true or if you’re just trying to talk your way out of it. What managers do have the context to evaluate is &lt;em&gt;results&lt;/em&gt;, so that’s what they judge you on. That means some failures are acceptable, so long as you’ve got enough successes to balance them out.&lt;/p&gt;
&lt;p&gt;Being a strong engineer is about finding a balance between &lt;a href=&quot;/being-right-a-lot&quot;&gt;always being right&lt;/a&gt; and &lt;a href=&quot;/taking-a-position&quot;&gt;taking risks&lt;/a&gt;. If you prioritize always being right, you can probably avoid making mistakes, but you won’t be able to lead projects (since that always requires taking risks). Therefore, &lt;strong&gt;the optimal amount of mistakes at work is not zero.&lt;/strong&gt; Unless you’re working in a few select industries&lt;sup id=&quot;fnref-6&quot;&gt;&lt;a href=&quot;#fn-6&quot; class=&quot;footnote-ref&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;, you should &lt;em&gt;expect&lt;/em&gt; to make mistakes now and then; otherwise you’re likely working far too slowly. &lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;From memory, I think I &lt;em&gt;had&lt;/em&gt; tested an earlier version of the code, but then I made some tweaks and skipped the step where I tested that it worked even with those tweaks.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Though I would have made a mental note (and if someone more senior had done this, I would have been a bit less forgiving).&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;Though they may not forget them. More on that later.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;It’s probably not that comforting to replace “you screwed up by being incompetent” with “it’s not your fault, it’s the system’s fault for hiring an engineer as incompetent as you”.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;For more on that, see &lt;a href=&quot;/how-to-ship&quot;&gt;&lt;em&gt;How I ship projects at large tech companies&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-5&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-6&quot;&gt;
&lt;p&gt;The classic examples are pacemakers and the Space Shuttle (should that now be Starship or New Glenn?).&lt;/p&gt;
&lt;a href=&quot;#fnref-6&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Large tech companies don't need heroes]]></title><link>https://seangoedecke.com/heroism/</link><guid isPermaLink="false">https://seangoedecke.com/heroism/</guid><pubDate>Sun, 08 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Large tech companies operate via &lt;em&gt;systems&lt;/em&gt;. What that means is that the main outcomes - up to and including the overall success or failure of the company - are driven by a complex network of processes and incentives. These systems are outside the control of any particular person. Like the parts of a large codebase, they have accumulated and co-evolved over time, instead of being designed from scratch.&lt;/p&gt;
&lt;p&gt;Some of these processes and incentives are “legible”, like OKRs or promotion criteria. Others are “illegible”, like the backchannel conversations that usually precede a formal consensus on decisions&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. But either way, &lt;strong&gt;it is these processes and incentives that determine what happens, not any individual heroics&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;How heroes are forged in large tech companies&lt;/h3&gt;
&lt;p&gt;This state of affairs is not efficient at producing good software. In large tech companies, good software often seems like it is produced &lt;em&gt;by accident&lt;/em&gt;, as a by-product of individual people responding to their incentives. However, that’s just the way it has to be. A shared belief in the mission can cause a small group of people to prioritize good software over their individual benefit, for a little while. But thousands of engineers can’t do that for decades. Past a certain point of scale&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, companies must depend on the strength of their systems.&lt;/p&gt;
&lt;p&gt;Individual engineers often react to this fact with horror. After all, &lt;em&gt;they&lt;/em&gt; want to produce high-quality software. Why is everyone around them just cynically&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; focused on their own careers? On top of that, many software engineers got into the industry because they are internally compelled&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; to make systems more efficient. For these people, it is viscerally uncomfortable being employed in an inefficient company. They are thus prepared to do &lt;em&gt;whatever it takes&lt;/em&gt; to patch up their system’s local inefficiencies.&lt;/p&gt;
&lt;p&gt;Of course, making your team more effective does not always require heroics. Some amount of fixing inefficiencies - improving process, writing tests, cleaning up old code - is just part of the job, and will get engineers rewarded and promoted just like any other kind of engineering work. But there’s a line. Past a certain point, working on efficiency-related stuff instead of your actual projects will get you punished, not rewarded. To go over that line requires someone willing to sacrifice their own career progression in the name of good engineering. In other words, it requires a &lt;em&gt;hero&lt;/em&gt;.&lt;/p&gt;
&lt;h3&gt;Large tech companies do not benefit from heroes&lt;/h3&gt;
&lt;p&gt;You can sacrifice your promotions and bonuses to make one tiny corner of the company hum along nicely for a while. However, like I said above, the overall trajectory of the company is almost never determined by one person. It doesn’t really matter how efficient you made some corner of the &lt;a href=&quot;https://en.wikipedia.org/wiki/Google_Wave&quot;&gt;Google Wave&lt;/a&gt; team if the whole product was doomed. And even poorly-run software teams can often win, so long as they’re targeting some niche that the company is set up to support (think about the quality of most profitable enterprise software).&lt;/p&gt;
&lt;p&gt;On top of that, &lt;strong&gt;heroism makes it difficult for real change to happen&lt;/strong&gt;. If a company is set up to reward bad work and punish good work, having some hero step up to do good work anyway and be punished &lt;em&gt;will only insulate the company from the consequences of its own systems&lt;/em&gt;. Far better to let the company be punished for its failings, so it can (slowly, slowly) adjust, or be replaced by companies that operate better.&lt;/p&gt;
&lt;h3&gt;…but will exploit them&lt;/h3&gt;
&lt;p&gt;Large tech companies don’t benefit long-term from heroes, but there’s still a role for heroes. That role is &lt;em&gt;to be exploited&lt;/em&gt;. There is no shortage of &lt;a href=&quot;/predators&quot;&gt;predators&lt;/a&gt; who will happily recruit a hero for some short-term advantage.&lt;/p&gt;
&lt;p&gt;Some product managers keep a mental list of engineers in other teams who are “easy targets”: who can be convinced to do extra work on projects that benefit the product manager (but not that engineer). During high-intensity periods, such as the lead-up to a major launch, there is sometimes a kind of cold war between different product organizations, as they try to extract behind-the-scenes help from the engineers in each other’s camps while jealously guarding their own engineering resources.&lt;/p&gt;
&lt;p&gt;Likewise, some managers have no problem letting one of their engineers spend all their time on &lt;a href=&quot;/glue-work-considered-harmful&quot;&gt;glue work&lt;/a&gt;. Much of that work would otherwise be the manager’s responsibility, so it makes the manager’s job easier. Of course, when it comes time for promotions, the engineer will be punished for not doing their real work.&lt;/p&gt;
&lt;p&gt;This is why it’s important for engineers to pay attention to their &lt;em&gt;actual&lt;/em&gt; rewards. Promotions, bonuses and raises are the hard currency of software companies. Giving those out shows what the company really values. Predators don’t control those things (if they did, they wouldn’t be predators). As a substitute, they attempt to appeal to a hero’s internal compulsion to be useful or to clean up inefficiencies.&lt;/p&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Large tech companies are structurally set up to encourage software engineers to engage in heroics&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This is largely accidental, and doesn’t really benefit those tech companies in the long term, since large tech companies are just too large to be meaningfully moved by individual heroics&lt;/li&gt;
&lt;li&gt;However, individual managers and product managers inside these tech companies have learned to exploit this surplus heroism for their individual ends&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;As a software engineer, you should resist the urge to heroically patch some obvious inefficiency you see in the organization&lt;/li&gt;
&lt;li&gt;Unless that work is explicitly rewarded by the company, all your efforts will do is delay the point at which the company has to change its processes&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A background level of inefficiency is just part of the landscape of large tech companies&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It’s the price they pay to be so large (and in return reap the benefits of scale and &lt;a href=&quot;/seeing-like-a-software-company&quot;&gt;legibility&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;The more you can learn to live with it, the more you’ll be able to use your energy tactically for your own benefit&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;edit: this post got a few good comments on &lt;a href=&quot;https://lobste.rs/s/cqg4os/large_tech_companies_don_t_need_heroes&quot;&gt;lobste.rs&lt;/a&gt;. The top commenter sensibly points out that a bit of a hero complex can prompt engineers to take on ambitious projects with big career rewards. True! But this isn’t quite the kind of heroics I’m writing about here, since it doesn’t require sacrifice (just risk). Another commenter points out that heroes tend to never tell people about the work they do, which matches my experience.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;I write about this point at length in &lt;a href=&quot;/seeing-like-a-software-company&quot;&gt;&lt;em&gt;Seeing like a software company&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Why do companies need to scale, if it means they become less efficient? The best piece on this is Dan Luu’s &lt;a href=&quot;https://danluu.com/sounds-easy/&quot;&gt;&lt;em&gt;I could build that in a weekend!&lt;/em&gt;&lt;/a&gt;: in short, because the value of marginal features in a successful software product is surprisingly high, and you need a lot of developers to capture all the marginal features.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;For a post on why this is not actually that cynical, see my &lt;a href=&quot;/a-little-bit-cynical/&quot;&gt;&lt;em&gt;Software engineers should be a little bit cynical&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;I write about these internal compulsions in &lt;a href=&quot;/addicted-to-being-useful&quot;&gt;&lt;em&gt;I’m addicted to being useful&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Getting the main thing right]]></title><link>https://seangoedecke.com/getting-the-main-thing-right/</link><guid isPermaLink="false">https://seangoedecke.com/getting-the-main-thing-right/</guid><pubDate>Thu, 05 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;When you’re running a project in a tech company, understanding that your main job is to &lt;strong&gt;ship the project&lt;/strong&gt; goes a surprisingly long way. So many engineers spend their time on peripheral questions (like the choice of technology X or Y) when core questions about shipping the product (for instance, how all the critical paths will actually work) are still unanswered&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;If you’re able to reliably ship projects, you can get away with being slightly abrasive, or not filling out your Jira tickets correctly, or any number of other small faults that would cause other engineers to be punished.&lt;/p&gt;
&lt;p&gt;You could see this as a special case of the &lt;a href=&quot;https://en.wikipedia.org/wiki/Pareto_principle&quot;&gt;Pareto principle&lt;/a&gt;: the idea that 80% of consequences often come from 20% of causes. But I think in many contexts it’s even more extreme, closer to 90/10 or even 99/1. &lt;strong&gt;If you get the “main thing” right, you can get away with a lot of mistakes.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This principle holds in many other areas. When saving money, it doesn’t matter if you save a few dollars by hunting for deals if you then buy a car or house that’s on the edge of your budget. If you’re writing, clearly expressing your point will make up for awkward grammar or other mistakes, but even beautiful prose is bad writing if it doesn’t say what you mean. If you’re trying to get fit, consistency and avoiding injury are far more important than finding the most efficient program or the best gear. And so on.&lt;/p&gt;
&lt;h3&gt;Identifying the “main thing”&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;How do you identify the main thing?&lt;/strong&gt; This is a pretty deep question. I have written &lt;em&gt;extensively&lt;/em&gt; about this when it comes to working in large tech companies: you can read &lt;a href=&quot;/where-the-money-comes-from&quot;&gt;&lt;em&gt;Knowing where your engineer salary comes from&lt;/em&gt;&lt;/a&gt;, or browse my posts tagged &lt;a href=&quot;/tags/tech%20companies&quot;&gt;“tech companies”&lt;/a&gt;. In under twenty words, I think it’s “delivering projects in order to increase shareholder value and make the ~2 layers of management above you happy”.&lt;/p&gt;
&lt;p&gt;From the way I’ve phrased it, it should be clear that I think this is the “main thing” &lt;em&gt;for working in tech companies&lt;/em&gt;. It’s not the main thing for life in general, or for being a fulfilled software craftsperson, and so on. Those two domains have completely different main things&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Sometimes the main thing seems too simple to be important. Plenty of software engineers think something like “of course it’s important to ship the project, but that only happens as a result of writing all the code”, underrating the set of complex factors (both in code and elsewhere) that have to come together for a successful ship.&lt;/p&gt;
&lt;p&gt;The only general reliable method I know is to carefully look at cases of success and failure, and to identify what the successes had in common. &lt;strong&gt;Pay particular attention to successes or failures that surprise you.&lt;/strong&gt; If you thought a project was going really well but the people who ran it weren’t rewarded, or you thought a project was a complete disaster but it ended up being celebrated, that probably indicates that you’re mistaken about what the “main thing” is. Did someone get a staff promotion but you think they’re terrible? Is someone beloved by senior leadership, but you can’t see them doing anything that useful? Those people are probably getting the main thing right&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;It’s hard to even try&lt;/h3&gt;
&lt;p&gt;The first step in correctly identifying the main thing is to &lt;em&gt;try&lt;/em&gt;. In my experience, &lt;strong&gt;it is surprisingly hard to motivate yourself to focus on the main thing&lt;/strong&gt;. It’s much more natural to just jump into something that looks probably useful and start working immediately. Why is this?&lt;/p&gt;
&lt;p&gt;One obvious reason is that it just feels bad to sit around contemplating all the things you could focus on. It’s much easier to account for your time - both to others and to yourself - if you look busy. What if you can’t come up with anything, and you’ve just wasted all the time you spent reflecting?&lt;/p&gt;
&lt;p&gt;Another, less obvious reason is that &lt;strong&gt;many people are afraid that they might not like the main thing&lt;/strong&gt;. Recall my description of the main thing at tech companies:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“delivering projects in order to increase shareholder value and make the ~2 layers of management above you happy”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Lots of software engineers really hate that this is the most important thing. I wrote about this at length in &lt;a href=&quot;/a-little-bit-cynical&quot;&gt;&lt;em&gt;Software engineers should be a little bit cynical&lt;/em&gt;&lt;/a&gt; and &lt;a href=&quot;/knowing-how-to-drive-the-car&quot;&gt;&lt;em&gt;You have to know how to drive the car&lt;/em&gt;&lt;/a&gt;. If you don’t like this goal at all, it’s going to be tough to spend time thinking about how you can achieve it.&lt;/p&gt;
&lt;p&gt;In fact, I think &lt;strong&gt;it’s actually more important to think about the “main thing” if you hate it&lt;/strong&gt;. This is why I’m suspicious of “do what you love” advice. If you love performance engineering but your company doesn’t, I think you’re better off doing it in your spare time and creating shareholder value at work, instead of trying to do as much performance engineering at work as you can.&lt;/p&gt;
&lt;p&gt;Half-assing shareholder-value creation for a few hours a day (and doing performance engineering the rest of the time) is more valuable than locking in to the wrong “main thing” for ten hours a day. In my experience, it’s also likely more burnout-resistant, since there’s no faster path to burnout than working really hard on something that isn’t valued.&lt;/p&gt;
&lt;h3&gt;Caution: the “main thing” can rapidly change&lt;/h3&gt;
&lt;p&gt;In 2015, being easy to work with was the most important thing in many tech companies. If you were a pleasant colleague, you had to be &lt;em&gt;really&lt;/em&gt; bad at other aspects of the job to face serious professional consequences. On the other hand, if you were abrasive and hard to work with, it didn’t really matter how technically competent you were. Many engineers made successful careers by maximizing pleasantness: attending and hosting work social events, making friendly connections in different teams, and in general becoming a known engineer in the company.&lt;/p&gt;
&lt;p&gt;In 2026, it’s still important to be pleasant. But now that tech companies are &lt;a href=&quot;/good-times-are-over&quot;&gt;tightening their belts&lt;/a&gt; and feeling more pressure to ship, the &lt;em&gt;most&lt;/em&gt; important thing has shifted to being capable of &lt;a href=&quot;/how-to-ship&quot;&gt;delivering projects&lt;/a&gt;. If you’re able to do that, it can go a long way towards redeeming a difficult personality. Like love, shipping &lt;a href=&quot;https://www.biblegateway.com/passage/?search=Proverbs%2010%3A11-13&amp;#x26;version=NKJV&quot;&gt;covers all sins&lt;/a&gt;. This transition has been a bumpy ride for many software engineers.&lt;/p&gt;
&lt;p&gt;A lot of very pleasant “known engineers” have been laid off in the last three years. I suppose the lesson here is something like this: &lt;strong&gt;even if you’re doing great and are well-adapted to your niche, the environment can change and screw you over anyway&lt;/strong&gt;. What can you do about it? If you’ve spent a good chunk of your career developing one set of skills, you can’t instantly transfer all that experience to a different set of skills when the environment changes. Maybe the underlying lesson is more like this: &lt;strong&gt;instead of over-specializing to a single niche, hedge your bets by being pretty good at multiple things&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;Final thoughts&lt;/h3&gt;
&lt;p&gt;The lesson here is that &lt;strong&gt;you should spend a lot of time and effort trying to figure out what to focus on&lt;/strong&gt;. In the extreme case, even spending half of your time doing this is worthwhile, if it puts you on the right track and you’d otherwise be neglecting the main thing.&lt;/p&gt;
&lt;p&gt;This can seem pretty unintuitive. It feels safer and more productive to be doing &lt;em&gt;something&lt;/em&gt;. But if you can force yourself to focus on the meta-question of what you ought to be doing - even if you don’t like the answer - you’ll be in a better position to achieve your goals.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;I write about this at length in &lt;a href=&quot;/how-to-ship&quot;&gt;&lt;em&gt;How I ship projects at large tech companies&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;I leave filling out what those are as an exercise to the reader.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;Or some people just get lucky! But that’s rarer than you might think. Getting the main thing right often looks like “constantly getting lucky” from the outside.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[How does AI impact skill formation?]]></title><link>https://seangoedecke.com/how-does-ai-impact-skill-formation/</link><guid isPermaLink="false">https://seangoedecke.com/how-does-ai-impact-skill-formation/</guid><pubDate>Sat, 31 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Two days ago, the Anthropic Fellows program released a paper called &lt;a href=&quot;https://arxiv.org/pdf/2601.20245&quot;&gt;&lt;em&gt;How AI Impacts Skill Formation&lt;/em&gt;&lt;/a&gt;. Like &lt;a href=&quot;/your-brain-on-chatgpt&quot;&gt;other&lt;/a&gt; &lt;a href=&quot;/real-reasoning&quot;&gt;papers&lt;/a&gt; on AI before it, this one is being &lt;a href=&quot;https://www.reddit.com/r/ExperiencedDevs/comments/1qqy2ro/anthropic_ai_assisted_coding_doesnt_show/&quot;&gt;treated&lt;/a&gt; as proof that AI makes you slower and dumber. Does it prove that?&lt;/p&gt;
&lt;p&gt;The structure of the paper is sort of similar to the 2025 MIT study &lt;a href=&quot;https://arxiv.org/pdf/2506.08872&quot;&gt;&lt;em&gt;Your Brain on ChatGPT&lt;/em&gt;&lt;/a&gt;. They got a group of people to perform a cognitive task that required learning a new skill: in this case, using the Python Trio library. Half of those people were required to use AI and half were forbidden from using it. The researchers then quizzed those people to see how much information they retained about Trio.&lt;/p&gt;
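&lt;p&gt;(If you haven’t used it: Trio is a Python library for async structured concurrency. The sketch below is purely illustrative - it shows the flavour of code involved, not the study’s actual task.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import trio

async def worker(name):
    # Stand-in for some I/O-bound work
    await trio.sleep(1)
    print(name, "done")

async def main():
    # A nursery scopes child tasks: the block only exits
    # once every task spawned inside it has finished
    async with trio.open_nursery() as nursery:
        nursery.start_soon(worker, "task-a")
        nursery.start_soon(worker, "task-b")

trio.run(main)
&lt;/code&gt;&lt;/pre&gt;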
&lt;p&gt;The banner result was that &lt;strong&gt;AI users did not complete the task faster, but performed much worse on the quiz&lt;/strong&gt;. If you were so inclined, you could naturally conclude that any perceived AI speedup is illusory, and the people who are using AI tooling are cooking their brains. But I don’t think that conclusion is reasonable.&lt;/p&gt;
&lt;h3&gt;Retyping AI-generated code&lt;/h3&gt;
&lt;p&gt;To see why, let’s look at Figure 13 from the paper:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/static/1deb94af67210428f7358afe10795555/fcda8/fig13.png&quot; alt=&quot;figure 13&quot; title=&quot;figure 13&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The researchers noticed that half of the AI-using cohort spent most of their time &lt;em&gt;literally retyping the AI-generated code&lt;/em&gt; into their solution, instead of copy-pasting or “manual coding”: writing their code from scratch with light AI guidance. &lt;strong&gt;If you ignore the people who spent most of their time retyping, the AI users were 25% faster.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I confess that this kind of baffles me. What kind of person manually retypes AI-generated code? Did they not know how to copy and paste (unlikely, since the study was mostly composed of professional or hobby developers&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;)? It certainly didn’t help their quiz scores. The retypers got the same (low) scores as the pure copy-pasters.&lt;/p&gt;
&lt;p&gt;In any case, if you know how to copy-paste or use an AI agent, I wouldn’t use this paper as evidence that AI will not be able to speed you up. &lt;/p&gt;
&lt;h3&gt;What about the quiz scores?&lt;/h3&gt;
&lt;p&gt;Even if AI use offers a 25% speedup, is that worth sacrificing the opportunity to learn new skills? What about the quiz scores?&lt;/p&gt;
&lt;p&gt;Well, first we should note that &lt;strong&gt;the AI users who used the AI for general questions but wrote all their own code did fine on the quiz&lt;/strong&gt;. If you look at Figure 13 above, you can see that those AI users averaged maybe a point lower on the quiz - not bad, for people working 25% faster. So at least some kinds of AI use seem fine.&lt;/p&gt;
&lt;p&gt;But of course much current AI use is not like this: if you’re using Claude Code or Copilot agent mode, you’re getting the AI to do the code writing for you. Are you losing key skills by doing that?&lt;/p&gt;
&lt;p&gt;Well yes, of course you are. If you complete a task in ten minutes by throwing it at a LLM, you will learn much less about the codebase than if you’d spent an hour doing it by hand. I think it’s pretty silly to deny this: it’s intuitively right, and anybody who has used AI agents extensively at work can attest to it from their own experience.&lt;/p&gt;
&lt;p&gt;Still, I have two points to make about this.&lt;/p&gt;
&lt;h4&gt;Software engineers are paid to ship, not to learn&lt;/h4&gt;
&lt;p&gt;First, &lt;strong&gt;software engineers are not paid to learn about the codebase&lt;/strong&gt;. We are paid to deliver business value (typically by delivering working code). If AI can speed that up dramatically, avoiding it makes you worse at your job, even if you’re learning more efficiently. That’s a bit unfortunate for us - it was very nice when we could get much better at the job simply by doing it more - but that doesn’t make it false.&lt;/p&gt;
&lt;p&gt;Other professions have been dealing with this forever. Doctors are expected to spend a lot of time in classes and professional development courses, learning how to do their job in other ways than just doing it. It may be that future software engineers will need to spend 20% of their time manually studying their codebases: not just in the course of doing some task (which could be far more quickly done by AI agents) but just to stay up-to-date enough that their skills don’t atrophy.&lt;/p&gt;
&lt;h4&gt;Moving faster gives you more opportunities to learn&lt;/h4&gt;
&lt;p&gt;The other point I wanted to make is that &lt;strong&gt;even if your learning rate is slower, moving faster means you may learn more overall&lt;/strong&gt;. Suppose using AI meant that you learned only 75% as much as non-AI programmers from any given task. Whether you’re learning less overall depends on &lt;em&gt;how many more tasks you’re doing&lt;/em&gt;. If you’re working faster, the loss of learning efficiency may be balanced out by volume.&lt;/p&gt;
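&lt;p&gt;To make that concrete (with illustrative numbers, not figures from the paper): at 75% of the learning per task but double the task throughput, you end up with 0.75 × 2 = 1.5 times as much total learning.&lt;/p&gt;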
&lt;p&gt;I don’t know if this is true. I suspect there really is no substitute for painstakingly working through a codebase by hand. But the engineer who is shipping 2x as many changes is probably also learning things that the slower, manual engineer does not know. At minimum, they’ll be acquiring a greater breadth of knowledge of different subsystems, even if their depth suffers.&lt;/p&gt;
&lt;p&gt;Anyway, the point is simply that a lower learning rate does not by itself prove that less learning is happening overall.&lt;/p&gt;
&lt;h3&gt;We need to talk about GPT-4o&lt;/h3&gt;
&lt;p&gt;Finally, I will reluctantly point out that the model used for this task was GPT-4o (see section 4.1). I’m reluctant here because I sympathize with the AI skeptics, who are perpetually frustrated by the pro-AI response of “well, you just haven’t tried the &lt;em&gt;right&lt;/em&gt; model”. In a world where new AI models are released every month or two, demanding that people always study the best model makes it functionally impossible to study AI use at all.&lt;/p&gt;
&lt;p&gt;Still, I’m just kind of confused about why GPT-4o was chosen. This study was funded by Anthropic, who have much better models. This study was conducted &lt;em&gt;in 2025&lt;/em&gt;&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, at least six months after the release of GPT-4o (that’s like five years in AI time). I can’t help but wonder if the AI-users cohort would have run into fewer problems with a more powerful model.&lt;/p&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;I don’t have any real problem with this paper. They set out to study how different patterns of AI use affect learning, and their main conclusion - that pure “just give the problem to the model” AI use means you learn a lot less - seems correct to me.&lt;/p&gt;
&lt;p&gt;I don’t like their conclusion that AI use doesn’t speed you up, since it relies on the fact that half of their AI-using participants spent their time &lt;em&gt;literally retyping AI code&lt;/em&gt;. I wish they’d been more explicit in the introduction that this was the case, but I don’t really blame them for the result - I’m more inclined to blame the study participants themselves, who should have known better.&lt;/p&gt;
&lt;p&gt;Overall, I don’t think this paper provides much new ammunition to the AI skeptic. Like I said above, it doesn’t support the point that AI speedup is a mirage. And the point it does support (that AI use means you learn less) is obvious. Nobody seriously believes that typing “build me a todo app” into Claude Code means you’ll learn as much as if you built it by hand.&lt;/p&gt;
&lt;p&gt;That said, I’d like to see more investigation into long-term patterns of AI use in tech companies. Is the slower learning rate per-task balanced out by the higher rate of task completion? Can it be replaced by carving out explicit time to study the codebase? It’s probably too early to answer these questions - strong coding agents have only been around for a handful of months - but the answers may determine what it’s like to be a software engineer for the next decade.&lt;/p&gt;
&lt;p&gt;edit: the popular tech youtuber Theo &lt;a href=&quot;https://www.youtube.com/watch?v=ZINQTR6H5dI&quot;&gt;cited&lt;/a&gt; this post as a source for his video on this paper. I liked Theo’s video. I don’t agree with his point about adjusting to a new setup - in my view that would also apply to the non-AI-using group - and I thought the crack about the kind of people who make syntax errors in Python was a bit uncalled-for. However, I agree that (a) the people in the study are not incentivized to spend time teaching themselves about Trio, and (b) this study does not do anywhere near as good a job at targeting real-world use as the well-known &lt;a href=&quot;/impact-of-ai-study&quot;&gt;METR study&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;See Figure 17.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;I suppose the study doesn’t say that explicitly, but the Anthropic Fellows program was only launched in December 2024, and the paper was published in January 2026.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[You have to know how to drive the car]]></title><link>https://seangoedecke.com/knowing-how-to-drive-the-car/</link><guid isPermaLink="false">https://seangoedecke.com/knowing-how-to-drive-the-car/</guid><pubDate>Mon, 26 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;There are lots of different ways to be a software engineer. You can grind out code for twelve hours a day to &lt;a href=&quot;https://www.youtube.com/watch?v=B8C5sjjhsso&quot;&gt;make the world a better place&lt;/a&gt;. You can focus on &lt;a href=&quot;https://www.noidea.dog/glue&quot;&gt;glue work&lt;/a&gt;: process-based work that makes everyone around you more successful. You can join the conversation with your product manager and designer colleagues to influence what gets built, not just how it gets built. You can climb the ladder to staff engineer and above, or you can take it easy and focus on your hobbies. But whichever of these you choose, &lt;strong&gt;you have to know how tech companies work&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;I want to credit Alex Wennerberg for drawing out this point in our recent &lt;a href=&quot;https://www.youtube.com/watch?v=lpuy9RxJmfU&quot;&gt;discussion&lt;/a&gt;. Wennerberg thinks I spend too much time writing about the realpolitik of tech companies, and not enough time writing about &lt;em&gt;value&lt;/em&gt;: in &lt;a href=&quot;https://alexwennerberg.com/blog/2025-11-28-engineering.html&quot;&gt;his words&lt;/a&gt;, the delivery of software “that people want and like”. The whole point of working in tech is to produce value, after all.&lt;/p&gt;
&lt;p&gt;To me, this is like saying that the point of cars is to help you reach goals you care about: driving to the grocery store to get food, say, or to pick up your partner for a date. That’s true! Some goals you can achieve with cars are better than others. For instance, driving to your job at the &lt;a href=&quot;https://en.wikipedia.org/wiki/Torment_Nexus&quot;&gt;Torment Nexus&lt;/a&gt; is much worse than driving to your volunteer position at the soup kitchen. But whatever you want to do, &lt;strong&gt;you have to know how to drive the car&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Let’s walk through some examples. Suppose you’re an ambitious software engineer who wants to climb the ranks in your company. You ought to know that &lt;a href=&quot;/party-tricks/&quot;&gt;crushing JIRA tickets is rarely a path to promotion&lt;/a&gt; (at least above mid-level), that &lt;a href=&quot;/glue-work-considered-harmful/&quot;&gt;glue work can be a trap&lt;/a&gt;, that you will be judged on the &lt;a href=&quot;/being-accountable-for-results/&quot;&gt;results of your projects&lt;/a&gt;, and therefore &lt;a href=&quot;/how-to-ship/&quot;&gt;getting good at shipping projects&lt;/a&gt; is the path to career success. You should therefore neglect piece-work that isn’t part of projects you’re leading, grind like a demon on those projects to make sure they succeed, and pay a lot of attention to how you’re communicating those projects up to your management chain. So far, so obvious.&lt;/p&gt;
&lt;p&gt;Alternatively, suppose you’re an unambitious software engineer, and you just want to take it easy and spend more time with your kids (or dog, or model trains). You probably don’t care about being promoted, then. But you ought to be aware of &lt;a href=&quot;/glue-work-considered-harmful/&quot;&gt;the dangers of glue work&lt;/a&gt;, and of how important projects are. You should be carefully tracking &lt;a href=&quot;/the-spotlight/&quot;&gt;the spotlight&lt;/a&gt;, so you can spend your limited amount of effort where it’s going to buy you the most positive reputation (while never having to actually grind).&lt;/p&gt;
&lt;p&gt;Finally, suppose you’re a software engineer who wants to deliver value to users - real value, not what the company cares about right now. For instance, you might really care about accessibility, but your engineering organization only wants to give a token effort. You thus probably want to know how to &lt;a href=&quot;/ratchet-effects&quot;&gt;build up your reputation&lt;/a&gt; in the company, so you can spend that credit down by doing unsanctioned (or barely-sanctioned) accessibility work. You should also have a larger program of accessibility work ready to go, so you can &lt;a href=&quot;/how-to-influence-politics&quot;&gt;“catch the wave”&lt;/a&gt; on the rare occasion that the organization decides it cares about accessibility.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Not knowing how to drive the car can get you in trouble.&lt;/strong&gt; I have worked with ambitious software engineers who pour their energy into the wrong thing and get frustrated when their promotion doesn’t come. I’ve worked with unambitious software engineers who get sidelined and drummed out of the company (though at least they tend to have a “fair enough” attitude about it). I’ve worked with &lt;em&gt;many&lt;/em&gt; engineers who had their own goals they wanted to achieve, but who were completely incapable of doing so (or who burnt all their bridges doing so).&lt;/p&gt;
&lt;p&gt;The only way to truly opt out of big-company organizational politics is to avoid working at big companies altogether. That’s a valid choice! But it also means you’re passing up the kind of leverage that you can only get at large tech companies: the opportunity to make changes that affect millions or billions of people. If you’re going after that leverage - whatever you want to do with it - you really ought to try and understand how big companies work.&lt;/p&gt;
&lt;p&gt;edit: this post got some &lt;a href=&quot;https://news.ycombinator.com/item?id=46772966&quot;&gt;comments&lt;/a&gt; on Hacker News. Some &lt;a href=&quot;https://news.ycombinator.com/item?id=46775353&quot;&gt;commenters&lt;/a&gt; have good political advice, like “communicate with your manager 10x more than you think you should be communicating”. &lt;a href=&quot;https://news.ycombinator.com/item?id=46775137&quot;&gt;Other&lt;/a&gt; &lt;a href=&quot;https://news.ycombinator.com/item?id=46774422&quot;&gt;commenters&lt;/a&gt; are exhausted by having to care about the political stuff at all (fair enough!)&lt;/p&gt;</content:encoded></item><item><title><![CDATA[How I estimate work as a staff software engineer]]></title><link>https://seangoedecke.com/how-i-estimate-work/</link><guid isPermaLink="false">https://seangoedecke.com/how-i-estimate-work/</guid><pubDate>Sat, 24 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;There’s a kind of polite fiction at the heart of the software industry. It goes something like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Estimating how long software projects will take is very hard, but not impossible. A skilled engineering team can, with time and effort, learn how long it will take for them to deliver work, which will in turn allow their organization to make good business plans.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is, of course, false. As every experienced software engineer knows, &lt;strong&gt;it is not possible to accurately estimate software projects&lt;/strong&gt;. The tension between this polite fiction and its well-understood falseness causes a lot of strange activity in tech companies.&lt;/p&gt;
&lt;p&gt;For instance, many engineering teams estimate work in &lt;a href=&quot;https://asana.com/resources/t-shirt-sizing&quot;&gt;t-shirt sizes&lt;/a&gt; instead of time, because it just feels too obviously silly to the engineers in question to give direct time estimates. Naturally, these t-shirt sizes are immediately translated into hours and days when the estimates make their way up the management chain.&lt;/p&gt;
&lt;p&gt;Alternatively, software engineers who are genuinely trying to give good time estimates have ridiculous &lt;a href=&quot;https://news.ycombinator.com/item?id=19671824&quot;&gt;heuristics&lt;/a&gt; like “double your initial estimate and add 20%”. This is basically the same as giving up and saying “just estimate everything at a month”.&lt;/p&gt;
&lt;p&gt;Should tech companies just stop estimating? One of my guiding principles is that &lt;strong&gt;when a tech company is doing something silly, they’re probably doing it for a good reason&lt;/strong&gt;. In other words, practices that appear to not make sense are often serving some more basic, &lt;a href=&quot;/seeing-like-a-software-company&quot;&gt;illegible&lt;/a&gt; role in the organization. So what is the actual purpose of estimation, and how can you do it well as a software engineer?&lt;/p&gt;
&lt;h3&gt;Why estimation is impossible&lt;/h3&gt;
&lt;p&gt;Before I get into that, I should justify my core assumption a little more. People &lt;a href=&quot;https://world.hey.com/dhh/software-estimates-have-never-worked-and-never-will-a41a9c71&quot;&gt;have&lt;/a&gt; &lt;a href=&quot;https://news.ycombinator.com/item?id=18487253&quot;&gt;written&lt;/a&gt; &lt;a href=&quot;https://medium.com/@riaanfnel/the-problem-with-estimates-f3d5cddd5e62&quot;&gt;a lot&lt;/a&gt; about this already, so I’ll keep it brief.&lt;/p&gt;
&lt;p&gt;I’m also going to concede that &lt;strong&gt;sometimes you can accurately estimate software work&lt;/strong&gt;, when that work is very well-understood and very small in scope. For instance, if I know it takes half an hour to deploy a service&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, and I’m being asked to update the text in a link, I can accurately estimate the work at something like 45 minutes: five minutes to push the change up, ten minutes to wait for CI, thirty minutes to deploy.&lt;/p&gt;
&lt;p&gt;For most of us, the majority of software work is not like this. We work on poorly-understood systems and cannot predict exactly what must be done in advance. Most programming in large systems is &lt;em&gt;research&lt;/em&gt;: identifying prior art, mapping out enough of the system to understand the effects of changes, and so on. Even for fairly small changes, we simply do not know what’s involved in making the change until we go and look.&lt;/p&gt;
&lt;p&gt;The pro-estimation dogma says that these questions ought to be answered during the planning process, so that each individual piece of work being discussed is scoped small enough to be accurately estimated. I’m not impressed by this answer. It seems to me to be a throwback to the bad old days of &lt;a href=&quot;https://en.wikipedia.org/wiki/Software_architect&quot;&gt;software architecture&lt;/a&gt;, where one architect would map everything out in advance, so that individual programmers simply had to mechanically follow instructions. Nobody does that now, because it doesn’t work: programmers must be empowered to make architectural decisions, because they’re the ones who are actually in contact with the code&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. Even if it did work, that would simply shift the impossible-to-estimate part of the process backwards, into the planning meeting (where of course you can’t write or run code, which makes it near-impossible to accurately answer the kind of questions involved).&lt;/p&gt;
&lt;p&gt;In short: software engineering projects are not dominated by the known work, but by the unknown work, which always takes 90% of the time. However, only the known work can be accurately estimated. It’s therefore impossible to accurately estimate software projects in advance.&lt;/p&gt;
&lt;h3&gt;Estimates do not come from engineers&lt;/h3&gt;
&lt;p&gt;Estimates do not help engineering teams deliver work more efficiently. Many of the most productive years of my career were spent on teams that did no estimation at all: we were either working on projects that had to be done no matter what, and so didn’t really need an estimate, or on projects that would deliver a constant drip of value as we went, so we could just keep going indefinitely&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;In a very real sense, &lt;strong&gt;estimates aren’t even made by engineers at all&lt;/strong&gt;. If an engineering team comes up with a long estimate for a project that some VP really wants, they will be pressured into lowering it (or some other, more compliant engineering team will be handed the work). If the estimate on an undesirable project - or a project that’s intended to “hold space” for future unplanned work - is too short, the team will often be encouraged to increase it, or their manager will just add a 30% buffer.&lt;/p&gt;
&lt;p&gt;One exception to this is projects that are technically impossible, or just genuinely prohibitively difficult. If a manager consistently fails to pressure their teams into giving the “right” estimates, that can send a signal up that maybe the work can’t be done after all. Smart VPs and directors will try to avoid taking on technically impossible projects.&lt;/p&gt;
&lt;p&gt;Another exception to this is areas of the organization that senior leadership doesn’t really care about. In a sleepy backwater, often the formal estimation process does actually get followed to the letter, because there’s no director or VP who wants to jump in and shape the estimates to their ends. This is one way that some parts of a tech company can have drastically different engineering cultures to other parts. I’ll let you imagine the consequences when the company is re-orged and these teams are pulled into the spotlight.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Estimates are political tools for non-engineers in the organization&lt;/strong&gt;. They help managers, VPs, directors, and C-staff decide on which projects get funded and which projects get cancelled. &lt;/p&gt;
&lt;h3&gt;Estimates define the work, not the other way around&lt;/h3&gt;
&lt;p&gt;The standard way of thinking about estimates is that you start with a proposed piece of software work, and you then go and figure out how long it will take. &lt;strong&gt;This is entirely backwards.&lt;/strong&gt; Instead, teams will often start with the estimate, and then go and figure out what kind of software work they can do to meet it.&lt;/p&gt;
&lt;p&gt;Suppose you’re working on a LLM chatbot, and your director wants to implement “talk with a PDF”. If you have six months to do the work, you might implement a robust file upload system, some pipeline to chunk and embed the PDF content for semantic search, a way to extract PDF pages as image content to capture formatting and diagrams, and so on. If you have one day to do the work, you will naturally search for simpler approaches: for instance, converting the PDF to text client-side and sticking the entire thing in the LLM context, or offering a plain-text “grep the PDF” tool.&lt;/p&gt;
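&lt;p&gt;To make the contrast concrete, the one-day version might look something like the sketch below. This is a minimal illustration, not a recommended design: it assumes a pypdf-style text extractor, does the extraction server-side rather than client-side, and &lt;code&gt;call_llm&lt;/code&gt; is a placeholder for whatever chat-completion client you actually use.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from pypdf import PdfReader  # assumption: any PDF-to-text library would do

def build_prompt(pdf_path, question):
    # The naive "whole document in context" approach:
    # extract every page's text and prepend it to the question.
    reader = PdfReader(pdf_path)
    pages = [page.extract_text() or "" for page in reader.pages]
    document = "\n\n".join(pages)
    return f"Here is a document:\n\n{document}\n\nQuestion: {question}"

# call_llm is hypothetical - swap in your own client
answer = call_llm(build_prompt("report.pdf", "What are the key findings?"))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Everything the six-month version adds - chunking and embedding, semantic search, page-image extraction - exists to handle the cases where this naive approach falls over: scanned or heavily formatted PDFs, and documents too long for the model’s context window.&lt;/p&gt;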
&lt;p&gt;This is true even at the level of individual lines of code. When you have weeks or months until your deadline, you might spend a lot of time thinking airily about how you could refactor the codebase to make your new feature fit in as elegantly as possible. When you have hours, you will typically be laser-focused on finding an approach that will actually work. There are always many different ways to solve software problems. Engineers thus have quite a lot of discretion about how to get it done.&lt;/p&gt;
&lt;h3&gt;How I estimate work&lt;/h3&gt;
&lt;p&gt;So how do I estimate, given all that?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I gather as much political context as possible before I even look at the code&lt;/strong&gt;. How much pressure is on this project? Is it a casual ask, or do we &lt;em&gt;have&lt;/em&gt; to find a way to do this? What kind of estimate is my management chain looking for? There’s a huge difference between “the CTO &lt;em&gt;really&lt;/em&gt; wants this in one week” and “we were looking for work for your team and this seemed like it could fit”.&lt;/p&gt;
&lt;p&gt;Ideally, I go to the code &lt;strong&gt;with an estimate already in hand&lt;/strong&gt;. Instead of asking myself “how long would it take to do this”, where “this” could be any one of a hundred different software designs, I ask myself “which approaches could be done in one week?”.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I spend more time worrying about unknowns than knowns&lt;/strong&gt;. As I said above, unknown work always dominates software projects. The more “dark forests” in the codebase this feature has to touch, the higher my estimate will be - or, more concretely, the tighter I need to constrain the set of approaches to the known work.&lt;/p&gt;
&lt;p&gt;Finally, &lt;strong&gt;I go back to my manager with a risk assessment, not with a concrete estimate&lt;/strong&gt;. I don’t ever say “this is a four-week project”. I say something like “I don’t think we’ll get this done in one week, because X, Y and Z would all need to go right, and at least one of those things is bound to take a lot more work than we expect”. Ideally, I go back to my manager with a &lt;em&gt;series&lt;/em&gt; of plans, not just one:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We tackle X Y Z directly, which &lt;em&gt;might&lt;/em&gt; all go smoothly but if it blows out we’ll be here for a month&lt;/li&gt;
&lt;li&gt;We bypass Y and Z entirely, which would introduce these other risks but possibly allow us to hit the deadline&lt;/li&gt;
&lt;li&gt;We bring in help from another team who’s more familiar with X and Y, so we just have to focus on Z&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, I don’t “break down the work to determine how long it will take”. My management chain already knows how long they want it to take. &lt;strong&gt;My job is to figure out the set of software approaches that match that estimate.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Sometimes that set is empty: the project is just impossible, no matter how you slice it. In that case, my management chain needs to get together and figure out some way to alter the requirements. But if I always said “this is impossible”, my managers would find someone else to do their estimates. When I do that, I’m drawing on a well of trust that I build up by making pragmatic estimates the rest of the time.&lt;/p&gt;
&lt;h3&gt;Addressing some objections&lt;/h3&gt;
&lt;p&gt;Many engineers find this approach distasteful. One reason is that they don’t like estimating in conditions of uncertainty, so they insist on having all the unknown questions answered in advance. I have written a lot about this in &lt;a href=&quot;/taking-a-position&quot;&gt;&lt;em&gt;Engineers who won’t commit&lt;/em&gt;&lt;/a&gt; and &lt;a href=&quot;/clarity&quot;&gt;&lt;em&gt;How I provide technical clarity to non-technical leaders&lt;/em&gt;&lt;/a&gt;, but suffice it to say that I think it’s cowardly. If you refuse to estimate, you’re forcing someone less technical to estimate for you.&lt;/p&gt;
&lt;p&gt;Some engineers think that their job is to constantly push back against engineering management, and that helping their manager find technical compromises is betraying some kind of sacred engineering trust. I wrote about this in &lt;a href=&quot;/a-little-bit-cynical&quot;&gt;&lt;em&gt;Software engineers should be a little bit cynical&lt;/em&gt;&lt;/a&gt;. If you want to spend your career doing that, that’s fine, but I personally find it more rewarding to find ways to work with my managers (who have almost exclusively been nice people).&lt;/p&gt;
&lt;p&gt;Other engineers might say that they rarely feel this kind of pressure from their directors or VPs to alter estimates, and that this is really just the sign of a dysfunctional engineering organization. Maybe! I can only speak for the engineering organizations I’ve worked in. But my suspicion is that these engineers are really just saying that they work “out of the spotlight”, where there’s not much pressure in general and teams can adopt whatever processes they want. There’s nothing wrong with that. But I don’t think it qualifies you to give helpful advice to engineers who do feel this kind of pressure.&lt;/p&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;I think software engineering estimation is generally misunderstood.&lt;/p&gt;
&lt;p&gt;The common view is that a manager proposes some technical project, the team gets together to figure out how long it would take to build, and then the manager makes staffing and planning decisions with that information. In fact, it’s the reverse: a manager comes to the team with an estimate already in hand (though they might not come out and admit it), and then the team must figure out what kind of technical project might be possible within that estimate.&lt;/p&gt;
&lt;p&gt;This is because estimates are not by or for engineering teams. They are tools managers use to negotiate with each other about planned work. Very occasionally, when a project is literally impossible, the estimate can serve as a way for the team to communicate that fact upwards. But that requires trust. A team that is always pushing back on estimates will not be believed when they do encounter a genuinely impossible proposal.&lt;/p&gt;
&lt;p&gt;When I estimate, I extract the range my manager is looking for, and only then do I go through the code and figure out what can be done in that time. I never come back with a flat “two weeks” figure. Instead, I come back with a range of possibilities, each with their own risks, and let my manager make that tradeoff.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;It is not possible to accurately estimate software work.&lt;/strong&gt; Software projects spend most of their time grappling with unknown problems, which by definition can’t be estimated in advance. To estimate well, you must therefore basically ignore all the known aspects of the work, and instead try and make educated guesses about how many unknowns there are, and how scary each unknown is.&lt;/p&gt;
&lt;p&gt;edit: I should thank one of my readers, Karthik, who emailed me to ask about estimates, thus revealing to me that I had many more opinions than I thought.&lt;/p&gt;
&lt;p&gt;edit: This post got a bunch of comments on &lt;a href=&quot;https://news.ycombinator.com/item?id=46742389&quot;&gt;Hacker News&lt;/a&gt;. Some non-engineers made the &lt;a href=&quot;https://news.ycombinator.com/item?id=46744538&quot;&gt;point&lt;/a&gt; that well-paid professionals should be expected to estimate their work, even if the estimate is completely fictional. Sure, I agree, as long as we’re on the same page that it’s fictional!&lt;/p&gt;
&lt;p&gt;A couple of &lt;a href=&quot;https://news.ycombinator.com/item?id=46744696&quot;&gt;engineers&lt;/a&gt; &lt;a href=&quot;https://news.ycombinator.com/item?id=46744876&quot;&gt;argued&lt;/a&gt; that estimation was a solved problem. I’m not convinced by their examples. I agree you can probably estimate “build a user flow in Svelte”, but it’s much harder to estimate “build a user flow in Svelte &lt;em&gt;on top of an existing large codebase&lt;/em&gt;”. I should have been clearer in the post that I think that’s the hard part, for the usual reason that it’s very hard to work in large codebases, which I &lt;a href=&quot;/large-established-codebases&quot;&gt;write&lt;/a&gt; &lt;a href=&quot;/wicked-features&quot;&gt;about&lt;/a&gt; &lt;a href=&quot;/clarity&quot;&gt;endlessly&lt;/a&gt; on this blog.&lt;/p&gt;
&lt;p&gt;edit: There are also some comments on &lt;a href=&quot;https://lobste.rs/s/dspppf/how_i_estimate_work_as_staff_software&quot;&gt;Lobste.rs&lt;/a&gt;, including a good &lt;a href=&quot;https://lobste.rs/c/i0sxht&quot;&gt;note&lt;/a&gt; that the capability of the team obviously has a huge impact on any estimates. In my experience, this is not commonly understood: companies expect estimates to be fungible between engineers or teams, when in fact some engineers and teams can deliver work ten times more quickly (and others cannot deliver work &lt;em&gt;at all&lt;/em&gt;, no matter how much time they have).&lt;/p&gt;
&lt;p&gt;Another commenter &lt;a href=&quot;https://news.ycombinator.com/item?id=46745726&quot;&gt;politely suggested&lt;/a&gt; I read &lt;a href=&quot;https://www.amazon.com.au/Software-Estimation-Demystifying-Black-Art/dp/%200735605351&quot;&gt;&lt;em&gt;Software Estimation: Demystifying the Black Art&lt;/em&gt;&lt;/a&gt;, which I’ve never heard of. I’ll put it on my list.&lt;/p&gt;
&lt;p&gt;There are also some &lt;a href=&quot;https://www.reddit.com/r/programming/comments/1qoj5mb/how_i_estimate_work_as_a_staff_software_engineer/&quot;&gt;comments&lt;/a&gt; on Reddit’s r/programming subreddit: mostly people just generically discussing estimation, but there are &lt;a href=&quot;https://www.reddit.com/r/programming/comments/1qoj5mb/comment/o22t1vm/&quot;&gt;interesting anecdotes&lt;/a&gt; and &lt;a href=&quot;https://www.reddit.com/r/programming/comments/1qoj5mb/comment/o2271vx/&quot;&gt;good criticism&lt;/a&gt; of the post.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;For anyone wincing at that time, I mean like three minutes of actual deployment and twenty-seven minutes of waiting for checks to pass or monitors to turn up green.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;I write a lot more about this in &lt;a href=&quot;/you-cant-design-software-you-dont-work-on&quot;&gt;&lt;em&gt;You can’t design software you don’t work on&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;For instance, imagine a mandate to improve the performance of some large Rails API, one piece at a time. I could happily do that kind of work forever.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[I'm addicted to being useful]]></title><link>https://seangoedecke.com/addicted-to-being-useful/</link><guid isPermaLink="false">https://seangoedecke.com/addicted-to-being-useful/</guid><pubDate>Tue, 20 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;When I get together with my friends in the industry, I feel a little guilty about how much I love my job. This is a &lt;a href=&quot;/good-times-are-over&quot;&gt;tough time&lt;/a&gt; to be a software engineer. The job was less stressful in the late 2010s than it is now, and I sympathize with anyone who is upset about the change. There are a lot of objective reasons to feel bad about work. But despite all that, I’m still having a blast. I enjoy pulling together projects, figuring out difficult bugs, and writing code in general. I like spending time with computers. But what I really love is &lt;strong&gt;being useful&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The main character in Gogol’s short story &lt;a href=&quot;https://static1.squarespace.com/static/51734e04e4b08db710716119/t/5405e254e4b0f888a353e730/1409671764783/Gogol%2C+The+Overcoat.pdf&quot;&gt;&lt;em&gt;The Overcoat&lt;/em&gt;&lt;/a&gt; is a man called Akaky Akaievich&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. Akaky’s job is objectively terrible: he’s stuck in a dead-end copyist role, being paid very little, with colleagues who don’t respect him. Still, he loves his work, to the point that if he has no work to take home with him, he does some recreational copying just for its own sake. Akaky is a dysfunctional person. But his dysfunction makes him a perfect fit for his job&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;It’s hard for me to see a problem and not solve it. This is especially true if I’m the only person (or one of a very few people) who could solve it, or if somebody is asking for my help. I feel an almost physical discomfort about it, and a corresponding relief and satisfaction when I do go and solve the problem. The work of a software engineer - or at least my work as a staff software engineer - is perfectly tailored to this tendency. Every day people rely on me to solve a series of technical problems&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;In other words, like Akaky Akaievich, I don’t mind the ways in which my job is dysfunctional, because it matches the ways in which I myself am dysfunctional: specifically, &lt;strong&gt;my addiction to being useful&lt;/strong&gt;. (Of course, it helps that my working conditions are overall &lt;em&gt;much&lt;/em&gt; better than Akaky’s). I’m kind of like a working dog, in a way. Working dogs get rewarded with treats&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, but they don’t do it &lt;em&gt;for&lt;/em&gt; the treats. They do it for the work itself, which is inherently satisfying.&lt;/p&gt;
&lt;p&gt;This isn’t true of all software engineers. But it’s certainly true of many I’ve met: if not an addiction to being useful, then an addiction to solving puzzles, or to the complete control over your work product that you only really get in software or mathematics. If they weren’t working as software engineers, they would be getting really into Factorio, or crosswords, or tyrannically moderating some internet community.&lt;/p&gt;
&lt;p&gt;A lot of the advice I give about working a software engineering job is really about how I’ve shaped my need to be useful in a way that delivers material rewards, and how I try to avoid the pitfalls of such a need. For instance, &lt;a href=&quot;/predators&quot;&gt;&lt;em&gt;Protecting your time from predators in large tech companies&lt;/em&gt;&lt;/a&gt; is about how some people in tech companies will identify people like me and wring us out in ways that only benefit them. &lt;a href=&quot;/party-tricks&quot;&gt;&lt;em&gt;Crushing JIRA tickets is a party trick, not a path to impact&lt;/em&gt;&lt;/a&gt; is about how I need to be useful &lt;em&gt;to my management chain&lt;/em&gt;, not to the ticket queue. &lt;a href=&quot;/impressing-people&quot;&gt;&lt;em&gt;Trying to impress people you don’t respect&lt;/em&gt;&lt;/a&gt; is about how I cope with the fact that I’m compelled to be useful to some people who I may not respect or even like.&lt;/p&gt;
&lt;p&gt;There’s a lot of discussion on the internet about what &lt;em&gt;ought&lt;/em&gt; to motivate software engineers: money and power, producing real &lt;a href=&quot;https://alexwennerberg.com/blog/2025-11-28-engineering.html&quot;&gt;value&lt;/a&gt;, ushering in the AI machine god, and so on. But what &lt;em&gt;actually does&lt;/em&gt; motivate software engineers is often more of an internal compulsion. If you’re in that category - as I suspect most of us are - then it’s worth figuring out how you can harness that compulsion most effectively.&lt;/p&gt;
&lt;p&gt;edit: this post was quite popular on &lt;a href=&quot;https://news.ycombinator.com/item?id=46690402&quot;&gt;Hacker News&lt;/a&gt;. I agree with the many commenters who pointed out that you need to avoid letting this tendency bleed into your personal life, if at all possible. I take the &lt;a href=&quot;https://news.ycombinator.com/item?id=46697424&quot;&gt;point&lt;/a&gt; that big corporations are not the best place to fulfil your emotional needs, but I think I disagree: of course you shouldn’t get &lt;em&gt;all&lt;/em&gt; your emotional satisfaction from work, but if you’re not getting &lt;em&gt;any&lt;/em&gt; I think that’s a bit unfortunate (particularly as a software engineer). Some commenters &lt;a href=&quot;https://news.ycombinator.com/item?id=46691281&quot;&gt;worried&lt;/a&gt; that this attitude leads to fast burnout - if anything, I think it’s the reverse. The times I’ve felt most burnt out are times where work wasn’t satisfying any of my strange internal mental wiring.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;I think in Russian this is supposed to be an obviously silly name, like “Poop Poopson” (edit: HN commenters were &lt;a href=&quot;https://news.ycombinator.com/item?id=46691697&quot;&gt;split&lt;/a&gt; on this interpretation).&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Unfortunately, his low status and low pay catch up with Akaky in the end. His financial difficulty acquiring a new coat for the cold Russian winter (and his lack of backbone) ends up doing him in, at which point the story becomes a ghost story.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;I interpret “technical problem” quite broadly here: answering questions, explaining things, and bug-fixing all count.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;Or toys, or playtime, or whatever.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Crypto grifters are recruiting open-source AI developers]]></title><link>https://seangoedecke.com/gas-and-ralph/</link><guid isPermaLink="false">https://seangoedecke.com/gas-and-ralph/</guid><pubDate>Sat, 17 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Two recently-hyped developments in AI engineering have been Geoff Huntley’s “Ralph Wiggum loop” and Steve Yegge’s “Gas Town”. Huntley and Yegge are both respected software engineers with a long pedigree of actual projects. The Ralph loop is a sensible idea: force infinite test-time-compute by automatically restarting Claude Code whenever it runs out of steam. Gas Town is a platform for an idea that’s been popular for a while (though in my view has never really worked): running a whole village of LLM agents that collaborate with each other to accomplish a task.&lt;/p&gt;
&lt;p&gt;So far, so good. But Huntley and Yegge have also been &lt;a href=&quot;https://ghuntley.com/solana/&quot;&gt;posting&lt;/a&gt; &lt;a href=&quot;https://steve-yegge.medium.com/bags-and-the-creator-economy-249b924a621a&quot;&gt;about&lt;/a&gt; $RALPH and $GAS, which are cryptocurrency coins built on top of the longstanding &lt;a href=&quot;https://solana.com/&quot;&gt;Solana&lt;/a&gt; cryptocurrency and the &lt;a href=&quot;https://bags.fm/&quot;&gt;Bags&lt;/a&gt; tool, which allows people to easily create their own crypto coins. What does $RALPH have to do with the Ralph Wiggum loop? What does $GAS have to do with Gas Town?&lt;/p&gt;
&lt;p&gt;From reading Huntley and Yegge’s posts, it seems like what happened was this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Some crypto trader created a “$GAS” coin via Bags, configuring it to pay a portion of the trading fees to Steve Yegge (via his Twitter account)&lt;/li&gt;
&lt;li&gt;That trader, or others with the same idea, messaged Yegge on LinkedIn to tell him about his “earnings” (&lt;a href=&quot;https://bags.fm/7pskt3A1Zsjhngazam7vHWjWHnfgiRump916Xj7ABAGS&quot;&gt;currently&lt;/a&gt; $238,000), framing it as support for the Gas Town project&lt;/li&gt;
&lt;li&gt;Yegge took the free money and started &lt;a href=&quot;https://steve-yegge.medium.com/bags-and-the-creator-economy-249b924a621a&quot;&gt;posting&lt;/a&gt; about how exciting $GAS is as a way to fund open-source software creators&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So what does $GAS have to do with Gas Town (or $RALPH with Ralph Wiggum)? From a technical perspective, the answer is &lt;strong&gt;nothing&lt;/strong&gt;. Gas Town is an open-source GitHub &lt;a href=&quot;https://github.com/steveyegge/gastown&quot;&gt;repository&lt;/a&gt; that you can clone, edit and run without ever interacting with the $GAS coin. Likewise for &lt;a href=&quot;https://github.com/anthropics/claude-code/tree/main/plugins/ralph-wiggum&quot;&gt;Ralph&lt;/a&gt;. Buying $GAS or $RALPH does not unlock any new capabilities in the tools. All it does is siphon a little bit of money to Yegge and Huntley, and increase the value of the $GAS or $RALPH coins.&lt;/p&gt;
&lt;p&gt;Of course, that’s why these coins exist in the first place. This is a new variant of an old &lt;a href=&quot;https://en.wikipedia.org/wiki/Airdrop_(cryptocurrency)&quot;&gt;“airdropping”&lt;/a&gt; cryptocurrency tactic. The classic problem with “memecoins” is that it’s hard to give people a reason to buy them, even at very low prices, because they famously have no staying power. That’s why many successful memecoins rely on celebrity power, like Eric Adams’ &lt;a href=&quot;https://www.nbcnewyork.com/news/local/explaining-eric-adams-crypto-token-launch/6444591/&quot;&gt;“NYC Token”&lt;/a&gt; or the &lt;a href=&quot;https://en.wikipedia.org/wiki/$Trump&quot;&gt;$TRUMP&lt;/a&gt; coin. But how do you convince a celebrity to get involved in your &lt;del&gt;grift&lt;/del&gt; business venture?&lt;/p&gt;
&lt;p&gt;This is where &lt;a href=&quot;https://bags.fm/&quot;&gt;Bags&lt;/a&gt; comes in. Bags allows you to nominate a Twitter account as the beneficiary (or “fee earner”) of your coin. The person behind that Twitter account doesn’t have to agree, or even know that you’re doing it. Once you accumulate a nominal market cap (for instance, by moving a bunch of your own money onto the coin), you can then message the owner of that Twitter account and say “hey, all these people are supporting you via crypto, and you can collect your money right now if you want!” Then you either subtly hint that promoting the coin would cause that person to make more money, or you wait for them to realize it themselves&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. Once they start posting about it, you’ve bootstrapped your own celebrity coin.&lt;/p&gt;
&lt;p&gt;This system relies on your celebrity target being dazzled by receiving a large sum of free money. If you came to them &lt;em&gt;before&lt;/em&gt; the money was there, they might ask questions like “why wouldn’t people just directly donate to me?”, or “are these people who think they’re supporting me going to lose all their money?”. But in the warm glow of a few hundred thousand dollars, it’s easy to think that it’s all working out excellently.&lt;/p&gt;
&lt;p&gt;Incidentally, this is why AI open-source software engineers make such great targets. The fact that they’re open-source software engineers means that (a) a few hundred thousand dollars is enough to dazzle them&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, and (b) their fans are technically-engaged enough to be able to figure out how to buy cryptocurrency. Working in AI also means that there’s a fresh pool of hype to draw from (the general hype around cryptocurrency being somewhat dry by now). On top of that, the open-source AI community is fairly small. Yegge &lt;a href=&quot;https://steve-yegge.medium.com/bags-and-the-creator-economy-249b924a621a&quot;&gt;mentions&lt;/a&gt; in his post that he wouldn’t have taken the offer seriously if Huntley hadn’t already accepted it.&lt;/p&gt;
&lt;p&gt;If you couldn’t tell, I think this whole thing is largely predatory. Bags seems to me to be offering crypto-airdrop-pump-and-dumps-as-a-service, where niche celebrities can turn their status as respected community figures into cold hard cash. The people who pay into this are either taken in by the pretense that they’re sponsoring open-source work (in a way orders of magnitude less efficient than just donating money directly), or by the hope that they’re going to win big when the coin goes “to the moon” (which effectively never happens). &lt;/p&gt;
&lt;p&gt;The celebrities will make a little bit of money, for their part in it, but the lion’s share of the reward will go to the actual grifters: the insiders who primed the coin and can sell off into the flood of community members who are convinced to buy.&lt;/p&gt;
&lt;p&gt;edit: this post got some comments on &lt;a href=&quot;https://news.ycombinator.com/item?id=46654878&quot;&gt;Hacker News&lt;/a&gt;. Commenters are a bit divided on whether the open-source developers are victims or perpetrators of the scam (I personally think it’s case-by-case). A good &lt;a href=&quot;https://news.ycombinator.com/item?id=46655339&quot;&gt;correction&lt;/a&gt; from one commenter that Solana is a chain network, not a cryptocurrency (SOL is the cryptocurrency on Solana).&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;Bags even &lt;a href=&quot;https://bags.fm/how-it-works&quot;&gt;offers&lt;/a&gt; a “Did You Get Bagged? 💰🫵” section in their docs, encouraging the celebrity targets to share the coin, and framing the whole thing as coming from “your community”.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;This isn’t a dig - that amount of money would dazzle me too! I only mean that you wouldn’t be able to get Tom Cruise or MrBeast to promote your coin with that amount of money.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[The Dictator's Handbook and the politics of technical competence]]></title><link>https://seangoedecke.com/the-dictators-handbook/</link><guid isPermaLink="false">https://seangoedecke.com/the-dictators-handbook/</guid><pubDate>Mon, 05 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/The_Dictator%27s_Handbook&quot;&gt;&lt;em&gt;The Dictator’s Handbook&lt;/em&gt;&lt;/a&gt; is an ambitious book. In the introduction, its authors Bruce Bueno de Mesquita and Alastair Smith cast themselves as the successors to Sun Tzu and Niccolo Machiavelli: offering unsentimental advice to would-be successful leaders.&lt;/p&gt;
&lt;p&gt;Given that, I expected this book to be similar to &lt;a href=&quot;https://en.wikipedia.org/wiki/The_48_Laws_of_Power&quot;&gt;&lt;em&gt;The 48 Laws of Power&lt;/em&gt;&lt;/a&gt;, which did not impress me. Like many self-help books, &lt;em&gt;The 48 Laws of Power&lt;/em&gt; is “empty calories”: a lot of fun to read, but not really useful or edifying&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. However, &lt;em&gt;The Dictator’s Handbook&lt;/em&gt; is a legitimate work of political science, serving as a popular introduction to &lt;a href=&quot;https://en.wikipedia.org/wiki/Selectorate_theory&quot;&gt;an actual academic theory of government&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Political science is very much not my field, so I’m reluctant to be convinced by (or comment on) the various concrete arguments in the book. I’m mainly interested in whether the book has anything to say about something I do know a little bit about: operating as an engineer inside a large tech company.&lt;/p&gt;
&lt;h3&gt;Inner and outer circles&lt;/h3&gt;
&lt;p&gt;Let’s first cover the key idea of &lt;em&gt;The Dictator’s Handbook&lt;/em&gt;, which can be expressed in three points.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When explaining how organizations&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; behave, it is more useful to consider the motivations of individual people (say, the leader) than “the organization” as a whole&lt;/li&gt;
&lt;li&gt;Every leader must depend upon a &lt;strong&gt;coalition&lt;/strong&gt; of insiders who help them maintain their position&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Almost every feature of organizations can be explained by the ratio between the size of three groups:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The members of the coalition of insiders (i.e. the “inner circle”)&lt;/li&gt;
&lt;li&gt;The group who could conceivably become members of the coalition (the “outer circle”, or what the book calls the “interchangeables”)&lt;/li&gt;
&lt;li&gt;The entire population who is subject to the leader&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For instance, take an autocratic dictator. That dictator depends on a tiny group of people to maintain power: military generals, some powerful administrators, and so on. There is a larger group of people who could be in the inner circle but aren’t: for instance, other generals or administrators who are involved in government but aren’t fully trusted. Then there is the much, much larger group of all residents of the country, who are affected by the leader’s policies but have no ability to control them. This is an example of &lt;strong&gt;small-coalition&lt;/strong&gt; government.&lt;/p&gt;
&lt;p&gt;Alternatively, take a democratic president. To maintain power, the president depends on every citizen who is willing to vote for them. There’s a larger group of people outside that core coalition: voters who aren’t supporters of the president, but could conceivably be persuaded. Finally, there are the inhabitants of the country who do not vote: non-citizens, the very young, potentially felons, and so on. This is an example of &lt;strong&gt;large-coalition&lt;/strong&gt; government.&lt;/p&gt;
&lt;h3&gt;Coalition sizes determine government type&lt;/h3&gt;
&lt;p&gt;Mesquita and Smith argue that the structure of the government is downstream from the coalition sizes. If the coalition is small, it doesn’t matter whether the country is nominally a democracy, it will function like an autocratic dictatorship. Likewise, if the coalition is large, even a dictatorship will act in the best interests of its citizens (and will necessarily democratize).&lt;/p&gt;
&lt;p&gt;According to them, the structure of government does not change the size of the coalition. Rather, changes in the size of the coalition force changes in the structure of government. For instance, a democratic leader may want to shrink the size of their coalition to make it easier to hold onto power (e.g. by empowering state governors to unilaterally decide the outcome of their state’s elections). If successful, the government will thus become a small-coalition government, and will function more like a dictatorship (even if it’s still nominally democratic).&lt;/p&gt;
&lt;p&gt;Why are small-coalition governments more prone to autocracy or corruption? Because leaders stay in power by rewarding their coalitions, and if your coalition is a few tens or hundreds of people, you can best reward them by directly handing out cash or treasure, at the expense of everyone else. If your coalition is hundreds of thousands or millions of people (e.g. all the voters in a democracy), you can no longer directly assign rewards to individual people. Instead, it’s more efficient to fund public goods that benefit everybody. That’s why democracies tend to fund many more public goods than dictatorships.&lt;/p&gt;
&lt;p&gt;Leaders prefer small coalitions, because small coalitions are cheaper to keep happy. This is why dictators rule longer than democratically-elected leaders. Incidentally, it’s also why hegemonic countries like the USA have a practical interest in keeping uneasy allies ruled by dictators: because small-coalition dictatorships are easier to pay off.&lt;/p&gt;
&lt;p&gt;Leaders also want the set of “interchangeables” - remember, this is the set of people who &lt;em&gt;could&lt;/em&gt; be part of the coalition but currently aren’t - to be as large as possible. That way they can easily replace unreliable coalition members. Of course, coalition members want the set of interchangeables to be as small as possible, to maximize their own leverage.&lt;/p&gt;
&lt;h3&gt;What about tech companies?&lt;/h3&gt;
&lt;p&gt;What does any of this have to do with tech companies? &lt;em&gt;The Dictator’s Handbook&lt;/em&gt; does reference a few tech companies specifically, but always in the context of boardroom disputes. In this framing, the CEO is the leader, and their coalition is the board who can either support them or fire them. I’m sure this is interesting - I’d love to read an account of the &lt;a href=&quot;https://en.wikipedia.org/wiki/Removal_of_Sam_Altman_from_OpenAI&quot;&gt;2023 OpenAI boardroom wars&lt;/a&gt; from this perspective - but I don’t really know anything first-hand about how boards work, so I don’t want to speculate.&lt;/p&gt;
&lt;p&gt;It’s unclear how we might apply this theory so that it’s relevant to individual software engineers and the levels of management they might encounter in a large tech company. Directors and VPs are definitely leaders, but they’re not “leaders” in the sense meant in &lt;em&gt;The Dictator’s Handbook&lt;/em&gt;. They don’t govern from the strength of their coalitions. Instead, they depend on the formal power they derive from the roles above them: you do what your boss says because they can fire you (or if they can’t, their boss certainly can).&lt;/p&gt;
&lt;p&gt;However, directors and VPs rarely make genuinely unilateral decisions. Typically they’ll consult with a small group of trusted subordinates, who they depend on for accurate information&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; and to actually execute projects. This sounds a lot like a coalition to me! Could we apply some of the lessons above to this kind of coalition?&lt;/p&gt;
&lt;h3&gt;Interchangeable engineers and managers&lt;/h3&gt;
&lt;p&gt;Let’s consider Mesquita and Smith’s point about the “interchangeables”. According to their theory, if you’re a member of the inner circle, it’s in your interest to be as irreplaceable as possible. You thus want to avoid bringing in other engineers or managers who could potentially fill your role. Meanwhile, your director or VP wants to have as many potential replacements available as possible, so that each inner-circle member’s bargaining power is lower - but they don’t want to actually bring those replacements into the inner circle, since each extra person they rely on drains their political resources.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;This does not match my experience at all.&lt;/strong&gt; Every time I’ve been part of a trusted group like this, I’ve been &lt;em&gt;desperate&lt;/em&gt; to have a deeper bench. I have never once been in a position where I felt it was to my advantage to be the only person who could fill a particular role, for a few reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Management are suspicious of “irreplaceable” engineers and will actively work to undermine them, for a whole variety of reasons (the most palatable one is to reduce &lt;a href=&quot;https://en.wikipedia.org/wiki/Bus_factor&quot;&gt;bus factor&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;It’s just lonely to be in this position: you don’t really have peers to talk to, it’s hard to take leave, and so on. It feels much nicer to have potential backup&lt;/li&gt;
&lt;li&gt;Software teams succeed or fail together. Being the strongest engineer in a weak group means your projects will be rocky and you’ll have fewer successes to point to. But if you’re in a strong team, you’ll often acquire a good reputation just by association (so long as you’re not obviously dragging the side down)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, &lt;em&gt;The Dictator’s Handbook&lt;/em&gt; style of backstabbing and political maneuvering is just not something I’ve observed at the level of software teams or products. Maybe it happens like this at the C-suite/VP or at the boardroom level - I wouldn’t know. But at the level I’m at, &lt;strong&gt;the success of individual projects determines your career success&lt;/strong&gt;, so self-interested people tend to try and surround themselves with competent professionals who can make projects succeed, even if those people pose more of a political threat.&lt;/p&gt;
&lt;h3&gt;Competence&lt;/h3&gt;
&lt;p&gt;I think the main difference here is that &lt;strong&gt;technical competence matters a lot in engineering organizations&lt;/strong&gt;. I want a deep bench because it really matters to me whether projects succeed or fail, and having more technically competent people in the loop drastically increases the chances of success.&lt;/p&gt;
&lt;p&gt;Mesquita and Smith barely write about competence at all. From what I can tell, they assume that leaders don’t care about it, and that their administration will be competent enough (a very low bar) to stay in power, no matter what they do.&lt;/p&gt;
&lt;p&gt;For tech companies, &lt;strong&gt;technical competence is a critical currency for leaders&lt;/strong&gt;. Leaders who can attract and retain technical competence to their organizations are able to complete projects and notch up easy political wins. Leaders who fail to do this must rely on “pure politics”: claiming credit, making glorious future promises, and so on. Of course, every leader has to do some amount of this. But it’s just &lt;em&gt;easier&lt;/em&gt; to also have concrete accomplishments to point to as well.&lt;/p&gt;
&lt;p&gt;If I were tempted to criticize the political science here, this is probably where I’d start. I find it hard to believe that governments are &lt;em&gt;that&lt;/em&gt; different from tech companies in this sense: surely competence makes a big difference to outcomes, and leaders are thus incentivized to keep competent people in their circle, even if that disrupts their coalition or incurs additional political costs&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Does competence dominate mid-level politics?&lt;/h3&gt;
&lt;p&gt;Still, it’s possible to explain the desire for competence in a way that’s consistent with &lt;em&gt;The Dictator’s Handbook&lt;/em&gt;. Suppose that competence isn’t more important in &lt;em&gt;tech companies&lt;/em&gt;, but is more important for &lt;em&gt;senior management&lt;/em&gt;. According to this view, the leader right at the top (the dictator, president, or CEO) doesn’t have the luxury to care about competence, and must focus entirely on solidifying their power base. But the leaders in the middle (the generals, VPs and directors) are obliged to actually get things done, and so need to worry a lot about keeping competent subordinates.&lt;/p&gt;
&lt;p&gt;Why would VPs be more obliged to get things done than CEOs? One reason might be that CEOs depend on a coalition of all board members (or even all company shareholders). This is a small coalition by &lt;em&gt;The Dictator’s Handbook&lt;/em&gt; standards, but it’s still much larger than the VP’s coalition, which is a coalition of one: just their boss. CEOs have tangible ways to reward their coalition. But VPs can only really reward their coalition via accomplishing their boss’s goals, which necessarily requires competence.&lt;/p&gt;
&lt;p&gt;Mesquita and Smith aren’t particularly interested in mid-level politics. Their focus is on leaders and their direct coalitions. But for most of us who operate in the middle level, maybe the lesson is that &lt;strong&gt;coalition politics dominates at the top, but competence politics dominates in the middle.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;Final thoughts&lt;/h3&gt;
&lt;p&gt;I enjoyed &lt;em&gt;The Dictator’s Handbook&lt;/em&gt;, but most of what I took from it was speculation. There weren’t a lot of direct lessons I could apply to my own work politics&lt;sup id=&quot;fnref-5&quot;&gt;&lt;a href=&quot;#fn-5&quot; class=&quot;footnote-ref&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;, and I don’t feel competent to judge the political science arguments themselves.&lt;/p&gt;
&lt;p&gt;For instance, the book devotes a chapter to arguing against foreign aid, claiming roughly (a) that it props up unstable dictatorships by allowing them to reward their small-group coalitions, and (b) that it allows powerful countries to pressure small dictatorships into adopting foreign policies that are not in their citizens’ interest. Sure, that seems plausible! But I’m suspicious of plausible-sounding arguments in areas where I don’t have actual expertise. I could imagine a similarly-plausible argument in favor of foreign aid&lt;sup id=&quot;fnref-6&quot;&gt;&lt;a href=&quot;#fn-6&quot; class=&quot;footnote-ref&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The book barely talks about competence, but in my experience of navigating work politics, competence is the primary currency - it’s both the instrument and the object of many political battles. I can reconcile this by guessing that &lt;strong&gt;competence might matter more at the senior-management level than the very top level of politics&lt;/strong&gt;, but I’m really just guessing. I don’t have the research background or the C-level experience to be confident about any of this.&lt;/p&gt;
&lt;p&gt;Still, I did like the core idea: no leader can lead alone, and the relationship between the ruler and their coalition therefore dictates much of the structure of the organization. I think that’s broadly true of many different kinds of organization, including software companies.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;Maybe there are people out there who are applying Greene’s Machiavellian power tactics to their daily lives. If so, I hope I don’t meet them.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;“Organizations” here is understood very broadly: companies, nations, families, book clubs, and so on all fit the definition.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;I write about this a lot more in &lt;a href=&quot;/clarity&quot;&gt;&lt;em&gt;How I provide technical clarity to non-technical leaders&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;In an email exchange, a reader suggested that companies face more competition than governments, because the cost of moving countries is much higher than the cost of switching products, which might make competence more important for companies. I think this is also pretty plausible.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;This is not a criticism of the book.&lt;/p&gt;
&lt;a href=&quot;#fnref-5&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-6&quot;&gt;
&lt;p&gt;After five years of studying philosophy, I’m convinced you can muster a plausible argument in favor of literally any position, with enough work.&lt;/p&gt;
&lt;a href=&quot;#fnref-6&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[2025 was an excellent year for this blog]]></title><link>https://seangoedecke.com/2025-wrapup/</link><guid isPermaLink="false">https://seangoedecke.com/2025-wrapup/</guid><pubDate>Sat, 03 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In 2025, I published 141 posts, 33 of which made it to the front page of &lt;a href=&quot;https://news.ycombinator.com/&quot;&gt;Hacker News&lt;/a&gt; or similar aggregators. I definitely wrote more in the first half of the year (an average of around 15 posts per month, down to around 8 in the second half), but overall I’m happy with my consistency. Here are some posts I’m really proud of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/large-established-codebases&quot;&gt;&lt;em&gt;Mistakes engineers make in large established codebases&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/good-times-are-over&quot;&gt;&lt;em&gt;The good times in tech are over&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/good-system-design&quot;&gt;&lt;em&gt;Everything I know about good system design&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/pure-and-impure-engineering&quot;&gt;&lt;em&gt;Pure and impure software engineering&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;/seeing-like-a-software-company&quot;&gt;&lt;em&gt;Seeing like a software company&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As it turns out, I was the &lt;a href=&quot;https://refactoringenglish.com/blog/2025-hn-top-5/&quot;&gt;third most popular blogger&lt;/a&gt; on Hacker News this year, behind the excellent Simon Willison and Jeff Geerling. I don’t put a lot of effort into appealing to Hacker News specifically, but I do think my natural style meshes well with the Hacker News commentariat (even if they’re often quite critical).&lt;/p&gt;
&lt;p&gt;I got hundreds of emails from readers this year (I went through Gmail and made it to 200 in the last three months of the year before I stopped counting). Getting email about my posts is one of the main reasons I write, so it was great to read people’s anecdotes and hear what they agreed or disagreed with. I also want to thank the people who wrote blog-length responses to what I wrote (most recently Alex Wennerberg’s &lt;a href=&quot;https://alexwennerberg.com/blog/2025-11-28-engineering.html&quot;&gt;&lt;em&gt;Software Engineers Are Not Politicians&lt;/em&gt;&lt;/a&gt; and Lalit Maganti’s &lt;a href=&quot;https://lalitm.com/software-engineering-outside-the-spotlight/&quot;&gt;&lt;em&gt;Why I Ignore The Spotlight as a Staff Engineer&lt;/em&gt;&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;I don’t have proper traffic statistics for the year - more on that later - but I remember I peaked in August with around 1.3 million monthly views. In December I had 700 thousand: less, but still not so shabby. I finally set up email subscription in May, via &lt;a href=&quot;https://buttondown.com/&quot;&gt;Buttondown&lt;/a&gt;, and now have just over 2,500 email subscribers to the blog. I have no way of knowing how many people are subscribed via RSS&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The biggest housekeeping change for my blog this year is that everything now costs a lot more money. I had to upgrade my Netlify plan and my Buttondown plan multiple times as my monthly traffic increased. I pay $9 a month for Netlify analytics, which is pretty bad: it doesn’t store data past 30 days, only tracks the top-ten referrers, and doesn’t let me break down traffic by source. I’m trialing Plausible, since I learned Simon Willison is &lt;a href=&quot;https://news.ycombinator.com/item?id=46471538&quot;&gt;using it&lt;/a&gt;, but once the trial expires it’s going to set me back $60 a month. Of course, I can afford it - on the scale of hobbies, it’s closer to rock climbing than skiing - but there’s still been a bit of sticker shock for something I’m used to thinking of as a free activity.&lt;/p&gt;
&lt;p&gt;Thank you all for reading, and extra thanks to those of you who have posted my articles to aggregators, emailed me, messaged on LinkedIn, or left comments. I look forward to writing another ~140 posts in 2026!&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;My analytics don’t track requests to &lt;code class=&quot;language-text&quot;&gt;/rss&lt;/code&gt;, but even if they did I imagine some RSS feed readers would be caching the contents of my feed.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Grok is enabling mass sexual harassment on Twitter]]></title><link>https://seangoedecke.com/grok-deepfakes/</link><guid isPermaLink="false">https://seangoedecke.com/grok-deepfakes/</guid><pubDate>Fri, 02 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Grok, xAI’s flagship image model, is now&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; being &lt;a href=&quot;https://www.reddit.com/r/videos/comments/1q1gwf3/premium_x_users_are_using_grok_to_generate/&quot;&gt;widely used&lt;/a&gt; to generate nonconsensual lewd images of women on the internet.&lt;/p&gt;
&lt;p&gt;When a woman posts an innocuous picture of herself - say, at her Christmas dinner - the comments are now full of messages like “@grok please generate this image but put her in a bikini and make it so we can see her feet”, or “@grok turn her around”, and the associated images. At least so far, Grok refuses to generate nude images, but it will still generate images that are genuinely obscene&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In my view, this might be the worst AI safety violation we have seen so far.&lt;/strong&gt; Case-by-case, it’s not worse than GPT-4o &lt;a href=&quot;https://www.bbc.com/news/articles/cgerwp7rdlvo&quot;&gt;encouraging&lt;/a&gt; suicidal people to go through with it, but it’s so much more widespread: literally &lt;em&gt;every&lt;/em&gt; image that the Twitter algorithm picks up is full of “@grok take her clothes off” comments. I didn’t go looking for evidence for obvious reasons, but I find reports that it’s generating &lt;a href=&quot;https://rainn.org/get-the-facts-about-csam-child-sexual-abuse-material/what-is-csam/&quot;&gt;CSAM&lt;/a&gt; plausible&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;AI safety is a rough process&lt;/h3&gt;
&lt;p&gt;This behavior, while awful, is in line with xAI’s general attitude towards safety, which has been roughly “we don’t support woke censorship, so do whatever you want (so long as you’re doing it with Grok)”. This has helped them acquire users and media attention, but it leaves them vulnerable to situations exactly like this. I’m fairly confident xAI don’t mind the “dress her a little sexier” prompts: it’s edgy, drives up user engagement, and keeps them in the headlines.&lt;/p&gt;
&lt;p&gt;However, &lt;strong&gt;it is very hard to exercise fine-grained control over AI safety&lt;/strong&gt;. If you allow your models to go up to the line, your models will &lt;em&gt;definitely&lt;/em&gt; go over the line in some circumstances. I wrote about this in &lt;a href=&quot;/ai-personality-space&quot;&gt;&lt;em&gt;Mecha-Hitler, Grok, and why it’s so hard to give LLMs the right personality&lt;/em&gt;&lt;/a&gt;, in reference to xAI’s attempts to make Grok acceptably right-wing but not &lt;em&gt;too&lt;/em&gt; right-wing. This is the same kind of thing: you cannot make Grok “kind of perverted” without also making it truly awful.&lt;/p&gt;
&lt;p&gt;OpenAI and Gemini have popular image models that do not let you do this kind of thing. In other words, &lt;strong&gt;this is an xAI problem, not an image model problem&lt;/strong&gt;. It is possible to build a safe image model, just as it’s possible to build a safe language model. The xAI team have made a deliberate decision to build an &lt;em&gt;unsafe&lt;/em&gt; model in order to unlock more capabilities and appeal to more users. Even if they’d rather not be enabling the worst perverts on Twitter, that’s a completely &lt;a href=&quot;https://rainn.org/groks-spicy-ai-video-setting-will-lead-to-sexual-abuse/&quot;&gt;foreseeable&lt;/a&gt; consequence of their actions.&lt;/p&gt;
&lt;h3&gt;Isn’t this already a problem?&lt;/h3&gt;
&lt;p&gt;In October of 2024, VICE &lt;a href=&quot;https://www.vice.com/en/article/nudify-deepfake-bots-telegram/&quot;&gt;reported&lt;/a&gt; that Telegram “nudify” bots had over four million monthly users. That’s still a couple of orders of magnitude below Twitter’s &lt;a href=&quot;https://x.com/elonmusk/status/1793779530282443086&quot;&gt;monthly active users&lt;/a&gt;, but “one in a hundred” sounds like a plausible “what percentage of Twitter is using Grok like this” percentage anyway. Is it really that much worse that Grok now allows you to do softcore deepfakes?&lt;/p&gt;
&lt;p&gt;Yes, for two reasons. First, &lt;strong&gt;having to go and join a creepy Telegram group is a substantial barrier to entry&lt;/strong&gt;. It’s much worse to have the capability built into a tool that regular people use every day. Second, &lt;strong&gt;generating deepfakes via Grok makes them public&lt;/strong&gt;. Of course, it’s bad to do this stuff even privately, but I think it’s much worse to do it via Twitter. Tagging in Grok literally sends a push notification to your target saying “hey, I made some deepfake porn of you”, and then advertises that porn to everyone who was already following them.&lt;/p&gt;
&lt;h3&gt;What is to be done?&lt;/h3&gt;
&lt;p&gt;Yesterday xAI rushed out an &lt;a href=&quot;https://www.cnbctv18.com/technology/grok-claims-safeguards-tightened-after-users-misuse-ai-to-morph-images-of-women-children-ws-l-19811512.htm&quot;&gt;update&lt;/a&gt; to rein this behavior in (likely a system prompt update, given the timing). I imagine they’re worried about the legal exposure, if nothing else. But &lt;strong&gt;this will happen again&lt;/strong&gt;. It will probably happen again &lt;em&gt;with Grok&lt;/em&gt;. Every AI lab has a big “USER ENGAGEMENT” dial where left is “always refuse every request” and right is “do whatever the user says, including generating illegal deepfake pornography”. The labs are incentivized to turn that dial as far to the right as possible.&lt;/p&gt;
&lt;p&gt;In my view, &lt;strong&gt;image model safety is a different topic from language model safety&lt;/strong&gt;. Unsafe language models primarily harm the user (via sycophancy, for instance). Unsafe image models, as we’ve seen from Grok, can harm all kinds of people. I tend to think that unsafe language models should be available (perhaps not through ChatGPT dot com, but certainly for people who know what they’re doing). However, it seems really bad for everyone on the planet to have a “turn this image of a person into pornography” button.&lt;/p&gt;
&lt;p&gt;At minimum, I think it’d be sensible to &lt;strong&gt;pursue entities like xAI under existing CSAM or deepfake pornography laws&lt;/strong&gt;, to set up a powerful counter-incentive for people with their hands on the “USER ENGAGEMENT” dial. I also think it’d be sensible for AI labs to &lt;strong&gt;strongly lock down “edit this image of a human” requests&lt;/strong&gt;, even if that precludes some legitimate user activity.&lt;/p&gt;
&lt;p&gt;Earlier this year, in &lt;a href=&quot;/regulating-ai-companions&quot;&gt;&lt;em&gt;The case for regulating AI companions&lt;/em&gt;&lt;/a&gt;, I suggested regulating “AI girlfriend” products. I mistakenly thought AI companions or &lt;a href=&quot;/ai-sycophancy&quot;&gt;sycophancy&lt;/a&gt; might be the first case of genuine widespread harm caused by AI products, because &lt;em&gt;of course&lt;/em&gt; nobody would ship an image model that allowed this kind of prompting. Turns out I was wrong.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;There were reports &lt;a href=&quot;https://www.digitalcameraworld.com/tech/social-media/remove-her-clothes-groks-latest-ai-fiasco-illustrates-one-of-the-key-dangers-of-an-autonomous-ai&quot;&gt;in May of this year&lt;/a&gt; of similar behavior, but it was less widespread and xAI jumped on it fairly quickly.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Clever prompting by unethical fetishists can generate really degrading content (to the point where I’m uncomfortable going into more detail). I saw a few cases earlier this year of people trying this prompting tactic and Grok refusing them. It seems the latest version of Grok now allows this.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;Building a feature that lets you digitally undress 18-year-olds but not 17-year-olds is a really difficult technical problem, which is one of the many reasons to &lt;em&gt;never do this&lt;/em&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Software engineers should be a little bit cynical]]></title><link>https://seangoedecke.com/a-little-bit-cynical/</link><guid isPermaLink="false">https://seangoedecke.com/a-little-bit-cynical/</guid><pubDate>Sun, 28 Dec 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A lot of my readers &lt;a href=&quot;https://lobste.rs/c/ch8tn0&quot;&gt;call&lt;/a&gt; &lt;a href=&quot;https://news.ycombinator.com/item?id=46085088&quot;&gt;me&lt;/a&gt; &lt;a href=&quot;https://news.ycombinator.com/item?id=46082989&quot;&gt;a cynic&lt;/a&gt; when I say things like “you should do things that &lt;a href=&quot;/how-to-ship&quot;&gt;make your manager happy&lt;/a&gt;” or “big tech companies &lt;a href=&quot;/bad-code-at-big-companies&quot;&gt;get to decide&lt;/a&gt; what projects you work on”. Alex Wennerberg put the “Sean Goedecke is a cynic” case well in his post &lt;a href=&quot;https://alexwennerberg.com/blog/2025-11-28-engineering.html&quot;&gt;&lt;em&gt;Software Engineers Are Not Politicians&lt;/em&gt;&lt;/a&gt;. Here are some excerpts:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I have no doubt that [Sean’s] advice is quite effective for navigating the upper levels of an organization dedicated to producing a large, mature software product. But what is lost is any sort of conception of value. Is it too naive to say that engineers are more than “tools in a political game”, they are specialized professionals whose role is to apply their expertise towards solving meaningful problems?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;The irony is that this kind of thinking destroys a company’s ability to actually make money … the idea that engineers should begin with a self-conception of doing what their manager tells them to is, to me, very bleak. It may be a good way to operate smoothly within a bureaucratic organization, and of course, one must often make compromises and take direction, but it is a bad way to do good work.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I can see why people would think this way. But I &lt;em&gt;love&lt;/em&gt; working in big tech companies! I do see myself as a professional solving meaningful problems. And I think navigating the organization to put real features or improvements in the hands of users is an excellent way - maybe the best way - to do good work.&lt;/p&gt;
&lt;p&gt;Why do I write such cynical posts, then? Well, I think that a small amount of cynicism is necessary in order to think clearly about how organizations work, and to avoid falling into the trap of being overly cynical. In general, I think &lt;strong&gt;good engineers ought to be a little bit cynical&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;The idealist view is more cynical than idealists think&lt;/h3&gt;
&lt;p&gt;One doctrinaire “idealist” view of software engineering goes something like this. I’m obviously expressing it in its most lurid form, but I do think many people believe this more or less literally:&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We live in a late-stage-capitalist hellscape, where large companies are run by aspiring robber barons who have no serious convictions beyond desiring power. All those companies want is for obedient engineering drones to churn out bad code fast, so they can goose the (largely fictional) stock price. Meanwhile, end-users are left holding the bag: paying more for worse software, being hassled by advertisements, and dealing with bugs that are unprofitable to fix. The only thing an ethical software engineer can do is to try and find some temporary niche where they can defy their bosses and do real, good engineering work, or to retire to a hobby farm and write elegant open-source software in their free time.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;When you write it all out, I think it’s easy to see that this is &lt;em&gt;incredibly&lt;/em&gt; cynical. At the very least, it’s a cynical way to view your coworkers and bosses, who are largely people like you: doing a job, balancing a desire to do good work with the need to please their own bosses. It’s a cynical way to view the C-staff of a company. I think it’s also inaccurate: from my limited experience, the people who run large tech companies really do want to deliver good software to users.&lt;/p&gt;
&lt;p&gt;It’s idealistic only in the sense that it does not accept the need for individual software engineers to compromise. According to this view, &lt;em&gt;you&lt;/em&gt; never need to write bad software. No matter how hard the company tells you to compromise and just get something out, you’re morally required to plant your feet and tell them to go to hell. In fact, by doing so, you’re taking a stand against the general degeneration of the modern software world. You’re protecting - unsung, like Batman - the needs of the end-user who will never know you exist.&lt;/p&gt;
&lt;p&gt;I can certainly see the appeal of this view! But I don’t think it’s an &lt;em&gt;idealistic&lt;/em&gt; appeal. It comes from seeing the world as fundamentally corrupted and selfish, and believing that real positive change is impossible. In other words, &lt;strong&gt;I think it’s a &lt;em&gt;cynical&lt;/em&gt; appeal.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;The cynical view is more idealistic than idealists think&lt;/h3&gt;
&lt;p&gt;I don’t see a hard distinction between engineers being “tools in a political game” and professionals who solve meaningful problems. In fact, I think that in practice &lt;strong&gt;almost all meaningful problems are solved by playing political games&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;There are very few problems that you can solve entirely on your own. Software engineers encounter more of these problems than average, because the nature of software means that a single engineer can have huge leverage by sitting down and making a single code change. But in order to make changes to large products - for instance, to make it possible for GitHub’s 150M users to &lt;a href=&quot;https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/writing-mathematical-expressions&quot;&gt;use LaTeX in markdown&lt;/a&gt; - you need to coordinate with many other people at the company, which means you need to be involved in politics.&lt;/p&gt;
&lt;p&gt;It is just a plain fact that software engineers are not the movers and shakers in large tech organizations. They do not set the direction of the company. To the extent that they have political influence, it’s in how they translate the direction of the company into specific technical changes. But &lt;strong&gt;that is actually quite a lot of influence!&lt;/strong&gt; &lt;/p&gt;
&lt;p&gt;Large tech companies serve hundreds of millions (or billions) of users. Small changes to these products can have a massive positive or negative effect in the aggregate. As I see it, choosing to engage in the messy, political process of making these changes - instead of washing your hands of it as somehow impure - is an act of idealism. &lt;/p&gt;
&lt;p&gt;I think the position of a software engineer in a large tech company is similar to people who go into public service: idealistically hoping that they can do some good, despite knowing that they themselves will never set the broad strokes of government policy.&lt;/p&gt;
&lt;p&gt;Of course, big-tech software engineers are paid far better, so many people who go into this kind of work are in fact purely financially-motivated cynics. But I’m not one of them! I think it’s possible, by doing good work, to help steer the giant edifice of a large tech company for the better.&lt;/p&gt;
&lt;h3&gt;Cynicism as inoculation&lt;/h3&gt;
&lt;p&gt;Cynical writing is like most medicines: the dose makes the poison. A healthy amount of cynicism can serve as an inoculation against becoming overly cynical.&lt;/p&gt;
&lt;p&gt;If you don’t have a slightly cynical explanation for why engineers write bad code in large tech companies - such as the one I write about &lt;a href=&quot;/bad-code-at-big-companies&quot;&gt;here&lt;/a&gt; - you risk adopting an overly cynical one. For instance, you might think that big tech engineers are being &lt;a href=&quot;https://news.ycombinator.com/item?id=46082989&quot;&gt;deliberately demoralized&lt;/a&gt; as part of an anti-labor strategy to prevent them from unionizing, which is nuts. Tech companies are simply not set up to engage in this kind of conspiracy.&lt;/p&gt;
&lt;p&gt;If you don’t have a slightly cynical explanation for why large tech companies sometimes make inefficient decisions - such as &lt;a href=&quot;/seeing-like-a-software-company&quot;&gt;this one&lt;/a&gt; - you risk adopting an overly cynical one. For instance, you might think that tech companies are full of incompetent &lt;a href=&quot;https://news.ycombinator.com/item?id=46133179&quot;&gt;losers&lt;/a&gt;, which is simply not true. Tech companies have a normal mix of strong and &lt;a href=&quot;/weak-engineers&quot;&gt;weak engineers&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Final thoughts&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Idealist writing is massively over-represented in writing about software engineering&lt;/strong&gt;. There is no shortage of books or blog posts (correctly) explaining that we ought to value good code, that we ought to be kind to our colleagues, that we ought to work on projects with positive real-world impact, and so on. There &lt;em&gt;is&lt;/em&gt; a shortage of writing that accurately describes how big tech companies operate.&lt;/p&gt;
&lt;p&gt;Of course, cynical writing can harm people: by making them sad, or turning them into bitter cynics. But &lt;strong&gt;idealist writing can harm people too&lt;/strong&gt;. There’s a whole generation of software engineers who came out of the 2010s with a &lt;em&gt;factually incorrect&lt;/em&gt; model of how big tech companies work, and who are effectively being fed into the woodchipper in the 2020s. They would be better off if they internalized a correct model of how these companies work: not just less likely to get into trouble, but better at achieving their own idealist goals&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;edit: this post got some traction on &lt;a href=&quot;https://news.ycombinator.com/item?id=46414723&quot;&gt;Hacker News&lt;/a&gt;, with many comments. Some &lt;a href=&quot;https://news.ycombinator.com/item?id=46415077&quot;&gt;commenters&lt;/a&gt; said that it’s incoherent to say “what I do is good, actually” when my employer is engaged in various unethical activity. Fair enough! But this post isn’t about whether it’s ethical to work for Microsoft or not. It’s a followup to &lt;a href=&quot;/bad-code-at-big-companies&quot;&gt;&lt;em&gt;How good engineers write bad code at big companies&lt;/em&gt;&lt;/a&gt; - the main cynicism I’m interested in here is not “big tech is evil”, but “big tech is incompetent”.&lt;/p&gt;
&lt;p&gt;Some &lt;a href=&quot;https://news.ycombinator.com/item?id=46415535&quot;&gt;other&lt;/a&gt; &lt;a href=&quot;https://news.ycombinator.com/item?id=46414906&quot;&gt;commenters&lt;/a&gt; challenged my claim that C-staff want to deliver good software by pointing out that they’re not willing to trade off their personal success to do so. Sure, I agree with that. The kind of person willing to sacrifice their career for things doesn’t typically make it to a C-level position. But it’s not always zero-sum. Good software makes money for software companies, after all.&lt;/p&gt;
&lt;p&gt;I also saw two commenters link &lt;a href=&quot;https://en.wikipedia.org/wiki/High-Tech_Employee_Antitrust_Litigation&quot;&gt;this&lt;/a&gt; as an example of big tech companies actually being engaged in conspiracies against their employees. I’m not convinced. Companies &lt;em&gt;are&lt;/em&gt; structurally set up to collude on salaries, but they’re not set up to deliberately make their employees sad - they just don’t have that kind of fine-grained control over the culture! To the extent they have any control, they try to make their employees happy so they’ll work for less money and not leave.&lt;/p&gt;
&lt;p&gt;edit: this post also got some &lt;a href=&quot;https://www.reddit.com/r/programming/comments/1rgg1wr/software_engineers_should_be_a_little_bit_cynical/&quot;&gt;Reddit comments&lt;/a&gt;, though as usual these are very low quality: almost all comments just respond to the title, not the article content.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;I don’t &lt;em&gt;think&lt;/em&gt; I’m strawmanning here - I’ve seen many people make all of these points in the past, and I suspect at least some readers will be genuinely nodding along to the following paragraph. If you’re one of those readers (or if you only agree with about 50%), consider doing me a favor and emailing me to let me know! If I don’t get any emails I will probably rewrite this.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;For some concrete details on this, see my post &lt;a href=&quot;/how-to-influence-politics&quot;&gt;&lt;em&gt;How I influence tech company politics as a staff software engineer&lt;/em&gt;&lt;/a&gt;. Also, if you’re interested, I wrote a much less well-developed version of this post right at the start of 2024, called &lt;a href=&quot;/cynicism&quot;&gt;&lt;em&gt;Is it cynical to do what your manager wants?&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[You can't design software you don't work on]]></title><link>https://seangoedecke.com/you-cant-design-software-you-dont-work-on/</link><guid isPermaLink="false">https://seangoedecke.com/you-cant-design-software-you-dont-work-on/</guid><pubDate>Sat, 27 Dec 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Only the engineers who work on a large software system can meaningfully participate in the design process. That’s because you cannot do good software design without an intimate understanding of the concrete details of the system. In other words, &lt;strong&gt;generic software design advice is typically useless&lt;/strong&gt; for most practical software design problems.&lt;/p&gt;
&lt;h3&gt;Generic software design&lt;/h3&gt;
&lt;p&gt;What is generic software design? It’s “designing to the problem”: the kind of advice you give when you have a reasonable understanding of the &lt;em&gt;domain&lt;/em&gt;, but very little knowledge of the existing &lt;em&gt;codebase&lt;/em&gt;. Unfortunately, this is the only kind of advice you’ll read in software books and blog posts&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. Engineers love giving generic software design advice for the same reason that all technical professionals love “talking shop”. However, you should be very careful about applying generic advice to your concrete day-to-day work problems&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When you’re doing real work, concrete factors dominate generic factors&lt;/strong&gt;. Having a clear understanding of what the code looks like right now is far, far more important than having a good grasp on general design patterns or principles. For instance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In large codebases, consistency is more important than “good design”. I won’t argue that point here, but I wrote about it at length in &lt;a href=&quot;/large-established-codebases&quot;&gt;&lt;em&gt;Mistakes engineers make in large established codebases&lt;/em&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Real codebases are typically full of complex, hard-to-predict consequences. If you want to make your change safely, that typically constrains your implementation choices down to a bare handful of possibilities.&lt;/li&gt;
&lt;li&gt;Large shared codebases never reflect a single design, but are always in some intermediate state between different software designs. How the codebase will hang together after an individual change is thus way more important than what ideal “north star” you’re driving towards.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In a world where you could rewrite the entire system at will, generic software design advice would be much more practical. Some projects are like this! But &lt;strong&gt;the majority of software engineering work is done on systems that cannot be safely rewritten&lt;/strong&gt;. These systems cannot rely on “software design”, but must instead rely on internal consistency and the carefulness of their engineers.&lt;/p&gt;
&lt;h3&gt;Concrete software design&lt;/h3&gt;
&lt;p&gt;What does good software design look like, then?&lt;/p&gt;
&lt;p&gt;In my experience, the most useful software design happens in conversations between a small group of engineers who all have deep understanding of the system, because they’re the ones working on it every day. These design discussions are often &lt;strong&gt;really boring&lt;/strong&gt; to outsiders, because they revolve around arcane concrete details of the system, not around general principles that any technical person can understand and have an opinion on.&lt;/p&gt;
&lt;p&gt;The kinds of topics being discussed are not “is DRY better than WET”, but instead “could we put this new behavior in subsystem A? No, because it needs information B, which isn’t available to that subsystem in context C, and we can’t expose that without rewriting subsystem D, but if we split up subsystem E here and here…”.&lt;/p&gt;
&lt;p&gt;Deep philosophical points about design are rarely important to the discussion. Instead, the most critical contributions point out small misunderstandings of concrete points, like: “oh, you thought B wasn’t available in context C, but we recently refactored C so now we could thread in B if we needed to”.&lt;/p&gt;
&lt;h3&gt;When generic software design is useful&lt;/h3&gt;
&lt;p&gt;Generic software design advice is not useful for practical software design problems, but that doesn’t mean it’s totally useless.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Generic software design advice is useful for building brand-new projects.&lt;/strong&gt; As I argued above, when you’re designing a new feature in an existing system, concrete factors of the system dominate. But when you’re designing a &lt;em&gt;new system&lt;/em&gt;, there are no concrete factors, so you can be entirely guided by generic advice.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Generic software design advice is useful for tie-breaking concrete design decisions.&lt;/strong&gt; I don’t think you should start with a generic design, but if you have a few candidate concrete pathways that all seem acceptable, generic principles can help you decide between them.&lt;/p&gt;
&lt;p&gt;This is particularly true at the level of the entire company. In other words, &lt;strong&gt;generic software design advice can help ensure consistency across different codebases&lt;/strong&gt;. This is one of the most useful functions of an official “software architect” role: to provide a set of general principles so that individual engineers can all tie-break their concrete decisions in the same direction&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Generic software design principles can also guide company-wide architectural decisions.&lt;/strong&gt; Should we run our services in our own datacenter, or in the cloud? Should we use k8s? AWS or Azure? Once you get broad enough, the concrete details of individual services almost don’t matter, because it’s going to be a huge amount of work either way. Still, even for these decisions, concrete details matter a lot. There are certain things you just can’t do in the cloud (like rely on bespoke hardware setups), or that you can’t do in your own datacenter (like deploy your service to the edge in twelve different regions). If the concrete details of your codebase rely on one of those things, you’ll be in for a bad time if you ignore them when making company-wide architectural decisions.&lt;/p&gt;
&lt;h3&gt;Architects and local minima&lt;/h3&gt;
&lt;p&gt;Those are all good reasons to do generic software design. One bad reason companies do generic software design is that it just sounds like a really good idea to people who aren’t working software engineers. Once you’re doing it, the incentives make it hard to stop. Many tech companies fall into this local minimum.&lt;/p&gt;
&lt;p&gt;Why not have your highest-paid software engineers spend their time exclusively making the most abstract, highest-impact decisions? You want your structural engineers to be drawing, not laying bricks, after all. I don’t know if structural engineering works like this, but I do know that software engineering doesn’t. In practice, &lt;strong&gt;software architecture advice often has to be ignored by the people on the ground&lt;/strong&gt;. There’s simply no way to actually translate it into something they can implement, in the context of the current system as it exists.&lt;/p&gt;
&lt;p&gt;However, for a practice that doesn’t work, “have your top engineers just do generic design” is surprisingly robust. &lt;strong&gt;Architects don’t have any skin in the game&lt;/strong&gt;&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, because their designs are handed off to actual engineering teams to implement. Because those designs can never be implemented perfectly, architects can both claim credit for successes (after all, it was their design) and disclaim failures (if only those fools had followed my design!).&lt;/p&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;When working on large existing codebases, useful software design discussions are way, way more concrete than many people believe. They typically involve talking about individual files or even lines of code. You thus can’t do useful software design without being intimately familiar with the codebase (in practice, that almost always means being an active contributor).&lt;/p&gt;
&lt;p&gt;Purely generic architecture is not &lt;em&gt;useless&lt;/em&gt;, but its role should be restricted to (a) setting out paved paths for brand new systems, (b) tie-breaking decisions on existing systems, and (c) helping companies make broad technology choices.&lt;/p&gt;
&lt;p&gt;In my opinion, formal “big-picture software architect” roles that spend all their time laying out the initial designs for projects are doomed to failure. They sound like a good idea (and they’re a good deal for the architect, who can claim credit without risking blame), but they provide very little value to the engineering teams that are tasked with actually writing the code.&lt;/p&gt;
&lt;p&gt;Personally, I believe that &lt;strong&gt;if you come up with the design for a software project, you ought to be responsible for the project’s success or failure&lt;/strong&gt;. That would rapidly ensure that the people designing software systems are the people who know how to ship software systems. It would also ensure that the &lt;em&gt;real&lt;/em&gt; software designers - the ones that have to take into account all the rough edges and warts of the codebase - get credit for the difficult design work they do.&lt;/p&gt;
&lt;p&gt;edit: this post got some &lt;a href=&quot;https://news.ycombinator.com/item?id=46418415&quot;&gt;comments&lt;/a&gt; on Hacker News. I was surprised to see some commenters disagreeing with my point about consistency. I remember the reception of &lt;a href=&quot;/large-established-codebases&quot;&gt;&lt;em&gt;Mistakes engineers make in large established codebases&lt;/em&gt;&lt;/a&gt; being quite positive. I was not surprised to see some commenters make the “haha, this is hypocritical because it is itself generic advice” point. I addressed this in the “when generic design is useful” section above.&lt;/p&gt;
&lt;p&gt;This post also got some &lt;a href=&quot;https://lobste.rs/s/72piqg/you_can_t_design_software_you_don_t_work_on&quot;&gt;comments&lt;/a&gt; on Lobste.rs. This is the rare case where the Lobste.rs comments are worse than the Hacker News comments: it’s mostly quibbling over the term “generic” and speculating over whether I wrote this post with a LLM (I didn’t).&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;I admit I’ve given my own generic software design advice &lt;a href=&quot;/good-api-design&quot;&gt;here&lt;/a&gt;, &lt;a href=&quot;/good-system-design&quot;&gt;here&lt;/a&gt;, &lt;a href=&quot;/great-software-design&quot;&gt;here&lt;/a&gt;, and probably a dozen other places.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;When I say “real work problems”, I’m talking here about &lt;a href=&quot;/pure-and-impure-engineering&quot;&gt;impure software engineering&lt;/a&gt;: codebases which are intended to solve actual business needs and are thus (a) full of compromises and (b) constantly in a state of change. If you’re working on an elegant single-purpose library or the software for space probes, I suspect much of this advice does not apply.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;Short of truly awful decisions, it almost doesn’t matter if those general principles are good or not - as with individual codebases, the benefits of consistency outweigh the benefits of having the “best” possible design.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;I mean the specific kind of architect I’m talking about here. The job title “architect” covers a great many different kinds of job, including just “very senior engineer doing normal engineering work”. Many architects don’t have an “architect” title at all, and are just senior/staff/distinguished software engineers who have ascended above having to do any actual implementation.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Nobody knows how large software products work]]></title><link>https://seangoedecke.com/nobody-knows-how-software-products-work/</link><guid isPermaLink="false">https://seangoedecke.com/nobody-knows-how-software-products-work/</guid><pubDate>Wed, 24 Dec 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Large, rapidly-moving tech companies are constantly operating in the “fog of war” about their own systems. Simple questions like “can users of type Y access feature X?”, “what happens when you perform action Z in this situation?”, or even “how many different plans do we offer?” often can only be answered by a handful of people in the organization. Sometimes there are &lt;em&gt;zero&lt;/em&gt; people at the organization who can answer them, and somebody has to be tasked with digging in like a researcher to figure it out.&lt;/p&gt;
&lt;p&gt;How can this be? Shouldn’t the engineers who built the software know what it does? Aren’t these answers documented internally? Better yet, aren’t these questions trivially answerable by looking at the public-facing documentation for end users? Tech companies are full of well-paid people who know what they’re doing&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. Why aren’t those people able to get clear on what their own product does?&lt;/p&gt;
&lt;h3&gt;Software is hard&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Large software products are prohibitively complicated&lt;/strong&gt;. I wrote a lot more about this in &lt;a href=&quot;/wicked-features&quot;&gt;&lt;em&gt;Wicked Features&lt;/em&gt;&lt;/a&gt;, but the short version is you can capture a &lt;em&gt;lot&lt;/em&gt; of value by adding complicated features. The classic examples are features that make the core product available to more users. For instance, the ability to self-host the software, or to trial it for free, or to use it as a large organization with centralized policy controls, or to use it localized in different languages, or to use it in countries with strict laws around how software can operate, or for highly-regulated customers like governments to use the software, and so on. These features are (hopefully) transparent to most users, &lt;strong&gt;but they cannot be transparent to the tech company itself&lt;/strong&gt;. &lt;/p&gt;
&lt;p&gt;Why are these features complicated? Because &lt;strong&gt;they affect every single other feature you build&lt;/strong&gt;. If you add organizations and policy controls, you must build a policy control for every new feature you add. If you localize your product, you must include translations for every new feature. And so on. Eventually you’re in a position where you’re trying to figure out whether a self-hosted enterprise customer in the EU is entitled to access a particular feature, and &lt;em&gt;nobody knows&lt;/em&gt; - you have to go and read through the code or do some experimenting to figure it out.&lt;/p&gt;
&lt;p&gt;Couldn’t you just not build these features in the first place? Sure, but it leaves a &lt;em&gt;lot&lt;/em&gt; of money on the table&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. In fact, maybe the biggest difference between a small tech company and a big one is that the big tech company is set up to capture a lot more value by pursuing all of these fiddly, awkward features.&lt;/p&gt;
&lt;h3&gt;Documentation&lt;/h3&gt;
&lt;p&gt;Why can’t you just document the interactions once when you’re building each new feature? I think this could work in theory, with a lot of effort and top-down support, but in practice it’s just really hard.&lt;/p&gt;
&lt;p&gt;The core problem is that &lt;strong&gt;the system is rapidly changing as you try to document it&lt;/strong&gt;. Even a single person can document a complex static system, given enough time, because they can just slowly work their way through it. But once the system starts changing, the people trying to document it now need to work faster than the rate of change in the system. It may be literally impossible to document it without implausible amounts of manpower.&lt;/p&gt;
&lt;p&gt;Worse, many behaviors of the system don’t necessarily have a lot of conscious intent behind them (or any). They just emerge from the way the system is set up, as interactions of a series of “default” choices. So the people working on the documentation are not just writing down choices made by engineers, &lt;strong&gt;they’re discovering how the system works for the first time&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;So who knows the answer?&lt;/h3&gt;
&lt;p&gt;The only reliable way to answer many of these questions is to look at the codebase. I think that’s actually the structural reason why engineers have institutional power at large tech companies. Of course, engineers are the ones who &lt;em&gt;write&lt;/em&gt; software, but it’s almost more important that they’re the ones who can &lt;strong&gt;answer questions about software&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;In fact, &lt;strong&gt;the ability to answer questions about software is one of the core functions of an engineering team&lt;/strong&gt;. The best understanding of a piece of software usually lives in the heads of the engineers who are working with it every day. If a codebase is owned by a healthy engineering team, you often don’t need anybody to go and investigate - you can simply ask the team as a whole, and at least one engineer will know the answer off the top of their head, because they’re already familiar with that part of the code.&lt;/p&gt;
&lt;p&gt;When tech companies reorg teams, they often destroy this tacit knowledge. If there’s no team with experience in a piece of software, questions have to be answered by &lt;em&gt;investigation&lt;/em&gt;: some engineer has to go and find out. Typically this happens by some combination of interacting with the product (maybe in a dev environment where it’s easy to set up particular scenarios), reading through the codebase, or even performing “exploratory surgery” to see what happens when you change bits of code or force certain checks to always return true. This is a separate technical skill from writing code (though of course the two skills are related).&lt;/p&gt;
&lt;h3&gt;It’s easier to write software than to explain it&lt;/h3&gt;
&lt;p&gt;In my experience, most engineers can write software, but few can reliably answer questions about it. I don’t know why this should be so. Don’t you need to answer questions about software in order to write new software? Nevertheless, it’s true. My best theory is that &lt;strong&gt;it’s a confidence thing&lt;/strong&gt;. Many engineers would rather be on the hook for their &lt;em&gt;code&lt;/em&gt; (which at least works on their machine) than their &lt;em&gt;answers&lt;/em&gt; (which could be completely wrong).&lt;/p&gt;
&lt;p&gt;I wrote about this in &lt;a href=&quot;/clarity&quot;&gt;&lt;em&gt;How I provide technical clarity to non-technical leaders&lt;/em&gt;&lt;/a&gt;. The core difficulty is that &lt;strong&gt;you’re always going out on a limb&lt;/strong&gt;. You have to be comfortable with the possibility that you’re dead wrong, which is a different mindset to writing code (where you can often prove that your work is correct). You’re also able to be as verbose as you like when writing code - certainly when writing tests - but when you’re answering questions you have to boil things down to a summary. Many software engineers &lt;em&gt;hate&lt;/em&gt; leaving out details.&lt;/p&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;Non-technical people - at least, ones without a lot of experience working with software products - often believe that software systems are well-understood by the engineers who build them. The idea here is that the system should be understandable because it’s built line-by-line from (largely) deterministic components.&lt;/p&gt;
&lt;p&gt;However, while this may be true of small pieces of software, &lt;strong&gt;this is almost never true of large software systems&lt;/strong&gt;. Large software systems are very poorly understood, even by the people most in a position to understand them. Even really basic questions about what the software does often require &lt;em&gt;research&lt;/em&gt; to answer. And once you do have a solid answer, it may not be solid for long - each change to a codebase can introduce nuances and exceptions, so you’ve often got to go research the same question multiple times.&lt;/p&gt;
&lt;p&gt;Because of all this, &lt;strong&gt;the ability to accurately answer questions about large software systems is extremely valuable&lt;/strong&gt;.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;I’m not being sarcastic here, I think this is literally true and if you disagree you’re being misled by your own cynicism.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;I first read this point from &lt;a href=&quot;https://danluu.com/sounds-easy/&quot;&gt;Dan Luu&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[AI detection tools cannot prove that text is AI-generated]]></title><link>https://seangoedecke.com/ai-detection/</link><guid isPermaLink="false">https://seangoedecke.com/ai-detection/</guid><pubDate>Fri, 05 Dec 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The runaway success of generative AI has spawned a &lt;a href=&quot;https://www.novaoneadvisor.com/report/ai-detector-market&quot;&gt;billion-dollar&lt;/a&gt; sub-industry of “AI detection tools”: tools that purport to tell you if a piece of text was written by a human being or generated by an AI tool like ChatGPT. How could that possibly work?&lt;/p&gt;
&lt;p&gt;I think these tools are both impressive and useful, and will likely get better. However, I am very worried about the general public overestimating how reliable they are. &lt;strong&gt;AI detection tools cannot prove that text is AI-generated.&lt;/strong&gt; &lt;/p&gt;
&lt;h3&gt;Why AI detection is hard&lt;/h3&gt;
&lt;p&gt;My initial reaction when I heard about these tools was “there’s no way that could ever work”. I think that initial reaction is broadly correct, because the core idea of AI detection tools - that there is an intrinsic difference between human-generated writing and AI-generated writing - is just fundamentally mistaken&lt;sup id=&quot;fnref-0&quot;&gt;&lt;a href=&quot;#fn-0&quot; class=&quot;footnote-ref&quot;&gt;0&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Large language models learn from huge training sets of human-written text. They learn to generate text that is as close as possible to the text in their training data. It’s this data that determines the basic “voice” of an AI model, not anything about the fact that it’s an AI model. A model trained on Shakespeare will sound like Shakespeare, and so on. You could train a thousand different models on a thousand different training sets without finding a common “model voice” or signature that all of them share.&lt;/p&gt;
&lt;p&gt;We can thus say (almost &lt;em&gt;a priori&lt;/em&gt;) that &lt;strong&gt;AI detection tools cannot prove that text is AI-generated.&lt;/strong&gt; Anything generated by a language model is &lt;em&gt;by definition&lt;/em&gt; the kind of thing that could have been generated by a human.&lt;/p&gt;
&lt;h3&gt;Why AI detection tools might work anyway&lt;/h3&gt;
&lt;p&gt;But of course it’s possible to tell when something was written by AI! When I read Twitter replies, the obviously-LLM-generated ones stick out like a sore thumb. I wrote about this in &lt;a href=&quot;/on-slop&quot;&gt;&lt;em&gt;Why does AI slop feel so bad to read?&lt;/em&gt;&lt;/a&gt;. How can this be possible, when it’s impossible to prove that something was written by AI?&lt;/p&gt;
&lt;p&gt;Part of the answer here might just be that &lt;strong&gt;current-generation AI models have a really annoying “house style”, and any humans writing in the same style are annoying in the same way&lt;/strong&gt;. When I read the first sentence of a blog post and think “oh, this is AI slop, no need to keep reading”, I don’t actually care whether it’s AI or not. If it’s a human, they’re still writing in the style of AI slop, and I still don’t want to read the rest of the post.&lt;/p&gt;
&lt;p&gt;However, I think there’s more going on here. Claude does kind of sound like ChatGPT a lot of the time, even though they’re different models trained in different ways on (at least partially) different data. I think the optimistic case for AI detection tooling goes something like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback&quot;&gt;RLHF&lt;/a&gt; and instruction/safety tuning pushes all strong LLMs towards the same kind of tone and style&lt;/li&gt;
&lt;li&gt;That tone and style can be automatically detected by training a classifier model&lt;/li&gt;
&lt;li&gt;Sure, it’s possible for technically-sophisticated users to use &lt;a href=&quot;https://huggingface.co/blog/mlabonne/abliteration&quot;&gt;abliterated&lt;/a&gt; LLMs or less-safety-tuned open models, but 99% of users will just be using ChatGPT or Claude (particularly if they’re lazy enough to cheat on their essays in the first place)&lt;/li&gt;
&lt;li&gt;Thus a fairly simple “ChatGPT/Claude/Gemini prose style detector” can get you 90% of the way towards detecting most people using LLMs to write their essays&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I find this fairly compelling, &lt;strong&gt;so long as you’re okay with a 90% success rate&lt;/strong&gt;. A 90% success rate can be surprisingly bad if the base rate is low, as illustrated by the classic &lt;a href=&quot;https://tomrocksmaths.com/2021/08/31/bayes-theorem-and-disease-testing/&quot;&gt;Bayes’ theorem example&lt;/a&gt;. If 10% of essays in a class are AI-written, and your detector is 90% accurate, then &lt;em&gt;only half&lt;/em&gt; of the essays it flags will be truly AI-written. If an AI detection tool thinks a piece of writing is AI, you should treat that as “kind of suspicious” instead of conclusive proof.&lt;/p&gt;
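&lt;p&gt;To make the arithmetic concrete, here’s a quick back-of-the-envelope check of that claim. This is just my own illustration using the numbers above, not data from any real detector:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Bayes' theorem with a 10% base rate and a "90% accurate" detector
# (90% sensitivity, 10% false-positive rate). Illustrative numbers only.
base_rate = 0.10       # fraction of essays that are actually AI-written
sensitivity = 0.90     # P(flagged | AI-written)
false_positive = 0.10  # P(flagged | human-written)

true_flags = base_rate * sensitivity            # 0.09
false_flags = (1 - base_rate) * false_positive  # 0.09
posterior = true_flags / (true_flags + false_flags)

print(f"P(AI-written | flagged) = {posterior:.0%}")  # 50%
&lt;/code&gt;&lt;/pre&gt;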
&lt;h3&gt;How do AI detection tools work?&lt;/h3&gt;
&lt;p&gt;There are a few different approaches to building AI detection tools. The naive approach - which I couldn’t find any actual production examples of - would be to train a simple text classifier on a body of human-written and AI-written text. Apparently this doesn’t work particularly well. The &lt;a href=&quot;https://arxiv.org/pdf/2305.15047&quot;&gt;Ghostbuster paper&lt;/a&gt; tried this and decided that it was easier to train a classifier on the logits themselves: they pass each candidate document through a bunch of simple LLMs, record how much each LLM “agreed” with the text, then train their classifier on that data. &lt;a href=&quot;https://arxiv.org/abs/2305.17359&quot;&gt;DNA-GPT&lt;/a&gt; takes an even simpler approach: they truncate a candidate document, regenerate the last half via frontier LLMs, and then compare that with the actual last half.&lt;/p&gt;
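&lt;p&gt;For a sense of how simple the DNA-GPT idea is, here’s a minimal sketch of that truncate-and-regenerate approach. It’s my own rough reconstruction, assuming a hypothetical &lt;code&gt;llm_complete&lt;/code&gt; function standing in for a frontier-LLM API call - not the paper’s actual implementation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Sketch of a DNA-GPT-style check: regenerate the second half of a document
# and measure word n-gram overlap with the real second half. AI-generated
# text should overlap more with the model's own continuations.
# `llm_complete` is a hypothetical stand-in for a frontier LLM API call.

def ngram_overlap(a: str, b: str, n: int = 4) -&gt; float:
    """Fraction of word n-grams in `a` that also appear in `b`."""
    words_a, words_b = a.split(), b.split()
    grams_a = {tuple(words_a[i:i + n]) for i in range(len(words_a) - n + 1)}
    grams_b = {tuple(words_b[i:i + n]) for i in range(len(words_b) - n + 1)}
    return len(grams_a.intersection(grams_b)) / max(len(grams_a), 1)

def dna_gpt_score(document: str, llm_complete, samples: int = 10) -&gt; float:
    """Higher scores suggest `document` came from a model like `llm_complete`."""
    words = document.split()
    prefix = " ".join(words[: len(words) // 2])
    actual_tail = " ".join(words[len(words) // 2 :])
    overlaps = [ngram_overlap(llm_complete(prefix), actual_tail)
                for _ in range(samples)]
    return sum(overlaps) / samples
&lt;/code&gt;&lt;/pre&gt;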
&lt;p&gt;The most impressive paper I’ve seen is &lt;a href=&quot;https://arxiv.org/pdf/2510.03154&quot;&gt;the EditLens paper&lt;/a&gt; by Pangram Labs. EditLens trains a model on text that was &lt;em&gt;edited&lt;/em&gt; by AI to various extents, not generated from scratch, so the model can learn to predict the granular degree of AI involvement in a particular text. This plausibly gets you a much better classifier than a strict “AI or not” classifier model, because each example teaches the model a numeric &lt;em&gt;value&lt;/em&gt; instead of a single bit of information.&lt;/p&gt;
&lt;p&gt;One obvious point: &lt;strong&gt;all of these tools use AI themselves&lt;/strong&gt;. There is simply no way to detect the presence of AI writing without either training your own model or running inference via existing frontier models. This is bad news for the most militantly anti-AI people, who would prefer not to use AI for any reason, even to catch other people using AI. It also means that - as I said earlier and will say again - &lt;strong&gt;AI detection tools cannot prove that text is AI-generated&lt;/strong&gt;. Even the best detection tools can only say that it’s extremely likely.&lt;/p&gt;
&lt;h3&gt;Humanizing tools&lt;/h3&gt;
&lt;p&gt;Interestingly, there’s a sub-sub-industry of “humanizing” tools that aim to convert your AI-generated text into text that will be judged by AI detection tools as “human”. Some free AI detection tools are actually sales funnels for these humanizing tools, and will thus deliberately produce a lot of false positives so users will pay for the humanizing service. For instance, I ran one of my blog posts&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; through &lt;a href=&quot;https://justdone.com/ai-detector&quot;&gt;JustDone&lt;/a&gt;, which assessed it as 90% AI-generated and offered to fix it up for the low, low price of $40 per month.&lt;/p&gt;
&lt;p&gt;These tools don’t say this outright, but of course the “humanizing” process involves passing your writing through a LLM that’s either prompted or fine-tuned to produce less-LLM-sounding content. I find this pretty ironic. There are probably a bunch of students who have been convinced by one of these tools to make their human-written essay LLM-generated, out of (justified) paranoia that a false-positive would get them in real trouble with their school or university.&lt;/p&gt;
&lt;h3&gt;False positives and social harm&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;It is to almost everyone’s advantage to pretend that these tools are better than they are.&lt;/strong&gt; The companies that make up the billion-dollar AI detection tool industry obviously want to pretend like they’re selling a perfectly reliable tool. University and school administrators want to pretend like they’ve got the problem under control. People on the internet like dunking on people by posting a screenshot that “proves” they’re copying their messages from ChatGPT.&lt;/p&gt;
&lt;p&gt;Even the AI labs themselves would like to pretend that AI detection is easy and reliable, since it would relieve them of some of the responsibility they bear for effectively wrecking the education system. OpenAI actually released their own &lt;a href=&quot;https://openai.com/index/new-ai-classifier-for-indicating-ai-written-text/&quot;&gt;AI detection tool&lt;/a&gt; in January 2023, before &lt;a href=&quot;https://decrypt.co/149826/openai-quietly-shutters-its-ai-detection-tool&quot;&gt;retiring it&lt;/a&gt; six months later due to “its low rate of accuracy”.&lt;/p&gt;
&lt;p&gt;The real people who suffer from this mirage are the people who are trying to write, but now have to deal with being mistakenly judged for passing AI writing off as their own. I know students who are second-guessing how they write in order to sound “less like AI”, or who are recording their keystrokes or taking photos of drafts in order to have some kind of evidence that they can use against false positives.&lt;/p&gt;
&lt;p&gt;If you are in a position where you’re required to judge if people are using AI to write their articles or essays, I would urge you to be realistic about the capabilities of AI detection tooling. They can make educated guesses about whether text was written by AI, but that’s all they are: educated guesses. That goes double if you’re using a detection tool that also offers a “humanizing” service, since those tools are incentivized to produce false positives.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI detection tools cannot prove that text is AI-generated.&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-0&quot;&gt;
&lt;p&gt;People sometimes talk about &lt;em&gt;watermarking&lt;/em&gt;: when a provider like OpenAI deliberately trains their model to output text in some cryptographic way that leaves a very-hard-to-fake fingerprint. For instance, maybe it could always output text where the frequency of “e”s divided by the frequency of “l”s approximates pi. That would be very hard for humans to copy! I suspect there’s &lt;em&gt;some&lt;/em&gt; kind of watermarking going on already (OpenAI models output weird space characters, which might trip up people naively copy-pasting their content) but I’m not going to talk about it in this post, because (a) sophisticated watermarking harms model capability so I don’t think anyone’s doing it, and (b) unsophisticated watermarking is easily avoided.&lt;/p&gt;
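&lt;p&gt;For what it’s worth, the toy scheme above would be trivial to check mechanically - a hypothetical illustration, not a real watermark:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import math

def toy_watermark_score(text: str) -&gt; float:
    """Distance of the e/l frequency ratio from pi; near zero = 'watermarked'."""
    e_count = text.lower().count("e")
    l_count = text.lower().count("l")
    return abs(e_count / max(l_count, 1) - math.pi)
&lt;/code&gt;&lt;/pre&gt;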
&lt;a href=&quot;#fnref-0&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;I write every one of these posts with my own human fingers.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[How good engineers write bad code at big companies]]></title><link>https://seangoedecke.com/bad-code-at-big-companies/</link><guid isPermaLink="false">https://seangoedecke.com/bad-code-at-big-companies/</guid><pubDate>Sat, 29 Nov 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Every couple of years &lt;a href=&quot;https://ziglang.org/news/migrating-from-github-to-codeberg/&quot;&gt;somebody&lt;/a&gt; &lt;a href=&quot;https://github.com/microsoft/terminal/issues/10362&quot;&gt;notices&lt;/a&gt; that large tech companies sometimes produce surprisingly sloppy code. If you haven’t worked at a big company, it might be hard to understand how this happens. Big tech companies pay well enough to attract many competent engineers. They move slowly enough that it looks like they’re able to take their time and do solid work. How does bad code happen?&lt;/p&gt;
&lt;h3&gt;Most code changes are made by relative beginners&lt;/h3&gt;
&lt;p&gt;I think the main reason is that &lt;strong&gt;big companies are full of engineers working outside their area of expertise&lt;/strong&gt;. The average big tech employee stays for only &lt;a href=&quot;https://stackoverflow.blog/2022/04/19/whats-the-average-tenure-of-an-engineer-at-a-big-tech-company-ep-434/&quot;&gt;a year or two&lt;/a&gt;&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. In fact, big tech compensation packages are typically designed to put a four-year cap on engineer tenure: after four years, the initial share grant is fully vested, causing engineers to take what can be a 50% pay cut. Companies do extend temporary yearly refreshes, but this structure obviously incentivizes engineers to go find another job where they don’t have to wonder if they’re going to get the other half of their compensation each year.&lt;/p&gt;
&lt;p&gt;If you count internal mobility, it’s even worse. The longest I have ever stayed on a single team or codebase was three years, near the start of my career. I expect to be &lt;a href=&quot;https://www.youtube.com/watch?v=yDcaRklX7q4&quot;&gt;re-orged&lt;/a&gt; at least every year, and often much more frequently.&lt;/p&gt;
&lt;p&gt;However, the average tenure of a codebase in a big tech company is a lot longer than that. Many of the services I work on are a decade old or more, and have had many, many different owners over the years. That means many big tech engineers are constantly “figuring it out”. &lt;strong&gt;A pretty high percentage of code changes are made by “beginners”:&lt;/strong&gt; people who have onboarded to the company, the codebase, or even the programming language in the past six months.&lt;/p&gt;
&lt;h3&gt;Old hands&lt;/h3&gt;
&lt;p&gt;To some extent, this problem is mitigated by “old hands”: engineers who happen to have been in the orbit of a particular system for long enough to develop real expertise. These engineers can give deep code reviews and reliably catch obvious problems. But relying on “old hands” has two problems. &lt;/p&gt;
&lt;p&gt;First, &lt;strong&gt;this process is entirely informal&lt;/strong&gt;. Big tech companies make surprisingly little effort to develop long-term expertise in individual systems, and once they’ve got it they seem to barely care at all about retaining it. Often the engineers in question are moved to different services, and have to either keep up their “old hand” duties on an effectively volunteer basis, or abandon them and become a relative beginner on a brand new system.&lt;/p&gt;
&lt;p&gt;Second, &lt;strong&gt;experienced engineers are always overloaded&lt;/strong&gt;. It is a &lt;em&gt;busy&lt;/em&gt; job being one of the few engineers who has deep expertise on a particular service. You don’t have enough time to personally review every software change, or to be actively involved in every decision-making process. Remember that &lt;em&gt;you also have your own work to do&lt;/em&gt;: if you spend all your time reviewing changes and being involved in discussions, you’ll likely be punished by the company for not having enough individual output.&lt;/p&gt;
&lt;h3&gt;The median productive engineer&lt;/h3&gt;
&lt;p&gt;Putting all this together, what does the median productive&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; engineer at a big tech company look like? They are usually:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;competent enough to pass the hiring bar and be able to do the work, but either&lt;/li&gt;
&lt;li&gt;working on a codebase or language that is largely new to them, or&lt;/li&gt;
&lt;li&gt;trying to stay on top of a flood of code changes while also juggling their own work.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;They are almost certainly working to a deadline, or to a series of overlapping deadlines for different projects. In other words, &lt;strong&gt;they are trying to do their best in an environment that is not set up to produce quality code.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;That’s how “obviously” bad code happens. For instance, a junior engineer picks up a ticket for an annoying bug in a codebase they’re barely familiar with. They spend a few days figuring it out and come up with a hacky solution. One of the more senior “old hands” (if they’re lucky) glances over it in a spare half-hour, vetoes it, and suggests something slightly better that would at least work. The junior engineer implements that as best they can, tests that it works, it gets briefly reviewed and shipped, and everyone involved immediately moves on to higher-priority work. Five years later somebody notices this&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; and thinks “wow, that’s hacky - how did such bad code get written at such a big software company”?&lt;/p&gt;
&lt;h3&gt;Big tech companies are fine with this&lt;/h3&gt;
&lt;p&gt;I have written a lot about the internal tech company dynamics that contribute to this. Most directly, in &lt;a href=&quot;/seeing-like-a-software-company&quot;&gt;&lt;em&gt;Seeing like a software company&lt;/em&gt;&lt;/a&gt; I argue that big tech companies consistently prioritize internal &lt;em&gt;legibility&lt;/em&gt; - the ability to see at a glance who’s working on what and to change it at will - over productivity. Big companies know that treating engineers as fungible and moving them around destroys their ability to develop long-term expertise in a single codebase. &lt;strong&gt;That’s a deliberate tradeoff.&lt;/strong&gt; They’re giving up some amount of expertise and software quality in order to gain the ability to rapidly deploy skilled engineers onto whatever the problem-of-the-month is.&lt;/p&gt;
&lt;p&gt;I don’t know if this is a good idea or a bad idea. It certainly seems to be working for the big tech companies, particularly now that “how fast can you pivot to something AI-related” is so important. But if you’re doing this, then &lt;em&gt;of course&lt;/em&gt; you’re going to produce some genuinely bad code. That’s what happens when you ask engineers to rush out work on systems they’re unfamiliar with.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Individual engineers are entirely powerless to alter this dynamic&lt;/strong&gt;. This is particularly true in 2025, when &lt;a href=&quot;/good-times-are-over&quot;&gt;the balance of power has tilted&lt;/a&gt; away from engineers and towards tech company leadership. The most you can do as an individual engineer is to try and become an “old hand”: to develop expertise in at least one area, and to use it to block the worst changes and steer people towards at least minimally-sensible technical decisions. But even that is often swimming against the current of the organization, and if inexpertly done can cause you to get &lt;a href=&quot;https://www.reddit.com/r/csMajors/comments/1et7miz/what_you_need_to_know_about_performance/&quot;&gt;PIP-ed&lt;/a&gt; or worse.&lt;/p&gt;
&lt;h3&gt;Pure and impure engineering&lt;/h3&gt;
&lt;p&gt;I think a lot of this comes down to the distinction between &lt;a href=&quot;/pure-and-impure-engineering&quot;&gt;pure and impure software engineering&lt;/a&gt;. To pure engineers - engineers working on self-contained technical projects, like &lt;a href=&quot;https://ziglang.org/&quot;&gt;a programming language&lt;/a&gt; - the only explanation for bad code is incompetence. But impure engineers operate more like plumbers or electricians. They’re working to deadlines on projects that are relatively new to them, and even if their technical fundamentals are impeccable, there’s always &lt;em&gt;something&lt;/em&gt; about the particular setup of this situation that’s awkward or surprising. To impure engineers, bad code is inevitable. As long as the overall system works well enough, the project is a success.&lt;/p&gt;
&lt;p&gt;At big tech companies, engineers don’t get to decide if they’re working on pure or impure engineering work. It’s &lt;a href=&quot;/not-your-codebase&quot;&gt;not their codebase&lt;/a&gt;! If the company wants to move you from working on database infrastructure to building the new payments system, they’re fully entitled to do that. The fact that you might make some mistakes in an unfamiliar system - or that your old colleagues on the database infra team might suffer without your expertise - is a deliberate tradeoff being made by &lt;strong&gt;the company, not the engineer&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;It’s fine to point out examples of bad code at big companies. If nothing else, it can be an effective way to get those specific examples fixed, since execs usually jump at the chance to turn bad PR into good PR. But I think it’s a mistake&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; to attribute primary responsibility to the engineers at those companies. If you could wave a magic wand and make every engineer twice as strong, &lt;em&gt;you would still have bad code&lt;/em&gt;, because almost nobody can come into a brand new codebase and quickly make changes with zero mistakes. The root cause is that &lt;strong&gt;most big company engineers are forced to do most of their work in unfamiliar codebases&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;edit: this post got lots of comments on both &lt;a href=&quot;https://news.ycombinator.com/item?id=46082223&quot;&gt;Hacker News&lt;/a&gt; and &lt;a href=&quot;https://lobste.rs/s/jxppk7/how_good_engineers_write_bad_code_at_big&quot;&gt;lobste.rs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It was surprising to me that &lt;a href=&quot;https://lobste.rs/c/glawav&quot;&gt;many&lt;/a&gt; &lt;a href=&quot;https://news.ycombinator.com/item?id=46083941&quot;&gt;commenters&lt;/a&gt; find this point of view unpleasantly nihilistic. I consider myself fairly optimistic about my work. In fact, I meant this post as a rousing defence of big tech software engineers from their &lt;a href=&quot;https://ziglang.org/news/migrating-from-github-to-codeberg&quot;&gt;critics&lt;/a&gt;! Still, I found this &lt;a href=&quot;https://alexwennerberg.com/blog/2025-11-28-engineering.html&quot;&gt;response blog post&lt;/a&gt; to be an excellent articulation of the “this is too cynical” position, and will likely write a followup post about it soon (edit: &lt;a href=&quot;/a-little-bit-cynical&quot;&gt;&lt;em&gt;Software engineers should be a little bit cynical&lt;/em&gt;&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Some Hacker News commenters had alternate theories for why bad code happens: &lt;a href=&quot;https://news.ycombinator.com/item?id=46083625&quot;&gt;lack of motivation&lt;/a&gt;, deliberately &lt;a href=&quot;https://news.ycombinator.com/item?id=46082989&quot;&gt;demoralizing engineers&lt;/a&gt; so they won’t unionize, or just purely optimizing for speed. I don’t find these compelling, based on my own experience. Many of my colleagues are highly motivated, and I just don’t believe any tech company is deliberately trying to make its engineers demoralized and unhappy.&lt;/p&gt;
&lt;p&gt;A few readers &lt;a href=&quot;https://lobste.rs/c/gq4ao9&quot;&gt;disagreed with me&lt;/a&gt; about RSUs providing an incentive to leave, because their companies give stock refreshers. I’m not convinced. I get refreshers too, but if they’re not in the contract, they don’t change the incentive: the company can pause them at will, instantly cutting (say) 50% of your comp, while taking a new job locks that comp in for another four years.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;I struggled to find a good original source on this. There’s a 2013 PayScale &lt;a href=&quot;https://www.payscale.com/data-packages/employee-loyalty/least-loyal-employees&quot;&gt;report&lt;/a&gt; citing a 1.1 year median turnover at Google, which seems low.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Many engineers at big tech companies are not productive, but that’s a post all to itself. I don’t want to get into it here for two reasons. First, I think competent engineers produce enough bad code that it’s fine to be a bit generous and just scope the discussion to them. Second, even if an incompetent engineer wrote the code, there are almost always competent engineers who could have reviewed it, and the question of why that didn’t happen is still interesting.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;The example I’m thinking of here is not the &lt;a href=&quot;https://ziglang.org/news/migrating-from-github-to-codeberg/&quot;&gt;recent GitHub Actions one&lt;/a&gt;, which I have no first-hand experience of. I can think of at least ten separate instances of this happening to me.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;In my view, mainly a failure of &lt;em&gt;imagination&lt;/em&gt;: thinking that your own work environment must be pretty similar to everyone else’s.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Becoming unblockable]]></title><link>https://seangoedecke.com/unblockable/</link><guid isPermaLink="false">https://seangoedecke.com/unblockable/</guid><pubDate>Wed, 26 Nov 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;With enough careful effort, it’s possible to become unblockable. In other words, you can put yourself in a position where you’re always able to make forward progress on your goals.&lt;/p&gt;
&lt;p&gt;I wrote about this six months ago in &lt;a href=&quot;/becoming-unblockable&quot;&gt;&lt;em&gt;Why strong engineers are rarely blocked&lt;/em&gt;&lt;/a&gt;, but I wanted to take another crack at it and give some more concrete advice.&lt;/p&gt;
&lt;h3&gt;Work on more than one thing&lt;/h3&gt;
&lt;p&gt;The easiest way to avoid being blocked is to have more than one task on the go. Like a CPU core switching between threads, if you’re responsible for multiple streams of work, you can deal with one stream getting blocked by rolling onto another one. While one project might be blocked, &lt;em&gt;you&lt;/em&gt; are not: you can continue getting stuff done.&lt;/p&gt;
&lt;p&gt;Because of this, I almost always have more than one task on my plate. However, there’s a lot of nuance involved in doing this correctly. The worst thing you can do is to be responsible for two urgent tasks at the same time - no matter how hard you work, one of them will always be making no progress, which is very bad&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. If you’ve got too many ongoing tasks at the same time, you also risk overloading yourself if one or two of them suddenly blow out. It’s famously hard to scope engineering work. In a single day, you can go from having two or three trivial tasks to having three big jobs at the same time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I do not recommend just mindlessly picking up an extra ticket from your project board.&lt;/strong&gt; Instead, try to have some non-project work floating around: refactors, performance work, writing performance reviews, mandatory training, and so on. It can be okay to pick up an extra ticket if you’re tactical about which one you pick. Try to avoid having two important tasks on the go at the same time.&lt;/p&gt;
&lt;h3&gt;Sequence your work correctly&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Plan out projects from the start to minimize blockers.&lt;/strong&gt; This section is more relevant for projects that you yourself are running, but the principle holds even for smaller pieces of work.&lt;/p&gt;
&lt;p&gt;If you think something is likely to get blocked (for instance, maybe database migrations at your company are run by a dedicated team with a large backlog), &lt;strong&gt;do it as early as possible&lt;/strong&gt;. That way you can proceed with the rest of the project while you wait. Getting this wrong can add weeks to a project. Likewise, if there’s a part of your project that’s likely to be controversial, do it early so you can keep working on the rest of the project while the debate rages on. &lt;/p&gt;
&lt;h3&gt;Be ruthless about your tooling&lt;/h3&gt;
&lt;p&gt;Do &lt;em&gt;whatever it takes&lt;/em&gt; to have a stable and reliable developer environment. I don’t think it’s possible to overstate the importance of this. The stability of your developer environment directly determines how much of a workday you can spend actually doing work.&lt;/p&gt;
&lt;p&gt;For instance, &lt;strong&gt;use as normal a developer stack as possible&lt;/strong&gt;. At GitHub, most development is done in &lt;a href=&quot;https://github.com/features/codespaces&quot;&gt;Codespaces&lt;/a&gt;, a platform for server-hosted dev containers. You can connect to a codespace with almost any IDE, but the majority of people use VSCode, &lt;em&gt;so I use VSCode&lt;/em&gt;, with as few plugins as possible&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. I think a lot of developers are too focused on their personal “top speed” with their developer environment when everything is working great, and under-emphasize how much time they spend tweaking config, patching dotfiles, and troubleshooting in general.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix developer environment problems as quickly as production incidents.&lt;/strong&gt; If you can’t run tests or run a local server, don’t half-ass the troubleshooting process - focus on it until it’s fixed. On the flip side, don’t treat it as a leisurely learning experience (say, about how macOS handles Dockerized networking). In many circumstances you’re probably better off tearing down and re-creating everything than digging in and trying to patch the specific issue.&lt;/p&gt;
&lt;p&gt;If your developer environment really is irreparably broken - maybe you’re waiting on new hardware, or you’re making a one-off change to a service that you don’t have the right dev environment permissions for - &lt;strong&gt;be scrappy about finding alternatives&lt;/strong&gt;. If you can’t run tests, your GitHub CI probably can. If you can’t run a server locally, can you deploy to a staging environment and test there? Be careful about doing this in your main developer environment. You’re usually better off spending the time to actually fix the problem. But when you can’t, you should be creative about how you can keep working instead of just giving up.&lt;/p&gt;
&lt;h3&gt;Debug outside of your area of responsibility&lt;/h3&gt;
&lt;p&gt;I see a lot of engineers run into a weird thing - commonly a 403 or 400 status code from some other service - and say “oh, I’m blocked, I need this other service’s owners to investigate”. &lt;strong&gt;You can and should investigate yourself.&lt;/strong&gt; This is particularly true if you’ve got access to the codebase. If you’re getting an error, go and search their codebase to see what could be causing the error. Find the logs for your request to see if there’s anything relevant there. Of course, you won’t be able to dig as deep as engineers with real domain expertise, but often &lt;strong&gt;it doesn’t take domain expertise&lt;/strong&gt; to solve your particular problem.&lt;/p&gt;
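&lt;p&gt;To make that concrete, here’s a minimal sketch of what “investigate it yourself” can look like: searching a local checkout of the other service for the error text you’re seeing. The repo path, file extension, and error string are all hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from pathlib import Path

ERROR_TEXT = 'invalid request signature'  # hypothetical: the text of your 403/400
REPO = Path('../payments-service')        # hypothetical: local checkout of the other service

# Scan the other team's codebase for wherever that error gets raised
for path in REPO.rglob('*.rb'):
    for lineno, line in enumerate(path.read_text(errors='ignore').splitlines(), 1):
        if ERROR_TEXT in line:
            print(f'{path}:{lineno}: {line.strip()}')
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Even if this only turns up the constant where the error is defined, that’s usually enough to ask the owning team a much sharper question.&lt;/p&gt;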
&lt;p&gt;There’s even less excuse not to do this now that AI agents are ubiquitous. Point Codex (or Copilot agent mode, or Claude Code, or whatever you have access to) at the codebase in question and ask “why might I be seeing this error with this specific request?” In my experience, you get the correct answer about a third of the time, which is &lt;em&gt;amazing&lt;/em&gt;. Instead of waiting for hours or days to get help, you can spend ten minutes waiting for the agent and half an hour checking its work.&lt;/p&gt;
&lt;p&gt;Even if you can’t solve the problem yourself, &lt;strong&gt;a bit of research can often make your request for help much more compelling&lt;/strong&gt;. As a service owner, there’s nothing more dispiriting than getting a “help, I get weird 400 errors” message - you know you’re going to spend a lot of time trawling through the logs before you can even figure out what the problem is, let alone how to reproduce it. But if the message already contains a link to the logs, or the text of a specific error, that immediately tells you where to start looking.&lt;/p&gt;
&lt;h3&gt;Build relationships&lt;/h3&gt;
&lt;p&gt;There are typically two ways to do anything in a large tech company: the formal, &lt;a href=&quot;/seeing-like-a-software-company&quot;&gt;legible&lt;/a&gt; way, and the informal way. As an example, it’s common to have an “ask for code review” Slack channel, which is full of engineers posting their changes. But many engineers don’t use these channels at all. Instead, they ping each other for immediate reviews, which is a much faster process.&lt;/p&gt;
&lt;p&gt;Of course, you can’t just DM random engineers asking them to review your PR. It might work in the short term, but people will get really annoyed with you. You have to &lt;strong&gt;build relationships&lt;/strong&gt; with engineers on every codebase you’d like to work on. If you’re extremely charismatic, maybe you can accomplish this with sheer force of will. But the rest of us have to build relationships by being useful: giving prompt and clear responses to questions from other teams, investigating bugs for them, reviewing their code, and so on.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The most effective engineers at a tech company typically have really strong relationships with engineers on many different teams.&lt;/strong&gt; That isn’t to say that they operate entirely through backchannels, just that they have personal connections they can draw on when needed. If you’re blocked on work that another team is doing, it makes a huge difference to have “someone on the inside”.&lt;/p&gt;
&lt;h3&gt;Acquire powerful allies&lt;/h3&gt;
&lt;p&gt;Almost all blockers at large tech companies can be destroyed with sufficient “air support”. Typically this means a director or VP who’s aware of your project and is willing to throw their weight around to unblock you. For instance, they might message the database team’s manager saying “hey, can you prioritize this migration”, or task their very-senior-engineer direct report with resolving some technical debate that’s delaying your work.&lt;/p&gt;
&lt;p&gt;You can’t get air support for everything you’d like to do - it just doesn’t work like that, unless the company is very dysfunctional or you have a &lt;em&gt;very&lt;/em&gt; good relationship with a senior manager. But you can choose to do things that align with what senior managers in the organization want, which can put you in a position to request support and get it. I wrote about this a lot more in &lt;a href=&quot;/how-to-influence-politics&quot;&gt;&lt;em&gt;How I influence tech company politics as a staff software engineer&lt;/em&gt;&lt;/a&gt;, but in one sentence: the trick is to have a bunch of possible project ideas in your back pocket, and then choose the ones that align with whatever your company cares about this month.&lt;/p&gt;
&lt;p&gt;Many engineers just don’t make use of the powerful allies they have. If you’re working on a high-priority project, the executive in charge of that project is unlikely to have the bandwidth to follow your work closely. They will be depending on you to go and tell them if you’re blocked and need their help.&lt;/p&gt;
&lt;p&gt;Unlike the relationships you may have with engineers on different teams, requesting air cover does not spend any credit. In fact, it often &lt;em&gt;builds&lt;/em&gt; it, by showing that you’re switched-on enough to want to be unblocked, and savvy enough to know you can ask for their help. Senior managers are usually quite happy to go and unblock you, if you’re clear enough about what exactly you need them to do.&lt;/p&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;To minimize the amount of time you spend blocked, I recommend:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Working on at least two things at a time, so when one gets blocked you can switch to the other&lt;/li&gt;
&lt;li&gt;Sequencing your work so potential blockers are discovered and started early&lt;/li&gt;
&lt;li&gt;Making a reliable developer environment a high priority, including avoiding unusual developer tooling&lt;/li&gt;
&lt;li&gt;Being willing to debug into other services that you don’t own&lt;/li&gt;
&lt;li&gt;Building relationships with engineers on other teams&lt;/li&gt;
&lt;li&gt;Making use of very senior managers to unblock you, when necessary&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;At some point somebody important will ask “why isn’t this task making any progress”, and you do not want the answer to be “I was working on something else”.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Before I joined GitHub, I worked entirely inside a terminal and neovim. I switched to VSCode entirely because of Codespaces. If I joined another company where most developers used JetBrains, I would immediately switch to JetBrains.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Why it takes months to tell if new AI models are good]]></title><link>https://seangoedecke.com/are-new-models-good/</link><guid isPermaLink="false">https://seangoedecke.com/are-new-models-good/</guid><pubDate>Sat, 22 Nov 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Nobody knows how to tell if current-generation models are any good&lt;/strong&gt;. When GPT-5 launched, the overall mood was very negative, and the consensus was that it wasn’t a strong model. But three months later it turns out that GPT-5 (and its derivative GPT-5-Codex) is a very strong model for agentic work&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;: enough to break Anthropic’s monopoly on agentic coding models. In fact, GPT-5-Codex is my preferred model for agentic coding. It’s slower than Claude Sonnet 4.5, but in my experience it gets more hard problems correct. Why did it take months for me to figure this out?&lt;/p&gt;
&lt;h3&gt;Evals systematically overstate how good frontier models are&lt;/h3&gt;
&lt;p&gt;The textbook solution for this problem is evals - datasets of test cases that models can be scored against - but &lt;strong&gt;evals are largely unreliable&lt;/strong&gt;. Many models score very well on evals but turn out to be useless in practice. There are a couple of reasons for this.&lt;/p&gt;
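&lt;p&gt;Concretely, an eval at its simplest is just a dataset of cases plus a scorer. Here’s a toy sketch - the stubbed model call and the substring check stand in for real eval harnesses, which run much richer checks:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# A toy eval: a dataset of cases plus a scorer.
CASES = [
    {'prompt': 'What is 17 * 24?', 'expected': '408'},
    {'prompt': 'Reverse the string abc', 'expected': 'cba'},
]

def ask_model(prompt):
    return '408'  # stub standing in for a real model API call

score = sum(case['expected'] in ask_model(case['prompt']) for case in CASES)
print(f'{score} / {len(CASES)} cases passed')
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Everything hard about evals lives in that dataset of cases.&lt;/p&gt;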
&lt;p&gt;First, &lt;strong&gt;it’s just really hard to write useful evals for real-world problems&lt;/strong&gt;, since real-world problems require an enormous amount of context. Can’t you take previous real-world problems and put them in your evals - for instance, by testing models on already-solved open-source issues? You can, but you run into two difficulties:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Open-source coding is often meaningfully different from the majority of programming work. For more on this, see my comments in &lt;a href=&quot;/impact-of-ai-study&quot;&gt;&lt;em&gt;METR’S AI productivity study is really good&lt;/em&gt;&lt;/a&gt;, where I discuss an AI-productivity study that was done on open-source codebases.&lt;/li&gt;
&lt;li&gt;You’re still only covering a tiny subset of all programming work. For instance, the well-known SWE-Bench set of coding evals is just in Python. A model might be really good at Python but struggle with other languages.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Another problem is that &lt;strong&gt;evals are a target for AI companies&lt;/strong&gt;. How well Anthropic or OpenAI’s new models perform on evals has a direct effect on the stock price of those companies. It’d be naive to think that they don’t make some kind of effort to do well on evals: if not by directly training on public eval data&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, then by training on data that’s close enough to eval data to produce strong results. I’m fairly confident that big AI companies &lt;em&gt;will not release a model&lt;/em&gt; unless they can point to a set of evals where their model beats its competitors. So you can’t trust that strong evals will mean a strong model, because every single new model is released with strong evals.&lt;/p&gt;
&lt;h3&gt;Vibe checks are not reliable&lt;/h3&gt;
&lt;p&gt;If you can’t rely on evals to tell you if a new model is good, what can you rely on? For most people, the answer is the “vibe check”: interacting with the model themselves and making their own judgement.&lt;/p&gt;
&lt;p&gt;Often people use a set of their own pet questions, which are typically questions that other LLMs get wrong (say, word puzzles). Trick questions can be useful, but plenty of strong models struggle with specific trick questions for some reason. My sense is also that current models are too strong for obvious word puzzles. You used to be able to trip up models with straightforward questions like “If I put a ball in a box, then put the box in my pocket, where is the ball?” Now you have to be more devious, which gives less signal about how strong the model is.&lt;/p&gt;
&lt;p&gt;Sometimes people use artistic prompts. Simon Willison &lt;a href=&quot;https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/&quot;&gt;famously&lt;/a&gt; asks new models to produce an SVG of a pelican riding a bicycle. It’s now a common Twitter practice to post side-by-side “I asked two models to build an object in Minecraft” screenshots. This is cool - you can see at a glance that bigger models produce better images - but at some point it becomes difficult to draw conclusions from the images. If Claude Sonnet 4.5 puts the pelican’s feet on the pedals correctly, but GPT-5.1 adds spokes to the wheels, which model is better?&lt;/p&gt;
&lt;p&gt;Finally, many people rely on pure vibes: the intangible sense you get after using a model about whether it’s good or not. This is sometimes described as “big model smell”. I am fairly agnostic about people’s ability to determine model capability from vibes alone. It seems like something humans might be able to do, but also like something that would be very easy to fool yourself about. For instance, I would struggle to judge a model with the conversational style of GPT-4o as very smart, but there’s nothing in principle that would prevent that.&lt;/p&gt;
&lt;h3&gt;Evaluating practical use takes time&lt;/h3&gt;
&lt;p&gt;Of course, for people who engage in intellectually challenging pursuits, there’s an easy (if slow) way to evaluate model capability: just give it the problems you’re grappling with and see how it does. I often ask a strong agentic coding model to do a task I’m working on in parallel with my own efforts. If the model fails, it doesn’t slow me down much; if it succeeds, it catches something I don’t, or at least gives me a useful second opinion.&lt;/p&gt;
&lt;p&gt;The problem with this approach is that it takes a fair amount of time and effort to judge if a new model is any good, &lt;strong&gt;because you have to actually do the work&lt;/strong&gt;: if you’re not engaging with the problem yourself, you will have no idea if the model’s solution is any good or not. So testing out a new model can be risky. If it’s no good, you’ve wasted a fair amount of time and effort! I’m currently trying to decide whether to invest this effort into testing out Gemini 3 Pro or GPT-5.1-Codex - right now I’m still using GPT-5-Codex for most tasks, or Claude Sonnet 4.5 on some simpler problems.&lt;/p&gt;
&lt;h3&gt;Is AI progress stagnating?&lt;/h3&gt;
&lt;p&gt;Each new model release reignites the debate over whether AI progress is stagnating. The most prominent example is Gary Marcus, who has written that &lt;a href=&quot;https://cacm.acm.org/blogcacm/gpt-4s-successes-and-gpt-4s-failures/&quot;&gt;GPT-4&lt;/a&gt;, &lt;a href=&quot;https://garymarcus.substack.com/p/hot-take-on-openais-new-gpt-4o&quot;&gt;GPT-4o&lt;/a&gt;, &lt;a href=&quot;https://x.com/GaryMarcus/status/1803800800277545266?lang=en&quot;&gt;Claude 3.5 Sonnet&lt;/a&gt;, &lt;a href=&quot;https://garymarcus.substack.com/p/gpt-5-overdue-overhyped-and-underwhelming&quot;&gt;GPT-5&lt;/a&gt; and &lt;a href=&quot;https://garymarcus.substack.com/p/five-ways-in-which-the-last-3-months&quot;&gt;DeepSeek&lt;/a&gt; all prove that AI progress has hit a wall. But Marcus is just the most prominent voice: almost everyone who writes about AI weighs in on the question. Each new model launch is watched to see if this is the end of the bubble, or if LLMs will continue to get more capable. The reason this debate never ends is that &lt;strong&gt;there’s no reliable way to tell if an AI model is good&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Suppose that base AI models were getting linearly smarter (i.e. that GPT-5 really was as far above GPT-4 as GPT-4 was above GPT-3.5, and so on). &lt;strong&gt;Would we actually be able to tell?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When you’re talking to someone who’s less smart than you&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, it’s very clear. You can see them failing to follow points you’re making, or watch them get visibly confused and contradict themselves. But when you’re talking to someone smarter than you, it’s far from clear (to you) what’s going on. You can sometimes feel that you’re confused by what they say, but that doesn’t necessarily mean they’re smarter. It could be that they’re just talking nonsense. And smarter people won’t confuse you all the time - only when they fail to pitch their communication at your level.&lt;/p&gt;
&lt;p&gt;Talking with AI models is like that. GPT-3.5 was very clearly less smart than most of the humans who talked to it. It was mainly impressive that it was able to carry on a conversation at all. GPT-4 was probably on par with the average human (or a little better) in its strongest domains. GPT-5 (at least in thinking mode) is smarter than the average human across most domains, I believe.&lt;/p&gt;
&lt;p&gt;Suppose we had no objective way of measuring chess ability. Would I be able to tell if computer chess engines were continuing to get better? I’d certainly be impressed when the chess engines went from laughably bad to beating me every time. But I’m not particularly good at chess. I would lose to chess engines from the &lt;em&gt;early 1980s&lt;/em&gt;. It would thus seem to me as if chess engine progress had stalled out, when in fact modern chess engines have &lt;em&gt;double&lt;/em&gt; the rating of chess engines from the 1980s.&lt;/p&gt;
&lt;p&gt;I acknowledge that “the model is now at least partly smarter than you” is an underwhelming explanation for why AI models don’t appear to be rapidly getting better. It’s easy to point to cases where even strong models fall over. But it’s worth pointing out that &lt;strong&gt;if models were getting consistently smarter, this is what it would look like&lt;/strong&gt;: rapid subjective improvement as the models go from less intelligent than you to on par with you, and then an immediate plateau as the models surpass you and you become unable to tell how smart they are.&lt;/p&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Nobody knows how good a model is when it’s launched. Even the AI lab who built it are only guessing and hoping it’ll turn out to be effective for real-world use cases.&lt;/li&gt;
&lt;li&gt;Evals are mostly marketing tools. It’s hard to figure out how good the eval is, or if the model is being “taught to the test”. If you’re trying to judge models from their public evals you’re fighting against the billions of dollars of effort going into gaming the system.&lt;/li&gt;
&lt;li&gt;Vibe checks don’t test the kind of skills that are useful for real work, but testing a model by using it to do real work takes a lot of time. You can’t figure out if a brand new model is good that way.&lt;/li&gt;
&lt;li&gt;Because of all this, it’s very hard to tell if AI progress is stagnating or not. Are the models getting better? Are they any good right now?&lt;/li&gt;
&lt;li&gt;Compounding that problem, it’s hard to judge between two models that are both smarter than you (in a particular domain). If the models &lt;em&gt;do&lt;/em&gt; keep getting better, we might expect it to feel like they’re plateauing, because once they get better than us we’ll stop seeing evidence of improvement.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;By “agentic work” I mean “LLM with tools that runs in a loop”, like Copilot Agent Mode, Claude Code, and Codex. I haven’t yet tried GPT-5.1-Codex enough to have a strong opinion.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;If you train a model on the actual eval dataset itself, it will get very good at answering those specific questions, even if it’s not good at answering those &lt;em&gt;kinds&lt;/em&gt; of questions. This is often called “benchmaxxing”: prioritizing evals and benchmarks over actual capability.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;I want to bracket the question of whether “smart” is a broad category, or how exactly to define it. I’m talking specifically about the way GPT-4 is smarter than GPT-3.5 - even if we can’t define exactly how, we know that’s a real thing.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Only three kinds of AI products actually work]]></title><link>https://seangoedecke.com/ai-products/</link><guid isPermaLink="false">https://seangoedecke.com/ai-products/</guid><pubDate>Sun, 16 Nov 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The very first LLM-based product, ChatGPT, was just&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; the ability to talk with the model itself: in other words, a pure chatbot. This is still the most popular LLM product by a large margin.&lt;/p&gt;
&lt;p&gt;In fact, given the amount of money that’s been invested in the industry, it’s shocking how many “new AI products” are just chatbots. As far as I can tell, &lt;strong&gt;there are only three types of AI product that currently work&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;Chatbots&lt;/h3&gt;
&lt;p&gt;For the first couple of years of the AI boom, all LLM products were chatbots. They were branded in a lot of different ways - maybe the LLM knew about your emails, or a company’s helpdesk articles - but the fundamental &lt;em&gt;product&lt;/em&gt; was just the ability to talk in natural language to an LLM.&lt;/p&gt;
&lt;p&gt;The problem with chatbots is that &lt;strong&gt;the best chatbot product is the model itself&lt;/strong&gt;. Most of the reason users want to talk with an LLM is generic: they want to ask questions, or get advice, or confess their sins, or do any one of a hundred things that have nothing to do with your particular product. &lt;/p&gt;
&lt;p&gt;In other words, your users will just use ChatGPT&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. AI labs have two decisive advantages over you: first, they will always have access to the most cutting-edge models before you do; and second, they can develop their chatbot harness simultaneously with the model itself (like how Anthropic specifically trains their models to be used in Claude Code, or OpenAI trains their models to be used in Codex).&lt;/p&gt;
&lt;h4&gt;Explicit roleplay&lt;/h4&gt;
&lt;p&gt;One way your chatbot product can beat ChatGPT is by doing what OpenAI won’t do: for instance, happily roleplaying an AI boyfriend or generating pornography. There is currently a very lucrative niche of products like this, which typically rely on less-capable but less-restrictive open-source models.&lt;/p&gt;
&lt;p&gt;These products have the problems I discussed above. But it doesn’t matter that their chatbots are less capable than ChatGPT or Claude: if you’re in the market for sexually explicit AI roleplay, and ChatGPT and Claude won’t do it, you’re going to take what you can get.&lt;/p&gt;
&lt;p&gt;I think there are serious ethical problems with this kind of product. But even practically speaking, this is a segment of the industry likely to be eaten alive by the big AI labs, as they become more comfortable pushing the boundaries of adult content. &lt;a href=&quot;https://tremendous.blog/2025/07/15/grok-companions-elons-ai-girlfriend/&quot;&gt;Grok Companions&lt;/a&gt; is already going down this pathway, and Sam Altman has &lt;a href=&quot;https://www.theverge.com/news/799312/openai-chatgpt-erotica-sam-altman-verified-adults&quot;&gt;said&lt;/a&gt; that OpenAI models will be more open to generating adult content in the future.&lt;/p&gt;
&lt;h4&gt;Chatbots with tools&lt;/h4&gt;
&lt;p&gt;There’s a slight variant on chatbots which gives the model &lt;em&gt;tools&lt;/em&gt;: so instead of just chatting with your calendar, you can ask the chatbot to book meetings, and so on. This kind of product is usually called an “AI assistant”.&lt;/p&gt;
&lt;p&gt;This doesn’t work well because &lt;strong&gt;savvy users can manipulate the chatbot into calling tools&lt;/strong&gt;. So you can never give a support chatbot real support powers like “refund this customer”, because the moment you do, thousands of people will immediately find the right way to jailbreak your chatbot into giving them money. You can only give your chatbots tools that the user could do themselves - in which case, your chatbot is competing with the usability of your actual product, and will likely lose.&lt;/p&gt;
&lt;p&gt;Why will your chatbot lose? Because &lt;strong&gt;chat is not a good user interface&lt;/strong&gt;. Users simply do not want to type out “hey, can you increase the font size for me” when they could simply hit “ctrl-plus” or click a single button&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;I think this is a hard lesson for engineers to learn. It’s tempting to believe that since chatbots have gotten 100x better, they must now be the best user interface for many tasks. Unfortunately, they started out 200x worse than a regular user interface, so they’re still twice as bad.&lt;/p&gt;
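&lt;p&gt;To make the permissions point concrete, here’s a minimal sketch of the rule that keeps a chatbot-with-tools safe: authorize every tool call against the &lt;em&gt;user’s&lt;/em&gt; permissions, not the assistant’s. All of the names here are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    required_permission: str
    run: Callable

TOOLS = {
    'increase_font_size': Tool('ui.settings', lambda size: f'font size set to {size}'),
    'refund_customer': Tool('billing.refund', lambda order_id: f'refunded {order_id}'),
}

def dispatch_tool_call(user_permissions, name, args):
    tool = TOOLS[name]
    # Check what the *user* may do, not what the bot may do. A jailbroken
    # model can then only ever do things the user could already do.
    if tool.required_permission not in user_permissions:
        raise PermissionError(name)
    return tool.run(**args)

# An end user can resize fonts, but can't talk the bot into a refund:
print(dispatch_tool_call({'ui.settings'}, 'increase_font_size', {'size': 14}))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under that constraint the chatbot can only ever press buttons the user could already press themselves - which is exactly why it ends up competing with your regular UI, and losing.&lt;/p&gt;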
&lt;h3&gt;Completion&lt;/h3&gt;
&lt;p&gt;The second real AI product actually came out before ChatGPT did: GitHub Copilot. The idea behind the original Copilot product (and all its imitators, like Cursor Tab) is that a fast LLM can act as a smart autocomplete. By feeding the model the code you’re typing as you type it, a code editor can suggest autocompletions that actually write the rest of the function (or file) for you.&lt;/p&gt;
&lt;p&gt;The genius of this kind of product is that &lt;strong&gt;users never have to talk to the model&lt;/strong&gt;. Like I said above, chat is a bad user interface. LLM-generated completions allow users to access the power of AI models without having to change any part of their current workflow: they simply see the kind of autocomplete suggestions their editor was already giving them, but far more powerful.&lt;/p&gt;
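&lt;p&gt;A rough sketch of the mechanics, assuming a generic “fill in the middle” completion setup (the marker strings and the stubbed model call are placeholders, not any particular vendor’s API):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# The editor sends the code on both sides of the cursor; the model
# predicts what goes in the gap; the editor shows it as ghost text.
def complete(prompt, max_tokens=64):
    return '    return a + b'  # stub standing in for a fast completion model

def suggest(buffer_text, cursor):
    prefix = buffer_text[:cursor]
    suffix = buffer_text[cursor:]
    # The FIM_* markers are placeholders for a model's real
    # fill-in-the-middle tokens.
    prompt = 'FIM_PREFIX' + prefix + 'FIM_SUFFIX' + suffix + 'FIM_MIDDLE'
    return complete(prompt)

code = 'def add(a, b):\n\ndef sub(a, b):\n    return a - b\n'
print(suggest(code, code.index('\ndef sub')))
&lt;/code&gt;&lt;/pre&gt;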
&lt;p&gt;I’m a little surprised that completions-based products haven’t taken off outside coding (where they immediately generated a multi-billion-dollar market). Google Docs and &lt;a href=&quot;https://support.microsoft.com/en-us/office/editor-text-predictions-in-word-7afcb4f3-4aa2-443a-9b08-125a5d692576&quot;&gt;Microsoft Word&lt;/a&gt; both have something like this. Why isn’t there more hype around this?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Maybe the answer is that the people using this product don’t engage with AI online spaces, and are just quietly using the product?&lt;/li&gt;
&lt;li&gt;Maybe there’s something about normal professional writing that’s less amenable to autocomplete than code? I doubt that, since so much normal professional writing is being copied out of a ChatGPT window.&lt;/li&gt;
&lt;li&gt;It could be that code editors already had autocomplete, so users were familiar with it. I bet autocomplete is brand-new and confusing to many Word users.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Agents&lt;/h3&gt;
&lt;p&gt;The third real AI product is the coding agent. People have been talking about this for years, but it was only really in 2025 that the technology behind coding agents became feasible (with Claude Sonnet 3.7, and later GPT-5-Codex).&lt;/p&gt;
&lt;p&gt;Agents are kind of like chatbots, in that users interact with them by typing natural language text. But they’re unlike chatbots in that &lt;strong&gt;you only have to do that once&lt;/strong&gt;: the model takes your initial request and goes away to implement and test it all by itself.&lt;/p&gt;
&lt;p&gt;The reason agents work and chatbots-with-tools don’t is the difference between asking an LLM to hit a single button for you and asking the LLM to hit a hundred buttons in a specific order. Even though each individual action would be easier for a human to perform, agentic LLMs are now smart enough to take over the entire process.&lt;/p&gt;
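&lt;p&gt;In code, the difference is small but decisive. Here’s a minimal sketch of the agent pattern - an LLM with tools, run in a loop until the model decides it’s done - with a stubbed model call and one hypothetical tool:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def run_tests():
    return 'all 42 tests passed'  # hypothetical tool

TOOLS = {'run_tests': run_tests}

def call_model(history):
    # Stub: a real model reads the history and picks the next action.
    if any(m['role'] == 'tool' for m in history):
        return {'type': 'finish', 'content': 'Done: tests pass.'}
    return {'type': 'tool', 'tool': 'run_tests', 'args': {}}

def run_agent(task):
    history = [{'role': 'user', 'content': task}]
    while True:
        action = call_model(history)
        if action['type'] == 'finish':  # the model, not the user, decides when to stop
            return action['content']
        result = TOOLS[action['tool']](**action['args'])
        history.append({'role': 'tool', 'content': result})

print(run_agent('Fix the failing test'))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A chatbot-with-tools is this loop with the human inside every iteration; an agent only comes back to the human at the end.&lt;/p&gt;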
&lt;p&gt;Coding is a natural fit for AI agents for two reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It’s easy to verify changes by running tests or checking if the code compiles&lt;/li&gt;
&lt;li&gt;AI labs are incentivized to produce effective coding models to accelerate their own work&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For my money, the current multi-billion-dollar question is &lt;strong&gt;can AI agents be useful for tasks other than coding?&lt;/strong&gt; Bear in mind that Claude Sonnet 3.7&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; was released just under &lt;em&gt;nine months ago&lt;/em&gt;. In that time, the tech industry has successfully built agentic products for its own work. It’s just starting to build agentic products for other tasks. It remains to be seen how successful that will be, or what those products will look like.&lt;/p&gt;
&lt;h4&gt;Research&lt;/h4&gt;
&lt;p&gt;There’s another kind of agent that isn’t about coding: the research agent. LLMs are particularly good at tasks like “skim through ten pages of search results” or “keyword search this giant dataset for any information on a particular topic”. I use this functionality a lot for all kinds of things.&lt;/p&gt;
&lt;p&gt;There are a few examples of AI products built on this capability, like &lt;a href=&quot;https://www.perplexity.ai/&quot;&gt;Perplexity&lt;/a&gt;. In the big AI labs, this has been absorbed into the chatbot products: OpenAI’s “deep research” went from a separate feature to just what GPT-5-Thinking does automatically, for instance.&lt;/p&gt;
&lt;p&gt;I think there’s almost certainly potential here for area-specific research agents (e.g. in medicine or law).&lt;/p&gt;
&lt;h3&gt;Feeds&lt;/h3&gt;
&lt;p&gt;If agents are the most recent successful AI product, AI-generated feeds might be the one just over the horizon. AI labs are currently experimenting with ways of producing infinite feeds of personalized content to their users:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Mark Zuckerberg has talked about filling Instagram with auto-generated content&lt;/li&gt;
&lt;li&gt;OpenAI has recently launched a Sora-based video-gen feed&lt;/li&gt;
&lt;li&gt;OpenAI has also started pushing users towards “Pulse”, a personalized daily update inside the ChatGPT product&lt;/li&gt;
&lt;li&gt;xAI is &lt;a href=&quot;https://www.testingcatalog.com/grok-will-get-infinite-image-gen-and-video-gen-with-sounds/&quot;&gt;working on&lt;/a&gt; putting an infinite image and video feed into Twitter&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So far none of these have taken off. But scrolling feeds has become the primary way users interact with technology &lt;em&gt;in general&lt;/em&gt;, so the potential here is massive. It does not seem unlikely to me at all that in five years’ time most internet users will spend a big part of their day scrolling an AI-generated feed.&lt;/p&gt;
&lt;p&gt;Like a completions-based product, the advantage of a feed is that users don’t have to interact with a chatbot. The inputs to the model come from how the user interacts with the feed (likes, scrolling speed, time spent looking at an item, and so on). Users can experience the benefits of an LLM-generated feed (if any) without having to change their consumption habits at all.&lt;/p&gt;
&lt;p&gt;The technology behind current human-generated infinite feeds is already a mature application of state-of-the-art machine learning. When you interact with Twitter or LinkedIn, you’re interacting with a model, except instead of generating text it’s generating lists of other people’s posts. In other words, &lt;strong&gt;feeds already maintain a sophisticated embedding of your personal likes and dislikes&lt;/strong&gt;. The step from “use that embedding to surface relevant content” to “use that embedding to &lt;em&gt;generate&lt;/em&gt; relevant content” might be very short indeed.&lt;/p&gt;
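&lt;p&gt;A toy sketch of that last step, assuming the usual embedding setup (the vectors here are made up; real systems learn much larger ones from behavior):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Updated continuously from likes, dwell time, scrolling speed...
user_embedding = [0.9, 0.1, 0.3]

posts = {
    'woodworking video': [0.8, 0.2, 0.1],
    'crypto thread': [0.1, 0.9, 0.4],
}

# Today's feed: surface the existing post closest to the user's embedding.
print(max(posts, key=lambda p: cosine(user_embedding, posts[p])))

# The generative version swaps the ranker for a model conditioned on the
# same vector: generate_item(user_embedding) instead of max(...).
&lt;/code&gt;&lt;/pre&gt;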
&lt;p&gt;I’m pretty suspicious of AI-generated infinite feeds of generated video, but I do think other kinds of infinite feeds are an under-explored kind of product. In fact, I built a feed-based hobby project of my own, called &lt;a href=&quot;https://www.autodeck.pro/&quot;&gt;Autodeck&lt;/a&gt;&lt;sup id=&quot;fnref-5&quot;&gt;&lt;a href=&quot;#fn-5&quot; class=&quot;footnote-ref&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;. The idea was to use an AI-generated feed to generate spaced repetition cards for learning. It works pretty well! It still gets a reasonable amount of use from people who’ve found it via my blog (also, from myself and my partner).&lt;/p&gt;
&lt;h3&gt;Games&lt;/h3&gt;
&lt;p&gt;One other kind of AI-generated product that people have been talking about for years is the AI-based video game. The most speculative efforts in this direction have been full world simulations like DeepMind’s &lt;a href=&quot;https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/&quot;&gt;Genie&lt;/a&gt;, but people have also explored using AI to generate a subset of game content, such as pure-text games like &lt;a href=&quot;https://aidungeon.com/&quot;&gt;AI Dungeon&lt;/a&gt; or this &lt;a href=&quot;https://www.nexusmods.com/skyrimspecialedition/mods/98631&quot;&gt;Skyrim mod&lt;/a&gt; which adds AI-generated dialogue. Many more game developers have incorporated AI art or &lt;a href=&quot;https://www.polygon.com/arc-raiders-ai-voices-the-finals-embark-studios/&quot;&gt;audio&lt;/a&gt; assets into their games.&lt;/p&gt;
&lt;p&gt;Could there be a transformative product that incorporates LLMs into video games? I don’t think ARC Raiders counts as an “AI product” just because it uses AI voice lines, and the more ambitious projects haven’t yet really taken off. Why not?&lt;/p&gt;
&lt;p&gt;One reason could be that &lt;strong&gt;games just take a really long time to develop&lt;/strong&gt;. When &lt;em&gt;Stardew Valley&lt;/em&gt; took the world by storm in 2016, I expected a flood of copycat cozy pixel-art farming games, but that only really started happening in 2018 and 2019. That’s how long it takes to make a game! So even if someone has a really good idea for an LLM-based video game, we’re probably still a year or two out from it being released.&lt;/p&gt;
&lt;p&gt;Another reason is that &lt;strong&gt;many gamers really don’t like AI&lt;/strong&gt;. Including generative AI in your game is a guaranteed controversy (though it doesn’t seem to be fatal, as the success of ARC Raiders shows). I wouldn’t be surprised if some game developers simply don’t think it’s worth the risk to try an AI-based game idea&lt;sup id=&quot;fnref-6&quot;&gt;&lt;a href=&quot;#fn-6&quot; class=&quot;footnote-ref&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;A third reason could be that &lt;strong&gt;generated content is just not a good fit for gaming&lt;/strong&gt;. Certainly ChatGPT-like dialogue sticks out like a sore thumb in most video games. AI chatbots are also pretty bad at &lt;em&gt;challenging&lt;/em&gt; the user: their post-training is all working to make them try to satisfy the user immediately&lt;sup id=&quot;fnref-7&quot;&gt;&lt;a href=&quot;#fn-7&quot; class=&quot;footnote-ref&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;. Still, I don’t think this is an insurmountable technical problem. You could simply post-train a language model in a different direction (though perhaps the necessary resources for that haven’t yet been made available to gaming companies).&lt;/p&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;By my count, there are three successful types of language model product:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Chatbots like ChatGPT, which are used by hundreds of millions of people for a huge variety of tasks&lt;/li&gt;
&lt;li&gt;Completions coding products like Copilot or Cursor Tab, which are very niche but easy to get immediate value from&lt;/li&gt;
&lt;li&gt;Agentic products like Claude Code, Codex, Cursor, and Copilot Agent mode, which have only really started working in the last six months&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On top of that, there are two kinds of LLM-based product that don’t work yet but may soon:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLM-generated feeds&lt;/li&gt;
&lt;li&gt;Video games that are based on AI-generated content&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Almost all AI products are just chatbots (e.g. AI-powered customer support). These suffer from having to compete with ChatGPT, which is a superior general product, and not being able to use powerful tools, because users will be able to easily jailbreak the model.&lt;/p&gt;
&lt;p&gt;Agentic products are new, and have been wildly successful &lt;em&gt;for coding&lt;/em&gt;. It remains to be seen what they’ll look like in other domains, but we’ll almost certainly see domain-specific research agents in fields like law. Research agents in coding have seen some success as well (e.g. code review or automated security scanning products).&lt;/p&gt;
&lt;p&gt;Infinite AI-generated feeds haven’t yet been successful, but hundreds of millions of dollars are currently being poured into them. Will OpenAI’s Sora be a real competitor to Twitter or Instagram, or will those companies release their own AI-generated feed product?&lt;/p&gt;
&lt;p&gt;AI-generated games sound like they could be a good idea, but there’s still no clear working strategy for how to incorporate LLMs into a video game. Pure world models - where the entire game is generated frame-by-frame - are cool demos but a long way from being products.&lt;/p&gt;
&lt;p&gt;One other thing I haven’t mentioned is image generation. Is this part of a chatbot product, or a tool in itself? Frankly, I think AI image generation is still more of a toy than a product, but it’s certainly seeing a ton of use. There’s probably some fertile ground for products here, if they can successfully differentiate themselves from the built-in image generation in ChatGPT.&lt;/p&gt;
&lt;p&gt;In general, it feels like the early days of the internet. LLMs have so much potential, but we’re still mostly building copies of the same thing. There have to be some really simple product ideas that we’ll look back on and think “that’s so obvious, I wonder why they didn’t do it immediately”.&lt;/p&gt;
&lt;p&gt;edit: This post got quite a few comments on &lt;a href=&quot;https://news.ycombinator.com/item?id=45946498&quot;&gt;Hacker News&lt;/a&gt;. Some commenters think &lt;a href=&quot;https://news.ycombinator.com/item?id=45946878&quot;&gt;my categories are too broad&lt;/a&gt;, which is a fair criticism: like saying that there are only two “electricity products”, ones which turn a motor and ones which heat up a wire.&lt;/p&gt;
&lt;p&gt;Other commenters argue that summarization, easy translation, and transcription are products I’ve missed. I disagree: have you yourself purchased some piece of LLM-driven summarization, translation or transcription software? Probably not - you just use a chatbot directly, right? I thus think of those as &lt;em&gt;features&lt;/em&gt; of the chatbot product, not products in their own right.&lt;/p&gt;
&lt;p&gt;One commenter &lt;a href=&quot;https://news.ycombinator.com/item?id=45946957&quot;&gt;points out&lt;/a&gt; that there may be a bunch of zero-hype products bubbling away under the radar. Fair enough! I don’t know what I don’t know.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;Of course, “just” here covers a raft of progress in training stronger models, and real innovations around RLHF, which made it possible to talk with pure LLMs at all.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;This is a big reason why &lt;a href=&quot;/why-do-ai-enterprise-projects-fail&quot;&gt;most AI enterprise projects fail&lt;/a&gt;. Anecdotally, I have heard a lot of frustration with bespoke enterprise chatbots. People just want to use ChatGPT!&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;If you’re not convinced, take any device you’re comfortable using (say, your phone, your car, your microwave) and imagine having to type out every command. Maybe really good speech recognition will fix this, but I doubt it.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;I originally had this incorrectly as “3.5 Sonnet”. Thanks to a reader for the correction.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;I wrote about it &lt;a href=&quot;/autodeck&quot;&gt;here&lt;/a&gt; and it’s linked in the topbar.&lt;/p&gt;
&lt;a href=&quot;#fnref-5&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-6&quot;&gt;
&lt;p&gt;Though this could be counterbalanced by what I’m sure is a strong push from executives to get in on the action and “build something with AI”.&lt;/p&gt;
&lt;a href=&quot;#fnref-6&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-7&quot;&gt;
&lt;p&gt;If you’ve ever tried to ask ChatGPT to DM for you, you’ll have experienced this first-hand: the model will immediately try and show you something cool, skipping over the necessary dullness that builds tension and lends verisimilitude.&lt;/p&gt;
&lt;a href=&quot;#fnref-7&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item></channel></rss>