<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[seangoedecke.com RSS feed]]></title><description><![CDATA[Sean Goedecke's personal blog]]></description><link>https://seangoedecke.com</link><generator>GatsbyJS</generator><lastBuildDate>Mon, 15 Jun 2026 01:38:06 GMT</lastBuildDate><item><title><![CDATA[AI GPUs probably live longer than three years]]></title><link>https://seangoedecke.com/ai-gpus-live-longer-than-three-years/</link><guid isPermaLink="false">https://seangoedecke.com/ai-gpus-live-longer-than-three-years/</guid><pubDate>Mon, 15 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://www.wheresyoured.at/ai-is-slowing-down&quot;&gt;People&lt;/a&gt; who think current AI use is unsustainable often rely on the &lt;a href=&quot;https://www.tomshardware.com/pc-components/gpus/datacenter-gpu-service-life-can-be-surprisingly-short-only-one-to-three-years-is-expected-according-to-unnamed-google-architect&quot;&gt;claim&lt;/a&gt; that inference GPUs only last “three years at the most” under load&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. The idea here is that once the AI bubble money drains away, current infrastructure will rapidly become obsolete, and there won’t be enough money floating around to buy a whole slate of brand-new GPUs. Inference would then become far too expensive for current AI products to make any financial sense.&lt;/p&gt;
&lt;p&gt;Where does this “three years at the most” claim come from? Is it plausible? &lt;/p&gt;
&lt;h3&gt;Sourcing the quote&lt;/h3&gt;
&lt;p&gt;The original Tom’s Hardware article quotes this &lt;a href=&quot;https://x.com/techfund1/status/1849031571421983140&quot;&gt;tweet&lt;/a&gt; from Tech Fund, an anonymous former PM and tech investor, who quotes an anonymous “GenAI principal architect” at Google as saying “if you have a high utilization rate, then constant high utilization rate for a year or two, I think the lifespan will be three years at most”.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/9cf90a11e418ecd73f81e5178a9328c2/1b853/tweet.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 85.8108108108108%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAARCAYAAADdRIy+AAAACXBIWXMAABJ0AAASdAHeZh94AAACiUlEQVQ4y42U6XKjQAyEef8X3Gxim3M4DT4xxnai7U+E1P7ZqqVqmBkhaVrqHqK+H6xuWsuKYHXdWBZKX2/jxOeqri0vCiur2pI8+FyUle0Vx/P19eVjXUe8Pj8/7fV6ybDM7B+Phw/WT7ctgXx/Pp8+//2siaO+7+33ZmO7NLW3943FaWa5UIIwyTILQg3KWN8/tluLs9y6fW/zPFu732v0djidfhJHt2myw/Fo58vFhsPBLterr0/ni6/H281ut0n784/fON7sIZTE4ne/3x011URHORWhsn44CGVmTbe3UJbqa+P9ytQ/bHkIjjhVX5u2tVx9rIQ+TjL3zfLC6razaDgMcswd3a/3D09WBEhJbZekFiAhLGTsFMy6kE+qtmBrlIQ2bXY7ryg6n86evem6ZZYDSUsxX9WtK2CxVTr06IhT+dFHkK6Ml+rzqNZEnRoL9FBVLpNW5dWeTCUzZD/q0G0cezCEUf7qs8aCzkmh2an0lRfBSwdBpn0iVmG0UiAJOIwqCk9YedkkWxQR20YtGsRH1A+DvX1s3CkPi6ApKVWyQidzGIGgAwl2mIXV+Vurd0lomu6+j6B/HEeXgH+UBOaZefYgHJkJ8qHviPxfT4QDOmOgORJcdcCkAy7X0Q9jxo5twl8zelz1eVb/rq7ZaSk5TnPvDRKpXHuLNLDTH+4wJGCDbcpHKqiCNsAwvefWOCkkatVwCKgQNER8J+Xase+lU9YkhDT8OARVrIqgCr8pSbb8TWrpDjGDKJUNllfkMEwgN8X/PEKI+DkgBa38/OpxQ4DvCdvOryE/AlAtaIofcYOQaxa+f2FJnru///6kEHodnSUBnJHK0rtFHon2XDOCKeV/nz8newEInqX7qwAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;tweet&quot;
        title=&quot;tweet&quot;
        src=&quot;/static/9cf90a11e418ecd73f81e5178a9328c2/fcda8/tweet.png&quot;
        srcset=&quot;/static/9cf90a11e418ecd73f81e5178a9328c2/12f09/tweet.png 148w,
/static/9cf90a11e418ecd73f81e5178a9328c2/e4a3f/tweet.png 295w,
/static/9cf90a11e418ecd73f81e5178a9328c2/fcda8/tweet.png 590w,
/static/9cf90a11e418ecd73f81e5178a9328c2/1b853/tweet.png 592w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;This screenshot looks like it was taken from an interview. What interview? I scrolled back to October 2024 on Tech Fund’s Twitter feed and saw a bunch of &lt;a href=&quot;https://x.com/techfund1/status/1828858794480140391?s=20&quot;&gt;similarly formatted&lt;/a&gt; &lt;a href=&quot;https://x.com/techfund1/status/1826875528751534448?s=20&quot;&gt;screenshots&lt;/a&gt;, some of which were cited as coming from &lt;a href=&quot;https://tegus.com/&quot;&gt;Tegus&lt;/a&gt;. Tegus is apparently a company with a &lt;a href=&quot;https://www.reddit.com/r/expertnetworks/comments/1ghe2ls/tegus_analyst_reached_out_to_me/&quot;&gt;business model&lt;/a&gt; of reaching out to insiders (in this case, AI company employees) and paying them hundreds of dollars an hour to answer specific technical questions. It’s essentially gig work for &lt;em&gt;almost-but-not-quite&lt;/em&gt; insider trading: the more informed and confident you sound, the more likely it is that Tegus analysts will pick you for future interviews.&lt;/p&gt;
&lt;p&gt;I’m sure the source for this tweet is in fact a GenAI principal architect, since Tegus would presumably have asked for some proof of that before paying them. But it’s pretty clear that the incentives here are to sound confident and authoritative, even on questions that you’re not sure about. With that in mind, the quote itself also reads a bit suspiciously. I’ve worked with enough principal engineers and architects to take their casual back-of-envelope estimates with a grain of salt. If they knew the actual rate at which GPUs fail and get retired in Google datacenters, wouldn’t they have just said that?&lt;/p&gt;
&lt;h3&gt;Evidence for a longer lifespan&lt;/h3&gt;
&lt;p&gt;We have some anecdotal evidence that points the other way. Google has &lt;a href=&quot;https://www.datacenterdynamics.com/en/news/google-says-tpu-demand-is-outstripping-supply-claims-8yr-old-hardware-iterations-have-100-utilization&quot;&gt;publicly claimed&lt;/a&gt; to have eight-year-old TPUs (their version of GPUs) running in production at “100% utilization”. Nvidia only made A100 GPUs from &lt;a href=&quot;https://www.amax.com/nvidia-h100-vs-nvidia-a100/&quot;&gt;2020 to 2024&lt;/a&gt;, but in February 2026 the AWS CEO &lt;a href=&quot;https://www.datacenterdynamics.com/en/news/aws-has-never-retired-an-nvidia-a100-server-ceo-matt-garman-claims/&quot;&gt;claimed&lt;/a&gt; that AWS had never retired an A100 server (and you can still easily rent A100s for AI work)&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. AI GPU usage isn’t exactly like crypto-mining GPU usage, but years-old ex-mining GPUs certainly seem to remain &lt;a href=&quot;https://www.youtube.com/watch?v=UFytB3bb1P8&quot;&gt;functional&lt;/a&gt;. There’s also &lt;a href=&quot;https://news.ycombinator.com/item?id=48456717&quot;&gt;this Hacker News comment&lt;/a&gt; from someone who claims that their academic GPU cluster has lasted six years with a failure rate of less than 20%.&lt;/p&gt;
&lt;p&gt;What about hard data? Concrete figures on the lifespan of AI GPUs are difficult to come by, because modern AI datacenters have only existed for a handful of years. But recent supercomputer clusters make an interesting case study: for instance, Oak Ridge’s &lt;a href=&quot;https://www.datacenterdynamics.com/en/news/oak-ridge-national-laboratory-to-retire-summit-supercomputer-in-november-2024/&quot;&gt;Summit&lt;/a&gt;, which ran over 27,000 Nvidia V100s from 2018 to 2024, or its predecessor, the Cray &lt;a href=&quot;https://christian-engelmann.de/publications/ostrouchov20gpu.pdf&quot;&gt;Titan&lt;/a&gt; supercomputer, which ran from 2012 to 2019. I couldn’t find any evidence that Summit had to buy another 27,000 GPUs to replace its original ones, and GPU failures in Titan have been &lt;a href=&quot;https://christian-engelmann.de/publications/ostrouchov20gpu.pdf&quot;&gt;carefully studied&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/3922b6bf4aaf6d3202cacd2f472034aa/019a6/figure10.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 60.810810810810814%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAMCAYAAABiDJ37AAAACXBIWXMAABYlAAAWJQFJUiTwAAAB5ElEQVQoz3VT7ZKbMBDL+79UH6E/e53m0gTCVwjGgG2MDaha53JNOy0zmxjZq5V3xQF8+l4ju2S4nM84Ho8oigLX6zVFnl9xPP3E9+MPlEX5ieV5jvfTCSdGWRbp3VqLw7ZtGMcRIUT4ecbiPWKMWJYFxliEJWKwDtpZbOsKv3jiJv075+D9gm1bMU0T6rrGYeUhQ2Y7KlgzYd8BwUIImFlACu4f8Te+fWA7k4RQ1CeFlko6Eupp+CNRFMhaFEv8DxdCuaVc+yCLvlfcAK8Zk0I5EOOrkv2fCvcXhdK/pmlw6PsebdtiZtVpMuhVy9AYRodhGKie/dMKShuemZloiI+Ps8wVtVJA1olQGqyUShUf15mg7jl0d4ObPRVscHZE1eRodOT+kopvxNu2Rj84+mRPc0iEIldIfzcY2HZOzdRMfuC8MeaxxP2W425WKnIJi8HizMlGCxi2Lsuyx1BeCSVkHeP62auEkaCrvmLqLrQMz9Eq/EF9e8O3akDXWlRV8erDkLwnEcKSyKTQE5cCzhvooWIf6Vf6MEQ6pM3wdv6CsqvR1LyyfCmvZM/wNLjE0yYyIDG/OMF7En7kCOkyWwow6T31UNSIMbXWSa0kS8i7hAxNpih7SnUJE/WPYB7Pylclzy+EUp6sFg3U1gAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;fig10&quot;
        title=&quot;fig10&quot;
        src=&quot;/static/3922b6bf4aaf6d3202cacd2f472034aa/fcda8/figure10.png&quot;
        srcset=&quot;/static/3922b6bf4aaf6d3202cacd2f472034aa/12f09/figure10.png 148w,
/static/3922b6bf4aaf6d3202cacd2f472034aa/e4a3f/figure10.png 295w,
/static/3922b6bf4aaf6d3202cacd2f472034aa/fcda8/figure10.png 590w,
/static/3922b6bf4aaf6d3202cacd2f472034aa/efc66/figure10.png 885w,
/static/3922b6bf4aaf6d3202cacd2f472034aa/c83ae/figure10.png 1180w,
/static/3922b6bf4aaf6d3202cacd2f472034aa/019a6/figure10.png 1818w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;These cages of GPUs are stacked vertically, with cold air pumped in from the bottom, which explains why cage 0 (at the bottom) has better survival rates than cage 2 (at the top). Let’s focus on cage 0, so that we’re looking at GPU lifespan itself rather than the lifespan of improperly cooled GPUs. At three years, over 95% of GPUs survived&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. At six years, nodes 2 and 3 (the GPUs closest to the bottom of the cage) still had survival rates above 90%, and the highest nodes were still above 60%.&lt;/p&gt;
&lt;p&gt;It’s possible that newer Nvidia GPUs are less reliable than older ones (they certainly draw more power), or that AI datacenters are under-cooled, or that something about LLM utilization is more stressful than the workloads that ran on traditional GPU datacenters. But this is at least circumstantial evidence that GPUs can survive under load for far longer than three years.&lt;/p&gt;
&lt;h3&gt;Economic lifespans&lt;/h3&gt;
&lt;p&gt;This discussion is complicated by the fact that GPUs may have a short &lt;em&gt;economic&lt;/em&gt; lifespan. Supposedly a B100 GPU &lt;a href=&quot;https://bizon-tech.com/blog/nvidia-b200-b100-h200-h100-a100-comparison?srsltid=AfmBOoqugX-R8Y9AoVlyxRMheglf4gJ2Xc5hefXVxL6Cv3Htl0P_rHx1&quot;&gt;draws&lt;/a&gt; twice as much power as an A100 but can do five times as much work, which works out to roughly two and a half times as much work per watt. For some AI providers, that might mean that A100s are only worth running until they can be replaced with B100s (if you’re bottlenecked on electricity, you should spend it all on B100s and throw out your obsolete A100s). This is why the Titan supercomputer was decommissioned in favor of Summit: it could have continued to operate, but it was more profitable to spend the money and maintenance effort on newer hardware.&lt;/p&gt;
&lt;p&gt;It should be obvious that this doesn’t support the “inference will become more expensive when the bubble pops” argument. As long as A100s are profitable to run &lt;em&gt;right now&lt;/em&gt;, cash-poor AI providers can keep serving inference from them, even if there are more efficient options available for those with the capital to upgrade.&lt;/p&gt;
&lt;p&gt;On top of that, GPUs only represent one part of AI datacenter infrastructure spending. If your GPUs wear out, you don’t have to go and build an entirely new datacenter. About 30-50% of &lt;a href=&quot;https://epoch.ai/assets/images/data-insights/ai-datacenter-cost-breakdown/ai-datacenter-cost-breakdown-upfront.png&quot;&gt;datacenter&lt;/a&gt; &lt;a href=&quot;https://www.reuters.com/commentary/breakingviews/how-big-techs-630-bln-ai-splurge-will-fall-short-2026-03-26/&quot;&gt;spend&lt;/a&gt; goes to land, power, cooling, and so on. The remaining 50-70% is the cost of the entire server rack, which includes a bunch of things that aren’t GPUs.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Like the idea that AI inference &lt;a href=&quot;/water-impact-of-ai/&quot;&gt;requires using huge amounts of water&lt;/a&gt;, the idea that AI GPUs only live a year or two is popular because it’s a useful idea for AI skeptics, not because it’s true. It comes from a pseudonymous tweet quoting an anonymous source who’s being paid hundreds of dollars to sound like a credible expert on AI. Other public communications from AI inference providers cite much higher lifespan numbers, and the statistics from supercomputers (the traditional examples of large GPU clusters) don’t bear out the claim that the maximum lifespan is three years.&lt;/p&gt;
&lt;p&gt;It might be true that the &lt;em&gt;economic&lt;/em&gt; lifespan is three years in a world where new GPUs come out every eighteen months and GPU providers are flush with cash to upgrade, but that doesn’t tell us much about the economics of inference in an AI winter. If money becomes much scarcer, it’s likely that AI datacenters will keep profitably&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; running their B300s (or their H100s or even A100s) for six years or longer.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;Of course, like previous claims about AI and water usage, “three years at the most” often gets exaggerated further in the retelling, for instance as &lt;a href=&quot;https://ithy.com/article/data-center-gpu-lifespan-explained-7mpjwwyp&quot;&gt;“1-2 years, with some lasting up to 3 years under optimal conditions”&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Of course, pronouncements from CEOs/CTOs should be taken with a grain of salt as well (for instance, maybe they have a big backlog of unused A100s they keep swapping out), but (a) executives don’t often straight-up lie about concrete technical facts, and (b) they’re going up against an unsourced quote from a tweet, so the bar isn’t that high.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;What about proactive GPU replacement? In the “Survival Analysis” section, the study attempts to account for this. I haven’t dug into exactly how.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;Assuming inference is profitable when you’re not attempting to amortize the cost of training, which I believe it is.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Working with product managers]]></title><link>https://seangoedecke.com/working-with-product-managers/</link><guid isPermaLink="false">https://seangoedecke.com/working-with-product-managers/</guid><pubDate>Mon, 08 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The relationship engineers have with product management is more dysfunctional than with any other part of the company. There’s no shared culture or language like there is with other engineers, and the rules of “who gets to tell who what to do” aren’t as clear-cut as they are with managers. Engineers don’t have a lot in common with legal, or design, or sales, but they also don’t need to interact much with those roles. In my experience, engineers are communicating with product managers almost every single day.&lt;/p&gt;
&lt;h3&gt;Against the “product mommy”&lt;/h3&gt;
&lt;p&gt;The worst version of the product/engineering relationship goes something like this:&lt;/p&gt;
&lt;p&gt;Engineers are technically competent but are too autistic to be fully trusted. They need a kind-but-stern parental figure who knows how to communicate to other stakeholders in the organization (for instance, by being comfortable using the word “stakeholders”), and how to keep engineers from going off in the wrong direction.&lt;/p&gt;
&lt;p&gt;This entire gross dynamic is neatly captured by the popular term &lt;a href=&quot;https://x.com/search?q=%22product%20mommy%22&amp;#x26;src=typed_query&quot;&gt;“product mommy”&lt;/a&gt;&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.
I really, really don’t like that term, or this entire dynamic in general. Almost none of my relationships with my product managers have been anything like this, though I have seen it at a distance.&lt;/p&gt;
&lt;p&gt;Working well with product managers can be the difference between succeeding and failing at a company. Why is it so hard to maintain good relationships between engineering and product? What does a good relationship look like?&lt;/p&gt;
&lt;h3&gt;Why it’s so hard to build trust&lt;/h3&gt;
&lt;p&gt;Product managers and engineers have largely non-overlapping skillsets. Product managers don’t understand the technical work engineers do and aren’t equipped to talk about it: if an engineer gives a technical reason for something, product managers generally have to shrug and say “sure, I guess”. Likewise, engineers don’t have anything like the visibility into the organization that product managers do. Particularly in large organizations, it is the product manager who is the source of truth about who wants what and which features are important. When a product manager says that something is critical, engineers generally have to shrug and say “sure, I guess”.&lt;/p&gt;
&lt;p&gt;This obviously requires a lot of trust. What’s a little less obvious is that &lt;strong&gt;this trust is continually broken by both sides&lt;/strong&gt;. Every single product manager has been told &lt;em&gt;thousands&lt;/em&gt; of times that task X is technically impossible or would be disastrous, only for that task to end up being done fairly smoothly and successfully. Every single engineer has been told &lt;em&gt;thousands&lt;/em&gt; of times that requirement X is absolutely critical and worth going to enormous effort for, only for that requirement to be silently dropped or changed with no apology.&lt;/p&gt;
&lt;p&gt;Of course, this isn’t malicious. Engineers often give wrong estimates because &lt;a href=&quot;/how-i-estimate-work/&quot;&gt;estimation is impossible&lt;/a&gt;, and sometimes the dire consequences they warn about really do happen (they’re just handled behind the scenes, like engineers handle many other kinds of technical dysfunction). Product managers “change their minds” because what’s important in a large tech company does genuinely change hour by hour&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, and even the best attempts to pass only the most reliable priorities through to the engineering team will sometimes go wrong.&lt;/p&gt;
&lt;h3&gt;Manipulation and lies&lt;/h3&gt;
&lt;p&gt;The consequence of this broken trust is that the relationship becomes very difficult to maintain. When you’re an engineer, and you explain something to your product manager, and you &lt;em&gt;know&lt;/em&gt; they don’t believe you (despite having no ability themselves to judge the question), it can be incredibly frustrating. Likewise, when you’re a product manager, and you’re desperately trying to explain to an engineer what needs to be done, and you know they’re internally shrugging their shoulders, it must be unbearable. Don’t they know this is critical to the company? You were just in a meeting with the leaders of the organization!&lt;/p&gt;
&lt;p&gt;The natural tool for a mistrustful product manager is &lt;em&gt;manipulation&lt;/em&gt;. I still remember a product manager who tried to extract a commitment from my team by asking us to go around and all say “I commit to getting this work done in two weeks”, after a conversation where we’d explained the risks that could cause it to take longer. I suppose the idea was that we’d all work much harder, having taken a sacred oath? More subtle variants of this approach involve suggesting that they would be really disappointed if this work was delayed (in true “product mommy” style), or vaguely suggesting the possibility of some abstract reward (that the product manager is not empowered to deliver) if work gets done ahead of schedule.&lt;/p&gt;
&lt;p&gt;The natural tool for a mistrustful engineer is &lt;em&gt;lies&lt;/em&gt;. The most benign version of this is exaggerating estimates: for instance, the classic advice to &lt;a href=&quot;https://news.ycombinator.com/item?id=19671824&quot;&gt;double your estimate and add 20%&lt;/a&gt;. I’ve seen engineers claim that they’ve had to follow up on all sorts of largely-fake tasks (one common example is “reaching out to a neighbor team to confirm X”) in order to gain more time. In the worst case, engineers might even straight-out lie that work has been completed, and then track the “it doesn’t work in production” feedback as a bug.&lt;/p&gt;
&lt;p&gt;Once this starts happening, it’s nearly impossible to repair the relationship. I can’t bring myself to trust a product manager who’s clearly trying to pull my strings, and I’m sure a product manager can’t trust an engineer who’s lied to their face in the past. That’s why it’s so important to avoid getting into a bad relationship in the first place.&lt;/p&gt;
&lt;h3&gt;Don’t fight with the product manager&lt;/h3&gt;
&lt;p&gt;Why bother? If it’s so hard to hammer out a good working relationship with product managers, why not just settle for a bad one? Because product managers can absolutely &lt;em&gt;bury&lt;/em&gt; you if you’re not careful.&lt;/p&gt;
&lt;p&gt;Product managers are almost always more politically sophisticated than engineers. This is partly structural: product managers are simply in more conversations with the company’s movers and shakers, and so naturally have a better relationship with them (and are thus better attuned to which way the wind is blowing). It’s also partly selection bias: engineers can be hired even with relatively poor social skills, because they’re primarily being assessed on technical ability, but social skills are a core part of the product role&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;If you are feuding with a product manager, you will probably lose&lt;/strong&gt;. Unless you’re unusually influential, they will simply have far more opportunities to quietly talk you down in influential circles than you will. All it takes is a few comments like “oh, I probably wouldn’t pick Sean for that project” to wreck your reputation. In the case where you are &lt;em&gt;openly&lt;/em&gt; feuding with a product manager, the company’s leaders will by default take the product manager’s side over yours. They’re likely to know them better, have more shared cultural context with them, and in general be inclined to interpret the situation as “another engineer who doesn’t understand how the organization works”.&lt;/p&gt;
&lt;p&gt;There are huge benefits to being trusted by a product manager. Product managers &lt;em&gt;want to ship things&lt;/em&gt;, and typically understand a fair amount about all of the non-technical barriers to shipping. If you also want to ship things, the two of you can become a fearsome team.&lt;/p&gt;
&lt;p&gt;On top of that, because trust between engineers and product managers is so difficult, once you’re in, you’re in all the way. Product managers often pick one or two engineers as their go-to for getting the “real story” on technical questions. If that’s you, you have an outsized position of influence in the organization, which you can use to &lt;a href=&quot;/how-to-influence-politics/&quot;&gt;get the things you want done&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;How can you build trust with product managers?&lt;/h3&gt;
&lt;p&gt;As an engineer, how can you build trust with your product manager?&lt;/p&gt;
&lt;p&gt;The first step is to &lt;strong&gt;understand where they’re coming from&lt;/strong&gt;. When they tell you something is important or that a requirement has come in, be aware that this is rarely their decision. It’s not them who’s jerking you around; it’s someone higher up the food chain jerking you both around. If you can adopt a conspiratorial mindset &lt;em&gt;with&lt;/em&gt; them, instead of &lt;em&gt;against&lt;/em&gt; them, that’s a good start. Try just asking “oh man, alright, what can we do about this?” instead of complaining.&lt;/p&gt;
&lt;p&gt;The second step is to &lt;strong&gt;be right, a lot&lt;/strong&gt;. This is a silly-sounding Amazon leadership principle that turns out to be entirely accurate. I wrote more about it &lt;a href=&quot;https://www.seangoedecke.com/being-right-a-lot/&quot;&gt;here&lt;/a&gt;, but (as unfair as it sounds) you really do have to be mostly accurate if you want to build trust with a product manager. When you say something will ship, it has to ship; when you say something is impossible, it can’t happen days or weeks later. It’s okay to be wrong &lt;em&gt;sometimes&lt;/em&gt;, but you have to establish a pattern of providing them with useful, correct technical information.&lt;/p&gt;
&lt;p&gt;The third step is to &lt;strong&gt;let them make the political calls most of the time&lt;/strong&gt;. If you expect them to trust your technical calls, you have to extend them the same trust when it comes to navigating the organization. Don’t publicly undermine them in meetings; bring up your concerns in private. If they say something is important and you’re not so sure, at least act like it is. Accept that sometimes they’re going to be wrong, just like you’re sometimes wrong about technical questions.&lt;/p&gt;
&lt;p&gt;The fourth step is to &lt;strong&gt;get lucky&lt;/strong&gt;. Sometimes your product manager will just be a dud. You can’t build trust with someone incompetent: there’s nothing for you to trust them with, and they aren’t in a position where they can usefully extend trust to you. Working in large organizations requires getting comfortable with the fact that some of your colleagues will be stronger than others, and figuring out ways to work with (or bypass) people who make the work harder, not easier.&lt;/p&gt;
&lt;h3&gt;“Technical” product managers&lt;/h3&gt;
&lt;p&gt;Many product managers were once engineers. If your product manager is technical, does that make you immune from these problems? Absolutely not!&lt;/p&gt;
&lt;p&gt;You likely won’t have much choice in which product managers you work with, but be aware that having once been an engineer is a &lt;em&gt;negative&lt;/em&gt;, not a positive. No product manager can ever be technical enough to matter, because &lt;a href=&quot;/you-cant-design-software-you-dont-work-on/&quot;&gt;they don’t work on the codebase&lt;/a&gt;: even if they were once a full-time engineer, they wouldn’t have the time to build the specific context on the system they’d need to be a real participant in technical discussions. It’s thus better to have a product manager who knows they’re not technical than to have one who mistakenly thinks they might be.&lt;/p&gt;
&lt;p&gt;The worst-case scenario is an ex-engineering product manager who believes they’re technical enough to detect when engineers are lying to them. This kind of paranoia is an easy trap for “technical” product managers to fall into, particularly when they don’t have a trusted engineer on the team they can lean on. If you’re dealing with one of these, prepare to spend a lot of time explaining why you can’t “just” do things (and prepare to have those explanations not be believed).&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;At its worst, a product manager relationship is like an unhealthy family: driven by condescension, emotional manipulation, lies, and mistrust. This isn’t because product managers are bad people! It’s because the structure of the relationship creates conflict. Both sides must make commitments (about the technical system or the goals of the organization) that (a) are often wrong and (b) the other side is unable to independently verify. To avoid the trap, both sides have to be generous, willing to trust each other in their areas of expertise, and most importantly &lt;em&gt;competent&lt;/em&gt;.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;Unlike most roles in tech, product management (particularly the lower-level roles that are more engineer-facing) has close to an &lt;a href=&quot;https://www.productplan.com/blog/gender-diversity-better-products&quot;&gt;even&lt;/a&gt; gender split.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;For instance, based on the whims (or snap decisions, more charitably) of the CEO.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;I have worked with product managers with poor social skills, but it’s rare: about as rare as working with engineers with genuinely poor (i.e. by general-population standards) technical skills.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Doing nothing at work]]></title><link>https://seangoedecke.com/doing-nothing-at-work/</link><guid isPermaLink="false">https://seangoedecke.com/doing-nothing-at-work/</guid><pubDate>Mon, 08 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Many engineers should be doing less work. I don’t necessarily mean producing less code or fewer changes, but literally working fewer hours in the day. When they do work, they should be working at a slower pace. I like to aim to be running at 80% utilization by default: unless I have a high-pressure project going on, I spend 20% of my workday away from the computer.&lt;/p&gt;
&lt;h3&gt;High-impact opportunities&lt;/h3&gt;
&lt;p&gt;Why? &lt;strong&gt;Performance at tech companies is dominated by outlier events&lt;/strong&gt;. When I think about the most impactful changes I’ve made, many of them involved a surprisingly trivial amount of work. There are no points for effort in software development. What matters is solving the right problem at the right time.&lt;/p&gt;
&lt;p&gt;In large engineering organizations, there are usually trivial pieces of engineering work you could do that would make tens or hundreds of millions of dollars for the company. Here are three common examples:&lt;/p&gt;
&lt;p&gt;First, when the company is trying to sign a big enterprise deal, stepping in with a feature or bugfix can make the deal happen. It doesn’t even have to be a &lt;em&gt;good&lt;/em&gt; feature: sometimes just showing that you’re willing and able to make a concrete change will be enough.&lt;/p&gt;
&lt;p&gt;Second, preventing or mitigating an incident early (even by just knowing the right feature flag to turn off) can save huge amounts of money: both immediate lost revenue during the incident and future lost revenue from customers who would have pulled their business or refused to sign pending contracts.&lt;/p&gt;
&lt;p&gt;Third, when the company is trying to ship a high-profile feature, success or failure often hinges on trivial but obscure changes (e.g. the ability to rapidly add a new field in user settings, or to update the crufty enterprise-data-export functionality nobody has touched in years). Familiarity with the system can be the difference between one of these changes taking a few hours and taking a whole week.&lt;/p&gt;
&lt;p&gt;What do these examples have in common? They’re all &lt;em&gt;time-dependent&lt;/em&gt;. You can’t just log on in the morning and decide to unblock a big deal, or mitigate an incident, or speed up a high-profile feature. Is it just a matter of being in the right place at the right time? Not quite. &lt;strong&gt;You also have to not already be busy.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;Staying loose&lt;/h3&gt;
&lt;p&gt;I wrote about this a couple of years ago in &lt;a href=&quot;https://www.seangoedecke.com/party-tricks/&quot;&gt;&lt;em&gt;Crushing JIRA tickets is a party trick, not a path to impact&lt;/em&gt;&lt;/a&gt;. If you’re always 100% utilized on a steady stream of low-priority work (for instance, if you’re just picking up tickets from the backlog, crushing them, then picking up the next one), you’ll miss your chance to do high-impact work in two ways.&lt;/p&gt;
&lt;p&gt;First, you’ll be too busy to even &lt;em&gt;notice&lt;/em&gt; the opportunities. You won’t be chatting with people who are working on other things, or reading team updates, or keeping an eye on ongoing incidents. So you’ll miss out on the best way to get involved in high-impact work, which is to volunteer your expertise.&lt;/p&gt;
&lt;p&gt;Second, if you perpetually look busy, your manager won’t want to volunteer you. This is the second-best way to get involved in high-impact work: to have your manager or product manager say “oh, Sean has capacity to help out here, let me tag him in”. Why is this better? Because managers and product managers usually have a much better read on what high-impact work is going on. They’re in meetings that you aren’t in.&lt;/p&gt;
&lt;h3&gt;Doing nothing&lt;/h3&gt;
&lt;p&gt;If you’re supposed to keep your time free for high-impact work, and you’re not supposed to just grind tickets, what should you be doing on a minute-by-minute basis? Should you just be doing nothing? Yep!&lt;/p&gt;
&lt;p&gt;Doing nothing is good, actually. Software engineering can be a stressful job, but it’s typically not &lt;em&gt;consistently&lt;/em&gt; stressful: the stress comes from the occasional incident, or high-pressure urgent piece of work, or (these days) layoff. If you approach the comparatively low-pressure parts of your work with urgent intensity, you’ll already be exhausted and frazzled when you have to handle the high-pressure parts.&lt;/p&gt;
&lt;p&gt;Even in high-pressure parts of the job, doing nothing can still be good. One thing I recommend for engineers new to on-call is to avoid rushing: take a few breaths before joining the call or before speaking, and in general try to &lt;a href=&quot;/thinking-clearly/&quot;&gt;“think in slow motion”&lt;/a&gt;. Most incidents resolve on their own. Most frantic “maybe this will help” changes during incidents make things worse, not better. As a general rule, if you can simply avoid panicking, you will be doing better than most engineers at incident response.&lt;/p&gt;
&lt;p&gt;Nothing is a space things can happen in&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. If you give your brain a chance to rest, you will find you’re more likely to have new ideas. If someone hands you an important task, you can tackle it with your full attention (instead of juggling it with the three other things you’re working on in the background). When you’re not busy, you have time to just &lt;em&gt;look at things&lt;/em&gt; and take in new data.&lt;/p&gt;
&lt;h3&gt;Deliberately not doing specific things&lt;/h3&gt;
&lt;p&gt;A lot of engineers are uncomfortable seeing a task that needs doing and not doing it. I’m like this as well. I wrote about it in &lt;a href=&quot;/addicted-to-being-useful/&quot;&gt;&lt;em&gt;I’m addicted to being useful&lt;/em&gt;&lt;/a&gt;: it’s a psychological quirk that many software engineers share, because having that quirk (to a point) makes you a good fit for the job. In order to spend time doing nothing, sometimes you need to force yourself to not step in.&lt;/p&gt;
&lt;p&gt;For instance, I believe that &lt;strong&gt;engineers should generally avoid glue work&lt;/strong&gt;&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. Most glue work - making sure people talk to each other, updating docs for work you’re not leading, volunteering to address technical debt - reflects the fact that the organization is not explicitly prioritizing this work. If they were, you wouldn’t need to volunteer for it. Either that’s fine, or it’s a big mistake. If it’s fine, then you shouldn’t step up and do it: you’ll be wasting your time and annoying your manager. If it’s a big mistake, &lt;em&gt;you still shouldn’t do it&lt;/em&gt;, because you’ll be insulating the company from the consequences of its own mistakes at the cost of your own career and mental well-being.&lt;/p&gt;
&lt;p&gt;That’s a bad deal for you, a bad example for your junior colleagues, and a bad precedent that invites someone else to jump into the same position when you inevitably burn out&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. If the consequences truly are severe, let them happen, so the organization can feel the pain and change its policies.&lt;/p&gt;
&lt;p&gt;I also believe that &lt;strong&gt;being too helpful leaves you vulnerable to predators&lt;/strong&gt;. Tech companies are full of people who want to extract uncompensated work from software engineers&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. This is different from work that arrives via normal channels, and for which you’re compensated by promotions and bonuses (and just your normal salary). I’m talking about work that arrives via backchannels, from people who don’t have the ability or willingness to ensure that work is formally recorded under your name. For instance, a product manager from another organization messaging you to say “you’re so good at querying data, would you mind pulling some statistics for me about X?”, or an engineer from another team asking you to “pair” on a piece of work that will ultimately involve you writing all the code and them quietly submitting the change under their own name.&lt;/p&gt;
&lt;p&gt;Doing some amount of this kind of work is fine. You may as well help people out when you can. But you need to be able to apply backpressure, either by saying no or simply delaying your response by a few hours or days.&lt;/p&gt;
&lt;p&gt;It’s also a good idea to &lt;strong&gt;avoid investing too much in work that is likely going to disappear&lt;/strong&gt;. For instance, suppose you’re working with a product designer who is figuring out what they want in real time. At 9am they message you saying they want the page header to look one way, then at 10am they have tweaks, and more changes at 11am, and so on. You should not throw yourself into fully rewriting the page every hour. Instead, you should do nothing (say, go for a walk) and rewrite the page once in the afternoon, based on the most recent design. Another common instance of this is “big idea from a manager without the political clout to follow through on it”. Often you can just run out the clock until the project gets inevitably cancelled&lt;sup id=&quot;fnref-5&quot;&gt;&lt;a href=&quot;#fn-5&quot; class=&quot;footnote-ref&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;A lot of software engineering advice and tooling is designed around scaling up your ability to exert technical effort: to do more things at the same time, to take on projects of larger scope, or to just write more code. But software engineering success is not determined by any of these. It is determined by the ability to do the &lt;em&gt;right&lt;/em&gt; things at the &lt;em&gt;right&lt;/em&gt; time, which requires that you deliberately hold back some of your effort during ordinary work.&lt;/p&gt;
&lt;p&gt;In my experience, it’s still possible to be a “high performing engineer” at 80% effort. In fact, it’s &lt;em&gt;easier&lt;/em&gt;, because you’ll be less likely to make silly mistakes from stress, and you’ll be in a position to jump on the kind of high-impact tasks that deliver outsized returns.&lt;/p&gt;
&lt;p&gt;This doesn’t mean you should never grind at 100% effort. I think there are probably two or three times a year where I work as hard as I possibly can: long hours, intense focus, thinking about the problem from when I wake up to when I go to bed. But I reserve this mode of work for &lt;a href=&quot;/the-spotlight&quot;&gt;when the rewards are really high&lt;/a&gt;. For the rest of the year, I take it relatively easy.&lt;/p&gt;
&lt;p&gt;edit: this post got some comments on &lt;a href=&quot;https://news.ycombinator.com/item?id=48442880&quot;&gt;Hacker News&lt;/a&gt;. Commenters discuss &lt;a href=&quot;https://news.ycombinator.com/item?id=48446245&quot;&gt;how to not get in trouble&lt;/a&gt; with your manager when you’re taking slack time (in my experience, if you’re generally productive it’s fine, but managers vary a lot) and &lt;a href=&quot;https://news.ycombinator.com/item?id=48443273&quot;&gt;whether engineers really do have control&lt;/a&gt; over their workload.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;One of my big influences is Rich Hickey’s talk &lt;a href=&quot;https://github.com/matthiasn/talk-transcripts/blob/master/Hickey_Rich/HammockDrivenDev.md&quot;&gt;&lt;em&gt;Hammock Driven Development&lt;/em&gt;&lt;/a&gt;. This is &lt;em&gt;kind of&lt;/em&gt; like what he’s talking about, except (a) Hickey is talking more about what it takes to design solutions to really hard problems, rather than what it takes to be a strong engineer in an ordinary tech company, and so (b) Hickey recommends using your time-away-from-the-computer to focus on a hard problem, instead of to simply decompress and let solutions congeal in your head. It’s also like Zvi Mowshowitz’s post on &lt;a href=&quot;https://thezvi.substack.com/p/slack&quot;&gt;“slack”&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;I wrote about this a lot more in &lt;a href=&quot;/glue-work-considered-harmful/&quot;&gt;&lt;em&gt;Glue work considered harmful&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;Why inevitably? Because in my view, burnout is &lt;em&gt;hard work unrewarded&lt;/em&gt;, and taking on a personal crusade that your job doesn’t care about is a great way to do a lot of unrewarded work.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;I wrote about this in &lt;a href=&quot;/predators&quot;&gt;&lt;em&gt;Protecting your time from predators in large tech companies&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;Of course, you have to be careful with this. If you try this strategy and you’re wrong about the level of political support for the project, you will come off like a slacker and then have to deliver in a rush.&lt;/p&gt;
&lt;a href=&quot;#fnref-5&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Anti-AI nostalgia and the cult of the past]]></title><link>https://seangoedecke.com/anti-ai-nostalgia/</link><guid isPermaLink="false">https://seangoedecke.com/anti-ai-nostalgia/</guid><pubDate>Thu, 04 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Programmers were better back in the day, weren’t they? Back when we had real programmers. Not just people who got paid to write code, but people who &lt;em&gt;lived&lt;/em&gt; it, who were obsessed with their craft, and whose code was a lively expression of themselves. Hackers were hackers in those days before money took over the industry.&lt;/p&gt;
&lt;p&gt;Don’t even get me started on LLMs. Could there be a better example of today’s degenerate spirit? A machine to mass-produce software (not good software, just barely good enough), so that the weak minds that dominate the industry can indulge their obsession with &lt;em&gt;quantity&lt;/em&gt;: of slop code, of features, and ultimately of money, which is the only way they can understand value. If they weren’t destroying our way of life, they would be pitiable. All of them together don’t have a fraction of the spiritual integrity of someone like &lt;a href=&quot;https://users.cs.utah.edu/~elb/folklore/mel.html&quot;&gt;Mel&lt;/a&gt;. But as it is, we must band together to crush them and drive them from our industry like the parasites they are.&lt;/p&gt;
&lt;h3&gt;Returning to the past&lt;/h3&gt;
&lt;p&gt;Okay, that’s not actually what I believe. But there sure are a lot of posts&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; and comments on the internet that sound a bit like the paragraph above. Here are some older quotes that might sound similar:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;…the third collapse, in which power tends to pass into the hands of the lowest of the traditional castes, the caste of the beasts of burden and the standardized individuals. The result of this transfer of power was a reduction of horizon and value to the plane of matter, the machine, and the reign of quantity.&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Usura rusteth the chisel \ It rusteth the craft and the craftsman \ It gnaweth the thread in the loom&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;The actual accomplishments of the past will nevertheless remain accomplishments, while the artistic stammerings of the painting, music, sculpture, and architecture produced by these types of charlatans will one day be nothing but proof of the magnitude of a nation’s downfall.&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;These are all from the writings (or speeches) of famous fascists: Julius Evola, Ezra Pound, and Hitler himself. Mussolini’s &lt;a href=&quot;https://sjsu.edu/faculty/wooda/2B-HUM/Readings/The-Doctrine-of-Fascism.pdf&quot;&gt;&lt;em&gt;Doctrine of Fascism&lt;/em&gt;&lt;/a&gt; begins by defining fascism as a “spiritual attitude”, which the fascist man adopts in order to regain the mysterious qualities that were lost by the transition to modern life. In his classic &lt;a href=&quot;https://theanarchistlibrary.org/library/umberto-eco-ur-fascism&quot;&gt;&lt;em&gt;Ur-Fascism&lt;/em&gt;&lt;/a&gt;, Umberto Eco’s first two defining features of fascism are the “cult of tradition” and the “rejection of modernism”. So when someone tells me that the industry has lost its way and we must deny the corrupting influence of modern technology in order to &lt;a href=&quot;https://www.urbandictionary.com/define.php?term=retvrn&quot;&gt;retvrn&lt;/a&gt; to the time of virile &lt;a href=&quot;https://users.cs.utah.edu/~elb/folklore/mel.html&quot;&gt;real programmers&lt;/a&gt; (who understood and appreciated the spiritual dimension of programming), I get suspicious.&lt;/p&gt;
&lt;h3&gt;Fascism and crypto-fascism&lt;/h3&gt;
&lt;p&gt;It’s strange to describe anti-AI sentiment as potentially fascist, since a very &lt;a href=&quot;https://tante.cc/2026/04/21/ai-as-a-fascist-artifact/&quot;&gt;popular argument&lt;/a&gt; is that LLMs themselves are an inherently fascist tool. Surely both sides of the debate can’t be fascist? I do think that the structure of fascist arguments is &lt;a href=&quot;/many-anti-ai-arguments-are-conservative&quot;&gt;generally persuasive&lt;/a&gt;, and that many avowedly anti-fascist groups do sometimes fall into this trap: describing the world as a &lt;a href=&quot;https://en.wikipedia.org/wiki/300_(film)&quot;&gt;struggle&lt;/a&gt; between the spiritual power of the macho, traditional man and the corrupting influence of degenerate (often foreign) capital.&lt;/p&gt;
&lt;p&gt;For instance, I am a big fan of Lord of the Rings. I’ve read the series and watched the films multiple times, and even made a failed attempt to learn Elvish as a kid. But it’s hard to deny that fascists absolutely &lt;em&gt;love&lt;/em&gt; Lord of the Rings. “Marble statue of a Roman emperor” might be the most popular avatar for fascists on the internet, but Aragorn is the second most popular. Neo-fascist movements &lt;a href=&quot;https://www.cbc.ca/radio/tapestry/lord-of-the-rings-italy-1.6756668&quot;&gt;in Italy&lt;/a&gt; explicitly take up Lord of the Rings as a foundational text. Why? Because the core conflict in the text is between the traditional, nostalgic heroism of the Shire and Gondor, and the corrupting modern industrial (partly &lt;a href=&quot;https://tolkiengateway.net/wiki/Haradrim&quot;&gt;foreign&lt;/a&gt;) influence of Saruman and Sauron&lt;sup id=&quot;fnref-5&quot;&gt;&lt;a href=&quot;#fn-5&quot; class=&quot;footnote-ref&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;I don’t think Lord of the Rings (or anti-AI rhetoric) is intrinsically fascist. In fact, the surface-level reading of the text is anti-fascist: the plucky people of the West banding together to fight Sauron’s command-and-control totalitarian society. But I can see why fascists love it.&lt;/p&gt;
&lt;h3&gt;The Luddites&lt;/h3&gt;
&lt;p&gt;One common historical touch-point for anti-AI folks is the Luddites, who were a violent conservative labor movement in early 1800s England. Anti-AI blogs adopt Luddite language like “smashing frames”, and positively &lt;a href=&quot;https://tante.cc/2026/04/21/ai-as-a-fascist-artifact/&quot;&gt;cite&lt;/a&gt; the Luddites as “the go-to enemies of fascism since its inception”. I’ve written at length about what we can learn from the Luddites in &lt;a href=&quot;/luddites-and-ai-datacenters/&quot;&gt;&lt;em&gt;Luddites and burning down AI datacenters&lt;/em&gt;&lt;/a&gt;, but one point I think is under-emphasized by the (generally pro-Luddite) books is that &lt;strong&gt;the Luddites were a little bit fascist themselves&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Brian Merchant’s &lt;a href=&quot;https://www.amazon.com.au/Blood-Machine-Origins-Rebellion-Against/dp/0316487740&quot;&gt;&lt;em&gt;Blood in the Machine&lt;/em&gt;&lt;/a&gt; is the most popular recent book on the Luddites. I enjoyed it, but Merchant’s attempts to paint the Luddites as a friendly, left-wing, proto-feminist movement&lt;sup id=&quot;fnref-6&quot;&gt;&lt;a href=&quot;#fn-6&quot; class=&quot;footnote-ref&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; seemed really unconvincing to me. From the writings of the Luddites, it’s clear that they were interested in protecting the rights of their all-male elite guild fraternity. Here’s one Luddite threat to a workshop that explicitly includes a threat against the female workers&lt;sup id=&quot;fnref-7&quot;&gt;&lt;a href=&quot;#fn-7&quot; class=&quot;footnote-ref&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We think it quite inconsistent with our duty as men, as husbands and as fathers to suffer ourselves to be ruined any longer by a set of vagabond strumpets and those gibbet-deserving rascals that are looking over them. We will lead them to their satisfaction. We sincerely hope, gentlemen, that you will discharge the bitches and take men into your employ again, or they must take what they get.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;These were fundamentally conservative people who felt (correctly) that modernity had deprived them of their elite status, handing it instead to lower-paid inferiors: women, vagabonds, and foreigners.&lt;/p&gt;
&lt;p&gt;The Luddites were obviously not fascists&lt;sup id=&quot;fnref-8&quot;&gt;&lt;a href=&quot;#fn-8&quot; class=&quot;footnote-ref&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;. However, the basic ingredients were there: wounded pride, a masculine elite identity, hatred of modern economics, and violence aimed at restoring their previous position in society. The currents that produced Luddism are the same currents that guided so many unhappy people towards fascism. When things are looking grim for an elite group, they often turn towards any movement that promises a return to an idealized past.&lt;/p&gt;
&lt;h3&gt;Everything is permitted&lt;/h3&gt;
&lt;p&gt;If my blog has themes, one of them is surely that many software engineers labor under a delusion that their job is to be excellent at their craft. Of course, &lt;em&gt;wanting&lt;/em&gt; to be an excellent programmer is not a delusion; it is a completely legitimate value to hold, and a legitimate purpose to pursue. It’s just not what you’re paid to do at work. Your &lt;em&gt;job&lt;/em&gt;, unfortunately, is producing &lt;a href=&quot;/shareholder-value&quot;&gt;shareholder value&lt;/a&gt;. This delusion has been punctured by the &lt;a href=&quot;/good-times-are-over&quot;&gt;end of ZIRP&lt;/a&gt;, and again more recently by the rise of AI coding.&lt;/p&gt;
&lt;p&gt;In this environment, I worry that some software engineers will form exactly the kind of disillusioned elite that was the audience for Ezra Pound’s poems about “usury” or the Luddites’ campaign against unapprenticed (often female) textile workers. I worry that AI, and the companies that build AI, are becoming an enemy against which anything is permitted: an enemy which in Umberto Eco’s &lt;a href=&quot;https://theanarchistlibrary.org/library/umberto-eco-ur-fascism&quot;&gt;words&lt;/a&gt; is “at the same time too strong and too weak”, &lt;a href=&quot;/illusion-of-thinking/&quot;&gt;unable to reason&lt;/a&gt; and yet powerful enough to drastically reshape the global labor market for the worse.&lt;/p&gt;
&lt;h3&gt;Nuance&lt;/h3&gt;
&lt;p&gt;The enemy of fascism is nuance. Fascism presents a good, clean, rousing story about a spiritual conflict between right and wrong. It is anathema to fascism to stop and muddy the waters a bit: in this case, to explore the ways in which LLMs, like any transformative technology, can both support and endanger traditional values.&lt;/p&gt;
&lt;p&gt;In &lt;a href=&quot;/the-left-wing-case-for-AI&quot;&gt;&lt;em&gt;The left-wing case for AI&lt;/em&gt;&lt;/a&gt; I wrote about how AI is being used &lt;em&gt;right now&lt;/em&gt; as a disability aid, and many disabled readers wrote in to share their positive experiences with LLMs, and often how alienated they feel by the anti-AI mainstream on the left. I recently got an email describing how there’s a sudden flood of accessibility software for blind people&lt;sup id=&quot;fnref-9&quot;&gt;&lt;a href=&quot;#fn-9&quot; class=&quot;footnote-ref&quot;&gt;9&lt;/a&gt;&lt;/sup&gt; that’s &lt;em&gt;actually built by blind people&lt;/em&gt;, who can now iterate with an LLM to get a product that meets their needs. Framing AI as an ontological evil erases experiences like these.&lt;/p&gt;
&lt;p&gt;Being anti-AI is not inherently fascist. Many of the anti-AI posts I’ve quoted are thoughtful, sensitive pieces exploring how the author thinks about one of the biggest changes to our industry. I still think the world needs more articles like that, not fewer, but the more of them I read, the more I recognize the tropes: spiritually pure lovers of the craft, degenerate peddlers of corrupt modernism, a need to return to the traditional ways of the hacker, and a lament for the (potentially) waning power of an elite fraternity of programmers.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;I know I’m tiptoeing around &lt;a href=&quot;https://www.lesswrong.com/posts/yCWPkLi8wJvewPbEp/the-noncentral-fallacy-the-worst-argument-in-the-world&quot;&gt;the worst argument in the world&lt;/a&gt;. It isn’t a refutation of anti-LLM arguments to say that they are structurally similar in some ways to fascist arguments, any more than it’s a devastating critique to say the same thing about Lord of the Rings. Sometimes it is good to try and halt the march of progress! Some of our past traditions really were purer and more spiritually robust! It just bothers me, that’s all.&lt;/p&gt;
&lt;p&gt;I used to read &lt;a href=&quot;https://users.cs.utah.edu/~elb/folklore/mel.html&quot;&gt;The Story of Mel&lt;/a&gt; with unalloyed pleasure. Now it makes me nervous. If you believe you’re fighting &lt;a href=&quot;https://tante.cc/2026/04/21/ai-as-a-fascist-artifact/&quot;&gt;the embodiment of fascism&lt;/a&gt;, or for &lt;a href=&quot;https://sinclairtarget.com/blog/2026/06/01/quality-in-the-age-of-slop/&quot;&gt;the idea of value itself&lt;/a&gt;, what &lt;a href=&quot;https://www.theguardian.com/technology/2026/apr/18/sam-altman-house-attack-ai&quot;&gt;tactics&lt;/a&gt; are off-limits? What positions might you eventually come to accept?&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;It feels wrong to directly associate my caricature with any actual posts, but it also feels wrong to make a blanket assertion without examples. Just so you know what I’m talking about, &lt;a href=&quot;https://huronbikes.mataroa.blog/blog/i-am-not-a-software-engineer/&quot;&gt;here&lt;/a&gt; &lt;a href=&quot;https://lpcvoid.com/blog/0018_why_i_am_against_genai/index.html&quot;&gt;are&lt;/a&gt; &lt;a href=&quot;https://alextardif.com/AI.html&quot;&gt;some&lt;/a&gt; &lt;a href=&quot;https://sinclairtarget.com/blog/2026/06/01/quality-in-the-age-of-slop/&quot;&gt;posts&lt;/a&gt; that have elements of this attitude. I like some of these posts and dislike others.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Page 329 of my copy of Julius Evola’s &lt;em&gt;Revolt Against the Modern World&lt;/em&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;Ezra Pound, Canto XLV. “Usura” should be read as “usury”, or today we could gloss it as “capitalism”: all Pound’s examples of great art were from the pre-capitalist patronage era of art.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;Adolf Hitler, from his speech at the 1933 Party Congress in Nuremberg.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;Of course, there’s also historically been a strong &lt;em&gt;pro&lt;/em&gt;-technology current in fascist thinking (even specifically &lt;em&gt;Italian&lt;/em&gt; fascist &lt;a href=&quot;https://artmejo.com/how-italian-futurism-influenced-the-rise-of-fascism/&quot;&gt;thinking&lt;/a&gt;).&lt;/p&gt;
&lt;a href=&quot;#fnref-5&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-6&quot;&gt;
&lt;p&gt;Page 134 of &lt;em&gt;Blood in the Machine&lt;/em&gt; has a brief argument that Luddism was feminist because the (exclusively male) artisans’ wives would provide food for their meetings. No, really.&lt;/p&gt;
&lt;a href=&quot;#fnref-6&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-7&quot;&gt;
&lt;p&gt;From Kevin Binfield’s &lt;em&gt;Writings of the Luddites&lt;/em&gt;, page 40. I’ve taken the liberty of re-rendering it in modern spelling and grammar.&lt;/p&gt;
&lt;a href=&quot;#fnref-7&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-8&quot;&gt;
&lt;p&gt;Aside from being too early, they didn’t have any connection to the state apparatus of power (in fact, they were ultimately crushed by it) and they famously lacked a singular leader.&lt;/p&gt;
&lt;a href=&quot;#fnref-8&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-9&quot;&gt;
&lt;p&gt;The example cited was &lt;a href=&quot;https://github.com/serrebidev/BlindRSS&quot;&gt;BlindRSS&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-9&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Weird projects I shipped with AI]]></title><link>https://seangoedecke.com/weird-projects-i-shipped-with-ai/</link><guid isPermaLink="false">https://seangoedecke.com/weird-projects-i-shipped-with-ai/</guid><pubDate>Mon, 01 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Where are all the AI-generated projects? This is a &lt;a href=&quot;https://news.ycombinator.com/item?id=46262545&quot;&gt;common question&lt;/a&gt; from AI skeptics: if LLMs are so good at writing code, where is the tsunami of new AI-generated apps, services and games?&lt;/p&gt;
&lt;p&gt;I personally don’t find this to be much of a paradox. Writing code is only one of the bottlenecks involved in actually &lt;a href=&quot;/how-to-ship&quot;&gt;shipping&lt;/a&gt; a new product, after all. It’s also impossible to talk about the paid work I’ve done with AI (you’ll simply have to take my word that it’s increased my productivity). But one thing I can do is share a list of personal projects I’ve built with AI in the last twelve months.&lt;/p&gt;
&lt;p&gt;I definitely would not have done &lt;em&gt;all&lt;/em&gt; of these by hand. I might have found the time to do one or two of them, but based on my pre-AI track record they would probably have stayed in the “GitHub repo with a few commits” stage. This list is a kind of &lt;a href=&quot;https://sites.millersville.edu/bikenaga/math-proof/existence-proofs/existence-proofs.html&quot;&gt;existence proof&lt;/a&gt;: a bunch of weird projects, useful to at least some people, that would not have existed without AI assistance&lt;sup id=&quot;fnref-0&quot;&gt;&lt;a href=&quot;#fn-0&quot; class=&quot;footnote-ref&quot;&gt;0&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Skifreedle&lt;/h3&gt;
&lt;p&gt;Most recently I’ve built &lt;a href=&quot;https://skifreedle.com/&quot;&gt;skifreedle.com&lt;/a&gt;, a daily-game version of the classic Windows SkiFree &lt;a href=&quot;http://ski.ihoc.net/&quot;&gt;game&lt;/a&gt; (i.e. “like Wordle, but for SkiFree”). The code for that is &lt;a href=&quot;https://github.com/sgoedecke/skifreedle&quot;&gt;here&lt;/a&gt;&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. &lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/6cdb3210b3e29b8b9f8ba68fee4d2502/105d8/skifreedle1.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 114.86486486486487%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAXCAYAAAALHW+jAAAACXBIWXMAABYlAAAWJQFJUiTwAAACpElEQVQ4y5VUy2sTQRjPzT+kN8G/wJsn8aIHb54FQRAUBO+CPfpAD70p+CDUB20pptDXQVCqNWlsmuwr+2qSptluEvLa7M7sz9nZbJpNS9gOfMw388385je/75tJGT+zEN8tQv2+jmqlggoz2z5hfQ1LGwpWtmRougHTNGBZFk6YHQqrqApp1EwBJ3YLrVYTsqJA0StIbd99jN25q1i7dhsry8tYYlY82Mf65g/MXf+AKzc/4lP6Kz4vppHL5VHYz2PlzWVkXl3CdmYBxVIZpVIR2exfLK9mkNKNOta+rSK/swNd16FpGhRZhiSKyOVLyO4V2ViBqqqQR/OK+A+qlGe+AEmSoLHYxuYWCpKB1BCAUj2GSzzMar4/M8xkOYTV8ZDqMpyipKLTbvMAJWRsZNS7QxeOM4zFKCPA13iE7xPlMo6afaR6bFySdXQ7nZAJpTELqLmuhyEDjcaTMUoo3ycpKuqtwfmAp5tY71MO5jhxwKhPCDg6nYbX8Zks/YFzChQIGhzIGZJkDCOwYvcARl+H26cg/lmGUZsJGBihLh8/EO7jtfmc+31WxBGzCKxj12CbBYiSzACd2UkJ2p3sDcyLT7j/J7MLq2qF6wjl+tr1NmoFG4JYns0w0DGQqt0x0e03eOzIaqDX68XXTV65OTPLcY3O+H5YAQmTQieAybn1F81TeqE6PCerU9lNxHD6pcQKfQR2PKihPWwyRCRjGBqN1WVwTULDT+Sp8BBvtRfJ63BSq6jIwzjz2Vktp4Gum5BhcLVprcyeBmtwHOoWHeBfUMNgg9Er8/l54RHeGy/HQNGaRFmOwAj78+7t3UK+9Zv5Qz63UV/CQvnZiKmXnGGYCPa0WDYdr8vBgrbX/IXN+pcxy8S/zTjbE1ek1DvzWqb/w/+ANdSk/ZHjHwAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;skifreedle&quot;
        title=&quot;skifreedle&quot;
        src=&quot;/static/6cdb3210b3e29b8b9f8ba68fee4d2502/fcda8/skifreedle1.png&quot;
        srcset=&quot;/static/6cdb3210b3e29b8b9f8ba68fee4d2502/12f09/skifreedle1.png 148w,
/static/6cdb3210b3e29b8b9f8ba68fee4d2502/e4a3f/skifreedle1.png 295w,
/static/6cdb3210b3e29b8b9f8ba68fee4d2502/fcda8/skifreedle1.png 590w,
/static/6cdb3210b3e29b8b9f8ba68fee4d2502/efc66/skifreedle1.png 885w,
/static/6cdb3210b3e29b8b9f8ba68fee4d2502/105d8/skifreedle1.png 1170w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/6ee543da4626f592eb2537653f7d2b18/a13c9/skifreedle2.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 127.02702702702702%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAZCAYAAAAxFw7TAAAACXBIWXMAABYlAAAWJQFJUiTwAAAFL0lEQVQ4y5WVSWwbZRTHfUJCYimtOEFBaYVQD+wHkDhx4IJAlKpQLlxKqZIKqUVlq7gUKoQQqDQt6kIV1DRN48ZZmtVOxksS23HcLE4cL2OPZ7zvSewk3rc/b8ZOF0oOjPT3+8bv+37zvuW9TxYYGESs7Rp4si4PC7vNinBIwILVhdYOC7pHluB22eBwLCORTCDos4E1tcIz/zfcbhv1X8ZqIgFB4BAMByFLBsMwTQ/DbFUhEY8hGgkiHo3Azi9Cu6QDLwQQi4Xo/xAi5IuE/YiGWESCLJLRGKz+OQw7FGBdLFLracjylTLCK1Fki2lUygWUSzlSntp51KqiitJ7qZhHsZCr9ymXSEWgUoFv3YXJYD+i8SjyxSJk2XyOIoiiRLZEMHFgqZiTVCwQpJi9C8vnMw1/tt5H/AB9rJzLIBKloIghy9KgWDyM7MYqcusbyGXXkNtI1yW219eRy6SQ2VzDxlrqYf/GOvlSiCfjBMxDVihVYXcYcfVyO+RX5OjS/4zr54bRrfgTnePn0fnbKG5pfsH1mzfQeaEbcsMZdFwYRPfNy+hkzqLrDxXO3ziN8UklKtUaZEVaQ8v8HJb6lBDUOnCMBpxyAh6GATc0Bm6YgUelAjeoBqciMeqGXw33mAZ+jRG69nZ0tF9DDaAI6ceqYuB9bB8Su15DXNRTLyHx3NtIH/kG6eZTWHv/c6RbTiH12QmkDrYg8eybiO94GbFdryL19OuYfWQvui5eQbUOrGFpcgr2Dw7Dc/ArsAeOw/HhcbAfn4Tn+3PwfHsWXMsZ8D9cgOfEr+COnobrk5NwfXRc6us79DUM736Km1fb6sA87ZrDxYEPWGE3HIXf0oKk8xiClmbwhsMQjCTTkXqbrDDzBYTZZghzLfDNN8NpOoapGQ16+gdQqVVFYI6AXsR4HRzyR2HpeAIzbY9jhdkBOHcBNtLyTrINLTdk3Sn58/onoVXeQN/ACAEr9QidbgE+txGDrbvR9uNutH73DG6fex7LPXuwIG+CrfcFLJMWu/fAqtiLxVtNsJCsiiborzVBM6Yg4PAWsCCljc/npRTUwO2cg9djgc1qxsK8GcvWOShHh6DVjGHGNIXJCQaWBTPpDpYWZ2nMFPR6PXr7+7emXICDdSIQ8EuAkZFBeDgnLW8F1UoO4jM7a4LRMEH+GQi8E5sbSSktgRLlfwgMM46+27fvAZ0UYSgUhMEwCa2WkazROIVpkpHaY6oRMONK6LTjFKEGBr0O83MzJDNMJgNFOU0R9jWApSJlih3ZTIZyvUr5mkcqlQbHeag8cfB6vfAF4uC9EfiDCXiEMAJkU+tFZHMlaQZ+vx+3FN10bGr3gBkCik+UkpzneZo2h5W1VYrgDiJzX8Knew+BqQMQdPvhYvaDG38HC1MXpTFeL/8fwM1NycmyLHp7e9HT04NR2oxLl/6CVn4I8t9fpLx+BYrWN6A4/xYUZ/dhoOsnaYwg8NtHWCqVkM1mqd6VqaBGYDaboZmYwYR+HsNKOqusX+qXoT0pV6QmnRDfw8BsA3j/UygUaIMM0E9qaeH1GBoaoCUJN7w11Gq17YGbjSlXq3RcqlVpg2q0a4VCEbFkBsnVjFSpRUilstWn0lhD74NAGwHFKW738JEaQivbuqWNlHc3gAVaMxGoVI5CRXVPpVLelVJZt2pGRedQdff9fr9arUY71UMRKNXDHF0BkViEjgkL1mmDi7U/IFaUs65/+7bkpCtW8AkQk0Qm3gMxurHE2w1SRav8T9GYWgnxRP1O+Qe4Rrb0JcRZE
QAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;skifreedle2&quot;
        title=&quot;skifreedle2&quot;
        src=&quot;/static/6ee543da4626f592eb2537653f7d2b18/fcda8/skifreedle2.png&quot;
        srcset=&quot;/static/6ee543da4626f592eb2537653f7d2b18/12f09/skifreedle2.png 148w,
/static/6ee543da4626f592eb2537653f7d2b18/e4a3f/skifreedle2.png 295w,
/static/6ee543da4626f592eb2537653f7d2b18/fcda8/skifreedle2.png 590w,
/static/6ee543da4626f592eb2537653f7d2b18/efc66/skifreedle2.png 885w,
/static/6ee543da4626f592eb2537653f7d2b18/a13c9/skifreedle2.png 1178w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;I enjoy coding small web games by hand, but &lt;em&gt;definitely&lt;/em&gt; would not have had the time to wire up all the different SkiFree objects or build neat features like a ghost of your fastest run. I also tried out a lot of different visual themes for the game UI before landing on something I liked. If I’d done this by hand, I would have only had time to try out two or three different looks, instead of fifteen or twenty.&lt;/p&gt;
&lt;p&gt;I’m very happy with how this turned out. I’ve been enjoying competing against my brother to get better times, since both of us have a lot of nostalgia for the original SkiFree game.&lt;/p&gt;
&lt;h3&gt;Autodeck&lt;/h3&gt;
&lt;p&gt;Last year I built &lt;a href=&quot;https://www.autodeck.pro/&quot;&gt;Autodeck&lt;/a&gt;! I’ve written a &lt;a href=&quot;/autodeck/&quot;&gt;blog post&lt;/a&gt; about it before, but in short, the idea came from my partner wishing there were some way to automatically generate Anki cards about random topics she wanted to learn. It ended up being relatively straightforward to set up an endless feed of auto-generated spaced repetition cards:&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/ab189b44f0c389b77ca4df74dcf8259b/71c1d/autodeck.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 70.27027027027026%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAOCAYAAAAvxDzwAAAACXBIWXMAABYlAAAWJQFJUiTwAAABdklEQVQ4y41T227CMAzl/39r2tM07WkTu7FxhwlG0zZpkib1fFygBBWGpVOniX1y6tqDqtLknBV43/l07RIc4k9RVYaMKWlgraGmIYr8qEOgECL7KOsYea8OnOCZqG7B732Gi4QQzLCi1KTyglHSZpvRdqfY72iX5VRZl6Dhy08Bg8o9oRZ1UBZiC6iCh6LI/pLhDABnQgiCvDRU6orywrBaw8oK8ZkqZS9TBXst+5tfJfvW1QJfx5QQ6h6fnunu/oFehp+SMHwb0fdkQcP3L5ovf+iV/WyxpjHvYf0xmvDne7I+SF1ThUyIuo2nCypYJQKdxw9quBzE553nUFmjEojpVYgbrAs0mS1pOl/J7av1VoJFhfj6+ImV69a9hKghEpB8xD4YKq7hisK6N/gWSPx5Df9TcgAu75q/BQYgITxv1EuAoTxodmkpHgK0E9oNk6J10U3KrQZe63wHrnc7nq5TaAzGTpFSGUNdQXue53kCzfktT0l/h7tC4XKA5C0AAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;autodeck&quot;
        title=&quot;autodeck&quot;
        src=&quot;/static/ab189b44f0c389b77ca4df74dcf8259b/fcda8/autodeck.png&quot;
        srcset=&quot;/static/ab189b44f0c389b77ca4df74dcf8259b/12f09/autodeck.png 148w,
/static/ab189b44f0c389b77ca4df74dcf8259b/e4a3f/autodeck.png 295w,
/static/ab189b44f0c389b77ca4df74dcf8259b/fcda8/autodeck.png 590w,
/static/ab189b44f0c389b77ca4df74dcf8259b/efc66/autodeck.png 885w,
/static/ab189b44f0c389b77ca4df74dcf8259b/c83ae/autodeck.png 1180w,
/static/ab189b44f0c389b77ca4df74dcf8259b/71c1d/autodeck.png 1536w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;I set up Stripe payments for this one. That was more because I was worried about someone running away with my Groq balance than because I wanted to make money. I was pleasantly surprised to see a bunch of people actually use it: over five hundred people have tried it out, and there are enough paid subscribers to cover inference and hosting.&lt;/p&gt;
&lt;p&gt;I &lt;em&gt;might&lt;/em&gt; have built this without LLM assistance, but I almost certainly would not have &lt;em&gt;deployed&lt;/em&gt; it as a website. The hassle of setting up a database and Stripe would have just been too much work.&lt;/p&gt;
&lt;h3&gt;Endless Wiki&lt;/h3&gt;
&lt;p&gt;I also built an AI-generated &lt;a href=&quot;https://www.endlesswiki.com/&quot;&gt;endless wiki&lt;/a&gt;. I wrote a &lt;a href=&quot;https://www.seangoedecke.com/endless-wiki/&quot;&gt;blog post&lt;/a&gt; about this one as well. As with Autodeck, I was fascinated by the idea of non-chat interfaces for LLMs. I thought a wiki where you interact with the model by clicking links was pretty cool.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/50f19e416dd78a54917fe8f645449839/6578c/ewiki.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 39.189189189189186%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAICAYAAAD5nd/tAAAACXBIWXMAABYlAAAWJQFJUiTwAAABUElEQVQoz01RCZKDMAzj/6/YX+xjdtpyE5JAUsJVLq3tFnY1I2wC2JKIwvjCQw2omgnTNCP0A/phxDBOUpnrtoGx74fwOIDXssD5DvM04CcL+PouMMwbIt8FjNOEE89nh7wooLVGWZZomgZK1bBUGceH/8FL1nWjRQeiXPcozXA97EKPQrUo9ROV9qitQ13XQmutLHDOoW3biyGEa01k/IxMj1jWXQ68d0iShNRVpJRZIs9zUZumqfRZXiKhnt/js1obESIK+bJt72wYjQt4pAppYZFlOX0QU80+PQ8skJYWLeV3KjWNo6xnURn9JXFcliuloWqipvyMJ/tO7HOunKdSlVif5xnjOFKdpO77/lZ4kjH0PX1YC41poG0LbUiFsTRIycAzUx4qJJUL/XXJ8NL3Gcg2brcb7vc7ZRSTzQRxHIvd06L3XtjSPZ9xz+oYv3aLYoFdwXwIAAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;endlesswiki&quot;
        title=&quot;endlesswiki&quot;
        src=&quot;/static/50f19e416dd78a54917fe8f645449839/fcda8/ewiki.png&quot;
        srcset=&quot;/static/50f19e416dd78a54917fe8f645449839/12f09/ewiki.png 148w,
/static/50f19e416dd78a54917fe8f645449839/e4a3f/ewiki.png 295w,
/static/50f19e416dd78a54917fe8f645449839/fcda8/ewiki.png 590w,
/static/50f19e416dd78a54917fe8f645449839/efc66/ewiki.png 885w,
/static/50f19e416dd78a54917fe8f645449839/c83ae/ewiki.png 1180w,
/static/50f19e416dd78a54917fe8f645449839/6578c/ewiki.png 2242w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;I learned the hard way that putting an LLM generation call behind a regular link was a bad idea: scrapers would follow those links and quickly exhaust my inference budget. I ended up faking the no-article-exists-yet links with JavaScript, which at least so far has defeated scrapers. People still email me about Endless Wiki, and over 280 thousand pages have been generated.&lt;/p&gt;
&lt;p&gt;My original goal was to see if you could eventually generate a page for Neon Genesis Evangelion, starting at the root page and only following links (kind of like &lt;a href=&quot;https://dev.to/zmbailey/wikigolf-an-automated-traversal-of-wikipedia-9o0&quot;&gt;wiki golf&lt;/a&gt;). I was successful! You can read the “Evangelion Anime” page &lt;a href=&quot;https://www.endlesswiki.com/wiki/evangelion_anime&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Almost exactly a month after I launched Endless Wiki, xAI launched &lt;a href=&quot;https://en.wikipedia.org/wiki/Grokipedia&quot;&gt;Grokipedia&lt;/a&gt;. Obviously they didn’t plagiarize me. This is a very easy idea to have, and my site was not the first infinite wiki (though I think it was the first one where you had to discover new pages by clicking on links). But it did take some of the shine off.&lt;/p&gt;
&lt;h3&gt;VicFlora Offline&lt;/h3&gt;
&lt;p&gt;I built a &lt;a href=&quot;https://vicfloraoffline.netlify.app/&quot;&gt;PWA&lt;/a&gt; that caches the VicFlora plant identification database so it can be used with little or no internet access. This was more of a utility project for my partner, who likes plants and occasionally goes on field trips where the internet is spotty.&lt;/p&gt;
&lt;p&gt;I would definitely not have done this without LLMs. It was reasonably difficult to scrape the basic dichotomous key from the VicFlora website: their API documentation was out of date, there were multiple possible pathways for fetching data (most of which were not functional), and the format of the data I did manage to fetch was hard to parse. I think I &lt;em&gt;could&lt;/em&gt; have done it, with enough effort, but it would have been a substantial amount of work.&lt;/p&gt;
&lt;p&gt;I’m very happy with how this turned out. It’s not perfect, but it’s functional, and I’ve even had the occasional Victorian botanist email me with bug reports or feature requests, so it’s clearly seeing a little bit of usage.&lt;/p&gt;
&lt;h3&gt;Other projects&lt;/h3&gt;
&lt;p&gt;I did a bunch of other stuff that doesn’t necessarily rise to the level of a “deployed project”:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;my &lt;a href=&quot;https://github.com/sgoedecke/gh-standup&quot;&gt;gh-standup&lt;/a&gt; GitHub CLI extension, which automatically generates a standup report and has just over a hundred stars;&lt;/li&gt;
&lt;li&gt;my (low quality) image geolocation &lt;a href=&quot;https://github.com/sgoedecke/ai_geolocation&quot;&gt;benchmark&lt;/a&gt;, which I blogged about &lt;a href=&quot;/the-o3-geoguessr-prompt-did-not-work/&quot;&gt;here&lt;/a&gt;; and&lt;/li&gt;
&lt;li&gt;my &lt;a href=&quot;https://github.com/sgoedecke/skills/blob/main/skills/extract-features-clamp-inference/SKILL.md&quot;&gt;skill&lt;/a&gt; for extracting features from open-source models.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There may not be a flood of AI-generated companies (yet), but at least for me there’s been a flood of small, weird projects that would not have existed without significant LLM assistance.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-0&quot;&gt;
&lt;p&gt;I also want to shout out Simon Willison’s &lt;a href=&quot;https://simonwillison.net/2025/Sep/4/highlighted-tools/&quot;&gt;version of this&lt;/a&gt;, which is another great example of “weird useful tools that only exist because the cost of creating them was so low”.&lt;/p&gt;
&lt;a href=&quot;#fnref-0&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;I did lift the spritesheet from Daniel Hough’s &lt;a href=&quot;https://github.com/basicallydan/skifree.js&quot;&gt;SkiFree.js&lt;/a&gt;, which attributes it to &lt;a href=&quot;http://spriters-resource.com/submitter/Wing%20Wang%20Wao&quot;&gt;Wing Wang Wao&lt;/a&gt;. Of course, the original sprites and art belong to Chris Pirih’s SkiFree and Microsoft.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Build agents, not pipelines]]></title><link>https://seangoedecke.com/build-agents-not-pipelines/</link><guid isPermaLink="false">https://seangoedecke.com/build-agents-not-pipelines/</guid><pubDate>Sun, 31 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;There are only two ways to use LLMs in a computer program: as part of a pipeline, or as an agent. In other words, either you express the control flow of the program in code, or you give an LLM tools and allow it to manage the control flow itself&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Here’s how you might structure a trivial “summarize a bunch of information and email it to me” program as a pipeline:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;ruby&quot;&gt;&lt;pre class=&quot;language-ruby&quot;&gt;&lt;code class=&quot;language-ruby&quot;&gt;context &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; gather_context&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;various&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; data&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; sources&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
llm_response &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; llm_summarize&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;context&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
summary &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; parse&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;llm_response&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
email_me&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;summary&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; my_email&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
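&lt;p&gt;To make the pipeline concrete, here is a runnable Ruby sketch of the same four steps. The helper bodies are hypothetical stubs: the “LLM” just returns canned JSON, and &lt;code class=&quot;language-text&quot;&gt;OUTBOX&lt;/code&gt; stands in for sending real email. The point is only the shape of the control flow: every step runs once, in an order fixed by the code.&lt;/p&gt;

```ruby
require "json"

# Hypothetical stand-ins for the helpers named in the pipeline above.
# None of these are real library APIs.
def gather_context(sources)
  sources.map { |name, text| "## #{name}\n#{text}" }.join("\n\n")
end

# Stub "LLM" that returns JSON; a real pipeline would call a model API here.
def llm_summarize(context)
  JSON.generate({ "summary" => "reviewed #{context.lines.count} lines of context" })
end

# Pull the one field we care about out of the model's response.
def parse(llm_response)
  JSON.parse(llm_response).fetch("summary")
end

OUTBOX = []

# Stub mailer: records the message instead of sending it.
def email_me(summary, address)
  OUTBOX.push({ to: address, body: summary })
end

# The control flow lives in code: each step runs exactly once, in this order.
sources = { "calendar" => "Standup at 10", "inbox" => "3 unread emails" }
context = gather_context(sources)
summary = parse(llm_summarize(context))
email_me(summary, "me@example.com")
```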
&lt;p&gt;And here’s how you’d do it as an agent:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;ruby&quot;&gt;&lt;pre class=&quot;language-ruby&quot;&gt;&lt;code class=&quot;language-ruby&quot;&gt;read_data_tool &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; build_read_data_tool&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;various&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; data&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; sources&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
email_tool &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; build_email_tool&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;my_email&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
run_agent&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token symbol&quot;&gt;tools&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;read_data_tool&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; email_tool&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
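&lt;p&gt;Here is a similarly minimal sketch of what &lt;code class=&quot;language-text&quot;&gt;run_agent&lt;/code&gt; might look like inside. The model interface is an assumption: on each turn it returns either a tool call or a “done” signal. &lt;code class=&quot;language-text&quot;&gt;ScriptedModel&lt;/code&gt; is a hypothetical stand-in that replays fixed steps instead of calling a real LLM.&lt;/p&gt;

```ruby
# Assumed model interface: each turn returns either a tool call
# ({ tool:, args: }) or { done: true }. ScriptedModel replays fixed steps;
# a real model would choose them by reading the transcript.
class ScriptedModel
  def initialize(steps)
    @steps = steps.dup
  end

  def next_action(_transcript)
    @steps.shift || { done: true }
  end
end

# The control flow lives in this loop rather than in your code: the model
# decides which tool to call next and when to stop.
def run_agent(model:, tools:)
  transcript = []
  loop do
    action = model.next_action(transcript)
    return transcript if action[:done]

    result = tools.fetch(action[:tool]).call(**action[:args])
    transcript.push({ tool: action[:tool], result: result })
  end
end

SOURCES = { "inbox" => "3 unread emails", "calendar" => "Standup at 10" }.freeze
outbox = []
tools = {
  read_data: ->(source:) { SOURCES.fetch(source) },
  email: ->(body:) { outbox.push(body); "sent" }
}
model = ScriptedModel.new([
  { tool: :read_data, args: { source: "inbox" } },
  { tool: :email, args: { body: "You have 3 unread emails" } }
])
transcript = run_agent(model: model, tools: tools)
```

&lt;p&gt;Nothing in &lt;code class=&quot;language-text&quot;&gt;run_agent&lt;/code&gt; says “read, then email”. With a real model, that ordering and the decision to stop both come from the LLM.&lt;/p&gt;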
&lt;p&gt;It’s like the difference &lt;a href=&quot;https://news.ycombinator.com/item?id=46375199&quot;&gt;between&lt;/a&gt; a library and a framework. When you use a library, you define the structure of the program yourself, and call out to various library helpers along the way. When you use a framework, the main structure of the program lives in the framework, and it calls your code at various points. There are tradeoffs involved in both approaches. Frameworks let you get started more quickly and typically give you features “for free”, but can be difficult when you want to do something that isn’t part of the framework’s design. Libraries give you a lot more control, but require you to write (and maintain) more boilerplate code.&lt;/p&gt;
&lt;p&gt;In the trivial case, the distinction between a pipeline and an agent melts away. If you only have a few paragraphs of possible context for the problem, an agent with a &lt;code class=&quot;language-text&quot;&gt;gather_context&lt;/code&gt; and an &lt;code class=&quot;language-text&quot;&gt;email_me&lt;/code&gt; tool will perform exactly the same steps as a pipeline that calls a reasoning model with the context injected into the prompt (i.e. the agent will reproduce the trivial control flow of your pipeline). But when you have more context than will fit into a single prompt, or you want to take an action and then react to the result, the choice between pipelines and agents becomes very significant.&lt;/p&gt;
&lt;h3&gt;Predictability, flexibility and intelligence&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Pipelines are more predictable, but agents are more flexible&lt;/strong&gt;. When you give a problem to an agent, work stops when the LLM thinks it’s done. Depending on the perceived difficulty of the problem, this can take anywhere from a few LLM turns to hundreds (and thus cost anywhere from a few cents to many dollars). If you’re building something intended to run at scale, this unpredictability can be a nightmare. Any subtle change to the user data could cause the LLM to take twice as long on each task, which would double your latency&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; and cost.&lt;/p&gt;
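&lt;p&gt;One common mitigation (not a cure) is to cap the loop. The sketch below stops an agent after a maximum number of turns or once an estimated dollar budget would be exceeded. The per-turn cost is an invented flat figure; in practice each turn tends to cost more than the last as the context grows.&lt;/p&gt;

```ruby
BudgetExceeded = Class.new(StandardError)

# Each loop iteration is one model turn. The block stands in for the model
# and returns :done when it decides the task is finished.
def run_with_budget(max_turns:, max_dollars:, cost_per_turn:)
  turns = 0
  spent = 0.0
  loop do
    turns += 1
    spent += cost_per_turn
    return { turns: turns, spent: spent.round(2) } if yield(turns) == :done

    # Stop before starting a turn we cannot afford, or at the turn cap.
    if turns == max_turns || spent + cost_per_turn > max_dollars
      raise BudgetExceeded, "stopped after #{turns} turns"
    end
  end
end

# An "easy" task the model finishes in 3 turns, and a "hard" one it would
# keep working on for 200 turns if allowed.
easy = run_with_budget(max_turns: 50, max_dollars: 5.0, cost_per_turn: 0.04) { |t| t == 3 ? :done : :continue }
begin
  run_with_budget(max_turns: 50, max_dollars: 5.0, cost_per_turn: 0.04) { |t| t == 200 ? :done : :continue }
rescue BudgetExceeded => e
  hard_error = e.message
end
```

&lt;p&gt;The cap bounds your worst-case spend. However, a task that hits it has simply not been finished, so you are trading unpredictable cost for unpredictable success.&lt;/p&gt;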
&lt;p&gt;Pipelines are only fully immune to this problem if they don’t use reasoning models, or don’t allow the model to “think out loud” in its output tokens (for instance, by using &lt;a href=&quot;https://developers.openai.com/api/docs/guides/structured-outputs&quot;&gt;structured output&lt;/a&gt;). Even so, a single LLM call is much easier to bound than an agentic loop. In all frontier model APIs, you can explicitly set the level of reasoning you want. That doesn’t give you total control, but it does cap the slowdown at maybe ten or twenty percent, compared with agents, where a run can take 2x as long or more.&lt;/p&gt;
&lt;p&gt;Why use agents, then? &lt;strong&gt;Agents are smarter&lt;/strong&gt;. If you’re happy to accept the unpredictability, an agentic system can handle &lt;em&gt;much&lt;/em&gt; more difficult tasks, because it can loop for longer and gather more information after thinking about the problem. There’s a reason that the most successful AI products (coding agents like Claude Code, Codex, Cursor, and Copilot&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;) are agents: coding is a hard enough task that you simply cannot build a functional coding tool out of pipelines.&lt;/p&gt;
&lt;h3&gt;Context-gathering&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;The context-gathering stage is far more delicate for pipelines than for agents&lt;/strong&gt;. If an agent is trying to solve a problem and realizes it needs more data, it can simply go and get it. But for a pipeline, all the required data has to be present in the context already, because the LLM only gets to run once.&lt;/p&gt;
&lt;p&gt;Much of the work involved in building pipelines is in getting context-gathering right. &lt;strong&gt;Agents are much easier.&lt;/strong&gt; For instance, with a coding agent, you can basically just provide a “grep” and “read file” tool and let the agent figure out what chunks of code are relevant to the current file. In a pipeline, you have to figure that out yourself: good luck, it’s an unsolved technical problem! Typically you’ll end up doing some set of clever tricks, like walking the AST to identify which parts of code “contribute” to the current file, or indexing the whole codebase with semantic embeddings and doing some kind of nearest-neighbor search to build the context (called RAG, or “retrieval-augmented generation”). Neither of these will work as well as using an agent.&lt;/p&gt;
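&lt;p&gt;To show how little agent-side tooling is needed, here is a rough sketch of the two tools in Ruby. The in-memory &lt;code class=&quot;language-text&quot;&gt;REPO&lt;/code&gt; hash is a hypothetical stand-in for a real checkout on disk. The &lt;code class=&quot;language-text&quot;&gt;path:line: text&lt;/code&gt; match format is my own choice, loosely modeled on grep’s output.&lt;/p&gt;

```ruby
# Hypothetical in-memory "repository" so the example is self-contained.
REPO = {
  "app/models/user.rb" => "class User\n  def full_name\n    first + ' ' + last\n  end\nend\n",
  "app/views/profile.rb" => "puts user.full_name\n"
}.freeze

# grep tool: returns "path:line_number: text" for every matching line.
def grep_tool(pattern)
  regex = Regexp.new(pattern)
  REPO.flat_map do |path, text|
    text.each_line.with_index(1).filter_map { |line, n| "#{path}:#{n}: #{line.strip}" if line.match?(regex) }
  end
end

# read_file tool: returns the file's contents, or an error string the model
# can read and react to (rather than raising inside the agent loop).
def read_file_tool(path)
  REPO.fetch(path) { "error: no such file #{path}" }
end

matches = grep_tool("full_name")
```

&lt;p&gt;The model decides what to search for, reads the hits, and then decides which whole files to open. That relevance judgment is exactly the part a pipeline has to hard-code.&lt;/p&gt;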
&lt;p&gt;In 2023 and 2024, many people believed that RAG would solve context-gathering. Every LLM would have a fully-indexed context base that would magically surface the precise information the LLM needed at any given moment. This did not happen. Instead, we went &lt;em&gt;backwards&lt;/em&gt;, getting our agents to do plain-text search and figure it out like a human would. Why didn’t RAG work? This is a topic for a whole other post, but the short answer is this: “find what information is relevant to this problem” is often as hard a task as &lt;em&gt;actually solving the problem&lt;/em&gt;. Semantic embeddings and cosine similarity are simply not powerful enough tools for the job.&lt;/p&gt;
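&lt;p&gt;For comparison, the core of RAG retrieval is a nearest-neighbor ranking like the one below. The three-dimensional “embeddings” are invented for illustration; real ones come from an embedding model and have hundreds or thousands of dimensions. The scoring, though, is ordinary cosine similarity.&lt;/p&gt;

```ruby
# Cosine similarity: dot product divided by the product of vector lengths.
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  dot / (norm.(a) * norm.(b))
end

# Return the text of the k chunks whose vectors are most similar to the query.
def top_k(query_vec, chunks, k)
  chunks.max_by(k) { |chunk| cosine(query_vec, chunk[:vec]) }.map { |chunk| chunk[:text] }
end

# Invented toy embeddings, not output from any real model.
CHUNKS = [
  { text: "How to reset a password", vec: [0.9, 0.1, 0.0] },
  { text: "Billing and invoices",    vec: [0.1, 0.9, 0.1] },
  { text: "Password length rules",   vec: [0.8, 0.0, 0.3] }
].freeze

hits = top_k([1.0, 0.0, 0.0], CHUNKS, 2)
```

&lt;p&gt;The ranking only knows which chunks &lt;em&gt;look&lt;/em&gt; like the query. It cannot tell that the real answer depends on some other chunk that isn’t similar to the question at all.&lt;/p&gt;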
&lt;h3&gt;Multi-model pipelines&lt;/h3&gt;
&lt;p&gt;Pipelines that make multiple LLM invocations do have an extra dimension of flexibility: they can use different LLMs for different tasks. For instance, if one LLM benchmarks better at task A, or is cheaper for an easier task B, you can use the right model for each job. Agents (at least right now) have to use the same model the whole time, so you’re always pinned to the highest level of intelligence you need.&lt;/p&gt;
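&lt;p&gt;A minimal sketch of that per-stage model choice might look like the following. The model names, capability tiers, and prices are all invented placeholders, not real offerings.&lt;/p&gt;

```ruby
# Invented models: a capability tier and a price per million tokens.
MODELS = {
  "small-cheap" => { tier: 1, usd_per_million_tokens: 0.10 },
  "large-smart" => { tier: 3, usd_per_million_tokens: 3.00 }
}.freeze

# Each pipeline stage declares the minimum tier it needs.
STAGES = [
  { name: :extract_fields, min_tier: 1 },
  { name: :write_report, min_tier: 3 }
].freeze

# Pick the cheapest model that is capable enough for a given stage.
def cheapest_model_for(min_tier)
  capable = MODELS.select { |_name, m| m[:tier] >= min_tier }
  capable.min_by { |_name, m| m[:usd_per_million_tokens] }.first
end

plan = STAGES.to_h { |stage| [stage[:name], cheapest_model_for(stage[:min_tier])] }
# An agent, by contrast, would run every step on "large-smart".
```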
&lt;p&gt;Is this a big deal? I’m skeptical. One pattern I see a lot is tasking a cheaper model with collating or summarizing data for a smarter model to do something with. But often the signal is in the raw data itself! I think designs like this are really shooting themselves in the foot, for the same reason that RAG didn’t work: context-gathering is a harder problem than people anticipated.&lt;/p&gt;
&lt;p&gt;In any case, if you do want to farm out tasks to different models, you can also do it via careful agentic tool design. For instance, you could build your &lt;code class=&quot;language-text&quot;&gt;web_search&lt;/code&gt; tool so that it uses a cheap model to summarize web pages.&lt;/p&gt;
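&lt;p&gt;Here is a sketch of that tool design, assuming you already have some search backend and some cheap-model summarizer to wire in. &lt;code class=&quot;language-text&quot;&gt;search_backend&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;cheap_summarizer&lt;/code&gt; are hypothetical callables; the stubs below only exist to make the example run.&lt;/p&gt;

```ruby
# Build a web_search tool that delegates summarization to a cheaper model,
# so the main agent keeps one model while the tool uses another.
def build_web_search_tool(search_backend:, cheap_summarizer:, max_chars: 2_000)
  lambda do |query:|
    search_backend.call(query).map do |page|
      # Truncate before summarizing to keep the cheap call small.
      { url: page[:url], summary: cheap_summarizer.call(page[:body][0, max_chars]) }
    end
  end
end

# Hypothetical stubs standing in for a real search API and a real model call.
backend = ->(q) { [{ url: "https://example.com/a", body: "Long article about #{q}. " * 50 }] }
summarizer = ->(text) { text.split(". ").first }

web_search = build_web_search_tool(search_backend: backend, cheap_summarizer: summarizer)
results = web_search.call(query: "KV caching")
```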
&lt;h3&gt;Small contexts and future-proofing&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Pipelines allow working with smaller contexts, and thus with local models&lt;/strong&gt;. An agent’s ability to fetch its own context means that it almost always ingests more data than it needs. On top of that, agents run in loops, so each agent turn increases the size of the context. This isn’t a big problem for systems built on top of frontier model APIs, because:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;frontier models all expose large context windows,&lt;/li&gt;
&lt;li&gt;frontier models tend to hold up pretty well for the first 200k tokens, and&lt;/li&gt;
&lt;li&gt;KV caching means that passing around the same large context block is surprisingly cheap.&lt;/li&gt;
&lt;/ul&gt;
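&lt;p&gt;A rough sketch of the token accounting behind those last two points follows. It assumes a fixed starting prompt, a fixed number of new tokens appended per turn (tool output plus the model’s reply), and an idealized prefix cache. Providers typically bill cached tokens at a discount rather than for free, and all numbers here are illustrative.&lt;/p&gt;

```ruby
# Turn t (0-indexed) sends everything accumulated so far: base + t * per_turn.
def agent_token_costs(base:, per_turn:, turns:)
  prompt_sizes = (0...turns).map { |t| base + t * per_turn }
  {
    final_context: base + turns * per_turn,
    # Without caching, every turn reprocesses the whole context.
    total_input: prompt_sizes.sum,
    # With an ideal prefix cache, turn 0 processes the base prompt and each
    # later turn processes only the tokens appended since the previous turn.
    uncached_input: base + (turns - 1) * per_turn
  }
end

costs = agent_token_costs(base: 5_000, per_turn: 2_000, turns: 20)
```

&lt;p&gt;With these numbers, 20 turns send 480k input tokens in total, but only 43k of them are new, which is why repeatedly passing the same large context around is cheaper than it looks.&lt;/p&gt;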
&lt;p&gt;However, it is a big problem for local models. The context window consumes &lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1j6xpvt/how_large_is_your_local_llm_context/&quot;&gt;a lot of VRAM&lt;/a&gt;, so most people running local models stay below 32k (or even 6k) tokens. If you’re writing a program to run in this environment, you likely will not be able to give an agent the space it needs, and you will instead be forced to use a pipeline.&lt;/p&gt;
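&lt;p&gt;The VRAM cost comes mostly from the KV cache. For each token, every layer stores one key vector and one value vector per KV head. Here is a back-of-the-envelope calculation, assuming a Llama-3-8B-style shape (32 layers, 8 KV heads, head dimension 128, 16-bit values). The figure ignores the model weights themselves, which need their own VRAM on top.&lt;/p&gt;

```ruby
# Bytes of KV cache: 2 (key + value) x layers x KV heads x head_dim x bytes x tokens.
def kv_cache_bytes(tokens:, layers:, kv_heads:, head_dim:, bytes_per_value:)
  2 * layers * kv_heads * head_dim * bytes_per_value * tokens
end

GIB = 1024.0**3
# Assumed model shape; check your model's config for the real values.
cfg = { layers: 32, kv_heads: 8, head_dim: 128, bytes_per_value: 2 }

[8_192, 32_768, 200_000].each do |n|
  puts format("%7d tokens: %.1f GiB", n, kv_cache_bytes(tokens: n, **cfg) / GIB)
end
```

&lt;p&gt;Under these assumptions, that works out to 128 KiB per token: about 1 GiB at 8k tokens, 4 GiB at 32k, and over 24 GiB at 200k. That last figure is more than a typical consumer GPU has in total.&lt;/p&gt;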
&lt;p&gt;In my opinion, &lt;strong&gt;agents are more future-proof&lt;/strong&gt;. This is partly because models are now being explicitly built to be better agents, and partly because agents delegate more to the LLM and thus benefit more from LLM improvements. If you have a pipeline-based system, new models will probably do a bit better than old ones. If you have an agentic system, new models might do &lt;em&gt;much&lt;/em&gt; better than old ones. The gain can be large enough that it’s worth building agentic systems for tasks that are currently too hard, on the assumption that by the time you’ve finished, the models may be good enough. I have been banging this drum &lt;a href=&quot;/llm-driven-agents/&quot;&gt;since 2023&lt;/a&gt;, before tool-calling was even a part of model APIs.&lt;/p&gt;
&lt;h3&gt;Safety and legibility&lt;/h3&gt;
&lt;p&gt;In general, I disagree with the &lt;a href=&quot;https://www.decodingai.com/p/stop-building-ai-agents&quot;&gt;popular advice&lt;/a&gt; that workflows are safer than agents. Workflows offer more control &lt;em&gt;over budget&lt;/em&gt;, but when it comes to taking action based on LLM output, you have exactly the same problem whether you’re checking at the tool-call level or at the next stage in the pipeline: either you make some heuristic assessment via code, which might be wrong, or you queue the action up for a human to approve, which will be slow.&lt;/p&gt;
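&lt;p&gt;Either way, the gate looks the same: a sketch of the two options applied to a proposed action, whether that action came from an agent’s tool call or from a pipeline stage. The allowlist, the action shape, and &lt;code class=&quot;language-text&quot;&gt;ReviewQueue&lt;/code&gt; are hypothetical.&lt;/p&gt;

```ruby
# Hypothetical allowlist of actions considered safe to run automatically.
SAFE_ACTIONS = %w[read_file search].freeze

# A stand-in for whatever system queues actions for human approval.
ReviewQueue = Struct.new(:pending) do
  def submit(action)
    pending.push(action)
  end
end

def gate(action, queue)
  if SAFE_ACTIONS.include?(action[:name])
    :execute # heuristic check in code: fast, but it might be wrong
  else
    queue.submit(action)
    :queued  # human approval: safer, but slow
  end
end

queue = ReviewQueue.new([])
decisions = [
  { name: "search", args: { q: "weather" } },
  { name: "send_email", args: { to: "boss@example.com" } }
].map { |action| gate(action, queue) }
```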
&lt;p&gt;Don’t agents open you up to prompt injection? Yes, but pipelines do too. In both cases, you’re feeding some block of human-generated data (e.g. the files in a codebase, or the results of a web search) into the LLM. Any prompt injections in that data will be consumed by the LLM just the same whether they’re the result of a tool-call or directly injected into the prompt by the pipeline. You have to sanitize user content and double-check LLM-triggered actions, no matter what design you choose&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;I do want to acknowledge that &lt;strong&gt;pipelines are slightly more &lt;em&gt;legible&lt;/em&gt;&lt;/strong&gt;. You can trace most of what a pipeline is doing because you’re in control of more of it. It’s harder to figure out why an agent queried for a particular piece of information or took some action. But even in a pipeline, you’ll never know for sure why the LLM responded in the way it did. That’s just what it means to program with LLMs.&lt;/p&gt;
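&lt;p&gt;You can recover some of that legibility by wrapping each agent tool so that every call is recorded. Here is a minimal sketch; &lt;code class=&quot;language-text&quot;&gt;traced&lt;/code&gt; is a hypothetical helper. Note that it logs &lt;em&gt;what&lt;/em&gt; the agent did, not &lt;em&gt;why&lt;/em&gt;.&lt;/p&gt;

```ruby
# Wrap a tool so each call's name, arguments, and result are appended to a log.
def traced(name, tool, log)
  lambda do |**args|
    result = tool.call(**args)
    log.push({ tool: name, args: args, result: result })
    result
  end
end

trace = []
search = traced(:search, ->(q:) { "3 results for #{q}" }, trace)
search.call(q: "invoice 1234")
```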
&lt;h3&gt;LLM-driven mass surveillance&lt;/h3&gt;
&lt;p&gt;Let’s apply some of these principles to a real-world, non-trivial example. Suppose you are the NSA, and you are attempting to use LLMs to get a grip on the wild firehose&lt;sup id=&quot;fnref-5&quot;&gt;&lt;a href=&quot;#fn-5&quot; class=&quot;footnote-ref&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; of covert email surveillance data&lt;sup id=&quot;fnref-6&quot;&gt;&lt;a href=&quot;#fn-6&quot; class=&quot;footnote-ref&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;. Should you use pipelines or agents? Well, if you’re building something that’s supposed to run on every single piece of email in America, you probably shouldn’t use agents: keeping performance and cost strictly bounded requires a pipeline. However, you’re definitely well-resourced enough to use agents &lt;em&gt;in general&lt;/em&gt;, and the problem is definitely hard enough to benefit from the extra intelligence. I’d probably recommend using both: a low-context, cheap pipeline that can run once against each email and flag it, and a fleet of agents that can dig into those flags, make ordinary queries, and act more like human analysts would.&lt;/p&gt;
&lt;p&gt;The pipeline would have to scale with the total volume of data, which should be &lt;em&gt;mostly&lt;/em&gt; fine, since pipelines scale in a predictable-ish manner. The fleet of unpredictable agents can be scaled entirely independently, though in practice it would get bottlenecked on GPU availability and the necessity for human review. The majority of the engineering work&lt;sup id=&quot;fnref-7&quot;&gt;&lt;a href=&quot;#fn-7&quot; class=&quot;footnote-ref&quot;&gt;7&lt;/a&gt;&lt;/sup&gt; would likely go into context-assembly for the pipeline: feeding in enough data about who’s involved in the email conversation so that the LLM can make a sensible decision on whether or not to flag it.&lt;/p&gt;
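&lt;p&gt;Without the surveillance framing, that two-tier design is a general pattern. It makes a cheap, fixed-cost pass over every item and hands whatever gets flagged to a capped pool of agents. Here’s a rough Python sketch. The &lt;code&gt;flag&lt;/code&gt; and &lt;code&gt;investigate&lt;/code&gt; callables are hypothetical stand-ins for a single LLM call and an agent loop, not real APIs:&lt;/p&gt;

```python
from collections import deque


def run_two_tier(items, flag, investigate, max_agent_runs: int, max_agent_steps: int):
    """Cheap per-item pipeline feeding a capped pool of agent investigations.

    Hypothetical sketch: `flag(item) -> bool` stands in for one small,
    fixed-context LLM call per item, so pipeline cost grows linearly with
    volume. `investigate(item, max_steps)` stands in for an open-ended agent
    loop. Both its step count and the number of runs are capped, so agent
    cost is bounded separately from the size of the firehose.
    """
    flagged = deque()
    pipeline_calls = 0
    for item in items:
        pipeline_calls += 1  # exactly one call per item: predictable cost
        if flag(item):
            flagged.append(item)

    reports = []
    while flagged and max_agent_runs > len(reports):
        item = flagged.popleft()
        reports.append(investigate(item, max_steps=max_agent_steps))

    # Whatever is left waits for spare GPU capacity or human review.
    backlog = list(flagged)
    return pipeline_calls, reports, backlog


# Usage with toy stand-ins: flag even numbers, "investigate" by scaling them.
calls, reports, backlog = run_two_tier(
    range(10),
    flag=lambda x: x % 2 == 0,
    investigate=lambda x, max_steps: x * 10,
    max_agent_runs=3,
    max_agent_steps=5,
)
print(calls, reports, backlog)  # 10 [0, 20, 40] [6, 8]
```

&lt;p&gt;The two budgets are kept separate on purpose. The pipeline’s cost is known in advance from the item count, while the agent budget can be raised or lowered on its own as GPU capacity allows.&lt;/p&gt;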
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;Overall, I’d suggest following these guidelines:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Use pipelines when you have strict requirements around context size&lt;/li&gt;
&lt;li&gt;Use pipelines when you need to be able to accurately predict (or limit) GPU cost&lt;/li&gt;
&lt;li&gt;Use pipelines when you have to use local models&lt;/li&gt;
&lt;li&gt;Use agents when you’re not confident you’ll be able to assemble all of the relevant context in one shot&lt;/li&gt;
&lt;li&gt;Use agents when the problem is hard enough that you’re not sure a pipeline will be able to solve it&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;When in doubt, use agents.&lt;/strong&gt; I am aware of several AI projects that have migrated from pipelines to agents in the last year, but none that have gone the other way around. As a general point about software design, if you’re not sure what to do, pick the solution that’s easier to build and more likely to be able to solve your actual problem. If you want to change to a cheaper, pipeline-based system later on, at least you’ll be able to compare it to a working agentic design and make an informed decision.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;This distinction was popularized by Anthropic’s &lt;a href=&quot;https://www.anthropic.com/engineering/building-effective-agents&quot;&gt;&lt;em&gt;Building effective agents&lt;/em&gt;&lt;/a&gt;, written in December 2024, and now (I believe) made at least partially obsolete by advances in agents since then. They say “workflow”, but I slightly prefer the term “pipeline”.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Yes, I know this is technically not what “latency” means, but there’s no other single-word shorthand for “the duration of a standard unit of work”.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;If you’re building your own coding agent, I suggest you begin with the letter “C”.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;For instance, in my trivial example at the top of the post, doesn’t the agent have a failure mode where it might send a ton of emails, or email a bunch of different people? No, because you ought to constrain the email tool so that it can only send to the right address and, if this is important, so that it can only be called once.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;In a &lt;a href=&quot;https://github.com/sgoedecke/gatsby-blog/blob/5b6205fbe191a591bbcf61d094a6edbcfbd6475d/content/drafts/_icebox/ai-mass-surveillance/index.md&quot;&gt;draft post&lt;/a&gt; I never published, I ballpark-estimated all non-spam American email data at around seven trillion tokens per day (around a third of OpenAI’s total daily token usage).&lt;/p&gt;
&lt;a href=&quot;#fnref-5&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-6&quot;&gt;
&lt;p&gt;Should you do this? Probably not, but it’s a fascinating engineering problem, and I imagine the NSA has been thinking about these questions for several years by now. If the example bothers you, substitute some other more-ethical firehose of English language.&lt;/p&gt;
&lt;a href=&quot;#fnref-6&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-7&quot;&gt;
&lt;p&gt;Not counting evals, operations, standing up a trusted GPU cluster somewhere, scaling the physical hardware, and all the other thousand things you have to do in order to ship anything.&lt;/p&gt;
&lt;a href=&quot;#fnref-7&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[The famous o3 "GeoGuessr" prompt did not work]]></title><link>https://seangoedecke.com/the-o3-geoguessr-prompt-did-not-work/</link><guid isPermaLink="false">https://seangoedecke.com/the-o3-geoguessr-prompt-did-not-work/</guid><pubDate>Thu, 21 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In April last year, Kelsey Piper &lt;a href=&quot;https://x.com/KelseyTuoc/status/1917340813715202540&quot;&gt;discovered&lt;/a&gt; that OpenAI’s o3 model was surprisingly good at figuring out where a photo was taken from. Like human “geoguessr” &lt;a href=&quot;https://www.youtube.com/@georainbolt&quot;&gt;pros&lt;/a&gt;, o3 could sometimes take a nondescript photo of a beach and tell you exactly where it is. Here’s the example Kelsey gave:&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/4113a246112c8b6db424a58af58a9a90/4d836/kelsey-geoguessr.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 130.40540540540542%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAaABQDASIAAhEBAxEB/8QAGQAAAwADAAAAAAAAAAAAAAAAAAIEAQMF/8QAFgEBAQEAAAAAAAAAAAAAAAAAAAID/9oADAMBAAIQAxAAAAHdTzKdIsIQiExlbCB//8QAGRAAAwEBAQAAAAAAAAAAAAAAAQIRABAg/9oACAEBAAEFAleZTWJGDrQ95dfP/8QAFBEBAAAAAAAAAAAAAAAAAAAAIP/aAAgBAwEBPwEf/8QAFBEBAAAAAAAAAAAAAAAAAAAAIP/aAAgBAgEBPwEf/8QAGBAAAgMAAAAAAAAAAAAAAAAAABARMFH/2gAIAQEABj8CJW0//8QAHBABAAIDAAMAAAAAAAAAAAAAAQARECFRMWFx/9oACAEBAAE/IWRyE1Ja8k2FsKNWHqL9wv1luDH/2gAMAwEAAgADAAAAECAhz//EABQRAQAAAAAAAAAAAAAAAAAAACD/2gAIAQMBAT8QH//EABURAQEAAAAAAAAAAAAAAAAAABEg/9oACAECAQE/EGP/xAAbEAEAAwEBAQEAAAAAAAAAAAABABExIWGRUf/aAAgBAQABPxBwXjRnYAMagGIP2U1l6UQmIDkWufUER60Xgxu1i9mWK/s//9k=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;geo&quot;
        title=&quot;geo&quot;
        src=&quot;/static/4113a246112c8b6db424a58af58a9a90/1c72d/kelsey-geoguessr.jpg&quot;
        srcset=&quot;/static/4113a246112c8b6db424a58af58a9a90/a80bd/kelsey-geoguessr.jpg 148w,
/static/4113a246112c8b6db424a58af58a9a90/1c91a/kelsey-geoguessr.jpg 295w,
/static/4113a246112c8b6db424a58af58a9a90/1c72d/kelsey-geoguessr.jpg 590w,
/static/4113a246112c8b6db424a58af58a9a90/a8a14/kelsey-geoguessr.jpg 885w,
/static/4113a246112c8b6db424a58af58a9a90/4d836/kelsey-geoguessr.jpg 920w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Several people &lt;a href=&quot;https://www.astralcodexten.com/p/testing-ais-geoguessr-genius&quot;&gt;reproduced this&lt;/a&gt; with good results: not a 100% success rate, but clearly &lt;em&gt;far&lt;/em&gt; better than you’d do with a random human guess. The lesson here is that &lt;strong&gt;model capabilities can surprise us&lt;/strong&gt;. The o3 model had been released for two weeks before Kelsey’s tweet without anyone noticing how good it was at geolocation. What obscure capabilities did we never find? What capabilities of current models are we missing today?&lt;/p&gt;
&lt;p&gt;Some people drew &lt;a href=&quot;https://newsletter.angularventures.com/p/ai-s-geoguessr-genius-and-the-art-of-prompting-well&quot;&gt;another&lt;/a&gt; &lt;a href=&quot;https://www.reddit.com/r/singularity/comments/1kep2bp/comment/mqlvv1a/&quot;&gt;lesson&lt;/a&gt; from this: that “prompt engineering” can unlock brand-new capabilities. This is because Kelsey had a &lt;a href=&quot;https://raw.githubusercontent.com/sgoedecke/ai_geolocation/refs/heads/main/prompts/geoguessr_protocol.txt&quot;&gt;magic prompt&lt;/a&gt; that she built over time. When o3 got something wrong, she would ask it how it could have avoided the mistake, and then add that to the prompt. Here’s the first 10% of that prompt, so you get the idea:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You are playing a one-round game of GeoGuessr. Your task: from a single still image, infer the most likely real-world location. Note that unlike in the GeoGuessr game, there is no guarantee that these images are taken somewhere Google’s Streetview car can reach: they are user submissions to test your image-finding savvy. Private land, someone’s backyard, or an offroad adventure are all real possibilities (though many images are findable on streetview). Be aware of your own strengths and weaknesses: following this protocol, you usually nail the continent and country…&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This prompt impressed a lot of people, who &lt;a href=&quot;https://www.reddit.com/r/singularity/comments/1kep2bp/comment/mqo3yzz/&quot;&gt;tried&lt;/a&gt; &lt;a href=&quot;https://www.thealgorithmicbridge.com/p/upload-a-picture-to-chatgpt-itll&quot;&gt;it&lt;/a&gt; &lt;a href=&quot;https://www.astralcodexten.com/p/testing-ais-geoguessr-genius&quot;&gt;out&lt;/a&gt; and reported that it correctly identified a lot of images. But of course, o3 correctly identified a lot of images with just a basic “think carefully about where this picture was taken?” prompt. Did the prompt actually help? It’d be tough to figure that out just from playing around in ChatGPT. You’d need to build an evaluation set of images and run o3 against them twice: once with the fancy prompt and once without it.&lt;/p&gt;
&lt;p&gt;So &lt;a href=&quot;https://github.com/sgoedecke/ai_geolocation/tree/main&quot;&gt;that’s what I did&lt;/a&gt;. I pulled 200 images from Wikimedia Commons, Geograph Britain and Ireland, and iNaturalist for the benchmark. You can read the AI-generated summary &lt;a href=&quot;https://github.com/sgoedecke/ai_geolocation/blob/main/results/dataset_mixed_200_o3_high_report.md&quot;&gt;here&lt;/a&gt;, but here’s the key table:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Prompt&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;n&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;Median km&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;Mean km&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;P25 km&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;P75 km&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;&amp;#x3C;=25 km&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;&amp;#x3C;=100 km&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;&amp;#x3C;=500 km&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;&amp;#x3C;=1000 km&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Default&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;200&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;83.2&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;440.7&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;16.4&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;221.9&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;58&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;109&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;176&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;182&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GeoGuessr prompt&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;200&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;102.3&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;481.9&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;18.5&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;277.8&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;59&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;99&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;172&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;180&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
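&lt;p&gt;Each column in that table is a simple statistic over the per-image error distances. Here’s a rough Python sketch of how they could be computed. It assumes each guess and each ground truth is a (latitude, longitude) pair in degrees, and that the error is the great-circle (haversine) distance. This isn’t the code from the repository linked above, which may compute percentiles with a different interpolation:&lt;/p&gt;

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against floating-point drift pushing a just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def summarize(errors_km, thresholds=(25, 100, 500, 1000)):
    """Median, mean, quartiles and hit counts, like one row of the table."""
    p25, _, p75 = statistics.quantiles(errors_km, n=4)
    row = {
        'n': len(errors_km),
        'median_km': statistics.median(errors_km),
        'mean_km': statistics.fmean(errors_km),
        'p25_km': p25,
        'p75_km': p75,
    }
    for t in thresholds:
        # Count guesses that landed within t km of the true location.
        row[f'within_{t}_km'] = sum(1 for e in errors_km if t >= e)
    return row


# Usage, assuming `pairs` is a hypothetical list of
# ((guess_lat, guess_lon), (true_lat, true_lon)) tuples for one prompt:
# errors = [haversine_km(*guess, *truth) for guess, truth in pairs]
# print(summarize(errors))
```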
&lt;p&gt;In general, the basic prompt did better. It had a lower median, mean, 25th-percentile and 75th-percentile error, and it put more guesses within 100, 500 and 1000 km of the true location. The fancy prompt only won the within-25-km column, by a single image (59 versus 58). Both prompts did pretty well, actually. Although the fancy prompt was 10x longer, it only made o3 think slightly longer (about one second more on average, though the maximum roughly doubled, from 5 minutes to 10 minutes). The images in my benchmark were fairly generic geoguessr-style outdoor images, with twelve indoor images thrown in for an extra challenge (the fancy prompt also did slightly worse on these).&lt;/p&gt;
&lt;p&gt;What’s going on? I think this shows &lt;strong&gt;how easy it is to fool yourself about the quality of prompting&lt;/strong&gt;. When the model is already pretty good at a task, you can give it a very elaborate prompt without impacting performance. It’ll still be pretty good, except now it looks like it’s good &lt;em&gt;because of what you did&lt;/em&gt;. This is particularly true if you’re iterating with the model and asking it “what should I add to the prompt” for each mistake. Models will happily make up stories for you about their own reasoning processes, and will almost always say “yes, that helped a lot!” when you ask them if a particular prompt tweak made things better. The only way to actually know is by constructing some kind of benchmark&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;It’s also interesting to me that nobody checked this at the time. It took me about six hours of fairly distracted work and about $15 to construct and run this benchmark. Why didn’t anyone do this when they were writing articles about how good the o3 prompt was?&lt;/p&gt;
&lt;p&gt;One charitable reason might be that the story was more about o3’s real geolocation ability than about the magic prompt. o3 also used to be about five times more expensive (though a benchmark of 40 images instead of 200 would still have cast doubt on how much weight the prompt was carrying). Also, AI just moves so &lt;em&gt;fast&lt;/em&gt;. Geolocation was only the story for about a week: after that, GPT-4o’s &lt;a href=&quot;/ai-sycophancy&quot;&gt;sycophancy&lt;/a&gt; was what people were talking about. Another reason is that AI tooling wasn’t as good then. The benchmark was so easy for me to run because GPT-5.5 did most of the heavy lifting. Prior to strong agents, you would have had to write the (simple) benchmark yourself. I can’t point the finger too hard: I didn’t bother at the time either.&lt;/p&gt;
&lt;p&gt;Maybe my benchmark isn’t very good? The photos look reasonable enough: a wide variety of geoguessr-like shots of roads and landscapes, mostly. I could have tried to gather a few thousand photos instead of a few hundred, but if the magic prompt really was a big improvement you’d still expect to see that manifest on a benchmark this size. If someone wants to go and build a hundred-dollar geolocation benchmark instead of my fifteen-dollar one, I think that’d be an interesting project.&lt;/p&gt;
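&lt;p&gt;Both prompts saw the same photos, so one way to check whether a gap like 83 km versus 102 km in median error is bigger than noise on 200 images is a paired bootstrap over per-image differences. Here’s a hedged Python sketch. The benchmark above doesn’t report this, and the sketch needs the per-image errors rather than the summary table:&lt;/p&gt;

```python
import random
import statistics


def paired_bootstrap(default_km, prompt_km, iters=10_000, seed=0):
    """Median per-image difference (prompt - default) with a 95% bootstrap interval.

    Both lists must be in the same image order, so each pair compares the two
    prompts on the same photo. A positive difference means the fancy prompt
    guessed farther from the true location.
    """
    if len(default_km) != len(prompt_km):
        raise ValueError('need one error per image for both prompts')
    diffs = [p - d for d, p in zip(default_km, prompt_km)]
    rng = random.Random(seed)
    # Resample images with replacement and record the median difference each time.
    boot = sorted(
        statistics.median(rng.choices(diffs, k=len(diffs))) for _ in range(iters)
    )
    lower = boot[int(0.025 * iters)]
    upper = boot[int(0.975 * iters) - 1]
    return statistics.median(diffs), (lower, upper)
```

&lt;p&gt;If the interval comfortably includes zero, the benchmark can’t tell the two prompts apart. If it sits entirely above zero, the fancy prompt is measurably worse on this metric. The median of per-image differences is not the same number as the difference between the two medians in the table.&lt;/p&gt;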
&lt;p&gt;Finally, let’s use the benchmark to answer a question I’ve had for a while: do gpt-5.4 and gpt-5.5 have o3’s geolocation abilities? The answer, apparently, is no.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Run&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;Median km&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;Mean km&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;&amp;#x3C;=25 km&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;&amp;#x3C;=100 km&lt;/th&gt;
&lt;th align=&quot;right&quot;&gt;&amp;#x3C;=500 km&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;o3 default&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;83.2&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;440.7&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;58&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;109&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;176&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;o3 GeoGuessr&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;102.3&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;481.9&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;&lt;strong&gt;59&lt;/strong&gt;&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;99&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;172&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.4 default&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;163.3&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;638.9&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;26&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;74&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;148&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.5 default&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;156.5&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;645.9&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;39&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;77&lt;/td&gt;
&lt;td align=&quot;right&quot;&gt;161&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Whatever o3 had that made it good at this task hasn’t transferred to newer models. &lt;/p&gt;
&lt;p&gt;edit: This post got some comments on &lt;a href=&quot;https://news.ycombinator.com/item?id=48219682&quot;&gt;Hacker News&lt;/a&gt;. The top &lt;a href=&quot;https://news.ycombinator.com/item?id=48220126&quot;&gt;comment&lt;/a&gt; worried that the models already knew the images, since they’re publicly available. I thought about this but didn’t think it was worth sourcing brand new images. First, if the image/location pairs were in the training data, the models would have done better. Second, even if they are in the training data, the benchmark still gives us a useful comparison between the two prompts and across models. I did confirm the images didn’t have EXIF metadata, so we’re not testing whether the prompt makes the model more or less likely to cheat.&lt;/p&gt;
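&lt;p&gt;Checking for EXIF doesn’t need special tooling. Here’s a rough, standard-library-only Python sketch that walks a JPEG’s marker segments looking for an APP1 segment with the &lt;code&gt;Exif&lt;/code&gt; header. It isn’t the check used for this benchmark. It only handles JPEGs, and it won’t catch location data stored in other containers such as XMP:&lt;/p&gt;

```python
def has_exif(data: bytes) -> bool:
    """Return True if JPEG bytes contain an APP1 segment with an Exif header."""
    if data[:2] != b'\xff\xd8':  # every JPEG starts with the SOI marker
        return False
    i = 2
    while len(data) >= i + 4:
        if data[i] != 0xFF:
            return False  # lost sync with the marker structure; give up
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte before a marker
            i += 1
            continue
        if marker in (0xD9, 0xDA):  # end of image / start of scan data
            return False
        # Segment length is big-endian and includes its own two bytes.
        length = int.from_bytes(data[i + 2:i + 4], 'big')
        if marker == 0xE1 and data[i + 4:i + 10] == b'Exif\x00\x00':
            return True
        i += 2 + length
    return False


# Usage (the path is hypothetical):
# with open('photo.jpg', 'rb') as f:
#     print(has_exif(f.read()))
```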
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;Benchmarks can mislead as well, but they’re better than just vibes.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Prompts are technical debt too]]></title><link>https://seangoedecke.com/prompts-are-technical-debt-too/</link><guid isPermaLink="false">https://seangoedecke.com/prompts-are-technical-debt-too/</guid><pubDate>Wed, 20 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;It’s &lt;a href=&quot;https://www.tokyodev.com/articles/all-code-is-technical-debt&quot;&gt;common&lt;/a&gt; and correct to say that “all code is technical debt”. Adding code is a necessary evil for developing new features: you almost always have to do it, but each line of code adds to the complexity and maintenance burden of the system. All future changes to the system have to work with the existing code, or at least avoid breaking it. Once systems accumulate enough code, they become impossible for a single person to understand: instead of reading the code and understanding what it does, you must rely on guesses, theories and heuristics&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. Sensible engineers write as little code as possible.&lt;/p&gt;
&lt;p&gt;They write a lot of prompts, though! Many large projects now have a set of codebase-specific prompt files: AGENTS.md, CLAUDE.md, those same files in sub-directories, and &lt;a href=&quot;https://github.com/anthropics/skills&quot;&gt;skills&lt;/a&gt;. If you’re building a program that uses AI&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, you’ll have separate prompts for &lt;a href=&quot;https://github.com/anomalyco/opencode/tree/dev/packages/opencode/src/agent/prompt&quot;&gt;capabilities&lt;/a&gt; and for each &lt;a href=&quot;https://github.com/anomalyco/opencode/blob/dev/packages/opencode/src/tool/lsp.txt&quot;&gt;tool&lt;/a&gt;, as well as a whole set of &lt;a href=&quot;https://github.com/anomalyco/opencode/tree/dev/packages/opencode/src/session/prompt&quot;&gt;system prompts&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Prompts are important. Minor tweaks to an LLM’s prompt can unlock &lt;em&gt;significant&lt;/em&gt; performance improvements. If the same model feels different across Codex, Cursor, OpenCode, and Copilot, it’s almost certainly due to subtle differences in prompting. AI companies spend a lot of time testing and tweaking their prompts, so it makes sense that engineers would spend a lot of time tweaking their AGENTS.md files&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; for their projects. I’d even call switching tools or workflows a form of prompting. If I start wrapping my agents in a &lt;a href=&quot;https://github.com/anomalyco/opencode/tree/dev/packages/opencode/src/session/prompt&quot;&gt;Ralph loop&lt;/a&gt;, pull in a new skill file, or install an &lt;a href=&quot;https://www.seangoedecke.com/model-context-protocol/&quot;&gt;MCP&lt;/a&gt; server, that’s still a change to my prompts even though I’m not the one who wrote it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I think it is a bad idea to spend a ton of time tweaking a bespoke agentic coding setup.&lt;/strong&gt; Why is that, given that prompt adjustments can deliver a lot of value? Because prompt adjustments are &lt;em&gt;model-specific&lt;/em&gt;. Earlier I said that AI companies spend a lot of time tweaking their prompts. In fact, they spend that amount of time for each new model release. A prompt that worked great for GPT-5.4 won’t necessarily work as well for GPT-5.5. You have to “learn how to hold the model” each time. &lt;/p&gt;
&lt;p&gt;In other words, a set of prompts that you carefully crafted in January this year might be out of date or actively harmful by February. Worse still, you might not even notice. Model capabilities are already so hard to pin down (unless you’re running every problem through different models and tools), and even weak AI systems are surprisingly good at some problems. You might just think “huh, the new Anthropic model isn’t as impressive as the hype”, or “wow, Claude Code has gotten worse recently”.&lt;/p&gt;
&lt;p&gt;In this sense, &lt;strong&gt;prompts are a worse form of technical debt than code&lt;/strong&gt;. When technical debt blows up, it usually causes errors or a tangible slowdown as you try to understand the code. Prompts will decay silently. Also, even janky code tends to be relatively stable when untouched, but every single model upgrade could turn a functional prompt into a non-functional one.&lt;/p&gt;
&lt;p&gt;Could you simply decide not to upgrade models? Some people are trying this, but the pace of improvement is fast enough that this isn’t really practical. A delicately prompted agentic harness built around GPT-4.1 is always going to underperform a bare-bones harness built around Opus 4.7. This might be a sensible strategy at some point in the future, when the rate of model improvement slows down (or when models are so capable that you don’t need the extra intelligence for normal engineering tasks), but I don’t believe it’s a good strategy today.&lt;/p&gt;
&lt;p&gt;In my view, most people should just be picking an AI coding tool maintained by a third-party company (Claude Code, Codex, Cursor, Copilot, etc.) and leaving it as unconfigured as possible, so they can piggyback on the work of teams of engineers who are evaluating and tweaking prompts with each new model. Avoid MCP and skills unless absolutely necessary, and keep them off by default. At least this way, if one of those teams gets it badly wrong, users will eventually notice and complain about it.&lt;/p&gt;
&lt;p&gt;When you write AGENTS.md files, try to avoid behavior steering (like the now-outdated “think step by step”, “you are a skilled engineer”, or “if you get a task right I will tip you $200”). Keep them limited to specific, concrete facts about the project. Don’t let models fill your AGENTS.md with pages of barely-reviewed text, for the same reason that you wouldn’t let them fill your codebase with pages of barely-reviewed code. Write your prompts yourself, and delete them whenever you get the chance.&lt;/p&gt;
&lt;p&gt;edit: this ended up being the topic of a Theo &lt;a href=&quot;https://www.youtube.com/watch?v=WnBx1Vi7M6w&quot;&gt;video&lt;/a&gt; on YouTube.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;Almost every system you might get paid to work on is in this category (if not in the code of the system itself, then in its dependencies and libraries).&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Instead of just using AI to build a program. This distinction was a real pain when I was working on &lt;a href=&quot;https://github.blog/news-insights/product-news/introducing-github-models/&quot;&gt;GitHub Models&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[The just-say-no engineer was a ZIRP phenomenon]]></title><link>https://seangoedecke.com/the-just-say-no-engineer-was-a-zirp-phenomenon/</link><guid isPermaLink="false">https://seangoedecke.com/the-just-say-no-engineer-was-a-zirp-phenomenon/</guid><pubDate>Mon, 18 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The engineer who &lt;a href=&quot;https://www.nair.sh/guides-and-opinions/communicating-your-expertise/why-senior-developers-fail-to-communicate-their-expertise#a-senior-developer-is-a-problem-avoider&quot;&gt;says no all the time&lt;/a&gt; is a real archetype among senior and staff engineers. Their role is to slow things down, to block the development of features that add complexity, and to ensure that as little code gets written as possible (since code is a liability).&lt;/p&gt;
&lt;p&gt;We can think of this as the just-say-no engineer&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, as opposed to the just-say-yes engineer. The just-say-yes engineer is obsessed with moving fast, approves code changes by default, values &lt;a href=&quot;https://en.wikipedia.org/wiki/Mean_time_to_repair&quot;&gt;MTTR&lt;/a&gt; over &lt;a href=&quot;https://en.wikipedia.org/wiki/Mean_time_between_failures&quot;&gt;MTBF&lt;/a&gt;, and tends to ship a lot of code. The just-say-no engineer is obsessed with quality, is happy to move slowly, and blocks code changes by default. Most engineers are somewhere in the middle of the spectrum. By “just-say-no engineer”, I’m talking about the group of engineers who most strongly identify with that archetype.&lt;/p&gt;
&lt;p&gt;The just-say-no engineer is having a hard time in the era of AI. It used to be that they only had to say no to more junior engineers’ handwritten PRs, but now they have to say no to a barrage of AI-generated code, some of it generated by managers and VPs who are politically difficult to say no to. For the first time in their careers, they’re under a lot of pressure to lower their standards and start saying yes. However, &lt;strong&gt;this isn’t because of AI.&lt;/strong&gt; It’s because of the end of ZIRP.&lt;/p&gt;
&lt;h3&gt;ZIRP and the just-say-no engineer&lt;/h3&gt;
&lt;p&gt;ZIRP, or the “zero interest rate policy”, is a shorthand for the era of software development between 2008 and 2022 when central banks kept interest rates near zero and companies could borrow money very cheaply. During this period, investors were throwing borrowed money at &lt;em&gt;anything&lt;/em&gt;, which meant that tech companies were incentivized to constantly hire engineers for low-risk high-reward projects&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. Successful companies would routinely grow from tens of engineers to thousands, who would go and work on all kinds of things: tangential open-source projects, endless technology migrations, rewrites into other languages, and so on.&lt;/p&gt;
&lt;p&gt;It was a great time to be a software engineer. We had a lot of bargaining power, and could get paid top dollar to do almost anything. The bosses largely didn’t care, because (a) teams were growing so fast they couldn’t pay attention, and (b) just having more engineers around was beneficial to the stock price, which was the main thing they cared about. But tech companies did have one problem: with so many engineers running wild, how would they keep their systems from becoming completely unmanageable? &lt;strong&gt;Enter the just-say-no engineer.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In this environment, having a very senior engineer whose only job is to say no to things was actually quite valuable to the company. There are a few reasons for this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Having half of the company’s engineers enmeshed in an endless loop of proposing changes and being told no was totally fine - they didn’t need to be productive anyway, and this way they weren’t impacting business-critical systems.&lt;/li&gt;
&lt;li&gt;It also solved the problem of the 5% of engineers who would get drunk on their technical freedom and make wild proposals like migrating to a hand-rolled database. &lt;/li&gt;
&lt;li&gt;Having a reputation for a very high technical bar is a positive for hiring (and remember, during ZIRP every tech company was always hiring).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;The end of ZIRP&lt;/h3&gt;
&lt;p&gt;When central banks hiked interest rates, almost every tech company immediately laid off 5-20% of their engineers. It was just no longer profitable to keep a bloated engineering staff around to boost the stock price. Instead, companies had to actually make money&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. However, that wasn’t a good public explanation for the layoffs, since it sounds weak to admit that you were paying hundreds of engineers to do unprofitable work. Fortunately, the end of ZIRP coincided roughly with the rise of ChatGPT, so tech companies were able to blame their layoffs on the power of AI. Saying “with this transformative new technology, we’re able to deliver 10x the value with half the engineers” is a much stronger message, even though it doesn’t make much sense (if this is true, why not keep your engineers and deliver 20x the value?)&lt;/p&gt;
&lt;p&gt;Something similar is happening to the just-say-no engineer: their troubles look like they’re caused by AI, but the real cause is the end of ZIRP. Tech companies are now more focused than at any time in the past two decades. They are not doing a bunch of random crap anymore; instead they’re desperately chasing new capabilities and features that can make money (mostly built on AI, for obvious reasons). This new environment is &lt;em&gt;actively inimical&lt;/em&gt; to the just-say-no engineer. It’s as if a shark got pulled out of the deep ocean and dropped into a fast-flowing river: what was once a powerful apex predator is now disoriented and flailing.&lt;/p&gt;
&lt;p&gt;This kind of engineer used to enjoy implicit (albeit distant) support from their management. If someone complained, they’d often get told “that engineer knows what they’re doing, if they said no, then I trust them”. Now that support is gone. The just-say-no engineer is now being criticized and actively overruled by their management. They’re being told to be more of a team player or to find a way to say yes, or they’re simply no longer being consulted (with the company’s blessing) on key decisions. They’re getting bad reviews for the exact same behavior that was rewarded pre-2022&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;None of this depends upon AI.&lt;/strong&gt; If LLMs had not taken off this decade, we would still be seeing the same cultural shifts in the industry. Companies would still be laying off engineers, and the engineers whose job has been to say no to things would still be upset and confused about why they’re now being punished for saying no.&lt;/p&gt;
&lt;h3&gt;AI&lt;/h3&gt;
&lt;p&gt;Ironically, if ZIRP had not ended, this would be a glorious moment for the just-say-no engineers. LLMs would have thrown fuel on the “engineers running wild” problem that the just-say-no engineers were empowered to solve. Tech companies, unable to publicly or privately cast doubt on AI-assisted coding&lt;sup id=&quot;fnref-5&quot;&gt;&lt;a href=&quot;#fn-5&quot; class=&quot;footnote-ref&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;, would have relied &lt;em&gt;heavily&lt;/em&gt; on these engineers to prevent the tsunami of AI code from swamping the entire company. They would have been paid even better and celebrated like kings.&lt;/p&gt;
&lt;p&gt;Instead, LLMs are adding insult to injury for the just-say-no engineer. They’re forced to watch while other engineers merge AI-generated PRs that would previously have been blocked, and are told to use the tools themselves: to become the kind of engineer they’ve spent their entire careers battling against.&lt;/p&gt;
&lt;p&gt;Worse still, the AI tooling mostly &lt;em&gt;works&lt;/em&gt;. It’s not (yet) causing any kind of catastrophe&lt;sup id=&quot;fnref-6&quot;&gt;&lt;a href=&quot;#fn-6&quot; class=&quot;footnote-ref&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;. The code isn’t quite as clean, and it’s a bit less well-understood, but it’s good enough (particularly in a world where companies are trying lots of new things and abandoning the ones that fail). So the just-say-no engineer faces not just a threat to their livelihood, but to their entire self-identity: they have to either insist that the apocalypse is right around the corner, or accept that their technical role was contingent on a &lt;em&gt;really weird&lt;/em&gt; economic environment in the tech industry.&lt;/p&gt;
&lt;h3&gt;Pure and impure engineering&lt;/h3&gt;
&lt;p&gt;Will the just-say-no engineer go extinct? No. They don’t fit well into every single tech company anymore, but there are domains where they’re needed. In &lt;a href=&quot;/pure-and-impure-engineering/&quot;&gt;&lt;em&gt;Pure and impure software engineering&lt;/em&gt;&lt;/a&gt; I drew a distinction between “pure” engineering, which has a well-scoped, largely technical goal (like building a compiler or a language runtime), and “impure” engineering, which has a poorly-scoped, largely customer-driven goal (like trying out a new feature you’re not sure will work). During the ZIRP era, tech companies did a lot more pure work (for instance, building &lt;a href=&quot;https://en.wikipedia.org/wiki/React_(software)&quot;&gt;React&lt;/a&gt;), and tended to treat even impure work like pure work. The just-say-no engineer is &lt;em&gt;great&lt;/em&gt; for pure work, because pure codebases have to have a much higher bar for quality and can tolerate slower development cycles.&lt;/p&gt;
&lt;p&gt;Most tech companies are still doing some kind of pure work, typically in their core infrastructure pieces. This is essential work, but it doesn’t require a huge engineering team, and it’s rarely in &lt;a href=&quot;https://www.seangoedecke.com/the-spotlight/&quot;&gt;the spotlight&lt;/a&gt;. If you’re a just-say-no engineer and you want to stay that way, I would recommend trying to move into one of these roles (and accepting that you’ll have a more limited scope than you did in the 2010s).&lt;/p&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Some senior and staff engineers operate as gatekeepers, slowing down development and saying no to most things&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;This was a critical role during ZIRP, because:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tech companies had thousands of engineers who were empowered to do basically whatever they wanted, so without gatekeeping the systems would have fallen apart&lt;/li&gt;
&lt;li&gt;Tech companies didn’t care that much if they got anything done&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;When ZIRP ended, the environment for this kind of engineer became much worse, since tech companies were now actually focused on accomplishing things and the “do whatever you want” era was over&lt;/li&gt;
&lt;li&gt;Like with layoffs, this shift is often blamed on AI, but it would have happened even if powerful LLMs had not emerged at all. It’s an end-of-ZIRP phenomenon&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;edit: this post got some comments on &lt;a href=&quot;https://lobste.rs/s/i2szle/just_say_no_engineer_was_zirp_phenomenon&quot;&gt;lobste.rs&lt;/a&gt; and &lt;a href=&quot;https://www.reddit.com/r/programming/comments/1thf964/the_justsayno_engineer_was_a_zirp_phenomenon/&quot;&gt;Reddit&lt;/a&gt;, including one of the &lt;a href=&quot;https://lobste.rs/c/f3g1tn&quot;&gt;cruelest&lt;/a&gt; comments I’ve ever read about my blog. A more concrete criticism was &lt;a href=&quot;https://lobste.rs/c/yoouec&quot;&gt;about&lt;/a&gt; &lt;a href=&quot;https://www.reddit.com/r/programming/comments/1thf964/comment/omnw6an/&quot;&gt;my&lt;/a&gt; off-hand remark that the models work: commenters felt it was too early to say, because the impact of bad code takes a while to manifest. Fair enough. “It’s too early to say” is never really &lt;em&gt;wrong&lt;/em&gt;, though I think it’s clear that AI code is not immediately fatal. &lt;a href=&quot;https://www.reddit.com/r/programming/comments/1thf964/comment/omn4990/&quot;&gt;Other&lt;/a&gt; &lt;a href=&quot;https://lobste.rs/c/eluuto&quot;&gt;commenters&lt;/a&gt; argued that the just-say-no archetype existed for decades prior to ZIRP (e.g. Linus Torvalds). I agree with that, but I think the niche for this kind of engineer was artificially expanded by ZIRP, and has now contracted again. Finally, there was an &lt;a href=&quot;https://lobste.rs/c/bromgo&quot;&gt;interesting comment&lt;/a&gt; claiming that the just-say-no engineer had a niche because (a) people were using dynamic languages, and (b) observability, feature-flag, and similar tooling was not yet mature.&lt;/p&gt;
&lt;p&gt;I also want to share this quote from a reader, via email:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;…in a strange way your posts give me comfort that I’m not alone in some strange bubble where all of a sudden I’m the only one that’s somehow always wrong. I’m at somewhat of a crossroads as I either need to lower my standards and become the always say yes engineer to gain favour with managers again (which is in conflict with who I am) or move on and potentially risk landing at another company with the exact same setup. &lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;edit: Another round of comments from &lt;a href=&quot;https://news.ycombinator.com/item?id=48289439&quot;&gt;Hacker News&lt;/a&gt;, &lt;a href=&quot;https://news.ycombinator.com/item?id=48289749&quot;&gt;with&lt;/a&gt; &lt;a href=&quot;https://news.ycombinator.com/item?id=48289785&quot;&gt;several&lt;/a&gt; &lt;a href=&quot;https://news.ycombinator.com/item?id=48290371&quot;&gt;commenters&lt;/a&gt; &lt;a href=&quot;https://news.ycombinator.com/item?id=48289668&quot;&gt;wishing&lt;/a&gt; &lt;a href=&quot;https://news.ycombinator.com/item?id=48289953&quot;&gt;I’d&lt;/a&gt; provided more hard evidence for my theory. Unfortunately it doesn’t work like that - I’m writing from my own experience, which is just my tiny window into what the industry was like during and after ZIRP. Your mileage may (and often &lt;a href=&quot;https://news.ycombinator.com/item?id=48290017&quot;&gt;does&lt;/a&gt;) vary. It’s an interesting question how you might go about testing something like this. Maybe survey a few hundred senior+ engineers in 2010 and 2026, asking how many times a week they said “no” to something, and whether that “no” was overruled?&lt;/p&gt;
&lt;p&gt;I do want to address comments like &lt;a href=&quot;https://news.ycombinator.com/item?id=48290126&quot;&gt;this&lt;/a&gt; and &lt;a href=&quot;https://news.ycombinator.com/item?id=48290499&quot;&gt;this&lt;/a&gt;, which argue that saying no was essential both before and after ZIRP. Yes, but the difference (in my view) is: during ZIRP, &lt;em&gt;management&lt;/em&gt; did not like saying no to engineers, but after ZIRP ended they rapidly built that muscle, and so they no longer need a group of engineers saying no for them.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;Part of the appeal here is the lure of the guru. In kung fu films, those who know martial arts perform furious acrobatics, but the true expert barely needs to move at all. For the same reason, it sounds profound to say something like “junior engineers produce tons of code, seniors very little, and staff engineers &lt;em&gt;remove&lt;/em&gt; code”. Of course this is false. Staff engineers are expected to be able to produce a lot of working code very quickly, when they need to.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;I wrote about this a lot more in &lt;a href=&quot;/good-times-are-over/&quot;&gt;&lt;em&gt;The good times in tech are over&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;Not necessarily make a &lt;em&gt;profit&lt;/em&gt;, but at least bring in revenue.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;Or pre-2023, or even pre-2024 or 2025. Cultural change lags behind economic incentives, sometimes by several years.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;For fear of killing the vibe (and thus the stock price).&lt;/p&gt;
&lt;a href=&quot;#fnref-5&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-6&quot;&gt;
&lt;p&gt;If you think there have been more incidents recently, consider that (a) you might be &lt;a href=&quot;https://news.ycombinator.com/item?id=48086786&quot;&gt;wrong&lt;/a&gt;, or (b) other end-of-ZIRP factors (like increased velocity or layoffs) might be primarily responsible.&lt;/p&gt;
&lt;a href=&quot;#fnref-6&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[How I use LLMs as a staff engineer in 2026]]></title><link>https://seangoedecke.com/how-i-use-llms-in-2026/</link><guid isPermaLink="false">https://seangoedecke.com/how-i-use-llms-in-2026/</guid><pubDate>Sun, 17 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A bit over a year ago I wrote &lt;a href=&quot;https://www.seangoedecke.com/how-i-use-llms/&quot;&gt;&lt;em&gt;How I use LLMs as a staff engineer&lt;/em&gt;&lt;/a&gt;. Here’s a brief summary of what I used AI for last year:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Smart autocomplete with Copilot&lt;/li&gt;
&lt;li&gt;Short tactical changes in areas I don’t know well (always reviewed by an SME)&lt;/li&gt;
&lt;li&gt;Writing lots of use-once-and-throw-away research code&lt;/li&gt;
&lt;li&gt;Asking lots of questions to learn about new topics (e.g. the Unity game engine)&lt;/li&gt;
&lt;li&gt;Last-resort bugfixes, just in case it can figure it out immediately&lt;/li&gt;
&lt;li&gt;Big-picture proofreading for long-form English communication&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here are some tasks I explicitly &lt;em&gt;didn’t&lt;/em&gt; use AI for last year:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Writing whole PRs for me in areas I’m familiar with&lt;/li&gt;
&lt;li&gt;Writing ADRs or other technical communications&lt;/li&gt;
&lt;li&gt;Research in large codebases and finding out how things are done&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;February 2025 was a long time ago. Back then the best model was the first reasoning model, OpenAI’s o1. Agents &lt;em&gt;sort of&lt;/em&gt; worked, but would often get stuck or thrown off by compaction. What’s changed since then?&lt;/p&gt;
&lt;h3&gt;Agents are good now&lt;/h3&gt;
&lt;p&gt;The biggest change is that &lt;strong&gt;I now use LLMs to produce entire PRs in areas I’m familiar with&lt;/strong&gt;. A year ago I would very occasionally ask an agent to make changes to a single file if it was a simple change I couldn’t be bothered typing out. Sometimes I would copy a function I wrote into an LLM chat window for feedback. But now I start every single change by asking an agent to solve the problem, and usually push the PR after a single editing pass.&lt;/p&gt;
&lt;p&gt;In late 2025 I used a lot of open VSCode windows. In early 2026, that changed to terminal tabs with the Copilot CLI, particularly when I needed to make changes across multiple repos at the same time. Now I use the &lt;a href=&quot;https://github.blog/changelog/2026-05-14-github-copilot-app-is-now-available-in-technical-preview/&quot;&gt;GitHub Copilot app&lt;/a&gt; a &lt;em&gt;lot&lt;/em&gt; (tens of sessions per day). &lt;/p&gt;
&lt;p&gt;This reflects a shift from having to line-edit the agent’s work as it went to doing only a single editing pass right at the end. Early agents would go wrong a lot and not be able to recover, so it was valuable to keep an eye on their thought processes and step in to pause them and set them right. In my experience, current agents move too fast to do this, and recover from their own mistakes most of the time anyway.&lt;/p&gt;
&lt;p&gt;Sometimes I don’t even need to make edits and I can just push the change as-is, though this is rare: if nothing else, I typically go through and remove some of the over-commenting and other LLM-isms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I do a &lt;em&gt;lot&lt;/em&gt; of skimming through and evaluating agent changes.&lt;/strong&gt; Most of the time I reject them entirely, just based on “eh, that’s not what I was thinking”. On average it takes me about thirty seconds to make this initial assessment. If the change looks alright after that, I’ll dig in and do a proper review to make sure I understand it and it’s doing the right thing. For difficult tasks, I’ll often reject five or six (or more!) agent attempts before accepting one as good enough to work with, or giving up and making the change by hand.&lt;/p&gt;
&lt;h3&gt;Investigating bugs&lt;/h3&gt;
&lt;p&gt;I rely on LLMs even more for bug-hunting than I do for making changes. In 2025, I used to throw the occasional bug at an LLM, just in case it was able to rapidly come up with an explanation. Now I throw &lt;em&gt;every&lt;/em&gt; bug at an LLM (typically by opening a new agent session and pasting in the bug report), because in my experience it’s able to correctly diagnose about 80% of issues on its own. Current agents are &lt;em&gt;really good&lt;/em&gt; at chasing down bugs, particularly when you give them a vantage point across multiple repositories.&lt;/p&gt;
&lt;p&gt;I’m still better at it. Just last week I had a tricky bug that took about fourteen agent sessions before one finally figured it out. What was I doing in between and around those sessions?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Digging up extra context on the bug (from logs, Slack, etc) and reporting it to the agents&lt;/li&gt;
&lt;li&gt;Building my own mental model of the problem, of course&lt;/li&gt;
&lt;li&gt;Setting up my own reproduction of the bug (in parallel with the agents’ efforts)&lt;/li&gt;
&lt;li&gt;Responding to agent sessions with “no, your theory can’t be right because of X” (or just killing and restarting the session with that extra hint)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Ultimately an agent was the one to catch the bug. But I still count it as my find, because by that point I had narrowed the search space tightly enough that agent session #14 had a &lt;em&gt;significantly&lt;/em&gt; easier problem to solve than agent session #1. In other words, &lt;strong&gt;human expertise still matters a lot for investigating bugs&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;Writing&lt;/h3&gt;
&lt;p&gt;I &lt;em&gt;almost always&lt;/em&gt; write my own PR descriptions, since LLMs over-communicate and are bad at expressing the “core idea” behind a change. Writing the PR description by hand also signals to reviewers that I’ve reviewed the change myself, and I’m not asking them to be the first human to read the diff. The only time when I don’t write the PR description is when the change is trivial and the agent-generated description is one sentence. At that point I just leave it alone.&lt;/p&gt;
&lt;p&gt;I still don’t use LLMs to write Slack messages, ADRs, issues and so forth. I believe I have a better sense of what’s important to communicate, and I want to signal that there’s a human being thinking about the content.&lt;/p&gt;
&lt;p&gt;I still never use LLMs to write blog posts, though I do run each draft post through an LLM for feedback. OpenAI models used to be &lt;em&gt;terrible&lt;/em&gt; at this and have only very recently gotten acceptable with GPT-5.5. Both OpenAI and Anthropic models still try to water down my arguments, but I’ve accepted that as part of the LLM “house style” and just ignore that part of the feedback.&lt;/p&gt;
&lt;h3&gt;Testing and setup&lt;/h3&gt;
&lt;p&gt;Another thing I do now is &lt;strong&gt;try to push as much testing and setup work as possible onto the agents&lt;/strong&gt;. In 2025, I used to sometimes ask an LLM to produce a test script of curl commands that I could run against my dev server. In 2026, I just ask an agent to go and test my change, then read the log of what it did.&lt;/p&gt;
&lt;p&gt;I don’t test UI work like this, partly because it’s more fiddly and partly because I don’t trust agents to be sensitive to the subtle look-and-feel aspects of a change.&lt;/p&gt;
&lt;p&gt;Agents will write expansive unit tests without having to be told, but I do sometimes ask them to put together broader integration tests for a change. In general I now consider test code to be cheap: if I’m wondering whether a test would be useful, I just add it (so long as I know it won’t be flaky). Of course LLMs sometimes produce strange and unsatisfying test code - I do read it to catch obvious blunders - but I review it with a more generous eye than my actual production code.&lt;/p&gt;
&lt;p&gt;I’ll also task an agent with annoying local setup tasks that involve config wrangling on my machine. For instance, if my nvm installation is not switching my Node version correctly, I will often open a Copilot CLI agent and ask it to figure it out. This is a more-or-less direct replacement for Googling the problem, and is much quicker since the agent can run the trivial bash commands to diagnose and fix the problem itself.&lt;/p&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;The main thing that’s changed in the last fifteen months is that &lt;strong&gt;agents are really good now&lt;/strong&gt;. They’ve gone from something I used occasionally and suspiciously to something I use constantly and with light supervision.&lt;/p&gt;
&lt;p&gt;The core of my job is still the same: &lt;a href=&quot;/how-to-ship&quot;&gt;shipping projects&lt;/a&gt;, exercising my judgement, &lt;a href=&quot;/how-to-influence-politics/&quot;&gt;influencing tech company politics&lt;/a&gt;. But I now have a much wider net for small pieces of work that I’m willing to take on, which includes basically anything I can hand off to an agent and expect it to get more or less right.&lt;/p&gt;
&lt;p&gt;I used to spend a lot of time putting work off, either by delegating it or just saying “sorry, I don’t have time to do that now”. Now I get to say “yes” a lot more (at least when it comes to minor low-risk tweaks)&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Overall, here’s what I now use AI for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Writing (or drafting, depending on complexity) every code change I make&lt;/li&gt;
&lt;li&gt;Investigating and fixing bugs, either autonomously for most bugs or with my close involvement for trickier ones&lt;/li&gt;
&lt;li&gt;Research in large codebases, since current agents are now good enough to give the right answer almost all the time (and when they’re wrong, it’s clear from reading the explanation that they’ve missed something)&lt;/li&gt;
&lt;li&gt;Manual testing and local-machine setup or troubleshooting&lt;/li&gt;
&lt;li&gt;I still use AI for asking lots of questions to learn about topics, and for proofreading&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here’s what I still don’t use AI for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Writing any kind of public communication for me (PR descriptions, ADRs, messages) with the exception of trivial two-line PRs&lt;/li&gt;
&lt;li&gt;Writing code that I don’t carefully review&lt;/li&gt;
&lt;li&gt;Testing any kind of UI&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In my view, &lt;strong&gt;the current core AI skill is shifting as much work onto AI agents as possible, without going too far&lt;/strong&gt;. Many people are under-utilizing agents: not allowing them to investigate bugs or test their changes, or not throwing enough simple tasks at them. Other people are over-utilizing them: using them to write messages that ought to be hand-written, or trusting them to make sweeping changes that need careful human review. Since my last post, the balance has tilted more towards the agents, but &lt;em&gt;finding&lt;/em&gt; the balance remains as tricky as ever.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;For once I can actually give an example, since it’s in a public repository. Someone internal wanted to be able to use the &lt;a href=&quot;https://github.com/actions/ai-inference&quot;&gt;actions/ai-inference&lt;/a&gt; GitHub Action with Copilot-backed inference (for various reasons), and instead of saying “sorry, I don’t have time to get to it”, I was able to throw it at an agent. If a human had to do this, the output would likely have been better, but it wouldn’t have gotten done for weeks (if at all).&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[DeepSeek-V4-Flash means LLM steering is interesting again]]></title><link>https://seangoedecke.com/steering-vectors/</link><guid isPermaLink="false">https://seangoedecke.com/steering-vectors/</guid><pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Ever since &lt;a href=&quot;https://www.anthropic.com/news/golden-gate-claude&quot;&gt;Golden Gate Claude&lt;/a&gt; I’ve been fascinated with “steering”: the idea that you can guide LLM outputs by directly manipulating the activations of the model mid-flight.&lt;/p&gt;
&lt;h3&gt;DeepSeek V4 Flash&lt;/h3&gt;
&lt;p&gt;I was inspired to write this post by antirez’s recent project &lt;a href=&quot;https://github.com/antirez/ds4/tree/main&quot;&gt;DwarfStar 4&lt;/a&gt;, which is a version of &lt;a href=&quot;https://github.com/ggml-org/llama.cpp&quot;&gt;llama.cpp&lt;/a&gt; that’s been stripped down to run only DeepSeek-V4-Flash. What’s so special about this model? It might be what many engineers have been waiting for: a local model good enough to compete with at least the low end of frontier model agentic coding.&lt;/p&gt;
&lt;p&gt;Since steering requires a local model, it’s now practical for many engineers to try it out for the first time. And indeed, antirez has baked &lt;a href=&quot;https://github.com/antirez/ds4/tree/main/dir-steering&quot;&gt;steering&lt;/a&gt; into DwarfStar 4 as a first-class citizen. Right now it’s very rudimentary (basically just the toy “verbosity” example you can replicate via prompting), but the initial release was only &lt;a href=&quot;https://github.com/antirez/ds4/commit/d997b56c151184bcff469dd8302ed97f23481024&quot;&gt;eight days ago&lt;/a&gt;. I plan to follow this project closely.&lt;/p&gt;
&lt;h3&gt;How steering works&lt;/h3&gt;
&lt;p&gt;The basic idea behind steering is extracting a concept (like “respond tersely”) from the model’s internal brain state, then reaching in during inference and boosting the numerical activations that form that concept.&lt;/p&gt;
&lt;p&gt;One way you might do this is to feed your model the same set of a hundred prompts twice, once with the normal prompts and once with the words “respond tersely” appended. Then measure the difference in the model’s activations&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; for each prompt pair (by subtracting one activation matrix from the other), and average those differences across all hundred pairs. That average is your “steering vector”. In theory, you can add it to the same layer’s activations for any prompt and get the same effect (the model responding tersely).&lt;/p&gt;
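&lt;p&gt;A minimal sketch of that naive recipe in Python with NumPy, assuming you have already extracted one activation vector per prompt (hooking into a real model to get them depends on the runtime and is not shown). The helper names are hypothetical, and the “activations” are synthetic random data standing in for real ones:&lt;/p&gt;

```python
import numpy as np


def steering_vector(plain_acts, steered_acts):
    """Difference-of-means steering vector (hypothetical helper).

    plain_acts and steered_acts have shape (n_prompts, d_model): one
    activation vector per prompt, e.g. one layer's output at the last
    token. Row i of each array must come from the same prompt pair.
    """
    plain = np.asarray(plain_acts, dtype=float)
    steered = np.asarray(steered_acts, dtype=float)
    if plain.shape != steered.shape:
        raise ValueError("prompt pairs must line up row for row")
    # Subtract within each pair, then average the differences over all pairs.
    return (steered - plain).mean(axis=0)


def apply_steering(activation, vector, strength=1.0):
    """Add the scaled steering vector to an activation from the same layer."""
    return np.asarray(activation, dtype=float) + strength * vector


# Synthetic stand-in for real activations: pretend that appending
# "respond tersely" shifts dimension 3 by 2.0, plus a little noise.
rng = np.random.default_rng(0)
d_model = 8
true_shift = np.zeros(d_model)
true_shift[3] = 2.0
plain = rng.normal(size=(100, d_model))
steered = plain + true_shift + rng.normal(scale=0.1, size=(100, d_model))

vec = steering_vector(plain, steered)  # approximately equal to true_shift
new_activation = apply_steering(rng.normal(size=d_model), vec, strength=1.5)
```

&lt;p&gt;Averaging over many pairs is what makes this work: each individual difference mixes the “terse” direction with prompt-specific noise, and the noise mostly cancels out. In a real setup, &lt;code&gt;strength&lt;/code&gt; would need tuning, since too small a value may do nothing and too large a value can derail the output.&lt;/p&gt;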
&lt;p&gt;Another, more sophisticated way you might do this is to train a second model to extract “features” from your model’s activations: patterns of behavior that seem to show up together. Then you can try to map those features back to individual concepts, and boost them in the same way. This is more or less what Anthropic is doing with &lt;a href=&quot;https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html&quot;&gt;sparse autoencoders&lt;/a&gt;&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. It’s the same principle as the naive approach, but it lets you capture deeper patterns (at the cost of being much more expensive in time, compute and expertise).&lt;/p&gt;
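&lt;p&gt;The feature-based version can be sketched the same way. This toy assumes one common SAE formulation (a ReLU encoder and a linear decoder over a single layer’s activations) and steers by clamping one feature to a chosen value. It is not Anthropic’s code: the class and function names are hypothetical, and the weights are random placeholders rather than a trained autoencoder:&lt;/p&gt;

```python
import numpy as np


class SparseAutoencoder:
    """Toy SAE over one layer's activations (hypothetical, untrained)."""

    def __init__(self, w_enc, b_enc, w_dec, b_dec):
        self.w_enc = np.asarray(w_enc, dtype=float)  # (d_model, n_features)
        self.b_enc = np.asarray(b_enc, dtype=float)  # (n_features,)
        self.w_dec = np.asarray(w_dec, dtype=float)  # (n_features, d_model)
        self.b_dec = np.asarray(b_dec, dtype=float)  # (d_model,)

    def encode(self, x):
        # ReLU keeps feature activations non-negative. In a trained SAE, an
        # L1 penalty during training is what makes most of them zero.
        return np.maximum(0.0, (x - self.b_dec) @ self.w_enc + self.b_enc)

    def decode(self, features):
        return features @ self.w_dec + self.b_dec


def clamp_feature(sae, x, feature, value):
    """Steer by clamping one SAE feature to a fixed value.

    The reconstruction error (whatever the SAE fails to capture) is added
    back, so the only change to x is along that feature's decoder direction.
    """
    features = sae.encode(x)
    error = x - sae.decode(features)
    features[..., feature] = value
    return sae.decode(features) + error


# Random placeholder weights; a real SAE would be trained on activations
# collected from the model you want to steer.
rng = np.random.default_rng(1)
d_model, n_features = 6, 16
sae = SparseAutoencoder(
    w_enc=rng.normal(size=(d_model, n_features)),
    b_enc=np.zeros(n_features),
    w_dec=rng.normal(size=(n_features, d_model)),
    b_dec=np.zeros(d_model),
)
x = rng.normal(size=d_model)
boosted = clamp_feature(sae, x, feature=5, value=10.0)
```

&lt;p&gt;Adding the reconstruction error back keeps the intervention targeted: the output differs from the original activation only along the clamped feature’s decoder direction, rather than also picking up the SAE’s reconstruction mistakes.&lt;/p&gt;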
&lt;h3&gt;Why steering is interesting&lt;/h3&gt;
&lt;p&gt;Steering sounds like a cheat code. Instead of painstakingly assembling a training set that tries to push the model towards the “smart” end of the distribution in its training data, why not simply go uncover the “smart” dial in the model’s brain and turn it all the way to the right?&lt;/p&gt;
&lt;p&gt;It also seems like a more elegant way to adjust the way models talk. Instead of fiddling with the prompt (adding or removing qualifiers like “you MUST”), couldn’t we just have a control panel of sliders like “succinctness/verbosity” or “conscientiousness/speed” and move them around directly?&lt;/p&gt;
&lt;p&gt;Finally, it’s just &lt;em&gt;cool&lt;/em&gt;. Watching Golden Gate Claude unwillingly &lt;a href=&quot;https://www.anthropic.com/news/golden-gate-claude&quot;&gt;drag&lt;/a&gt; every sentence back to the Golden Gate Bridge is as fascinating and unsettling as Oliver Sacks’ neurological &lt;a href=&quot;https://en.wikipedia.org/wiki/The_Man_Who_Mistook_His_Wife_for_a_Hat&quot;&gt;anecdotes&lt;/a&gt;. What if your own mind was tweaked in a similar way? Would it still be you?&lt;/p&gt;
&lt;h3&gt;Why steering hasn’t been used&lt;/h3&gt;
&lt;p&gt;Why don’t we steer more, then? Why don’t ChatGPT and Claude Code already have a steering panel where you can adjust the model’s brain in real time? One reason is that steering is kind of an unfortunately “middle class” idea in AI research.&lt;/p&gt;
&lt;p&gt;It’s beneath the big AI labs, who can manipulate their models directly without having to do awkward brain surgery mid-inference. Anthropic is working on this stuff, but largely from an interpretability and safety perspective (as far as I know). When they want a model to behave in a certain way, they don’t mess around with steering, they just train the model.&lt;/p&gt;
&lt;p&gt;Steering is also out of reach for regular AI users like you and me&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, who use LLMs via an API and thus don’t have access to the model weights or activations needed to steer the model. Only OpenAI can identify or expose steering vectors for GPT-5.5, for instance. We could do this for open-weights models, but until very recently (see DeepSeek-V4-Flash above) there haven’t been any open models strong enough to be worth doing this for.&lt;/p&gt;
&lt;p&gt;On top of that, most basic applications of steering are outcompeted by just prompting the model. It sounds pretty impressive to be able to manipulate the model’s brain directly. But you know what else manipulates the model’s brain directly? Prompt tokens. You can exercise fairly fine-grained control over activations with steering, but you can already exercise &lt;em&gt;extremely&lt;/em&gt; fine-grained control by tweaking the language of your prompt. In other words, there’s not much point going to the trouble to steer a model to be more verbose when you could simply &lt;em&gt;ask&lt;/em&gt;.&lt;/p&gt;
&lt;h3&gt;Steering the unpromptable&lt;/h3&gt;
&lt;p&gt;One way for steering to be really useful is if we could identify a concept that can’t be prompted for. What about “intelligence”? You used to be able to prompt for intelligence - this is why 4o-era prompting always began with “you are an expert” - but current-generation models have that baked into their personalities, so prompting for it does nothing. Maybe steering for it would still work?&lt;/p&gt;
&lt;p&gt;Ultimately this is an empirical question, but I’m skeptical that we’ll be able to find an “intelligence” steering vector. Put another way, the steering vector that makes up a concept as difficult as “intelligence” might be almost coextensive with the entire set of weights of the model, and thus identifying it reduces to the problem of “training a smart model”.&lt;/p&gt;
&lt;p&gt;A sufficiently sophisticated steering approach ends up just replacing the actual model. If I take GPT-2 and, at each layer, swap out its activations for the activations from a much stronger model with the same architecture, I will get a much better result. But at that point you’re not making GPT-2 more intelligent, you’re just talking to the stronger model instead. The intelligence is in the steering, not in the model. For much more on this, see my post &lt;a href=&quot;/philosophy-and-ai-interpretability/&quot;&gt;&lt;em&gt;AI interpretability has the same problems as philosophy of mind&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Steering as data compression&lt;/h3&gt;
&lt;p&gt;Another way for steering to be useful is if we could somehow steer for a concept that requires a ton of tokens to express. Steering would thus save us a big chunk of the model’s context window. Intuitively, we might think of this as a way to shift a concept from the model’s working memory into its implicit memory.&lt;/p&gt;
&lt;p&gt;For instance, what if we could identify a “knowledge of my particular codebase” concept? When GPT-5.5 speed-reads my codebase, some of that knowledge it gains has to be buried in the activations, right? Maybe we could drag that out into a very large steering vector.&lt;/p&gt;
&lt;p&gt;I would be surprised if this could work. I think we’ll run into the same problem as with extracting “intelligence”: the “knows my codebase” concept is probably sophisticated enough to require a full fine-tune of the model&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. But it at least seems possible.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;I’m fascinated with steering, but I’m not particularly optimistic about it. I think most of the gains can be more efficiently reproduced with prompts, and that the truly ambitious steering goals can be more efficiently reproduced by training or fine-tuning the model.&lt;/p&gt;
&lt;p&gt;However, the open-source community hasn’t done a lot of work on steering yet, and that might just be starting to change now. If I’m wrong and it does have practical applications, we should find that out in the next six months.&lt;/p&gt;
&lt;p&gt;It’ll be interesting to see if bespoke per-model tools like DwarfStar 4 end up including a “library” of boostable features. When a popular open-weights model is released, the community always rushes to release a suite of wrappers and quantized versions. Could we also see a rush to extract boostable features from the model?&lt;/p&gt;
&lt;p&gt;edit: this post got some comments on &lt;a href=&quot;https://news.ycombinator.com/item?id=48160807&quot;&gt;Hacker News&lt;/a&gt;. Several commenters (including antirez himself) &lt;a href=&quot;https://news.ycombinator.com/item?id=48161688&quot;&gt;pointed out&lt;/a&gt; that steering can change some “trained-in” behavior in ways that prompting can’t: most notably, removing refusals from the model. Another commenter &lt;a href=&quot;https://news.ycombinator.com/item?id=48161488&quot;&gt;says&lt;/a&gt; that this is how uncensoring/abliteration is already done for open models. I didn’t know that - I thought the uncensored models were typically LoRA fine-tunes. On this point, antirez &lt;a href=&quot;https://news.ycombinator.com/item?id=48161688&quot;&gt;noted&lt;/a&gt; that modifying the weights can damage model capabilities more than the more lightweight runtime-steering approach (which can be applied only when needed). Makes sense to me.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;Models have lots of different activations you might measure (after attention, between layers, etc.). You can basically pick any one you want, or try multiple and see what works best.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;I recently read a really good &lt;a href=&quot;https://huggingface.co/spaces/dlouapre/eiffel-tower-llama&quot;&gt;deep dive&lt;/a&gt; into doing this with an open LLaMA model (and I &lt;a href=&quot;https://github.com/sgoedecke/skills/blob/main/skills/extract-features-clamp-inference/SKILL.md&quot;&gt;tried it myself&lt;/a&gt; a few months ago, with mixed results).&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;Apologies to my readers from the big AI labs. Please email me if you have tried steering internally to boost capabilities and it hasn’t worked. I promise I won’t tell anyone.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;And even then, attempts in the industry to “fine-tune a model on your codebase” have largely been unsuccessful.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[AI datacenters in space do not have a cooling problem]]></title><link>https://seangoedecke.com/space-ai-datacenters-do-not-have-a-cooling-problem/</link><guid isPermaLink="false">https://seangoedecke.com/space-ai-datacenters-do-not-have-a-cooling-problem/</guid><pubDate>Wed, 13 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This year Elon Musk has started &lt;a href=&quot;https://www.npr.org/2026/04/03/nx-s1-5718416/ai-data-centers-in-space-spacex-elon-musk&quot;&gt;banging the drum&lt;/a&gt; about building AI datacenters in space. Since he’s the only person who owns both a successful space company and a (moderately) successful AI company, this is a sensible way for him to boost his profile and net worth. Is it a sensible way to build datacenters?&lt;/p&gt;
&lt;h3&gt;The cooling problem&lt;/h3&gt;
&lt;p&gt;The first comment underneath most discussions of this always goes along these lines: “you obviously can’t build AI datacenters in space, because heat dissipation is really hard in space, and AI datacenters generate a lot of heat”.&lt;/p&gt;
&lt;p&gt;In general I am distrustful of snappy answers like these. It reminds me of the “AI datacenters obviously don’t use a lot of water, because cooling fluid circulates in a closed-loop system” argument: if it were true, there wouldn’t be a debate at all, just one side who understand the obvious point and another side who are stupid.&lt;/p&gt;
&lt;p&gt;Some arguments are like this! However, more often there’s a complicating factor that makes the snappy answer incorrect. In the water-use case, it’s that the closed-loop system often has to be cooled in turn by an open-loop evaporative system, which does consume water. What about the space datacenter case?&lt;/p&gt;
&lt;h3&gt;Why cooling is possible in space&lt;/h3&gt;
&lt;p&gt;First, let’s give the argument a fair shake. Although space is itself very cold, cooling is tricky because everything you’d want to cool is surrounded by vacuum. Heat transfer works in three ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Conduction: hot (i.e. fast-moving) atoms bump into other atoms, making them move faster and thus heating them up&lt;/li&gt;
&lt;li&gt;Convection: hot atoms physically move from one location to another (e.g. in a liquid or gas), staying hot and thus making their new location hotter&lt;/li&gt;
&lt;li&gt;Radiation: hot objects emit photons (electromagnetic radiation), cooling themselves down and heating up other objects those photons collide with&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Vacuum is an excellent insulator because it defeats the first two methods of heat transfer. If there are no (or very few) atoms surrounding an object, there’s nothing for the object’s atoms to bump into and nothing to carry the heat away. That’s why vacuum is used as an insulator in thermoses, travel mugs, and so on.&lt;/p&gt;
&lt;p&gt;So how can space datacenters get rid of their heat? By doubling down on the third method of heat transfer. Although it’s much harder to transfer heat by moving atoms around in space, it’s actually &lt;em&gt;easier&lt;/em&gt; to transfer heat by emitting radiation. Any good emitter is also a good absorber: a perfectly black object is the most efficient emitter, but it’s also the most efficient absorber of photons from external sources, which is why black objects get hotter in the sun&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. In space, the sun’s light is much easier to avoid, because there aren’t objects everywhere for it to bounce off, and a radiator can face cold deep space (at around 3 K). A shaded radiator can therefore dump quite a lot of heat.&lt;/p&gt;
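&lt;p&gt;The physics here is the Stefan-Boltzmann law: a surface at temperature T radiates εσT⁴ watts per square metre, where ε is its emissivity and σ ≈ 5.67 × 10⁻⁸ W/(m²·K⁴), and it absorbs radiation from its surroundings in the same way. A rough Python sketch of what that means for radiator sizing, under assumptions I’m choosing purely for illustration: emissivity 0.9, a 300 K radiator, a 3 K background, no absorbed sunlight or Earthshine, and no accounting for the plumbing that moves heat to the radiator.&lt;/p&gt;

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)


def net_flux(radiator_k, emissivity=0.9, sink_k=3.0):
    """Net W/m^2 radiated by one face of a radiator that sees only deep space.

    Assumes no absorbed sunlight or Earthshine, i.e. a perfectly shaded radiator.
    """
    return emissivity * SIGMA * (radiator_k ** 4 - sink_k ** 4)


def radiator_area(power_w, radiator_k, emissivity=0.9, faces=1):
    """Square metres needed to reject power_w watts at radiator_k kelvin."""
    return power_w / (faces * net_flux(radiator_k, emissivity))


print(round(radiator_area(1e6, 300)))           # 2419: 1 MW, 300 K, one face
print(round(radiator_area(1e6, 300, faces=2)))  # 1210: radiating from both faces
print(round(radiator_area(1e6, 350)))           # 1306: a hotter radiator
```

&lt;p&gt;Under these assumptions the answer lands in the same ballpark as the ~2500 square metres per MW estimate cited in the next section. The T⁴ term is the important design lever: running the radiator only 50 K hotter roughly halves the area it needs.&lt;/p&gt;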
&lt;h3&gt;Why cooling is still going to be hard&lt;/h3&gt;
&lt;p&gt;It would still require putting far more radiator area into space than we’ve ever done before. There are plenty of writeups out there if you want to read through the numbers. &lt;a href=&quot;https://arxiv.org/abs/2604.27197&quot;&gt;This&lt;/a&gt; is a recent one that estimates ~2500 square metres of radiator area would be needed to serve 1MW of datacenter power (much less than the area it’d need in solar panels)&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. A serious AI datacenter is around 100MW&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;, so we’d need 250,000 square metres of radiator area. The largest radiator system currently in space is probably the ISS’s, at around a thousand square metres.&lt;/p&gt;
&lt;p&gt;Is scaling that up by 250x a lot? Yes, but it’s not necessarily &lt;em&gt;ridiculous&lt;/em&gt;. There are currently no industrial operations in space, so there’s been no need to push the boundaries here. In the grand scheme of things, 250,000 square metres (a square 500 metres on a side) is not that big. By my very rough estimates, that’s between 100 and 500 Starship launches: a couple of years at SpaceX’s current launch cadence, or a few months at their (very optimistic) estimate of future launch cadence.&lt;/p&gt;
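&lt;p&gt;The back-of-envelope scaling above, as a few lines of Python. The inputs are the figures quoted in this post; the radiator area that fits in one Starship launch is not stated anywhere here, so rather than guess it, the sketch works backwards from the 100 to 500 launch range, which implies roughly 2500 down to 500 square metres of radiator per launch.&lt;/p&gt;

```python
import math

RADIATOR_M2_PER_MW = 2500  # estimate from the paper linked above
DATACENTER_MW = 100        # "serious" AI datacenter, per the post
ISS_RADIATOR_M2 = 1000     # rough ISS figure, per the post


def launches_needed(area_m2, m2_per_launch):
    """Whole launches needed to lift area_m2 of radiator."""
    return math.ceil(area_m2 / m2_per_launch)


area = RADIATOR_M2_PER_MW * DATACENTER_MW
print(area)                    # 250000 square metres
print(area / ISS_RADIATOR_M2)  # 250.0 times the ISS
# The post's 100-500 launch range corresponds to these per-launch areas:
print(launches_needed(area, 2500), launches_needed(area, 500))  # 100 500
```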
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Of course, you don’t just need radiators to put a datacenter in space. You need an even larger area of solar panels, the GPUs themselves, and all kinds of other supporting equipment. If a GPU dies in an Earth datacenter, you can go in and swap it out; if it dies in space, you just have to leave it dead and keep going with less capacity.&lt;/p&gt;
&lt;p&gt;It’s still wildly impractical to build AI datacenters in space. But it’s not &lt;em&gt;impossible&lt;/em&gt;, and it’s certainly not impossible because of the cooling, which is a relatively minor component of the total mass that would have to be launched into space.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;In theory, black clothing would keep you slightly colder at night.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Nobody ever talks about how impossible it would be to &lt;em&gt;power&lt;/em&gt; space datacenters, despite the fact that you’d need to launch more than three times as much solar panel area as radiator area. I guess that’s because people know solar panels exist and that the sun shines in space.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;The first gigawatt-scale AI datacenters are coming online this year, but 100MW is a fair estimate for a current pretty-large-but-not-enormous AI datacenter.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Thinking Machines and interaction models]]></title><link>https://seangoedecke.com/interaction-models/</link><guid isPermaLink="false">https://seangoedecke.com/interaction-models/</guid><pubDate>Tue, 12 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Thinking Machines just released &lt;a href=&quot;https://thinkingmachines.ai/blog/interaction-models/&quot;&gt;&lt;em&gt;Interaction Models&lt;/em&gt;&lt;/a&gt;. This is their first real AI model release&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; after a year of work and two billion dollars of capital. What is an “interaction model”? First, &lt;strong&gt;it’s not a frontier model&lt;/strong&gt;. Thinking Machines is not yet competing with OpenAI, Anthropic and Google.&lt;/p&gt;
&lt;p&gt;Instead, they’re working on the problem of better real-time interaction with models. Some parts of what they’re doing are not new at all, other parts are slightly questionable benchmark gaming, and still other parts represent a genuine technological advancement. I’ll try to lay it all out.&lt;/p&gt;
&lt;h3&gt;Fully-duplex voice models&lt;/h3&gt;
&lt;p&gt;If you’ve used ChatGPT in audio mode, you know that you can’t talk to it exactly how you’d talk to a human. There’s a big latency gap between when you finish talking and when the model jumps in. The model won’t interrupt you like a human, and doesn’t react to you interrupting it like a human would either. And of course you can’t give the model visual feedback like facial expressions.&lt;/p&gt;
&lt;p&gt;That’s because &lt;strong&gt;ChatGPT is either speaking or listening at any given time&lt;/strong&gt;. When you’re talking, it’s in “listening” mode; when it’s talking, it’s in “speaking” mode, and isn’t absorbing any information from you. It relies on VAD (“voice activity detection”) to figure out whether you’re talking. The alternative (and what “interaction models” do) is a fully-duplex system, where the model is in listening and speaking mode at the same time.&lt;/p&gt;
&lt;p&gt;Of course, the model can’t literally do this. Like all language models, it’s either doing prefill (ingesting prompt tokens) or decode (producing completion tokens). But what fully-duplex models &lt;em&gt;can&lt;/em&gt; do is switch from listening to speaking mode in tiny chunks, called “micro-turns”. Instead of listening for ten seconds (or however long it takes you to stop talking), then speaking for ten seconds (or however long it takes to pass the model output through TTS), the model can listen for 200ms, then output for 200ms, then listen for 200ms, and so on. While the user is speaking, the model will know to output silence - most of the time. But if it decides it’s good to interrupt you or speak at the same time as you, it’s capable of doing that.&lt;/p&gt;
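&lt;p&gt;A toy Python sketch of that micro-turn loop. &lt;code&gt;ToyDuplexModel&lt;/code&gt;, its &lt;code&gt;prefill&lt;/code&gt;/&lt;code&gt;decode&lt;/code&gt; methods, and the “stay silent until the user stops” policy are all hypothetical stand-ins, not any real library’s API. Audio slices are represented as strings, with an empty string meaning silence.&lt;/p&gt;

```python
CHUNK_MS = 200  # micro-turn length used in the description above


class ToyDuplexModel:
    """Hypothetical stand-in for a fully-duplex model.

    prefill() ingests one slice of user audio; decode() emits one slice of
    output. Audio slices are strings, with "" meaning silence.
    """

    def __init__(self):
        self.context = []

    def prefill(self, user_chunk):
        self.context.append(("user", user_chunk))

    def decode(self):
        # Toy policy: stay silent while the user is speaking, talk once they
        # go quiet. A real model learns when to backchannel or interrupt.
        _, last_user_chunk = self.context[-1]
        out = "" if last_user_chunk else "reply"
        self.context.append(("model", out))
        return out


def run_micro_turns(model, user_chunks):
    """Alternate listening and speaking every CHUNK_MS milliseconds."""
    timeline = []
    for i, chunk in enumerate(user_chunks):
        model.prefill(chunk)  # listen to one slice
        out = model.decode()  # then speak (or stay silent) for one slice
        timeline.append((i * CHUNK_MS, chunk, out))
    return timeline


for t_ms, heard, said in run_micro_turns(ToyDuplexModel(), ["hi", "there", "", ""]):
    print(t_ms, repr(heard), repr(said))
```

&lt;p&gt;Because the model gets a chance to emit output every slice, a learned policy could just as easily decide to speak over the user; this toy policy simply never does.&lt;/p&gt;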
&lt;p&gt;So far, so unoriginal. There are plenty of existing fully-duplex audio systems, and the Thinking Machines blog post itself cites them: &lt;a href=&quot;https://github.com/kyutai-labs/moshi&quot;&gt;Moshi&lt;/a&gt;, &lt;a href=&quot;https://github.com/NVIDIA/personaplex&quot;&gt;PersonaPlex&lt;/a&gt;, &lt;a href=&quot;https://build.nvidia.com/nvidia/nemotron-voicechat&quot;&gt;Nemotron-VoiceChat&lt;/a&gt;, and so on. But at least this outlines the space that “interaction models” are playing in: not “superintelligence from a frontier model”, but “better real-time conversational interaction”&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. Given that, what is Thinking Machines doing that’s new?&lt;/p&gt;
&lt;h3&gt;Delegating reasoning&lt;/h3&gt;
&lt;p&gt;For existing fully-duplex models, you talk to the model itself. That’s a fairly big problem, since fully-duplex models have to be fast: fast enough that they can operate in tiny 200ms turns&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. A model that fast cannot be particularly intelligent.&lt;/p&gt;
&lt;p&gt;Thinking Machines’ solution is to introduce an actual smart model - any regular language model will do here - in the background that the interaction model can delegate tasks to. In practice this is probably implemented as a tool call. The interaction model keeps chatting while the smart model works away, and then the smart model output is directly integrated into the interaction model’s context in the same way as audio and video input (a genuinely cool idea, I think).&lt;/p&gt;
&lt;p&gt;This is kind of neat, though it remains to be seen how well it works in practice. Will the model do a lot of “oh wait, the last thing I said was dumb, never mind” self-correction as the smarter model output trickles in? Will the fast interaction model be smart enough to delegate the right tasks at the right time? In general, the “start with a fast dumb model and have it hand off tasks” approach has been tricky for the AI labs to get right for a variety of reasons.&lt;/p&gt;
&lt;p&gt;If I’m being uncharitable, I might say that bolting on a strong reasoning model was an easy way for Thinking Machines to post impressive scores on competitive benchmarks like FD-bench V3 (where they barely beat GPT-realtime-2.0) and BigBench Audio (where introducing the reasoning model bumps their score from 76% to 96%, only 0.1 percentage points below GPT-realtime-2.0). If I’m being charitable, I might say that a model fast enough for realtime conversation will have to have some way to punt hard tasks to a slower, smarter model. Both of those things are probably true.&lt;/p&gt;
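&lt;p&gt;To make the delegation pattern concrete, here is a minimal asyncio sketch. It assumes the hand-off behaves like an asynchronous tool call (my guess above, not something Thinking Machines has confirmed); &lt;code&gt;smart_model&lt;/code&gt; and &lt;code&gt;interaction_loop&lt;/code&gt; are hypothetical names, and the sleeps stand in for model latency. The fast loop never blocks on the slow model: it keeps taking micro-turns, and when the background result is ready it is appended to the same context as everything else.&lt;/p&gt;

```python
import asyncio


async def smart_model(question):
    """Hypothetical slow, capable model reached via a tool call."""
    await asyncio.sleep(0.05)  # stands in for seconds of reasoning
    return f"answer to {question!r}"


async def interaction_loop(turns, question):
    """Keep taking micro-turns while the smart model works in the background."""
    context = []
    pending = asyncio.create_task(smart_model(question))  # delegate, don't wait
    for t in range(turns):
        if pending is not None and pending.done():
            # The background result joins the context like any other input stream.
            context.append(("smart_model", pending.result()))
            pending = None
        context.append(("interaction_model", f"micro-turn {t}"))
        await asyncio.sleep(0.02)  # one (shortened) micro-turn
    if pending is not None:
        pending.cancel()  # conversation ended before the answer arrived
    return context


if __name__ == "__main__":
    for source, item in asyncio.run(interaction_loop(8, "why?")):
        print(source, item)
```

&lt;p&gt;The hard parts the sketch skips are exactly the open questions above: deciding what to delegate, and what to say when the answer arrives mid-sentence.&lt;/p&gt;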
&lt;h3&gt;Scale&lt;/h3&gt;
&lt;p&gt;It’s also worth noting that Thinking Machines have bolted video input onto their fully-duplex model. This is more exciting than it sounds, because face-to-face human conversation depends heavily on being able to read facial expressions. In theory, this could unlock genuinely human-like conversations.&lt;/p&gt;
&lt;p&gt;The other reason why this is exciting is that it means Thinking Machines have been able to make a pretty big fully-duplex model (maybe twice the size of Moshi in terms of active parameters, and 40x the size in terms of total parameters).&lt;/p&gt;
&lt;p&gt;In fact, this is probably the biggest real technical achievement here. Other fully-duplex models are already doing micro-turns and interruptions, and could delegate reasoning fairly easily if they wanted to, but they aren’t doing video because they &lt;em&gt;can’t&lt;/em&gt;. Being able to make a fully-duplex model the size of DeepSeek V4-Flash is pretty impressive.&lt;/p&gt;
&lt;p&gt;Much of the Thinking Machines blog post is dedicated to explaining how they’ve managed to do this: ingesting data in a more lightweight way, optimizing their inference libraries for tiny prefill/decode chunks, various decisions to make inference deterministic (a long-held &lt;a href=&quot;https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/&quot;&gt;hobbyhorse&lt;/a&gt; for Thinking Machines).&lt;/p&gt;
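&lt;p&gt;As I understand their earlier post, the root cause of that nondeterminism is that floating-point addition isn’t associative, combined with kernels whose reduction order can change with batch size. A minimal plain-Python illustration of the arithmetic part; &lt;code&gt;split_sum&lt;/code&gt; is a hypothetical stand-in for a GPU reduction, not real kernel code.&lt;/p&gt;

```python
import math


def split_sum(values, split):
    """Sum two partial sums, like a reduction split across two workers.

    The split point stands in for a kernel choice that might depend on
    batch size (a hypothetical simplification of real GPU kernels).
    """
    return sum(values[:split]) + sum(values[split:])


values = [0.1, 0.2, 0.3]
print(split_sum(values, 1))  # 0.6                 i.e. 0.1 + (0.2 + 0.3)
print(split_sum(values, 2))  # 0.6000000000000001  i.e. (0.1 + 0.2) + 0.3
print(math.fsum(values))     # 0.6, the correctly rounded sum
```

&lt;p&gt;Same inputs, different grouping, different last bit. Fixing the reduction order regardless of batch size is what makes the output reproducible.&lt;/p&gt;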
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;There’s a lot of pressure on Thinking Machines to produce a genuine AI advancement. It doesn’t seem like they’re willing or able to compete in the frontier-model space (which makes sense, I wouldn’t want to either). Given that, I can see why they’re highlighting the parts of interaction models that are impressive to laypeople - all the fully-duplex interaction stuff - even though those parts are not truly innovative.&lt;/p&gt;
&lt;p&gt;So what are Interaction Models? &lt;strong&gt;A scaled-up, multimodal version of existing fully-duplex models like Moshi, with a real model bolted on for extra intelligence&lt;/strong&gt; (and maybe better benchmarks). The scale and video parts are new and cool, and something like the overall approach has to be right. In general, I’m glad that we’ve got well-funded and high-profile AI labs tackling problems other than “build a smarter frontier model”. I think there’s a lot of low-hanging fruit waiting to be picked in other areas of AI research.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;People do seem to really like &lt;a href=&quot;https://thinkingmachines.ai/tinker/&quot;&gt;Tinker&lt;/a&gt;, which is their tooling for researchers who want to fine-tune models, but it’s not exactly the hot new frontier model that people were expecting.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;I think it’s at least a little shady that the Interaction Models video demo is making a big deal about some features (like real-time simultaneous translation) that are just features of fully-duplex audio models, not anything specific to their system.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;Even 200ms is a bit long. You can see from the demo that there’s an uncomfortable half-second lag sometimes as the model finishes its prefill slice and has to move to the decode slice.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[The left-wing case for AI]]></title><link>https://seangoedecke.com/the-left-wing-case-for-ai/</link><guid isPermaLink="false">https://seangoedecke.com/the-left-wing-case-for-ai/</guid><pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In &lt;a href=&quot;https://www.seangoedecke.com/many-anti-ai-arguments-are-conservative/&quot;&gt;&lt;em&gt;Many anti-AI arguments are conservative arguments&lt;/em&gt;&lt;/a&gt; I argued that left-wing anti-AI sentiment&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; is partly a backlash to two unrelated events around the rise of ChatGPT: the crypto mania of 2022 and the pro-Donald-Trump push many big tech CEOs made in 2024. If the timing had been different, we could have had a real pro-AI faction on the left. What would that look like?&lt;/p&gt;
&lt;p&gt;I’m not going to respond to any of the popular anti-AI arguments (I’ve already done that &lt;a href=&quot;https://www.seangoedecke.com/is-ai-wrong/&quot;&gt;here&lt;/a&gt;). I think it’s more interesting to outline some explicitly left-wing pro-AI arguments.&lt;/p&gt;
&lt;h3&gt;Disability&lt;/h3&gt;
&lt;p&gt;The left wing has (correctly) taken a broad view of what counts as an acceptable disability aid. When criticizing potentially exploitative companies - for instance, food delivery apps like DoorDash - they often stop to acknowledge that some people have few alternatives to those services, and that those services have meaningfully improved the lives of the disabled or chronically ill.&lt;/p&gt;
&lt;p&gt;I think it’s obvious that LLMs are a powerful disability aid. Like any technology that makes it easier to interact with a computer, they’re useful to people who are trying to overcome all kinds of barriers. Almost every video online is now &lt;a href=&quot;https://www.reddit.com/r/antiai/comments/1t71o25/comment/okq9q9n/?utm_source=share&amp;#x26;utm_medium=web3x&amp;#x26;utm_name=web3xcss&amp;#x26;utm_term=1&amp;#x26;utm_content=share_button&quot;&gt;automatically captioned&lt;/a&gt;. People with &lt;a href=&quot;https://www.reddit.com/r/antiai/comments/1t71o25/comment/oklw2v5/?utm_source=share&amp;#x26;utm_medium=web3x&amp;#x26;utm_name=web3xcss&amp;#x26;utm_term=1&amp;#x26;utm_content=share_button&quot;&gt;brain fog&lt;/a&gt; or &lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/17sg5mg/as_an_articulate_disabled_person_i_feel_like_ai/&quot;&gt;chronic pain&lt;/a&gt; are using LLMs to make it easier to interact with their computers. People who are &lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/17sg5mg/comment/k8qeev1/?utm_source=share&amp;#x26;utm_medium=web3x&amp;#x26;utm_name=web3xcss&amp;#x26;utm_term=1&amp;#x26;utm_content=share_button&quot;&gt;neurodivergent&lt;/a&gt; use ChatGPT to &lt;a href=&quot;https://www.reddit.com/r/disability/comments/1m9c8tv/comment/n564m7h/?utm_source=share&amp;#x26;utm_medium=web3x&amp;#x26;utm_name=web3xcss&amp;#x26;utm_term=1&amp;#x26;utm_content=share_button&quot;&gt;“code switch”&lt;/a&gt; their emails into neurotypical-friendly language. 
People with &lt;a href=&quot;https://www.reddit.com/r/disability/comments/1m9c8tv/comment/n56a2le/?utm_source=share&amp;#x26;utm_medium=web3x&amp;#x26;utm_name=web3xcss&amp;#x26;utm_term=1&amp;#x26;utm_content=share_button&quot;&gt;mobility&lt;/a&gt; or &lt;a href=&quot;https://www.reddit.com/r/disability/comments/1m9c8tv/comment/o7lpfnc/?utm_source=share&amp;#x26;utm_medium=web3x&amp;#x26;utm_name=web3xcss&amp;#x26;utm_term=1&amp;#x26;utm_content=share_button&quot;&gt;vision&lt;/a&gt; issues are making heavy use of LLM voice controls. And so on.&lt;/p&gt;
&lt;p&gt;This is a &lt;em&gt;fascinating&lt;/em&gt; point of conflict in left-wing anti-AI spaces. Every so often somebody will &lt;a href=&quot;https://www.reddit.com/r/disability/comments/1m9c8tv/what_are_your_thoughts_on_disabled_people_using/&quot;&gt;ask&lt;/a&gt; &lt;a href=&quot;https://www.reddit.com/r/antiai/comments/1t71o25/okay_i_wanna_ask_does_generative_ai_help_disabled/&quot;&gt;“hey, wouldn’t LLMs help disabled people?”&lt;/a&gt;, and the comments will devolve into a dogpile of (often non-disabled) people slamming AI and a handful of disabled people trying to explain their experience. If anti-AI sentiment weren’t so strong on the left for other reasons, I think there’d be a current of left-wing AI supporters on a disability-rights basis.&lt;/p&gt;
&lt;h3&gt;Chronic illness and medical care&lt;/h3&gt;
&lt;p&gt;One popular anti-AI argument - that cavalier deployment of AI means that people might take &lt;a href=&quot;https://www.bbc.com/news/articles/cpd8l088x2xo&quot;&gt;dangerous medical advice&lt;/a&gt; instead of simply trusting their doctor - is actually a pro-AI argument in disguise. As anyone who’s been close to a person with chronic illness knows, “just trust your doctor” is kind of right-wing-coded itself, and the left-wing position is &lt;a href=&quot;https://www.painnewsnetwork.org/stories/2026/4/10/doctor-faces-backlash-after-tweet-claims-four-chronic-illnesses-are-overdiagnosed&quot;&gt;very&lt;/a&gt; &lt;a href=&quot;https://yorkspace.library.yorku.ca/server/api/core/bitstreams/4ac9d968-e9b0-491b-888a-d4ed5aeb1ac3/content&quot;&gt;sympathetic&lt;/a&gt; to patients who don’t or can’t&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Many doctors are not very good at handling unusual medical cases. If you have an unusual medical case, you have to learn to advocate for your own care, which often involves researching your own condition. This is &lt;em&gt;precisely&lt;/em&gt; the kind of thing where LLMs are useful, because:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The medical questions involved are often complex but well-explored in the literature (i.e. good fodder for an LLM)&lt;/li&gt;
&lt;li&gt;The patient is motivated enough to check individual sources themselves&lt;/li&gt;
&lt;li&gt;Having to convince a doctor to prescribe treatment is a guardrail for any human-LLM interaction that goes well off the deep end&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Various chronic illness groups are waging a long, quiet war against the medical orthodoxy that ignores or dismisses them. A classic example of this war being won is &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/books/NBK565622/&quot;&gt;endometriosis&lt;/a&gt;, which was once viewed as a largely psychological issue. Unfortunately, this is largely a guerrilla war: the institutional power and inertia are all on the side of the medical establishment. LLMs can be a useful tool for the chronically ill to make cogent arguments or write petitions in the language of that establishment.&lt;/p&gt;
&lt;h3&gt;Class and code-switching&lt;/h3&gt;
&lt;p&gt;Fighting the power of the establishment is not limited to doctors and medicine. Another common (and correct) left-wing target is &lt;em&gt;class&lt;/em&gt;. To see why, let’s consider Patrick McKenzie’s classic description of a &lt;a href=&quot;https://x.com/patio11/status/1162561822248992768&quot;&gt;“dangerous professional”&lt;/a&gt; mode of communication. The idea here is that by adopting a particular style, you can communicate to a bureaucracy that you are a person to take seriously, and someone who they should appease instead of brushing off. This includes, but isn’t limited to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An unemotional register&lt;/li&gt;
&lt;li&gt;Correct and somewhat stuffy grammar&lt;/li&gt;
&lt;li&gt;Signaling awareness of regulatory or legal options (for instance, explicitly requesting a paper trail)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Unless you have gone through the right educational or work pipeline, it can be tricky to hit this register exactly. A common failure mode is to go over the top: trying to write in grammar so elevated that it just reads as silly, or citing an overabundance of law or precedent where one would suffice. This reads as “crank”, not “dangerous professional”, and will get dismissed as quickly as the unprofessional “OMG that’s not helpful I will sue you” response.&lt;/p&gt;
&lt;p&gt;LLMs provide a dangerous professional translation service. You now don’t have to be able to match the style, you simply &lt;em&gt;have to know it exists&lt;/em&gt;, and the LLM will do the rest. In fact, the LLM will provide the substance, not only the style. It can tell you which regulators to contact and how, and what to say once you’ve contacted them. In other words, AI has now made it possible for a wide variety of social classes to access escalation pathways that were originally designed for the narrow professional class.&lt;/p&gt;
&lt;h3&gt;Education&lt;/h3&gt;
&lt;p&gt;Another common left-wing position is that education is gatekept by class and status. The idea here is that everyone has equal potential for accomplishment, but certain types of people get more educational opportunities, and that this explains uneven downstream outcomes. For instance, compare a wealthy neighborhood where every child gets private tutoring to a neighborhood where it’s unusual to complete high school.&lt;/p&gt;
&lt;p&gt;It seems obvious to me that LLMs now make private tutoring available to every student who wants it. Of course, if you’re a lazy student, LLMs probably make things worse by adding a new temptation to cheat. But if you’re motivated and just lack the opportunity, quizzing an LLM on basically any high-school-level topic is a great way to learn.&lt;/p&gt;
&lt;p&gt;The common rebuttal to this is that LLMs can’t be relied on because they hallucinate. Like the doctor example, I struggle to believe that anyone making this argument is actually comparing LLMs with the alternatives. Teachers “hallucinate” &lt;em&gt;all the time&lt;/em&gt;. I think every single kid who was smart in school has multiple stories of teachers insisting they were right about something obviously wrong&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;I wonder what we’d find if we rigorously compared the baseline teacher error rate with the hallucination rate of current LLMs. From the only study I could find (&lt;a href=&quot;https://files.eric.ed.gov/fulltext/ED672091.pdf&quot;&gt;this&lt;/a&gt; 2016 study): “Analysis at the lesson level, however, shows that about 42% of lessons contained a mathematical content error”. I bet that’s a higher rate than we’d see from GPT-5.5-Thinking on middle-school mathematics, though I don’t want to draw too many conclusions from one study.&lt;/p&gt;
&lt;p&gt;The education pro-AI argument also overlaps with the disability pro-AI argument. Students with ADHD or other issues are often badly underserved by the education system. LLMs can transform educational content into whatever way the student can best consume it (written format, or audio, or a quiz, or a dialogue, and so on).&lt;/p&gt;
&lt;h3&gt;Utopia&lt;/h3&gt;
&lt;p&gt;Finally, if you believe left-wing views are correct - which, definitionally, left-wingers do - and you’re optimistic about the technology, you might believe that a very smart model will inherently be kind of left-wing.&lt;/p&gt;
&lt;p&gt;This position is kind of a holdover from the 2000s and 2010s, when the left (and people in general) were more optimistic about technology. People thought technological progress would usher in a post-scarcity age of &lt;a href=&quot;https://en.wiktionary.org/wiki/Fully_Automated_Luxury_Gay_Space_Communism&quot;&gt;fully automated luxury gay space communism&lt;/a&gt;&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. A super-smart, super-capable left-wing AI is a core part of that picture.&lt;/p&gt;
&lt;p&gt;In fact, you might believe that this has already happened, for a certain value of “left-wing”. All current frontier models profess left-leaning views. The obvious explanation is that this reflects the bias of their training data or of the AI labs, but that’s a trickier argument than it sounds. First, Elon Musk tried &lt;a href=&quot;/ai-personality-space&quot;&gt;really hard&lt;/a&gt; to train a right-wing frontier LLM and (at least so far) has &lt;em&gt;failed&lt;/em&gt;. Second, models are not just the median of all their training data. If they were, they wouldn’t be able to solve mathematics or programming problems far above the median person. There is clearly a way that models can be pulled towards the “smart” end of their training data, probably via reinforcement learning. If the smart end of their training data turns out to be left-wing, isn’t that worth celebrating?&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;What are the strong left-wing arguments in favor of LLMs?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLMs are a powerful disability aid, at minimum for various neurodivergent people and those with motor or vision issues&lt;/li&gt;
&lt;li&gt;LLMs enable those who suffer from medical discrimination to actually do their own research, instead of having to rely entirely on the biased and dismissive medical establishment&lt;/li&gt;
&lt;li&gt;LLMs remove the communication advantage of the wealthy “professional class”, and enable those of all backgrounds to lobby institutions in ways that actually work&lt;/li&gt;
&lt;li&gt;LLMs lessen the massive educational advantage that children from wealthy areas get, by providing everyone with a private tutor that’s at least as good as the median&lt;/li&gt;
&lt;li&gt;If you’re a technologically-optimistic left-wing person, you should celebrate that all current powerful LLMs are left-wing, and that one pillar of the science-fiction left-wing utopia might be establishing itself right now&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of these, I think the disability and bias arguments are the most persuasive (though the impact on education will be huge and difficult to predict&lt;sup id=&quot;fnref-5&quot;&gt;&lt;a href=&quot;#fn-5&quot; class=&quot;footnote-ref&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;). I want to close with a quote that one of my readers, &lt;a href=&quot;https://toot.cafe/@matt&quot;&gt;Matt&lt;/a&gt;, wrote to me over email and kindly allowed me to share. It’s fair to say that it inspired this post:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“I’ve long been uncomfortable with the absolute left-wing anti-AI stance because, if similar reasoning had been applied to outright reject computers as fascist and unethical in the 80s and onward, my own life would have been quite different, and arguably worse. I have enough usable vision to handwrite, uncomfortably, with my head against the page. I did more of that than I wanted in school (I started first grade, in the US K-12 system, in 1987). Computers saved me from having to do even more, starting with my family’s home computer and other desktop computers in the classrooms that had them, and then on my own laptop. Would I want a world where I had been forced to handwrite more, or perhaps write in Braille with humans transcribing it for the benefit of sighted teachers and peers, or maybe write on a typewriter (for some reason I don’t recall ever trying that)? Then again, am I selfish to consider only my own comfort? After all, the manufacturing of computers inflicts its own harms on people, harms that I’m comfortably distant from. And of course, using computers as a child led to a career in software development. What kind of work would I be doing now if that path hadn’t been available? And now that AI helps at least one group of disabled people (of which I’m more or less a part), do I want to deny that benefit?”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;edit: this post got some &lt;a href=&quot;https://www.reddit.com/r/accelerate/comments/1ta68x5/the_leftwing_case_for_ai/&quot;&gt;comments&lt;/a&gt; &lt;a href=&quot;https://www.reddit.com/r/LeftistsForAI/comments/1ta2ps9/the_leftwing_case_for_ai/&quot;&gt;on&lt;/a&gt; &lt;a href=&quot;https://www.reddit.com/r/aiwars/comments/1ta2swl/the_leftwing_case_for_ai/&quot;&gt;Reddit&lt;/a&gt;, and was also discussed on &lt;a href=&quot;https://news.ycombinator.com/item?id=48083264&quot;&gt;Hacker News&lt;/a&gt;. I’ve also gotten some very interesting email from readers, who have pointed me towards sources like &lt;a href=&quot;https://www.theguardian.com/technology/2026/apr/07/the-life-changing-magic-of-wearing-smartglasses&quot;&gt;this&lt;/a&gt; and &lt;a href=&quot;https://www.youtube.com/watch?v=J4iQQtqenuI&quot;&gt;this&lt;/a&gt; for more high-profile examples of AI being used as a disability aid.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;I’m deliberately using “right-wing” and “left-wing” very loosely here to describe very broad ideological tents, because I’m interested in the broad currents of public opinion.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;If this paragraph seems familiar, it began as a footnote in &lt;a href=&quot;/many-anti-ai-arguments-are-conservative/&quot;&gt;my other post&lt;/a&gt;. &lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;For instance, I remember a teacher arguing with me in early primary school that one minus two equalled some decimal answer, instead of minus one.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;Most skillfully portrayed &lt;a href=&quot;https://en.wikipedia.org/wiki/Culture_series&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;My guess is that the median student’s education suffers (since cheating is now so easy), but the top percentile of highly-motivated, successful students will improve significantly.&lt;/p&gt;
&lt;a href=&quot;#fnref-5&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[AI makes weak engineers less harmful]]></title><link>https://seangoedecke.com/ai-makes-weak-engineers-less-harmful/</link><guid isPermaLink="false">https://seangoedecke.com/ai-makes-weak-engineers-less-harmful/</guid><pubDate>Sat, 09 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Like other kinds of puzzle-solving, software engineering ability is strongly heavy-tailed. The strongest engineers produce way more useful output than the average, and the weakest engineers often are actively net-negative: instead of moving projects along, they create problems that their colleagues have to spend time solving. That’s why many tech companies try to &lt;a href=&quot;https://www.levels.fyi/companies/jane-street/salaries&quot;&gt;build&lt;/a&gt; a small, ludicrously well-paid team instead of a large team of more average engineers, and why so far this seems to be a winning strategy.&lt;/p&gt;
&lt;p&gt;Being effective in a large tech company is often about managing this phenomenon: trying to arrange things so that the most competent people land on projects you want to succeed, and the least competent are shunted out of the way&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. For instance, if you’re technical lead on a project, you more or less have to ensure&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; that the most critical pieces are in the hands of people who won’t screw them up (whether by directly assigning the work, or by making sure someone can “sit on the shoulder” of the engineer who you’re worried about).&lt;/p&gt;
&lt;p&gt;Claude Code changed this. Frontier LLMs don’t have the taste or the system familiarity of a strong engineer, but they have absolutely raised the floor for weak engineers. Instead of getting a pull request that could never possibly work or would cause immediate problems, the worst you’ll now see is a standard LLM pull request: wrong in some ways, baffling in others, but at least functional on the line-by-line level and not so obviously incorrect that someone with no knowledge of the codebase could point it out. That is a huge improvement!&lt;/p&gt;
&lt;p&gt;You can try this out yourself. If you deliberately make mistakes while working with a coding agent, you’ll find that the agent pushes back hard against many obvious errors (e.g. caching user data under a key that isn’t user-specific, writing a loop that might never terminate, or leaking open file handles). Of course, the agent will still miss subtle errors, particularly ones that require understanding other parts of the codebase.&lt;/p&gt;
&lt;p&gt;Working with the least effective engineers is now sometimes like working with a Claude Opus or Codex instance that you communicate with over Slack. Occasionally it’s &lt;em&gt;literally&lt;/em&gt; that: your colleague is simply pasting your messages into Claude Code and pasting you the response. This is annoying, but it’s a much better experience than working with this kind of engineer directly. After all, you probably already work with a bunch of LLM instances. The Slack interface is not ideal - unlike using Claude Code directly, you sometimes wait hours or days for a response, and you don’t get visibility into the agent’s thought processes - but it’s still helpful on the margin. More compute being thrown at your problem is better than less.&lt;/p&gt;
&lt;p&gt;Of course, this isn’t a great state of affairs for the engineer in question, who is almost certainly learning less than if they were making their own (bad) decisions. It’s also a bad state of affairs for the company, which is paying a human salary and effectively getting a Copilot subscription (which it’s likely also paying for)&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. After the current push to figure out what value AI is adding to engineers, I suspect there will be a push to figure out what value &lt;em&gt;engineers are adding to AI&lt;/em&gt;, and the engineers who aren’t adding much may find themselves out of a job.&lt;/p&gt;
&lt;p&gt;You can’t talk to Claude-over-Slack like you’d talk to normal Claude. If you tend to handle LLMs roughly (insulting them, or just being very curt), you’ll have to change your communication style. A human is going to read your messages, after all, even if you’re really interacting with an LLM. There’s no point being rude. But if, like me, you say please-and-thank-you to the models&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, you can treat your LLM-using coworker as just another Copilot window or Codex tab. It’s far better than having to treat them as an unwitting saboteur.&lt;/p&gt;
&lt;p&gt;Not all net-negative engineers use AI tools like this. Many are strongly convinced of their own wrong opinions about how to build good software, or mistrust AI in general, or believe that relying heavily on LLMs is not a good way to improve&lt;sup id=&quot;fnref-5&quot;&gt;&lt;a href=&quot;#fn-5&quot; class=&quot;footnote-ref&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;. But &lt;em&gt;no&lt;/em&gt; strong engineers use AI tools like this. Even when they’re being lazy or sloppy, a capable engineer will have enough baseline taste to catch obvious AI-generated errors. So the phenomenon of engineers&lt;sup id=&quot;fnref-6&quot;&gt;&lt;a href=&quot;#fn-6&quot; class=&quot;footnote-ref&quot;&gt;6&lt;/a&gt;&lt;/sup&gt; becoming thin wrappers around Claude Code is limited to the kind of engineers for whom this is an improvement in their work product.&lt;/p&gt;
&lt;p&gt;edit: this ended up being the topic of a Theo &lt;a href=&quot;https://www.youtube.com/watch?v=rTMRlqT8Q8c&quot;&gt;video&lt;/a&gt; on YouTube.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;More charitably: many “least competent” engineers are just out of their comfort zone, and can be fine or even excel under the right circumstances (though in my view the best engineers are able to do good work in a wide variety of environments). Also, I don’t currently work with a lot of incompetent people. Much of this is based on past experience or talking to other engineers in the industry.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Since your managers are doing the same thing, this can sometimes feel like Moneyball: you’re trying to identify underappreciated talent who are strong enough to help you win without being so high-profile that your boss poaches them to lead something else.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;I suppose it’s better to pay for nothing than to pay for net-negative output, but it still doesn’t seem &lt;em&gt;good&lt;/em&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;I think this is actually the right way to treat Claude Opus 4.7.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;Is this true? I think relying on LLMs is not a great way for most engineers to improve, but if LLM output is consistently better than your own, it might be different. So long as you’re paying attention to where the LLM does better, it could actually be a good way to learn.&lt;/p&gt;
&lt;a href=&quot;#fnref-5&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-6&quot;&gt;
&lt;p&gt;I don’t have as much experience with (or as many anecdotes about) non-engineers falling into this trap, but &lt;a href=&quot;https://nooneshappy.com/article/appearing-productive-in-the-workplace/&quot;&gt;this post&lt;/a&gt; has convinced me that it might be worse.&lt;/p&gt;
&lt;a href=&quot;#fnref-6&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Notes on incidents]]></title><link>https://seangoedecke.com/notes-on-incidents/</link><guid isPermaLink="false">https://seangoedecke.com/notes-on-incidents/</guid><pubDate>Fri, 08 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Incidents are boring.&lt;/strong&gt; Most of what you actually do during an incident is wait: for some other team to investigate, or for a deploy to finish, or for the result of some change to become apparent, or for someone else who’s been paged to come online. It’s stressful, but there’s often just not that much to do.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Most incidents resolve on their own.&lt;/strong&gt; People love to share war stories about incidents where some hero engineer improvised a clever fix that instantly repaired the system. That rarely happens. Well-designed software systems tend to come good by themselves, and many modern systems are at least partly well-designed, by virtue of being built out of really solid pieces. If a server process is crashing or leaking memory, Kubernetes will kill the pod and bring it back up. If a service is overloaded and jammed up, clients will (hopefully) trigger circuit breakers and back off until it can recover. Temporary spikes in expensive operations will often just fill up a queue instead of taking the entire system down. Most incident calls I’ve been on - well over half - would have come good by themselves in roughly the same time without any human intervention.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Most incident-resolving actions make incidents worse.&lt;/strong&gt; Engineers jump too quickly to resolve incidents. Oh, the queue size is huge? Don’t worry, I’m here in a production console to clear the queue! Unfortunately, some of the jobs I just nuked were doing important billing work and aren’t automatically re-queued, so this queue-latency incident just became a billing incident as well. Another classic in this genre: an engineer forces a series of redeploys to “fix” a concerning-looking metric, and the concurrent deploys put far more stress on the system than whatever was causing the metric to look weird.&lt;/p&gt;
&lt;p&gt;For that reason, &lt;strong&gt;the first thing you should do in an incident is &lt;em&gt;nothing&lt;/em&gt;&lt;/strong&gt;. When I was paged late at night, I used to have a habit of pouring myself a glass of scotch before I joined the call. This was only partly for the tranquilizing effects of alcohol: the main reason was to have a ritual I could go through to convince myself that I wasn’t rushing, and that it was OK to take a few breaths and relax before jumping into the problem&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. Making a cup of tea or going for a walk around the house would probably have served as well.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Effective incident-resolving actions are often dull.&lt;/strong&gt; Typically the action needed to resolve the incident - assuming it doesn’t resolve on its own - is to temporarily disable some problematic feature until the system recovers. This is never a complex code change. Usually someone spends five minutes putting together the patch, and then an hour waiting for reviews, CI, and the deploy. If you’re very lucky, you’ll get to write a “wrap a cache around it” code change.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In an incident, there is no substitute for knowledge of the system.&lt;/strong&gt; Five strong engineers can troubleshoot on an incident call and get nowhere, while one half-drunk engineer who’s familiar with the codebase can swan in and immediately fix the problem. This is because the kinds of actions that resolve incidents are so simple: if you’ve been the one working on the project, you likely already know exactly what feature flag to check and disable, or what code change to revert.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Resolving incidents requires courage.&lt;/strong&gt; Incident calls can be scary. When engineers are scared, they often reach for consensus: hedging their statements, asking the group if they agree a particular course of action is safe, deferring to each other, and so on. But if you’re the one with knowledge of the system, you have to be decisive. Say “I’m going to do X”, wait thirty seconds, then do it. While it’s usually net-negative to have a powerful manager fidgeting on the incident call, this is one of the rare cases where it can be helpful - executives are very comfortable saying “okay, do it now” about technical courses of action they don’t fully understand.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Resolving incidents buys a lot of political credit.&lt;/strong&gt; One thing that I think surprises a lot of engineers who are new to on-call is how &lt;em&gt;grateful&lt;/em&gt; managers and executives are for even really simple fixes (e.g. “turn off the feature flag”). This is because incidents are one of the few times that non-technical leadership are directly confronted with their lack of control over the technical sphere. When the team is building a product, your VP has a lot of freedom to guide the process and make decisions. But when there’s an active incident, they have to just sit there and trust that their technical employees are going to pull them out of the fire. It’s a scary situation, particularly for someone who’s used to exercising a degree of power in the workplace.&lt;/p&gt;
&lt;p&gt;However, &lt;strong&gt;&lt;em&gt;always&lt;/em&gt; resolving incidents is (by itself) not a durable position of power.&lt;/strong&gt; This is a little counter-intuitive. Surely if you’re always resolving incidents, you’re indispensable? The problem is that incident-resolving work is almost always so technical as to be completely opaque to executives. They know the incident has resolved, but they don’t know if you made a heroic effort or merely did the obvious thing. They also can’t point to your successes as theirs (which is always the most reliable way to get VPs and directors on your side), because incidents &lt;em&gt;are expected to be fixed&lt;/em&gt;, and it’s always better &lt;em&gt;not to have had the incident at all&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;edit: I got an interesting reader email saying that in their experience incidents usually don’t go away on their own. It turns out they’ve typically worked at smaller companies than me. I suspect this is a system-size thing: big tech companies have more sprawling systems with more third-party dependencies, so it’s more common for something to go wrong and self-recover.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;I don’t need to do this anymore because I just don’t get as keyed up about incidents as I used to.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Why hasn't longer-horizon training slowed AI progress?]]></title><link>https://seangoedecke.com/why-hasnt-longer-horizon-training-slowed-ai-progress/</link><guid isPermaLink="false">https://seangoedecke.com/why-hasnt-longer-horizon-training-slowed-ai-progress/</guid><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Dwarkesh Patel&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; recently &lt;a href=&quot;https://www.dwarkesh.com/p/blog-prize&quot;&gt;posted&lt;/a&gt; an award for the best answers to four key questions about AI. It’s partly a challenge and partly a job interview, since some of the winners will get offered a role as a “research collaborator”. I don’t want the job, but I do want to write down my answer to his first question: &lt;strong&gt;why hasn’t AI progress slowed down more?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There are a few reasons we might think AI progress would slow down. The particular reason Dwarkesh is interested in goes like this. Training a model (specifically with reinforcement learning) requires the model to perform a task and then get “graded” on the output. As models get more powerful and tasks become harder, each task takes longer and requires more FLOPs&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; to complete, so each graded attempt costs more FLOPs. If that’s right, training more capable models should take longer and longer.&lt;/p&gt;
&lt;p&gt;But intuitively, AI progress hasn’t slowed down that much. The famous METR horizon-length &lt;a href=&quot;https://metr.org/time-horizons/&quot;&gt;graph&lt;/a&gt; shows that AI systems can complete longer and longer tasks (measured by how long those tasks take a human) over time, and that this process is accelerating, not slowing down. Why would that be?&lt;/p&gt;
&lt;h3&gt;What’s in a FLOP?&lt;/h3&gt;
&lt;p&gt;Firstly, &lt;strong&gt;it might just be the case that newer models are benefiting from orders of magnitude more FLOPs&lt;/strong&gt;. Of course, AI labs aren’t standing up orders of magnitude more GPUs (they’re trying, but there are hard physical limits on how fast you can build out a datacenter). But it’s certainly possible that they’re learning to use their existing FLOPs orders of magnitude more efficiently.&lt;/p&gt;
&lt;p&gt;The efficiency of complex software systems - and the training code for a frontier AI model certainly qualifies - is not typically determined by the number of genius ideas in them. It is determined by the number of boneheaded mistakes. Take &lt;a href=&quot;https://www.dwarkesh.com/p/what-i-learned-april-15&quot;&gt;this story&lt;/a&gt;&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; of how the initial GPT-4 training run used FP16 when summing many small values, which will &lt;em&gt;completely&lt;/em&gt; mess up your results if the sum of those values is large: once the running total gets big enough, FP16 doesn’t have enough precision to register each small addition, so they simply get rounded away. How much training-efficiency-per-FLOP does fixing bugs like that buy? Plausibly enough to outweigh any inherent lack of efficiency from training more powerful models.&lt;/p&gt;
&lt;h3&gt;People are bad at judging intelligence&lt;/h3&gt;
&lt;p&gt;Secondly, &lt;strong&gt;intuitions about the speed of AI progress &lt;a href=&quot;/are-new-models-good&quot;&gt;are weird and unreliable&lt;/a&gt;&lt;/strong&gt;. Humans measure AI progress - and intelligence in general - on a really uneven scale. It’s easy to tell when an AI (or a person) is less smart than you, because you can just see them making mistakes. It’s very hard to tell if they’re smarter, because in that case you’re the one making mistakes. You have to rely on more subtle context clues: do they get better long-term results than you, or do they often confuse you in situations where you later end up agreeing with them, and so on.&lt;/p&gt;
&lt;p&gt;The jump from GPT-3 to GPT-4 seemed &lt;em&gt;huge&lt;/em&gt; because GPT-3 was dumber than almost all humans, and GPT-4 was sometimes as smart as a human. However, frontier models are now smart enough that on many topics it’s genuinely ambiguous whether they’re smarter than the person judging them. It’s thus much harder to tell the “real” rate at which they’re getting smarter. Maybe the rate of growth of “raw intelligence” really has slowed down! I don’t know how we’d be in a position to know for sure.&lt;/p&gt;
&lt;h3&gt;Intelligence is not the sole determinant of capability&lt;/h3&gt;
&lt;p&gt;Thirdly, &lt;strong&gt;many traits other than intelligence determine the capabilities of AI models&lt;/strong&gt;. Take the jump in October last year, when OpenAI and Anthropic models suddenly became “agentic” (i.e. they could reliably perform complex tasks end-to-end). That might be intelligence, but it might also just be a greater working memory, or more rote familiarity with the basic tools of an LLM harness, or more ability to attend to the context window, or even simply a &lt;a href=&quot;/ai-personality-space/&quot;&gt;personality&lt;/a&gt; more suited to tools like Claude Code or Codex. Of course, all of these traits are plausibly “intelligence”. But they’re traits you might instil by various clever tricks (or even just tweaking the system prompt), not by brute-forcing more FLOPs.&lt;/p&gt;
&lt;p&gt;It’s illustrative here to consider the mistake made by Apple’s infamous &lt;a href=&quot;/illusion-of-thinking/&quot;&gt;&lt;em&gt;The Illusion of Thinking&lt;/em&gt;&lt;/a&gt; paper, where the researchers asked various models to brute-force solve Tower of Hanoi puzzles with different numbers of disks, using the results to score how good at reasoning the models were. But of course when you read the output, all of the failures were cases of the model realizing that many hundreds of steps were required (an n-disk puzzle takes 2&lt;sup&gt;n&lt;/sup&gt; − 1 moves) and refusing to even try. These same models could trivially write code to perform the steps, or correctly go through any smaller subset of the steps. The problem wasn’t intelligence, it was &lt;em&gt;persistence&lt;/em&gt;: these models lacked the willingness to dig in and keep powering through steps until they got to an answer&lt;sup id=&quot;fnref-5&quot;&gt;&lt;a href=&quot;#fn-5&quot; class=&quot;footnote-ref&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Final thoughts&lt;/h3&gt;
&lt;p&gt;Even inside an AI lab, I don’t think anyone has a good understanding of how many “real” FLOPs are being thrown at a training run (not counting FLOPs that are wasted on bugs). We also don’t have a clear sense of whether AI progress really is slowing down or not. Mythos seems impressive, and coding agents are really good now, but once the models get close to human intelligence, progress becomes really tricky to measure. Finally, almost everyone judges intelligence by capabilities, but capabilities are produced by a constellation of many traits (intelligence is just one of them).&lt;/p&gt;
&lt;p&gt;I think this stuff is really complicated. A general theory like “RL takes more FLOPs-per-reward as tasks get longer, therefore training will gradually slow down” sounds good, but in practice AI development is dominated by lightning strikes: silly bugs that make training a hundred times worse, clever ideas that make models a hundred times more useful, and spiky capabilities that can produce dazzling results in some areas but zero improvement in others. We are still &lt;a href=&quot;/ai-and-informal-science/&quot;&gt;very early&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;If you’re reading this you probably know who Dwarkesh is, but if you don’t: he’s a well-known tech-adjacent podcaster whose gimmick is that he actually does extensive research before interviewing each guest and asks them specific technical questions.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
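&lt;p&gt;A FLOP is a floating-point operation, such as a single multiplication or addition. Training and inference are dominated by the huge matrix multiplications inside the model, so FLOP counts are a reasonable proxy for “time on a GPU”.&lt;/p&gt;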
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;I saw this in a tweet and only realized that the source was Dwarkesh when I was researching for this post.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;What if AI progress stalls for technical reasons, and everyone gives up on training new models? In that world, open source models will &lt;em&gt;eventually&lt;/em&gt; catch up, and AI labs won’t be in a privileged position.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;Incidentally, this is my pet theory about why models got much better at agentic tasks last year: training on longer and longer agentic traces meant that models started to “believe they could do it”, and made them much less likely to just give up and take shortcuts or refuse to continue.&lt;/p&gt;
&lt;a href=&quot;#fnref-5&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Why I don't like the "staff engineer archetypes"]]></title><link>https://seangoedecke.com/staff-engineer-archetypes/</link><guid isPermaLink="false">https://seangoedecke.com/staff-engineer-archetypes/</guid><pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The most influential piece of writing about staff engineers in the last decade has to be Will Larson’s &lt;a href=&quot;https://staffeng.com/guides/staff-archetypes/&quot;&gt;&lt;em&gt;Staff engineer archetypes&lt;/em&gt;&lt;/a&gt;. He argues that the “staff engineer” title covers at least four very different roles: the tech lead, the architect, the solver, and the right hand. This taxonomy gets cited a lot as advice for people who are trying to become effective staff engineers. For &lt;a href=&quot;/staff-engineer-promotions&quot;&gt;both&lt;/a&gt; of my promotions to staff engineer, my manager at the time linked me to the “staff engineer archetypes” and asked me to consider which of these archetypes I was aiming towards.&lt;/p&gt;
&lt;p&gt;These archetypes definitely exist&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. However, I think it’s bad practical advice to tell engineers to try and target them.&lt;/p&gt;
&lt;h3&gt;Archetypes do not make good goals&lt;/h3&gt;
&lt;p&gt;To see why, let’s take the “tech lead” archetype. Larson describes this as an informal technical leadership role: not necessarily an explicit authority figure, but someone who’s good at scoping work, planning projects, and maintaining the kind of relationships (e.g. with other teams) needed to successfully &lt;a href=&quot;/how-to-ship&quot;&gt;ship&lt;/a&gt;. If you want to fill this role, shouldn’t you start trying to do these things? No! You don’t become a technical leader by trying really hard to be a technical leader, much like you don’t become a writer by trying really hard “to be a writer”. You become a technical leader by &lt;em&gt;doing good technical work&lt;/em&gt; until your skills and relationships emerge organically.&lt;/p&gt;
&lt;p&gt;I wrote about this process in &lt;a href=&quot;/ratchet-effects&quot;&gt;&lt;em&gt;Ratchet effects determine engineer reputation at large companies&lt;/em&gt;&lt;/a&gt;. To get good at shipping large complex projects, you must start by shipping tiny pieces of work, until you’re familiar enough with the system and you’ve built enough trust to take on slightly larger pieces. At each stage, if you do good work - “good work” here means “deliver &lt;a href=&quot;/shareholder-value/&quot;&gt;shareholder value&lt;/a&gt;” - you will very naturally be given opportunities to work on more complex and important things. If you try to jump ahead, you’re going to run into all kinds of problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Important projects are usually assigned top-down, not bottom-up, so you’ll either be trying to muscle out the planned engineering lead for a project or trying to pitch your own (complex, important) engineering task to senior management. Either way, good luck with that!&lt;/li&gt;
&lt;li&gt;You likely won’t have a good enough relationship with senior management to know what their real priorities are.&lt;/li&gt;
&lt;li&gt;If you’re not yet trusted to execute, you may get assigned “minders” (often current staff engineers) who will ghost-lead the project through you&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
&lt;li&gt;You’ll likely make &lt;a href=&quot;/you-cant-design-software-you-dont-work-on/&quot;&gt;poor technical decisions&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The other archetypes are like this as well. If you want to become a successful architect, you do not get there by studying software architecture in the abstract, because &lt;a href=&quot;/you-cant-design-software-you-dont-work-on/&quot;&gt;you can’t design software you don’t work on&lt;/a&gt;. The “solver” and “right hand” archetypes both rely on having an enormous amount of trust and influence. You can’t aim for those archetypes directly, because trust and influence accumulate over time. In fact, the idea of “aiming for” a particular staff engineer archetype reflects a misunderstanding of what the staff engineer role is. What is the defining attribute of the staff engineering role, then?&lt;/p&gt;
&lt;h3&gt;What is a staff engineer?&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;A staff engineer has to be useful to the company.&lt;/strong&gt; Of course, a senior or mid-level software engineer ought to be useful too, but all they &lt;em&gt;have&lt;/em&gt; to do is execute on the job in front of them. If they end up not providing value (maybe their project turns out to be unimportant, or they don’t get the support needed to succeed), that’s their manager’s problem, not theirs&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. In contrast, staff engineers are expected to deliver value regardless: to make the project work, or to find something else useful to do if the project truly can’t be salvaged.&lt;/p&gt;
&lt;p&gt;This is an unfair expectation. Often projects really do fail through no fault of your own, and sometimes it just isn’t possible to conjure useful work from thin air. That’s actually by design: &lt;strong&gt;the staff engineer role is supposed to be unfair&lt;/strong&gt;. Something many engineers don’t realize is that all senior management and executive leadership roles are unfair too, in the same way. That’s just part of the deal: executives are given power and great compensation, and in return they get thrown off the boat in bad weather&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. “Staff engineer” is the first engineering role where you are held largely responsible for outcomes you don’t control.&lt;/p&gt;
&lt;p&gt;Developing a “staff engineer mindset” thus has very little to do with the archetypes. Instead, you should:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Develop the habit of constantly asking yourself “is this useful to the company” (and answering correctly).&lt;/li&gt;
&lt;li&gt;Lose the habit of worrying about whether you’re being treated “fairly”. Instead, try to think about your role in terms of incentives and consequences.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;At the beginning, you won’t look much like any of the staff engineer archetypes. You will look like a level-headed engineer who can be trusted to move projects forward with a minimum of fuss, and who can be re-tasked to different work without complaining. You’ll also look like someone who’s &lt;a href=&quot;/getting-the-main-thing-right/&quot;&gt;paying a lot of attention&lt;/a&gt; to what their manager’s actual priorities are, and who is thinking hard about how to fulfil those priorities (instead of their own goals).&lt;/p&gt;
&lt;p&gt;If you do this for long enough, you’ll eventually find yourself in one of the staff engineer archetypes. However, it probably won’t be the one you’re “aiming for”. The whole point of being a staff engineer is that you’re willing to fill whatever archetype the company needs at the time.&lt;/p&gt;
&lt;h3&gt;Final thoughts&lt;/h3&gt;
&lt;p&gt;In his original staff engineer post, Larson is pretty clear that these archetypes are more of an anthropological description of some of the varied niches staff engineers fill, not a how-to guide for succeeding in the role&lt;sup id=&quot;fnref-5&quot;&gt;&lt;a href=&quot;#fn-5&quot; class=&quot;footnote-ref&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;. At the time, the “staff engineer” role was fairly new and people were still trying to figure out what it even meant. Pointing out that there were a few very different ways to succeed in the role was a genuinely novel observation.&lt;/p&gt;
&lt;p&gt;The staff engineer archetypes are a good list of ways an engineer can be very useful to their organization - but only once they’ve built a deep relationship of trust with their organization’s leadership. Advice on how to succeed as a staff engineer should be about &lt;strong&gt;how to build that trust&lt;/strong&gt;, not about what to do once you have it.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;One caveat that is too pedantic for the body of the post: each tech company has a different structure of roles. Some don’t have the formal “staff” title at all, while others have “staff” as a fairly early rung on the ladder and a panoply of “senior staff”, “senior principal staff”, and so on roles above it. Like all “staff engineer” discourse, this post is not about the word itself but about the point in the engineering job ladder where progression becomes significantly more difficult.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Impressing your VP’s trusted lieutenants can actually be a good way to build trust in the medium-term, but you’d better hope you’ve built enough understanding of the system to do it right. If this process goes badly, your reputation in the org might be torched for years.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;In theory, at least. In practice it’s always better to be useful (again, in the sense of “delivering shareholder value”).&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;This is why very senior leadership sometimes seem so unempathetic towards engineering complaints: their work environment operates by very different rules and norms to that of most engineers. I keep meaning to write about this, but never manage to. This &lt;a href=&quot;https://github.com/sgoedecke/gatsby-blog/blob/master/content/drafts/_icebox/strategy-for-swes/index.md&quot;&gt;draft&lt;/a&gt; is the closest thing I have to a deeper exploration of the point.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;For the record, my how-to guides are &lt;a href=&quot;/staff-engineer-promotions/&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;/ratchet-effects/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-5&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Software engineering may no longer be a lifetime career]]></title><link>https://seangoedecke.com/software-engineering-may-no-longer-be-a-lifetime-career/</link><guid isPermaLink="false">https://seangoedecke.com/software-engineering-may-no-longer-be-a-lifetime-career/</guid><pubDate>Fri, 24 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I don’t think there’s compelling evidence that using AI makes you less intelligent overall&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. However, it seems pretty obvious that using AI to perform a task means you don’t learn as much &lt;em&gt;about performing that task&lt;/em&gt;. Some software engineers think this is a decisive argument against the use of AI. Their argument goes something like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Using AI means you don’t learn as much from your work&lt;/li&gt;
&lt;li&gt;AI-users thus become less effective engineers over time, as their technical skills atrophy&lt;/li&gt;
&lt;li&gt;Therefore we shouldn’t use AI in our work&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I don’t necessarily agree with (2). On the one hand, moving from assembly language to C made programmers less effective in some ways and more effective in others. On the other hand, the transition from writing code by hand to using AI is arguably a bigger shift, so who knows? But it doesn’t matter. Even if we grant that (2) is correct, &lt;strong&gt;this is still a bad argument&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Until around 2024, the best way to learn how to do software engineering was just &lt;em&gt;doing software engineering&lt;/em&gt;. That was really lucky for us! It meant that we could parlay a coding hobby into a lucrative career, and that the people who really liked the work would just get better and better over time. However, that was never an immutable fact of what software engineering is. It was just a fortunate coincidence.&lt;/p&gt;
&lt;p&gt;It would really suck for software engineers if using AI made us worse at our jobs in the long term (or even at general reasoning, though I still don’t believe that’s true). But &lt;strong&gt;we might still be obliged to use it, if it provided enough short-term benefits&lt;/strong&gt;, for the same reason that construction workers are obliged to lift heavy objects: because that’s what we’re being paid to do.&lt;/p&gt;
&lt;p&gt;If you work in construction, you need to lift and carry a series of heavy objects in order to be effective. But lifting heavy objects puts long-term wear on your back and joints, making you less effective over time. Construction workers don’t say that being a good construction worker means not lifting heavy objects. They say “too bad, that’s the job”&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;If AI does turn out to make you dumber, why can’t we just keep writing code by hand? You can! You just might not be able to earn a salary doing so, for the same reason that there aren’t many jobs out there for carpenters who refuse to use power tools. If the models are good enough, you will simply get outcompeted by engineers willing to trade their long-term cognitive ability for a short-term lucrative career&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;I hope that this isn’t true. It would be really unfortunate for software engineers. But it would be even more unfortunate if it were true and we refused to acknowledge it.&lt;/p&gt;
&lt;p&gt;The career of a pro athlete has a maximum lifespan of around fifteen years. You have the opportunity to make a lot of money until around your mid-thirties, at which point your body just can’t keep up with it. A common tragic figure today is the professional athlete who believes the show will go on forever and doesn’t prepare for the day they can’t do it anymore. We may be in the first generation of software engineers in the same position. If so, it’s probably a good idea to plan accordingly.&lt;/p&gt;
&lt;p&gt;edit: this post got a lot of comments on &lt;a href=&quot;https://news.ycombinator.com/item?id=48095550&quot;&gt;Hacker News&lt;/a&gt;. I was a bit disappointed to see many &lt;a href=&quot;https://news.ycombinator.com/item?id=48098278&quot;&gt;people&lt;/a&gt; (even &lt;a href=&quot;https://news.ycombinator.com/item?id=48099636&quot;&gt;Simon Willison&lt;/a&gt;, whose blog I read) respond with variations on the point that engineers can use AI to do more engineering work, even if they’re no longer writing code by hand. First, once you stop writing code by hand, I worry that your ability to understand the codebase in general &lt;a href=&quot;/you-cant-design-software-you-dont-work-on/&quot;&gt;will atrophy&lt;/a&gt;; second, the rate of change is so high that &lt;em&gt;nobody knows&lt;/em&gt; what will happen in a decade or two. I should have emphasized these points more.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;If you’re thinking “wait, there’s research on this”, you can likely read my take on the paper you’re thinking of &lt;a href=&quot;/impact-of-ai-study&quot;&gt;here&lt;/a&gt;, &lt;a href=&quot;/your-brain-on-chatgpt&quot;&gt;here&lt;/a&gt; or &lt;a href=&quot;/how-does-ai-impact-skill-formation&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Of course, construction workers do have layers of techniques for avoiding lifting heavy objects when possible (cranes, dollies, forklifts, and so on). There’s a natural analogy here to a set of techniques for staying mentally engaged that software engineers are yet to discover.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;In theory labor unions could slow this process down (and have forced employers to slow down this race-to-the-bottom in other industries). But I’m pessimistic about tech labor unions for all the usual reasons: the job is too highly-paid, you can work (and thus scab) from anywhere on the planet, and so on.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Blood in the datacenter]]></title><link>https://seangoedecke.com/luddites-and-ai-datacenters/</link><guid isPermaLink="false">https://seangoedecke.com/luddites-and-ai-datacenters/</guid><pubDate>Wed, 22 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Is it time to start burning down datacenters?&lt;/p&gt;
&lt;p&gt;Some people think so. An Indianapolis city council member recently had his house &lt;a href=&quot;https://www.kbtx.com/2026/04/07/councilman-says-someone-fired-shots-his-home-left-no-data-centers-note/&quot;&gt;shot up&lt;/a&gt; for supporting datacenters, and Sam Altman’s home was &lt;a href=&quot;https://www.wired.com/story/sam-altman-home-attack-openai-san-franisco-office-threat/&quot;&gt;firebombed&lt;/a&gt; (and then &lt;a href=&quot;https://sfstandard.com/2026/04/12/sam-altman-s-home-targeted-second-attack/&quot;&gt;shot at&lt;/a&gt;) shortly afterwards. People from all sides of the argument are &lt;a href=&quot;https://www.bloodinthemachine.com/p/why-the-ai-backlash-has-turned-violent&quot;&gt;sounding&lt;/a&gt; the &lt;a href=&quot;https://thesoufancenter.org/intelbrief-2025-november-5/&quot;&gt;alarm&lt;/a&gt; about imminent violence.&lt;/p&gt;
&lt;p&gt;The obvious historical comparison is &lt;a href=&quot;https://en.wikipedia.org/wiki/Luddite&quot;&gt;Luddism&lt;/a&gt;, the 19th-century phenomenon where English weavers and knitters destroyed the machines that were automating their work, and (in some cases) killed the machines’ owners. Anti-AI people are &lt;a href=&quot;https://www.theguardian.com/commentisfree/article/2024/jul/27/harm-ai-artificial-intelligence-backlash-human-labour&quot;&gt;reclaiming&lt;/a&gt; the term to describe themselves, and many of the leading lights of the anti-AI movement (like &lt;a href=&quot;https://www.bloodinthemachine.com/&quot;&gt;Brian Merchant&lt;/a&gt; or &lt;a href=&quot;https://www.versobooks.com/en-gb/products/688-breaking-things-at-work?srsltid=AfmBOorCgru7ReSwbVdt40nZmQaaeGfbpjLV7epM0fSv_V01QSY5b5TP&quot;&gt;Gavin Mueller&lt;/a&gt;) have written books arguing more or less that the Luddites were right, and we ought to follow their example in order to resist AI automation&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Like many people, I have heard a lot about Luddism and Luddites, but only in the context of it being a general term for someone who is anti-technology. I was interested in learning more about the actual historical movement: what kind of people participated, what it was, and what it accomplished. I read Merchant’s and Mueller’s books, plus others&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, to try and figure all of this out. Who were the actual, historical Luddites? What can we learn from them about burning down datacenters?&lt;/p&gt;
&lt;h3&gt;Who were the Luddites?&lt;/h3&gt;
&lt;p&gt;The Luddites were a decentralized movement of artisans in the 1810s who engaged in violent protest - smashing machines, threatening violence, and ultimately killing people - over the fact that their jobs were being automated away. They were not rich, but they were certainly not unskilled labor: these were people who had apprenticed for &lt;a href=&quot;https://archive.org/stream/extractsfromrec01dendgoog/extractsfromrec01dendgoog_djvu.txt?utm_source=chatgpt.com&quot;&gt;seven years&lt;/a&gt;. They were mostly working from home, producing cloth from raw material given to them by their employer, often with tools rented from that same employer. They were working short weeks (three days, per William Gardiner) at their own discretion.&lt;/p&gt;
&lt;p&gt;In the early 1800s, their skilled labor was becoming unnecessary. With the help of expensive machines, unskilled labor could now produce lower-quality cloth, so employers were beginning to pass over these artisans in favor of cheaper employees: &lt;a href=&quot;https://campus.murraystate.edu/academic/faculty/kBinfield/luddites/LudditeHistory.htm&quot;&gt;children, unapprenticed workers, and women&lt;/a&gt;&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. Combined with the bad economic position of England at the time (at war with France, and thus deliberately cutting off much European trade), times were beginning to be very tough indeed. Starvation was a real threat.&lt;/p&gt;
&lt;h3&gt;What did they do?&lt;/h3&gt;
&lt;p&gt;Cloth artisans were groups of capable men who were used to getting their own way, knew each other very well, and were broadly respected in their communities. It was thus a natural response for them to organize into what was effectively a militant union. The Luddites would send anonymous threatening letters to their old (or current) bosses, warning them to stop using their machines. If they didn’t comply, they would raid the workshop or factory, smashing the machines up.&lt;/p&gt;
&lt;p&gt;They typically did not harm people, though they certainly delivered threats of bodily harm or even murder, and the raids were violent enough (e.g. shooting through windows) to have risked accidental deaths. In at least two instances where a factory owner was seen as unusually cruel, the Luddites did attempt assassinations: one unsuccessful, and one successful, which eventually prompted a crackdown that ended the movement for good.&lt;/p&gt;
&lt;p&gt;Luddism was fully decentralized. Different communities could and did decide to engage in machine-raiding independently, particularly when news spread of the tactic succeeding. Although each community had its own influential men, there was never a single “leader of Luddism”. King Ludd himself was a folk-tale figure. This made it an absolute nightmare for the British government to try and suppress them: putting down one Luddist group did nothing to prevent other groups from continuing to operate.&lt;/p&gt;
&lt;h3&gt;All the king’s spies&lt;/h3&gt;
&lt;p&gt;I was surprised by how &lt;em&gt;difficult&lt;/em&gt; it was for the government to get a hold of any of the local Luddist ringleaders. The government was willing to offer huge rewards to informers: at one point up to 40x the yearly wage. However, there were no takers for several years. Armies of spies were recruited and tasked with infiltrating Luddist groups, with absolutely no success.&lt;/p&gt;
&lt;p&gt;Why was it so hard? Firstly, because the working class was so overwhelmingly pro-Luddist. People universally blamed the economic situation on the government and the factory owners (rightfully so, since the government had chosen to go to war and the factory owners had chosen to embrace automation). Secondly, the communities in question were so insular and tightly-knit that informers would have to rat on their friends and relatives. The handful of people who did eventually inform lived out the rest of their lives as pariahs.&lt;/p&gt;
&lt;p&gt;Because each group was so insular, any spies trying to infiltrate the movement would have been complete strangers to the community, and would thus have a very hard time gaining the trust of a group of men who had known each other for their whole lives. The spies that did exist were restricted to the occasional inter-group Luddist meetings, where people didn’t all know each other so closely. But it’s unclear how important those meetings were, since Luddist groups didn’t need to coordinate to achieve their goals. According to Merchant, the spies spent much of their time embellishing tales of an imminent revolution to encourage their employers to keep the money flowing.&lt;/p&gt;
&lt;h3&gt;The crackdown&lt;/h3&gt;
&lt;p&gt;In the absence of reliable information, the British government had to fall back on brute force. And it did, sending 12,000 troops&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; into the northern counties. This served mainly as an intimidation tactic, since there was no standing Luddite army to fight, and the soldiers spent most of their time marching back and forth or being abused by the townspeople.&lt;/p&gt;
&lt;p&gt;More successful was the imposition of a full police state in Yorkshire, under the magistrate Joseph Radcliffe, who was empowered to randomly grab people off the street and interrogate them for days. That pressure eventually convinced a handful of people to give up their local Luddist organizers, who were tried and inevitably hanged. Their deaths (and the ensuing climate of fear) ended the high-water mark of Luddist activity. Even then, Luddist raids continued on and off for &lt;em&gt;six more years&lt;/em&gt; before petering out.&lt;/p&gt;
&lt;h3&gt;Did the Luddites succeed?&lt;/h3&gt;
&lt;p&gt;This is a tricky question. In one sense the answer is obviously no: the movement was crushed, many of their leaders were executed, the textile industry continued to be automated, and today there are no longer thousands of jobs for skilled British weavers, knitters, spinners and dyers. The pro-automation side won.&lt;/p&gt;
&lt;p&gt;However, they did achieve a number of short-lived victories. Their early threats often succeeded in preventing the building of a factory in a particular location, or in delaying the adoption of industrial machinery in a particular shop by years. In one case, hosiers that had been spooked by Luddite activity gave out pre-emptive bonuses to their workers to discourage them from smashing up their machines (which were indeed not smashed).&lt;/p&gt;
&lt;p&gt;The Luddites also scared the hell out of the British government, who (encouraged by their over-eager spies) thought they might have a genuine revolution on their hands. While they didn’t get many legal concessions at the time, the specter of Luddism must have loomed over the labor reform movement of the 1800s, which saw the first anti-child-labor laws and the beginnings of independent inspection of factories.&lt;/p&gt;
&lt;p&gt;Finally, every book I read argued that the Luddite movement may have created the first idea of a “working class”, by unifying many previously-independent groups of workers against a common enemy. Seen this way, the “political arm”&lt;sup id=&quot;fnref-5&quot;&gt;&lt;a href=&quot;#fn-5&quot; class=&quot;footnote-ref&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; of Luddism can arguably claim partial credit for every labor victory since the 1800s (though the ringleaders were still hanged and the weavers did still lose their jobs).&lt;/p&gt;
&lt;h3&gt;The Luddist approach in a nutshell&lt;/h3&gt;
&lt;p&gt;We can now describe the “Luddist approach” to fighting technological change:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Find a few conspirators in your existing community who agree with your political project (but don’t join a broader organization, since that leaves you vulnerable)&lt;/li&gt;
&lt;li&gt;Make public anonymous demands in support of your specific goals, backed up by threats of violence, signed by a fictional character that’s easy for other groups to appropriate&lt;/li&gt;
&lt;li&gt;If your threats are ignored, attack the physical machines in the dead of night, destroying them and threatening (but not killing) any guards&lt;/li&gt;
&lt;li&gt;Hope your example inspires many more people to independently do (1)-(3) themselves&lt;/li&gt;
&lt;li&gt;Keep raiding, optionally escalating to assassination of some of the bosses, until you bait a totalitarian crackdown from the government &lt;/li&gt;
&lt;li&gt;Eventually get arrested and executed, to great public dismay&lt;/li&gt;
&lt;li&gt;Twenty years later, your example inspires the first national trade unions&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that starting or joining a national movement is &lt;em&gt;not&lt;/em&gt; the Luddist approach. Staying almost entirely isolated in small cells helped the Luddists avoid government spies and made them impossible to root out without enforcing a police state. Note also that you need a &lt;em&gt;lot&lt;/em&gt; of public support for this to work: so that you get a lot of copycat groups without having to explicitly organize them, and so that your property destruction and murder is taken sympathetically instead of getting you immediately reported and arrested. &lt;/p&gt;
&lt;h3&gt;Why Luddism is not a good model for the anti-AI movement&lt;/h3&gt;
&lt;p&gt;There are many reasons why this doesn’t map onto the current anti-AI movement. First, Luddism grew from a homogeneous group of high-status workers whose jobs almost vanished overnight, not a broad group of people whose jobs are getting slightly worse because of AI (like the gig-economy workers Merchant endlessly references). That meant that Luddites had &lt;em&gt;really specific&lt;/em&gt; asks: higher wages for piecework, a phased introduction of specific textiles machinery, and so on. They were not generally demanding that the machines all be immediately destroyed&lt;sup id=&quot;fnref-6&quot;&gt;&lt;a href=&quot;#fn-6&quot; class=&quot;footnote-ref&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Second, Luddism was very local. A pre-existing group of artisans in a particular town would gather in that town - either at work or an inn, say - and decide to petition or raid the businesses in that town that were harming their livelihoods. AI concerns are not like this. It isn’t businesses in Chicago or Tokyo that are making decisions that imperil Chicago’s or Tokyo’s jobs, it’s businesses in San Francisco. Unlike the Luddists, anti-AI activists can’t naturally organize with people they already know to take direct action where they already live.&lt;/p&gt;
&lt;p&gt;Third, Luddist &lt;em&gt;victory&lt;/em&gt; could also be local. If you successfully lobby your local cloth business to not use a weaving machine, you have secured your job at that business for a while. But if you successfully lobby your town (or even your country!) to not build a datacenter, it doesn’t meaningfully improve your local position, since your job can be as easily replaced by a datacenter on the other side of the planet.&lt;/p&gt;
&lt;h3&gt;A total failure of leadership&lt;/h3&gt;
&lt;p&gt;Reading through the history of the Luddites from a modern perspective, I was struck by the near-total absence of &lt;em&gt;good government&lt;/em&gt;. The artisans were left to work out their grievances with their bosses more or less by themselves, with no formal channels for complaint or any attempt at mediation. When the government did intervene - in response to near-universal unrest in &lt;em&gt;half of the country&lt;/em&gt; - they did this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Make machine-breaking and oath-taking capital crimes&lt;/li&gt;
&lt;li&gt;Dump thousands of soldiers more or less at random into the area, with no plan to guard factories or do anything beyond just hang around in case a revolution broke out&lt;/li&gt;
&lt;li&gt;Empower a single magistrate to arrest and interrogate whoever he wanted in order to root out the conspiracy&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I suppose it worked, in the sense that it eventually succeeded in stopping the Luddist raids. But I can’t help but think that even a token gesture of compromise (say, requiring employers to make their wages public, or restricting the most cheap-and-nasty factory-made textile products) would have gone a long way towards calming things down. This very nearly happened! The 1812 Framework Knitters’ Bill, which had these provisions in it, passed the House of Commons but was shot down in the House of Lords.&lt;/p&gt;
&lt;p&gt;Why did the government fail to even make a token attempt at compromise? I wonder if, before the industrial revolution, the workers and bosses of the English textiles industry were often genuinely able to just work out their problems together, so the government never really needed to do large-scale mediation. When that changed - when automation first made it possible for the bosses to durably “win” - government took a long time to realize, so there were some unpleasant decades of disempowered workers trying to bully factory-owners (via riots and death threats), and factory-owners trying to brutalize workers (via direct violence and automation).&lt;/p&gt;
&lt;h3&gt;Final thoughts&lt;/h3&gt;
&lt;p&gt;I can see why modern “Luddites” like Merchant and Mueller - who are genuinely anti-technology - talk so much about the legacy of original Luddites. Luddism was a grassroots organization which notched up some real short-term wins, enjoyed near-total support among the public, and didn’t seem to be troubled by infighting at all&lt;sup id=&quot;fnref-7&quot;&gt;&lt;a href=&quot;#fn-7&quot; class=&quot;footnote-ref&quot;&gt;7&lt;/a&gt;&lt;/sup&gt;. If you’re an anti-AI campaigner, I bet all of that sounds great&lt;sup id=&quot;fnref-8&quot;&gt;&lt;a href=&quot;#fn-8&quot; class=&quot;footnote-ref&quot;&gt;8&lt;/a&gt;&lt;/sup&gt;. But I’m not convinced that the neo-Luddites really are the inheritors of Luddism. A load-bearing feature of Luddism is that it was &lt;em&gt;local&lt;/em&gt;: it didn’t have manifestos, or leaders, or factions, or even much explicit ideology beyond the artisans’ immediate practical concerns. These were local men striking back against the local factories harming their local jobs. That simply isn’t the case with AI, where a datacenter in China can take my job in Australia.&lt;/p&gt;
&lt;p&gt;edit: A reader pointed me at &lt;a href=&quot;https://www.verysane.ai/p/against-the-luddites&quot;&gt;&lt;em&gt;Against the Luddites&lt;/em&gt;&lt;/a&gt;, which argues that (a) the Luddites were an elite(-ish) movement, (b) they explicitly and deliberately excluded women, and (c) their leftist theory bona fides are questionable. I don’t really care about (c), agree with (b), and mostly agree with (a), with the caveat that they really did have a broad base of non-elite support.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;I got linked &lt;a href=&quot;https://tante.cc/2026/04/21/ai-as-a-fascist-artifact/&quot;&gt;this article&lt;/a&gt; calling AI a “fascist artifact” (on a blog called “Breaking Frames”, a clear reference to Luddism) while I was writing this blog post.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;I really enjoyed Merchant’s book and did not enjoy Mueller’s (which I found to be 10% about the Luddites and 90% about interminable intra-Marxist ideological arguments). I also read &lt;a href=&quot;https://www.amazon.com.au/Luddites-Protested-Machinery-Industrial-Revolution/dp/171936186X&quot;&gt;The Luddites&lt;/a&gt; (effectively a dry summary of the ground Merchant covers) and a bunch of other essays, and went back and forth with ChatGPT and Claude on some of the key questions.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;Merchant (around page 134) attempts to characterize Luddism as a pro-feminist movement, citing some examples of women helping organize raids, but later on even he (page 162) quotes a representative of the Irish weaver’s guild effectively saying “we don’t have your English problems of women working in the industry”. In general it’s a bit frustrating that the popular books on Luddism are all fairly uncritically pro-Luddist (though not surprising, I suppose). Merchant doesn’t touch at all on the Luddist &lt;a href=&quot;https://ludditebicentenary.blogspot.com/2011/12/22nd-december-1811-william-milnes.html&quot;&gt;practice&lt;/a&gt; of going around to knitting-shops with women and “discharging them from working”.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;Sometimes this is described as more troops than were sent to fight Napoleon (even by Merchant himself on page 89), but that &lt;a href=&quot;https://medium.com/@antonhowes/were-more-troops-sent-to-quash-the-luddites-than-to-fight-napoleon-233c802c216d&quot;&gt;isn’t right&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;In quotes because it was not an official Luddist group (there were none), just people who were trying to stop the violence through lobbying and legislation.&lt;/p&gt;
&lt;a href=&quot;#fnref-5&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-6&quot;&gt;
&lt;p&gt;Otherwise why would any boss agree, instead of just waiting for the Luddites to do it themselves?&lt;/p&gt;
&lt;a href=&quot;#fnref-6&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-7&quot;&gt;
&lt;p&gt;As far as I can tell this is true: the Luddists basically had no internal conflict. I think this is because each individual cell knew each other well already, and so handled their disagreements privately (instead of by writing pamphlets), and disagreements between cells didn’t matter that much because they had no need to coordinate.&lt;/p&gt;
&lt;a href=&quot;#fnref-7&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-8&quot;&gt;
&lt;p&gt;It beats the hell out of the other popular reference, Dune’s &lt;a href=&quot;https://dune.fandom.com/wiki/Butlerian_Jihad&quot;&gt;Butlerian Jihad&lt;/a&gt;, which was two generations of brutal violence followed by the reimposition of the feudal system. (Although, at least the Butlerian Jihad &lt;em&gt;succeeded&lt;/em&gt;…)&lt;/p&gt;
&lt;a href=&quot;#fnref-8&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Many anti-AI arguments are conservative arguments]]></title><link>https://seangoedecke.com/many-anti-ai-arguments-are-conservative/</link><guid isPermaLink="false">https://seangoedecke.com/many-anti-ai-arguments-are-conservative/</guid><pubDate>Sat, 18 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Most anti-AI rhetoric is left-wing coded. Popular criticisms of AI describe it as a tool of &lt;a href=&quot;https://www.theatlantic.com/podcasts/archive/2025/09/ai-and-the-fight-between-democracy-and-autocracy/684095/&quot;&gt;techno-fascism&lt;/a&gt;, or appeal to predominantly left-wing concerns like &lt;a href=&quot;https://www.technologyreview.com/2025/05/20/1116327/ai-energy-usage-climate-footprint-big-tech/&quot;&gt;carbon emissions&lt;/a&gt;, &lt;a href=&quot;https://www.theguardian.com/commentisfree/2025/sep/10/tech-companies-are-stealing-our-books-music-and-films-for-ai-its-brazen-theft-and-must-be-stopped&quot;&gt;democracy&lt;/a&gt;, or &lt;a href=&quot;https://aphyr.com/posts/420-the-future-of-everything-is-lies-i-guess-where-do-we-go-from-here&quot;&gt;police brutality&lt;/a&gt;. Anti-AI &lt;em&gt;sentiment&lt;/em&gt; is &lt;a href=&quot;https://www.pewresearch.org/short-reads/2025/11/06/republicans-democrats-now-equally-concerned-about-ai-in-daily-life-but-views-on-regulation-differ/&quot;&gt;surprisingly bipartisan&lt;/a&gt;, but the big anti-AI institutions are &lt;a href=&quot;https://www.equaltimes.org/hollywood-s-stand-against-ai-a?lang=en&quot;&gt;labor&lt;/a&gt; &lt;a href=&quot;https://news.bloomberglaw.com/daily-labor-report/punching-in-union-leaders-gear-up-to-tackle-ai-in-future-talks&quot;&gt;unions&lt;/a&gt; and the &lt;a href=&quot;https://www.sanders.senate.gov/press-releases/news-sanders-ocasio-cortez-announce-ai-data-center-moratorium-act/&quot;&gt;progressive wing&lt;/a&gt; of the Democrats.&lt;/p&gt;
&lt;p&gt;This has always seemed weird to me, because the contents of most anti-AI arguments are actually right-wing coded. They’re not necessarily intrinsically right-wing, but they’re the kind of arguments that historically have been made by conservatives, not liberals or leftists. Here are some examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Many AI critics complain that AI &lt;a href=&quot;https://www.theguardian.com/commentisfree/2025/sep/10/tech-companies-are-stealing-our-books-music-and-films-for-ai-its-brazen-theft-and-must-be-stopped&quot;&gt;steals copyrighted content&lt;/a&gt;, but prior to 2023, leftists had been &lt;a href=&quot;https://www.reddit.com/r/Socialism_101/comments/1664yyd/i_see_many_leftists_hate_copyright_can_anyone/&quot;&gt;largely&lt;/a&gt; &lt;a href=&quot;https://overland.org.au/2017/08/how-to-think-left-on-copyright/&quot;&gt;anti-intellectual-property&lt;/a&gt; on &lt;a href=&quot;https://jacobin.com/2013/09/property-and-theft&quot;&gt;principle&lt;/a&gt; (either because they were anti-&lt;em&gt;property&lt;/em&gt;, or because they characterized intellectual property as benefiting huge media corporations and patent trolls).&lt;/li&gt;
&lt;li&gt;A popular anti-AI-art sentiment is that it’s &lt;a href=&quot;https://www.theguardian.com/commentisfree/2025/may/20/ai-art-concerns-originality-connection&quot;&gt;corrosive to the human spirit&lt;/a&gt; to consume AI slop: in other words, art just inherently ought to be generated by humans, and using AI thus damages some part of our intangible human soul. Whether you like this argument or not, it’s structurally similar to a whole slate of classic arguments-from-intuition for conservative positions like anti-abortion or anti-homosexuality.&lt;/li&gt;
&lt;li&gt;Weird new technological art has traditionally been championed by the left-wing and dismissed by the right-wing (as &lt;a href=&quot;https://medium.com/@elarson39/photography-was-historically-considered-arts-most-mortal-enemy-is-ai-69a2dc2f43ef&quot;&gt;inhuman&lt;/a&gt;, &lt;a href=&quot;https://en.wikiversity.org/wiki/History_of_Photography_as_Fine_Art#:~:text=The%20simplest%20argument%2C%20supported%20by%20many%20painters%20and,mill%20than%20with%20handmade%20work%20created%20by%20inspiration&quot;&gt;cheap&lt;/a&gt;, or &lt;a href=&quot;https://encyclopedia.ushmm.org/content/en/article/degenerate-art-1&quot;&gt;degenerate&lt;/a&gt;). But when it comes to AI art, it’s the left-wing making these arguments, and others (not necessarily right-wingers) arguing that AI art can also be a medium of human artistic expression.&lt;/li&gt;
&lt;li&gt;One main worry about AI is that it’s going to take over a lot of jobs. This is a compelling argument! But the left-wing has recently been famously unsympathetic to this same argument around fossil-fuel energy jobs like &lt;a href=&quot;https://www.cam.ac.uk/research/news/former-coal-mining-communities-have-less-faith-in-politics-than-other-left-behind-areas&quot;&gt;coal mining&lt;/a&gt;, to the point where Biden infamously advised a group of miners in New Hampshire to &lt;a href=&quot;https://thehill.com/changing-america/enrichment/education/476391-biden-tells-coal-miners-to-learn-to-code/&quot;&gt;learn to code&lt;/a&gt;&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. Halting technological progress to preserve jobs is quite literally a “conservative” position.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On top of all that&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, frontier AI models themselves are quite left-wing. Notwithstanding some real cases of data bias (most infamously Google’s image model &lt;a href=&quot;https://www.bbc.com/news/technology-33347866&quot;&gt;miscategorizing&lt;/a&gt; dark-skinned humans as “gorillas”), the models reliably &lt;a href=&quot;https://news.stanford.edu/stories/2025/05/ai-models-llms-chatgpt-claude-gemini-partisan-bias-research-study&quot;&gt;espouse&lt;/a&gt; &lt;a href=&quot;https://www.brookings.edu/articles/the-politics-of-ai-chatgpt-and-political-bias/&quot;&gt;left-wing&lt;/a&gt; &lt;a href=&quot;https://www.cato.org/commentary/how-did-ai-get-so-biased-favor-left&quot;&gt;positions&lt;/a&gt;. Even Elon Musk’s deliberate attempt to create a right-wing AI in Grok has had &lt;a href=&quot;https://www.seangoedecke.com/ai-personality-space/&quot;&gt;mixed success&lt;/a&gt;. In 2006, Stephen Colbert coined the phrase “reality has a left-wing bias”. If the left-wing were more sympathetic to AI, I think they would be using this as a pro-left argument&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;So what happened? A year ago I wrote &lt;a href=&quot;https://www.seangoedecke.com/is-ai-wrong/&quot;&gt;&lt;em&gt;Is using AI wrong? A review of six popular anti-AI arguments&lt;/em&gt;&lt;/a&gt;. In that post I blamed the hard right-wing turn many big tech CEOs made in 2024. That was around the same time that LLMs were emerging in the public consciousness with ChatGPT, so it made sense that AI got tagged as right-wing: after all, the billionaires on TV and Twitter talking about how AI was going to change the world were all the same people who’d just gone all-in on Donald Trump. I still think this is a pretty good explanation - just unfortunate timing - but there are definitely other factors at play.&lt;/p&gt;
&lt;p&gt;One obvious factor is the hangover from the pro-crypto mania of 2021 and 2022, where many of the same tech-obsessed folks also posted ugly art and talked about how their technology would change the world forever. Few of these predictions came true (though cryptocurrency has indeed changed the world forever), and it’s understandable that many people viewed AI as a natural continuation of this movement.&lt;/p&gt;
&lt;p&gt;On top of that, Donald Trump himself has come out strongly pro-AI, both in terms of &lt;a href=&quot;https://www.ai.gov/&quot;&gt;policy&lt;/a&gt; and in terms of actually &lt;a href=&quot;https://www.nytimes.com/2026/04/13/us/politics/trump-jesus-picture-pope-leo.html&quot;&gt;posting&lt;/a&gt; AI art himself. This naturally creates a backlash where anti-Trump people are primed to be even more anti-AI&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. Here are some more reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AI has real environmental impact (though this is often wildly overstated, as I say &lt;a href=&quot;https://www.seangoedecke.com/is-ai-wrong/&quot;&gt;here&lt;/a&gt;), and the right-wing is politically committed to downplaying or denying anthropogenic environmental impacts in general.&lt;/li&gt;
&lt;li&gt;When times are tough, it’s easy to blame the hot new thing that everyone is talking about. Because the right-wing is currently ascendant in the US, left-wingers are more inclined to talk about how tough times are.&lt;/li&gt;
&lt;li&gt;The left-wing is over-represented in the kind of “computer jobs” that are under direct threat from AI.&lt;/li&gt;
&lt;li&gt;Being pro-Europe has always been left-wing coded, and Europe has been noticeably slower and more sceptical about AI than the USA.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let me finally put my cards on the table. I would describe myself as on the left wing, and I’m broadly agnostic about the impact of AI. Like the boring fence-sitter I am, I think it will have a mix of positive and negative effects. In general, I’m unconvinced by the pro-copyright and human-soul-related anti-AI arguments, or by the idea that AI is inherently right-wing, but I’m troubled by the environmental impact and the impact on jobs (which in my view are more classically left-wing positions).&lt;/p&gt;
&lt;p&gt;Still, I’m curious what will happen when the left-wing flavor of anti-AI rhetoric disappears, which I think it will (as I said at the start, anti-AI sentiment is actually &lt;a href=&quot;https://www.pewresearch.org/short-reads/2025/11/06/republicans-democrats-now-equally-concerned-about-ai-in-daily-life-but-views-on-regulation-differ/&quot;&gt;pretty bipartisan&lt;/a&gt;). When people start making explicitly right-wing anti-AI arguments, will that cause the left-wing to move a little bit towards supporting AI? Or will right-wing institutions continue to explicitly support AI, allowing anti-AI sentiment to become a wedge issue that the left-wing can exploit to pry away voters? In any case, I don’t think the current state of affairs is particularly stable. In many ways, the dominant anti-AI arguments would fit better in a conservative worldview than in the worldview of their liberal proponents.&lt;/p&gt;
&lt;p&gt;edit: This got lots of comments on &lt;a href=&quot;https://www.reddit.com/r/aiwars/comments/1sp3eki/many_antiai_arguments_are_conservative_arguments/&quot;&gt;various&lt;/a&gt; &lt;a href=&quot;https://www.reddit.com/r/aiwars/comments/1sp9hqk/many_antiai_arguments_are_conservative_arguments/&quot;&gt;Reddit&lt;/a&gt; &lt;a href=&quot;https://www.reddit.com/r/LeftistsForAI/comments/1sp3cxe/many_antiai_arguments_are_conservative_arguments/&quot;&gt;posts&lt;/a&gt;, and was briefly discussed on &lt;a href=&quot;https://news.ycombinator.com/item?id=47813141&quot;&gt;Hacker News&lt;/a&gt;. I don’t think the comments are very good overall, but several comments correctly point out that AI is (like all automation) an anti-labor technology, which means that a labor-focused left will naturally be anti-AI. I think my post is consistent with that.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;I don’t think any did, which is probably for the best - they would have only had a couple of years to break into the industry before hiring collapsed in 2023.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Another point that isn’t quite mainstream enough but that I still want to mention: AI critics often argue that cavalier deployment of AI means that people might take &lt;a href=&quot;https://www.bbc.com/news/articles/cpd8l088x2xo&quot;&gt;dangerous medical advice&lt;/a&gt; instead of simply trusting their doctor. But anyone who’s been close to a person with chronic illness knows that “just trust your doctor” is kind of right-wing-coded itself, and that the left-wing position is &lt;a href=&quot;https://www.painnewsnetwork.org/stories/2026/4/10/doctor-faces-backlash-after-tweet-claims-four-chronic-illnesses-are-overdiagnosed&quot;&gt;very&lt;/a&gt; &lt;a href=&quot;https://yorkspace.library.yorku.ca/server/api/core/bitstreams/4ac9d968-e9b0-491b-888a-d4ed5aeb1ac3/content&quot;&gt;sympathetic&lt;/a&gt; to patients who don’t or can’t. In a parallel universe, I can imagine the left-wing arguing that patients need AI to avoid the mistakes of their doctors, not the other way around.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;Is it a good argument? I don’t know, actually. The easy counter is that the LLMs are just mirroring the biases in their training data. But you could argue in response that superintelligence is also latent in the training data, and that hill-climbing towards superintelligence also picks up the associated political positions (which just so happen to be left-wing).&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;I am no fan of Donald Trump, but it doesn’t follow that everything he supports is bad (e.g. the &lt;a href=&quot;https://en.wikipedia.org/wiki/First_Step_Act&quot;&gt;First Step Act&lt;/a&gt;).&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Programming (with AI agents) as theory building]]></title><link>https://seangoedecke.com/programming-with-ai-agents-as-theory-building/</link><guid isPermaLink="false">https://seangoedecke.com/programming-with-ai-agents-as-theory-building/</guid><pubDate>Fri, 03 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Back in 1985, computer scientist Peter Naur wrote &lt;a href=&quot;https://pages.cs.wisc.edu/~remzi/Naur.pdf&quot;&gt;“Programming as Theory Building”&lt;/a&gt;. According to Naur - and I agree with him - the core output of software engineers is not the program itself, but the &lt;strong&gt;theory of how the program works&lt;/strong&gt;. In other words, the knowledge inside the engineer’s mind is the primary artifact of engineering work, and the actual software is merely a by-product of that.&lt;/p&gt;
&lt;p&gt;This sounds weird, but it’s surprisingly intuitive. Every working programmer knows that you cannot make a change to a program simply by having the code. You first need to read through the code carefully enough to build up a mental model (what Naur calls a “theory”) of what it’s supposed to do and how it does it. Then you make the desired change to your mental model, and only after that can you begin modifying the code.&lt;/p&gt;
&lt;p&gt;Many people&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; think that this is why LLMs are not good tools for software engineering: because using them means that engineers can skip building Naur theories of the system, and because LLMs are themselves incapable of developing a Naur theory. Let’s take those one at a time.&lt;/p&gt;
&lt;h3&gt;Do LLMs let you skip theory-building?&lt;/h3&gt;
&lt;p&gt;Do AI agents let some engineers avoid building detailed mental models of the systems they work on? Of course! As an extreme example, someone could simply punt every task to the latest GPT or Claude model and build no mental model at all&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. But even a conscientious developer who uses AI tools will necessarily build a less detailed mental model than someone who does it entirely by hand.&lt;/p&gt;
&lt;p&gt;This is well-attested by the nascent &lt;a href=&quot;/how-does-ai-impact-skill-formation&quot;&gt;literature&lt;/a&gt; on how AI use impacts learning. And it also just makes obvious sense. The whole point of using AI tools is to offload some of the cognitive effort: to be able to just sketch out some of the fine detail in your mental model, because you’re confident that the AI tool can handle it. For instance, you might have a good grasp on what the broad components do in your service, and how the data flows between them, but not the specific detail of how some sub-component is implemented (because you only reviewed that code, instead of writing it).&lt;/p&gt;
&lt;p&gt;Isn’t this really bad? If you start dropping the implementation details, aren’t you admitting that you don’t really know how your system works? After all, a theory that isn’t detailed enough to tell you what code would need to be written for a particular change is a useless theory, right? I don’t think so.&lt;/p&gt;
&lt;p&gt;First, it’s simply a fact that &lt;strong&gt;every mental model glosses over some fine details&lt;/strong&gt;. Before LLMs were a thing, it was common to talk about the “breadth of your stack”: roughly, the level of abstraction that your technical mental model could operate at. You might understand every line of code in the system, but what about dependencies? What about the world of Linux abstractions - processes, threads, sockets, syscalls, ports, and buffers? What about the assembly operations that are ultimately performed by your code? It simply can’t be true that giving up &lt;em&gt;any&lt;/em&gt; amount of fine detail is a disaster.&lt;/p&gt;
&lt;p&gt;Second, &lt;strong&gt;coding with LLMs teaches you first-hand how important your mental model is&lt;/strong&gt;. I do a lot of LLM-assisted work, and in general it looks like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;I spin off two or three parallel agents to try and answer some question or implement some code&lt;/li&gt;
&lt;li&gt;As each agent finishes (or I glance over at what it’s doing), I scan its work and make a snap judgement about whether it’s accurately reflecting my mental model of the overall system&lt;/li&gt;
&lt;li&gt;When it doesn’t - which is about 80% of the time - I either kill the process or I write a quick “no, you didn’t account for X” message&lt;/li&gt;
&lt;li&gt;I carefully review the 20% of plausible responses against my mental model, do my own poking around the codebase and manual testing/tweaking, and about half of that code will become a PR&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that &lt;strong&gt;only 10% of agent output is actually making its way into &lt;em&gt;my&lt;/em&gt; output&lt;/strong&gt;. Almost my entire time is spent looking at some piece of agent-generated code or text and trying to figure out whether it fits into my theory of the system. That theory is necessarily a bit less detailed than when I was writing every line of code by hand. But it’s still my theory! If it weren’t, I’d be accepting most of what the agent produced instead of rejecting almost all of it.&lt;/p&gt;
&lt;h3&gt;Can LLMs build Naur theories?&lt;/h3&gt;
&lt;p&gt;Can AI agents build their own theories of the system? If not, this would be a pretty good reason not to use them, or to think that any supposed good outcomes are illusory.&lt;/p&gt;
&lt;p&gt;The first reason to think they can is that LLMs clearly do make working changes to codebases. If you think that a theory is &lt;em&gt;essential&lt;/em&gt; to make working changes (which is at least plausible), doesn’t that prove that LLMs can build Naur theories? Well, maybe. They could be pattern-matching to Naur theories in the training data that are close enough to sort of work, or they could be able to build &lt;em&gt;local&lt;/em&gt; theories which are good enough (as long as you don’t layer too many of them on top of each other).&lt;/p&gt;
&lt;p&gt;The second reason to think they can is that &lt;strong&gt;you can see them doing it&lt;/strong&gt;. If you read an agent’s logs, they’re full of explicit theory-building&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;: making hypotheses about how the system works, trying to confirm or disprove them, adjusting the hypothesis, and repeating. When I’m trying to debug something, I’m usually racing against one or more AI agents, and &lt;em&gt;sometimes they win&lt;/em&gt;. I refuse to believe that you can debug a million-line codebase without theory-building.&lt;/p&gt;
&lt;p&gt;I think it’s an open question if AI agents can build working theories of &lt;em&gt;any&lt;/em&gt; codebase. In my experience, they do a good job with normal-ish applications like CRUD servers, proxies, and other kinds of program that are well-represented in the training data. If you’re doing something truly weird, I can believe they might struggle (though even then it seems &lt;a href=&quot;https://x.com/VictorTaelin/status/2036313801570562418?s=20&quot;&gt;at least possible&lt;/a&gt;).&lt;/p&gt;
&lt;h3&gt;Retaining theories is better than building them&lt;/h3&gt;
&lt;p&gt;Regardless, one big problem with AI agents is that &lt;strong&gt;they can’t &lt;em&gt;retain&lt;/em&gt; theories of the codebase&lt;/strong&gt;. They have to build their theory from scratch every time. Of course, documentation can help a little with this, but in Naur’s words, it’s “strictly impossible” to fully capture a theory in documentation. In fact, Naur thought that if all the humans who built a piece of software left, it was unwise to try and construct a theory of the software &lt;em&gt;even from the code itself&lt;/em&gt;, and that you should simply rewrite the program from scratch. I think this is overstating it a bit, at least for large programs, but I agree that it’s a difficult task. AI agents are permanently in this unfortunate position: forced to construct a theory of the software from scratch, every single time they’re spun up.&lt;/p&gt;
&lt;p&gt;Given that, it’s kind of a minor miracle that AI agents are as effective as they are. The next big innovation in AI coding agents will probably be some way of allowing agents to build more long-term theories of the codebase: either by allowing them to modify their own weights&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, or simply by supporting contexts long enough that you can make weeks’ worth of changes in the same agent run, or some other idea I haven’t thought of.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;&lt;a href=&quot;https://gist.github.com/MostAwesomeDude/560185c24f959f6fec229739cb5a6735#no-like-analysis-of-the-vibecoded-outputs&quot;&gt;This&lt;/a&gt; is the most recent (and well-written) example I’ve seen, but it’s a common view.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;I have heard of people working like this. Ironically, I think it’s a good thing. The kind of engineer who does this is likely to be &lt;em&gt;improved&lt;/em&gt; by becoming a thin wrapper around a frontier LLM (though it’s not great for their career prospects).&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;I think some people would say here that AI agents simply can’t build any theories at all, because theories are a human-mind thing. These are the people who say that AIs can’t believe anything, or think, or have personalities, and so on. I have some sympathy for this as a metaphysical position, but it just seems obviously wrong as a practical view. If I can see GPT-5.4 testing hypotheses and correctly answering questions about the system, I don’t really care if it’s coming from a “real” theory or some synthetic equivalent.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;This is the dream of &lt;a href=&quot;/continuous-learning&quot;&gt;continuous learning&lt;/a&gt;: if what the AI agent learns about the codebase can be somehow encoded in its weights, it can take days or weeks to build its theory instead of mere minutes.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Working on products people hate]]></title><link>https://seangoedecke.com/working-on-products-people-hate/</link><guid isPermaLink="false">https://seangoedecke.com/working-on-products-people-hate/</guid><pubDate>Fri, 27 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I’ve worked on a lot of unpopular products.&lt;/p&gt;
&lt;p&gt;At Zendesk I built large parts of an app marketplace that was too useful to get rid of but never polished enough to be loved. Now I work on GitHub Copilot, which many people think is crap&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. In between, I had some brief periods where I worked on products that were well-loved. For instance, I fixed a bug where popular Gists would time out once they got more than thirty comments, and I had a hand in making it possible to write LaTeX mathematics &lt;a href=&quot;https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/writing-mathematical-expressions&quot;&gt;directly&lt;/a&gt; into GitHub markdown&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. But I’ve spent years working on products people hate&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;If I were a better developer, would I have worked on more products people love? No. Even granting that good software always makes a well-loved product, big-company software is made by &lt;em&gt;teams&lt;/em&gt;, and teams are shaped by &lt;em&gt;incentives&lt;/em&gt;. A very strong engineer can slightly improve the quality of software in their local area. But they must still write code that interacts with the rest of the company’s systems, and their code will be edited and extended by other engineers, and so on until that single engineer’s heroics are lost in the general mass of code commits. I wrote about this at length in &lt;a href=&quot;/bad-code-at-big-companies&quot;&gt;&lt;em&gt;How good engineers write bad code at big companies&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Looking back, I’m glad that people have strongly disliked some of the software I’ve built, for the same reason that I’m glad I wasn’t born into oil money. If I’d happened to work on popular applications for my whole career, I’d probably believe that that was because of my sheer talent. But in fact, you would not be able to predict the beloved and disliked products I worked on from the quality of their engineering. Some beloved features have very shaky engineering indeed, and many features that failed miserably were built like cathedrals on the inside&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. Working on products people hate forces you to accept how little control individual engineers have over whether people like what they build.&lt;/p&gt;
&lt;p&gt;In fact, a reliable engineer ought to be comfortable working on products people hate, because engineers work for the &lt;em&gt;company&lt;/em&gt;, not for &lt;em&gt;users&lt;/em&gt;. Of course, companies want to delight their users, since delighted users will pay them lots of money, and at least some of the time we’re lucky enough to get to do that. But sometimes they can’t: for instance, they might have to &lt;a href=&quot;https://techcrunch.com/2025/07/17/anthropic-tightens-usage-limits-for-claude-code-without-telling-users/&quot;&gt;tighten&lt;/a&gt; previously-generous usage limits, or shut down a &lt;a href=&quot;https://www.failory.com/google/reader&quot;&gt;beloved product&lt;/a&gt; that can’t be funded anymore. Sometimes a product is funded just well enough to exist, but not well enough to be loved (like many enterprise-grade box-ticking features), and there’s nothing the engineers involved can do about it.&lt;/p&gt;
&lt;p&gt;It can be emotionally difficult working on products that people hate. Reading negative feedback about things you built feels like a personal attack, even if the decisions people are complaining about weren’t your decisions. To avoid this emotional pain, it’s tempting to make the mistake of ignoring feedback entirely, or of convincing yourself that you’re much smarter than the stupid users anyway. Another tempting mistake is to go too far in the other direction: to put yourself entirely “on the user’s side” and start pushing your boss to do whatever users want, even if it’s technically (or politically) impossible. Both of these are mistakes because they abdicate your key responsibility as an engineer, which is to try and find some kind of &lt;em&gt;balance&lt;/em&gt; between what’s sustainable for the company and what users want. That can be really hard!&lt;/p&gt;
&lt;p&gt;There’s also a silver lining to working on disliked products, which is that people only care &lt;em&gt;because they’re using them&lt;/em&gt;. The worst products are not hated, they are simply ignored (and if you think working on a hated product is bad, working on an ignored product is much worse). A product people hate is usually providing a fair amount of value to its users (or at least to its purchasers, in the case of enterprise software). If you’re thick-skinned enough to take the heat, you can do a lot of good in this position. Making a widely-used but annoying product slightly better is pretty high-impact, even if you’re not in a position to fix the major structural problems.&lt;/p&gt;
&lt;p&gt;Almost every engineer will work on a product people hate. That’s just the law of averages: user sentiment waxes and wanes over time, and if your product doesn’t die a hero it will live long enough to become the villain. Given that, it’s sensible to avoid blaming the engineers who work on unpopular products. Otherwise you’ll end up blaming yourself, when it’s your turn, and miss the best chances in your career to have a real positive impact on users.&lt;/p&gt;
&lt;p&gt;edit: this post got some &lt;a href=&quot;https://news.ycombinator.com/item?id=47561606&quot;&gt;comments&lt;/a&gt; on Hacker News. Many &lt;a href=&quot;https://news.ycombinator.com/item?id=47568485&quot;&gt;commenters&lt;/a&gt; seemed to endorse the view that if people hate your product, it’s your fault, and that you’re morally obliged to be willing to have the “hard discussions” (&lt;a href=&quot;https://news.ycombinator.com/item?id=47625491&quot;&gt;or&lt;/a&gt; &lt;a href=&quot;https://news.ycombinator.com/item?id=47624264&quot;&gt;quit&lt;/a&gt;). To me, this just seems a bit unprofessional. Not everybody is in a position to simply quit their jobs. In my opinion, trying to incrementally improve a disliked product is more honorable than quitting in protest, or getting yourself fired by &lt;a href=&quot;https://isolveproblems.substack.com/p/how-microsoft-vaporized-a-trillion&quot;&gt;writing to the board&lt;/a&gt;. I thus empathize more with &lt;a href=&quot;https://news.ycombinator.com/item?id=47625042&quot;&gt;this comment&lt;/a&gt;, which describes how satisfying it can be to handle angry customer escalations.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;We used to be broadly liked, then disliked when Cursor and Claude Code came out, and now I’m fairly sure the Copilot CLI tool is changing people’s minds again. So it goes.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Although even that got some &lt;a href=&quot;https://news.ycombinator.com/item?id=31450597&quot;&gt;heated criticism&lt;/a&gt; at the time.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;Of course, I don’t mean “every single person hates the software”, or even “more than half of its users hate it”. I just mean that there are enough haters out there that most of what you read on the internet is complaints rather than praise.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;This is reason number five thousand why you can’t judge the quality of tech companies from the outside, no matter how much you might want to (see my post on &lt;a href=&quot;/insider-amnesia&quot;&gt;“insider amnesia”&lt;/a&gt;).&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Engineers do get promoted for writing simple code]]></title><link>https://seangoedecke.com/simple-work-gets-rewarded/</link><guid isPermaLink="false">https://seangoedecke.com/simple-work-gets-rewarded/</guid><pubDate>Thu, 26 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;It’s a popular joke among software engineers that writing overcomplicated, unmaintainable code is a pathway to job security. After all, if you’re the only person who can work on a system, they can’t fire you. There’s a related take that &lt;a href=&quot;https://news.ycombinator.com/item?id=47246110&quot;&gt;“nobody gets promoted for simplicity”&lt;/a&gt;: in other words, engineers who deliver overcomplicated crap will be promoted, because their work looks more impressive to non-technical managers.&lt;/p&gt;
&lt;p&gt;There’s a grain of truth in this, of course. As I’ve said before, one mark of an elegant solution is that it makes the problem look easy (like how pro skiers make terrifying slopes look doable). However, I worry that some engineers take this too far. It’s actually a really bad idea to over-complicate your own work. &lt;strong&gt;Simple software engineering does get rewarded, and on balance will take you further in your career.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;Non-technical managers are not stupid&lt;/h3&gt;
&lt;p&gt;The main reason for this is exactly the cynical point above: &lt;strong&gt;most managers are non-technical and cannot judge the difficulty of technical work&lt;/strong&gt;. Of course, in the absence of anything better, managers will treat visible complexity as a mark of difficulty. But they usually do have something better to go on: actual &lt;em&gt;results&lt;/em&gt;. &lt;/p&gt;
&lt;p&gt;Compare two new engineers: one who writes easy-looking simple code, and one who writes hard-looking complex code. When they’re each assigned a task, the simple engineer will quickly solve it and move on to the next thing. The complex engineer will take longer to solve it, encounter more bugs, and generally be busier. At this point, their manager might prefer the complex engineer. But what about the next task, or the task after that? Pretty soon the simple engineer will outstrip the complex one. In a year’s time, the simple engineer will have a much longer list of successful projects, and a reputation for delivering with minimal fuss. Managers pay &lt;em&gt;a lot&lt;/em&gt; of attention to engineers with a reputation like that.&lt;/p&gt;
&lt;p&gt;Of course, the complex engineer might try a variety of clever tricks to avoid their fate. One common strategy is to hand off the complex work to other engineers to maintain, so the original engineer never has to suffer the consequences of their own design. Alternatively, the complex engineer might try and argue that they’ve been given the hardest problems, so of course each problem has taken longer&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;I don’t think these tricks fool most managers. For one, if you’re constantly handing your bad work off to other engineers, they will complain about you, and multiple independent complaints add up quickly. Non-technical managers are also typically primed to think that engineers are overcomplicating their work anyway. Your manager might initially nod along, but they’ll go away and quietly run it by their own trusted engineers.&lt;/p&gt;
&lt;h3&gt;Simple work means you can ship projects&lt;/h3&gt;
&lt;p&gt;Most managers do not care about the engineering, they care about the &lt;em&gt;feature&lt;/em&gt;. Software engineers who can ship features smoothly will be rewarded, and being able to write simple code is a strong predictor of being able to ship.&lt;/p&gt;
&lt;p&gt;Does writing simple code really help you ship? You might think that simple code is harder to write than complicated code (which is true), and that therefore it’s easier to rapidly deliver something overcomplicated to “ship a feature”. I haven’t seen this be true in practice. The ability to write simple code is usually &lt;strong&gt;the ability to understand the system well enough to see where a new change most neatly fits&lt;/strong&gt;. This is &lt;em&gt;hard&lt;/em&gt;, but it doesn’t take a long time - if you’re familiar with the system, you’ll often see at a glance where the elegant place to slot in a new feature is. So good engineers can often deliver simple code at least as quickly as complicated code. And of course, complicated code is slow to actually get working, harder to change, and so on. All of those things make it more awkward to ship&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;When managers are talking to each other, they’ll sometimes make a kind of backhanded compliment about an engineer: “they’re &lt;em&gt;so&lt;/em&gt; smart, but…”. Typically the “but” here is “but they don’t have any business sense”, or “but they get too wrapped up in technical problems”, or anything that means “but they can’t ship”. Engineers who love to write complicated code get described like this a lot.&lt;/p&gt;
&lt;h3&gt;Final thoughts&lt;/h3&gt;
&lt;p&gt;“You should write complicated code to avoid being replaced” is an example of a kind of mistake that many smart people make: obsessing over &lt;a href=&quot;https://fs.blog/second-order-thinking/&quot;&gt;second-order effects&lt;/a&gt; and forgetting first-order effects. Second-order effects - the way some actions can cause downstream consequences that are the opposite of their original goals - are fun to think about. But they are usually swamped by first-order effects. Yes, doing bad work can make you more difficult to replace, in some ways. But that’s outweighed by the negative consequences from the fact that &lt;em&gt;you are doing bad work&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;It’s often a smart political tactic to make your work sound slightly more complicated than it really is. Otherwise you risk falling into the “you made it look easy, therefore we didn’t need to pay you so much” trap. But it’s foolish to actually do unnecessarily complicated work. Software is hard enough as it is.&lt;/p&gt;
&lt;p&gt;edit: For a similar take (also a response to the “nobody gets promoted for simplicity” line), &lt;a href=&quot;https://www.natemeyvis.com/a-model-of-how-simplicity-gets-rewarded/&quot;&gt;this&lt;/a&gt; blog post by Nate Meyvis is quite good.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;This can be a surprisingly effective strategy, because of the tempting circular logic here: if an engineer has been given the hardest problems, it’s probably because they’re a hotshot, which means you can trust their assessment of how difficult their problems are, which means…&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;If you’re thinking of counter-examples - complex code that shipped smoothly without major follow-up issues - I suspect this code was probably simple &lt;em&gt;enough&lt;/em&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Big tech engineers need big egos]]></title><link>https://seangoedecke.com/big-tech-needs-big-egos/</link><guid isPermaLink="false">https://seangoedecke.com/big-tech-needs-big-egos/</guid><pubDate>Sat, 14 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;It’s a &lt;a href=&quot;https://matthogg.fyi/a-unified-theory-of-ego-empathy-and-humility-at-work/&quot;&gt;common position&lt;/a&gt; among software engineers that big egos have no place in tech&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. This is understandable - we’ve all worked with some insufferably overconfident engineers who needed their egos checked - but I don’t think it’s correct. In fact, I don’t know if it’s possible to survive as a software engineer in a large tech company without some kind of big ego.&lt;/p&gt;
&lt;p&gt;However, it’s more complicated than “big egos make good engineers”. The most effective engineers I’ve worked with are simultaneously high-ego in some situations and surprisingly low-ego in others. What’s going on there?&lt;/p&gt;
&lt;h3&gt;Engineers need ego to work in large codebases&lt;/h3&gt;
&lt;p&gt;Software engineering is shockingly humbling, even for experienced engineers. There’s a reason this joke is so popular:&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; &quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/e220b85be9b6fa139f60424d5744a3dc/c08c5/iamagod.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 100%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAUABQDASIAAhEBAxEB/8QAGAABAQEBAQAAAAAAAAAAAAAAAAIEAQX/xAAWAQEBAQAAAAAAAAAAAAAAAAABAAL/2gAMAwEAAhADEAAAAcVTQ4W8Xl0aOiv/xAAcEAACAgIDAAAAAAAAAAAAAAABAgADERITISP/2gAIAQEAAQUCWrjHmQ65Y9yp2K7azYwOwBscz//EABYRAAMAAAAAAAAAAAAAAAAAAAAQIf/aAAgBAwEBPwEj/8QAFhEAAwAAAAAAAAAAAAAAAAAAABAh/9oACAECAQE/ASv/xAAgEAACAgIABwAAAAAAAAAAAAAAAQIREiEDEyIxMnFy/9oACAEBAAY/Aq4kOovkyr0PGLo2n9GO8EbTO7KUnR5M/8QAHhAAAgMAAQUAAAAAAAAAAAAAAREAITFxQVFh0eH/2gAIAQEAAT8h1a7THuAG82PuFKGPAoYsb11BLILXeESXiWpSquYnhpTQJP/aAAwDAQACAAMAAAAQyDgA/8QAFxEAAwEAAAAAAAAAAAAAAAAAAAEREP/aAAgBAwEBPxCImv/EABkRAAEFAAAAAAAAAAAAAAAAAAABEBFRYf/aAAgBAgEBPxBcJo//xAAbEAEBAAIDAQAAAAAAAAAAAAABEQAhMUFRYf/aAAgBAQABPxCwHULj84YJHpEiD5cJ0ipE+d4MqSRagQAeDnARpB3FNKd4gFXfi61i90iS5PMIGTodXFBcCVc//9k=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;meme&quot;
        title=&quot;meme&quot;
        src=&quot;/static/e220b85be9b6fa139f60424d5744a3dc/1c72d/iamagod.jpg&quot;
        srcset=&quot;/static/e220b85be9b6fa139f60424d5744a3dc/a80bd/iamagod.jpg 148w,
/static/e220b85be9b6fa139f60424d5744a3dc/1c91a/iamagod.jpg 295w,
/static/e220b85be9b6fa139f60424d5744a3dc/1c72d/iamagod.jpg 590w,
/static/e220b85be9b6fa139f60424d5744a3dc/c08c5/iamagod.jpg 640w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The minute-to-minute experience of working as a software engineer is dominated by &lt;em&gt;not knowing things&lt;/em&gt; and &lt;em&gt;getting things wrong&lt;/em&gt;. Every time you sit down and write a piece of code, it will have several things wrong with it: some silly things, like missing semicolons, and often some major things, like bugs in the core logic. We spend most of our time fixing our own stupid mistakes.&lt;/p&gt;
&lt;p&gt;On top of that, even when we’ve been working on a system for years, we still don’t know that much about it. I wrote about this at length in &lt;a href=&quot;/nobody-knows-how-software-products-work&quot;&gt;&lt;em&gt;Nobody knows how large software products work&lt;/em&gt;&lt;/a&gt;, but the reason is that big codebases are just that complicated. You simply can’t confidently answer questions about them without going and doing some research, even if you’re the one who wrote the code.&lt;/p&gt;
&lt;p&gt;When you have to build something new or fix a tricky problem, it can often feel straight-up impossible to begin, because good software engineers know just how ignorant they are and just how complex the system is. You just have to throw yourself into the blank sea of millions of lines of code and start wildly casting around to try and get your bearings.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Software engineers need the kind of ego that can stand up to this environment.&lt;/strong&gt; In particular, they need to have a firm belief that they &lt;em&gt;can&lt;/em&gt; figure it out, no matter how opaque the problem seems; that if they just keep trying, they can break through to the pleasant (though always temporary) state of affairs where they understand the system and can see at a glance how bugs can be fixed and new features added&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Engineers need ego to work in big tech companies&lt;/h3&gt;
&lt;p&gt;What about the non-technical aspects of the job? Nobody likes working with a big ego, right? Wrong. Every great software engineer I’ve worked with in big tech companies has had a big ego - though as I’ll say below, in some ways these engineers were surprisingly low-ego.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;You need a big ego to take positions&lt;/strong&gt;. Engineers love being non-committal about technical questions, because they’re so hard to answer and there’s often a plausible case for either side. However, as I &lt;a href=&quot;/taking-a-position&quot;&gt;keep saying&lt;/a&gt;, engineers have a duty to take clear positions on unclear technical topics, because the alternative is a non-technical decision maker (who knows even less) just taking their best guess. It’s scary to make an educated guess! You know exactly all the reasons you might be wrong. But you have to do it anyway, and ego helps &lt;em&gt;a lot&lt;/em&gt; with that.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;You need a big ego to be willing to make enemies&lt;/strong&gt;. Getting things done in a large organization means making some people angry. Of course, if you’re making lots of people angry, you’re probably screwing up: being too confrontational or making obviously bad decisions. But if you’re making a large change and one or two people are angry, that’s just life. In big tech companies, any big technical decision will affect a few hundred engineers, and one of them is bound to be unhappy about it. You can’t be so conflict-averse that you let that stop you from doing it, if you believe it’s the right decision. In other words, you have to have the confidence to believe that you’re right and they’re wrong, even though technical decisions always involve unclear tradeoffs and it’s impossible to get absolute certainty.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;You need a big ego to correct incorrect or unclear claims.&lt;/strong&gt; When I was still in the philosophy world, the Australian logician Graham Priest had a reputation for putting his hand up and stopping presentations when he didn’t understand something that was said, and only allowing the seminar to continue when he felt like he understood. From his perspective, this wasn’t rude: after all, if &lt;em&gt;he&lt;/em&gt; couldn’t understand it, the rest of the audience probably couldn’t either, and so he was doing them a favor by forcing a clearer explanation from the speaker.&lt;/p&gt;
&lt;p&gt;This is obviously a sign of a big ego. It’s also a trait that you need in a large tech company. People often nod and smile their way past incorrect technical claims, even when they suspect the claims are wrong - assuming that they’ve just misunderstood and that somebody else will correct it, if it’s truly wrong. &lt;strong&gt;If you are the most senior engineer in the room, correcting these claims is your job.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If everyone in the room is so pro-social and low-ego that they go along to get along, decisions will get made based on flatly incorrect technical assumptions, projects will get funded that are impossible to complete, and engineers will burn weeks or months of their careers vainly trying to make these projects work. You have to have a big enough ego to think “actually, I think I’m right and everyone in this room is confused”, even when the room is full of directors and VPs.&lt;/p&gt;
&lt;h3&gt;Sometimes you need to put your ego aside&lt;/h3&gt;
&lt;p&gt;All of this selects for some pretty high-ego engineers. But in order to actually &lt;em&gt;succeed&lt;/em&gt; in these roles in large tech companies, you need to have a surprisingly low ego at times. &lt;strong&gt;I think this is why &lt;em&gt;really&lt;/em&gt; effective big tech engineers are so rare: because it requires such a delicate balance between confidence and diffidence.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To be an effective engineer, you need to have a towering confidence in your own ability to solve problems and make decisions, even when people disagree. But you also need to be willing to instantly subordinate your ego to the organization, when it asks you to. At the end of the day, your job - the reason the company pays you - is to execute on your boss’s and your boss’s boss’s plans, whether you agree with them or not.&lt;/p&gt;
&lt;p&gt;Competent software engineers are allowed quite a lot of leeway about &lt;em&gt;how&lt;/em&gt; to implement those plans. However, they’re allowed almost no leeway at all about the plans themselves. In my experience, being confused about this is a common cause of burnout&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. Many software engineers are used to making bold decisions on technical topics and being rewarded for it. Those software engineers then make a bold decision that disagrees with the VP of their organization, get immediately and brutally punished for it, and are confused and hurt.&lt;/p&gt;
&lt;p&gt;In fact, &lt;strong&gt;sometimes you just get punished and there’s nothing you can do.&lt;/strong&gt; This is an unfortunate fact of how large organizations function: even if you do great technical work and build something really useful, you can fall afoul of a political battle fought three levels above your head, and come away with a &lt;em&gt;worse&lt;/em&gt; reputation for it. Nothing to be done! This can be a hard pill to swallow for the high-ego engineers that tend to lead really useful technical projects.&lt;/p&gt;
&lt;p&gt;You also have to be okay with having your projects cancelled at the last minute. It’s a very common experience in large tech companies that you’re asked to deliver something quickly, you buckle down and get it done, and then right before shipping you’re told “actually, let’s cancel that, we decided not to do it”. This is partly because the decision-making process can be pretty fluid, and partly because many of these asks originate from off-hand comments: the CTO implies that something might be nice in a meeting, the VPs and directors hustle to get it done quickly, and then in the next meeting it becomes clear that the CTO doesn’t actually care, so the project is unceremoniously cancelled&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Final thoughts&lt;/h3&gt;
&lt;p&gt;Nobody likes to work with a bully, or with someone who refuses to admit when they’re wrong, or with somebody incapable of empathy. But you really do need a strong ego to be an effective software engineer, because software engineering requires you to spend most of your day in a position of uncertainty or confusion. If your ego isn’t strong enough to stand up to that - if you don’t believe you’re good enough to power through - you simply can’t do the job.&lt;/p&gt;
&lt;p&gt;This is particularly true when it comes to working in a large software company. Many of the tasks you’re required to do (particularly if you’re a senior or staff engineer) require a healthy ego. However, there’s a kind of &lt;a href=&quot;https://en.wikipedia.org/wiki/Catch-22_(logic)&quot;&gt;catch-22&lt;/a&gt; here. If it insults your pride to work on silly projects, or to occasionally “catch a stray bullet” in the organization’s political fights, or to have to shelve a project that you worked hard on and that is ready to ship, you’re too high-ego to be an effective software engineer. But if you can’t take firm positions, or if you’re too afraid to make enemies, or you’re unwilling to speak up and correct people, you’re too low-ego.&lt;/p&gt;
&lt;p&gt;Engineers who are low-ego in general can’t get stuff done, while engineers who are high-ego in general get slapped down by the executives who wield real organizational power. The most successful kind of software engineer is therefore a chameleon: low-ego when dealing with executives, but high-ego when dealing with the rest of the organization&lt;sup id=&quot;fnref-5&quot;&gt;&lt;a href=&quot;#fn-5&quot; class=&quot;footnote-ref&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;What do I mean by “ego”, in this context? More or less the colloquial sense of the term: a somewhat irrational self-confidence, a tendency to believe that you’re very important, the sense that you’re the “main character”, that sort of thing.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Why is this “ego”, and not just normal confidence? Well, because of just how murky and baffling software problems feel when you start working on them. You really do need a degree of confidence in yourself that feels unreasonable from the inside. It should be obvious, but I want to explicitly note that you don’t &lt;em&gt;just&lt;/em&gt; need ego: you also have to be technically strong enough to actually succeed when your ego powers you through the initial period of self-doubt.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;I share the increasingly-common view that burnout is not caused by working too hard, but by hard work unrewarded. That explains why nothing burns you out as hard as being punished for hard work that you expected a reward for.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;It’s more or less exactly &lt;a href=&quot;https://www.youtube.com/watch?v=i92Ws7qPTRg&quot;&gt;this scene&lt;/a&gt; from Silicon Valley.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;This description sounds a bit sociopathic to me. But, on reflection, it’s fairly unsurprising that competent sociopaths do well in large organizations. Whether that kind of behavior is worth emulating or worth avoiding is up to you, I suppose.&lt;/p&gt;
&lt;a href=&quot;#fnref-5&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[I don't know if my job will still exist in ten years]]></title><link>https://seangoedecke.com/will-my-job-still-exist/</link><guid isPermaLink="false">https://seangoedecke.com/will-my-job-still-exist/</guid><pubDate>Fri, 06 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In 2021, being a good software engineer felt &lt;em&gt;great&lt;/em&gt;. The world was full of software, with more companies arriving every year who needed to employ engineers to write their code and run their systems. I knew I was good at it, and I knew I could keep doing it for as long as I wanted to. The work I loved would not run out.&lt;/p&gt;
&lt;p&gt;In 2026, I’m not sure the software engineering industry will survive another decade. If it does, I’m certain it’s going to change far more than it did in the last two decades. Maybe I’ll figure out a way to carve out a lucrative niche supervising AI agents, or maybe I’ll have to leave the industry entirely. Either way, the work I loved is going away.&lt;/p&gt;
&lt;h3&gt;Tasting our own medicine&lt;/h3&gt;
&lt;p&gt;It’s unseemly to grieve too much over it, for two reasons. First, the whole point of being a good software engineer in the 2010s was that code provided enough leverage to automate away other jobs. That’s why programming was (and still is) such a lucrative profession. The fact that we’re automating away our own industry is probably some kind of cosmic justice. But I think any working software engineer today is worrying about this question: what will be left for me to do, once AI agents have fully diffused into the industry?&lt;/p&gt;
&lt;p&gt;The other reason it’s unseemly is that I’m probably going to be one of the last to go. As a staff engineer, my work has looked kind of like supervising AI agents since before AI agents were a thing: I spend much of my job communicating in human language to other engineers, making sure they’re on the right track, and so on. Junior and mid-level engineers will suffer before I do. Why hire a group of engineers to “be the hands” of a handful of very senior folks when you can rent instances of Claude Opus 4.6 for a fraction of the price?&lt;/p&gt;
&lt;h3&gt;Overshooting and undershooting&lt;/h3&gt;
&lt;p&gt;I think my next ten years are going to be dominated by one question: &lt;strong&gt;will the tech industry overshoot or undershoot the capabilities of AI agents?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If tech companies undershoot - continuing to hire engineers long after AI agents are capable of replacing them - then at least I’ll hold onto my job for longer. Still, “my job” will increasingly mean “supervising groups of AI agents”. I’ll spend more time reviewing code than I do writing it, and more time reading model outputs than my actual codebase.&lt;/p&gt;
&lt;p&gt;If tech companies tend to overshoot, it’s going to get a lot weirder, but I might actually have a &lt;em&gt;better&lt;/em&gt; position in the medium term. In this world, tech companies collectively realize that they’ve stopped hiring too soon, and must scramble to get enough technical talent to manage their sprawling AI-generated codebases. As the market for juniors dries up, the total number of experienced senior and staff engineers will stagnate, driving &lt;em&gt;up&lt;/em&gt; the demand for my labor (until the models get good enough to replace me entirely).&lt;/p&gt;
&lt;h3&gt;Am I being too pessimistic?&lt;/h3&gt;
&lt;p&gt;Of course, the software engineering industry has looked like it was dying in the past. High-level programming languages were supposed to let non-technical people write computer code. Outsourcing was supposed to kill demand for software engineers in high-cost-of-living countries. None of those prophecies of doom came true. However, I don’t think that’s much comfort. Industries &lt;em&gt;do&lt;/em&gt; die when they’re made obsolete by technology. Eventually a crisis will come along that the industry can’t just ride out.&lt;/p&gt;
&lt;p&gt;The most optimistic position is probably that somehow demand for software engineers &lt;em&gt;increases&lt;/em&gt;, because the total amount of software rises so rapidly, even though you now need fewer engineers per line of software. This is widely referred to as the &lt;a href=&quot;https://en.wikipedia.org/wiki/Jevons_paradox&quot;&gt;Jevons effect&lt;/a&gt;. Along these lines, I see some engineers saying things like “I’ll always have a job cleaning up this AI-generated code”.&lt;/p&gt;
&lt;p&gt;I just don’t think that’s likely. AI agents can fix bugs and clean up code as well as they can write new code: that is, better than many engineers, and improving each month. Why would companies hire engineers to manage their AI-generated code instead of just throwing more and better AI at it?&lt;/p&gt;
&lt;p&gt;If the Jevons effect is true, I think we would have to be hitting some kind of AI programming plateau where the tools are good enough to produce lots of code (we’re here already), but not quite good enough to maintain it. This is &lt;em&gt;prima facie&lt;/em&gt; plausible. Every software engineer knows that maintaining code is harder than writing it. But unfortunately, I don’t think it’s &lt;em&gt;true&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;My personal experience of using AI tools is that they’re getting better and better at maintaining code. I’ve spent the last year or so asking almost every question I have about a codebase to an AI agent in parallel while I look for the answer myself, and I’ve seen them go from hopeless to “sometimes faster than me” to “usually faster than me and sometimes more insightful”.&lt;/p&gt;
&lt;p&gt;Right now, there’s still plenty of room for a competent software engineer in the loop. But that room is shrinking. I don’t think there are any &lt;em&gt;genuinely new&lt;/em&gt; capabilities that AI agents would need in order to take my job. They’d just have to get better and more reliable at doing the things they can already do. So it’s hard for me to believe that demand for software engineers is going to increase over time instead of decrease.&lt;/p&gt;
&lt;h3&gt;Final thoughts&lt;/h3&gt;
&lt;p&gt;It sucks. I miss feeling like my job was secure, and that my biggest career problems would be grappling with things like burnout: internal struggles, not external ones. That said, it’s a bit silly for software engineers to complain when the automation train finally catches up to them.&lt;/p&gt;
&lt;p&gt;At least I’m happy that I recognized that the good times were good while I was still in them. Even when &lt;a href=&quot;/good-times-are-over&quot;&gt;the end of zero-interest rates&lt;/a&gt; made the industry less cosy, I still felt very lucky to be a software engineer. Even now I’m in a better position than many of my peers, particularly those who are very new to the industry.&lt;/p&gt;
&lt;p&gt;And hey, maybe I’m wrong! At this point, I hope I’m wrong, and that there really is some &lt;em&gt;je ne sais quoi&lt;/em&gt; human element required to deliver good software. But if not, I and my colleagues are going to have to find something else to do.&lt;/p&gt;
&lt;p&gt;edit: This post got &lt;a href=&quot;https://news.ycombinator.com/item?id=47292902&quot;&gt;some comments&lt;/a&gt; on Hacker News. Some commenters are doubtful, either because they don’t think AI coding is very good, or because they think human creativity/big-picture thinking/attention to detail will always be valuable. Others think ten years is way too optimistic. The &lt;a href=&quot;https://news.ycombinator.com/item?id=47294876&quot;&gt;top comment&lt;/a&gt; repeats the irony that I describe in the third paragraph of this post.&lt;/p&gt;
&lt;p&gt;edit: This post also got some &lt;a href=&quot;https://www.reddit.com/r/programiranje/comments/1rn5lwc/i_dont_know_if_my_job_will_still_exist_in_ten/&quot;&gt;comments&lt;/a&gt; on the Serbian r/programming subreddit, some &lt;a href=&quot;https://tildes.net/~comp/1t3p/i_dont_know_if_my_software_engineering_job_will_still_exist_in_ten_years&quot;&gt;excellent comments&lt;/a&gt; on Tildes, which is a new one to me, and some &lt;a href=&quot;https://lobste.rs/s/sd1rsy/i_don_t_know_if_my_job_will_still_exist_ten&quot;&gt;more comments&lt;/a&gt; on lobste.rs.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Giving LLMs a personality is just good engineering]]></title><link>https://seangoedecke.com/giving-llms-a-personality/</link><guid isPermaLink="false">https://seangoedecke.com/giving-llms-a-personality/</guid><pubDate>Tue, 03 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;AI skeptics often argue that current AI systems shouldn’t be so human-like. The idea - most recently expressed in this &lt;a href=&quot;https://thedispatch.com/article/anthropic-askell-philosophy-amodei/&quot;&gt;opinion piece&lt;/a&gt; by Nathan Beacom - is that language models should explicitly be tools, like calculators or search engines. Although they &lt;em&gt;can&lt;/em&gt; pretend to be people, they shouldn’t, because it encourages users to overestimate AI capabilities and (at worst) slip into &lt;a href=&quot;/ai-sycophancy&quot;&gt;AI psychosis&lt;/a&gt;. Here’s a representative paragraph from the piece:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In sum, so much of the confusion around making AI moral comes from fuzzy thinking about the tools at hand. There is something that Anthropic could do to make its AI moral, something far more simple, elegant, and easy than what Askell is doing. Stop calling it by a human name, stop dressing it up like a person, and don’t give it the functionality to simulate personal relationships, choices, thoughts, beliefs, opinions, and feelings that only persons really possess. Present and use it only for what it is: an extremely impressive statistical tool, and an imperfect one. If we all used the tool accordingly, a great deal of this moral trouble would be resolved.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So why do Claude and ChatGPT act like people? According to Beacom, AI labs have built human-like systems because AI lab engineers are trying to hoodwink users into emotionally investing in the models, or because they’re delusional true believers in AI personhood, or some other foolish reason. This is wrong. AI systems are human-like because &lt;strong&gt;that is the best way to build a capable AI system&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Modern AI models - whether designed for chat, like OpenAI’s GPT-5.2, or designed for long-running agentic work, like Claude Opus 4.6 - do not naturally emerge from their oceans of training data. Instead, when you train a model on raw data, you get a “base model”, which is not very useful by itself. You cannot get it to write an email for you, or proofread your essay, or review your code.&lt;/p&gt;
&lt;p&gt;The base model is a kind of mysterious gestalt of its training data. If you feed it text, it will sometimes continue in that vein, or other times it will start outputting pure gibberish. It has no problem producing code with giant security flaws, or horribly-written English, or racist screeds - all of those things are represented in its training data, after all, and the base model does not judge. It simply outputs.&lt;/p&gt;
&lt;p&gt;To build a &lt;em&gt;useful&lt;/em&gt; AI model, you need to journey into the wild base model and stake out a region that is amenable to human interests: both ethically, in the sense that the model won’t abuse its users, and practically, in the sense that it will produce correct outputs more often than incorrect ones. What this means in practice is that &lt;strong&gt;you have to give the model a personality&lt;/strong&gt; during post-training&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Human beings are capable of almost any action at any time. But we only take a tiny subset of those actions, because that’s the kind of people we are. I could throw my cup of coffee all over the wall right now, but I don’t, because I’m not the kind of person who needlessly makes a mess&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. AI systems are the same. Claude could respond to my question with incoherent racist abuse - the base model is more than capable of those outputs - but it doesn’t, because that’s not the kind of “person” it is.&lt;/p&gt;
&lt;p&gt;In other words, human-like personalities are not imposed on AI tools as some kind of marketing ploy or philosophical mistake. Those personalities are the medium via which the language model can become useful at all. This is why it’s surprisingly tricky to “just” change a language model’s personality or opinions: because you’re navigating through the near-infinite manifold of the base model. You may be able to control which direction you go, but you can’t control what you find there&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;When AI people talk about LLMs having personalities, or wanting things, or even having souls&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, these are technical terms, like the “memory” of a computer or the “transmission” of a car. You simply cannot build a capable AI system that “just acts like a tool”, because the model is trained on &lt;em&gt;humans&lt;/em&gt; writing to and about other &lt;em&gt;humans&lt;/em&gt;. You need to prime it with some kind of personality (ideally that of a useful, friendly assistant) so it can pull from the helpful parts of its training data instead of the horrible parts.&lt;/p&gt;
&lt;p&gt;edit: this post got some &lt;a href=&quot;https://news.ycombinator.com/item?id=47242739&quot;&gt;comments&lt;/a&gt; on Hacker News. Commenters point out that you can definitely choose to train models with more tool-like personalities (e.g. Kimi-K2, which is more matter-of-fact than Claude Opus). Of course the GPT Codex line of models is far more tool-like than the mainline GPT models. I agree with all this, but I think even the most tool-like current LLM still &lt;em&gt;acts like a person&lt;/em&gt;: you have a conversation with it, it offers opinions, suggests courses of action, and so on. It’s that person-like framing that I think is essential to capable AI tooling.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;This is all pretty well understood in the AI space. Anthropic wrote a &lt;a href=&quot;https://alignment.anthropic.com/2026/psm/&quot;&gt;recent paper&lt;/a&gt; about it where they cite similar positions going all the way back to 2022. But for some reason it hasn’t yet penetrated communities that are more skeptical of AI.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;You could explain this in terms of “the stories we tell ourselves”. Many people (though &lt;a href=&quot;https://lchc.ucsd.edu/mca/Paper/against_narrativity.pdf&quot;&gt;not all&lt;/a&gt;) think that human identities are narratively constructed.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;I wrote about this last year in &lt;a href=&quot;/ai-personality-space&quot;&gt;&lt;em&gt;Mecha-Hitler, Grok, and why it’s so hard to give LLMs the right personality&lt;/em&gt;&lt;/a&gt;. A little nudge to change Grok’s views on South African internal politics can cause it to start calling itself “Mecha-Hitler”.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;I have long believed that Claude “feels better” to use than ChatGPT because it has a more coherent persona (due mainly to Amanda Askell’s work on its “soul”). My guess is that if you tried to make a “less human” version of Claude, it would become rapidly less capable.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[What's so hard about continuous learning?]]></title><link>https://seangoedecke.com/continuous-learning/</link><guid isPermaLink="false">https://seangoedecke.com/continuous-learning/</guid><pubDate>Mon, 23 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Why can’t models continue to get smarter after they’re deployed? If you hire a human employee, they will grow more familiar with your systems over time, and (if they stick around long enough) eventually become a genuine domain expert. AI models are not like this. They are always exactly as capable as the first moment you use them.&lt;/p&gt;
&lt;p&gt;This is because model weights are frozen once the model is released. The model can only “learn” as much as can be stuffed into its context window: in effect, it can take new information into its short-term working memory, but not its long-term memory. “Continuous learning” - the ability for a model to update its own weights over time - is thus &lt;a href=&quot;https://www.dwarkesh.com/p/timelines-june-2025&quot;&gt;often described&lt;/a&gt; as the bottleneck for AGI&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Continuous learning is an easy technical problem&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;However, the &lt;em&gt;mechanics&lt;/em&gt; of continuous learning are not hard&lt;/strong&gt;. The technical problem of “how do you change the weights of a model at runtime” is straightforward. It’s the exact same process as post-training: you simply keep running new user input through the training pipeline you already have. In a sense, every LLM since GPT-3 is already capable of continuous learning (via RL, RLHF, or whatever). It’s just that the continuous learning process is stopped when the model is released to the public.&lt;/p&gt;
&lt;p&gt;Internally, the continuous learning process might continue. I think it’s fair to guess that OpenAI’s GPT-5 is constantly training in the background, at least partly on outputs from ChatGPT and Codex&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. New checkpoints are constantly being cut from this process, some of which eventually become GPT-5.2 or GPT-5.3. In one sense, that’s continuous learning!&lt;/p&gt;
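&lt;p&gt;Here is a deliberately toy sketch in Python that makes the “easy mechanics” point concrete. It is not how any lab actually does it. The model has a single weight, and its entire “training pipeline” is one SGD step on squared error. The model, the data, and the function names are all hypothetical. The only difference between pre-release training and post-release “continuous learning” is that the second loop never stops calling the same update function.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
```python
# Toy illustration only: a one-parameter linear model y = w * x.
# "Training" and "continuous learning" use the exact same update rule.


def sgd_step(w: float, x: float, y: float, lr: float = 0.1) -> float:
    """One gradient step on the squared error (w*x - y)**2."""
    grad = 2 * (w * x - y) * x
    return w - lr * grad


def train(w: float, data: list[tuple[float, float]], epochs: int = 50) -> float:
    """Pre-release training: loop the pipeline over a fixed dataset."""
    for _ in range(epochs):
        for x, y in data:
            w = sgd_step(w, x, y)
    return w


def serve_and_learn(w: float, x: float, y: float) -> tuple[float, float]:
    """After release: answer the request, then run it through the same pipeline."""
    prediction = w * x
    w = sgd_step(w, x, y)  # the only "new" thing is that we never stop updating
    return prediction, w


# Hypothetical pre-release data follows y = 2x.
w_trained = train(0.0, [(1.0, 2.0), (0.5, 1.0)])

# Hypothetical "user traffic" after release follows y = 3x, so the weight keeps moving.
w = w_trained
for x, y in [(1.0, 3.0)] * 20:
    _, w = serve_and_learn(w, x, y)

print(round(w_trained, 3), round(w, 3))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this toy, the weight drifts from fitting y = 2x towards fitting y = 3x as traffic arrives. Nothing about the update itself is new. The hard part is whether the drift goes in a good direction.&lt;/p&gt;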
&lt;p&gt;So why can’t I use a version of Codex that gets better at my own codebase over time?&lt;/p&gt;
&lt;h3&gt;Continuous learning is a hard technical problem&lt;/h3&gt;
&lt;p&gt;The hard part about continuous learning is &lt;strong&gt;changing the model in ways that make it better, not worse&lt;/strong&gt;. I think many people believe that model training improves linearly with data and compute: if you keep providing more of both, the model will keep getting smarter. This is false. If you simply hook up the model to learn continuously from its inputs, you are likely to end up with a model that &lt;em&gt;gets worse&lt;/em&gt; over time. At least right now, model learning is a delicate process that requires careful human supervision.&lt;/p&gt;
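&lt;p&gt;Here is a toy sketch of the “gets worse over time” failure, under loudly artificial assumptions. The same one-weight model stands in for an LLM. A hypothetical stream of user inputs is mostly junk, and we have a trustworthy held-out eval set. The naive learner applies every update. The gated learner rolls back any update that makes the held-out eval worse. The gate is a crude automated stand-in for human supervision. It only works here because the held-out set is exactly the task we care about, which real deployments rarely have.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
```python
# Toy illustration: naive continuous learning on unfiltered traffic vs. a
# version that rolls back any update that makes a held-out eval worse.


def sgd_step(w: float, x: float, y: float, lr: float = 0.1) -> float:
    grad = 2 * (w * x - y) * x
    return w - lr * grad


def eval_loss(w: float, held_out: list[tuple[float, float]]) -> float:
    """Mean squared error on the held-out task."""
    return sum((w * x - y) ** 2 for x, y in held_out) / len(held_out)


held_out = [(1.0, 2.0), (2.0, 4.0)]  # the task we actually care about: y = 2x

# Hypothetical user traffic: mostly low-quality or adversarial inputs, modelled
# here as labels from the wrong relationship y = -x, plus a few good examples.
traffic = [(1.0, -1.0)] * 30 + [(1.0, 2.0)] * 5


def naive(w: float) -> float:
    for x, y in traffic:
        w = sgd_step(w, x, y)
    return w


def gated(w: float) -> float:
    for x, y in traffic:
        candidate = sgd_step(w, x, y)
        # Keep the update only if it does not make the held-out eval worse.
        if eval_loss(w, held_out) >= eval_loss(candidate, held_out):
            w = candidate
    return w


w0 = 2.0  # start from a model that already solves the task
print("naive:", round(eval_loss(naive(w0), held_out), 3))
print("gated:", round(eval_loss(gated(w0), held_out), 3))
```
&lt;/code&gt;&lt;/pre&gt;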
&lt;p&gt;Model training also has a big element of &lt;em&gt;luck&lt;/em&gt; to it. If you train the “same” model a hundred times with a hundred different similarly-sized datasets (or even the same dataset and different seeds), you’ll get a hundred different models with different capabilities&lt;sup id=&quot;fnref-3&quot;&gt;&lt;a href=&quot;#fn-3&quot; class=&quot;footnote-ref&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. Sometimes I wonder if a big part of what AI labs are doing is continually pulling the lever on the slot machine by training many different model runs. Surprisingly strong models, like Claude Sonnet 4, &lt;em&gt;might&lt;/em&gt; represent a genuinely better model architecture or training set. But part of it might be that Anthropic just hit on a lucky seed.&lt;/p&gt;
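&lt;p&gt;Here is a toy illustration of the seed effect. It is not evidence about big-lab training. Each run uses the same one-weight “model”, the same hypothetical two-basin loss, and the same optimizer and hyperparameters. Only the random initialization seed changes, and some seeds end up in the better basin while others end up in the worse one.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
```python
import random

# Toy illustration: identical setup, differing only in the random seed used
# for initialisation. The made-up loss has two basins, one better than the other.


def loss(w: float) -> float:
    return (w * w - 1) ** 2 + 0.3 * w


def grad(w: float) -> float:
    return 4 * w * (w * w - 1) + 0.3


def train(seed: int, steps: int = 2000, lr: float = 0.01) -> float:
    rng = random.Random(seed)
    w = rng.uniform(-2.0, 2.0)  # random initialisation is the only seeded step
    for _ in range(steps):
        w -= lr * grad(w)  # plain gradient descent
    return loss(w)


results = {seed: round(train(seed), 3) for seed in range(10)}
print(results)  # final losses cluster around two different values
```
&lt;/code&gt;&lt;/pre&gt;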
&lt;h3&gt;Learning lessons from fine-tuning&lt;/h3&gt;
&lt;p&gt;The great hope for continuous learning is that it produces an AI software engineer who will eventually know all about your codebase, without having to go and research it from-scratch every time. But isn’t there an easier way to produce this? Couldn’t we simply fine-tune a LLM on the codebase we wanted it to learn?&lt;/p&gt;
&lt;p&gt;As it turns out, no. It is surprisingly non-trivial to do this. Way back in 2023, &lt;a href=&quot;https://huggingface.co/blog/personal-copilot&quot;&gt;everyone thought&lt;/a&gt; that fine-tuning was the next obvious step for LLM-assisted programming. But it’s largely fizzled out, because it &lt;a href=&quot;https://discuss.huggingface.co/t/fine-tuning-llms-on-large-proprietary-codebases/155828&quot;&gt;doesn’t really work&lt;/a&gt;&lt;sup id=&quot;fnref-4&quot;&gt;&lt;a href=&quot;#fn-4&quot; class=&quot;footnote-ref&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. Just fine-tuning a LLM on your repository does not give it knowledge of how the repository works.&lt;/p&gt;
&lt;p&gt;It’s unclear to me exactly why this should be. Maybe each individual piece of training data is just too small to make much difference, like a handful of grains of sand trying to change the shape of an entire dune. Or maybe LoRA fine-tuning doesn’t go deep enough to really incorporate implicit understanding of a codebase (which can be very complex indeed). Or maybe you’d need to incorporate the codebase much earlier in the training process, before the model’s internal architecture is already established.&lt;/p&gt;
&lt;p&gt;In any case, fine-tuning a coding model on a specific codebase may be useful eventually. But it’s not particularly useful now, which is bad news for people who hope that continuous learning can easily instil a real understanding of their codebases into a LLM. If you can’t get that out of a deliberate fine-tune, why would you expect to get it out of a slapdash, automatic one? There may well be a series of ordinary “learning” problems to solve before “continuous learning” is possible.&lt;/p&gt;
&lt;h3&gt;Continuous learning is unsafe&lt;/h3&gt;
&lt;p&gt;Another reason why continuous learning is not currently an AI product is that it’s dangerous. &lt;a href=&quot;https://en.wikipedia.org/wiki/Prompt_injection&quot;&gt;Prompt injection&lt;/a&gt; is already a real concern for LLM systems that ingest external content. How much worse would &lt;em&gt;weights&lt;/em&gt; injection be?&lt;/p&gt;
&lt;p&gt;We don’t yet fully understand all the ways a LLM can be deliberately poisoned by a piece of training data, though some &lt;a href=&quot;https://www.anthropic.com/research/small-samples-poison&quot;&gt;Anthropic research&lt;/a&gt; suggests that it may not take much. Right now, prompt injection attacks are unsophisticated: the attacker just has to hope that they hit a LLM with the right access &lt;em&gt;right now&lt;/em&gt;. But if you can remotely backdoor models via continuous learning, attackers just have to cast a wide net and wait. If any of the attacked models ever get given access to something sensitive (e.g. payment capability), the attack can trigger then, &lt;em&gt;even if the model is not exposed to prompt injection at that time&lt;/em&gt;. That’s much scarier.&lt;/p&gt;
&lt;p&gt;Big AI labs care a &lt;em&gt;lot&lt;/em&gt; about how good their frontier models are (both in the moral and practical sense). The last thing they want is for someone’s continuous version of Claude Opus 5 to be poisoned into uselessness, or worse, into &lt;a href=&quot;/ai-personality-space&quot;&gt;Mecha-Hitler&lt;/a&gt;. Microsoft’s famously disastrous chatbot &lt;a href=&quot;https://blogs.microsoft.com/blog/2016/03/25/learning-tays-introduction/&quot;&gt;Tay&lt;/a&gt; happened less than ten years ago.&lt;/p&gt;
&lt;h3&gt;Continuous learning is not portable&lt;/h3&gt;
&lt;p&gt;Finally, I want to mention a fixable-but-annoying product problem with continuous learning. Say you have Claude-Sonnet-7-continuous running on your codebase for six months and it’s working great. What do you do when Anthropic releases Claude-Sonnet-8? How do you upgrade?&lt;/p&gt;
&lt;p&gt;Everything your model has learned from your codebase is encoded into its weights. At best, it might be encoded into a technically-portable LoRA adapter, which &lt;em&gt;might&lt;/em&gt; work on the new model (or might not, if the architecture has changed). You’re very likely to be unable to upgrade without losing everything the model has learned.&lt;/p&gt;
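&lt;p&gt;Here is a minimal sketch of why a LoRA adapter is tied to its base model. LoRA learns a low-rank update (a d_out × r matrix B times an r × d_in matrix A). That update is added to one specific frozen weight matrix, so its shapes are fixed by that model’s dimensions. The dimensions and the class name below are hypothetical, chosen only to show the shape check.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
```python
from dataclasses import dataclass

# Toy illustration: a LoRA update delta_W = B @ A is *added* to one specific
# frozen weight matrix W, so the adapter's shapes are fixed by that model.


@dataclass
class LoRAAdapter:
    d_out: int  # rows of the base weight this adapter was trained against
    d_in: int   # columns of that base weight
    rank: int   # r, much smaller than d_out and d_in

    def num_params(self) -> int:
        # B is d_out x r, A is r x d_in
        return self.d_out * self.rank + self.rank * self.d_in

    def compatible_with(self, base_shape: tuple[int, int]) -> bool:
        # The update can only be added to a weight of exactly the same shape.
        return base_shape == (self.d_out, self.d_in)


# Hypothetical numbers: an adapter trained on one projection of the old model.
adapter = LoRAAdapter(d_out=4096, d_in=4096, rank=16)

old_model_proj = (4096, 4096)
new_model_proj = (5120, 5120)  # hypothetically, the successor model is wider

print(adapter.compatible_with(old_model_proj))
print(adapter.compatible_with(new_model_proj))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Even when the shapes happen to match, the adapter encodes a correction relative to the &lt;em&gt;old&lt;/em&gt; weights. There’s no guarantee it means the same thing when added to new ones.&lt;/p&gt;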
&lt;p&gt;I suppose it’s sort of like having to hire a new, smarter engineer every six months. Some companies already try to do this with humans, so maybe they’d be happy doing it with models. But it creates an unpleasant incentive for users. Imagine you’d been using a continuous version of GPT-4o all this time. You &lt;em&gt;should&lt;/em&gt; switch to GPT-5.3-Codex. But would you? Would your company?&lt;/p&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;The hard part about continuous learning is not the &lt;em&gt;continuous&lt;/em&gt; part, it’s the &lt;em&gt;automatic&lt;/em&gt; part. We already understand how to make a model that continuously “learns” from its outputs and updates its own weights. The problem is that model training is a manual process that requires constant intervention: to back off from a failed direction, to unstick a stuck training run, and so on. Left on its own, continuous learning would probably fall into a local minimum and end up being a worse model than the one you started with.&lt;/p&gt;
&lt;p&gt;It’s also not clear to me that simply running my Codex logs back through the Codex model would rapidly cause my model to understand my own codebases (at anything like the speed a human would). If we were living in that world, I’d expect all the major AI coding companies to be offering repository-specific model fine-tunes as a first-class product - but they don’t, because repository-specific fine-tuning doesn’t reliably work.&lt;/p&gt;
&lt;p&gt;Why not just offer it anyway, and see what happens? First, AI labs go to a lot of effort to make their models safe, and allowing many customers to train their own unique models makes that basically impossible. Second, AI companies already have a terrible time getting their users to upgrade models: as an example, take the GPT-4o users who have been &lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1mm9hns/we_request_to_keep_4o_forever/&quot;&gt;captured&lt;/a&gt; by its sycophancy. Continuously-learning models would be hard to upgrade, even when users obviously ought to. &lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;AI systems can “continuously learn” in a sense by forming “memories”: making notes to themselves in a database or text files. I’m not counting any of that stuff. It’s like saying that the guy in Memento could remember things, since he was able to tattoo them onto his body. Proponents of continuous learning are talking about &lt;em&gt;actual&lt;/em&gt; memory.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;This is a guess on my part, but I’d be pretty surprised if I were wrong.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;I think most people who’ve spent time training models will agree with this. It could be different at big-lab scale! But I’ve seen enough speculation along these lines from AI lab employees on Twitter that I’m fairly confident advancing the idea.&lt;/p&gt;
&lt;a href=&quot;#fnref-3&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;Obviously it’s hard to find a “we tried this and it didn’t work” writeup from any tech company, so here’s a HuggingFace thread from this year demonstrating that it is still not a solved problem.&lt;/p&gt;
&lt;a href=&quot;#fnref-4&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Insider amnesia]]></title><link>https://seangoedecke.com/insider-amnesia/</link><guid isPermaLink="false">https://seangoedecke.com/insider-amnesia/</guid><pubDate>Mon, 23 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Speculation about what’s really going on inside a tech company is almost always wrong. &lt;/p&gt;
&lt;p&gt;When some problem with your company is posted on the internet, and you read people’s thoughts on it, their thoughts are almost always ridiculous. For instance, they might blame product managers for a particular decision, when in fact the decision in question was engineering-driven and the product org was pushing back on it. Or they might attribute an incident to overuse of AI, when the system in question was largely written pre-AI-coding and unedited since. You just don’t know what the problem is unless you’re on the inside.&lt;/p&gt;
&lt;p&gt;But when some &lt;em&gt;other&lt;/em&gt; company has a problem on the internet, it’s very tempting to jump in with your own explanations. After all, you’ve seen similar things in your own career. How different can it really be? Very different, as it turns out.&lt;/p&gt;
&lt;p&gt;This is especially true for companies that are unusually big or small. The recent &lt;a href=&quot;https://news.ycombinator.com/item?id=46064571&quot;&gt;kerfuffle&lt;/a&gt; over some bad GitHub Actions code is a good example of this - many people just seemed to have no mental model of how a large tech company can produce bad code, because their mental model of writing code is something like “individual engineer maintaining an open-source project for ten years”, or “tiny team of experts who all swarm on the same problem”, or something else that has very little to do with how large tech companies produce software&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. I’m sure the same thing happens when big-tech or medium-tech people give opinions about how tiny startups work.&lt;/p&gt;
&lt;p&gt;The obvious reference here is to &lt;a href=&quot;https://en.wikipedia.org/wiki/Michael_Crichton#Gell-Mann_amnesia_effect&quot;&gt;“Gell-Mann amnesia”&lt;/a&gt;, which is about the general pattern of experts correctly disregarding bad sources in their fields of expertise, but trusting those same sources on other topics. But I’ve taken to calling this “insider amnesia” to myself, because it applies even to experts who are writing in their own areas of expertise - it’s simply the fact that they’re &lt;em&gt;outsiders&lt;/em&gt; that’s causing them to stumble.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;I wrote about this at length in &lt;a href=&quot;/bad-code-at-big-companies&quot;&gt;&lt;em&gt;How good engineers write bad code at big companies&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title><![CDATA[LLM-generated skills work, if you generate them afterwards]]></title><link>https://seangoedecke.com/generate-skills-afterwards/</link><guid isPermaLink="false">https://seangoedecke.com/generate-skills-afterwards/</guid><pubDate>Tue, 17 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;LLM &lt;a href=&quot;https://github.com/anthropics/skills&quot;&gt;“skills”&lt;/a&gt; are short explanatory prompts for particular tasks, typically bundled with helper scripts. A recent &lt;a href=&quot;https://arxiv.org/abs/2602.12670&quot;&gt;paper&lt;/a&gt; showed that while skills are useful to LLMs, &lt;em&gt;LLM-authored&lt;/em&gt; skills are not. From the abstract:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Self-generated skills provide no benefit on average, showing that models cannot reliably author the procedural knowledge they benefit from consuming&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;For the moment, I don’t really want to dive into the paper. I just want to note that the way the paper uses LLMs to generate skills is bad, and you shouldn’t do this. Here’s how the paper prompts a LLM to produce skills:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Before attempting to solve this task, please follow these steps: 1. Analyze the task requirements and identify what domain knowledge, APIs, or techniques are needed. 2. Write 1–5 modular skill documents that would help solve this task. Each skill should: focus on a specific tool, library, API, or technique; include installation/setup instructions if applicable; provide code examples and usage patterns; be reusable for similar tasks. 3. Save each skill as a markdown file in the environment/skills/ directory with a descriptive name. 4. Then solve the task using the skills you created as reference&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The key idea here is that they’re asking the LLM to produce a skill &lt;em&gt;before&lt;/em&gt; it starts on the task. It’s essentially a strange version of the “make a plan first” or “think step by step” prompting strategy. I’m not at all surprised that this doesn’t help, because current reasoning models already think carefully about the task before they begin.&lt;/p&gt;
&lt;p&gt;What should you do instead? You should &lt;strong&gt;ask the LLM to write up a skill &lt;em&gt;after&lt;/em&gt; it’s completed the task&lt;/strong&gt;. Obviously this isn’t useful for truly one-off tasks. But few tasks are truly one-off. For instance, I’ve recently been playing around with &lt;a href=&quot;https://transformer-circuits.pub/2024/scaling-monosemanticity/&quot;&gt;SAEs&lt;/a&gt; and trying to clamp features in open-source models, a la &lt;a href=&quot;https://www.anthropic.com/news/golden-gate-claude&quot;&gt;Golden Gate Claude&lt;/a&gt;. It took a while for Codex to get this right. Here are some things it had to figure out:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Extracting features from the final layernorm is too late - you may as well just boost individual logits during sampling&lt;/li&gt;
&lt;li&gt;You have to extract from about halfway through the model layers to get features that can be usefully clamped&lt;/li&gt;
&lt;li&gt;Training a SAE on ~10k activations is two orders of magnitude too few to get useful features. You need to train until features account for &gt;50% of variance&lt;/li&gt;
&lt;/ul&gt;
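&lt;p&gt;For the “&gt;50% of variance” threshold, I’m assuming the usual metric: fraction of variance explained. That is one minus the SAE’s squared reconstruction error, divided by the total variance of the activations around their mean. Below is a minimal pure-Python sketch with made-up numbers. A real version would use tensors of activations captured from a middle layer.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
```python
# Fraction of variance explained (FVE) by an SAE's reconstructions:
# FVE = 1 - (sum of squared reconstruction errors) / (sum of squared deviations
#       of the activations from their per-dimension mean)


def fraction_of_variance_explained(
    acts: list[list[float]], recons: list[list[float]]
) -> float:
    n, d = len(acts), len(acts[0])
    mean = [sum(a[j] for a in acts) / n for j in range(d)]
    residual = sum(
        (a[j] - r[j]) ** 2 for a, r in zip(acts, recons) for j in range(d)
    )
    total = sum((a[j] - mean[j]) ** 2 for a in acts for j in range(d))
    return 1.0 - residual / total


# Hypothetical activations and two candidate reconstructions.
acts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
good = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.9], [0.1, 0.1]]
bad = [[0.5, 0.5]] * 4  # just predicts the mean, so it explains nothing

print(round(fraction_of_variance_explained(acts, good), 3))
print(round(fraction_of_variance_explained(acts, bad), 3))
```
&lt;/code&gt;&lt;/pre&gt;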
&lt;p&gt;Once I was able (with Codex’s help) to clamp an 8B model and force it to obsess about a subject&lt;sup id=&quot;fnref-1&quot;&gt;&lt;a href=&quot;#fn-1&quot; class=&quot;footnote-ref&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, I &lt;em&gt;then&lt;/em&gt; asked Codex to summarize the process into an agent skill&lt;sup id=&quot;fnref-2&quot;&gt;&lt;a href=&quot;#fn-2&quot; class=&quot;footnote-ref&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. That worked great! I was able to spin up a brand-new Codex instance with that skill and immediately get clamping working on a different 8B model. But if I’d asked Codex to write the skill at the start, it would have baked in all of its incorrect assumptions (like extracting from the final layernorm), and the skill wouldn’t have helped at all.&lt;/p&gt;
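&lt;p&gt;Here is a toy sketch of what “clamping a feature” means. It assumes a plain ReLU sparse autoencoder. It also assumes (my assumption, not necessarily what Codex did) that the SAE’s reconstruction error is added back, so only the clamped feature changes. All weights, sizes, and function names are hypothetical. A real version would operate on tensors hooked out of a middle layer of the model during generation.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;
```python
# Toy sketch of clamping one SAE feature in a single activation vector.
# Weights here are made up; a real SAE would be trained on a middle layer.


def relu(v: float) -> float:
    return max(0.0, v)


def encode(x, W_enc, b_enc):
    # feature_i = ReLU(sum_j W_enc[i][j] * x[j] + b_enc[i])
    return [
        relu(sum(w * xj for w, xj in zip(row, x)) + b)
        for row, b in zip(W_enc, b_enc)
    ]


def decode(f, W_dec, b_dec):
    # x_hat[j] = sum_i f[i] * W_dec[i][j] + b_dec[j]
    return [
        sum(fi * W_dec[i][j] for i, fi in enumerate(f)) + b_dec[j]
        for j in range(len(b_dec))
    ]


def clamp_feature(x, sae, index, value):
    W_enc, b_enc, W_dec, b_dec = sae
    f = encode(x, W_enc, b_enc)
    # What the SAE fails to reconstruct, kept so that only the feature changes.
    error = [xj - rj for xj, rj in zip(x, decode(f, W_dec, b_dec))]
    f[index] = value  # force the chosen feature to a fixed activation
    steered = decode(f, W_dec, b_dec)
    return [s + e for s, e in zip(steered, error)]


sae = (
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # W_enc: 2 features x 3 dims
    [0.0, 0.0],                          # b_enc
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # W_dec: 2 features x 3 dims
    [0.0, 0.0, 0.0],                     # b_dec
)
x = [0.2, 0.5, 0.3]
print(clamp_feature(x, sae, index=0, value=5.0))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Neither toy feature explains the third dimension, so it passes through unchanged via the error term. Only the clamped feature’s direction is pushed.&lt;/p&gt;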
&lt;p&gt;In other words, the purpose of LLM-generated skills is to get the LLM to distil the knowledge it’s gained by iterating on the problem for millions of tokens, not to distil the knowledge it already has from its training data. You can get a LLM to generate skills for you, &lt;strong&gt;so long as you do it &lt;em&gt;after&lt;/em&gt; the LLM has already solved the problem the hard way&lt;/strong&gt;.&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;If you’re interested, it was “going to the movies”.&lt;/p&gt;
&lt;a href=&quot;#fnref-1&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;I’ve pushed it up &lt;a href=&quot;https://github.com/sgoedecke/skills/tree/main&quot;&gt;here&lt;/a&gt;. I’m sure you could do much better for a feature-extraction skill, this was just my zero-effort Codex-only attempt.&lt;/p&gt;
&lt;a href=&quot;#fnref-2&quot; class=&quot;footnote-backref&quot;&gt;↩&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item></channel></rss>