My anecdata is that it heavily depends on how much of the relevant code and instructions it can fit in the context window.
A small app, or a task that touches one clear smaller subsection of a larger codebase, or a refactor that applies the same pattern independently to many different spots in a large codebase - the coding agents do extremely well, better than the median engineer I think.
Basically "do something really hard on this one section of code, whose contract of how it interacts with other code is clear, documented, and respected" is an ideal case for these tools.
As soon as the codebase is large and there are gotchas, edge cases where one area of the code affects the other, or old requirements - things get treacherous. It will forget something was implemented somewhere else and write a duplicate version, it will hallucinate what the API shapes are, it will assume how a data field is used downstream based on its name and write something incorrect.
IMO you can still work around this and move net-faster, especially with good test coverage, but you certainly have to pay attention. Larger codebases also work better when you started them with CC from the beginning, because its older code is more likely to actually work the way the model expects/hallucinates.
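One cheap guardrail for the "assumes how a data field is used based on its name" failure mode (a minimal sketch with made-up names, not any real API): a contract test that pins the exact data shape, so an agent that hallucinates a field fails loudly instead of shipping something subtly wrong.

```python
# Hypothetical example: pin a data contract with a test so an
# agent-written change that guesses a field name breaks immediately.

def normalize_user(record: dict) -> dict:
    # The real upstream field here is "email_address"; an agent that
    # guesses "email" from context would hit a KeyError under test.
    return {"id": record["id"], "email": record["email_address"].lower()}

def test_normalize_user_contract():
    # A representative record, with the exact expected output asserted.
    sample = {"id": 7, "email_address": "Ada@Example.com", "extra": "ignored"}
    assert normalize_user(sample) == {"id": 7, "email": "ada@example.com"}

test_normalize_user_contract()
```

The point isn't the specific function; it's that tests encoding the real shapes are what let you move net-faster with an agent that sometimes guesses.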
In a well-designed system, you can point an agent at a module of that system and it's perfectly capable of dealing with it. Humans also have a limited context window, and divide and conquer is always how we've dealt with it. The same approach works for agents.
The consumer surplus is quite high. Even with the regressions in this postmortem, performance was above that of the models last fall, when I was gladly paying for my subscription and thought it was net saving me time.
That said, there is now much better competition with Codex, so there's only so much rope they have now.
Is this cash or compute? Elon has one of the world's biggest compute clusters spun up, and little inference demand to speak of.
Trading billions worth of idle compute, in exchange for a high-strike call option on the #3 player in the most-promising-vertical for AI, plus (presumably) some access to their data, starts to sound like not a bad trade. Especially if you're pre-committed to betting your entire rocket company on winning in AI, and you're currently in sixth or seventh place.
It's true he could write off xAI today and the company could still fetch a trillion-dollar valuation. But I was more referring to his stated intentions - between his stated plans, his actions in taking SpaceX from a profitable company to one spending basically all its revenue (plus a rumored large chunk of what's raised via its IPO) on AI, and his tendency to make bet-the-farm bets on Tesla, I think it's fair to say he's committed to betting all of SpaceX on xAI.
I heard he made a deal with a company to use his clusters. Is there good data on demand for Grok? Seems like relatively little chatter at least, in spite of tremendous investment.
He had a very close, decades-long friendship with the most notorious sex-trafficker-of-children-to-rich-creeps in modern history. And that infamous pedophile died in a federal prison under Trump's control, with a strange gap in the CCTV footage. And Trump's handling of the entire Epstein Files saga makes it clear that Trump is described extensively in those files and he desperately wants to conceal it. What could be in there that he would use the entire Justice Department to try and redact? Trump is shameless about things that are legal even if they're salacious (like sleeping with porn star Stormy Daniels), so you have to wonder: what could Jeffrey Epstein's good friend be trying to conceal?
Also, he owned the Miss Universe org (including Miss USA and Miss Teen USA) for decades, and he was known to walk into the dressing rooms of teen contestants as young as 15 while they were undressed. [0]
Also, he bragged about molesting women, and a court of law found that he sexually assaulted E Jean Carroll.
I haven't proven the case that Trump had sex with a minor, but there's way more than enough probable cause to believe it's more likely than not.
Imagine there's a camera continuously recording a cookie jar. A child eats all of the cookies and then deletes the footage from the time they ate the cookies. A parent returns to find their child covered in crumbs, loudly proclaiming they haven't eaten a cookie in years, actively interfering with the parent's investigation, and trying to distract from it by throwing a brick through the window of an Iranian family down the street.
Are any of the facts in this hypothetical "evidence"? With the knowledge of the truth (that the kid ate the cookies), it's clear these are all relevant pieces of evidence. If we take knowledge of the truth out of the equation, would these facts still be evidence? Unambiguously they would.
Definitionally both circumstantial and direct evidence are forms of evidence. No modifier is necessary.
And incidentally you can be convicted in a court of law purely on circumstantial evidence, and that's the place in society where we have the highest standard of proof. The evidence all being circumstantial is not a gotcha.
This isn't court. The evidence, such as it is, is all of the smoke which commonly motivates people to look for fire. The strongest and most comprehensive that I've seen is the argument that if Trump was not implicated in the Epstein files, he would be publishing them in free book form himself and forcing every media outlet to advertise it. Slight exaggeration, but I think truly only slight.
Not really relevant to the thread, but there are simple answers to the "eViDeNcE??" question. You may have already known this.
Has the availability of deepfake porn generation reduced the demand for deepfake porn featuring real people? When deepfake generators are capable of creating convincing imagery of flawless ideal fake humans, why do you suppose there’s so many real humans who report being non-consensual subjects of deepfake porn?
> Has the availability of deepfake porn generation reduced the demand for deepfake porn featuring real people?
yes
> When deepfake generators are capable of creating convincing imagery of flawless ideal fake humans, why do you suppose there’s so many real humans who report being non-consensual subjects of deepfake porn?
> Doesn't have to be. You can train it on normal pictures of children and nude images of adults.
You say this so casually, as though it were a normal thing to know, or as if a normal person would know it. Does that actually seem true where you live right now?
And how do you know that, anyway, Harsh? I mean, all those "unblocked" games you stole to give away and that you also put on Github, that's one thing. But this...
Come on, it's not hard to come up with this idea. And it's not even true; a model trained on clothed children and nude adults wouldn't know what children's genitals look like.
Yes, cost per successful task is rising - ie, we are all paying effectively more for AI.
And yet - Anthropic is still struggling to have enough capacity to serve demand - they are virtually sold out.
And yes, there are almost-as-good open models, on par with the closed models from 6 months ago (at worst), that are just a single OpenRouter API call away, and yet Anthropic is still selling out. So people are paying for the premium product anyway, for whatever reason - maybe the last bit of intelligence is worth it, maybe they like the harnesses/products around the models, maybe it's a brand/enterprise sales thing.
Put aside your feelings about the AI industry and imagine we are talking about thingamajigs. Prices for thingamajigs are going up. They are still selling out as fast as (or faster than) the company selling them can build factories. There are more cost-effective competitors already in the market, but thingamajigs are selling out anyway.
Would you, looking at the thingamajig industry, conclude the "jig is almost up"? That "the returns aren’t anywhere close to what investors expect" and that the impending IPO is all some desperate hail mary to save things before the collapse?
I don’t have feelings about the AI industry to put aside. I would not have sufficient information to assess whether thingamajigs are legitimately valuable or whether they are tulips. The only indicator I see is the last point about people using it in the short term despite having access to cost effective alternatives, which actually points to irrationality/FOMO more than legitimate value.
What we are looking at looks to me like it is rapidly becoming a commodity: it will become as essential to businesses as electricity and water, and it will be sold, marketed, and regulated more or less like a utility.
I agree, but also the model intelligence is quite spiky. There are areas of intelligence that I don't care at all about, except as proxies for general improvement (this includes knowledge-based benchmarks like Humanity's Last Exam, as well as proving math theorems, etc.). There are other areas of intelligence where I would gladly pay more, even 10X more, if it meant meaningful improvements: tool use, instruction following, judgement/"common sense", learning from experience, taste, etc. Some of these are seeing some progress; others seem inherently limited by the current LLM + chain-of-thought reasoning paradigm.
The models that we are paying to generate tokens are already not really just LLMs, as anyone studying language models ten years ago (or someone who describes them as "next token predictors") would understand them. Doing a bunch of reinforcement learning so that a model performs better at ssh'ing into my server and debugging my app is already realllly stretching the definition of "language pattern".
I think when we do get AI that can perform as well as a human at functionally all tasks, they will be multi-paradigm systems; some components will not resemble anything in any commercial system today, but one component will be recognizably LLM-like, and act as an essential communication layer.
Different users do seem to be encountering problems or not based on their behavior, but for a rapidly-evolving tool with new and unclear footguns, I wouldn't characterize that as user error.
For example, I don't pull in tons of third-party skills, preferring to have a small list of ones I write and update myself, but it's not at all obvious to me that pulling in a big list of third-party skills (like I know a lot of people do with superpowers, gstack, etc...) would cause quota or cache miss issues, and if that's causing problems, I'd call that more of a UX footgun than user error. Same with the 1M context window being a heavily-touted feature that's apparently not something you want to actually take advantage of...
I'm pretty optimistic that not only does this clean up a lot of vulns in old code, but applying this level of scrutiny becomes a mandatory part of the vibecoding-toolchain.
The biggest issue is legacy systems that are difficult to patch in practice.
I could see some of these corps now being able to issue more patches for old versions of software if they don't have to redirect their key devs onto prior code (which devs hate). As you say though, in practice it is hard to get those patches onto older devices.
I'm looking at you, Android phone makers with 18 months of updates.
Of course not, but there is infinitely more vulnerable software escaping Anthropic's scrutiny. And when AI-powered discovery becomes a necessity, that will lead to concentration of power in these kinds of companies.
Bruce Schneier made a comprehensive analysis of the pros and cons and the forces at play for adversaries and defenders [1].
I think it's safe to predict yet more money previously directed to us techies will find its way to the Anthropics of this world.
I imagine that some levels of patching would be improving as well, even as a separate endeavor. This is not to say that legacy systems could be completely rewritten.
If we have the source and it's easy to test, validate, and deploy an update - AI should make those easier to update.
I am thinking of situations where one of those isn't true - where testing a proposed update is expensive or complicated, or in systems that are hard to physically push updates to (think embedded systems), etc.
I feel like every new iteration of ways to find good content online - webrings, blogrolls, user upvoting/downvoting, giving everyone their own microblog to share interesting links, ML to learn your preferences from your behavior - worked really well at first, but then eroded significantly once people figured out how to game it.
The economic incentive is overwhelming to corrupt these signals, either directly (link sharing schemes, upvote rings, bots to like your content) or indirectly (shaping your content itself to have the shape of what will be promoted, regardless of its quality).
What you almost want is to use any of these ideas and hope for it to catch on widely enough in your small niche to be useful, but not so much that it becomes an optimization target.
Smolnet might be the answer. There really isn't a feasible mechanism for monetizing it. At worst, you could have some text ad embedded. No images. Minimal semantic markup (links, lists, quotes, code, generic text) in the case of gemini/gemtext.
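For reference, that whole gemtext vocabulary fits in a handful of line types (a made-up sample page, not any real capsule):

```
# A heading
Plain text needs no markup at all.
=> gemini://example.org/log.gmi A link, one per line
* A list item
> A quoted line
```

Plus a triple-backtick toggle for preformatted blocks, and that's the entire format - nowhere for trackers or ad units to hide.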
It's CNBC for Silicon Valley - a combination of good background noise, a broad survey of what people are talking about around the valley, and occasionally really great interviews.
They get a lot of guests to do interviews that they wouldn't do elsewhere, in part because they are unabashedly and unapologetically cheerleaders - pro-tech, pro-VC, pro-startup, pro-Big-Tech, etc. They don't grill you like an old-school journalist would about whatever the latest political controversy is, they ring a giant gong when their guest brings up a cool traction or fundraising number.
I would never use it as my only source of news for what's going on in tech, but with a lot of other tech journalism covering the downsides or problems with the industry, there is definitely a niche for them.
Just based on the number of very prominent guests they get to do interviews, they clearly have a lot of viewers in influential tech/vc circles, even if their total audience size isn’t huge.
That's true, but a lot of these people are also competitors. I can't imagine it'll be attractive going to the OpenAI media channel to talk about Gemini or Grok.