Reading the comments here drives home an industry-wide problem with these tools: people are just using the latest and most expensive models because they can, and because they’re cargo-culting. This is perhaps the first time software has had this kind of problem, and coders are not exactly demonstrating great discretionary decision-making.
I’ve been using Anthropic models exclusively for the last month on a large, realistic codebase, and I can count the number of times I needed to use Opus on one hand. Most of the time, Haiku is fine. About 10% of the time I splurge for Sonnet, and honestly, even some of those are unnecessary.
Folks are complaining because they lost unlimited access to a Ferrari, when a bicycle is fine for 95% of trips.
Haiku is most definitely not fine for the code bases that I work on. Sonnet is probably fine for most daily tasks, but Opus is still needed to find that pesky bug you've been chasing, or to thoroughly review your PR.
Most of the people using these models aren't skilled enough to make that determination. It seems rough to sell yourself as the thing that means people don't need to understand what they're doing, while also insisting they understand what they're doing well enough to select an appropriate model.
> Haiku is most definitely not fine for the code bases that I work on. Sonnet is probably fine for most daily tasks, but Opus is still needed to find that pesky bug you've been chasing, or to thoroughly review your PR.
Yeah, I hear that a lot, but it never comes with proof. Everyone is special.
I’m sure you’d find that Haiku is pretty functional if your usage were constrained.
I use models from Opus through Haiku and down to locally hosted Qwen models.
I don't know how anyone could believe that Haiku is useful for most engineering tasks. I often try to have it take on small tasks in the codebase with well defined boundaries to try to conserve my plan limits, but half the time I end up disappointed and feeling like I wasted more time than I should have.
The differences between the models are vast. I'm not even sure how you could conclude that Haiku is usable for most work, unless you have a very different type of workload than what I work on.
More information required. What are you working on? What languages? How do you define “small tasks”? What are “well-defined boundaries”? What is your workflow?
Most importantly, define your acceptance criteria. What do you mean by “disappointed” - this word is doing most of the heavy lifting in your anecdote. (i.e. I know plenty of coders who are “disappointed” by any code that they didn’t personally write, and become reflexively snobby about LLM code quality. Not saying that’s you, but I can’t rule it out, either.)
The models are not the same, but Haiku is definitely not useless, and without a lot more detail, I just ignore anecdotal statements with this sort of hyperbole. Just to illustrate the larger point, I find something wrong with nearly everything Haiku writes, but then again, I don’t expect perfection. I’d probably get a “better” end result for most individual runs with the more expensive models, but at vastly higher cost that doesn’t justify the difference.
> I don't think it's really helpful to tell people they're holding it wrong
I’m not saying that. If anything, it really doesn’t matter much what model you use, and it’s only a case of “you’re holding it wrong” in the sense that you have to use your brain to write code, and that if you outsource your thinking to a machine, that’s the fundamental mistake.
In other words, it’s a tool, not a magic wand. So yeah, you do have to understand how to use it, but in a fairly deterministic way, not in a mysterious woo-woo way.
It’s not snarky. It’s literally the argument people are making: I am special, my use case is exceptional, therefore I need to use the special tool, even if you don’t need to.
>> Yeah, I hear that a lot, but it never comes with proof. Everyone is special.
You were the one who made the claim that Haiku is fine most of the time. To any reasonable person, the burden of proof is on you. Maybe you should share some high level details about your codebase, like its stack, size, problem domain, and so on? Maybe they are so generic that Haiku indeed does fine for you.
AI should decide the level of model needed, and fall back if it fails.
It's mostly a UX problem. Why do I need to specify the model tier beforehand?
Many problems don't allow that decision to be made before implementation.
This is the approach of Auto in Cursor, and I've not been impressed with it at all. I think I'm always getting Composer, and while it's fast, it wastes my time. GLM 5.1 in OpenCode is far better and less expensive; it can do planning and implementation both very effectively. Opus is still the best, but GPT 5.4 (in Codex) is good enough too, and way more affordable.
This would require LLMs being good at knowing when they are doing a bad job, which they are still terrible at. With a good testing and verification harness set up, sure, then it could just go to a more powerful model if it can't make tests pass. But not a lot of usage is like this.
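The escalation scheme described above can be sketched in a few lines: try the cheapest tier first, and only fall back to a pricier model when a verification harness rejects the result. Everything here is hypothetical — the tier names and the `run_model` callback don't correspond to any real SDK.

```python
from typing import Callable

# Hypothetical model tiers, cheapest first; names are illustrative only.
TIERS = ["haiku", "sonnet", "opus"]

def solve_with_escalation(
    task: str,
    run_model: Callable[[str, str], str],   # (model, task) -> candidate result
    tests_pass: Callable[[str], bool],      # the verification harness
) -> tuple[str, str]:
    """Try each tier in order; return (model, result) for the first
    candidate the harness accepts. Raise if even the top tier fails."""
    for model in TIERS:
        candidate = run_model(model, task)
        if tests_pass(candidate):
            return model, candidate
    raise RuntimeError("all tiers failed verification")
```

The whole scheme stands or falls on `tests_pass`: without a trustworthy harness, the router has no signal that the cheap model did a bad job, which is exactly the objection above.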
That’s certainly an opinion. Not one I agree with, but sure, if you entirely outsource all of your thinking to the magic box, then you probably want the box to have the strongest possible magic.
Of course you don't NEED the better models, but figuring out what model you need can waste a lot of time and effort.
Even when a cheap model is capable of a task it needs a lot more guidance than a more expensive one.
They are also less reliable. You can waste a lot of time cleaning up after them.
Judging whether something is good enough is hard work and rerolling with a more expensive model is painful.
Judging the difficulty of a task ahead of time is very hard. Judging how good a model is for a given task is even harder, especially when models and harnesses keep changing all the time.
The real productivity boost LLMs provide is already modest and when you start tinkering with models it can easily evaporate.
I think it heavily depends on how you're using it. If you understand your codebase and you're using it like "build a function that does x in y file" then smaller/cheaper models are great. But if you're saying "hey build this relatively complex feature following the 30,000 foot view spec in this markdown doc" then Haiku doesn't work (unless your "complex feature" is just an api endpoint and some UI that consumes it).
I largely agree. But that goes back to my point (albeit with mixed metaphors): there are lots of people who are just hitting things with a jackhammer in lieu of understanding how to properly use a hammer.
I basically never just yolo large code changes, and use my taste and experience to guide the tools along. For this, Haiku is perfectly fine in nearly all circumstances.
Model selection for day to day tasks based on vibes is not very scientific. Micromanaging the model doesn't seem like a great idea when doing real professional work with professional goals/deadlines/pressures.
> Micromanaging the model doesn't seem like a great idea when doing real professional work with professional goals/deadlines/pressures.
Remember that it's not only the cost per token, but also speed. Some tasks are done faster with simpler/less-thinking models, so it might actually make sense to micromanage the model when you have deadlines.
It’s deeply ironic that the folks who want to outsource as much thought to the model as possible are saying that my stance - use your brain to decide the right tool for the job - is tantamount to “vibes”.
You are being deeply reductive, and that's against the spirit of Hacker News. The issue is that models are difficult to objectively benchmark. The benchmarks don't always align with real-world performance. It's not easy and clear-cut to determine which model will work best in a given situation; it boils down to loose experiences and anecdotes. Do you have objective criteria for model selection that you have tested to be effective with reproducible tests?
> people are just using the latest and most expensive models because they can, and because they’re cargo-culting. This is perhaps the first time that software has had this kind of problem, and coders are not exactly demonstrating great discretionary decision making.
> I’ve been using Anthropic models exclusively for the last month on a large, realistic codebase, and I can count the number of times I needed to use Opus on one hand. Most of the time, Haiku is fine. About 10% of the time I splurge for Sonnet, and honestly, even some of those are unnecessary.
You and I couldn't have more different experiences. Opus 4.7 on the max setting still gets lost and chokes on a lot of my tasks.
I switch to Sonnet for simpler tasks like refactoring where I can lay out all of the expectations in detail, but even with Opus 4.7 I can often go through my entire 5-hour credit limit just trying to get it to converge on a reasonable plan. This is in a medium size codebase.
For people putting together simple web apps, Sonnet with a mix of Haiku might be fine, but we have a long way to go with LLMs before even the SOTA models are trustworthy for complex tasks.
I don’t use Haiku for planning of big tasks, so we basically agree on that. But even just Sonnet 4.6, on a fairly large codebase, only truly goes into the weeds maybe 10% of the time for me. I also write pretty specific initial prompts, and have a good idea of how I want the code to work before I start prompting. For example, sometimes I will spend several hours writing a spec before even picking up the power tools.
I have never had the situation you describe, where Opus won’t come up with “a reasonable plan”, but your definition of “reasonable” might be very different than mine, and of course, running through your credit limit is an entirely tangential problem.
- If you pay for unlimited trips, will you choose the Ferrari or the old VW? Both are waiting outside your door, ready to go.
- Providers that let you choose models don't really price in much difference between lower-tier models. On my grandfathered Cursor plan I pay 1x request to use Composer 2 or 2x to use Opus 4.6. Until the pricing is more differentiated, so people can say "OK, yes, Opus is smarter, but paying 10x more when Haiku would do the same isn't worth it," it won't happen.
Agreed on both points. We’re dealing with a cost/benefit analysis, and to this point, coders have been subsidized, coerced…maybe even mandated into using the most expensive option as if it was a limitless resource. Clearly not true, and so of course we’re going to see nerfing of the tools over time.
Obviously we’re a long way away from being able to rationally evaluate whether the value of X tokens in model Y is better than model Z, let alone better in terms of developer cost, but that’s kind of where we need to get to, otherwise the model providers are selling magic beans rated in ineffable units of magicalness. The only rational behavior in such a world is to gorge yourself.
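One way to make that evaluation concrete is expected cost per *accepted* result: price per attempt divided by success rate, plus the human cost of reviewing each attempt. All the numbers below are made-up placeholders, not real model prices or success rates.

```python
def expected_cost(price_per_attempt: float, success_rate: float,
                  review_cost: float) -> float:
    """Expected total cost per accepted result, assuming independent
    retries: each attempt costs price + review, and on average you
    need 1/success_rate attempts."""
    attempts = 1.0 / success_rate
    return attempts * (price_per_attempt + review_cost)

# Placeholder figures: a cheap model at $0.05/attempt with 60% success
# vs. an expensive one at $0.50/attempt with 90% success, with a $1
# human review cost either way.
cheap = expected_cost(0.05, 0.60, 1.00)
pricey = expected_cost(0.50, 0.90, 1.00)
```

With these placeholder figures the pricier model actually comes out cheaper per accepted result, because the human review cost dominates; shrink the review cost and the cheap model wins. That sensitivity is exactly why the call is hard to make on vibes.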
> people are just using the latest and most expensive models because they can,
While I agree with the sentiment, I think that might have been initially driven by older models being nerfed and/or newer ones being better token/$ value. And there is this notion that the labs don't constrain the model in the first days after its release.
Claude Code doesn't have an option to use Opus 4.6 any more for me. It was great, but I guess now I have to use it half as much or upgrade my subscription again.
> I’ve been using Anthropic models exclusively for the last month on a large, realistic codebase, and I can count the number of times I needed to use Opus on one hand. Most of the time, Haiku is fine. About 10% of the time I splurge for Sonnet, and honestly, even some of those are unnecessary.
I mean at some point some people learn...
I was doing Opus for nasty stuff or otherwise at most planning and then using Sonnet to execute.
Buuuuut I'm dealing with a lot of nonstandard use cases and/or sloppy codebases.
Also, at work, Haiku isn't an enabled model.
But also, if I or my employer are paying for premium requests, then they should be served appropriately.
As it stands this announcement smells of "We know our pricing was predatory and here is the rug pull."
My other, lesser worry isn't that Opus 4.7 has a 7.5x multiplier; it's that the multiplier is quoted as an 'introductory' rate.
85% of my code tasking can be handled by either GLM or Sonnet. The truth of the matter is that most software isn't that complicated. Even more hilarious is that people were running Opus on their OpenClaw setups. I'm glad Anthropic kicked them to the curb.
Haiku is complete crap compared to Sonnet in GHCP. A basic task in Haiku takes 3 prompts with a lot of correction; 1 prompt in Sonnet. It isn't worth a third of the price if I have to fix it twice.