Reading the comments here drives home an industry-wide problem with these tools: people are just using the latest and most expensive models because they can, and because they’re cargo-culting. This is perhaps the first time software has had this kind of problem, and coders are not exactly demonstrating great discretionary decision-making.
I’ve been using Anthropic models exclusively for the last month on a large, realistic codebase, and I can count the number of times I needed to use Opus on one hand. Most of the time, Haiku is fine. About 10% of the time I splurge for Sonnet, and honestly, even some of those are unnecessary.
Folks are complaining because they lost unlimited access to a Ferrari, when a bicycle is fine for 95% of trips.
Haiku is most definitely not fine for the code bases that I work on. Sonnet is probably fine for most daily tasks, but Opus is still needed to find that pesky bug you've been chasing, or to thoroughly review your PR.
Most of the people using these models aren't skilled enough to make that determination. It seems rough to sell yourself as the thing that means people don't need to understand what they're doing, while also insisting they understand what they're doing well enough to select an appropriate model.
> Haiku is most definitely not fine for the code bases that I work on. Sonnet is probably fine for most daily tasks, but Opus is still needed to find that pesky bug you've been chasing, or to thoroughly review your PR.
Yeah, I hear that a lot, but it never comes with proof. Everyone is special.
I’m sure you’d find that Haiku is pretty functional if your usage were constrained.
I use models from Opus through Haiku and down to locally hosted Qwen models.
I don't know how anyone could believe that Haiku is useful for most engineering tasks. I often try to have it take on small tasks in the codebase with well defined boundaries to try to conserve my plan limits, but half the time I end up disappointed and feeling like I wasted more time than I should have.
The differences between the models are vast. I'm not even sure how you could conclude that Haiku is usable for most work, unless you have a very different type of workload than what I work on.
More information required. What are you working on? What languages? How do you define “small tasks”? What are “well-defined boundaries”? What is your workflow?
Most importantly, define your acceptance criteria. What do you mean by “disappointed” - this word is doing most of the heavy lifting in your anecdote. (i.e. I know plenty of coders who are “disappointed” by any code that they didn’t personally write, and become reflexively snobby about LLM code quality. Not saying that’s you, but I can’t rule it out, either.)
The models are not the same, but Haiku is definitely not useless, and without a lot more detail, I just ignore anecdotal statements with this sort of hyperbole. Just to illustrate the larger point, I find something wrong with nearly everything Haiku writes, but then again, I don’t expect perfection. I’d probably get a “better” end result for most individual runs with the more expensive models, but at vastly higher cost that doesn’t justify the difference.
> I don't think it's really helpful to tell people they're holding it wrong
I’m not saying that. If anything, it really doesn’t matter much what model you use, and it’s only a case of “you’re holding it wrong” in the sense that you have to use your brain to write code, and that if you outsource your thinking to a machine, that’s the fundamental mistake.
In other words, it’s a tool, not a magic wand. So yeah, you do have to understand how to use it, but in a fairly deterministic way, not in a mysterious woo-woo way.
It’s not snarky. It’s literally the argument people are making: I am special, my use case is exceptional, therefore I need to use the special tool, even if you don’t need to.
>> Yeah, I hear that a lot, but it never comes with proof. Everyone is special.
You were the one who made the claim that Haiku is fine most of the time. To any reasonable person, the burden of proof is on you. Maybe you should share some high level details about your codebase, like its stack, size, problem domain, and so on? Maybe they are so generic that Haiku indeed does fine for you.
AI should decide the level of model needed, and fall back if it fails.
It's mostly a UX problem. Why do I need to specify the model tier beforehand?
Many problems don't allow that decision to be made before implementation.
This is the approach of Auto in Cursor, and I've not been impressed with it at all. I think I'm always getting Composer, and while it's fast, it wastes my time. GLM 5.1 in OpenCode is far better and less expensive; it can do planning and implementation both very effectively. Opus is still the best, but GPT 5.4 (in Codex) is good enough too, and way more affordable.
This would require LLMs being good at knowing when they are doing a bad job, which they are still terrible at. With a good testing and verification harness set up, sure, then it could just go to a more powerful model if it can't make tests pass. But not a lot of usage is like this.
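The escalation scheme described above can be sketched in a few lines: try the cheapest tier first, and only fall back to a pricier model when a verification harness rejects the result. Everything here is hypothetical — the tier names and the `run_model` callback don't correspond to any real SDK.

```python
from typing import Callable

# Hypothetical model tiers, cheapest first; names are illustrative only.
TIERS = ["haiku", "sonnet", "opus"]

def solve_with_escalation(
    task: str,
    run_model: Callable[[str, str], str],   # (model, task) -> candidate result
    tests_pass: Callable[[str], bool],      # the verification harness
) -> tuple[str, str]:
    """Try each tier in order; return (model, result) for the first
    candidate the harness accepts. Raise if even the top tier fails."""
    for model in TIERS:
        candidate = run_model(model, task)
        if tests_pass(candidate):
            return model, candidate
    raise RuntimeError("all tiers failed verification")
```

The whole scheme stands or falls on `tests_pass`: without a trustworthy harness, the router has no signal that the cheap model did a bad job, which is exactly the objection above.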
That’s certainly an opinion. Not one I agree with, but sure, if you entirely outsource all of your thinking to the magic box, then you probably want the box to have the strongest possible magic.
Of course you don't NEED the better models, but figuring out what model you need can waste a lot of time and effort.
Even when a cheap model is capable of a task it needs a lot more guidance than a more expensive one.
They are also less reliable. You can waste a lot of time cleaning up after them.
Judging whether something is good enough is hard work and rerolling with a more expensive model is painful.
Judging the difficulty of a task ahead of time is very hard. Judging how good a model is for a given task is even harder, especially when models and harnesses keep changing all the time.
The real productivity boost LLMs provide is already modest and when you start tinkering with models it can easily evaporate.
I think it heavily depends on how you're using it. If you understand your codebase and you're using it like "build a function that does x in y file" then smaller/cheaper models are great. But if you're saying "hey build this relatively complex feature following the 30,000 foot view spec in this markdown doc" then Haiku doesn't work (unless your "complex feature" is just an api endpoint and some UI that consumes it).
I largely agree. But that goes back to my point (albeit with mixed metaphors): there are lots of people who are just hitting things with a jackhammer in lieu of understanding how to properly use a hammer.
I basically never just yolo large code changes, and use my taste and experience to guide the tools along. For this, Haiku is perfectly fine in nearly all circumstances.
Model selection for day to day tasks based on vibes is not very scientific. Micromanaging the model doesn't seem like a great idea when doing real professional work with professional goals/deadlines/pressures.
> Micromanaging the model doesn't seem like a great idea when doing real professional work with professional goals/deadlines/pressures.
Remember that it's not only the cost per token, but also speed. Some tasks are done faster with simpler/less-thinking models, so it might actually make sense to micromanage the model when you have deadlines.
It’s deeply ironic that the folks who want to outsource as much thought to the model as possible are saying that my stance - use your brain to decide the right tool for the job - is tantamount to “vibes”.
You are being deeply reductive, and that's against the spirit of Hacker News. The issue is that models are difficult to objectively benchmark. The benchmarks don't always align with real-world performance. It's not easy and clear-cut to determine which model will work best in a given situation; it boils down to loose experiences and anecdotes. Do you have objective criteria for model selection that you have tested to be effective with reproducible tests?
> people are just using the latest and most expensive models because they can, and because they’re cargo-culting. This is perhaps the first time that software has had this kind of problem, and coders are not exactly demonstrating great discretionary decision making.
> I’ve been using Anthropic models exclusively for the last month on a large, realistic codebase, and I can count the number of times I needed to use Opus on one hand. Most of the time, Haiku is fine. About 10% of the time I splurge for Sonnet, and honestly, even some of those are unnecessary.
You and I couldn't have more different experiences. Opus 4.7 on the max setting still gets lost and chokes on a lot of my tasks.
I switch to Sonnet for simpler tasks like refactoring where I can lay out all of the expectations in detail, but even with Opus 4.7 I can often go through my entire 5-hour credit limit just trying to get it to converge on a reasonable plan. This is in a medium size codebase.
For people putting together simple web apps, Sonnet with a mix of Haiku might be fine, but we have a long way to go with LLMs before even the SOTA models are trustworthy for complex tasks.
I don’t use Haiku for planning of big tasks, so we basically agree on that. But even just Sonnet 4.6, on a fairly large codebase, only truly goes into the weeds maybe 10% of the time for me. I also write pretty specific initial prompts, and have a good idea of how I want the code to work before I start prompting. For example, sometimes I will spend several hours writing a spec before even picking up the power tools.
I have never had the situation you describe, where Opus won’t come up with “a reasonable plan”, but your definition of “reasonable” might be very different than mine, and of course, running through your credit limit is an entirely tangential problem.
- If you pay for unlimited trips, will you choose the Ferrari or the old VW? Both are waiting outside your door, ready to go.
- Providers that let you choose models don't really price in much difference between lower-tier models. On my grandfathered Cursor plan I pay 1x request to use Composer 2 or 2x to use Opus 4.6. Until the pricing is more differentiated, so people can say "OK, yes, Opus is smarter, but paying 10x more when Haiku would do the same isn't worth it," it won't happen.
Agreed on both points. We’re dealing with a cost/benefit analysis, and to this point, coders have been subsidized, coerced…maybe even mandated into using the most expensive option as if it was a limitless resource. Clearly not true, and so of course we’re going to see nerfing of the tools over time.
Obviously we’re a long way away from being able to rationally evaluate whether the value of X tokens in model Y is better than model Z, let alone better in terms of developer cost, but that’s kind of where we need to get to, otherwise the model providers are selling magic beans rated in ineffable units of magicalness. The only rational behavior in such a world is to gorge yourself.
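One way to make that evaluation concrete is expected cost per *accepted* result: price per attempt divided by success rate, plus the human cost of reviewing each attempt. All the numbers below are made-up placeholders, not real model prices or success rates.

```python
def expected_cost(price_per_attempt: float, success_rate: float,
                  review_cost: float) -> float:
    """Expected total cost per accepted result, assuming independent
    retries: each attempt costs price + review, and on average you
    need 1/success_rate attempts."""
    attempts = 1.0 / success_rate
    return attempts * (price_per_attempt + review_cost)

# Placeholder figures: a cheap model at $0.05/attempt with 60% success
# vs. an expensive one at $0.50/attempt with 90% success, with a $1
# human review cost either way.
cheap = expected_cost(0.05, 0.60, 1.00)
pricey = expected_cost(0.50, 0.90, 1.00)
```

With these placeholder figures the pricier model actually comes out cheaper per accepted result, because the human review cost dominates; shrink the review cost and the cheap model wins. That sensitivity is exactly why the call is hard to make on vibes.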
> people are just using the latest and most expensive models because they can,
While I agree with the sentiment, I think that might have been initially driven by older models being nerfed and/or newer ones being better token/$ value. And there is this notion that the labs don't constrain the model in the first days after its release.
Claude Code doesn't have an option to use Opus 4.6 any more for me. It was great, but I guess now I have to use it half as much or upgrade my subscription again.
> I’ve been using Anthropic models exclusively for the last month on a large, realistic codebase, and I can count the number of times I needed to use Opus on one hand. Most of the time, Haiku is fine. About 10% of the time I splurge for Sonnet, and honestly, even some of those are unnecessary.
I mean at some point some people learn...
I was doing Opus for nasty stuff or otherwise at most planning and then using Sonnet to execute.
Buuuuut I'm dealing with a lot of nonstandard use cases and/or sloppy codebases.
Also, at work, Haiku isn't an enabled model.
But also, if I or my employer are paying for premium requests, then they should be served appropriately.
As it stands this announcement smells of "We know our pricing was predatory and here is the rug pull."
My other, lesser worry isn't that Opus 4.7 has a 7.5x multiplier; it's that the multiplier is quoted as an 'introductory' rate.
85% of my code tasking can be handled by either GLM or Sonnet. The truth of the matter is that most software isn't that complicated. Even more hilarious is that people were running Opus on their OpenClaw setups. I'm glad Anthropic kicked them to the curb.
Haiku is complete crap compared to Sonnet in GHCP. A basic task in Haiku takes 3 prompts with a lot of correction; 1 prompt in Sonnet. It isn't worth a third of the price if I have to fix it twice.