
Came to a similar conclusion after running a bunch of tests on the new tokenizer

It was on the higher end of Anthropic's range, closer to 30-40% more tokens.

https://www.claudecodecamp.com/p/i-measured-claude-4-7-s-new...


Yeah, that's the part that's unclear to me as well: whether our usage capacity is now going to run out faster.

The same work I've been doing all along has now used up a third of my week in one day on Max 20x.

So yes, for the same tasks, usage runs out faster (currently).


I'm running some experiments on this, but based on what I've seen in my own personal data, I don't think this is true:

"given that Opus 4.7 on Low thinking is strictly better than Opus 4.6 on Medium, etc., etc."

Opus 4.7 in general is more expensive for similar usage. Now, we can argue that it provides better performance all else being equal, but I haven't been able to see that.


Effort level is separate from tokenization. Tokenization impacts you the same regardless of the effort setting.

I find 5 thinking levels super confusing; I don't really get why they went from 3 to 5.


I think the new Qwen models are supposed to be good, based on some of the articles I've read.

Anthropic's pricing is all based on token usage:

https://platform.claude.com/docs/en/about-claude/pricing

So if you are generating more tokens, you are eating up your usage faster.
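As a rough sketch of why tokenizer inflation translates directly into cost (the per-million-token rates below are hypothetical placeholders, not Anthropic's actual prices):

```python
# Per-request cost under token-based pricing.
# Rates are hypothetical placeholders, NOT Anthropic's actual prices.
INPUT_RATE_PER_M = 15.00    # $ per million input tokens (assumed)
OUTPUT_RATE_PER_M = 75.00   # $ per million output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# If a new tokenizer inflates the same text by ~35%,
# cost for identical content scales by the same factor.
base = request_cost(10_000, 2_000)
inflated = request_cost(int(10_000 * 1.35), int(2_000 * 1.35))
print(round(inflated / base, 2))  # 1.35
```

The same multiplier applies to plan usage limits if those limits are metered in tokens rather than requests.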


Are you okay with paying more for your services without any perceived improvement in the service itself?

That's been a constant for my entire adult life.

Yeah, that's my biggest issue: I'm okay with paying 20-30% more, but what's the ROI? I don't see an equivalent improvement in performance. Anthropic hasn't published any data on what these improvements are, just some vague "better instruction following".

It's enshittifying real fast. They'll just keep releasing model after model, each more expensive than the last, with marginal gains, but touted as "the next thing". Evangelists will say they're afraid, it's the future, in 6 months it's all over. Anthropic will keep astroturfing on Reddit. CEOs will make even more outlandish claims.

You raised a good point: what's a good metric for LLM performance? There are all the benchmarks out there, but aren't they one-and-done, usually run at release? Who keeps checking the performance of these models afterward? At this point it's just by feel. People say models have been dumbed down, and that's it.

I think the actual future is open-source models. The problem is, they don't have the huge marketing budgets Anthropic or OpenAI do.


This is the most likely trajectory, I fear. It reminds me a lot of Oracle, where they rebrand and reskin products just to change pricing/marketing without adding anything.

Win 10, Win 11, all the recent macOS releases... could have been shipped as features, not new products.

The other thing is that most people don't really care about price per token, but about how much it will cost to successfully execute a task they want done.

It doesn't matter if a model is, say, 30% cheaper per token than another if I need to burn 2x more tokens to get the same acceptable result.


Isn't caveman a joke? Why would you use it for real work?

Yeah, Opus 4.7 feels a lot more verbose; I think they changed the system prompt and removed the instructions to be terse in its responses.
