Hacker News | alecco's comments

Related interesting find on Qwen.

"Qwen's base models live in a very exam-heavy basin - distinct from other base models like llama/gemma. Shown below are the embeddings from randomly sampled rollouts from ambiguous initial words like "The" and "A":"

https://xcancel.com/N8Programs/status/2044408755790508113


This makes a lot of my experience with Qwen make sense. I've watched all the benchmarks imply how close it should be to various GPT or Claude releases, but in my own use, chatting with it or trying to get it to do agentic tasks, it was nowhere near as smart as even GPT-3.5. Meanwhile Gemma 4 casually dropped and even the 4B models were performing better than Qwen 3.5 MoE in my chats. Benchmaxxing.

They don't have demand for the price it would require for inference.

They are definitely distilling it into a much smaller model that's ~98% as good, like everybody does.


Some people are speculating that Opus 4.7 is distilled from Mythos due to the new tokenizer (which would mean Opus 4.7 is a new base model, not just an improved Opus 4.6).

The new tokenizer is interesting, but it definitely is possible to adapt a base model to a new tokenizer without too much additional training, especially if you're distilling from a model that uses the new tokenizer. (see, e.g., https://openreview.net/pdf?id=DxKP2E0xK2).
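One common flavor of tokenizer adaptation (purely illustrative here, not necessarily what any lab actually did): initialize the new tokenizer's embedding table from the old one, copying embeddings for tokens that survive the vocabulary change and averaging the old-piece embeddings for tokens that don't. A minimal pure-Python sketch, with all names hypothetical:

```python
def transplant_embeddings(old_vocab, old_emb, new_vocab, old_tokenize):
    """Build an embedding table for a new vocabulary from an old one.

    old_vocab:    token string -> old id
    old_emb:      list of embedding vectors, indexed by old id
    new_vocab:    token string -> new id
    old_tokenize: str -> list of old ids (the old tokenizer)
    """
    dim = len(old_emb[0])
    new_emb = [[0.0] * dim for _ in range(len(new_vocab))]
    for tok, new_id in new_vocab.items():
        if tok in old_vocab:
            # Token exists in both vocabularies: copy its embedding.
            new_emb[new_id] = list(old_emb[old_vocab[tok]])
        else:
            # Brand-new token: average the embeddings of the old-tokenizer
            # pieces its string breaks into.
            piece_ids = old_tokenize(tok)
            for d in range(dim):
                new_emb[new_id][d] = sum(old_emb[i][d] for i in piece_ids) / len(piece_ids)
    return new_emb
```

After an initialization like this, a relatively small amount of further training (or distillation from a model already using the new tokenizer, as the linked paper discusses) can close the remaining gap.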

Not impossible, but you have to be at least a little bit mad to deploy tokenizer replacement surgery at this scale.

They also changed the image encoder, so I'm thinking "new base model". Whatever base that was powering 4.5/4.6 didn't last long then.


Yes, I was thinking that. But it could just as well be the other way around: using the pretrained 4.7 (1T?) to speed up Mythos (10T?) pretraining by ~70%.

It's just speculative decoding, but for training. If they did it at this scale it's quite an achievement, because training is very fragile when doing these kinds of tricks.


Reverse distillation. Using small models to bootstrap large models. Get richer signal early in the run when gradients are hectic, get the large model past the early training instability hell. Mad but it does work somewhat.
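One way such a bootstrap schedule could look (purely hypothetical numbers and names, not any lab's actual recipe): weight the small teacher's distillation loss heavily at the start of the big run, then anneal it away once the large model's gradients stabilize.

```python
def kd_weight(step, anneal_steps=10_000):
    """Weight on the small-teacher distillation term: 1.0 at step 0,
    linearly annealed to 0.0 by `anneal_steps` (illustrative schedule)."""
    return max(0.0, 1.0 - step / anneal_steps)

def combined_loss(ce_loss, kd_loss, step):
    """Blend the usual next-token cross-entropy with the distillation term,
    so the richer teacher signal dominates only during the unstable early
    phase of the large model's run."""
    w = kd_weight(step)
    return (1.0 - w) * ce_loss + w * kd_loss
```

The point of the anneal is that the small teacher is only a crutch: past the early instability it would cap the large model at the teacher's level, so its weight has to go to zero.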

Not really similar to speculative decoding?

I don't think that's what they've done here though. It's still black magic, I'm not sure if any lab does it for frontier runs, let alone 10T scale runs.


> They don't have demand for the price it would require for inference.

citation needed. I find it hard to believe; I think there are more than enough people willing to spend $100/Mtok for frontier capabilities to dedicate a couple racks or aisles.


Apple got it right with unified memory with wide bus. That's why Mac Minis are flying for local models. But they are 10x less powerful in AI TOPS. And you can't upgrade the memory.

I really wish the AMD and Intel boards would get replaced by competent people. They could do it in a very short time. Both have integrated GPUs with main memory. AMD and Intel have (or at least used to have) serious know-how in data buses and interconnects, respectively. But I don't see any of that happening.

ROCm? It can't even support decent attention kernels. It lacks a lot of features, and NVIDIA is adding more each year. Soon they will reach escape velocity and nobody will catch them for a decade. smh


Aren't mac minis flying for "local models" because people have no clue what they are doing?

All those people who bought them for openclaw just bought them because it was the trendy thing to do. None of those people are running local models on there.


> I really wish AMD and Intel boards get replaced by competent people.

Intel? Agreed. But AMD is making money hand over fist with enterprise AI stuff.

Right now, any effort that AMD or NVIDIA expend on the consumer sector is a waste of money that they could be spending making 10x more at the enterprise level on AI.


They aren't flying outside the US, or countries with similar salary levels.

Granted, I feel like NVIDIA GPU pricing is such that Mac minis will be way less than 10x cheaper if not already, so one might still get ahead purchasing a bulk order of Mac minis....

A 5090 will cost you about the same amount of money as a Mac Studio M3 Ultra with eight times the RAM.

It's pretty insane how overpriced NVIDIA hardware is.


The 256GB Mac Studio (the one with "eight times the RAM") is listed for ~$2000 more than current 5090 prices, plus an additional $1500 for the 80-core GPU variant. Only the "base" model with 96GB is a remotely similar price, at $3600-$4000.

And a 5090 has a little over 2x the memory bandwidth: ~1790GB/s vs ~820GB/s. It has significantly higher peak FLOPS, too.

Sure, if the goal is to get the "Cheapest single-device system with 256GB ram" it looks pretty good, but there's lots of other axes it falls down on. Great if you know you don't care about them, but not "Better In Every Way". Arguably, better in only a single way - but that single way may well be the one you need.

And the current 5090 price might be a transient peak: only three months ago they were closer to $2500, significantly less than half the $6000 base-spec 256GB Mac Studio, while the Mac Studio's price has stayed constant.


Yes but the 5090 can run games.

Running games on my loaded M4 Max is worse than on my 3090 despite the over-four-year generational gap.

Like, Pacific Drive will reach maybe 30fps at less than 1080p whereas the 3090 will run it better even in 4K.

That could just be CrossOver's issue with Unreal Engine games, but "just play different games" is not a solution I like.


It seems like general improvements in RAM efficiency, such as those used in Gemma 4, mean it's back to memory bandwidth as the bottleneck, and less about total available memory size. I'm also curious to see how much agent autonomy will reduce the need for low latency and shift the focus to throughput. That would make it easier to spread the model out over multiple smaller GPUs and use pipeline parallelism to keep them busy. It would also mean using RAM for market segmentation becomes less effective.
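For intuition on why bandwidth is the bottleneck, there's a standard rule of thumb: at batch size 1, each decoded token has to stream the entire weight set through memory once, so bandwidth divided by model size gives a rough tokens-per-second ceiling (ignoring KV cache and overlap; all numbers below are ballpark):

```python
def decode_tokens_per_sec(params_b, bytes_per_param, bandwidth_gbs):
    """Rough upper bound on single-stream decode speed: every token
    requires reading all weights once, so tok/s <= bandwidth / model size."""
    model_gb = params_b * bytes_per_param
    return bandwidth_gbs / model_gb

# Example: a 70B model at 4-bit (~0.5 bytes/param) is ~35GB of weights.
#   ~820 GB/s  (M3 Ultra class) -> ceiling of roughly 23 tok/s
#   ~1790 GB/s (5090 class)     -> ceiling of roughly 51 tok/s
#                                  (if the weights fit in VRAM at all)
```

This is also why batching and pipeline parallelism help throughput so much: the same weight read is amortized over many tokens in flight.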

But the 5090 can run Crysis

> I'm afraid the music may be slowly fading at this party, and the lights will soon be turned on. We may very well look back on the last couple years as the golden era of subsidized GenAI compute.

Indeed. Anthropic is just leading the pack, switching to juicy corporate users who are happy to pay thousands per month per dev, and leaving the fans behind. And now OpenAI is following suit. They significantly lowered the limits for the Plus $20 plan and answered concerns with vague, confusing tweets about promotions.

All this is pushed by the fastest-rising demand (Codex growing +50% monthly) while there's a serious bottleneck in building data centers and getting parts (permits, energy, memory, flash, etc.).

Users on reddit and Discord are trying to switch to open models or Chinese alternatives. But there's no real replacement.


I don't know about users on reddit and discord, but the open models are essentially at SotA with a 3-4 month delay. That puts a hard backstop on what OpenAI and Anthropic can do before I personally can cut them off entirely without losing too much.

Granted, the experience can be worse, especially if you're using it very hands-off and not like a junior assistant who's extremely fast but doesn't know what he's doing at the architecture and strategy level. But even for that I'm relatively confident the Chinese models will be competitive pretty soon, and they won't be too expensive. And we know this because we can see their current models and we know what it takes to run them.

Currently my Strix Halo computer, which cost me under £3k, can do a lot of LLM stuff that is perfectly useful. In some ways, it's better than "cloud" models: I have models that essentially don't say "no" and I have relatively predictable setups. If you want to get fancy, you can right now rent compute to run extremely capable models like the latest ones from Kimi, GLM, Qwen, and Minimax at full size, from providers that are not operating at a loss, and it won't be too expensive. You can pool resources to do the same locally. You can do stuff that cloud providers are unlikely to market, like distillation and abliteration to serve your specific needs.

I'm very optimistic about open weights models just the way they are right now.

But I agree with you that OpenAI will likely play similar games to Anthropic and it could be soon.


What was wrong with a $20 palm rest/cover? It would also protect it for resale value.

For the past 10 years I found most movies to be unwatchable and not worth the time. Last one I saw was Project Hail Mary at a cinema and it was really bad in spite of a huge budget (more than Interstellar!).

So long, Hollywood.


Project Hail Mary was not bad.

All of these, including the war and the oil spike, and all the current liquidity dramas, will be historic rounding errors.

Boomers have already started to burn their $78 trillion in savings. And taxes will skyrocket for everyone else to pay their fat unfunded pensions. Oh, and don't forget giving them subsidized/free healthcare. And as a last FU, they collude to raise rents (they are the landlord class).

But hey, they never forget to vote.


https://pricedingold.com/sp-500/

A matter of perspective.


> taking care of the elderly is a fundamental tenet of Judaism

As it's done in most of Europe, Latin America, Asia, and Africa. The US/Canada were turned against family and tradition thanks to decades of brainwashing by movies, TV, and ads. To be cool you had to go to a college in another state. Then move to a big city. Pick career over family and having children. Take antidepressants.

This started in the 70s and is well documented.


https://archive.ph/qL0dp

Middle East war to push American price growth to ‘highest in G7’

The Middle East crisis will fuel a surge in US inflation to 4.2 per cent this year, the highest in the G7, according to an OECD forecast that highlights the cost of the US-Israeli war with Iran.


Inflation makes the very rich even richer and more powerful; that warning is good news to them.

