More

mips_avatar · 2026-05-05T06:06:44 1777961204

You can fully train a 1.6b model on a single 3090. That’s a reasonably big model.

electroglyph · 2026-05-05T06:41:57 1777963317

you can train it, but not fully

mips_avatar · 2026-05-06T05:16:01 1778044561

I trained karpathys d28 1.6b nanochat on a 3090. Took an extremely long time but I did it.

mips_avatar · 2026-05-03T07:55:51 1777794951

Microsoft is trying to sell things like extended servicing agreements. They purposefully make Windows worse so they can sell you solutions to fix it. They purposefully keep it insecure so you need their updates. It’s about taking the customers hostage.

mips_avatar · 2026-04-24T01:20:35 1776993635

I think if modern LLMs were invented in the mid 2010s it would have been promoted in more positive ways, but because everyone is afraid for their economic security saying scary things gets more of a response. I think it's kind of gross that it's a race to scare ordinary people and especially Dario Amodei should feel kind of ashamed of himself.

mips_avatar · 2026-04-16T20:30:40 1776371440

with Argo networking

mips_avatar · 2026-04-16T20:30:25 1776371425

So it's basically just openrouter with cloudflare argo networking? I feel like they could do some much more interesting stuff with their replicate acquisition. Application specific RL is getting so good but there's no good way to deploy these models in a scalable way. Even the providers like fireworks which claim to let you deploy LORAs in a scalable way can't do it. For now I literally have to host base load on my application on a rack of 3090s in my garage which seems silly but it saves me $1k a month.

bryden_cruz · 2026-04-17T07:38:46 1776411526

Running a rack of 3090s in your garage to avoid provider lock-in/costs is the most Hacker News thing. Out of curiosity, what are you doing for uptime/failover? If you are running production traffic to that garage rack, does your app just degrade gracefully if your home internet drops, or do you have a cloud fallback?

mips_avatar · 2026-04-17T16:59:20 1776445160

Yeah the model i'm running locally is just one of several models the app supports and it falls back to others if not available.

jonfromsf · 2026-04-17T03:30:07 1776396607

Gilfoyle? Is that you?

mips_avatar · 2026-04-17T04:24:19 1776399859

I think these gpus were actually used for bitcoin mining before I bought them

menno-sh · 2026-04-17T16:54:59 1776444899

It's Anton's grandson!

vladgur · 2026-04-16T22:00:40 1776376840

Curious which models are you able to run and how many 3090s do they require at scale?

mips_avatar · 2026-04-16T22:20:55 1776378055

4 3090s with nvlinks on each pair. Super fast inference on Moe models around 20-36b

embedding-shape · 2026-04-17T16:00:46 1776441646

> Super fast inference

How fast is "super fast" exactly, and with what runtime+model+quant specifically? Curious to see how how 4x 3090s compare to 1x Pro 6000, could probably put together 4x 3090s for a fraction of the cost compared to the Pro 6000, but the times I've seen the tok/s in/out for multiple GPUs my heart always drops a little.

mips_avatar · 2026-04-17T16:43:03 1776444183

I haven't benchmarked against a pro 6000, it's more that i have 4 3090s and i don't have a pro 6000.

embedding-shape · 2026-04-17T17:00:25 1776445225

Yes, that's why I'm asking you what exactly 4 3090s get in prompt-processing and generation, sorry if I was unclear.

mips_avatar · 2026-04-17T19:29:25 1776454165

Maxes out around 4K tok/s output. Each pair of 3090s has its own instance of the model with parallelism across the nvlink bridge. Though nvlink is only 2x over pcie5

ascorbic · 2026-04-17T11:39:21 1776425961

The interesting part is that you can use the same API with Workers AI models (hosted at the edge) and proxied models (OpenRouter-style).

Disclaimer: I work at Cloudflare, but not on this.

mips_avatar · 2026-04-17T23:11:47 1776467507

It's the same problem as fireworks, the only models supporting LORA are like year old dense models that perform horribly on most tasks. If you want to do anything close to relevant you still need to rent/own dedicated GPUs, which seems insane to me when vLLM fully support dynamic LORA loading.

mips_avatar · 2026-03-11T21:25:17 1773264317

He's roommates with an Anthropic researcher, I was roommates with a Google product manager I don't think I'm really bought out by Google.

iso-logi · 2026-03-12T00:02:10 1773273730

[flagged]

pibaker · 2026-03-12T04:43:59 1773290639

I don't know GP's situation. But in the case of the linked article, given anthropic's tie to the Bay Area "rationalist" community, one possible reason why the author has a roommate is he bought in to the rationalist "group house" culture and moved in with one of them.

linkregister · 2026-03-12T00:55:00 1773276900

Product managers aren't management. They manage the trajectory of a software initiative. They can be hired straight out of college.

Rents in the San Francisco Bay area are too high to live a practical distance from job centers as a junior without roommates.

etrautmann · 2026-03-12T00:35:57 1773275757

What? Plenty of people prefer to live with roommates, especially in the bay.

chrsw · 2026-03-12T00:37:26 1773275846

Bought out, bought in. Is the distinction important?

dang · 2026-03-12T05:32:05 1773293525

mips_avatar is describing neither.

Henchman21 · 2026-03-11T23:44:31 1773272671

You don't think that. But I do. Prove you aren't.

Analemma_ · 2026-03-12T00:53:28 1773276808

It is literally impossible to prove a negative, that’s how conspiracy thinking operates and it’s why fortunately the justice system operates on the opposite principle and requires proof of guilt.

It’s true that in some circumstances we require avoiding even the appearance of impropriety or a conflict of interest, but that’s simply too large a burden to impose on everyone all of the time, especially for allegedly dire sins like “having a roommate who works for a Google”

mips_avatar · 2026-03-10T21:24:49 1773177889

Have you tried any really big models on a mac studio? I'm wondering what latency is like for big qwens if there's enough memory.

sanchitmonga22 · 2026-03-11T01:56:46 1773194206

Not yet with MetalRT, right now we support models up to ~4B parameters (Qwen3 4B, Llama 3.2 3B, LFM2.5 1.2B). These are optimized for the voice pipeline use case where decode speed and latency matter more then model size.

Expanding to larger models (7B, 14B, 32B) on machines with more unified memory is on the roadmap. The Mac Studio with 192GB would be an interesting target, a 32B model at 4-bit would fit comfortably and MetalRT's architectural advantages (fused kernels, minimal dispatch overhead) should scale well.

What model / use case are you thinking about? That helps us prioritize.

mips_avatar · 2026-03-11T08:05:49 1773216349

Well it’s just more that I’ve noticed in the agents I’ve built that qwen doesn’t get reliable until around 27b so unless you want to rl small qwen I don’t think I would get much useful help out of it.

sanchitmonga22 · 2026-03-11T15:02:47 1773241367

That tracks with what we've seen too. For agent workflows with reliable tool calling, you really do need the larger models. Larger model support is a priority for us. Thanks for the data point.

asimovDev · 2026-03-11T09:22:51 1773220971

I am running 80b Qwen coder next 4bit quant MLX version on a 96GB M3 MacBook and it responds quickly, almost immediately. I can fit the model + 128k context comfortably into the memory

mips_avatar · 2026-03-09T21:44:01 1773092641

The striking thing I heard from Meta staff is that Alexandr Wang would walk around campus with very obvious bodyguards surrounding him. Like sure maybe security is needed, but the decision to be surrounded with bouncerish guys says something about him.

DonThomasitos · 2026-03-09T21:55:12 1773093312

But even on campus? Weird.

mips_avatar · 2026-03-09T22:19:14 1773094754

Yeah on campus apparently

UltraSane · 2026-03-09T21:58:13 1773093493

That is just being obnoxiously self important.

0xy · 2026-03-09T21:59:54 1773093594

It could be required by the company. Many companies require top executives to have personal security. I'd be surprised if Zuck didn't have bodyguards even within the office. He has 24/7 security outside, so why wouldn't he inside?

mips_avatar · 2026-03-09T22:19:36 1773094776

I think the notes about Alexander I’ve heard is just how obvious his were

reverius42 · 2026-03-09T23:52:50 1773100370

Yeah, like all the tech CEOs surely have bodyguards, but they try to blend in and not be noticeable as bodyguards; sounds like these were trying to make a certain impression?

mips_avatar · 2026-03-08T16:11:40 1772986300

The key is that Andrej has really good taste. It takes a lot to make a great harness for these models.

mips_avatar · 2026-03-08T16:09:53 1772986193

nanochat is super capable, the d34 (2.2b) variant is competitive with qwens of that size. Andrej is I assume building out the improvements in preparation for bigger training runs. We desperately need a truly open model, so i think this is incredibly important.