Hacker News | tmaly's comments

For me, I volunteered to teach something for free and help people.

They were more than happy to write me testimonials.


I see a lot of complaints on X about 4.7. Boris just dropped a post on how to use Opus 4.7 in Claude Code.

I guess a 0.1 model-version bump broke continuity in some ways.


They swapped the tokenizer, which means either a new pretrain or token/weights surgery. The latter seems more likely for two reasons:

- economics: I'd wager that Opus 4.7 is just distilled Mythos Preview

- performance: surgery like this would explain the spiky performance and weird issues

just spitballing tho


I think you can teach some skills through games. Coding in a REPL loop is great for learning certain types of problem solving since the feedback loop is so tight.
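As a toy illustration of why that loop is so tight, a minimal read-eval-print cycle can be sketched in a few lines (the helper name and inputs are invented for this example; a real REPL reads from stdin instead of a list):

```python
# A minimal read-eval-print loop: each input line is evaluated and the
# result comes back immediately, which is what makes the feedback so tight.
def tiny_repl(lines):
    """Evaluate each input line and collect results (list stands in for stdin)."""
    out = []
    for line in lines:
        try:
            out.append(repr(eval(line)))  # eval used only for illustration
        except Exception as e:
            out.append(f"error: {e}")
    return out

print(tiny_repl(["2 + 2", "len('abc')", "1/0"]))
# → ['4', '3', 'error: division by zero']
```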

Chess is another good one, but the feedback loop is not nearly as tight.


The feedback loop point is exactly it. Flight simulators work because every decision gets immediate feedback. Most real-life skill practice has the worst feedback loop of all: sometimes you don't find out you negotiated badly until years later. That's what we're trying to compress.

Did I miss something, or was Section 702 the same thing used on Trump during his first term?

One of Australia's two remaining refineries, the one that makes jet fuel, just caught on fire.

Uncertainty over fuel supplies after major fire at oil refinery in Geelong (2 points) https://news.ycombinator.com/item?id=47785552 https://www.abc.net.au/news/2026-04-16/geelong-corio-refiner...

What is the minimum VRAM this can run on, given that it is MoE?

Fwiw, with its predecessor Qwen3.5-35B-A3B-Q6_K.gguf, on a laptop with 6 GB VRAM and 32 GB RAM, with default llama.cpp settings, I get 20 t/s generation.
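There's no single answer, but a back-of-envelope sketch helps: an MoE model only computes with its active experts (3B here), yet all of its weights must still fit somewhere in VRAM+RAM. Assuming Q6_K is roughly 6.56 bits per weight (approximate; this also ignores KV cache and activation memory, which need additional room):

```python
# Rough GGUF weight footprint for a quantized model.
# bits_per_weight is approximate per quant type (Q6_K ~ 6.56 bpw);
# KV cache and activations are not included.
def weight_gb(total_params_b: float, bits_per_weight: float) -> float:
    return total_params_b * 1e9 * bits_per_weight / 8 / 1e9

print(round(weight_gb(35, 6.56), 1))  # → 28.7
```

Which is why a 35B-A3B model is fast on a small GPU (3B active) but still wants ~29 GB of combined memory for the weights alone.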

Have you tried running llama.cpp with Unified Memory Access[1] so your iGPU can seamlessly grab some of the RAM? The environment variable is prefixed with CUDA, but this is not CUDA-specific. It made a pretty significant difference (> 40% faster token generation) on my Ryzen 7840U laptop.

1 - https://github.com/ggml-org/llama.cpp/blob/master/docs/build...
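For reference, a sketch of what that looks like at runtime, assuming the variable name from the linked llama.cpp docs; the model path and -ngl value are placeholders:

```shell
# GGML_CUDA_ENABLE_UNIFIED_MEMORY is read at runtime despite the CUDA
# prefix (per the linked docs); model path and layer count are placeholders
GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 llama-cli -m model.gguf -ngl 99 -p "hi"
```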


Your link seems to be describing a runtime environment variable; it doesn't need a separate build from source. I'm not sure, though, (1) why this info is in build.md, which should be specific to the build process, rather than in separate documentation; and (2) if this really isn't CUDA-specific, why the canonical GGML variable name isn't GGML_ENABLE_UNIFIED_MEMORY, with the _CUDA_ variant treated as a legacy alias. AIUI, both of these could be addressed with pull requests against llama.cpp and/or the ggml library itself.

You are right that it is an environment variable, and that's how I have it set in my nix config. Thanks for correcting that.

Unfortunately llama.cpp is somewhat notorious for having lackluster docs. Most of the CLI tools don't even tell you what they are for.


Hmm. Perhaps there's a niche for a "The Missing Guide to llama.cpp"? Getting started, I did things like wrapping llama-cli in a pty... and only later noticing a --simple-io argument. I wonder if "living documents" are a thing yet, where LLMs keep an eye on repo and fora, and update a doc autonomously.

I hadn't tried that, thanks! I found that simply defining GGML_CUDA_ENABLE_UNIFIED_MEMORY, whether as 1, 0, or "", was a 10x hit, down to 2 t/s. Perhaps because the laptop's RAM is already so over-committed there. But with the much smaller 4B Qwen3.5-4B-Q8_0.gguf, it doubled performance from 20 to 40+ t/s! Tnx! (This is an old Quadro RTX 3000 rather than an iGPU.)

That is pretty solid. I have a 2070 with 8 GB VRAM and 64 GB RAM, but I haven't run it much. I regret not getting a 3090 back when I built this machine.

Nod. Mine was VR dev leftovers. Fwiw, running 6ish prompts in parallel, roughly doubles my aggregate t/s (but requires cooling kludgery). If one's goal is not local, but rather real-time or consistent or transparent or scalable, there's AWS.
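For the parallel part, llama.cpp's server exposes request slots; a command sketch, assuming llama-server's --parallel/-np flag and a placeholder model path (note the total context -c is shared across slots, so size it accordingly):

```shell
# -np 6: six parallel slots, so several prompts decode in the same batch;
# -c is the total context, divided among slots (model path is a placeholder)
llama-server -m model.gguf -ngl 99 -np 6 -c 12288
```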

I am waiting for the 2x usage window to close to try it out today.

If they are charging 2x usage during the most important part of the day, doesn't this give OpenAI a slight advantage as people might naturally use Codex during this period?


I can't help but notice the quality of the writing in this article is very low. Years ago Wired used to write with quite a bit more flair.

A decade or so ago, before the Condé Nast takeover.

I would be happy if all the laws both at the state and federal level were under version control and we could see who added each line.
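The tooling for exactly that already exists for code; a sketch of what per-line attribution would look like if statutes lived in git (the repository layout and filename are hypothetical):

```shell
# git blame prefixes every line with the commit and author that last
# touched it; the statute filename here is a hypothetical example
git blame laws/us/title26.txt
```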

"propaganda engineering" should be a new role to replace growth

