More

ACCount37 · 2026-04-18T18:59:09 1776538749

You probably are.

The "small subset" argument is profoundly unconvincing, and inconsistent with both neurobiology of the human brain and the actual performance of LLMs.

The transformer architecture is incredibly universal and highly expressive. Transformers power LLMs, video generator models, audio generator models, SLAM models, entire VLAs and more. It not a 1:1 copy of human brain, but that doesn't mean that it's incapable of reaching functional equivalence. Human brain isn't the only way to implement general intelligence - just the one that was the easiest for evolution to put together out of what it had.

LeCun's arguments about "LLMs can't do X" keep being proven wrong empirically. Even on ARC-AGI-3, which is a benchmark specifically designed to be adversarial to LLMs and target the weakest capabilities of off the shelf LLMs, there is no AI class that beats LLMs.

bigyabai · 2026-04-18T19:27:54 1776540474

> Human brain isn't the only way to implement general intelligence - just the one that was the easiest for evolution to put together out of what it had.

The human brain is not a pretrained system. It's objectively more flexible than than transformers and capable of self-modulation in ways that no ML architecture can replicate (that I'm aware of).

ACCount37 · 2026-04-18T19:32:43 1776540763

Human brain's "pre-training" is evolution cramming way too much structure into it. It "learns from scratch" the way it does because it doesn't actually learn from scratch.

I've seen plenty of wacky test-time training things used in ML nowadays, which is probably the closest to how the human brain learns. None are stable enough to go into the frontier LLMs, where in-context learning still reigns supreme. In-context learning is a "good enough" continuous learning approximatation, it seems.

bigyabai · 2026-04-18T19:36:52 1776541012

> In-context learning is a "good enough" continuous learning approximatation, it seems.

"it seems" is doing a herculean effort holding your argument up, in this statement. Say, how many "R"s are in Strawberry?

ACCount37 · 2026-04-18T19:42:56 1776541376

If you think that "strawberry" is some kind of own, I don't know what to tell you. It takes deep and profound ignorance of both the technical basics of modern AIs and the current SOTA to do this kind of thing.

LLMs get better release to release. Unfortunately, the quality of humans in LLM capability discussions is consistently abysmal. I wouldn't be seeing the same "LLMs are FUNDAMENTALLY FLAWED because I SAY SO" repeated ad nauseam otherwise.

bigyabai · 2026-04-18T19:45:32 1776541532

I can ask a nine-year-old human brain to solve that problem with a box of Crayola and a sheet of A4 printer paper.

In-context learning is professedly not "good enough" to approximate continuous learning of even a child.

ACCount37 · 2026-04-18T20:11:59 1776543119

You're absolutely wrong!

You can also ask an LLM to solve that problem by spelling the word out first. And then it'll count the letters successfully. At a similar success rate to actual nine-year-olds.

There's a technical explanation for why that works, but to you, it might as well be black magic.

And if you could get a modern agentic LLM that somehow still fails that test? Chances are, it would solve it with no instructions - just one "you're wrong".

1. The LLM makes a mistake

2. User says "you're wrong"

3. The LLM re-checks by spelling the word out and gives a correct answer

4. The LLM then keeps re-checking itself using the same method for any similar inquiry within that context

In-context learning isn't replaced by anything better because it's so powerful that finding "anything better" is incredibly hard. It's the bread and butter of how modern LLM workflows function.

squeaky-clean · 2026-04-19T03:17:01 1776568621

This is false. You can ask it to spell out strawberry and count the letters and it will still say 2 (it's unable to actually count the letters by the way). The only way to get a model that believes strawberry has 2 R's to consistently give the correct answer is to ask it to code the problem and return the output.

In fact, asking a model not to repeat the same mistake makes it more likely to commit that mistake again, because it's in it's context.

I think anyone who uses LLMs a lot will tell your that your steps 3 and 4 are fictional.

ACCount37 · 2026-04-19T12:38:24 1776602304

Have you actually tried?

The "spell out" trick, by the way, was what was added to the system prompts of frontier models back when this entire meme was first going around. It did mitigate the issue.

bigyabai · 2026-04-18T22:30:07 1776551407

> it's so powerful that finding "anything better" is incredibly hard.

We're back around to the start again. "Incredibly hard" is doing all of the heavy lifting in this statement, it's not all-powerful and there are enormous failure cases. Neither the human brain nor LLMs are a panacea for thought, but nobody in academia or otherwise is seriously comparing GPT to the human brain. They're distinct.

> There's a technical explanation for why that works, but to you, it might as well be black magic.

Expound however much you need. If there's one thing I've learned over the past 12 months it's that everyone is now an expert on the transformer architecture and everyone else is wrong. I'm all ears if you've got a technical argument to make, the qualitative comparison isn't convincing me.

ACCount37 · 2026-04-19T12:59:55 1776603595

I do know far more than you, which is a laughably low bar. If you want someone to hold your hand through it, ask an LLM.

The key words are "tokenization" and "metaknowledge", the latter being the only non-trivial part. An LLM can explain it in detail. They know more than you do too.

8note · 2026-04-18T23:36:51 1776555411

why is the breakdown from words to letters your highest priority thing to add to the training data?

what problem does this allow you to solve that you couldnt otherwise?

squeaky-clean · 2026-04-19T03:21:29 1776568889

This comment is tangential to their point that a transformer architecture can or cannot be functionally equivalent to a human brain. Practicality of those limitations is a different discussion

ACCount37 · 2026-04-16T17:10:00 1776359400

Ctrl-F "neuralese" on that page.

ACCount37 · 2026-04-16T16:54:36 1776358476

Not impossible, but you have to be at least a little bit mad to deploy tokenizer replacement surgery at this scale.

They also changed the image encoder, so I'm thinking "new base model". Whatever base that was powering 4.5/4.6 didn't last long then.

ACCount37 · 2026-04-16T16:22:44 1776356564

This is the same paranoid, anxious behavior that ChatGPT has. One hell of a bad sign.

driverdan · 2026-04-17T01:19:16 1776388756

Models are not paranoid or anxious, they do not think or have feelings. I know you're probably using those words as a metaphor but we need to be careful about anthropomorphizing LLMs.

adammarples · 2026-04-17T08:49:26 1776415766

They didn't describe the model, they described (accurately) the behaviour. They are useful descriptors of behaviour.

selfhoster11 · 2026-04-17T17:39:57 1776447597

They are trained on natural language. Not anthropomorphizing them is the worse end of the spectrum.

Gareth321 · 2026-04-17T08:13:48 1776413628

As an accelerationist and transhumanist, no way! These models passed the Turing test years ago. When a thing is indistinguishable from human, it is human. Our brains are, after all, just a collection of learned memetic weights. Just ask the determinists.

fourside · 2026-04-17T15:10:20 1776438620

Except there are several obvious ways in which LLMs are not indistinguishable from humans.

ACCount37 · 2026-04-16T15:41:26 1776354086

Reverse distillation. Using small models to bootstrap large models. Get richer signal early in the run when gradients are hectic, get the large model past the early training instability hell. Mad but it does work somewhat.

Not really similar to speculative decoding?

I don't think that's what they've done here though. It's still black magic, I'm not sure if any lab does it for frontier runs, let alone 10T scale runs.

ACCount37 · 2026-04-16T15:36:53 1776353813

Model inference compute over model lifetime is ~10x of model training compute now for major providers. Expected to climb as demand for AI inference rises.

Glemllksdf · 2026-04-16T15:59:11 1776355151

For sure and growth also costs money for buying DCs etc.

howdareme9 · 2026-04-16T15:46:38 1776354398

They are constantly training and getting rid of older models, they are losing money

ACCount37 · 2026-04-16T16:08:20 1776355700

Which part of "over model lifetime" did you not understand?

adgjlsfhk1 · 2026-04-16T20:54:26 1776372866

That's not a sufficient condition for profitability if both inference and scaling costs continue to increase over time.

ACCount37 · 2026-04-16T15:25:20 1776353120

Some labs do it internally because RLVR is very token-expensive. But it degrades CoT readability even more than normal RL pressure does.

It isn't free either - by default, models learn to offload some of their internal computation into the "filler" tokens. So reducing raw token count always cuts into reasoning capacity somewhat. Getting closer to "compute optimal" while reducing token use isn't an easy task.

stingraycharles · 2026-04-16T15:30:32 1776353432

Yeah the readability suffers, but as long as the actual output (ie the non-CoT part) stays unaffected it’s reasonably fine.

I work on a few agentic open source tools and the interesting thing is that once I implemented these things, the overall feedback was a performance improvement rather than performance reduction, as the LLM would spend much less time on generating tokens.

I didn’t implement it fully, just a few basic things like “reduce prose while thinking, don’t repeat your thoughts” etc would already yield massive improvements.

ACCount37 · 2026-04-16T15:19:18 1776352758

Constantly. Minor revisions can easily "wobble" on benchmarks that the training didn't explicitly push them for.

Whether it's genuine loss of capability or just measurement noise is typically unclear.

ACCount37 · 2026-04-16T15:06:33 1776351993

People were "predicting" the plateau since GPT-1. By now, it would take extraordinary evidence for me to take such "predictions" seriously.

ACCount37 · 2026-04-16T14:57:44 1776351464

> We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses.

Fucking hell.

Opus was my go-to for reverse engineering and cybersecurity uses, because, unlike OpenAI's ChatGPT, Anthropic's Opus didn't care about being asked to RE things or poke at vulns.

It would, however, shit a brick and block requests every time something remotely medical/biological showed up.

If their new "cybersecurity filter" is anywhere near as bad? Opus is dead for cybersec.

methodical · 2026-04-16T15:11:09 1776352269

To be fair, delineating between benevolent and malevolent pen-testing and cybersecurity purposes is practically impossible since the only difference is the user's intentions. I am entirely unsurprised (and would expect) that as models improve the amount to which widely available models will be prohibited from cybersecurity purposes will only increase.

Not to say I see this as the right approach, in theory the two forces would balance each other out as both white hats and black hats would have access to the same technology, but I can understand the hesitancy from Anthropic and others.

ACCount37 · 2026-04-16T15:22:49 1776352969

Yes, and the previous approach Anthropic took was "allow anything that looks remotely benign". The only thing that would get a refusal would be a downright "write an exploit for me". Which is why I favored Anthropic's models.

It remains to be seen whether Anthropic's models are still usable now.

I know just how much of a clusterfuck their "CBRN filter" is, so I'm dreading the worst.

ninjagoo · 2026-04-17T13:02:54 1776430974

> since the only difference is the user's intentions

Have these been banned yet: dual-use kitchen items, actual weapons of war for consumer use, dual-use garden chemicals, dual-use household chemicals etc. etc? Has human cybersecurity research stopped? Have malware authors stopped research?

No? then this sounds more like hype than real reasons.

There's also the possibility that there's a singular anthropic individual who's gained a substantial amount of internal power and is driving user-hostile changes in the product under the guise of cybersecurity.

trinix912 · 2026-04-16T20:53:16 1776372796

But this technology is now out there, the cat's out of the bag, there's no going back to a world where people can't ask AI to write malware for them.

I'd argue that black hats will find a way to get uncensored models and use them to write malware either way, and that further restricting generally available LLMs for cybersec usage would end up hurting white hats and programmers pentesting their own code way more (which would once again help the black hats, as they would have an advantage at finding unpatched exploits).

zb3 · 2026-04-16T15:01:20 1776351680

It appears we're learning the hard way that we can't rely on capabilities of models that aren't open weights. These can be taken from us at any time, so expect it to get much worse..

hootz · 2026-04-16T16:22:32 1776356552

Can't wait for a random chinese company to train a model on Mythos by breaking Anthropic's ToS just to release it for free and with open weights.

Havoc · 2026-04-16T15:01:44 1776351704

Claude code had safeguards like that hardcoded into the software. You could see it if you intercept the prompts with a proxy

brynnbee · 2026-04-16T16:14:23 1776356063

I'm currently testing 4.7 with some reverse engineering stuff/Ghidra scripting and it hasn't refused anything so far, but I'm also doing it on a 20 year old video game, so maybe it doesn't think that's problematic.

ACCount37 · 2026-04-16T17:17:58 1776359878

I really hope it's that way for my use cases too, also Ghidra and decompiler outputs, but I'm not optimistic.

senko · 2026-04-16T15:36:29 1776353789

From the article:

> Security professionals who wish to use Opus 4.7 for legitimate cybersecurity purposes (such as vulnerability research, penetration testing, and red-teaming) are invited to join our new Cyber Verification Program.

atonse · 2026-04-16T16:21:40 1776356500

This seems reasonable to me. The legit security firms won't have a problem doing this, just like other vendors (like Apple, who can give you special iOS builds for security analysis).

If anyone has a better idea on how to _pragmatically_ do this, I'm all ears.

adrian_b · 2026-04-16T18:00:55 1776362455

If the vendors of programs do not want bugs to be found in their programs, they should search for them themselves and ensure that there are no such bugs.

The "legit security firms" have no right to be considered more "legit" than any other human for the purpose of finding bugs or vulnerabilities in programs.

If I buy and use a program, I certainly do not want it to have any bug or vulnerability, so it is my right to search for them. If the program is not commercial, but free, then it is also my right to search for bugs and vulnerabilities in it.

I might find acceptable to not search for bugs or vulnerabilities in a program only if the authors of that program would assume full liability in perpetuity for any kind of damage that would ever be caused by their program, in any circumstances, which is the opposite of what almost any software company currently does, by disclaiming all liabilities.

There exists absolutely no scenario where Anthropic has any right to decide who deserves to search for bugs and vulnerabilities and who does not.

If someone uses tools or services provided by Anthropic to perform some illegal action, then such an action is punishable by the existing laws and that does not concern Anthropic any more than a vendor of screwdrivers should be concerned if someone used one as a tool during some illegal activity.

I am really astonished by how much younger people are willing to put up with the behaviors of modern companies that would have been considered absolutely unacceptable by anyone, a few decades ago.

atonse · 2026-04-16T19:44:04 1776368644

Not sure where the younger people thing came from, but I'm 45 and have been working in this industry since 1999. But even when I was in my 20s, I don't remember considering that I had a "right" to do something with a company's product before they've sold it to me.

In fact, I would say the idea of entitlement and use of words like "rights" when you're talking about a company's policies and terms of use (of which you are perfectly fine to not participate. rights have nothing to do with anything here. you're free to just not use these tools) feels more like a stereotypical "young" person's argument that sees everything through moralistic and "rights" based principles.

If you don't want to sign these documents, don't. This is true of pretty much every single private transaction, from employment, to anything else. It is your choice. If you don't want to give your ID to get a bank account, don't. Keep the cash in your mattress or bitcoin instead.

Regarding "legit" - there are absolutely "legit" actors and not so "legit" actors, we can apply common sense here. I'm sure we can both come up with edge cases (this is an internet argument after all), but common cases are a good place to start.

adrian_b · 2026-04-16T20:59:02 1776373142

You cannot search for bugs or vulnerabilities in "a company's product before they've sold it to you", because you cannot access it.

Obviously, I was not talking about using pirated copies, which I had classified as illegal activities in my comment, so what you said has nothing to do with what I said.

"A company's policies and terms of use" have become more and more frequently abusive and this is possible only because nowadays too many people have become willing to accept such terms, even when they are themselves hurt by these terms, which ensures that no alternative can appear to the abusive companies.

I am among those who continue to not accept mean and stupid terms forced by various companies, which is why I do not have an Anthropic subscription.

> "if you don't want to give your ID to get a bank account, don't"

I do not see any relevance of your example for our discussion, because there are good reasons for a bank to know the identity of a customer.

On the other hand there are abusive banks, whose behavior must not be accepted. For instance, a couple of decades ago I have closed all my accounts in one of the banks that I was using, because they had changed their online banking system and after the "upgrade" it worked only with Internet Explorer.

I do not accept that a bank may impose conditions on their customers about what kinds of products of any nature they must buy or use, e.g. that they must buy MS Windows in order to access the services of the bank.

More recently, I closed my accounts in another bank, because they discontinued their Web-based online banking and they have replaced that with a smartphone application. That would have been perfectly OK, except that they refused to provide the app for downloading, so that I could install it, but they provided the app only in the online Google store, which I cannot access because I do not have a Google account.

A bank does not have any right to condition their services on entering in a contractual relationship with a third party, like Google. Moreover, this is especially revolting when that third party is from a country that is neither that of the bank nor that of the customer, like Google.

These are examples of bad bank behavior, not that with demanding an ID.

atonse · 2026-04-17T01:42:50 1776390170

With the bank example, I thought your comment had some anti KYC language so I mixed it up with another response, sorry for the confusion.

I actually kind of agree with you in some principle, IF we had no choice. Like the only reason I can say “you can choose not to purchase this product” is because that is true today, thanks to competition from commercial and open source models.

But I’d be right there with you on “someone needs to force these companies to do ____” if they were quasi monopolies and citizens needed to use their technology in some form (we see this with certain patents around cell phone tech for example)

senko · 2026-04-16T18:38:57 1776364737

> If someone uses tools or services provided by Anthropic to perform some illegal action, then such an action is punishable by the existing laws and that does not concern Anthropic any more than a vendor of screwdrivers should be concerned if someone used one as a tool during some illegal activity.

In civilised parts of the world, if you want to buy a gun, or poison, or larger amount of chemicals which can be used for nefarious purposes, you need to provide your identity and the reason why you need it.

Heck, if you want to move a larger amount of money between your bank accounts, the bank will ask you why.

Why are those acceptable, yet the above isn't?

> I am really astonished by how much younger people are willing to put up with

Unsure where you got the "younger people" from.

adrian_b · 2026-04-16T21:10:28 1776373828

Your examples have nothing to do with Anthropic and the like.

A gun does not have other purposes than being used as a weapon, so it is normal for the use of such weapons to be regulated.

On the other hand it is not acceptable to regulate like weapons the tools that are required for other activities, for instance kitchen knives or many chemicals, like acids and alkalis, which are useful for various purposes and which in the past could be bought freely for centuries, without that ever causing any serious problems.

LLMs are not weapons, they are tools. Any tools can be used in a bad or dangerous way, including as weapons, but that is not a reason good enough to justify restrictions in their use, because such restrictions have much more bad consequences than good consequences.

> Unsure where you got the "younger people" from.

Like I have said, none of the people that I know from my generation have ever found acceptable the kinds of terms and conditions that are imposed nowadays by most big companies for using their products or their attempts to transition their customers from owning products to renting products.

The people who are now in their forties are a generation after me, so most of them are already much more compliant with these corporate demands, which affects me and the other people who still refuse to comply, because the companies can afford to not offer alternatives when they have enough docile customers.

ACCount37 · 2026-04-16T15:44:26 1776354266

Yeah no. They can fuck right off with KYC humiliation rituals.

johnmlussier · 2026-04-16T15:46:15 1776354375

Incredible - in one fell swoop killing my entire use case for Claude.

I have about 15 submissions that I now need to work with Codex on cause this "smarter" model refuses to read program guidelines and take them seriously.