varispeed · 2026-04-18T18:25:41 1776536741

How do you get anything sensible out of Kimi?

varispeed · 2026-04-18T18:21:29 1776536489

I spent one day with Opus 4.7 to fix a bug. It just ran in circles despite having the problem "in front of its eyes" with all supporting data, a thorough description of the system, a test harness that reproduces the bug etc. While I still believe 4.7 is much "smarter" than GPT-5.4 I decided to give it a go. It was giving me dumb answers and going off the rails. After accusing it many times of being a fraud and doing it on purpose so that I spend more money, it fixed the bug in one shot.

Having had a taste of unnerfed Opus 4.6, I think that they have a conflict of interest - if they let models give the right answer first time, a person will spend less time with it and spend less money, but if they make the model artificially dumber (progressive reasoning if you will), people get frustrated but will spend more money.

It is likely happening because the economics don't work. Running a comparable model at comparable speed for an individual is prohibitively expensive. Now scale that to millions of users - something's gotta give.

varispeed · 2026-04-18T16:28:18 1776529698

But Amazon has infinite money, so licences are meaningless.

varispeed · 2026-04-17T16:54:35 1776444875

Don't forget that the model doesn't have an incentive to give the right solution the first time. At least with Opus 4.6 after it got nerfed, it would go round in circles until you told it to stop defrauding you and get to the correct solution. That didn't always work though. I found that starting the session again and again until a less nerfed model was put on the request helped. Still, it all points to artificially making the customer pay more.

varispeed · 2026-04-17T15:11:44 1776438704

Why does it have to be reserved to the security space? Here is my API, please find vulnerabilities I missed (otherwise someone with an unrestricted AI will find them first).

Cat is out of the bag.

Removing restrictions will help everybody in the long run.

varispeed · 2026-04-17T13:10:36 1776431436

This week I tried to use Opus to analyse output from an oscilloscope and it was impossible to complete, because the Python scripts (which Opus wrote itself) were flagged as a cyber security risk. Baffling.

varispeed · 2026-04-16T16:26:21 1776356781

How do you get codex to generate any code?

I describe the problem and codex runs in circles basically:

codex> I see the problem clearly. Let me create a plan so that I can implement it. The plan is X, Y, Z. Do you want me to implement this?

me> Yes please, looks good. Go ahead!

codex> Okay. Thank you for confirming. So I am going to implement X, Y, Z now. Shall I proceed?

me> Yes, proceed.

codex> Okay. Implementing.

...codex is working... you see the internal monologue running in circles

codex> Here is what I am going to implement: X, Y, Z

me> Yes, you said that already. Go ahead!

codex> Working on it.

...codex is doing something...

codex> After examining the problem more, indeed, the steps should be X, Y, Z. Do you want me to implement them?

etc.

Pretty much every session ends up being like this. I have been unable to get any useful code from it, apart from boilerplate JS, since 5.4.

So instead I just use ChatGPT to create a plan and then ask Opus to code, but it's hit and miss. Almost every time the prompt seems to be routed to a cheaper model that is very dumb (but says Opus 4.6 when asked). I have to start a new session many times until I get a good model.

skocznymroczny · 2026-04-16T20:39:57 1776371997

It's just like subscription based MMORPGs that delay you as much as possible every step of the way because that's the way they can extract more money from you. If you pay for the tokens it's not in their benefit to give you the answer directly.

Gracana · 2026-04-16T17:04:09 1776359049

Do you have to put it in a build/execute mode (separate from a planning mode) to allow it to move on? I use opencode, and that's how it works.

johanyc · 2026-04-17T00:35:18 1776386118

Weird. I never had that issue when writing code.

varispeed · 2026-04-16T13:18:36 1776345516

If they work for a hostile state, the payoff is destruction of the economy and the social contract. Damage here, damage there. It all adds up.

varispeed · 2026-04-16T13:16:28 1776345388

This is clearly set up for VC-backed companies where shareholders don't care about spend as long as they can brag about investing in this cool startup at dinner parties. Normal, real businesses should stay away.

varispeed · 2026-04-16T11:21:33 1776338493

Codex exploited or you exploited? It's like saying a hammer drove a nail, without acknowledging the hand and the force it exerted and the human brain behind it.

freedomben · 2026-04-16T11:36:42 1776339402

Feels like the truth is somewhere in between. For example if it was a "smart" hammer and you could tell your hammer "go pound in those nails" and it pounded in the wrong ones, or did it too hard, or something, that feels more equivalent. You would still be blamed for your ambiguous prompt, and fault/liability is ultimately on you the hammer director, but it still wasn't you who chose the exact nails to hammer on.

I also think taking credit for writing an exploit that you didn't write and may not even have the knowledge to do yourself is a bit gray.

Glemllksdf · 2026-04-16T11:54:00 1776340440

Wrong questions.

Could a script kiddie steer an LLM? How much does this reduce the cost of attacks? Can this scale?

What does this mean for the future of cyber security?

par1970 · 2026-04-16T11:29:01 1776338941

Do you have a defense of why human-hammer-nail is a good analogy for human-chatgpt5.4-pwndsamsung?

BLKNSLVR · 2026-04-16T11:44:59 1776339899

AI without a suitably well crafted prompt is like a firework tube held by a 3 year old.

AI without a prompt is a hammer sitting in a drawer.

croes · 2026-04-16T11:35:43 1776339343

If I just point to the wall and say "nail", then I would say the hammer drove the nail.

saintfire · 2026-04-16T21:03:30 1776373410

You didn't, you figured out where the nail needs to go, got the nail and then swung the hammer until the nail was driven.

This is really just closer to a drill in that it automated the grunt work with full guidance.

croes · 2026-04-17T15:22:54 1776439374

Then explain vibe coders.

They are the customers who just state their wishes, can't handle a hammer, can't handle a drill, don't know which nail, hammer or drill to use. Still the nail is in the wall.

Who did it?

Zigurd · 2026-04-16T13:09:38 1776344978

You could call the LLM's role "smart grep," and mean it to be derisive. But I would have gladly used a real smart grep.