Hacker Newsnew | past | comments | ask | show | jobs | submit | varispeed's commentslogin

How do you get anything sensible out of Kimi?

I spent one day with Opus 4.7 to fix a bug. It just ran in circles despite having the problem "in front of its eyes" with all supporting data, thorough description of the system, test harness that reproduces the bug etc. While I still believe 4.7 is much "smarter" than GPT-5.4 I decided to give it ago. It was giving me dumb answers and going off the rails. After accusing it many times of being a fraud and doing it on purpose so that I spend more money, it fixed the bug in one shot.

Having a taste of unnerfed Opus 4.6 I think that they have a conflict of interest - if they let models give the right answer first time, person will spend less time with it, spend less money, but if they make model artificially dumber (progressive reasoning if you will), people get frustrated but will spend more money.

It is likely happening because economics doesn't work. Running comparable model at comparable speed for an individual is prohibitively expensive. Now scale that to millions of users - something gotta give.


But Amazon has infinite money, so licences are meaningless.

Don't forget that the model doesn't have an incentive to give right solution the first time. At least with Opus 4.6 after it got nerfed, it would go round in circles until you tell it to stop defrauding you and get to correct solution. That not always worked though. I found starting session again and again until less nerfed model was put on the request. Still all points to artificially make customer pay more.

Why does it have to be reserved to security space? Here is my API please find vulnerabilities I missed (otherwise someone with not restricted AI will find them first).

Cat is out of the bag.

Removing restrictions will help everybody in the long run.


This week I tried to use Opus to analyse output from an oscilloscope and it was impossible to complete, because Python scripts (Opus wrote itself) were flagged for cyber security risk. Baffling.

How do you get codex to generate any code?

I describe the problem and codex runs in circles basically:

codex> I see the problem clearly. Let me create a plan so that I can implement it. The plan is X, Y, Z. Do you want me to implement this?

me> Yes please, looks good. Go ahead!

codex> Okay. Thank you for confirming. So I am going to implement X, Y, Z now. Shall I proceeed?

me> Yes, proceed.

codex> Okay. Implementing.

...codex is working... you see the internal monologue running in circles

codex> Here is what I am going to implement: X, Y, Z

me> Yes, you said that already. Go ahead!

codex> Working on it.

...codex in doing something...

codex> After examining the problem more, indeed, the steps should be X, Y, Z. Do you want me to implement them?

etc.

Very much every sessions ends up being like this. I was unable to get any useful code apart from boilerplate JS from it since 5.4

So instead I just use ChatGPT to create a plan and then ask Opus to code, but it's a hit and miss. Almost every time the prompt seems to be routed to cheaper model that is very dumb (but says Opus 4.6 when asked). I have to start new session many times until I get a good model.


It's just like subscription based MMORPGs that delay you as much as possible every step of the way because that's the way they can extract more money from you. If you pay for the tokens it's not in their benefit to give you the answer directly.

Do you have to put it in a build/execute mode (separate from a planning mode) to allow it to move on? I use opencode, and that's how it works.

Weird. I never had that issue when writing code.

If they work for hostile state, the payoff is destruction of economy and social contract. Damage here, damage there. It all adds up.

This is clearly setup for VC backed companies where shareholders don't care about spend as long as they can brag about investing in this cool start up at dinner parties. Normal and true business should stay away.

Codex exploited or you exploited? It's like saying a hammer drove a nail, without acknowledging the hand and the force it exerted and the human brain behind it.

Feels like the truth is somewhere in between. For example if it was a "smart" hammer and you could tell your hammer "go pound in those nails" and it pounded in the wrong ones, or did it too hard, or something, that feels more equivalent. You would still be blamed for your ambiguous prompt, and fault/liability is ultimately on you the hammer director, but it still wasn't you who chose the exact nails to hammer on.

I also think taking credit for writing an exploit that you didn't write and may not even have the knowledge to do yourself is a bit gray.


Wrong questions.

Could a script kiddy stear an LLM? How much does this reduce the cost of attacks? Can this scale?

What does this mean for the future of cyber security?


Do you have a defense of why human-hammer-nail is a good analogy for human-chatgpt5.4-pwndsamsung?

AI without a suitably well crafted prompt is like a firework tube held by a 3 year old.

AI without a prompt is a hammer sitting in a drawer.


If I just point to the wall and say "nail" then I would day the hammer drive the nail

You didn't, you figured out where the nail needs to go, got the nail and then swung the hammer until the nail was driven.

This is really just closer to a drill in that it automated the grunt work with full guidance.


Then explain vibe coders.

They are the customer who just tell their wishes, can't handle a hammer, can't handle a drill, don't know which nail, hammer or drill to use. Still the nail is in the wall.

Who did it?


You could call the LLMs role "smart grep," and mean it to be derisive. But I would have gladly used a real smart grep.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: