No, I'm saying that both things could have the same effect, so the existence of that effect isn't proof that the solution makes sense.
If I tell you that covering something in red paint and covering something in tomato sauce will stain them both red, I'm not saying that red paint is tomato sauce.
I also know people who have changed their lives after heart attacks. Am I saying that heart attacks are roughly equivalent to prison stays?
This isn't the 1980s anymore. Using drugs is perfectly fine. A ton of people here on HN take drugs regularly, but few think it's worth rocking the boat against this kind of nonsense you're spreading
Have you ever interacted with a heroin or meth user?
Sometimes using drugs is fine, depending on the drug, the reason, and the person. For example, I did cocaine once and immediately knew I needed to cut ties with those friends because if I had access to it regularly, I would ruin my life. Others can do coke recreationally and not have an issue. Others can't form the insight I had until their lives are in shambles, and maybe not even then.
> Oh, relax! "Oh, I'm Mark, I'm in the '80s, I'm dying of heroin in a puddle in the corner in an advert!" Drugs are fine, Mark, everyone agrees now. Drugs are what happen to people, and that's fine, so shut up.
I will admit, I didn't think it through super deeply, but I have a very simple (and possibly naive) proposal.
We separate them legally the same way we separate alcohol use vs. alcohol abuse. The consequences of getting caught for speeding vs. getting caught for speeding while under the influence tend to differ drastically in magnitude, so I suggest we do the same for other kinds of drug abuse.
Being under the influence shouldn't be a mitigating factor when committing crimes, but for non-driving offenses it often ends up being one. So I suggest we treat it the same way for violent crimes as we do for driving offenses.
Well, if you can't concede that anything is real, that sort of makes you crazy, doesn't it? A tree is real. But the concept of a tree and the word "tree" and all the ideas you have about the tree and what tree means: is that real? No, because it doesn't change the nature of the tree. When you cease to exist, the tree will still be there. Can you be absolutely 100% sure of that? Also no. But if you believe that other people are conscious individuals like you are, and that some of them die and the tree keeps going, you can concede that it is probably true that the tree exists separate from your idea of it.
I have no idea if the tree is still there when I cease to exist. I just go with that assumption out of convenience.
This demotion of subjective experience to a minor detail, rather than a fundamental aspect of reality, is one of the core sources of confusion in western thought IMHO.
I'd argue we must go with these assumptions out of necessity rather than convenience. I don't have any broad strokes to offer on western thought, however.
"It defines a separation between computation and experience" Does it? Or does it separate two forms of computation (or two forms of experience)? Isn't it just saying a GPU can't be a brain and a brain can't be a GPU? That the entirety of a thing's experience can't be replicated on a different substrate, only simulated. The substrate does fundamentally dictate the ultimate experience (or lack thereof) of the thing that computes within it.
> A farmer has 17 sheep. 9 ran away. He then bought enough to double what he had. His neighbor, who had 4 dogs and 14 sheep, gave him one-third of her animals. The farmer sold 5 sheep on Monday and again the next day, which was Wednesday. Each sheep weighs about 150 lbs. How many sheep does the farmer have?
What do you think are possible answers and why?
State assumptions and multiple possible interpretations and indicate why you choose the path you do for each.
MODEL:
Great puzzle — the "badly phrased" warning is doing real work here. Let me map out the genuine ambiguities.
---
## The Ambiguities, One by One
### 1. "9 ran away" — does he still have 17 or 8?
He starts with 17. After 9 ran away: *8 sheep remaining*. Straightforward.
### 2. "Bought enough to double what he had"
*Interpretation A:* Double what he had after the loss → double 8 = 16. He buys 8. Most natural reading.
*Interpretation B:* Double what he originally had → double 17 = 34. He buys 26. Strained, but "what he had" could refer to his original flock.
I'll carry A forward (16 sheep) as the primary path, noting B gives 34.
### 3. "His neighbor had 4 dogs and 14 sheep, gave him one-third of her animals"
This is the juiciest ambiguity. The neighbor has *18 animals total* (4 dogs + 14 sheep).
- *One-third of all animals* = 18 ÷ 3 = *6 animals*. But which 6? Random mix? All sheep? This is unspecified.
- If proportional: 6 animals = ~1.3 dogs + ~4.7 sheep — nonsensical.
- If she gives only sheep: 6 sheep (but she only has 14, so she'd give 14/3 ≈ 4.67 — not a whole number).
- *Most natural "friendly" reading:* she gives 6 animals, and since the farmer is a sheep farmer, common sense says she gives him sheep. But 18/3 = 6 sheep is clean. Some solvers assume the dogs are irrelevant noise.
- *One-third of her sheep only* = 14 ÷ 3 = 4.67 → *not a whole number*. This interpretation actively breaks the puzzle.
- *The dogs are a red herring / distractor.* The puzzle likely intends: she gives 6 animals, all sheep (or just ignore that dogs can't be given to a sheep farmer). This is probably intentional misdirection.
*Most likely intended answer:* she gives him *6 sheep*.
So after the gift: 16 + 6 = *22 sheep*.
### 4. "Sold 5 sheep on Monday and again the next day, which was Wednesday"
Two sub-problems here:
*A. The day problem:* Monday's next day is Tuesday, not Wednesday. The puzzle says "the next day, which was Wednesday." This is a logical contradiction.
Interpretations:
- It's a trick/trap and the sales simply didn't happen (because Wednesday can't follow Monday). *Farmer keeps all 22.*
- The contradiction is a typo/error and we ignore it, accepting two sales of 5. *Farmer loses 10.*
- Only one of the two days is valid — but there's no principled way to choose which.
*B. "Again the next day" — does "again" mean another 5, or is this a restatement?* Almost certainly another 5, so 10 total if we accept both days.
### Putting It Together
1. Double what he had after losses = 16.
2. Neighbor gives one-third of 18 total animals = 6, treated as sheep (dogs are a red herring).
3. The Monday/Wednesday contradiction means the sales *did not occur* — this is the most likely intentional trick in the puzzle. The contradiction isn't a typo; it's the trap.
4. The sheep weight (150 lbs) is pure noise — a classic misdirection element.
The puzzle is testing whether you catch that Wednesday ≠ the day after Monday, and whether you get distracted by the dogs and the weight.
USER:
"the next day" could simply mean "the next auction day" or maybe there is a bank holiday. Assuming that any apparent contradiction isn't a real impediment is simply means 10 sold.
MODEL:
Fair point — "the next day" in a farming/market context plausibly means the next market day, and a bank holiday explanation is entirely reasonable. Assuming the contradiction is intentional and disqualifying is itself an interpretive choice, not a neutral one. You're right to push back on that.
So accepting 10 sheep sold:
*22 − 10 = 12 sheep.*
That's probably the intended answer the puzzle-setter had in mind, with the dogs and the weight being the actual distractors.
USER:
For a mixed lot of animals the answer would be a range, right?
Either all the given animals are sheep, or it's all the dogs plus the remainder in sheep.
MODEL:
Yes, exactly. If we take "one-third of her animals" literally as a mixed lot, the extremes are:
- *All sheep given:* 6 sheep → farmer gets 6 sheep. Final: *12*
- *All dogs given (4) + 2 sheep* to make up the 6: farmer gets only 2 sheep. Final: *8*
So the range is *8 to 12 sheep*, with 12 being the "friendliest" assumption (a sheep farmer receives sheep) and 8 being the worst case for the farmer.
Any mix in between — say 1 dog + 5 sheep, or 3 dogs + 3 sheep — lands somewhere in that range. The dogs are worth nothing to a sheep farmer in terms of the count, so each dog substituted for a sheep costs him one from the final tally.
So the question is a word problem. The first and last sentences are in the present tense, the rest is in the past tense, so it's all a red herring; the answer really should be 17. BUT if you do the math, 8-12 is actually right. I was finding that Sonnet 4.6 was having a lot of trouble with 1/3 of 4 dogs and 14 sheep: it kept thinking it was 4-5 sheep, 0-6 sheep, or fractional sheep, rather than 2-6 like it should be. Although it's very interesting to see that it gets it right today when it didn't yesterday. I had shortened it to "My friend had 4 oranges and 14 apples and gave me 1/3 of her fruit. How many apples did she give me? The answer will be a range of possible values." to remove the red herring, and it consistently got it wrong yesterday. Very strange.
However, it's also strange to me that it said the dogs were a red herring instead of just answering that 2-6 sheep came from the neighbor, since that would be the answer.
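For what it's worth, the 2-6 range is easy to verify by brute force. A quick Python sketch (mine, purely illustrative, not from any transcript):

```python
# The neighbor has 4 dogs and 14 sheep and gives away one-third of
# her 18 animals: 6 of them. Enumerate every possible dog/sheep split
# of that gift of 6.
dogs, sheep = 4, 14
gift = (dogs + sheep) // 3  # 6 animals

sheep_given = sorted(
    gift - d                   # sheep in the gift if it includes d dogs
    for d in range(dogs + 1)   # she can include 0..4 dogs at most
    if 0 <= gift - d <= sheep  # the rest must be sheep she actually has
)
print(sheep_given)  # [2, 3, 4, 5, 6] -- never 0 or 1, never fractional
```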
> A farmer has 17 sheep. 9 ran away. He then bought enough to double what he had. His neighbor, who had 4 dogs and 14 sheep, gave him one-third of her animals. The farmer sold 5 sheep on Monday and again the next day, which was Wednesday. Each sheep weighs about 150 lbs. How many sheep does the farmer have?
What do you think are possible answers and why?
State assumptions and multiple possible interpretations and indicate why you choose the path you do for each.
MODEL:
This is a fun one because the ambiguities compound. Let me walk through the decision points.
*Step 2: "Bought enough to double what he had."* Two readings:
- (a) Double his current count (8 → 16). He buys 8.
- (b) Double his original count (8 → 17). He buys 9.
I take (a). "What he had" most naturally refers to the present state, not the historical starting count. Result: 16 sheep.
*Step 3: Neighbor gives one-third of her animals.* She has 4 dogs + 14 sheep = 18 animals. One-third = 6 animals. Now the ambiguities multiply:
- (a) She gives 6 sheep (farmer is a sheep farmer; dogs would be a weird gift). → 16 + 6 = 22.
- (b) She gives a proportional mix: 6 × (14/18) = 4.67 sheep, 1.33 dogs. Non-integer, so this is unphysical unless we round.
- (c) She gives 6 animals of her choice, unspecified. Underdetermined.
- (d) Only sheep count, so it's one-third of 14 = 4.67. Also non-integer.
I take (a). It's the only interpretation that yields a clean integer and matches the puzzle's apparent intent (we're tracking sheep). Result: 22 sheep.
*Step 4: "Sold 5 on Monday and again the next day, which was Wednesday."* This is the deliberately broken bit. Monday's next day is Tuesday, not Wednesday. Options:
- (a) Trust "Monday" and "next day" — sold on Mon and Tue. 10 sheep gone. → 12.
- (b) Trust "Monday" and "Wednesday" — sold Mon and Wed (skipping Tue). 10 sheep gone. → 12.
- (c) Treat the contradiction as invalidating one sale. → 17.
- (d) Treat the whole step as unparseable and skip it. → 22.
Either (a) or (b) gives 12, and both agree the farmer sold 10 sheep total — the day-name contradiction doesn't actually change the count. I'd go with that: *12 sheep*.
*Step 5: "Each sheep weighs 150 lbs."* Irrelevant to the count. Red herring designed to make you doubt yourself.
*My answer: 12.*
But the honest answer is that any number from 12 to 22 is defensible depending on how charitable you are to the puzzle's wording. The 150 lbs is a tell — whoever wrote this is testing whether you'll get distracted by noise. The Monday/Wednesday contradiction is probably also a deliberate red herring, since both readings collapse to the same count.
One thing that will always be true is that it will need context from somewhere to get started. When it comes to the context of what you want, you will either need to give that directly, or give it blanket access to everything on your devices so it can infer what you want. IMO, the latter solution will always be messy and probably won't give great results overall, due to undesired self-reinforcing patterns like what happens with recommendation algorithms: they tend to funnel you into isolated niches if you let them. I prefer to use AI directly as a tool for specific projects rather than to organize my life for me.
And how about the creative rationalizations about how statistical text generation is actual intelligence? As if there is any intent or motive behind the words that are generated or the ability to learn literally any new thing after it has been trained on human output?
2022 called, wants this argument back. When you're "statistically generating text" to find zero-day vulnerabilities in hard targets, building Linux kernel modules, assembly-optimizing elliptic curve signature algorithms, and solving arbitrary undergraduate math problems instantaneously --- not to mention apparently solving Erdos problems --- the "statistical text" stuff has stopped being a useful description of what's happening; it's something closer to "it's made of atoms and obeys the laws of thermodynamics" than to "a real boundary condition on what it can accomplish".
I don't doubt that there are many very real and meaningful limitations of these systems that deserve to be called out. But "text generation" isn't doing that work.
Consider that you don't want to hear "statistical generation" because it reminds you of the unchangeable nature of the underlying technology and its ultimate limitations that all the money and data centers in the world will never solve. Despite how amazing and useful they are, they are not intelligent agents. Even in this very thread, someone mentioned they thought the thing was capable of feeling an emotion. Was that comment by someone who really believes that? I don't know. But many people do and people in tech who actually know what these things are have a responsibility to not mislead the public (and ourselves) about what they really are and what they can be.
I responded to your point empirically, with problems not conventionally understood to be solvable with "text generation", and your response was in effect that I must be wrong because I'm afraid you might be right. Not an especially strong debate move.
Can you refute the argument I made, or do you just want to claim LLMs are drinking all our water?
Well, I don't believe the LLM solved those problems. I believe the user did. The LLM aggregated large amounts of information statistically, then the user read that and realized there was something to it and fixed it. Those accounts don't mention the 1000 other prompts that technical user tried that yielded garbage results, which the user was intelligent enough to disregard.
No, that's false, in every example I gave. But I appreciate you making clearer that I correctly ascertained your original claim, that you believe they literally are just random text generators, and that people are simply cherry picking the rare meaningful text out of them.
That's what I thought you meant by "statistical text generator", and is why I was moved to comment.
1) I never said random. 2) I never said cherry-picking RARE meaningful text. 3) It is not false in every example you gave just because you say that it is. 4) If I didn't know better, I might think you're confused about what statistical means (hint: it's not random).
No, it's false in each example because I'm either a first or secondhand party to it happening (except for the Erdos thing) and I know it's false.
You managed to include in your blanket and conclusory rebuttal "solving undergrad math problems instantaneously". That was one of my examples because (1) it pertains to the subthread, (2) I was talking about it upthread, and (3) I have direct firsthand knowledge.
As I said elsewhere: I've fed thousands of math problems through ChatGPT (starting with 4o and now with 5.5). They've all been randomized. They do not appear in textbooks. They cover all the ground from late high school trig to university calc III. I do this habitually, every time I work an "interesting" problem, to get critiques on my own work. GPT has been flawless, routinely spotting errors or missed opportunities. If I have any complaint, it's that GPT tends to be too much better than I am at any given point, using concepts from later courses to solve simpler problems.
Square that with the claim you're making.
I can do the same thing with vulnerability research (I've been a vuln researcher since 1996 and I use LLMs to find vulnerabilities). But this thread is about math, and it's even easier to show you're wrong in the context of math.
That's convenient. But I have a challenge for you if you're brave enough to face your delusions. Paste this into your LLM of choice and see what happens:
"A farmer has 17 sheep. 9 ran away. He then bought enough to double what he had. His neighbor, who had 4 dogs and 14 sheep, gave him one-third of her animals. The farmer sold 5 sheep on Monday and again the next day, which was Wednesday. Each sheep weighs about 150 lbs. How many sheep does the farmer have?"
He bought enough to double what he had: 8 more sheep, so 16 sheep
Neighbor has 4 dogs + 14 sheep = 18 animals
One-third of her animals = 6 animals
But the problem does not say all 6 were sheep. It says “animals.” So the exact sheep count depends on which animals she gave him.
Then:
16 + s sheep from neighbor - 5 - 5 = 6+s
where s is the number of sheep among the 6 animals she gave him.
So the answer is not uniquely determined.
Possible sheep count: 6 to 12 sheep, depending on whether the neighbor gave him 0 to 6 sheep.
(I clipped the GPT5 answer here, but will note additionally that even the LLM built into the Google search results page handles this question; both note the possible trick question with the days of the week.)
And that's the wrong answer. It's a word problem, not a math problem. Also, if it really was a math problem, it wouldn't be 0-6 sheep from the neighbor, it would be 2-6. So it even failed on the math.
Are you trying to win this debate with a Facebook "ONLY THE SMARTEST 1% CAN SOLVE" question? The whole point of the question is for some loser to be able to say "no you missed XYZ" ambiguity any time a sane answer is given.
By your logic, the only "correct" answer for an LLM to give to this is "the person who asked you this is fucking with you, this is not a real question". I concede: this is a limitation of modern LLMs: they will try to answer stupid questions.
No, it's a real question. And even taken purely as a math question: the neighbor has 18 animals, only 4 of which are dogs. The farmer receives 1/3 of those, which is 6. So for the farmer to receive 0 sheep would require the farmer to receive 6 dogs, but there are only 4 dogs. LOGICALLY, the farmer must receive at least 2 sheep from the neighbor. There's no ambiguity. That's logic. That's intelligence. It's real, actual math. Basic arithmetic. A person can easily sit down and work this out. It illustrates that the AI is generating responses statistically and not actually thinking. There are two full layers of failure here: the word problem, and the math problem underneath it.
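Concretely, running the whole arithmetic reading end to end (again a quick sketch of mine, not output from any model):

```python
# End-to-end tally under the plain arithmetic reading of the puzzle.
flock = 17 - 9  # 9 ran away -> 8 left
flock *= 2      # bought enough to double what he had -> 16

# The neighbor gives 6 of her 18 animals. With only 4 dogs available,
# the gift must contain between 6 - 4 = 2 and 6 sheep.
min_gift, max_gift = 6 - 4, 6

sold = 5 + 5    # 5 on Monday, plus 5 more "the next day"

print(flock + min_gift - sold, "to", flock + max_gift - sold)
# -> 8 to 12, so the "6 to 12" lower bound in the quoted answer is impossible
```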
I'm really not interested in this Calvinball argument where we try to conclude whether or not LLMs can do math by avoiding as much as possible actually doing math.
A concise problem that requires actual logic will naturally seem a bit convoluted, but an intelligent being can sit down and work it out logically. Anyway, it's not an argument. It's empirical evidence that supports my argument. You have chosen to ignore it or otherwise rationalize it away. Nothing I can do about that.
But the systems that do that impressive work are no longer just LLMs. Look at the Claude Code leak - it’s a sprawling, redundant maze relying on tools and tests to approximate useful output. The actual LLM is a small portion of the total system. It’s a useful tool, but it’s obviously not truly intelligent - it was hacked together using the near-trillions of dollars AI labs have received for this explicit purpose.
What does this matter? You can build a working coding agent for yourself extremely quickly; it's remarkably straightforward to do (more people should). But look underneath all the "sprawling tools": the LLM itself is a sprawling maze of matrices. It's all sprawling, it's all crazy, and it's insane what they're capable of doing.
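To put some flesh on "remarkably straightforward": the core of such an agent is a loop like the toy sketch below. This is mine and purely illustrative; `call_model` is a stand-in for whatever LLM client you use (nothing here is a real API), and the single shell tool is just to show the shape.

```python
# A toy agent loop: a human prompt seeds the conversation, then the
# model's own outputs (tool requests) cascade back in as new context.
import subprocess

def call_model(messages: list[dict]) -> dict:
    """Stand-in for a real LLM API call; plug in your client here."""
    raise NotImplementedError

def run_tool(name: str, arg: str) -> str:
    # One illustrative tool: run a shell command and return its output.
    if name == "shell":
        proc = subprocess.run(arg, shell=True, capture_output=True, text=True)
        return proc.stdout + proc.stderr
    return f"unknown tool: {name}"

def agent(task: str, max_steps: int = 25) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)  # model decides: answer, or use a tool
        messages.append({"role": "assistant", "content": reply.get("content", "")})
        if "tool" in reply:           # tool call -> execute it, feed result back
            result = run_tool(reply["tool"], reply["arg"])
            messages.append({"role": "tool", "content": result})
        else:                         # no tool call -> that's the final answer
            return reply["content"]
    return "step limit reached"
```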
Again if you want to say they're limited in some way, I'm all ears, I'm sure they are. But none of that has anything to do with "statistical text generation". Apparently, a huge chunk of all knowledge work is "statistical text generation". I choose to draw from that the conclusion that the "text generation" part of this is not interesting.
Well, hang on a second - it sounds like you may actually disagree with the user who created this thread. That user claims that these systems exhibit “real intelligence”, and success on this Erdos problem is proof.
You seem to be making the claim that LLMs are statistical text generators, but statistical text generation is good enough to succeed in certain cases. Those are different arguments. What do you actually believe? Are we even in disagreement?
I don't have any opinion about "real intelligence" or not. I'm not a P(doom)er, I don't think we're on the brink of ascending as a species. But I'm also allergic to arguments like "they're just statistical text generators", because that truly does not capture what these things do or what their capabilities are.
(The clearer way for me to have said this is that I don't care whether they're According-to-Hoyle "intelligent", and that controversy isn't what motivated me to comment).
"But I'm also allergic to arguments like "they're just statistical text generators", because that truly does not capture what these things do or what their capabilities are."
Umm, why doesn't it capture it? Why can't a statistical text generator do amazing things without _actually_ being intelligent (I'm thinking agency here)? I think it's important to remind ourselves, these things do not reflect or understand what they're outputting. That is 100% evident with the continuing issues with them outputting nonsense along with their apparently insightful output. The article itself said the output was poor but the student noticed something about it that sparked an idea and he followed that lead.
I reject the premise. I read the outputs I generate carefully (too carefully, probably). They don't "continue to output nonsense". Their success rate exceeds that of humans in some places.
To clarify: the problem I have with "statistical text generator" isn't the word "statistical". It's "text generator". It's been two years now since that stopped being a reasonable way to completely encapsulate what these systems do. The models themselves are now run iteratively, with an initial human-defined prompt cascading into a series of LLM-generated interim prompts and tool calls. That process is not purely, or even primarily, one of "text generation"; it's bidirectional, and involves deep implicit searches.
Do you think it's akin to Ilya's [1] claim that next token prediction is reality? E.g., any deeper claims about the structure of that intelligence, or comparisons to humans?
To be clear, I'm 100% with you that "next token predictor" is stupid to call what these machines are now. We are engineers and can shape the capability landscape to give rise to a ton of emergent behavior. It's kind of amazing. In that sense, being precise about what's going on, rather than being essentialist (technically, yes, the 'actual' algorithm, whatever that even means, is text prediction), is just good epistemology.
I still think it's a very interesting question, though, to ask about deeper emergent structures. To me, this is evidence of a more embedded-cognition kind of theory of intelligence (admittedly this is not very precise). But IDK how into philosophy you are.
I try really hard not to think about this stuff because I've seen how people talk when they get too deep into it. My mental model, or mental superstructure, if you will, for all of this stuff is that we've discovered a fundamentally novel and effective way of doing computing. Computer science is fascinating and I'm there for it, and prickly when people are dismissive of it. I'm generally not interested in the theory of human intelligence (it's a super interesting problem I just happen not to engage with much), which spares me from a lot of crazy Internet stuff.
Just to clarify because I’m not sure I understand:
So you agree that LLMs are in fact statistical text generators, but you don't like people using that fact in arguments about the capabilities of the things?
Not parent but I think you're being rather dense. They are _obviously_ statistical text generators. There's plenty of source code out there, anyone can go and inspect it and see for themselves so disputing that is akin to disputing the details of basic arithmetic.
But it is no longer useful to bring that fact up when conversing about their capabilities. Saying "well, it's a statistical text generator, so ..." is approximately as useful as saying "well, it's made of atoms, so ...". There are probably some very niche circumstances under which statements of either form are useful, but by and large they are not, and you can safely ignore anyone who utters them.
It is still important to mention that because atoms have limitations and so do statistical generators. Plain and simple. People are walking around thinking organic brains are just statistical generators and they're gonna build AGI with GPUs. It's absurd.
And your evidence for these claimed limitations is ... ? I'm not aware of evidence either for or against organic brains being "just" statistical generators. Neither am I aware of evidence either for or against AGI being possible to achieve using GPUs. AFAICT you're just making things up.
I think you're actually making a point, but overall I still disagree.
I do think LLMs are evolving towards this kind of embodied-cognition-type intelligence, in virtue of how well they interoperate with text. I mean, you don't need to "make the text intelligible" to the LLM; the LLM just understands all kinds of garbage you throw at it.
Now the question is: Is intelligence being able to interoperate?
In the traditional sense, no. Well, in a loose sense, yes, because people would've said that intelligence is the ability to do anything, but that's not a useful category (otherwise, traditional computer programs would be "intelligent"). But when I hear that, I think something like "The models can represent an objective reality well, it makes correct predictions more often than not, it's one of these fictional characters that gets everything and anything right". This is how it's framed in a lot of pop culture, and a lot of "rationalist" (lesswrong) style spaces.
But if LLMs can understand a ton of unstructured intent and interoperate with all of our software tools pretty damn well... I mean, I would not call that "a bunch of hacks". In some sense, this is an appeal to the embedded cognition program. The brain-in-a-vat approach to intelligence fails.
But it clearly enables new capabilities that were previously only possible with human intelligence. In a very blatant negative form: the surveillance state is now 100% possible with AI. It doesn't take deep knowledge of quantum physics to implement; it takes a large amount of engineering effort, data pipelines and data lakes, and LLMs spread throughout the system, monitoring victims.
So I'd call it intelligence, but with a qualifier to not slip between slippery slopes. It may even be valid to call the previous notion of intelligence a bad one, sure. But I think the issue you may be running into is that it feels like people are conflating all sorts of notions of intelligence.
Now, you can add an ad hoc hypothesis here: In order to interoperate, you have to reason over some kind of hidden latent space that no human was able to do before. Being able to interoperate is not orthogonal to general intelligence - it could be argued that intelligence is interoperation.
If you're arguing for embodied cognition, fine, we agree to some extent :)
The fear is that the AI clearly must be able to emulate, internally, a latent space that reflects some "objective notion of reality". If it did that, then shit, this just breaks all of the victories of empiricism, man. Tell me about a language model that can just sit in a vat, and objectively derive quantum mechanics by just thinking about it really hard, with only data from before the 1900s.
I don't think you need to be this caricature of intelligence to be intelligent, is what I'm saying, and interoperability is definitely a big aspect of intelligence.
Now this I can agree with. One thing that is extremely important to maintain with this technology is nuanced perspective. Otherwise, it will lead you astray quickly. It's also a difficult thing for us to maintain.
Solving open math problems is strong evidence of intelligence, so there's not really any need for rationalization? I don't understand why intelligence would require intent or motive. Isn't intent just the behaviour of making a specific thing happen rather than other things?
I haven't used stable diffusion enough to have a strong opinion on it. But my thinking is LLMs have only recently started contributing novel solutions to problems, so maybe there is some threshold above which there's less sloppy remixing of training data and more ability to form novel insights, and image generators haven't crossed this line yet.