Hacker News | xpe's comments

> Those tools seem mostly useful for a Google alternative, scaffolding tedious things, code reviewing, and acting as a fancy search.

Just to get a sense for the rate of change, imagine if you took a survey. Compare what people said about AI tools... 3 years ago, 2 years ago, 1 year ago, 6 months ago. Then think about what people will plausibly be saying in 3 months, 6 months, 9 months ...

Moving the goalposts has always happened, but it is happening faster than I've ever seen it. Many people seem to redefine their expectations on a monthly basis now. Worse, they seem to be unaware they are doing it.

Fancy search? Ok, I'll bite. Compare today's "fancy search" to what we had ~3 years ago according to your choice of metric. Here's one: minutes spent relative to information found. Today, in ~5 minutes I can do a literature review that would have taken me easily 10+ hours five years ago. We don't need to argue phrasing when we can pick some prototypical tasks and compare them.

We're going to have different takes about where various AI technologies will be in these future timelines. It is much better to run to where the ball is likely to be, even if we have different ideas of where that is.

The human brain, at best, struggles to grasp even linear change. But linear change is not a good way to predict compounding technological change.


> Today, in ~5 minutes I can do a literature review that would have taken me easily 10+ hours five years ago.

And it will not yield the same outcome you would have had. Your own taste in clicking links and pre-filtering as you do your research is no longer being applied if you outsource this. I'm guilty of this myself. But let's not kid ourselves.

I've had GPT Pro think 40 minutes about the ideal reverse osmosis setup for my home. It came up with something that would have been able to support 10 houses and cost 20k, even though I told it all about what my water consumers are and that it should research their peak usage. It just failed to observe that you can buffer water in a tank.

There's a reason they let you steer GPT Pro as it goes now.


I don't claim using AI is the same as doing it yourself. My point is that AI capabilities are much more extensive than "fancy search". By giving a metric and an example I hoped to make that point without getting into hair-splitting.

I wouldn’t call that hair-splitting. I’m saying, it’s not a real literature review, but even fancier search.

Words hint at concept space, which is messy and interconnected. I think a charitable reading can understand the difference between "powerful search, kind of like Google as of 2020, or LexisNexis" and LLM-AI chatbot interfaces... I would hope. But I've been developing software since the 1980s, so I can't speak for the newer generations who might not have a quadruple-decade view. I was at meetups in San Francisco around 2018 where people were excited to find multimodal reasoning in early-days proto-language models. There have been qualitatively noticeable historical shifts. We don't have to agree on the exact labels used, but what LLMs enable is different enough from e.g. ElasticSearch of 2020 to call out.

Your quoted example to make that point isn't particularly convincing, IMO. Cursor came out in 2023 and everything on that list would be a typical use case, plus ChatGPT for the search replacement.

Of course, it wasn't nearly as effective back then compared to current SOTA models, but none of those are hard to imagine someone recommending Cursor for anytime in 2024 or later.

If OP had instead said something like one-shotting an entire line-of-business app with 10k LoC, I would agree with your reminder about perspective. But it feels somewhat hype-y to say that goal posts are being moved "monthly" when most of their list has been possible for years.


I was attempting to give an example to say that AI-LLM technology is more than "fancy search", which to me sounds like "search engine". / I realize now that ChatGPT was released in late 2022, more than 3 years ago. Time flies.

> But it feels somewhat hype-y to say that goal posts are being moved "monthly"...

Here's what I mean. Imagine what you would see if you kept a journal and once a day wrote down:

1. what impressed you about AI that day;

2. what you did with it that day that you pretty much took for granted ("just SoTA").

Then compare today against 30 days ago. A lot changes! My point is that it is getting harder to impress us: our standard for what we expect seems to be changing significantly on a ~monthly basis. What does this rate of change where you "just expect something to work as table stakes" feel like to you? Certainly faster than annually, right? 6 months? 3? 2? 1?

For me, a lot of this isn't just the raw technology but also socialization of what the tools can do and the personal experience of doing it yourself.


Can you explain this literature review process?

I don't believe you can do the same quality job with an LLM in 5 minutes.


I don't mean writing a literature review. I mean reviewing the literature to find what I need. My point is that this was not practical with "fancy search" three years ago, by which I mean Google-like search engines.

My example: I wanted to get a sense for the feasibility of doing a project that blends Gaussian Processes, active learning, and pairwise comparisons. So I want to dig into the literature to find out what is out there. This was around 5 minutes with Claude. In this case, I don't think I could have found what I wanted in 10 hours of searching and reading. This is the kind of thing that great LLMs unlock.
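
To make that blend concrete, here is a toy sketch I just wrote for this comment -- it is not the project itself, and every name and number in it is invented. It combines a GP prior over a latent utility, a Bradley-Terry-style likelihood over pairwise comparisons, and a crude active-learning rule that asks for whichever comparison the current fit is least sure about (standard numpy/scipy only):

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)

    # Candidate items on a 1-D grid; the GP prior is over their latent utilities.
    X = np.linspace(0.0, 1.0, 25)
    true_utility = np.sin(3 * X)                 # hidden "ground truth" preferences

    def rbf_kernel(a, b, length=0.2, var=1.0):
        d = a[:, None] - b[None, :]
        return var * np.exp(-0.5 * (d / length) ** 2)

    K = rbf_kernel(X, X) + 1e-5 * np.eye(len(X)) # GP prior covariance (with jitter)
    L = np.linalg.cholesky(K)                    # whitened parameterization: f = L @ z

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def observe(i, j):
        # Simulate a noisy pairwise comparison: +1 if item i is preferred over item j.
        p = sigmoid(true_utility[i] - true_utility[j])
        return 1 if rng.random() < p else -1

    def neg_log_posterior(z, comparisons):
        # Bradley-Terry-style likelihood plus a standard-normal prior on the whitened weights.
        f = L @ z
        nlp = 0.5 * z @ z
        for (i, j, y) in comparisons:
            nlp -= np.log(sigmoid(y * (f[i] - f[j])) + 1e-12)
        return nlp

    def map_utilities(comparisons):
        res = minimize(neg_log_posterior, np.zeros(len(X)), args=(comparisons,))
        return L @ res.x

    def next_query(f, n_candidates=200):
        # Crude active-learning rule: ask about the pair whose predicted outcome is
        # closest to 50/50. (Uses only the MAP fit; ignores posterior variance.)
        best, best_gap = None, np.inf
        for _ in range(n_candidates):
            i, j = rng.choice(len(X), size=2, replace=False)
            gap = abs(sigmoid(f[i] - f[j]) - 0.5)
            if gap < best_gap:
                best, best_gap = (i, j), gap
        return best

    comparisons, f_hat = [], np.zeros(len(X))
    for _ in range(30):                          # 30 actively chosen comparisons
        i, j = next_query(f_hat)
        comparisons.append((i, j, observe(i, j)))
        f_hat = map_utilities(comparisons)

    # The learned ranking should roughly recover the hidden one.
    print("true best item   :", X[np.argmax(true_utility)])
    print("learned best item:", X[np.argmax(f_hat)])

Picking the pair closest to 50/50 is about the crudest acquisition rule possible; the literature I was digging for is largely about doing that step properly.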


It is a better investment to read about those things for a bit, in my experience. It should not be scary or niche to take some time and read a textbook or a high-quality paper.

There is no replacement for reading textbooks or high quality papers.

If you are saying that you didn't do this kind of thing anyway and now you can, then I would question the definition of the action you are doing, because it is not the same in my opinion.


This is just "classic" (but avoidable) miscommunication. I don't even disagree with you on your point! I'm only saying you are not reading my point in the context it was offered.

For more background, please read Rapoport's Rules: https://themindcollection.com/rapoports-rules/

> Rapoport’s Rules, also known as Dennett’s Rules, is a list of four guidelines that detail how to interpret arguments charitably and criticise constructively. The concept was coined by philosopher Daniel C. Dennett in his book Intuition Pumps. Dennett acknowledged our proclivity to misinterpret and attack a counterpart’s argument instead of engaging meaningfully with what was actually said.


You're relying on the public's sentiment as a metric. The public's sentiment is, more often than not, skewed, influenced by marketing, or flat out wrong. That is not a good metric to rely on.

Did it ever occur to you that the ever changing goalposts might have more to do with the expensive marketing campaigns of the big LLM providers?

We could talk about what's a measurable metric and what's not. Certainly, we have little more than "benchmarks", whose veracity, honestly, I don't know, or whether the big LLM providers somehow cheat, or whether the performance is even stable. The core idea is that LLMs remain able to do exactly what they were able to do back at release: text prediction. They got better in some regards, sure.

Your example is worrisome to me. It should be to you too. You didn't write a literature review; you generated a scaffold of a literature review, with the same vices of LLM-based writing as anything else it produces, and it still needs reviewing and revising. I would hope you rewrite it so your work isn't associated with LLM generation. For better or worse, you still normally need to revise your work. Because, once again (this point seems to be difficult to grasp), a text predictor is not a reliable source of information. We make tradeoffs, sacrificing reliability for ease of use, but any real work needs human reviewing, which goes back to my first point. In this example it's doing nothing other than being a fancy search and scaffolding tool.

The ball is likely to be in the same place because, once again, they're text predictors. Not sentient beings, or intelligent. Still generating text, still hallucinating, probably even more so thanks to the ever-increasing amount of LLM-written content on the internet and initiatives like poison fountain doing a number on the generated content.

It's wild to me to make such claims about the rate of change of those tools. You're claiming we'll see exponential gains for those tools, I take it, while completely ignoring the base set of constraints those models will never be able to get rid of. They only know how to produce text. They don't, and never really will, know if it's right.


Hi. I read your message, and I considered it. I've also read some of your previous HN comments. Briefly, I'll just say I've argued at length against many of the claims you make (you certainly aren't alone in making them). I don't feel it would be useful to repeat these again here, but I'll reference a few, below, just to show that I do care about the subject matter and am happy to dig deeper ...

... but only with certain conversational norms. I say this because I predict we aren't (yet) matched up in a way that would make a conversation useful to us. The main reason (I guess) isn't about our particular viewpoints, nor about, say, whether we're both critical thinkers. We're both demonstrating that frame, at least in our language. Instead, I think it is about the way we engage and what we want to get out of a conversation. Just to pick one particular guide star, I strive to follow Rapoport's Rules [1]. FWIW, the HN Guidelines are not all that different, so simply by commenting here, one is explicitly joining a sort of social contract that points in their direction already.

Anatol Rapoport and Daniel Dennett were brilliant not only in their areas of specialty but also in teaching us how to criticize constructively in general. I offer the link at [1] just in case you want to read the rules and give them a try here. We can start the conversation over (if you want).

---

In response to your comments about consciousness, intelligence, etc, here are some examples of what I mean by intelligence and why:

- intelligence: https://news.ycombinator.com/item?id=43236444

- general intelligence: https://news.ycombinator.com/item?id=43223521

- pressure towards AGI: https://news.ycombinator.com/item?id=41707643

- intelligence as "what machines cannot do" / no physics-based constraints to surpass human intelligence: https://news.ycombinator.com/item?id=44974963

---

[1]: https://assets.edge.bigthink.com/uploads/attachment/file/151...


> ... but it seems like Anthropic is going for the Tinder/casino intermittent reinforcement strategy: optimized to keep you spending tokens instead of achieving results.

This part of the above comment strikes me as uncharitable and overconfident. And, to be blunt, presumptuous. To claim to know a company's strategy as an outsider is messy stuff.

My prior: it is 10X to 20X more likely that Anthropic has done something other than shift to a short-term squeeze-the-customers strategy, which I put at only around ~5% (if the squeeze strategy is ~5%, everything else combined is ~95%, i.e. roughly 19X more likely).

What do I mean by "something other"? (1) One possibility is they are having capacity and/or infrastructure problems, so the model performance is degraded. (2) Another possibility is that they are not as tuned to what customers want relative to what their engineers want. (3) It is also possible they have slowed their models down due to safety concerns. To be more specific, they are erring on the side of caution (which would be consistent with their press releases about safety concerns of Mythos). Also, the above three possibilities are not mutually exclusive.

I don't expect us (readers here) to agree on the probabilities down to the ±5% level, but I would think a large chunk of informed and reasonable people can probably converge to something close to ±20%. At the very least, can we agree all of these factors are strong contenders: each covers maybe at least 10% to 30% of the probability space?
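
To make that "probability space" talk concrete, here is the kind of toy budget I have in mind (my rough, purely illustrative weights -- not data, and not a claim about Anthropic):

    # Rough subjective weights for the competing explanations; normalize and print.
    raw = {
        "capacity / infrastructure degradation": 0.30,
        "tuned to engineers, not customers":     0.25,
        "slowed down out of safety caution":     0.20,
        "short-term customer squeeze":           0.05,
        "something else entirely":               0.20,
    }
    total = sum(raw.values())
    for name, weight in raw.items():
        print(f"{name:40s} {weight / total:5.1%}")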

How short-sighted, dumb, or back-against-the-wall would Anthropic have to be to shift to a "let's make our new models intentionally _worse_ than our previous ones" strategy? Think on this. I'm not necessarily "pro" Anthropic. They could lose standing with me over time, for sure. I'm willing to think it through. What would the world have to look like for this to be the case?

There are other factors that push back against claims of a "short-term greedy strategy" argument. Most importantly, they aren't stupid; they know customers care about quality. They are playing a longer game than that.

Yes, I understand that Opus 4.7 is not impressing people, or worse. I feel similarly based on my "feels", but I also know I haven't run benchmarks nor have I used it very long.

I think most people viewed Opus 4.6 as a big step forward. People are somewhat conditioned to expect a newer model to be better, and Opus 4.7 doesn't match that expectation. I also know that I've been asking Claude to help me with Bayesian probabilistic modeling techniques that are well outside what I was doing a few weeks ago (detailed research and systems / software development), so it is just as likely that I'm pushing it outside its expertise.


> To claim to know a company's strategy as an outsider is messy stuff.

I said "it seems like". Obviously, I have no idea whether this is an intentional strategy or not and it could as well be a side effect of those things that you mentioned.

Models being "worse" is the perceived effect for the end user (subjectively, it seems like the price to achieve the same results on similar tasks with Opus has been steadily increasing). I am claiming that there is no incentive for Anthropic to address this issue because of their business model (maximize the amount of tokens spent and price per token).


>>> ... but it seems like Anthropic is going for the Tinder/casino intermittent reinforcement strategy: optimized to keep you spending tokens instead of achieving results.

>> This part of the above comment strikes me as uncharitable and overconfident. And, to be blunt, presumptuous. To claim to know a company's strategy as an outsider is messy stuff.

> I said "it seems like".

Sorry. I take back the "presumptuous" part. But part of my concern remains: of all the things you chose to write, you only mentioned "the Tinder/casino intermittent reinforcement strategy". That phrase is going to draw eyeballs, and you got mine at least. As a reader, it conveys you think it is the most likely explanation. I'm trying to see if there is something there that I'm missing. How likely do you think this is? Do you think it is more likely than the other three I mentioned? If so, it seems like your thinking hinges on this:

> I am claiming that there is no incentive for Anthropic to address this issue because of their business model (maximize the amount of tokens spent and price per token).

No incentive? Hardly. First, Anthropic is not a typical profit-maximizing entity; it is a Public Benefit Corporation [1] [2]. Yes, profits still matter, but there are other factors to consider if we want to accurately predict their actions.

Second, even if profit maximization is the only incentive in play, profit-maximizing entities can plan across different time horizons. Like I mentioned in my above comment, it would be rather myopic to damage their reputation with a strategy that I summarize as a short-term customer-squeeze strategy.

Third, like many people here on HN, I've lived in the Bay Area, and I have first-degree connections that give me high confidence (P>80%) that key leaders at Anthropic have motivations that go much beyond mere profit maximization.

Anthropic's AI safety mission is a huge factor and not the PR veneer that pessimists tend to claim. Most people who know me would view me as somewhat pessimistic and anti-corporate and P(doomy). I say this to emphasize I'm not just casting stones at people for "being negative". IMO, failing to recognize and account for Anthropic's AI safety stance isn't "informed hard-hitting pessimism" so much as "limited awareness and/or poor analysis".

I'm not naive. That safety mission collides in a complicated way with FU money potential. Still, I'm confident (P>60%) that a significant number (>20%) of people at Anthropic have recently "cerebrated bad times" [3] i.e. cogitated futures where most humans die or lose control due to AI within ~10 to ~20 years. Being filthy rich doesn't matter much when dead or dehumanized.

[1]: https://law.justia.com/codes/delaware/title-8/chapter-1/subc...

[2]: https://time.com/6983420/anthropic-structure-openai-incentiv...

[3]: Weird Al: please make "Cerebration" for us.


I like your style, and I appreciate you trying to get to the truth, despite us both being aware that we are engaging in persuasive writing here, so part of the rhetorical game is in what we choose to emphasize and what we choose to leave out.

> How likely do you think this is? Do you think it is more likely than the other three I mentioned?

I won't write down probability estimates, because frankly, I have no idea. Unless you are yourself a decision-maker at Anthropic, which, from what I can infer, you aren't, both of us are speculating. However, I can try to address each of your explanations at face value, because I don't think any of them makes Anthropic look any better than the explanation I provided.

> (1) One possibility is they are having capacity and/or infrastructure problems so the model performance is degraded.

As far as I understand it, scaling issues would result in increased latency or requests being dropped, not model quality being lower. However, there is a very widespread rumor that Anthropic is routing traffic to quantized models during peak times to help decrease costs. Boris Cherny, Thariq Shihipar, and others have repeatedly denied this is happening [1]. I would be more concerned if this were the actual explanation, because as a user of the Claude Code Max plan and of the API, I have the expectation that each dollar I spend buys me access to the same model without opaque routing in the background.

> (2) Another possibility is that they are not as tuned to what customers want relative to what their engineers want.

There is actually a strong case for this: the high performance on the benchmarks relative to the qualitatively low performance reported on real-world tasks after launch. I suspect quite a bit of RL training was spent optimizing for beating those benchmarks, which resulted in overfitting the model on particular kinds of tasks. I'm not claiming this is nefarious in any way or that it is something only Anthropic is guilty of doing: these benchmarks are supposed to be a good representation of general software tasks, and using them as a training ground is expected.

> (3) It is also possible they have slowed their models down due to safety concerns. To be more specific, they are erring on the side of caution (which would be consistent with their press releases about safety concerns of Mythos).

This would be the most concerning to me. I don't want to get too deeply into a political/philosophical argument, but I am very much on the other side of the e/accy vs. P(doomy) debate, and I strongly believe that keeping these tools under the control of some council of enlightened elders who claim to know what is best for humanity is ultimately futile.

If the result of the behind-the-scenes "cerebration" is an actual effort to try and slow down AI development or access, I don't have much confidence in the future of Anthropic.

I agree that there are incentives other than pure profit maximization here (I don't want to get into "my friend at Anthropic told me such and such" games, but I also believe this is the case). I'm sure there is some tension between these objectives inside Anthropic, but what is interesting is that lower model quality and maximizing user engagement could, at least in principle, align with both constraints.

[1] https://x.com/trq212/status/2043023892579766290


I strive to be decently Bayesian and embrace uncertainty. I'm sharing my probability estimates because it helps me to stop and think ("is this roughly what I think?" and "let me spend a minute making sure before I say so"). But yeah, of course, they are my priors and fuzzy. Hopefully, with some reflection, I can figure them out to within +/- 15% or so. But at least you can see how my takes compare with each other. And down the road I can see how I did.

Thanks for getting into some of the details ...

>> (1) One possibility is they are having capacity and/or infrastructure problems so the model performance is degraded.

> As far as I understand it, scaling issues would result in increased latency or requests being dropped, not model quality being lower.

Yes, many scaling issues would manifest in that way -- but not all. It seems plausible for Anthropic to have other ways to degrade model performance that don't show up in the latency or reliability metrics. I need to research more... (I'll try to think more on your other points later).


Explore help in all the forms you can find. Feel free to contact me if you like. I'm happy to meet people and do a phone call or video call. I'm not hard to find with a little digging.

I've been through my fair share of situations in my life, and I don't think I have any illusions about the places AI could go. I'm definitely not an optimist -- and I don't think being naively optimistic is something we want in everyone! -- but I'm still fighting. I think it takes a lot more strength to say "the world is looking pretty messed up and not getting better if I just sit here, so here we go..." Here are some things I suggest:

- Seek connection and community. This depends where you live, but get out there. Coffee shops, volunteering, just saying hello to people.

- Fill your brain with interesting thoughts. If you are feeling rough or depressed, you naturally may want to feel lifted up most of the time. Everybody needs a break, a laugh, some levity, or at least a change.

- But not everyone, at least not all the time, truly wants a fake sense of "everything is going to be fine". Sometimes we need to find people that are fully engaged in reality who say "yes, this is unacceptable and not getting better anytime soon but we're not giving up". And they find ways to still move forward.

- Back to the personal connections again: it helps me to know people who have come to the US from other places in worse conditions. It helps to know that people can move forward even when many things are terribly broken. From this point of view, humanity really can be impressive. (To me, sometimes I'm most critical of people who get complacent when things seem good.)

- So, to me, and many others (Stoics especially), pessimism has a huge role to play. Things could go very badly. There are no guarantees. So get prepared -- getting prepared for tough times is a concrete activity that has meaning.

- More sunshine for you: You might benefit most by reading some really hard-hitting authors. Read about how f--ked up wars can be, how precarious the Cold War was. But somehow we made it through. There are no do-overs. Hopefully people realize the best time to do something was yesterday, but today is pretty good too!

- Train your mind. It feels good to invest in your own thinking. I recommend finding the most substantive and engaging material you can find about understanding how your brain works. For many people, this opens up a whole new set of tools.

- Find sources of inspiration. Personally, I'm a secular humanist. I've found great wisdom in the book Replacing Guilt by Nate Soares. You might too. https://replacingguilt.com/toc/ Some of my favorite sections are:

See the dark world: https://mindingourway.com/see-the-dark-world/

Detach the grim-o-meter: https://mindingourway.com/detach-the-grim-o-meter/

Dark, Not Colorless: https://mindingourway.com/dark-not-colorless/

> The last arc of posts has been about how to handle a dour universe. Become unable to despair, learn to see the darkness rather than flinching from it, learn to choose between bad and worse without suffering. Learn to live in a grim world without becoming grim yourself, learn to hear bad news without suffering, and stop needing to know your actions were acceptable. Come to terms with the fact you may lose, use the darkness as a source of fuel, and let go of dreams of total victory. These are the tools I use to tap into intrinsic motivation, in a precarious world where the problems are larger than I am.


> More opportunities will be posted here in the coming months. Click here to sign up for updates to stay informed when new roles open.

Which links to: https://lp.constantcontactpages.com/su/sKWkWfp

Would anyone like to do some citizen journalism and see if the Constant Contact data handling is done above-board? I've done some Claude research -- enough to make me suspicious -- but I Am Not A Lawyer.


I understand the spirit of this comment (and I get it), but we want the opposite to be true. Let's find ways to support good people who step up.

Edits (in case my meaning above is not clear):

1. When I write "but we want the opposite to be true" I mean this: if only Trump-aligned or Trump-tolerant people sign up for these roles, I do not think this is desirable for NASA.

2. When I write "I understand the spirit of this comment (and I get it)" I mean: from an individual point of view, I fully grant that many people would be better off seeking work elsewhere.

3. My experience and scientific research show that people are not merely selfish actors. While individual incentives matter a lot, perhaps even predominantly, it isn't accurate to claim that we can fully explain human behavior with exclusively narrow individualist framings.

4. Many of us act selfishly much of the time, and this is indeed reasonable and even beneficial at times. But taken to an extreme it can be worse overall, even for those individuals. See: game theory, social connections, morality, and so on.

5. When I write "Let's find ways to support good people who step up" I do mean concrete things such as "let's crowdfund ethical people's legal fees" to survive the Trump administration.


Given what we're facing, I am actually skeptical of people who step up to work for the government at this moment in time. There's a lot of nationalist language on this site. Even if your motivations are for science, do we really want to give any assistance to the goals of this administration?

> I am actually skeptical of people who step up to work for the government at this moment in time.

I'm sure this wounds them deeply.

Given what we're facing worldwide, I'd say more people are skeptical of anyone that works in tech at this moment in time.

>There's a lot of nationalist language on this site.

Incredibly, the US government isn't anti-US. This may come as a surprise to some in certain online bubbles.

>do we really want to give any assistance to the goals of this administration?

The goals of going to the moon? You're right, it's a giant waste of money when there are problems to be solved on earth. Something many people have been saying for a long time. Glad you're coming around.


I think it's a bit of, "Be the change you want to see". It may not be a bad thing to get tech folk with sense into these roles. They probably tend to have enough of a cushion to be able to refuse unethical work without worrying about the immediate consequences.

NASA had a nationalist origin and has always kept those undertones even in the modern day, but I don't think anyone's ever accused it of being partisan. I don't believe many Americans associate NASA with any particular president, except maybe JFK, and I don't believe they'd conflate working for NASA with working for Trump.

I think part of the point of OP was that this isn't a good way to support people to step up. It's frankly bizarre and has dubious future prospects like any other federal program under the current administration.

Good people need to make a living too!

These job postings opened today on April 17 and close in four days (on April 21). This is highly compressed and highly unusual.

Being no fan of the current administration and its hangers-on, I quickly jump to less flattering reasons for these short time windows. A four-day application window favors people they want to select. They may well have told certain people in advance to be ready. I don't have direct "proof" of this, and I'm open to learning more, but the current administration has beyond exhausted any presumption of fair dealing.

I encourage anyone and everyone interested to apply and report back. NASA has a good mission, and it needs people with a moral backbone and an intrinsic pro-science drive.


I initially thought this was a call for technologists to commit to volunteering on a deep technical project for four days. That’s not enough time to design a component. But it might e.g. let some minor work on a protocol advance.

That has been the assumption in most of these cases. The agency must already have a list of people they want, so a short window limits the risk of someone else jumping to the front of the queue.

I’m sure it’s based on merit

This. / Who remembers the "birth" of crowdfunding? Why did so much seem to happen all at once? The most likely explanation, imo, is that it was "in the air" -- we share culture and ideas. These ideas don't have to be stolen to co-occur... quite the opposite.

The human brain strikes again. It is built into our cognitive machinery to look for patterns and naively ascribe causation. We're not rational beings that sometimes mess up. We're a clusterf--k of cognitive biases all the way down.*

Cool pattern! Sure, maybe there is something there.** And/or maybe our brain is doing "conspiracy theorizing lite". It's all on the same spectrum -- the same flawed cognitive machinery trying to operate in a weird modern world quite different from where we came from.

A better way: write out your favorite hypothesis. But don't stop there... keep going... write out many hypotheses. Then find ways to test them. To tap into our best selves, I recommend The Scout Mindset (book). Here is an infographic summary of part of it: https://imgur.com/qN31PX8

Probably not a better way: floating one's first gut feel to the Internet, phrased as if it were the better question, and feeding empty calories to our pattern-craving brains. There is a reason some of our brain functions are considered higher-order.

* Maybe I'm overstating this. Let me know? I want to read Rationality and the Reflective Mind by Keith Stanovich (https://academic.oup.com/book/5930) as a counterpoint to the usual suspects (such as Tversky & Kahneman).

** But what is there? What kind of pattern? What kind(s) of causation could be at work? See Judea Pearl's "ladder of causation". Nice write-up here: https://samuel-book.github.io/causal_inference_notebook/pear...


We cannot trust identity like we used to here on HN (even pre-LLM-AI, I thought we seemed naive). Unfortunately, we live in a world where anyone or any AI can claim almost anything plausible-sounding.

Where do we go from here? (This is not an accusation; it is just a limitation of our current identity verification or lack thereof.)


You can confirm that the people who say things are in a position to know.

Please don't forget that OpenAI's leadership has shown the world what it is really made of.
