Not the first to notice this, I'm sure, but it feels like there's an insane amount of pressure pushing capital toward anything with a hint of AI legitimacy. It's as if asset owners across the planet have come to a consensus that this is the only industry that will matter going forward (fair enough, I guess), but that intense systemic pressure squeezes enormous amounts of money toward literally any AI-shaped outlet that opens up. It's starting to feel like "scared and desperate" money more than "smart money".
Is it not a case of many funders not wanting to risk missing out on the next big thing? And isn't losing a few billion now better than losing many billions down the line, along with control of the future?
Of course the motivation makes sense on the surface. What I'm getting at is that the supply of capital versus the supply of potential "control of the future" plays feels incredibly imbalanced. Money seems so desperate to move into AI that it's lost all prudence (the particular people and company mentioned in the OP notwithstanding; maybe they do deserve 1B).
"not wanting to risk missing out" is essentially just FOMO right? "Smart" money has feels more like FOMO money these days. We literally have shoe companies savying they're going to pivot to AI and having their market cap increase in multiples as reward.
I don't think Silicon Valley has been smart money for a decade plus. Quantum computing is becoming the exact same with academic and government funding, with a lot of cash being spent on long shots or no-hopers.
It clearly was, at least in part. Somehow, it feels just right here: Man trusts AI to do the right thing and it burns him. 5 minutes later, man trusts AI to explain what happened on X.
I like the way the LLM implies that an API call should have a “type DELETE to confirm”. That would make no sense, and no human would ever suggest or want that, I hope.
Never thought I'd see the day ragebait made it to HN. Yes, let's pretend doing a long jump on the moon is comparable to running a marathon at its prescheduled time and prescheduled location. Weather is always a factor in sports that take place outside. Might as well put asterisks on all accomplishments that took place on sunny days, by your logic, right?
Not sure I understand what you mean by "scientific." If you mean exactly reproducible, then almost nothing in athletics fits that definition. Every record in baseball, football, etc. would fail that definition.
I’m deeply interested and invested in the field but I could really use a support group for people burnt out from trying to keep up with everything. I feel like we’ve already long since passed the point where we need AI to help us keep up with advancements in AI.
This one’s been particularly hard to sit out because the executive and managerial class are absolutely mainlining this stuff and pushing it hard on the rest of the organization, and so whether or not I want to keep up, I need to, because my job is to actually make stuff work and this stuff is a borderline existential risk to the quality of the systems I’m responsible for and rely on.
This is only good advice if you don’t need to understand what’s happening at the edge of the frontier. If you do, then you’ll lose out on the compounding knowledge that comes from staying engaged with the major developments.
Not all developments are equal. Many are experimental branches for testing things out that usually get merged back into the core, so to speak. For example, I knew someone who was all-in on building their own harness, implementing the Ralph loop and various other things, spending a lot of time on it. And now, guess what? All of that is in Claude Code or another harness, and I didn't have to spend any time on it, because ultimately they're implementation details.
It's like ricing your Linux distro: sure, it's fun to spend that time, but don't make the mistake of thinking it's productive. It's just another form of procrastination (or, to put it more charitably, a hobby).
I agree that compiling a full Linux distro as a matter of practice is a waste of time. But doing it a few times is good if you want to understand your tools.
I don’t believe that top tier engineers just skip learning things because they might turn out to be dead-ends or incorporated into tools by someone else; in my experience they tend to be extremely interested in things that seem like minutiae to others when working on the bleeding edge, often implementing their own systems just to more fully understand the problem space.
If it’s a day job for someone and they are not ambitious, fine. But this is Hacker News. I would bet 99%+ of top-tier software talent could tell you about practical experience with Ralph loops this year, or a homegrown variety, simply because they are an attempt to solve a very real engineering problem (early exit, shitty code/incorrect responses, poor context window length and capacity). Top-tier software people expect more control over their engineering environment, and more success with their tools, than they’d get by just saying ‘meh, whatever, I don’t get this and I’ll just wait it out.’
The players barely ever change. People don't have problems following sports; you shouldn't struggle so much with this once you accept that the top spot changes.
I didn't express this well, but my interest isn't "who is in the top spot"; it's more the why and the how of various labs getting the results they do. This is magnified by the fact that I'm interested not only in hosted inference providers but in local models as well. What's your take on the best model to run for coding on 24GB of VRAM locally after the last few weeks of releases? Which harness do you prefer? What quants do you think are best? To use your sports metaphor, it's more than following the national leagues; it's also following college and even high school leagues. And the real interest isn't even who's doing well but WHY, at each level.
It is funny seeing people ping pong between Anthropic and ChatGPT, with similar rhetoric in both directions.
At this point I would just pick the one whose "ethics" and user experience you prefer. The difference in performance between these releases has had no impact on the meaningful work one can do with them, unless perhaps they are on the fringes of some domain.
Personally I am trying out the open models cloud hosted, since I am not interested in being rug pulled by the big two providers. They have come a long way, and for all the work I actually trust to an LLM they seem to be sufficient.
The financial projections that much of their valuation and investor story is built on involve actually making money, and lots of it, at some point. That money has to come from somewhere.
I’m very satisfied with being three months behind everything in AI. That’s a level that’s useful: the overhyped nonsense gets found out before I need to care, and it’s easy enough to keep up with.
It honestly has all kinda felt like more of the same ever since maybe GPT-4?
New model comes out, has some nice benchmarks, but the subjective experience of actually using it stays the same. Nothing's really blown my mind since.
Feels like the field has stagnated to a point where only the enthusiasts care.
For coding, Opus 4.5 in Q3 2025 was still the best model I've used.
Since then it's just been a cycle of the old model being progressively lobotomised and a "new" one coming out that if you're lucky might be as good as the OG Opus 4.5 for a couple of weeks.
Subjective but as far as I can tell no progress in almost a year, which is a lifetime in 2022-25 LLM timelines
Another annoyance (more on the API side) is summarized/hidden reasoning traces. It makes prompt debugging and optimization much harder, since you have very little visibility into the real thinking process.
I don't trust the benchmarks either, so I maintain a set of benchmarks myself. I'm mostly interested in local models, and over the past 2 years they have steadily gotten better.
Can't argue with subjective experience, but if there were some tasks that you thought LLMs couldn't do two years ago, maybe try again today. You might be surprised.
I'd wager that being conscripted in Norway carries a different level of risk of deployment than being conscripted in the US, given that we've essentially been nonstop involved in wars for my entire lifetime.
When you were conscripted, did you fear you might be sent to Iraq or Afghanistan? It just feels like, given our history, an American conscript will literally always have some active warzone to possibly be sent off to. Our countries and our armies are not the same. Is Norway today chomping at the bit to send its soldiers to Iran? Or, per Trump, "our next conquest" Cuba? I really don't think you can think of being drafted into the American army the same way you think of the compulsory service of countries like South Korea or your own.
Being conscripted in a defensive army is materially different than being conscripted into one that takes every opportunity to engage in conflicts across the globe.
I did my service right around the time the GWOT started, and it was around then that our military began focusing on transitioning to a professional military (we do have professional units) aimed at fighting terrorism in the Middle East (Afghanistan/ISAF) as part of our NATO duties.
By the time you were finishing up your service (6-12 months depending on where you were stationed), you'd get a presentation on "the road ahead" if you wanted to continue military life: military school/college, become a professional soldier, etc.
With that said, I think maybe 10-15% of the guys in our platoon decided to go the Afghanistan route. IIRC that meant transferring to / trying out for the professional battalion (TMBN), training for some time, and then deploying.
I don't think sending all conscripted soldiers to some foreign war will yield good results. But I do think that by the end of their service, some will be hyped up and "thirsty" enough to just go for it.
Genuinely sorry he let you down and you're left holding the bag dude. But please understand people aren't going to accept your weak rationalizations anymore.
Why is anyone still using or even talking about Gas Town? Now that HN is largely onboard with agentic development and has at least tried it themselves who's still under the impression that it's useful?
The value you get out of a simpler adversarial loop to critique your "main" agent's work is high. Stacking Steve Yegge's personal Kingdom of Nouns on top of each other doesn't add much more.
And this doesn't even begin to get into the madness that is verification for software that matters and is exposed through multiple modalities. You cannot let an agent just vibe its way around "does this business-critical thing with these specific use cases do its job correctly", much as Yegge might have you believe.
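To make "simpler adversarial loop" concrete, here's a minimal sketch in Python. The call_llm helper is a hypothetical stand-in for whatever model call you happen to use (API, CLI, or local); this isn't any vendor's actual API, just the shape of the idea:

    def call_llm(prompt: str) -> str:
        """Hypothetical stand-in for your model call (API, CLI, or local)."""
        raise NotImplementedError

    def adversarial_loop(task: str, max_rounds: int = 3) -> str:
        # One agent drafts a solution...
        draft = call_llm(f"Complete this task:\n{task}")
        for _ in range(max_rounds):
            # ...and a second, adversarial prompt tries to tear it apart.
            critique = call_llm(
                "You are a hostile reviewer. List concrete defects in this "
                f"solution.\nTask: {task}\nSolution:\n{draft}\n"
                "Reply with only APPROVED if you find none."
            )
            if critique.strip() == "APPROVED":
                break
            # Feed the critique back to the generator and revise.
            draft = call_llm(
                f"Revise the solution to address the critique.\n"
                f"Task: {task}\nSolution:\n{draft}\nCritique:\n{critique}"
            )
        return draft

One generator, one critic, a fixed round budget. Most of the value is already in that shape; stacking more roles on top doesn't add much.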
I was about to post this same question, but saw yours, and somehow that switched me from "wtf?" to "I have an answer": there's just that much interest in anything AI right now.
To wit, I still can't believe OpenClaw blew up, and it's much less... opinionated than whatever is going on here. (deacons?)
Non-SWE TradMom™ posted on X™ yesterday about her OpenClaw that is set up with all her accounts so every morning she can get a family summary. She added a chunk of instructions amounting to "PLEASE don't do anything insecure!", and the OpenClaw founder retweeted approvingly.
I left Google 3 years ago to build something. I'm very fond of the OpenClaw founder. And yet I absolutely cannot believe that he let such an obvious UX and security mess out into the world. We grew up in the same incubator (~2008 iPhone OS Twitter) and presumably share the same values, yet came to polar opposite conclusions.
Why do I view it as such a necessity to have a GUI, multiplatform support, and built-in Willison Trifecta stuff that I'm still pounding away 2.5 years in and won't release, when clearly you don't need that stuff?
I think in a steady state, product and UX discipline will win out. I bet within 3 months Gastown is a ghost town with maybe some non-technical crypto fans. In a year, OpenClaw is probably around, but not nearly the mindshare. It'll be quietly de-invested via OpenAI carefully managing the OpenClaw founder into working on their Everything App. (This is already happening: he got a nice PR interview with an OpenAI lead previewing the Everything App.)
Another anecdote re: demand:
My completely non-technical nurse ex-girlfriend from high school called me two weeks ago, for the first time in years. The lede was that I was right about AI, and the substance was: via Claude Code, she built her own Ollama-based Mac Mini server that she can connect to remotely via an Expo app.
Does it work? Astoundingly, yes.
She also has no idea what is going on. She swears up and down that her AIs on Claude.ai, ChatGPT.com and Ollama are somehow talking to each other, and she does not mean APIs. She tried answering a question I had about a graph visualization of her chats by talking to ChatGPT.com about it, even though Claude Code had written it, and I just didn't bother saying anything.
Anthropic recently killed the ability for third parties to use the Claude Code subscription, and it's assumed they're subsidising that price heavily. Which is fine, but it's a good reminder of the vendor lock-in risk. One policy change and your workflow breaks. Twill is agent-agnostic (Claude Code, Codex CLI, OpenCode), so you're not betting on any single vendor's pricing decisions.
On the cost for solo devs, yeah, if you're one person running one agent at a time on your laptop, the sub is probably the better deal today. No argument there. The cloud agent model starts to make sense when you want to fire off multiple tasks in parallel.
Yes, the difference is that Twill launches dedicated infra in a sandbox for each task. This means you can work on multiple tasks requiring a DB migration, for instance.
Also, you can fire-and-forget tasks (my favorite) and don't have to keep your laptop running at night.
See also Cowork and other upcoming Anthropic features.
See also Show HN: this exact kind of product is frequently posted as a GitHub link.
The paradigm shift in AI means what you are making is (1) filling a gap until the primary players implement it (most have it in their pipeline if not shipped already), and (2) easy to replicate with said AI using my preferred tech stack.
Cowork does not seem to be focused on engineering, but we are fully expecting Anthropic to catch up in this category.
What Anthropic can't offer is letting you use Codex or combine it with Claude Code. That is why we think players other than the AI labs have a say in this market.
To your last point: as always, there is a buy-vs-build tradeoff, which ultimately comes down to focusing on your core business, and we think that still remains important in the AI era.
My comment about Cowork is more about pointing out a different feature set that will cross over with Code. For example, they have the Task-related things as an affordance; Code has this coming.
I believe there is a difference between an open-source framework and a product. You would still have to manage and scale your infra, build the integration layer around it to make it accessible where your teams are, fix bugs, etc.
I am not saying that build is always the bad choice, but the tradeoff did not disappear imo
I'm surprised how much you push back instead of digging in to understand more. I have heard mentor time is way down at YC since they stopped doing things that don't scale. You could be asking questions to better understand where you'd fit in with users and how to better position yourself. We are your market; how do we see the world now, post-AI?
I’m newer to knowing and caring about what YC does at all in terms of the companies it funds. The fact that this is YC makes me think the org has forfeited any sense of “taste” at all. Complete scattershot from people who have money to scatter I guess.
You can read old Paul Graham essays and the early YC Startup School material (which is probably when peak YC happened) to get a sense of the ethos. They increased batch size to scale (hence the "stopped doing things that don't scale" comment).