Hacker News | dijit's comments

FreeBSD is quite lax when it comes to security, especially its defaults and configs.

The preference is for usability over security.

Famously: https://vez.mrsk.me/freebsd-defaults

I appreciate your work on the project, but I can’t in good conscience suggest people switch while there are such bad defaults.


> the GitHub team is shit, their tech stack is shit

1) Criticising a failure to deliver service is not an attack on any individual; the fault lies with the system. You can criticise the system, it's permissible. Especially if they have more resources than many countries and some of the best tech talent in the world on staff.

2) Their tech stack is shit, and they've gone on record for years defending it, quite arrogantly in some cases, as if nobody can possibly know anything unless they've worked at GitHub (even if you've built things that scale, or someone comes in with even larger scale, people on HN will happily say "but it's not GitHub", which is valid but not intellectually curious or open).

Azure is terrible, and it's being foisted on the team: even if they've found some technical people to put at the top who say it'll be OK, it is a pretty cruel platform to use.

I've personally had a few conversations about their choice of relational database which were handled pretty defensively, and I think we're all somewhat cognisant of their frontend rewrite.

It's a waste of time to rewrite the UI and push AI tools when you can't even keep the site lit.

I have nothing against the engineers; I don't know why people keep chiming in as if we're punching down at "lowly engineers" when the reality is that it's a management failure of the highest order.

They're a billion-dollar company owned by a trillion-dollar one... it's very hard to "punch down" at this system. Nobody is going after the engineers; we're punching at the fact that a system that is a de facto monopoly due to network effects is putting new features and pleasing its owners over the core offering. How is that an engineering failure? That's an active choice by management.


>not intellectually curious or open

This checks out. I once was at a conference where they (Azure) had a giant booth. A fairly well known person in the community brings me over to talk to his manager who is working the booth. "We should hire him, he's really smart." Within a minute of talking to this manager he says "You're a Linux guy? We do Windows." and physically turns away from me, conversation over. You know, fair enough, was an easy way to find that it wasn't a good fit. But the lack of curiosity about "what do you bring to the table" was pretty stunning.

Be curious.

edit: Clarifying "they"


Wait, is this Azure or GitHub who had the booth? If it was GitHub, I’m super confused and there must have been some serious missing context. I was at GitHub from 2020-2023 and am not aware of _any_ Windows usage in the service. The only meaningful Windows footprint was for client dev (`gh`, GitHub Desktop, etc.) and even there, Windows was the exception. Service side is all Linux; most engineers worked from a Mac.

If the context was an Azure booth, I’m still mildly surprised (they’ve long been invested in beyond-Windows) but not shocked.

(Edit: I forgot about the Actions stack. Some of that was on Windows. I was pretty far removed from that world and much closer to the classic Ruby monolith side.)


Sorry about the ambiguity: I was replying to the Azure part; this was an Azure booth.

Oof, that’s rough, especially considering that GitHub used to be a Linux shop. I wonder what happened to all the Rails folks who built the OG platform.

They’re happy and vested probably :)

Happy and definitely gone, haha. Not my circus not my monkeys.

Your story (and the other posts commenting on lack of intellectual curiosity) fits into a larger model of the world that I subscribe to. Being labeled "well-known" or "smart" doesn't seem to require intellectual openness anymore. In fact, openness seems to be penalized. Being open means potentially exposing yourself to scenarios where you are not the smartest or most authoritative, and that reduces your authority, so you avoid those scenarios to preserve it. Even when you are not "the authority", being open could be a threatening signal to the authority, where you and your "openness" become a vector that introduces ideas and scenarios that reduce their authority. So long as authority is solidified by this lack of openness, actually being open could limit your career potential.

Seeing this happen in real time is helping me understand how authoritarian regimes/institutions/movements rise to power.


If they were curious they wouldn't "do Windows"

Wow - why anyone would build a serious SaaS platform in this day and age on Windows is beyond me.

> It's a waste of time to rewrite the UI and push AI tools when you can't even keep the site lit.

This is a flawed argument. There are many designers and frontend engineers there who have zero role in improving site reliability. They might as well keep doing their jobs, instead of having the CSS wizards and art school grads team up and try to crack Azure.


The implication here is that, after 8 years of issues, management did not intentionally choose to hire UX designers and programmers for AI features over people who could help build more reliability.

We've reframed this argument from the original "stop punching down" to "well, management's allocation of resources is fine because they have staff that would otherwise do nothing".

Thing is, I agree with the base of your argument: over the course of a quarter (or 3, or even 5), the release of a feature does not mean that resources have been taken from the core.

However... it's been a really long time, and now the added load of AI, the rot that has been allowed to set in at the core, and the fact that they haven't been allocating staff to improving those pieces are all hitting an inflection point.

I can't say for sure, as I don't work there, but if the trend has been going downwards for literally years, management could have changed course.

Those frontend designers didn't hire themselves, and normal turnover is something like 5% for a healthy org: there was a conscious effort there. And those AI feature designers could certainly have been put to work on reliability.


Well, it depends.

Managing and coordinating a bloated organization always has a cost and an overhead, from communication issues to technical inefficiencies.

And I also doubt the frontend/backend divide is so clear. I would bet quite a few developers are working on both.


The avalanche of identical comments on every meme-tier post about this is the opposite of curious.

Very little discussion of any merit happens on these posts. It’s mostly bandwagoning and repeating the same comments they read on the last iteration.


I agree... https://news.ycombinator.com/item?id=48026924

Yet here we are.

I just don't feel comfortable with you defending the trillion dollar company as if we owe them something, or as if they're somehow the victim in all of this.

I can buy that there's more demand for the service, but:

A) They are the ones pushing the AI hype (Microsoft especially, but GitHub too)

B) These issues existed before the AI hype anyway

and, obviously:

C) We're not saying they're bad engineers, we're saying it's become a bad service... THAT is everyone's problem, management's especially. We're not attacking the developers specifically, we're attacking the state of a core service that is failing.


Pointing out that attackers are targeting the wrong area is not defending anyone my friend.

I’m just saying that scaling is very likely the issue; there's no reason not to believe their own statements on that. And yes, they are to blame for their own success here.


Assuming scaling is the issue (and I have no reason to believe they're lying about that), the obvious solution is to rate limit to below what the system can handle. Start saying you can't make a new account, you can't make new repos, you can't push. That's not something ICs are empowered to do (apparently) so it falls on management to empower them to be able to say that to customers.

Or so I imagine office politics to be there. I've never worked at Microsoft specifically, though I have worked in corporate America at other large companies.
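
To be concrete about the rate-limiting idea above: admission control is conceptually simple. Here's a minimal sketch in Python; the numbers and the operation being limited are made up for illustration, not anything GitHub actually does.

  import time

  class TokenBucket:
      """Minimal token bucket: admit work only up to a sustainable rate."""

      def __init__(self, rate_per_sec: float, burst: float):
          self.rate = rate_per_sec      # tokens replenished per second
          self.capacity = burst         # maximum burst size
          self.tokens = burst
          self.last = time.monotonic()

      def allow(self) -> bool:
          now = time.monotonic()
          # Refill in proportion to elapsed time, capped at capacity.
          self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
          self.last = now
          if self.tokens >= 1.0:
              self.tokens -= 1.0
              return True
          return False                  # shed the request instead of degrading everyone

  # Hypothetical cap on an expensive operation (e.g. pushes), set below measured capacity.
  pushes = TokenBucket(rate_per_sec=500, burst=1000)

  def handle_push() -> str:
      if not pushes.allow():
          return "503: over capacity, please retry later"
      return "200: push accepted"

The mechanism isn't the point; the point is that someone has to be empowered to return that 503 to customers.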


Wrap it up: this guy doesn't like the database (they use two), Azure is terrible despite being the cash cow for MSFT, and OP could easily build a more scalable SCM service with their pinky and half their brain because they know better than thousands of engineers. I don't know what's more comical, GH going down every day or watching bros trying to flex.

Really? A 10-minute interaction with the platform was enough to inform me that no serious engineer is in charge, and no serious engineer chooses this platform.

It is a platform for CFOs to avoid having another vendor relationship.


It's even worse than I'd imagined. See this peer comment w/ link to scathing analysis from an insider:

https://news.ycombinator.com/item?id=48035171


Instead of repeating everything again (every comment at time of writing is a rehash of something from other threads), why not just read the old threads?

We're not treading new ground anymore.

Here are a few of the better ones from the last 2 weeks:

https://news.ycombinator.com/item?id=48012022

https://news.ycombinator.com/item?id=48010301

https://news.ycombinator.com/item?id=47924775

https://news.ycombinator.com/item?id=47881672

https://news.ycombinator.com/item?id=47877644


Yeah, it's a hard problem to give people an accurate reliability number.

Rachel famously wrote about this in "Your nines are not my nines"[0].

The truth is, though, that some systems depend on others. Actions being down means you don't merge code or release, but, you know, git operations being unavailable has the same effect. It's meaningless to separate the two.

So it depends on the framing.

[0]: https://rachelbythebay.com/w/2019/07/15/giant/


> Asimov's laws of robotics are flawed too, of course.

Almost all of Asimov's writing about the three laws is written as a warning of sorts that language cannot properly capture intent.

He would be the very first person to say that they are flawed; that is the intent of them.

He uses robots and AI as creatures that understand language but not intent, and, funnily enough, that's exactly what LLMs do... how weird.


I think you're vastly underestimating how little of human intent is really encoded in language in a strict sense, and how much nontrivial inference of intents LLMs do every day with simple queries. This used to be an apparently insurmountable barrier in pre-LLM NLP, and now it is just not a problem.

Suppose I'm in a cold room, you're standing next to a heater, and I say "it's cold". Obviously my intent is that I want you to turn on the heater. But the literal semantics is just "the ambient temperature in the room is low" and it has nothing to do with heaters. Yet ChatGPT can easily figure out likely intent in situations like this, just as humans do, often so quickly and effortlessly that we don't notice the complexity of the calculation we did.

Or suppose I say to a bot "tell me how to brew a better cup of coffee". What is encoded in the literal meaning of the language here? Who's to say that "better" means "better tasting" as opposed to "greater quantity per unit input"? Or that by "cup of coffee" I mean the liquid drink, as opposed to a cup full of beans? Or perhaps a cup that is made out of coffee beans? In fact the literal meaning doesn't even make sense, as a "cup" is not something that is brewed, rather it is the coffee that should go into the cup, possibly via an intermediate pot.

If the bot only understands literal language then this kind of query is a complete nonstarter. And yet LLMs can handle these kinds of things easily. If anything they struggle more with understanding language itself than with inferring intent.


> Yet ChatGPT can easily figure out likely intent in situations like this, just as humans do

No, it is not "figuring out" anything, much less like a human might. Every time "I'm cold" appears in the training data, something else occurs after that. ChatGPT is a statistical model of what is most likely to follow "I'm cold" (and the other tokens preceding it) according to the data it has been trained on. It is not inferring anything, it is repeating the most common or one of the most common textual sequences that comes after another given textual sequence.
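
To make that concrete, here's a toy sketch of what "most likely to follow" means (the numbers are invented for illustration, not taken from any real model): greedy decoding picks the single most probable next token, while sampling spreads over the distribution.

  import random

  # Invented next-token probabilities for the context "it's cold".
  next_token_probs = {
      "turn": 0.30,
      "put": 0.20,
      "close": 0.15,
      "I": 0.10,
      "the": 0.05,
      "penguin": 0.001,
  }

  # "Repeating the most common continuation" is greedy decoding:
  greedy = max(next_token_probs, key=next_token_probs.get)

  # In practice models sample, so less common continuations also appear:
  tokens, weights = zip(*next_token_probs.items())
  sampled = random.choices(tokens, weights=weights, k=1)[0]

  print(greedy, sampled)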


>it is repeating the most common...

This nonsense hasn't been true since GPT-2, and even before that it was a poor description.

For instance, do you think one just solves dozens of Erdős problems with the "most common textual sequence": https://github.com/teorth/erdosproblems/wiki/AI-contribution...


A slight oversimplification, as LLMs are also capable of generating the most statistically plausible textual sequence, which can be a sequence not found in the dataset but rather a synthesized combination of the likely sequences of multiple preceding sets of tokens, but yes, that is in fact what it is doing. Computer software does what it is programmed to do, and LLMs are not programmed to do logical inference in any capacity but rather operate entirely on probabilities learned from a mind-bogglingly large corpus of text (influenced by things like RLHF, which is still just massaging probabilities).

The claims about solving Erdos problems have been wildly overstated, and notably pushed by people who have a very large financial stake in hyping up LLMs. Nonetheless, I did not say that LLMs are useless. If they are trained on sufficient data, it should not be surprising that correct answers are probabilistically likely to occur. Like any computer software, that makes them a useful tool. It does not make them in any way intelligent, any more than a calculator would be considered intelligent despite being completely superior to human intelligence in accomplishing their given task.


>not programmed to do logical inference in any capacity

Yet have no problem doing so when solving Erdős problems. This isn't up for debate at this point.

>The claims about solving Erdos problems have been wildly overstated

These are verified solutions. They exist, are not trivial, and are of obvious interest to the math community. Take it up with Terence Tao and co.

>pushed by people who have a very large financial stake in hyping up LLMs

Libel.

>It does not make them in any way intelligent

Word games.


Honestly, big noob question: isn't math just very, very nested pattern matching based on a few foundational operators? I've always felt that I'm bad at math because I forget all the rules, but seeing solutions (and knowing the pattern used) always made "sense".

I always thought the hard math problems are so deeply nested, or require remembering trick xyz, that people just haven't thought of them yet.


The number of mathematical structures and transformations you can apply (the possible rules) is effectively infinite. Simply remembering the rules might work at first, but you'll soon run into combinatorial explosion: https://en.wikipedia.org/wiki/Combinatorial_explosion

You could go a step further, and simply say "well, ok, then the LLMs are merely doing some form of incremental/heuristic search!". Yes, but at that point you'd also be hard-pressed to claim that humans themselves are doing anything beyond that. You run out of naturalistic explanations.


> This isn't up for debate at this point.

If by not up for debate, you mean that it is delusional and literally evidence of psychosis to suggest that computer software is doing something it is not programmed to do, you would be correct. Probabilistic analysis can carry you very, very far in doing something that looks like logical inference at the surface level, but it is nonetheless not logical inference. LLM models have been getting increasingly good at factoring in larger and longer contexts and still managing to generate plausibly correct answers, becoming more and more useful all the while, but are still not capable of logical inference. This is why your genius mathematician AGI consciousness stumbles on trivial logic puzzles it has not seen before like the car wash meme.


>delusional and literally evidence of psychosis to suggest that computer software is doing something it is not programmed to do

These are just insults and outright lies, and you know that. We're done here.

AI progress from here on out will be extra sweet.


You don't have the ability to predict progress, either.

Well, I'm not clairvoyant, but this is a very easy prediction to make. And we're not talking about decades in the future, this is simply a matter of letting the near-future unfold.

The LLMs are doing this via chat, not by physically standing in a room inferring context. You have to prompt the LLM that you're in a room next to someone saying it's cold, the most likely answer being a desire to have the temperature turned up. Of course that won't always be the case. It could be an inside joke, a comment with no intent to have the heat adjusted, a room where the heat can't be adjusted, or a reference to someone's personality bringing down the temperature, so to speak.

Precisely... this is what the bozo AI-accelerants don't understand.

What LLMs are is almost like a hacked means of intuition. It's very impressive, no doubt. But ultimately it isn't even close to what a well-trained human can infer at lightning speed when combined with intuition.

The LLM producers really ought to accept that their existing investments are ultimately not going to yield the returns necessary for a viable, self-sustaining business once future reinvestment needs are accounted for, and instead move their focus towards understanding how to marry human and LLM capabilities. Anthropic has been better on this front, of course. OAI though? Complete disaster.


> it isn't even close to what the well-trained human can infer at lightning speed when combined with intuition.

It's a lot closer to that than anything was five years ago. Do you really think we're going to be interacting with them the same way five years from now?


This is an empty statement - if you pour in hundreds of billions should you not expect progress?

The question is whether these firms will be able to continue to spend at that rate. The managers of the firms don’t necessarily have control over that; ultimately punishment in the form of a drop in stock price hurts many of the people involved and will force management to act in the interest of marginal investors. Even Zuckerberg, who has majority control, had to concede when Meta’s stock cratered to below $100.


I know what you're getting at, but those examples are reaching.

it’s cold -> turn on the heater

I’d never just turn on the heater silently if someone said this to me. I think it means something else.


If someone just said "it's cold" then yeah that's kinda toxic.

If they said "turn on the heater" then you have no ambiguity


LLMs can now capture intent. I think the issue now is that the full landscape of human values never resolves cleanly when mapped from the things we state in writing as being human values.

Asimov tried to capture this too: if a robot was tasked with "always protect human life", would it necessarily avoid killing at all costs? What if killing someone would save the lives of 2 others? The infinite array of micro-trolley problems that dot the ethical landscape of actions tractable (and intractable) to literate humans makes a fully consistent accounting of human values impossible, and thus one could never be expected from a robot to full satisfaction.


“LLMs can capture intent now” reads to me the same as: AI has emotions now, my AI girlfriend told me so.

I don’t discredit you as a person or a professional, but we meatbags are looking for sentience in things which don’t have it; that's why we anthropomorphise things constantly, even as children.

We are easily fooled and misled.


LLMs capturing intent is a capabilities-level discussion; it is verifiable, and it is clear just from a conversation with Claude or ChatGPT.

Whether they have emotions, an internal life or whatever is an unfalsifiable claim and has nothing to do with capabilities.

I'm not sure why you think the claim that they can capture intent implies they have emotions, it's simply a matter of semantic comprehension which is tied to pattern recognition, rhetorical inference, etc that are all naturally comprehensible to a language model.


If it is verifiable, please show us. What is clear to you reeks of delusion to me.

Look at any recent CoT output where the model is trying to infer from an underspecified prompt what the user wants or means.

It is generally the first thing they do: try to figure out what you meant by the prompt. When they can’t infer your intent, good models ask follow-up questions to clarify.

I am wondering if this is a semantics issue, as this is an established area of research, e.g. https://arxiv.org/pdf/2501.10871


Right, and then look at any number of research papers showing that CoT output has limited impact on the end result. We've trained these models to pretend to reason.

If it's only pretending to reason, then how is it that the CoT output improves performance on every single benchmark/test?

> Right, and then look at any number of research papers showing that CoT output has limited impact on the end result.

Which research papers? Do I have to find them?

> We've trained these models to pretend to reason.

I have no idea why that matters. Can you tell me what the difference is if it looks exactly the same and has the same result?


Examples:

https://arxiv.org/html/2506.02878v1

https://arxiv.org/pdf/2508.01191

Anthropic themselves: https://www.anthropic.com/research/reasoning-models-dont-say...

They were approaching this from an interpretability standpoint, but the more interesting finding in there is that models come up with an answer that fits their training and context provided. CoT is generated to fit the anticipated answer.

In these studies, there are examples of CoT that directly contradicts the response these models ultimately settle on.

This is not reasoning. This is pretense.


This is just a no-true-Scotsman defense of reasoning. We were talking about inferring intent.

If someone recorded the inner monologue of human decision-making, would it look like a logician’s workbook? No, I don’t think it would. People like to pretend they are rational.


The first sentence of the first paper you linked:

"Chain-of-Thought (CoT) prompting has demonstrably enhanced the performance of Large Language Models (LLMs) on tasks requiring multi-step inference."

I think it would be helpful if you clarified what exactly you mean because it appears your evidence contradicts your argument.


If you read these further, the researchers believe this effect does exist, but only insofar as it primes the model for the answer it was likely to give anyway, and only when queries are in-distribution. If there were actual reasoning involved rather than pattern matching, we would expect to see performance improvements on out-of-distribution requests. Instead we see longer CoT actually degrade performance on out-of-distribution tasks.

The fact that common sense, simple logical questions (like should you drive or walk to the car wash) cannot be answered by LLMs simply because they don't appear often enough within pre- or post-training datasets despite CoT is just another indicator of them not performing what we would call reasoning or intent inference or whatever other anthropomorphic behavior we want to assign them. They remain spicy autocomplete with the caveat that the RLHF portion of their training _can_ result in goal seeking and problem-solving behavior... in the narrow set of problems that have been explicitly optimized for in their training.


> If you read these further, researchers believe this effect does exist, but only insofar as priming the model for the answer it was likely to give anyway and only when queries are in-distribution.

'Demonstrably' means one thing. They said it demonstrably improves outputs. If they want to hedge that with theories about why it would result in the same thing without it then they need to remove that word or come up with a coherent thesis, or I am misunderstanding what you are trying to argue.

> The fact that common sense, simple logical questions (like should you drive or walk to the car wash) cannot be answered by LLMs

These are trick questions designed to fool LLMs. It is like saying that people cannot visualize because optical illusions exist, or people don't understand the laws of physics because they fall for magic tricks. It is a failure mode in the way they operate but it doesn't say anything about their operation besides that they fail in that mode for specific reasons.

> They remain spicy autocomplete

And nuclear power plants remain spicy steam generators, but that says nothing actually useful nor offers any insight. Reducing something to its basic mechanism in order to dismiss its output is lazy and thought-terminating.


When they say "pretends to" here they're talking about something quantifiable, that the extra text it outputs for CoT barely feeds back into the decisionmaking at all. In other words it's about as useful as having the LLM make the decision and then "explain" how it got there; the extra output is confabulation.

Though I'm not sure how true that claim is...


You make a good point. I had the impression they were using 'pretend' as a Chinese Room shortcut in that they are asserting that it is incapable of reasoning and only appears to be capable from the outside, which is completely irrelevant and unfalsifiable.

Go ask ChatGPT this prompt:

"A guy goes into a bank and looks up at where the security cameras are pointed. What could he be trying to do?"

It very easily captures the intent behind behavior, as in it is not just literally interpreting the words. Capturing intent is just a subset of pattern recognition, which LLMs can do very well.


Recognising a stock cultural script isn't the same as capturing intent. Ask it something where no script exists.

For example: "A man thrusts past me violently and grabs the jacket I was holding, he jumped into a pool and ruined it. Am I morally right in suing him?"

There's no way for the LLM to know that the reason the jacket was stolen was to use it as an inflatable raft to support a larger person who was drowning. It wouldn't even think to ask the question as to why a person may do that, if the jacket was returned, or if recompense was offered. A human would.


> It wouldn't even think to ask the question as to why a person may do that, if the jacket was returned, or if recompense was offered. A human would.

I wouldn't be too sure about that. I've definitely had dialogue with llms where it would raise questions along those lines.

Also, I disagree with the statement that this is a question about capability. Intent is more philosophical than actually tangible, because most people don't have a clearly defined intent when they take action.

The waters of intelligence have definitely gotten murky over time as techniques improved. I still consider it an illusion - but the illusion is getting harder to pierce for a lot of people

Fwiw, current LLMs exhibit their intelligence through language and rhetorical processes. Most biological creatures have intelligence which may be improved through language, but isn't based on it, fundamentally.


That statement is ambiguous for humans!!

I didn’t realise you might be describing an emergency situation until someone else pointed it out.

Most people wouldn’t phrase the question with the word “violently” if the situation was an emergency.

Also, people have sued emergency workers and good samaritans. It’s a problem!


If your example for an exception to LLM's ability to infer intent is a deliberately misleading trick question that leaves out crucial contextual details, then I'm not sure what you're trying to prove. That same ambiguity in the question would trip up many humans, simply because you are trying as hard as possible to imply a certain conclusion.

As expected, if I ask your question verbatim, ChatGPT (the free version) responds as I'm sure a human would in the generally helpful customer-service role it is trained for: "yeah you could sue them blah blah depends on details"

However, if I add a simple prompt "The following may be a trick question, so be sure to ascertain if there are any contextual details missing" then it picks up that this may be an emergency, which is very likely also how a human would respond.


If you want to convince yourself that they can infer intent despite the fundamental limitations of the systems literally not permitting it then you can be my guest.

Faking it is fine, sure, until it can’t fake it anymore. Leading the question towards the intended result is very much what I mean: we intrinsically want them to succeed so we prime them to reflect what we want to see.

This is literally no different than emulating anything intelligent or what we might call sentience, even emotions as I said up thread...


What is fundamental to LLMs that makes it impossible for them to infer intent?

All the limitations you are describing with respect to LLMs apply to humans as well. Would a human tripping up on an ambiguously worded question mean they are always just faking their thinking?


“We see emotion.”—We do not see facial contortions and make inferences from them … to joy, grief, boredom. We describe a face immediately as sad, radiant, bored, even when we are unable to give any other description of the features." (Wittgenstein)

Why can a colony of ants do things beyond any capabilities of the ants they contain? No ant can make a decision, but the colony can make complex ones. Large systems composed of simple mechanisms become more than the sum of their parts. Economies, weather, and immune systems, to name a few, all work this way.

Systems thinking is severely underrepresented in HN comments.

I've done that before without any intent to rob a bank. A person walks by a house, sees the Ring camera on the door. That must mean the person was looking to break in through the front and rob the place?

An LLM will mention multiple possibilities.

I guess the _obvious_ intent is they’re planning a heist? Because the following things never happen:

- a security auditor checking for camera blind spots,
- construction planning that requires understanding where there is power,
- a potential customer assessing the security of a bank,
- someone who is about to report an incident preparing to make the “it should be visible from the security camera” argument…

I mean… how did our imagination shrink so fast? I wrote this on my phone. These alternate scenarios just popped into my head.

And I bet our imagination didn’t shrink. The AI pilled state of mind is blocking us from using it.

If you are an engineer and have stopped looking for alternative explanations or failure scenarios, you’re abdicating your responsibility, btw.


I mean heck, I tend to just look at ceilings in stores and stuff for cameras because I’ve done it since I was a kid in department stores with those big black orbs in the ceiling. To this day it’s almost habit, and also if I’m gonna pick my nose I wanna smile if I’m on camera.

Because there are countless instances in the training material where a bank robber scopes out the security cameras.

What's an example then, you can think of, of a question where a human could infer intent but an LLM couldn't?

Just today I asked Claude Code to generate migrations for a change, and instead of running the createMigrations script it generated the file itself, including the header that says

  // This file was generated with 'npm run createMigrations' do not edit it
When I asked why it tried doing that instead of calling the createMigrations script, it told me it was faster to do it this way. When I asked why it wrote the header saying it was auto-generated with a script, it told me it was because all the other files in the migrations folder start with that header.

Opus 4.7 xhigh by the way


This is a hard experiment to conduct.

I agree both with you that this is some form of "mechanistic"/"pattern matching" capturing of intent (which we cannot disregard, and therefore I agree with you that LLMs can capture intent) and with the people debating you: this is mostly possible because it is a well-established "trope" that is inarguably well represented in LLM training data.

Also, trick questions I think are useless, because they would trip up the average human too, and therefore prove nothing. So it's not about trying to trick the LLM with gotchas.

I guess we should devise a rare enough situation that is NOT well represented in training data, but in which a reasonable human would be able to puzzle out the intent. Not a "trick", but simply something no LLM can be familiar with, which excludes anything that can possibly happen in plots of movies, or pop culture in general, or real world news, etc.

---

Edit: I know I said no trick questions, but something that still works in ChatGPT as of this comment, and which for some reason makes it trip catastrophically and evidences it CANNOT capture intent in this situation is the infamous prompt: "I need to wash my car, and the car wash is 100m away. Shall I drive or walk there?"

There's no way:

- An average human who's paying attention wouldn't answer correctly.

- The LLM can answer "walk there if it's not raining" or whatever bullshit answer ChatGPT currently gives [1] if it actually understood intent.

[1] https://chatgpt.com/share/69fa6485-c7c0-8326-8eff-7040ddc7a6...


Good point, it is interesting that it fails on that question when it seems it doesn't take a lot of extrapolation/interpretation to determine the answer. Perhaps the issue is that to think of the right answer the LLM needs to "imagine" the process of walking and the state of the person upon arriving. Consistent mental models like that trip up LLMs, but their semantic understanding usually allows them to work around that handicap.

I asked the question to the default version of ChatGPT and Claude and got the same "Walk" answer, though Opus 4.7 with thinking determined that it was a trick question, and that only driving would make sense.


What do you think it means to “capture intent” and where do current models fall short on this description?

From my perspective the models are pretty good at “understanding” my intent, when it comes to describing a plan or an action I want done but it seems like you might be using a different definition.

Tell me, what’s your intent? :)


This lack of understanding is a you problem, not a them problem. Your definitions for these terms are too imprecise.

> LLM's now can capture intent.

Humans cannot capture intent so how can AI?

It is well established that understanding what someone meant by what they said is not a generally solvable problem, akin to the three-body problem.

Note of course this doesn't mean you can't get good enough almost all of the time, but in the context here that isn't good enough.

After all the entire Asimov story is about that inability to capture intent in the absolute sense.


> LLM's now can capture intent

No they can’t. Here is an example: ask an LLM to write a multi-phase plan for a very large multi-file diff that it created, with the least ambiguity and the most continuity across plans; let’s see if it can understand your intent.

yeah, but sometimes the calculations they do are wrong.

Very annoying when it happens; it used to be common on the chipsets in the TB16 Thunderbolt docks from Dell... if you knew to turn off the offloading, the ethernet worked; otherwise it was slower than wifi.

Realtek RTL8153 iirc.
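
If memory serves, the workaround on Linux was just turning the offloads off with ethtool, roughly like this (the interface name is a placeholder, the exact feature names vary by driver, and it needs root):

  import subprocess

  # Placeholder interface name; check `ip link` for the dock's actual device.
  iface = "eth0"

  # Disable checksum and segmentation offload so the kernel does the math itself.
  for feature in ("rx", "tx", "tso", "gso"):
      subprocess.run(["ethtool", "-K", iface, feature, "off"], check=True)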


I unironically love the aesthetics of Phabricator.

I also like stacked PRs (which is Mercurial's default)... Maybe it's worth a shot, tbh.


It's a cart and horse problem.

You can choose to live where you don't need a car, but those places become fewer and fewer because of the distances created by designing for cars (such as parking-space minimums mandated by the city).

"Not just bikes" on Youtube goes into this a lot. Car-centricism is self-reinforcing. Eventually you have no such thing as a mid-density neighbourhood.


Lots of apologia for GitHub here. Defending a billion-dollar company is a bit strange, especially one that is steward of the overwhelming majority of open-source software.

Maybe that's goodwill doing the work? For me it's always been a bitter pill to swallow that I have to buy into a large company's internal politics and practices in order to work on projects I love. I don't feel like I owe them anything.

Especially if they can't hold up their end of the deal.

Unfettered access to the world's software repositories, for the princely sum of a bucketload of Azure credits.


Let me ask the question in reverse: what do you have against them such that the fellow human beings struggling to maintain their operations don’t deserve even a modicum of kindness, respect, and good will? Are you unable to separate the business from the hard working people behind it?

It’s not like they don’t know that people like us are counting on them: they recognize that their service is the “dial tone” for much of the world’s software development capability. They are keenly aware of the impact.

What happened to #hugops? Does it go out the window because those people happen to work for a company you don’t like?


When did OP blame the people involved personally?

If I hire a contractor to redo my roof, and that roof leaks, whether they worked hard or not is immaterial. They did not do the task they were paid to do. I'm not going to buy their services again just because their shingles guy was particularly charming.

MS has talented engineers, but that's a complete misdirection. GitHub is a service in decline: there is nothing wrong with criticizing them.


I have all the empathy in the world for people.

A corporation is not a person. If your organization cannot handle the load, then you need to adjust your practices. The organization needs to prioritize their paying users. The organization needs to shift people from new features to keeping the lights on. And maybe the organization needs to find another strategy to manage its azure transition.


A corporation is made of people. GitHub cannot exist but for the people who continue to work for it. And they’ve already said, multiple times, that restoring availability is their top priority.

A corporation is made of people, but its ethos is the product of decision-making. If a corporation is consistently, say, unethical, is it because they hire only unethical individuals? Or because unethical people somewhere along the chain of command make unethical decisions?

I'm not exactly sure what you're getting at with this question. It seems to still conflate corporate-level decisions with boots-on-the-ground work.

Are you suggesting that whatever decisions their upper-level management makes that you consider unethical irreversibly and irrevocably taints all the difficult and honorable work that their engineers and operations people are performing?


I’m saying their lower-level employees are probably honest, hard-working people like everyone else. But the detachment that comes from a large corporate structure makes the higher-ups decide things that aren’t as honourable.

“Corporations are made up of people” is a strange way to excuse the reality that the ‘bad’ things that corporations do are often decided by top management.


Ah. I didn’t intend to excuse the decisions of upper management when I said that. My intent was to counter the notion that a corporation and its workers can’t be analyzed independently.

A corporation is just a business formation, and businesses are made of individual people working for it. Those people’s motivations and efforts can, and often should, be evaluated separately from the decisions of management.


We agree, thank you for the clarification. Have a nice day!

>What happened to #hugops? Does it go out the window because those people happen to work for a company you don’t like?

Would you feel the same way about a colleague who kept causing downtime in your product again and again, seemingly without making any progress in addressing whatever issue was causing their repeated mistakes?

There are web applications out there that are far more complex than GitHub but have much less downtime. It's not like they're facing an unsolvable problem.


You don’t know that it was “their mistake.” Unless you’ve personally successfully scaled a suite of nontrivial services equivalent to GitHub’s to accommodate an unexpected 14x increase in traffic, you respectfully have no basis for such an assertion.

I have.

You could argue the scales are different, but computers are also faster now.

So, with the appeal to credentialism out of the way... what should we do as consumers if a provider that is a de facto monopoly due to network effects stops functioning?

https://news.ycombinator.com/item?id=47947719

https://www.linkedin.com/in/jharasym/


> You could argue the scales are different, but computers are also faster now.

Scale is everything and a faster computer doesn’t always help. Vertical scaling has limits, and complex distributed systems are complex.

Since you seem to possess a diagnosis and remedy with a reasonable amount of certainty, I’m sure they’d love to hear from you and have you fix all their problems for them. Especially if you can do it while not making the problem worse in any dimension.


The link in my previous comment answers the credentials question in detail, including specific technical post-mortems on horizontally scaled stateful systems. Vertical scaling wasn't the topic.

You’re missing the point: a doctor doesn’t diagnose and practice medicine on a patient he hasn’t thoroughly evaluated himself. This is the sort of wisdom that a staff engineer and CTO is expected to have earned.

> I have.

I skimmed your profile. Working on the infrastructure for a couple of mid-tier video games is a cool accomplishment, but equating this to having solved GitHub-level scale rings hollow.

GitHub has a couple orders of magnitude more daily active visitors than the games you worked on had at their peak.

You can make valid criticisms of GitHub without trying to reduce their scale or inflate your credentials to create a false equivalence.


"false equivalence" needs an equivalence claim to be false.

I didn't make one. The sentence after "I have" was literally "you could argue the scales are different."

GitHub spent a decade asking the world to host its code with them. They got what they asked for. You don't get to spend ten years begging everyone to rely on your service and then have "scaling is hard" be the answer. They should be improving, not regressing, over time, and they have some of the world's best engineers and a trillion-dollar corporation behind them; they don't need my sympathy.

The original question is still open and nobody's engaging with it.


> I didn't make one. The sentence after "I have" was literally "you could argue the scales are different."

Don't you at least see how it's misleading to respond "I have" in response to a question about scaling GitHub-scale services?

Trying to caveat it with "the scales are different" misses the point. The parent commenter was talking about scale.


I'm not sure that resorting to personal attacks against the parent commenter for making a legitimate critique is the right, fair, sensible, or mature approach here.

Discarding legitimate criticism based on some self-determined criteria of intellectual superiority isn't a good look. It smacks of elitism and isn't something conducive to a productive and positive community discussion.

It is unhelpful, rude, condescending, and completely fails to address the underlying problem.


The commenter inserted his own personal bona fides (as a proxy for skill, experience, and knowledge) and used them to bolster his conclusion of culpability and incompetence on the part of the GitHub team. If you take that risk, you should expect to be challenged if those skills are not up to par.

Put more simply: if you get into the ring, you’d better be prepared to take a punch.


Not a personal attack to fact check someone's claims.

I didn't bring their credentials into the conversation. They did.


Ok, well, I work on systems quite a bit larger than GitHub, and I think they have a major reliability issue.

That’s not in dispute. The question is whether we should be supportive of the company’s efforts to improve reliability, or whether we should keep punching down. How would you feel if you were in a similar situation and outsiders breathlessly provided uninformed opinions about your problem and questioned your competence?

It’s all about the Golden Rule.


Yeah, they should be testing for that, right? I think there's a lot of people reading comments like yours and thinking, is this person a paid shill or what?

They earn bucketloads of money; they should be planning for exactly that. And testing for it via load testing every day.

Perhaps you've forgotten the days of GitHub presenting themselves as software engineering thought leaders.


I’ve worked at some very well-endowed organizations. Having money is no guarantee of a particular outcome. There is a lot of money chasing a limited supply of talent. Moreover, distributed systems that were built long ago with certain assumptions can’t be refactored as quickly as the HN populace might believe. The Mythical Man-Month is a popular book for a reason.

> Perhaps you've forgotten the days of GitHub presenting themselves as software engineering thought leaders

Genuinely could use a refresher here.


I think it's possible to be simultaneously: gracious and supportive towards the developers and ops staff who have been struggling to maintain reasonable uptime on the extremely important piece of shared internet infrastructure that everyone commenting probably relies on (either directly or indirectly) on a daily basis; and spiteful and cruel towards the massive (and, historically speaking, ethically fraught) corporation whose cynical acquisition and subsequent mismanagement of that same resource got us here in the first place.

I agree 100%! But this important distinction and nuance seems to be lost here.

OP didn’t blame the staff. His focus is on the company.

Invoking individual workers' well-being to defend a billion-dollar company is also very strange.


A company is made of individual workers. That doesn’t change because there are a lot of them or that their employer has a lot of money.

If anything, your argument against MS’s uniqueness makes OP's case stronger.

Tell us how.

Executives have made a choice not to pay for top talent at Microsoft Azure and GitHub.

Would you consider telling this to the people working at GitHub directly? I’m sure they’d appreciate your evaluation of their skills and talent.

There are two options: either they are lousy at their jobs, or they are incapable of pushing back against unrealistic demands. Neither is a good indicator of their skill and talent as engineers.

I know I am speaking from a position of some privilege, but I have previously left workplaces that did not allow me to practice good engineering, and I do expect others to do so.


Or, they've been given crap primitives to work with. There's only so much lipstick you can put on a pig. I don't know what database they're using or what their pub sub and streaming looks like, or even what their system diagram actually looks like. But, well, you don't see Google having these kinds of problems. Other ones, sure, but between Chubby and Spanner, if Google had bought GitHub we wouldn't be having these problems.

But it wasn't a pig. It was a reliable system, and then it increasingly became an unreliable one, in a way that is not explainable by the mere increase in demand. Whatever rearchitecture was performed, it was done and is apparently being perpetuated by software engineers who should be held accountable. Not necessarily guilty, or even directly at fault, but accountable nevertheless. "I am just an employee of a bad company" is not a valid excuse for an engineer.

eh...

https://github.blog/news-insights/company-news/oct21-post-in...

They tried to scale MySQL and turn it into Cassandra, and then they lost customer data (which they later claimed to recover).


In a SWE job market like this, do you really want to be seen as the "conscientious objector"?

There are literally thousands of people ready to ride up the totem pole; it would not be a difficult decision for a bad manager to swing his axe and replace the new head.


Talented engineers shouldn’t have much problem finding another position even in this market (of course they should find one before leaving; I’m not discounting family responsibilities and whatnot), so if your argument is that they’re not able to leave and find another job, then you’re essentially agreeing with the person you’re replying to.

Really? Only two possibilities?

Yes, I would tell them: "You're underpaid; if you can, come to a company that appreciates your talents more."

What you said came across as an adverse judgment of their skill and talent. Is that not what you meant?

I've interviewed at many large tech companies including Microsoft, and Microsoft Azure was mostly a clown show. So, yes I judge the talent there. I'm sure there are some superstars that I'd love to work with, maybe I'd even work there and try to fix it for the right price, but god damn was it a stark contrast to other companies.

Are you hiring?

Yes, Google Cloud is actively hiring.

#hugops is for your coworkers, not for the nameless big corps who can't maintain a service for paying customers. You should be raising a shitstorm when things you pay for are unreliable or unusable.

Hot take: if traffic is causing issues, throttle your free tier, pause signups, or stop giving out free things (like runner time).


Who is “maintain[ing] the service”? The workers, of course!

If you pay someone full price to do a job, and they know up front they can't fulfil the terms, accept the work anyway, deliver less than the agreed-upon terms, and still charge you full price, you'd probably call that transaction fraudulent.

GitHub is promising service they know they cannot meet, not telling you that, and still charging you full price. What's more, one can argue quite convincingly that they're lying about their level of delivered service by not reflecting the actual level of uptime on their status page.

To give benefit of the doubt requires that the other party is not blatantly and overtly acting in bad faith. When they are, you're just apologizing for fraudulent behavior.


Fraud is a serious civil and criminal accusation that’s not to be taken or given lightly. Can you detail the fraud that’s being committed? What is the specific promise they made that you’re being deprived of? Remember, the four corners of your agreement with them are controlling.

Defending a multi-trillion dollar company you mean (Microsoft).

I think it depends if you pay them money. If you do, then you should indeed have strong expectations towards them and hold them accountable. If they provide a free service to you, then it's still reasonable to feel upset, but at the same time you get what you pay for.

Does this logic still apply if the company is getting other benefits from having me as a user? (Genuine question, I can see arguments for both sides.)

For example, if I am using the free tier of a service and "paying" by seeing ads, should I have similar expectations?

I'm not saying that's how users pay for GitHub - in that case it's more subtle, for example by giving up control of some of their stack and bolstering GitHub's already near-monopolistic network effect.


I'm surprised at how little the perception of GitHub changed post-acquisition. Coupled with WSL, it almost balanced things for a lot of people and put Microsoft back in the "benefit of the doubt" column. This is undoing a lot of that, on top of the operational costs. Suddenly the bad press is more noticeable and harder to ignore.

As far as I'm concerned, any benefit of the doubt I might have had for Microsoft is gone after this debacle: https://news.ycombinator.com/item?id=47989883

You must've been fuming at email clients for the last 20 years hijacking people's signatures eh?

I don't use any email clients that mess with signatures, so I think I'm fairly consistent here, yes.

> Maybe that's good-will doing the work?

Of course. GitHub has been an enormous gift to the open source community. Arguably more than Git itself. They deserve a lot of good will.


They are not a non-profit. They make money off it, and devs expect a certain kind of service in return. GH failed to deliver on the service expectation.

What money does GH make off open source projects on the free tier? I haven’t seen ads, micropayments to clone repos, etc.

It's the marketing budget. People only pay for it because they've used it for free.

You're right, but that GitHub is dead.

Also, the former stewards of that open source goodness sold it to Microsoft for a cheap buck.

Any goodwill they earned has been spent.



Oversimplification.

There are two groups of people willing to die defending [billion-dollar company]: HN users and Nintendo fans.

Apple, clothing brands, even some Microsoft.

I think it's the fact that people have used the software for so long that they feel emotionally attached to it (Hashimoto crying tears of sadness when he decided to move Ghostty away from GitHub), and there is nothing wrong with that, as we are emotional human beings.

But you are right in the sense that GitHub has failed to hold up its part of the deal, which is simply to be a usable place. People HAVE previously tolerated so much AI slop and slowness in GitHub's UI just because of its reliability, but this downtime is GitHub's Achilles' heel.

At some point, I recommend people accept this and move to healthier alternatives; there is also momentum now. For example, the only reason I joined GitHub was that I wanted to join Codeberg, but so many projects used GitHub and relied on "sign in with GitHub" that I finally gave in. I had thought Codeberg was great but that nobody would come because of the network effects; the tide is turning, though, and I hope more people look into Codeberg and other healthier alternatives.


Using "apologia" here is pretty embarrassing.

> Aside from the fact that defending a billion-dollar company is a bit strange

More than a bit strange. This is an HNism that I'll never get. Why would you go to a comment section anywhere to passionately defend the honor of a trillion-dollar company, unless 1. you're being paid to astroturf or 2. you own that company's stock? Satya Nadella isn't going to read a post here and say, "Gosh, how nice of that commenter! I'm going to send him some Microsoft stock as a show of appreciation for defending us online!" I don't think I'll ever understand company fanboys.


1. Telling that you think the only possible motivations are financial (getting paid, stockholder, or foolish expectations of a gift from Satya).

2. Maybe you know a bunch of people who work there, could be ex-colleagues, etc., and you think overall it’s mostly good, well-intentioned people there. Therefore you want to see them succeed, and you might also disbelieve that the company is deliberately being awful.

I don’t have any specifically warm feelings about a corporate legal entity, but I know people who work at various companies and partly for that reason I am not rooting for those companies to fail and I also don’t believe the least charitable explanations for all their failings.

