My personal experience with our legal department on naming is that if your product name includes someone else's trademark, you have to say "Our Thing for Their Thing", exactly like that. I was involved in a product that did this, and we came up with some better names, but legal said no, it had to be named with "for Their Thing" at the end. Those were the magic words that would keep us from getting sued, and indeed, we weren't sued. Our legal team was non-technical and had never heard of WSL; they came to this conclusion independently.
The name we shipped was even worse than Windows Subsystem for Linux, honestly. At least Microsoft spent some time on it.
I can't answer for NVIDIA but AWS has its own training and inference chips, and word on the street is the inference chips are too weak, so some companies are running inference on the training chips.
They stopped producing Inferentia altogether and are only investing in Trainium now. They also announced a partnership with Cerebras not long ago. That should give you a clue.
Technically correct by some estimation, perhaps, but Cygwin is a crazy approach: it was slow (contrary to the implication of the "low cruft" claim), not as compatible as these other approaches, required recompilation, and was widely disliked at most points in its life. There's a lot of crazy voodoo stuff happening in cygwin1.dll to make this work; it totally qualifies as "hacking in some foreign Linux plumbing", it's just happening inside your process. Just picture how fork() has to be implemented inside cygwin1.dll without any system support.
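If you've never looked: there is no fork syscall to lean on, so cygwin1.dll spawns a second suspended copy of the process and manually copies the parent's state into it. A toy sketch of the idea (my illustration, nothing like the real code, which also copies the heap and stack, replays mmaps, duplicates handles, and fights DLL rebasing):

    #include <windows.h>
    #include <cstdio>

    // Toy illustration of fork() emulation: launch a second, suspended copy of
    // yourself, copy parent state into it with WriteProcessMemory, then resume.
    // This relies on both copies of the image loading at the same base, which
    // Windows normally gives you for the same exe within one boot session.

    static volatile int g_role = 0;   // 0 = parent (default), 1 = child (patched in)

    int main()
    {
        if (g_role == 1) {
            // Only reachable because the parent overwrote g_role in our
            // address space before letting us run.
            std::printf("child: woke up with parent-injected state\n");
            return 0;
        }

        char exe[MAX_PATH];
        GetModuleFileNameA(nullptr, exe, MAX_PATH);

        STARTUPINFOA si = {};
        si.cb = sizeof si;
        PROCESS_INFORMATION pi = {};
        if (!CreateProcessA(exe, nullptr, nullptr, nullptr, FALSE,
                            CREATE_SUSPENDED, nullptr, nullptr, &si, &pi)) {
            std::fprintf(stderr, "CreateProcess failed: %lu\n", GetLastError());
            return 1;
        }

        // Copy a single variable of parent state into the suspended child.
        // Cygwin does the equivalent for whole data sections, heap, and stack.
        int child_role = 1;
        WriteProcessMemory(pi.hProcess, (LPVOID)&g_role, &child_role,
                           sizeof child_role, nullptr);

        ResumeThread(pi.hThread);
        WaitForSingleObject(pi.hProcess, INFINITE);
        std::printf("parent: child exited\n");

        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
        return 0;
    }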
Cygwin doesn't work at all in Windows AppContainer package isolation; too many voodoo hacks. MSYS2 uses it to this day, and as a result you can't run any MSYS2 binaries in an AppContainer. I had to take a completely different route for Claude Code sandboxing because of this: Claude Code wants Git for Windows, and Git for Windows distributes MSYS2-built binaries of bash.exe and friends. Truly native Windows builds don't need all the unusual compatibility hacks that cygwin1.dll requires; I found that non-MSYS2 builds of the same programs all ran fine in an AppContainer.
For AI inference you don't need to geographically distribute your data centers. Latency, throughput, and routes don't matter here. When it's 10 seconds for the first token and then a 1KB/sec streamed response, whatever is fine. You can serve Australia from the US and it'll barely matter. You can find a spot far outside populated areas with cheap power, available water, and friendly leadership, then put all of your data centers there. If you're worried about major disasters, you can pick a second city. You definitely don't need a data center in every continent.
You're not wrong about the rest but no AI company would ever build a data center in every continent for this, even if they were prepared to build data centers. AI inference isn't like general purpose hosting.
>Latency, throughput, and routes don't matter here. When it's 10 seconds for the first token and then a 1KB/sec streamed response, whatever is fine. You can serve Australia from the US and it'll barely matter.
This may be true for simpler cases where you just stream responses from a single LLM in some kind of no-brain chatbot. If the pipeline is a bit more complex (multiple calls to different models, not only LLMs but also embedding models, rerankers, agentic stuff, etc.), latencies quickly add up. It also depends on the UI/UX expectations.
Funny reading this, because the feature I developed can't go live for a few months in regions where we have to use Amazon Bedrock (for legal reasons), simply because Bedrock has very poor latency and stakeholders aren't satisfied with the final speed (users can't be expected to wait 10-15 seconds in that part of the UI; it would be awkward). And a single round trip from Asia to AWS Ireland is already at least ~300ms (multiply by the several calls in a pipeline and it adds up to seconds, just for the round trips), so having only one region is not an option.
Funny though, in one region we ended up buying our own GPUs and running the models ourselves. Response times there are about 3x faster for the same models than on Bedrock on average (and Bedrock often hangs for 20+ seconds for no reason, despite all the tricks like cross-region inference and premium tiers AWS managers recommended). For me, it's been easier and less stressful to run LLMs/embedders/rerankers myself than to fight cloud providers' latencies :)
>then put all of your data centers there
>You definitely don't need a data center in every continent.
Not always possible, for legal reasons. Many jurisdictions already have (or plan to have) strict data processing laws. Many B2B clients (and government clients too) require all data processing to stay in the country, or at least the region (like the EU), or we simply lose the deals. So, for example, we're already required to use data centers on at least 4 continents; just 2 more to go (if you don't count Antarctica :)
Sounds like you're betting that the performance users experience today will be the same as the performance they'll expect tomorrow. I wouldn't take that bet.
You mean that if you were Anthropic, you'd build the data centers on every continent? Can you explain your reasoning?
We're talking about billions of dollars of extra capex if you take the "let's build them everywhere" side of the bet instead of the "let's build them in the cheapest possible place" side. It seems to me that you'd have to be really sure that you need the data center to be somewhere uneconomical. I think if you did build them in the cheap place, it's a safe bet that you'll always have at least enough latency-insensitive workloads to fill them up. I doubt that we would transition entirely to latency-sensitive workloads in the future, and that's what would have to happen for my side of the bet to go wrong. The other side goes wrong if we don't see a dramatic uptick in latency-sensitive inference workloads. As another comment pointed out, voice agents are the one genuinely latency-sensitive cloud inference workload we have right now; those really do need low latency. Such workloads exist, but they're a slim percentage so far.
I believe I'm taking the safe bet that lets Anthropic make hay while the sun shines without risking a major misstep. Nothing stops them from using their own data centers for cheap slow "base load" while still using cloud partners for less common specialized needs. I just can't see why they would build the international data centers to reduce cloud partner costs on latency-sensitive workloads before those workloads actually show up in significant numbers.
They want it, sure. Customers want everything if it's free, but this is about what they value with their money. In this thought experiment, you're Anthropic, not the customer. You're making a choice that's best for Anthropic. Will Anthropic lose customers because the latency is higher? No way. Customers want low cost and lots of usage more than they want low latency. In a cutthroat race to the bottom, there's no room to "give away" massively expensive freebies like a data center near every population center when the customer doesn't value those extras with actual money. It's the same reason we all tolerate the relatively slow batched token generation rate--the batching dramatically lowers the cost, and we need low cost inference more than we want fast generation. If the cost goes up we'll actually leave, for real.
After the initial announcement of "fast mode" in Claude Code, did you ever hear about anyone using it for real? I didn't. Vanishingly few people are willing to pay extra for faster inference.
Remember that the time-to-first-token is dominated by the time to process the prompt, which is an order of magnitude more latency than the network route adds. An extra 200 milliseconds of network delay on a 5-10 second time-to-first-token isn't even noticeable; it's within the normal TTFT jitter. It would be foolish to spend billions of dollars dropping data centers around the world to shave off the 200 milliseconds when it does nothing for the 5-10 seconds. Skip the exotic locales and put your data centers in Cheap Power Tax Haven County, USA. Perhaps run the numbers and see if Free Cooling City, Sweden is cheaper.
They’re unwilling to pay for fast mode because of the current step-function price increase once you hit your quota. It’s a psychological effect: most shops I know in the US currently paying $125/mo per seat for Claude would happily - HAPPILY - pay 2x, and begrudgingly pay 10x, that amount for the same service. If fast mode were priced 25% or 50% higher, they’d happily pay for that too. But it’s just not priced that way currently, thanks to weird growth subsidization & psychology.
The only AI use case that cares about latency is interactive voice agents, where you ideally want <200ms response time, and 100ms of network latency kills that. For coding and batch job agents anything under 1s isn't going to matter to the user.
tbh, that's a good point about the voice agents that I hadn't considered. I guess there are some latency-sensitive inference workloads. Thanks for pointing that out.
A customer service chatbot can require more than one LLM call per response to the point that latency anywhere in the system starts to show up as a degraded end-user experience.
Easy solution: use hyperscalers, with their super expensive API charges, only when latency really matters. Otherwise build your own DC. It's reasonable to expect that customers don't care about latency that much relative to money.
I'm curious, do you know which virtual machines (i.e. what emulator and what OS) you would want? Does the software exist and it's just a matter of the time to set it up? Or is it harder to get ahold of all the necessary old software (even if you have the emulator)?
Maybe in the modern age someone could make a "polarhome in a box" that offers a similar gamut of systems, but via preconfigured emulators that you can simply download and run.
On Polarhome, I used QNX, SunOS/Solaris, HP-UX, AIX and OSX. Having those running under qemu would be quite the challenge.
Until now, I have used qemu on Linux (full-system qemu-system-aarch64, and user-mode emulation via binfmt-misc) to emulate e.g. a Raspberry Pi running arm64. This works very well, but for e.g. Solaris or HP-UX there is the extra hurdle of getting hold of bootable media that will not freak out in the unfamiliar surroundings of a qemu virtual machine.
I have never tried, and it is possible that I overestimate the difficulty...
Emulators can take you quite far, though you need to research some of them on the net to figure out working combinations of OS versions and emulator versions. Here are examples of things that I have managed to get to work at some point in time. Some for real software development and some for amusement.
KVM (x86 and x86_64): Linux, BSD, OSX, Hurd, Haiku, MSDOS, Minix, QNX, RTEMS, Xenix, Solaris, UnixWare, Windows 95 through 11.
QEMU (for non-x86): AIX 4, Linux (m68k, arm, sparc, powerpc, mips, riscv), OSX (ppc), Solaris 8 (sparc), SunOS 4.1.4 (sparc), Windows NT 4 (mips)
SIMH (for old DEC computers): NetBSD, VMS, Ultrix, RSX-11M, RT-11
Some of them can be quite finicky to get to work. Xenix was especially hard.
Solaris 11 is quite easy to get running in QEMU/KVM though. You can download the media from Oracle.
The only real hardware I routinely run has either Debian Linux, macOS, or Raspberry Pi OS on it.
When some component in OP's dedicated server fails, they will find out what that extra DO money was going toward. The DO droplet will live migrate to a healthy server. OP gets to take an extended outage while they file a Hetzner service ticket and wait for a human to perform the hardware replacement. Do some online research and see how long this often takes. I don't believe this Hetzner dedicated server model even has redundant PSUs.
Anyone who thinks DO and Hetzner dedicated servers are fungible products is making a mistake. These aren't the same service at all. There are savings to be had but this isn't a direct "unplug DO, plug in Hetzner" situation.
Hetzner also offers a VPS with superior specs to their old DO server for €374.99/month, or €0.6009/hour. They could just switch to a VPS temporarily while waiting for the hardware fix.
Although, since they were running a LEMP stack managed by hand and did their migration by copying all the files in /var/www/html via rsync and ad-hoc Python scripts, even a DO droplet doesn't buy them much of a guarantee. Their lowest-hanging fruit is probably switching to infrastructure as code, and dividing their stack across multiple cheaper servers instead of having a single point of failure for 34 applications.
I like it. Genuinely, I think APL only reuses glyphs for dramatically different monadic vs. dyadic behavior because there were limited positions available on a Selectric type ball. Many glyphs are reused as-is for multiple meanings, and they had to build some glyphs by overstriking a second glyph on top of an existing one. None of this is a concern these days.
That said, some of the reuses do make sense. ⍴ as monadic shape and dyadic reshape makes perfect sense. In FIXAPL, shape is △ and reshape is ⍴; the symbols have nothing to do with each other. I think that particular separation is a loss rather than a gain.
My memory from the old days is you can use Win32 hooks to modify the MessageBox. HCBT_CREATEWND gets you the HWND of the MessageBox, and you can subclass it (in the Win32 sense) to insert your own WndProc. Then you're off to the races--it's your dialog now.
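From memory it's roughly this shape (a sketch, not production code; I'm reacting to HCBT_ACTIVATE here since the buttons exist by then, while at HCBT_CREATEWND you'd stash the HWND and do the subclassing):

    #include <windows.h>
    #include <cwchar>

    // A thread-local WH_CBT hook sees the MessageBox window before the user
    // does; from there you can grab the HWND, relabel a button, add your own
    // child control, or subclass the dialog.

    static HHOOK g_hook = nullptr;

    static LRESULT CALLBACK CbtProc(int code, WPARAM wParam, LPARAM lParam)
    {
        if (code == HCBT_ACTIVATE) {
            HWND dlg = reinterpret_cast<HWND>(wParam);
            wchar_t cls[16];
            if (GetClassNameW(dlg, cls, 16) && wcscmp(cls, L"#32770") == 0) {
                // It's a dialog (MessageBox uses the standard #32770 class).
                // Subclassing via SetWindowLongPtr(dlg, GWLP_WNDPROC, ...)
                // would go here if you need to intercept later messages.
                SetDlgItemTextW(dlg, IDOK, L"Do the thing");
                UnhookWindowsHookEx(g_hook);   // one-shot: only wanted this box
                g_hook = nullptr;
            }
        }
        return CallNextHookEx(nullptr, code, wParam, lParam);
    }

    int main()
    {
        // Hook only this thread, so nobody else's windows are touched.
        g_hook = SetWindowsHookExW(WH_CBT, CbtProc, nullptr, GetCurrentThreadId());
        MessageBoxW(nullptr, L"Proceed?", L"Demo", MB_OKCANCEL);
        return 0;
    }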
But it wasn't just one window, there were lots of controls on that window and the relationship between them wasn't as obvious as you'd assume. Trust me, recreating the whole thing from scratch was easier.
The fun part was making a C++ class that could build up an in-memory dialog template. You had to do it that way because it was dynamically sized based on the message you displayed and the buttons you needed. If you used the default colors, you might be able to tell they were different if you squinted but you wouldn't know which was mine and which was Microsoft's.
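For the curious, the wire format is most of the work. A hypothetical stripped-down sketch of the idea (hard-coded sizes, one static plus one OK button, no font record; the real class measured the text, added whichever buttons the caller wanted, and set DS_SETFONT so it matched the system look):

    #include <windows.h>
    #include <cwchar>
    #include <vector>

    // Build a DLGTEMPLATE in memory: header, then menu/class/title, then each
    // DLGITEMTEMPLATE aligned on a DWORD boundary, and hand it to
    // DialogBoxIndirect.
    class InMemoryDialog {
        std::vector<BYTE> buf;

        void append(const void* p, size_t n) {
            const BYTE* b = static_cast<const BYTE*>(p);
            buf.insert(buf.end(), b, b + n);
        }
        void word(WORD w)          { append(&w, sizeof w); }
        void str(const wchar_t* s) { append(s, (wcslen(s) + 1) * sizeof(wchar_t)); }
        void alignDword()          { while (buf.size() % 4) buf.push_back(0); }

    public:
        void header(const wchar_t* title, WORD itemCount) {
            DLGTEMPLATE t = {};
            t.style = WS_POPUP | WS_CAPTION | WS_SYSMENU | DS_MODALFRAME | DS_CENTER;
            t.cdit  = itemCount;
            t.cx = 180; t.cy = 60;   // dialog units; real code computes these
            append(&t, sizeof t);
            word(0);                 // no menu
            word(0);                 // default dialog window class
            str(title);
        }

        void item(WORD classAtom, WORD id, const wchar_t* text,
                  short x, short y, short cx, short cy, DWORD style) {
            alignDword();            // each item must start on a DWORD boundary
            DLGITEMTEMPLATE it = {};
            it.style = style | WS_CHILD | WS_VISIBLE;
            it.x = x; it.y = y; it.cx = cx; it.cy = cy;
            it.id = id;
            append(&it, sizeof it);
            word(0xFFFF); word(classAtom);   // 0x0080 = Button, 0x0082 = Static
            str(text);
            word(0);                 // no creation data
        }

        const DLGTEMPLATE* get() const {
            return reinterpret_cast<const DLGTEMPLATE*>(buf.data());
        }
    };

    static INT_PTR CALLBACK DlgProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM) {
        if (msg == WM_COMMAND && (LOWORD(wp) == IDOK || LOWORD(wp) == IDCANCEL)) {
            EndDialog(hwnd, LOWORD(wp));
            return TRUE;
        }
        return FALSE;
    }

    int main() {
        InMemoryDialog d;
        d.header(L"Message", 2);
        d.item(0x0082, 0xFFFF, L"Something happened.", 10, 10, 160, 10, SS_LEFT);
        d.item(0x0080, IDOK,   L"OK",                  65, 35,  50, 14, BS_DEFPUSHBUTTON);
        DialogBoxIndirectW(GetModuleHandleW(nullptr), d.get(), nullptr, DlgProc);
        return 0;
    }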
I've done this for real, in commercial code that shipped. No trust needed; I have personal experience. For typical minor MessageBox additions, this can be easier than rebuilding the whole dialog yourself. Sometimes, we just wanted to add a "Don't ask again" checkbox which didn't require touching the existing child windows at all. I also used this technique to simply change the labels on the buttons to custom text. I had a MessageBox wrapper that accepted a list of button strings instead of a predefined constant. We've all built various custom message boxes, of course, but not every situation required that level of effort.
These days you can just use a TaskDialog, of course, and it's way more flexible than MessageBox. But it's fun to remember the old techniques.
I understood it as: no instructions on what to do, but still a prompt with information. I don't know if the title is technically correct, but for me the meaning was easy to understand.