
"Back in the day" people were afraid that pupils would create CS (beta 6.5) maps of their schools. Gaussian Splatting would have been very convenient for that :-)

You would need a professional artist, several days, and expensive professional equipment. It's not an easy task.

Wouldn't be the first time that having nothing better to do (being in school) plus dedication led to amazing results.

Yeah... We had those bulky TI Voyage 200 graphing calculators in school [1]. They could do everything the teachers could throw at us, up to the point of having all but a few formulas built in.

I would say that definitely shaped me in a way where I rarely bother with the underlying details and tend to focus on how high-level abstractions interact. [2]

[1] German "Mathe-LK"; we could choose two subjects to specialize in, for me it was math and computer science, the latter being quite novel back in 2003. [2] I _do_ tend to specialize in things, but e.g. for LLMs or GLMMs, while I do have the capability to understand the technical details, I just don't bother.


I am always a bit baffled why Apple gets credited with this. Unified memory has been a thing for decades. I can still load the biggest models on my 10th gen Intel Core CPU and the integrated GPU can run inference.

The difference being that modern integrated GPUs are just that much faster and can run inference at tolerable speeds.

(Plus NPUs being a thing now, but that also started much earlier. The 10th gen Intel Core architecture already had instructions to deal with "AI" workloads... just very preliminary ones.)
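
For what it's worth, you can ask the driver whether the GPU shares physical memory with the host. A minimal sketch in OpenCL host code (CL_DEVICE_HOST_UNIFIED_MEMORY is deprecated since OpenCL 2.0, but iGPU drivers still report it; error handling omitted):

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void) {
        cl_platform_id platform;
        cl_device_id device;
        cl_bool unified = CL_FALSE;

        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
        /* Deprecated since OpenCL 2.0, but still reported by iGPU drivers. */
        clGetDeviceInfo(device, CL_DEVICE_HOST_UNIFIED_MEMORY,
                        sizeof(unified), &unified, NULL);
        printf("host-unified memory: %s\n", unified ? "yes" : "no");
        return 0;
    }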


That's shared, not unified: the memory is partitioned, and the CPU and GPU copies are managed by the driver. Lunar Lake (2024) is getting closer, but it is still not as tightly integrated as Apple's and is capped at 32GB (Apple goes up to 512GB). AMD Ryzen AI Max is closer to Apple but still has roughly 3x slower memory.

Shared vs unified is merely a driver implementation detail. Regardless, in practice (IIUC) data will still be copied if you perform a transfer using a graphics API, because the driver has no way of knowing what the host might do with the pointed-to memory after the transfer.

If you make use of host pointers and run on an iGPU, no copy will take place.


My last serious GPU programming was with OpenCL, and if my memory does not fail me, the API was quite specific about copying vs sharing memory on a shared-memory system.

I am pretty sure that my old 10th gen CPU/GPU combo has the ability to use the "unified"/zero-copy access mode for the GPU.
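
A minimal sketch of the zero-copy pattern I mean, assuming an OpenCL 1.2-style driver that honors CL_MEM_USE_HOST_PTR on shared-memory devices (the 4096-byte alignment follows Intel's zero-copy guidance; error handling omitted):

    #include <stdlib.h>
    #include <CL/cl.h>

    /* Wrap an existing, page-aligned host allocation so a shared-memory
     * iGPU can use the same pages directly, with no copy. */
    void zero_copy_demo(cl_context ctx, cl_command_queue q, size_t n) {
        size_t bytes = (n * sizeof(float) + 4095) & ~(size_t)4095;
        float *host = aligned_alloc(4096, bytes);

        cl_mem buf = clCreateBuffer(ctx,
            CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
            n * sizeof(float), host, NULL);

        /* ... enqueue kernels that read/write buf ... */

        /* Map instead of clEnqueueReadBuffer: on an iGPU this is just a
         * synchronization point, not a copy. */
        float *p = clEnqueueMapBuffer(q, buf, CL_TRUE, CL_MAP_READ,
                                      0, n * sizeof(float),
                                      0, NULL, NULL, NULL);
        /* ... inspect results through p ... */
        clEnqueueUnmapMemObject(q, buf, p, 0, NULL, NULL);
        clReleaseMemObject(buf);
        free(host);
    }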


I don't think people are crediting Apple with inventing unified memory - I certainly did not. There have been similar systems for decades. What Apple did is popularize it with widely available hardware: GPUs that don't totally suck for inference, combined with RAM that has decent speed at an affordable price. Before that, you either had iGPUs that were slow (and paired with not exactly the fastest DDR memory) but at least sat on the same die, or fast dGPUs with their own limited amount of VRAM. So the choice was between direct memory access but not powerful, or powerful but strangled by having to go through the PCIe subsystem to access RAM.

The article is talking about one particular optimization that one can implement with Apple Silicon, and I at least wasn't aware that it is now possible to do so from WebAssembly - so completely dismissing it as if it had nothing to do with Apple Silicon is imho not fair.


Back in the 8- and 16-bit home computer days, or on game consoles for that matter, it was popular enough already.

And yes, things like the Amiga Blitter and arcade or console graphics units were already baby GPUs.


Can't confirm. We had students at university (18-20-ish) who had not used a mouse prior to our courses. But that was at least 3-4 years ago now, and since then not a single case.


I think their "4-bit multiplier with a single transistor" bit is hinting at them using transistors in the sub-threshold regime.
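
For reference, the textbook weak-inversion relation (standard device physics, not something from the article): in sub-threshold, the drain current is exponential in the gate voltage,

    I_D \approx I_0 \, e^{(V_{GS} - V_{th})/(n V_T)} \left(1 - e^{-V_{DS}/V_T}\right)

with thermal voltage V_T = kT/q and slope factor n. That exponential I-V curve is what makes log-domain arithmetic, e.g. multiplication, feasible with very few devices.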


So something that you can do with PDKs is add your own custom standard cells and tell the EDA tools to use them. This is actually pretty smart: you can use most of the foundry cells (which have been extensively validated) and focus your effort on things like this "magic multiplier", which you will have to validate manually. It also makes porting across tech nodes easier, since you only manage a handful of custom cells instead of a completely custom design.

(I have my guesses as to what that is, but I admittedly don't know enough about that particular part of the field to give anything but a guess).


My "only" experience here is designing ASICs for Neuromorphic Chips. We used sub-threshold exclusively for linearity and energy reduction. No standard cells for us


Just wanted to say thanks one more time!

We have been running Ardour 9 for a while now during band rehearsals. Currently 12 channels that we record and monitor in realtime with some effects on top.


Then let me quickly say: thank you! I used that algorithm three times in different projects during my academic "career" :-)


You might be interested in RWKV: https://www.rwkv.com/

Not exactly "minimal viable", but a "what if RNNs were good for LLMs" case study.

-> insanely fast on CPUs
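
To illustrate why: a toy, per-channel sketch of an RWKV-style recurrence (heavily simplified from the real WKV kernel, without its numerical-stability tricks; names are mine). Each token updates O(1) state, so per-token cost stays flat instead of scanning a growing KV cache:

    #include <math.h>
    #include <stdio.h>

    /* Toy per-channel RWKV-style recurrence. */
    typedef struct { double a, b; } State; /* weighted value sum, weight sum */

    /* w: per-channel decay, u: bonus for the current token,
     * k/v: key and value of the incoming token. */
    double wkv_step(State *s, double w, double u, double k, double v) {
        double out = (s->a + exp(u + k) * v) / (s->b + exp(u + k));
        s->a = exp(-w) * s->a + exp(k) * v;  /* decay old state, mix in new */
        s->b = exp(-w) * s->b + exp(k);
        return out;
    }

    int main(void) {
        State s = {0.0, 1e-9};               /* tiny b avoids div-by-zero */
        double k[] = {0.1, 0.5, -0.2}, v[] = {1.0, 2.0, 3.0};
        for (int t = 0; t < 3; t++)
            printf("t=%d wkv=%f\n", t, wkv_step(&s, 0.9, 0.4, k[t], v[t]));
        return 0;
    }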


My personal idea revolves around: can I run it on a basic smartphone, with whatever the memory "floor" is for basic smartphones under, let's say, $300? (Let's pretend RAM prices are normal.)

Edit: The fact that this runs on a smartphone means it is highly relevant. My only question is: how do we give such a model an "unlimited" context window, so it can digest as much as it needs? I know some models know multiple languages; I wouldn't be surprised if sticking to English only reduced the model size / hardware needs and made it even smaller / tighter.


Started a comment to write basically what you said. I've been commuting like that for five years. In the end I didn't bother trying anything productive anymore.

Losing 2-3h per day to commuting is not something I am going to miss anytime soon.


Just nitpicking, but there is at least one ball next to his contraption in his video :-)

Doesn't make the whole thing less remarkable.

