This is awesome! Using a CLIP or DINOv2 model to produce image embeddings would probably improve the similarity search a lot, kind of similar to http://same.energy/.
This is very useful! I think the interfaces around models being used in an async manner will look very different from the synchronous chat UIs we have today. Claude Code is the first real “agent” that is providing true economic value, and there’s so much low-hanging fruit in making the interfaces far better.
I have a few private “vibe check” questions and the 4-bit QAT 27B model got them all correct. I’m kind of shocked at the information density locked in just 13 GB of weights. If anyone at DeepMind is reading this: Gemma 3 27B is the single most impressive open source model I have ever used. Well done!
Wow! I'd love to read a more in-depth blog post describing how to create one of these myself, and maybe even contribute my own splats to a collaborative library for iconic landmarks. I could see interactive splats being added to Wikipedia for popular locations.
https://reddit.com/r/GaussianSplatting/ has been slowly talking about the subject for a while now. There are probably several articles and vids in the search bar.
This is great! I wonder how hard it would be to use a pen plotter instead of a thermal printer. You could even use a procedurally generated handwriting font and Claude to make it feel like a handwritten letter.
Agreed. I keep my iPhone in my right pocket and my Ricoh GR IIIx in my left pocket whenever I go out. It’s such a fantastic camera given its size, especially with the APS-C sensor in such a compact body.
I also have a Sony A7C but I haven’t used it since getting the Ricoh. Being pocketable is a massive factor in how much I use a camera.
This is awesome! Could you share more details on how you’re storing the image embeddings and performing the KNN search? Is there an on-device vector database for iOS?
Thanks for your attention. I did not use any database, but stored the computed embeddings as object files. When the user opens the app, the program preloads them and turns them into `MLMultiArray`s. When the user searches, these `MLMultiArray`s are traversed and the similarity to the query is calculated for each one separately.
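For anyone curious what that kind of database-free search looks like, here is a rough sketch in Python rather than the app's actual Swift code. It assumes cosine similarity as the metric (the comment above does not say which one is used), and the `cosine_similarity` and `top_k` helpers, the file names, and the 3-dimensional vectors are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query, embeddings, k=3):
    """Brute-force search: score every stored embedding, keep the k best."""
    scored = [(cosine_similarity(query, vec), name)
              for name, vec in embeddings.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(name, score) for score, name in scored[:k]]


# Hypothetical toy data standing in for embeddings preloaded into memory.
embeddings = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "sunset.jpg": [0.8, 0.2, 0.1],
}

results = top_k([1.0, 0.0, 0.0], embeddings, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A linear scan like this costs one pass over every stored vector per query, which is usually fine for a personal photo library; an index only starts to matter at much larger scales.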