I've been doing similar experiments lately (using ViTs) for card recognition, and so far it's been working really well for me. If you want to compare notes, I've open-sourced my code / weights [0] and written some blogs about how mine works [1]. I'd love to see if we can collaborate!
> Push the inference to the client-side (WebGPU / Web Workers).
I have an example of this working in WebGPU / WASM here [2], along with a playground environment (demonstrated here [3]). I'm training a new version that uses a different ViT backbone better optimized for WASM inference -- it's converging now, and I hope it will finish training (or at least reach parity with the previous model) in about a week (my last one took ~200 epochs to reach its current level, at about an hour per epoch in my current setup).
You mentioned WebGPU -- I've run into issues with the MobileViT-XXS backbone producing bad results in WebGPU on Android, so YMMV on whether WebGPU is stable enough for this. I don't know if the bug is mine or the platform's, but I've fallen back to WASM, and things have been working much better since.
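For anyone hitting the same issue, here's a minimal sketch of the probe-and-fall-back logic. The backend names match onnxruntime-web's execution providers; if you use a different runtime, substitute its identifiers -- everything below is illustrative, not my actual production code:

```typescript
// Probe WebGPU support; fall back to WASM when it's unavailable or
// known-flaky on the current platform.
type Backend = "webgpu" | "wasm";

function pickBackend(hasWebGpu: boolean, forceWasm = false): Backend {
  // forceWasm lets you pin WASM on platforms where WebGPU misbehaves
  // (e.g. the Android issue described above).
  return hasWebGpu && !forceWasm ? "webgpu" : "wasm";
}

async function detectWebGpu(): Promise<boolean> {
  // navigator.gpu exists only in WebGPU-capable browsers, and
  // requestAdapter() can still resolve to null (no suitable GPU),
  // so check both.
  const gpu = (globalThis as any).navigator?.gpu;
  if (!gpu) return false;
  try {
    return (await gpu.requestAdapter()) !== null;
  } catch {
    return false;
  }
}
```

With onnxruntime-web you'd then pass something like `{ executionProviders: [pickBackend(await detectWebGpu())] }` to `InferenceSession.create`, optionally setting `forceWasm` from a denylist of platforms you've seen misbehave.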
I think there is something to be said for monetizing one's hobbies, but I've recently been making some forays into this world of "build something amazing and give it away for free" as well. I recently took a very big experimental plunge down this path, and I'm curious how well it will work out for me.
Open-source state-of-the-art Magic: The Gathering card identification pipeline:
I used to do this kind of image recognition for a living, but I've been out of the business for a little while now. I had some ideas for a different approach from what I've done in the past and decided to code it up. This version is far better than anything else I've ever done -- especially for scanning against busy backgrounds or with occlusions, and also for noticing fine differences between otherwise difficult-to-distinguish printings.
I didn't have any interested customers waiting for this, so -- much like the OP -- I decided to run an experiment and release it open source. I'm not opposed to having paths to monetize it (for people who want to license it for closed-source commercial projects), but I'm not trying to commercialize it so much as I would love to see how far we can take it with open source.
I don't know which path I should take with this.
The biggest downside is that I've had a harder time getting people interested in this project than I expected. I believe this truly is the best identification software available (I've built some benchmarks to test it [0]). Maybe the market is just a bit flooded with such tools, but I suspect one very real problem is that if you don't charge for something, there is a perceived lack of value.
Sometimes I wonder if I would have more interest in this project if I _weren't_ trying to give it away.
For me, that's been the most negative aspect about releasing this for free so far.
I don't know how big the market is, but it seems pretty commercial-friendly to this old Magic player. I have a big box of cards from a few decades ago that I've held onto. I've thought about selling them, but it seems I either take them to a shop and get lowballed, or spend hours meticulously researching each card and then figuring out how to sell it for what it's worth. Taking a pile of photos and having the ID and valuation automated could go a long way! Hard to sell to individuals like me, but I would think a card marketplace would find it invaluable?
> it seems I either take them to a shop and get lowballed, or spend hours meticulously researching each card and then figuring out how to sell it for what it's worth.
No install needed -- scan your cards with your phone or desktop (the weights download into WASM and everything runs 100% locally; the only web requests it makes are to look up card names and prices -- no image data ever leaves your machine), export the list as CSV, take your cards to your friendly local game store, and expect to receive 50-75% of TCG-low. The app currently only displays TCG Market, so about 50% of that price is what you could realistically expect.
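To make the math concrete, here's a tiny sketch of that payout estimate over an exported list (the column names are made up for illustration -- they aren't necessarily the app's actual CSV format):

```typescript
// Rough buylist estimate: local shops typically pay 50-75% of TCG-low,
// so ~50% of TCG Market is a realistic floor for planning purposes.
// Row shape and card names here are illustrative.
type Row = { name: string; qty: number; tcgMarket: number };

function estimatePayout(rows: Row[], rate = 0.5): number {
  return rows.reduce((sum, r) => sum + r.qty * r.tcgMarket * rate, 0);
}

const cards: Row[] = [
  { name: "Example Card A", qty: 4, tcgMarket: 2.5 },
  { name: "Example Card B", qty: 1, tcgMarket: 40.0 },
];

console.log(estimatePayout(cards).toFixed(2)); // "25.00" at the 50% floor
```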
> Hard to sell to individuals like me, but i would think a card marketplace would find it invaluable?
Yes -- though part of the problem is that this would have been much more amazing several years ago. By now, most marketplaces (I used to do work for some of the big ones) have their own recognition tools. If they aren't actively looking to replace their current software, many companies would rather stick with what's working "good enough" than expend effort migrating to something with only an incremental, hard-to-quantify benefit. It could happen, but it's a tricky sales call to make.
I might just be imagining things, but when I picture what one of those sales calls would look like, it feels like I've already shown my hand. The cat's out of the bag: there's no mystery or allure left, and that puts me on the back foot somehow -- almost like I've played my strongest cards (hah!) first and have nothing in reserve. By being open source from the beginning (and talking freely about my architecture and what makes my solution different), there's very little sales-pitch build-up. Maybe that's partly a problem with how I'm presenting it, but I think people (especially the big houses) are just as (or more) inclined to quietly learn from me and improve their own scanners as to use or build upon what I've provided.
It's funny -- that angle is almost more about raising expectations and forcing the big houses to improve their own tech to catch up to open source than about getting anyone to adopt my solution in particular.
Am I okay with that? Absolutely -- I made that decision when I open-sourced it. I feel like the tech has been stagnating for several years, and I want to increase the quality of scanners across the board. I want to be the rising tide that lifts all boats.
That's one of the strongest arguments in favor of open-sourcing it (it would be very difficult for a closed-source product to have that same effect), and I remain hopeful for that long-term.
As an MTG player with an absurd amount of bulk, this is awesome! I think there is something to be said about the perceived lack of value; I greatly appreciate open source and even hold it to a higher value, all things considered. Keep up the good fight :)
This is awesome. I’ve been interested in something like this for some time as I’ve been working on slowly indexing my mtg collection and selling cards I don’t want/need. Will be checking it out this weekend!
It's still super rough (no foil-toggling yet, some issues with double-sided cards, crashes on some iPhones), but the overall structure is there -- it can create lists and export them as CSV.
If you have feedback or feature requests for your needs, please leave them on GitHub and I'll get to them as soon as I can. I'd love to hear more user feedback!
It's not just about web search though -- there's another element too. I go to Grok to find things I have failed to find with web search.
I agree with GP -- if I want sourced commentary on current events, Grok is my go-to above the other models. For whatever reason, its search feels better and more up-to-date -- whereas the others feel more like filters of media, Grok feels more like filters of sources.
Microsoft releases a new open-weight model that tops the MTEB leaderboard with its largest model (27B), and also includes smaller models at the top of their respective "weight" classes (hah!) -- 0.6B (embedding size 1024) and 270M (embedding size 640).
All have best-in-class context length, and the numbers look very impressive. Very excited to see this release!
> "The agent doesn't need a real filesystem; it just needs the illusion of one. Our documentation was already indexed, chunked, and stored in a Chroma database to power our search, so we built ChromaFs: a virtual filesystem that intercepts UNIX commands and translates them into queries against that same database. Session creation dropped from ~46 seconds to ~100 milliseconds, and since ChromaFs reuses infrastructure we already pay for, the marginal per-conversation compute cost is zero."
Not to be "that guy" [0], but (especially for users who aren't already on ChromaDB) -- how would this differ for us from using a RAM disk?
> "ChromaFs is built on just-bash ... a TypeScript reimplementation of bash that supports grep, cat, ls, find, and cd. just-bash exposes a pluggable IFileSystem interface, so it handles all the parsing, piping, and flag logic while ChromaFs translates every underlying filesystem call into a Chroma query."
It sounds like the expected use case is that agents interact with the data via standard CLI tools (grep, cat, ls, find, etc.), and that nothing Chroma-specific remains in the final implementation (do I have that right?).
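If I understand it right, the design could be sketched roughly like this -- all names below are hypothetical (I haven't read just-bash's actual IFileSystem interface); the point is only that the shell layer calls a narrow interface and any backend can sit behind it:

```typescript
// A hypothetical read-only filesystem interface: the shell tooling only
// sees these two calls, and a backend translates them into queries
// against any store (a vector DB, an in-memory map, etc.).
interface ReadOnlyFs {
  readFile(path: string): string;
  listDir(path: string): string[];
}

// In-memory backend -- the same role ChromaFs plays, minus the vector DB.
class MapFs implements ReadOnlyFs {
  constructor(private files: Map<string, string>) {}
  readFile(path: string): string {
    const body = this.files.get(path);
    if (body === undefined) throw new Error(`ENOENT: ${path}`);
    return body;
  }
  listDir(path: string): string[] {
    const prefix = path.endsWith("/") ? path : path + "/";
    return [...this.files.keys()].filter((p) => p.startsWith(prefix));
  }
}

// A toy `grep`: note that nothing above the backend is backend-specific.
function grep(fs: ReadOnlyFs, dir: string, pattern: RegExp): string[] {
  return fs.listDir(dir).filter((p) => pattern.test(fs.readFile(p)));
}

const vfs = new MapFs(
  new Map([
    ["/docs/a.md", "chroma is a vector database"],
    ["/docs/b.md", "ram disks are fast"],
  ])
);
console.log(grep(vfs, "/docs", /vector/)); // [ '/docs/a.md' ]
```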
The author benchmarks the Chroma implementation against a physical HDD, but I wonder how it would compare against a RAM disk holding the same information and serving the same queries?
I'm very willing to believe that Chroma would still be faster / better for X/Y/Z reason, but I would be interested in seeing it compared, since for many people who already have their data in a hierarchical tree, I bet there could be some massive speedups just from mounting those directories on a RAM disk instead of an HDD.
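For anyone who wants to try that comparison, here's a rough sketch of the kind of scan I'd time (the tmpfs mount command and paths are illustrative, and the demo corpus is just so the script runs standalone):

```typescript
// Quick-and-dirty baseline for the RAM-disk comparison: a grep-style scan
// over a directory tree using Node's fs. Run it once against the docs on
// disk, then against the same tree copied onto a tmpfs mount
// (e.g. `mount -t tmpfs -o size=1g tmpfs /mnt/ramdocs` on Linux)
// and compare wall-clock times.
import {
  mkdtempSync, readdirSync, readFileSync, statSync, writeFileSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

function scan(root: string, pattern: RegExp): string[] {
  const hits: string[] = [];
  for (const name of readdirSync(root)) {
    const path = join(root, name);
    if (statSync(path).isDirectory()) hits.push(...scan(path, pattern));
    else if (pattern.test(readFileSync(path, "utf8"))) hits.push(path);
  }
  return hits;
}

// Tiny demo corpus; point `root` at your real docs directory (or its
// tmpfs copy) for an actual measurement.
const root = mkdtempSync(join(tmpdir(), "docs-"));
writeFileSync(join(root, "a.md"), "chunked and indexed");
writeFileSync(join(root, "b.md"), "nothing relevant");

const t0 = performance.now();
const hits = scan(root, /indexed/);
console.log(`${hits.length} hit(s) in ${(performance.now() - t0).toFixed(2)} ms`);
```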
We would also be super interested to see that comparison. I agree that there isn't a specific reason why Chroma would be required to build something like this.
If you're looking for a good test suite, I wonder if you might be able to adapt any of the tests available in XMage? They have a pretty extensive test suite (such as for copy effects [0]) and if you point your agent at their code, I wonder how many could be usefully adapted to your system?
[0] - https://github.com/magefree/mage/tree/master/Mage.Tests/src/...