iwd · 2026-04-15T11:45:46

I just got to see a different species of kleptoplastic sea slug in the wild last month, on a kayak tour of the mangroves around Key West. Our guide scooped some lettuce sea slugs up in a plastic container (and then returned them safely). They were bigger, about 3 inches long, with a wavy/frilly green border. It made my biologist heart very happy!

throwup238 · 2026-04-15T13:06:36

That was likely a sea slug from the Nudibranchia order (they sometimes resemble lettuce sea slugs), which are a bit different from Sacoglossa-order slugs like the one in TFA: they carry symbiotic algae colonies rather than digesting the algae and keeping the chloroplasts, as Sacoglossa do.

iwd · on Jan 4, 2025

Meh. My 16 year old writes shaders for her own games, writes for her VR headset, programs her graphing calculator, etc etc. The next generation will have plenty of self-taught SWEs, just like ours. Most kids couldn’t use computers in the 80s or now, and it wasn’t a disaster either time.

feznyng · on Jan 4, 2025

The issue is it probably won’t be your daughter who creates and passes the legislation that dictates how she and the rest of us get to use our computers (net neutrality, encryption, etc). In a democracy, the majority of the population needs to be computer literate enough to vote for candidates who will support these things.

iwd · on Dec 7, 2024

I’ve been trialing a bunch of these models at work. They basically learn where the DNA has important functions, and what those functions are. It’s very approximate, but up to now that’s been very hard to do from just the sequence and no other data.

throwawaymaths · on Dec 7, 2024

> It’s very approximate, but up to now that’s been very hard to do from just the sequence and no other data.

the synthetic syn 1.0 project used a promoter search algorithm written in cobol by one of the leaders. one of the professors on the project had a wordperfect macro that found protein sequences. point being, they weren't the best programmers in the world. i would hardly say it's been "very hard"

dekhn · on Dec 8, 2024

It depends on what sort of model you're implementing.

There's a big implementation (and result quality) difference between direct string searching (fixed pattern matching) and probabilistic methods (everything from simple profile methods to hidden Markov models). Finding direct matches is the same as the "string.find()" method, while probabilistic methods usually involve dynamic programming, heuristic approximations, floating-point matrices, etc.
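To make that contrast concrete, here is a minimal sketch of exact matching next to the simplest kind of profile method, a position weight matrix (PWM). The example sites, the test sequence, and the names `build_pwm` and `best_window` are all made up for illustration; real profile tools and HMMs do considerably more (gaps, dynamic programming, calibrated score thresholds).

```python
import math

# Hypothetical aligned example sites (toy data, not a real promoter set).
SITES = ["TATAAT", "TATAAT", "TACAAT", "TATGAT", "TAAAAT", "TATACT"]
# Hypothetical sequence containing a one-mismatch variant ("TAGAAT") at index 6.
SEQ = "GCGCGCTAGAATGCGCGC"


def build_pwm(sites, alphabet="ACGT", pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites."""
    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in alphabet
        })
    return pwm


def best_window(seq, pwm):
    """Score every window of the sequence and return (score, start) of the best."""
    width = len(pwm)
    scored = [
        (sum(pwm[i][seq[start + i]] for i in range(width)), start)
        for start in range(len(seq) - width + 1)
    ]
    return max(scored)


# Fixed pattern matching: the consensus is not present verbatim.
print(SEQ.find("TATAAT"))  # -1

# Profile scoring: the near-match still stands out with a positive log-odds score.
score, start = best_window(SEQ, build_pwm(SITES))
print(start, SEQ[start:start + 6])  # 6 TAGAAT
```

The exact search reports nothing, while the profile score ranks the near-match above every other window. That tolerance for variation is what the floating-point matrices buy you, at the cost of needing curated example sites to build the matrix in the first place.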

But more importantly, techniques like Nucleotide Transformers are much less supervised than existing search techniques. Previously people had to do a fair amount of labelling and QC work to identify the patterns that underlie general sequence categories; these methods spontaneously learn them from the data. I could imagine building an entire transformer model in COBOL, although it would be cumbersome; building one with a WordPerfect macro would be extremely challenging if not impossible. Even a profile-based method would be painful (I don't know if WP macros are Turing complete/general purpose programming).

I don't think it's particularly fair or nice to imply that the work being done here is the same sort of work that was being done with a promoter search algorithm; I'm an expert in this area and you're being unnecessarily dismissive. The field has come a long way.

mbreese · on Dec 7, 2024

> from just the sequence and no other data

This is my real question with these... we already have a ton of other data for genomics. So, many of the important regions are already known and studied. And really, the functional importance of any given region/sequence is highly context/cell type specific. So, given this, what are the use cases? What kind of hypothesis generation can these models lead to that we aren't currently doing in genomics?

dekhn · on Dec 8, 2024

The whole idea of unsupervised learning is to find patterns in the data that people wouldn't have easily found by manually looking for categories/labels. So far most of the categories we've identified and manually clustered (to build statistical models that find more of them) have taken extensive discovery biology and curation efforts.

bilsbie · on Dec 7, 2024

That’s really cool. Can you share any insights the models have given you? My biggest point of confusion is what type of practical things these models can do.

(Or Email in profile if you can’t share publicly)

iwd · on Nov 6, 2023

Do you have compression enabled? At least from Pandas, Parquet defaults to compressed and Arrow/Feather default to uncompressed. When I enable zstd compression, I get similar file sizes, and sometimes Arrow is smaller.

slt2021 · on Nov 6, 2023

I was just trying pandas native .to_parquet and .to_arrow() without any extra config knobs

iwd · on Dec 8, 2021

Not an expert, but I believe many papers on other video games make a single decision for the next X frames at once, possibly including a delay factor that governs exactly when to act. I think OpenAI’s Dota 2 agent does this.

fxtentacle · on Dec 8, 2021

I have experimented with that, too, but in my case it also multiplies the number of potential actions. If I have 7 actions per timestep, grouping them into 3-timestep blocks means I now have 7 × 7 × 7 = 343 possibilities to choose from.
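A quick sketch of that arithmetic, using the 7 actions and 3-timestep blocks from the paragraph above. The second half is only one possible reading of "a single decision for the next X frames" (pick one action and hold it for a chosen number of steps); it is an assumption for comparison, not a description of any particular agent, and the variable names are made up.

```python
from itertools import product

N_ACTIONS = 7      # actions available at each timestep
BLOCK_LENGTH = 3   # timesteps grouped into one decision

# Choosing an independent action for every timestep in the block:
# the options multiply, giving N_ACTIONS ** BLOCK_LENGTH combinations.
block_choices = list(product(range(N_ACTIONS), repeat=BLOCK_LENGTH))
print(len(block_choices))  # 343

# Assumed alternative: choose one action plus how many steps to hold it
# (1 to BLOCK_LENGTH). The options grow as N_ACTIONS * BLOCK_LENGTH instead.
repeat_choices = [
    (action, hold)
    for action in range(N_ACTIONS)
    for hold in range(1, BLOCK_LENGTH + 1)
]
print(len(repeat_choices))  # 21
```

The trade-off is expressiveness: the hold-based encoding cannot represent a block that switches actions mid-way, so whether it is acceptable depends on the game.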

From what I understand, the OpenAI Dota 2 AI has a long-term strategy module which was mostly trained by imitating 60,000+ replays played by human professional teams. My problem with doing that for the Borderland competition is that I don't have any data source for replays of someone playing the game really well. You control 3 units simultaneously and it's 2 teams against each other, so I'd need 6 dedicated volunteers playing the game for many hours to create a reasonably-sized corpus of human replays. And who says that those people are good at it?

iwd · on Dec 8, 2021

If you’re doing it for fun, one option is to start with a simplified version of the game. It’s faster to implement and faster to run. And you’ll get insights you can apply to the full game.

That’s what I did when I applied RL to Dominion, because the complexity of the game depends heavily on the cards you include! See part 3 of https://ianwdavis.com/dominion.html

iwd · on Oct 12, 2021

If the hours you spend implementing and maintaining your DevOps exceed the hours of downtime you prevent, you're probably not making good use of your time. The less you build, the less you have to maintain.

I've been running a Python-based, highly custom web store solo since 2007, and it supports multiple people doing fulfillment. I host on Opalstack so as to outsource patching, email config, database maintenance, etc. I run directly in a Git repo (actually Hg, it's that old) and releases are "git pull && ./restart.sh". Rollbacks are "git checkout ...".

I've had to migrate/rebuild the VM about every 5 years. Tech changes enough in that time that no automation will still work unmodified. So I just keep good notes about what I did last time, and figure out what the new equivalents are when I finally have to do it again (updating the notes, of course). Database and Conda are easy to port. It's usually DNS and email integrations that are a pain.

As others have said, KISS is key. Industry DevOps is for a work setting with a decent-sized team, where you can afford the overhead of maintaining it all in order to make the overall team more efficient.

iwd · on Feb 4, 2021

This seems like a “problem” that shouldn’t be solved. If the thing keeping you from algorithmic trading is difficulty in deploying, you probably have no business doing it.

Cynically, this seems like a way to draw more naive novices into the market. As the old saying goes, if you don’t know who the mark is, it’s you.

iwd · on Nov 15, 2020

Agriculture PhD plus even modest data science skills is hugely in demand at the giant ag-biotech company where I work. Many of the data engineers I lead are former science PhDs. Literally all of our people strategy discussions in R&D are about how we need more people like you. There are only a handful of big science companies left in ag, but I bet the other ones have similar needs.

iwd · on July 7, 2020

I've written a lot about this as a resource for my colleagues at work; perhaps you would find it useful as well. There's a list of recommended books embedded in there, but a lot of other info as well: https://ianwdavis.com/advice-new-lead.html