z3ugma · 2026-04-15T19:35:11 1776281711

Love it! Do you have a BAA with Claude though? Otherwise, your demo is likely exposing PHI to 3rd parties and exposing you to risk related to HIPAA

muchael · 2026-04-15T19:39:46 1776281986

It's a good callout. We have a BAA + ZDR with Anthropic and OpenAI, and if you want to use libretto for healthcare use cases having a BAA is essential. Was using Codex in the demo, and we've seen that both Claude and Codex work pretty well

tanishqkanc · 2026-04-16T03:22:59 1776309779

just adding to michael's reply - we took care to make sure no PHI was exposed in our demo video as well.

z3ugma · 2026-04-15T13:42:16 1776260536

At some point, don't you just end up making a low-quality, poorly-tested reinvention of SQLite by doing this and adding features?

freedomben · 2026-04-15T13:49:48 1776260988

Sometimes yes, I've seen it. It even tends to happen on NoSQL databases as well. Three times I've seen apps start on top of DynamoDB and then end up re-implementing relational databases at the application level anyway. Starting with Postgres would have been the right answer for all three of those. Initial dev went faster, but tech debt and complexity quickly started soaking up all those gains and left a hard-to-maintain mess.

leafarlua · 2026-04-15T14:21:11 1776262871

This always confuses me because we have decades of SQL and all its issues as well. Hundreds of experienced devs talking about all the issues in SQL and the quirks of queries when your data is not trivial.

One would think that for a startup of sorts, where things change fast and are unpredictable, NoSQL is the correct answer. And when things are stable and the shape of entities is known, going for SQL becomes a natural path.

There are also cases for having both, and there are cases for graph-oriented databases or even columnar-oriented ones such as DuckDB.

Seems to me, with my very limited experience of course, everything leads to the same boring fundamental issue: the issue rarely lies in infrastructure, and is mostly bad design decisions and poor domain knowledge. Realistically, how many times is the bottleneck indeed the type of database versus the quality of the code and the implementation of the system design?

marcosdumay · 2026-04-15T16:20:28 1776270028

No, when things change fast and unpredictably, NoSQL is worse than when they are well-known and stable.

NoSQL gains you no speed at all in redesigning your system. Instead, you trade a few hard-to-do data migration tasks for an insurmountable mess of data inconsistency bugs that you'll never actually get to the end of.

> is mostly bad design decisions and poor domain knowledge

Yes, using NoSQL to avoid data migrations is a bad design decision. Usually created by poor general knowledge.

james_marks · 2026-04-15T16:44:17 1776271457

If the argument for NoSQL is, “we don’t know what our schema is going to be”, stop.

Stop and go ask more questions until you have a better understanding of the problem.

jampekka · 2026-04-15T18:23:23 1776277403

Oftentimes better understanding of the problem needs trying out solutions. Armchair architectures tend to blow up in contact with reality.

freedomben · 2026-04-15T18:39:21 1776278361

For sure, though with databases it's usually pretty clear even at the start whether your "objects" will be relational in nature. I can't think of a single time that hasn't been the case, over hundreds of apps/services I've been part of. Things like asynchronous jobs, message queues, even object storage, I fully agree though.

ranger_danger · 2026-04-15T21:01:40 1776286900

Even a JSON column would be better in most cases IMO, and on Postgres you can also make indexes on json keys.
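As a rough illustration of the "JSON column plus an index on a key" idea, here is a minimal sketch. It uses SQLite through Python's standard library as a stand-in, not Postgres, and assumes the SQLite build includes the JSON functions (built in by default in recent versions). The table, index, and field names are made up for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# One ordinary table with a free-form JSON text column.
conn.execute("CREATE TABLE profiles (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
# Expression index on a single JSON key, so lookups on it need not scan every row.
conn.execute(
    "CREATE INDEX idx_profiles_city ON profiles (json_extract(body, '$.city'))"
)

docs = [
    {"name": "ana", "city": "Lisbon", "school": "north"},
    {"name": "bo", "city": "Oslo"},                      # optional field missing
    {"name": "cy", "city": "Lisbon", "nickname": "c"},   # extra optional field
]
conn.executemany(
    "INSERT INTO profiles (body) VALUES (?)", [(json.dumps(d),) for d in docs]
)


def names_in_city(city):
    """Return the names of all profiles whose JSON body has the given city."""
    cur = conn.execute(
        "SELECT json_extract(body, '$.name') FROM profiles "
        "WHERE json_extract(body, '$.city') = ? ORDER BY id",
        (city,),
    )
    return [name for (name,) in cur]


print(names_in_city("Lisbon"))
```

In Postgres the equivalent would be a JSONB column with an index on the extracted key; the shape of the idea is the same.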

leafarlua · 2026-04-15T17:42:28 1776274948

Makes sense. But in this case, why does NoSQL exist? What problems does it solve, and when should it be considered? I'm being naive, but a fast-changing environment has been one of the main advantages that devs taught me when it comes to NoSQL vs SQL (NoSQL being the choice for flexible schemas). So is it more about BASE vs ACID?

marcosdumay · 2026-04-15T17:53:00 1776275580

NoSQL was created to deal with scales where ACID becomes a bottleneck. It has also shown itself useful for dealing with data that doesn't actually have a schema.

If you have either of those problems, you will know it very clearly.

Also, ironically, Postgres became one of the most scalable NoSQL bases out there, and one of the most flexible to use unstructured data too.

freedomben · 2026-04-15T18:41:48 1776278508

Agreed. In my experience (YMMV), there was also a real adoption push in the js world from primarily front-end people that wanted to do some backend but didn't want to learn/deal with SQL databases. I don't say that with malice, I was also on-board the NoSQL train for a bit before I actually gained experience with the headaches it caused. The appeal of "just dump your JSON blob straight in" was (and still is) strong. Software is all about learning, and sometimes that learning is expensive. We've all built something we later regretted.

yfontana · 2026-04-15T22:01:11 1776290471

As a data architect I dislike the term NoSQL and often recommend that my coworkers not use it in technical discussions, as it is too vague. Document, key-value and graph DBs are usually considered NoSQL, but they have fairly different use cases (and I'd argue that search DBs like Elastic / OpenSearch are in their own category as well).

To me write scaling is the main current advantage of KV and document DBs. They can generally do schema evolution fairly easily, but nowadays so can many SQL DBs, with semi-structured column types. Also, you need to keep in mind that KV and document DBs are (mostly) non-relational. The more relational your data, the less likely you are to actually benefit from using those DBs over a relational, SQL DB.

gf000 · 2026-04-15T19:37:39 1776281859

Probably the best use case would be something like a Facebook profile page for a given user.

It may not have a very rigid schema, you may later add several other optional fields.

You need very large scale (as in number of concurrent accesses), and you want to shard the data by e.g. location. But also, the data is not "critical": your high school not being visible temporarily for certain users is not an issue.

You mostly use the whole dataset "at the same time", you don't do a lot of WHERE, JOIN on some nested value.

In every other case I would rather reach for postgres with a JSONB column.

tracker1 · 2026-04-15T16:29:12 1776270552

I think part of it is the scale in terms of the past decade and a half... The hardware and vertical scale you could get in 2010 is dramatically different than today.

A lot of the bespoke NoSQL data stores really started to come to the forefront around 2010 or so. At that time, having 8 CPU cores and 10k RPM SAS spinning drives was a high-end server. Today, we have well over 100 cores, with TBs of RAM and PCIe Gen 4/5 NVMe storage (U.x) that is thousands of times faster, at a total cost lower than the servers from 2010 or so, which your average laptop can outclass today.

You can vertically scale a traditional RDBMS like PostgreSQL to an extreme degree... Not to mention utilizing features like JSONB where you can have denormalized tables within a structured world. This makes it even harder to really justify using NoSQL/NewSQL databases. The main bottlenecks are easier to overcome if you relax normalization where necessary.

There's also the consideration of specialized or alternative databases that data is echoed to for the purposes of logging, metrics or reporting. Not to mention certain layers of appropriate caching, which can still be less complex than some multi-database approaches.

leafarlua · 2026-04-15T17:48:51 1776275331

What about the microservices/serverless functions world? This was another common topic over the years, that using SQL with this type of system was not optimal, I believe the issue being the connections to the SQL database and stuff.

tracker1 · 2026-04-15T18:04:04 1776276244

I think a lot of the deference to microservices/serverless is for similar reasons... you can work around some of this if you use a connection proxy, which is pretty common for PostgreSQL...

That said, I've leaned into avoiding breaking up a lot of microservices unless/until you need them... I'm also not opposed to combining CQRS style workflows if/when you do need micro services. Usually if you need them, you're either breaking off certain compute/logic workflows first where the async/queued nature lends itself to your needs. My limited experience with a heavy micro-service application combined with GraphQL was somewhat painful in that the infrastructure and orchestration weren't appropriately backed by dedicated teams leading to excess complexity and job duties for a project that would have scaled just fine in a more monolithic approach.

YMMV depending on your specific needs, of course. You can also have microservices call natural services that have better connection sharing heuristics depending again on your infrastructure and needs... I've got worker pools that mostly operate off a queue, perform heavy compute loads then interact with the same API service(s) as everything else.

hunterpayne · 2026-04-15T23:36:58 1776296218

microservices are about the fact that administrative overhead for a software system increases exponentially w.r.t. the complexity of the system. Or to put it another way, microservices are a way to make a complex system without having the architecture explode in size. They have nothing to do with making more efficient software systems. They are about making complex systems that trade dev costs for operational costs.

dalenw · 2026-04-15T14:45:55 1776264355

It's almost always a system design issue. Outside of a few specific use cases with big data, I struggle to imagine when I'd use NoSQL, especially in an application or data analytics scenario. At the end of the day, your data should be structured in a predictable manner, and it most likely relates to other data. So just use SQL.

greenavocado · 2026-04-15T14:54:16 1776264856

System design issues are a product of culture, capabilities, and prototyping speed of the dev team

mike_hearn · 2026-04-15T16:18:25 1776269905

Disclaimer: I work part time on the DB team.

You could also consider renting an Oracle DB. Yep! Consider some unintuitive facts:

• It can be cheaper to use Oracle than MongoDB. There are companies that have migrated away from Mongo to Oracle to save money. This idea violates some of HN's most sacred memes, but there you go. Cloud databases are things you always pay for, even if they're based on open source code.

• Oracle supports NoSQL features including the MongoDB protocol. You can use the Mongo GUI tools to view and edit your data. Starting with NoSQL is very easy as a consequence.

• But... it also has "JSON duality views". You start with a collection of JSON documents and the database not only works out your JSON schemas through data entropy analysis, but can also refactor your documents into relational tables behind the scenes whilst preserving the JSON/REST oriented view e.g. with optimistic locking using etags. Queries on JSON DVs become SQL queries that join tables behind the scenes so you get the benefits of both NoSQL and SQL worlds (i.e. updating a sub-object in one place updates it in all places cheaply).

• If your startup has viral growth you won't have db scaling issues because Oracle DBs scale horizontally, and have a bunch of other neat performance tricks like automatically adding indexes you forgot you needed, you can materialize views, there are high performance transactional message queues etc.

So you get a nice smooth scale-up and transition from ad hoc "stuff some json into the db and hope for the best" to well typed data with schemas and properly normalized forms that benefit from all the features of SQL.

alexisread · 2026-04-15T16:38:55 1776271135

Good points, but Postgres has all those, along with a much better local testing story, easier and more reliable CDC, better UDFs (in Python, Go etc.), a huge ecosystem of extensions for e.g. GIS data, no licensing issues ever, API compatibility with DuckDB, Doris and other DBs, and (this is the big one) is not Oracle.

sgarland · 2026-04-15T21:18:41 1776287921

Unless I’ve missed something, Postgres doesn’t have automatic index creation, nor does it have JSON introspection to automatically convert it to a normalized schema (which is insane; I love it). It also doesn’t do any kind of sharding on its own, though of course forks like Citus exist. It definitely doesn’t do RAC / Exadata (not sure which part this falls under), where multiple nodes are connected and use RDMA to treat a bunch of SSDs as local storage.

I love Postgres, and am not a huge fan of Oracle as a corporation, but I can’t deny that their RDBMS has some truly astounding capabilities.

alexisread · 2026-04-16T07:03:06 1776322986

I think that’s the beauty of PG here, you can find solutions to most of this:

Index creation https://stackoverflow.com/questions/23876479/will-postgresql...

JSON->DB schema https://jsonschema2db.readthedocs.io/en/latest/index.html

Pg shared disk failover is similar but RAC is quite unique, you’re not going to use though with a rented cluster?

https://www.postgresql.org/docs/current/different-replicatio...

Personally for me any technical advantages don’t outweigh the business side, YMMV :)

mike_hearn · 2026-04-16T08:04:00 1776326640

RAC is a default part of any cloud Oracle DB, I think! I must admit I'm not an expert in all the different SKUs so there might be some that aren't, but if you rent an autonomous DB in the cloud you're probably running on ExaData/RAC. That's why the uptime is advertised as 99.95% even without a separate region.

> Index creation https://stackoverflow.com/questions/23876479/will-postgresql...

I was ambiguous. That's an answer telling how to create indexes manually, and saying that you get an index for primary keys and unique constraints automatically. Sure, all databases do that. Oracle can create arbitrary indexes for any relation in the background without it being requested, if it notices that common queries would benefit from them.

Forgetting to create indexes is one of the most common issues people face when writing database apps because the performance will be fine on your laptop, or when the feature is new, and then it slows down when you scale up. Or worse you deploy to prod and the site tanks because a new query that "works fine on my machine" is dog slow when there's real world amounts of data involved. Oracle will just fix it for you, Postgres will require a manual diagnosis and intervention. So this isn't the same capability.
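The "fine on my laptop, slow in prod" failure mode is easy to see in a query plan. The sketch below is a hypothetical illustration using SQLite from Python's standard library (not Oracle or Postgres): it asks for the plan of the same query before and after an index is created by hand. The table and index names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = ?"


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,))
    return " | ".join(row[-1] for row in rows)


before = plan(QUERY)  # no index yet: the planner has to scan the whole table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(QUERY)   # the planner can now search via the new index

print("before:", before)
print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan reports a table scan and the second names the index.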

> JSON->DB schema https://jsonschema2db.readthedocs.io/en/latest/index.html

Again I didn't provide enough detail, sorry.

What that utility is doing is quite different. For one, it assumes you start with a schema already. Oracle can infer a schema from a collection of documents even if you don't have one by figuring out which fields are often repeated, which values are unique, etc.

For another, what you get after running that utility is relational tables that you have to then access relationally via normal SQL. What JSON duality views give you is something that still has the original document layout and access mode - you GET/PUT whole documents - and behind the scenes that's mapped to a schema and then through to the underlying SQL that would be required to update the tables that the DB generated for you. So you get the performance of normalized relations but you don't have to change your code.

The nice thing about this is it lets developers focus on application features and semantics in the early stages of a startup by just reshaping their JSON documents at will, whilst someone else focuses on improving performance and data rigor fully asynchronously. The app doesn't know how the data is stored, it just sees documents, and the database allows a smooth transition from one data model to another.

I don't think Postgres has anything like this. If it does it'll be in the form of an obscure extension that cloud vendors won't let you use, because they don't want to/can't support every possible Postgres extension out there.

rezonant · 2026-04-15T20:41:33 1776285693

You had me at "is not Oracle"

tracker1 · 2026-04-15T16:32:40 1776270760

I generally limit Oracle to cases where you are in a position to have a team dedicated to the design, deployment and management of just database operations. I'm not really a fan of Oracle in general, but if you're in a position to spend upwards of $1m/yr or more for dedicated db staff, then it's probably worth considering.

Even then, PostgreSQL and even MS-SQL are often decent alternatives for most use cases.

mike_hearn · 2026-04-15T16:37:41 1776271061

That was true years ago but these days there's the autonomous database offering, where DB operations are almost all automated. You can rent them in the cloud and you just get the connection strings/wallet and go. Examples of stuff it automates: backups, scaling up/down, (as mentioned) adding indexes automatically, query plan A/B testing to catch bad replans, you can pin plans if you need to, rolling upgrades without downtime, automated application of security patches (if you want that), etc.

So yeah running a relational DB used to be quite high effort but it got a lot better over time.

tracker1 · 2026-04-15T17:10:31 1776273031

At that point, you can say the same for PostgreSQL, which is more broadly supported across all major and minor cloud platforms with similar features and I'm assuming a lower cost and barrier of entry. This is without signing with Oracle, Inc... which tends to bring a lot of lock-in behaviors that come with those feature sets.

TBF, I haven't had to use Oracle in about a decade at this point... so I'm not sure how well it competes... My experiences with the corporate entity itself leave a lot to be desired, let alone just getting setup/started with local connectivity has always been what I considered extremely painful vs common alternatives. MS-SQL was always really nice to get setup, but more recently has had a lot of difficulties, in particular with docker/dev instances and more under arm (mac) than alternatives.

I'm a pretty big fan of PG, which is, again, very widely available and supported.

mike_hearn · 2026-04-15T17:18:23 1776273503

Autonomous DB can run on-premises or in any cloud, not just Oracle's cloud. So it's not quite the same.

I think PG doesn't have most of the features I named, I'm pretty sure it doesn't have integrated queues for example (SELECT FOR UPDATE SKIP LOCKED isn't an MQ system), but also, bear in mind the "postgres" cloud vendors sell is often not actually Postgres. They've forked it and are exploiting the weak trademark protection, so people can end up more locked in than they think. In the past one cloud even shipped a transaction isolation bug in something they were calling managed Postgres, that didn't exist upstream! So then you're stuck with both a single DB and a single cloud.

Local dev is the same as other DBs:

    docker run -d --name <oracle-db> container-registry.oracle.com/database/free:latest

See https://container-registry.oracle.com

Works on Intel and ARM. I develop on an ARM Mac without issue. It starts up in a few seconds.

Cost isn't necessarily much lower. At one point I specced out a DB equivalent to what a managed Postgres would cost for OpenAI's reported workload:

> I knocked up an estimate using Azure's pricing calculator and the numbers they provide, assuming 5TB of data (under-estimate) and HA option. Even with a 1 year reservation @40% discount they'd be paying (list price) around $350k/month. For that amount you can rent a dedicated Oracle/ExaData cluster with 192 cores! That's got all kinds of fancy hardware optimizations like a dedicated intra-cluster replication network, RDMA between nodes, predicate pushdown etc. It's going to perform better, and have way more features that would relieve their operational headache.

chrisweekly · 2026-04-15T17:54:52 1776275692

In the spirit of helpfulness (not pedantry) FYI "knocked up" means "impregnated". Maybe "put together"?

mike_hearn · 2026-04-15T18:03:28 1776276208

Ah, this must be a British vs American English thing, thanks for the info.

Yes I meant it in this sense: "If you knock something up, you make it or build it very quickly, using whatever materials are available."

https://www.collinsdictionary.com/dictionary/english/knock-u...

chrisweekly · 2026-04-16T20:55:45 1776372945

Heh, thanks, yes, your meaning was obvious from context alone. I'm really surprised not to have encountered the British usage before (or so rarely). Maybe owing to potential for huge misunderstandings. ("Jimmy knocked her up" -> woke in the night || impregnated and abandoned.)

TIL

tracker1 · 2026-04-15T17:40:20 1776274820

And, again... most of my issues are with Oracle, Inc. So technical advantages are less of a consideration.

danny_codes · 2026-04-15T20:24:41 1776284681

But then you’d have to interact with Oracle.

So.

Yeah no sane person would be that stupid

OtomotO · 2026-04-15T18:02:52 1776276172

If you have an option, never ever use Oracle!

Never!

freedomben · 2026-04-15T16:26:52 1776270412

I wanted to hate you for suggesting Oracle, but you defend it well! I had no idea

ignoramous · 2026-04-15T20:11:43 1776283903

> One would think that for a startup of sorts, where things change fast and are unpredictable, NoSQL is the correct answer. And when things are stable and the shape of entities is known, going for SQL becomes a natural path.

NoSQL is the "correct" answer if your queries are KV oriented, while predictable performance and high availability are priority (true for most "control planes"). Don't think any well-designed system will usually need to "graduate" from NoSQL to SQL.

Prior: https://news.ycombinator.com/item?id=22249490

AlotOfReading · 2026-04-15T17:16:49 1776273409

There's plenty of middle ground between an unchanging SQL schema and the implicit schemas of "schemaless" databases. You can have completely fluid schemas with the full power of relational algebra (e.g. untyped datalog). You shouldn't be using NoSQL just because you want to easily change schemas.

hunterpayne · 2026-04-15T22:46:01 1776293161

"NoSQL is the correct answer."

No, no it isn't. It never is. Just as building your house on a rubber foundation isn't the correct answer either. This is just cope. Unless your use cases don't care about losing data or data corruption at all, NoSQL isn't the correct answer.

leafarlua · 2026-04-17T12:56:59 1776430619

You are probably correct. I was just parroting what I have seen from the industry. Isn't it common for even big startups to have decided to go with rubber for their foundation? It seems to not be a wise decision, yet many engineers do take this path. And that is why I end up confused on this type of discussion.

akdev1l · 2026-04-15T18:28:43 1776277723

> end up re-implementing relational databases at the application level anyway

This is by design, the idea is that scaling your application layer is easy but scaling your storage/db layer is not

Hence make the storage dumb and have the application do the joins and now your app scales right up

(But tbh I agree a lot of applications don’t reach the scale required to benefit from this)
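For readers who have not seen the "dumb storage, joins in the application" pattern, here is a minimal sketch of what it looks like. The two collections stand in for whatever a key-value store would return; all names and data are hypothetical.

```python
from collections import defaultdict

# Two "dumb" collections, as they might come back from a key-value store.
users = {
    "u1": {"name": "ana"},
    "u2": {"name": "bo"},
}
orders = [
    {"order_id": "o1", "user_id": "u1", "total": 30},
    {"order_id": "o2", "user_id": "u1", "total": 12},
    {"order_id": "o3", "user_id": "u3", "total": 99},  # dangling reference
]


def join_orders_to_users(users, orders):
    """Hash join in application code: look up each order's user by key."""
    joined = []
    for order in orders:
        user = users.get(order["user_id"])
        if user is None:
            # Nothing in the store enforces the reference, so the app must decide.
            continue
        joined.append({"name": user["name"], **order})
    return joined


def totals_by_user(joined):
    """Aggregate joined rows into a total per user name."""
    totals = defaultdict(int)
    for row in joined:
        totals[row["name"]] += row["total"]
    return dict(totals)


rows = join_orders_to_users(users, orders)
print(totals_by_user(rows))
```

The join itself is short; the cost is that consistency rules (such as what to do with the dangling order) now live in every application that reads the data.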

tshaddox · 2026-04-15T16:43:55 1776271435

I've never used DynamoDB in production, but it always struck me as the type of thing where you'd want to start with a typical relational database, and only transition the critical read/write paths when you get to massive scale and have a very good understanding of your data access patterns.

icedchai · 2026-04-15T16:32:21 1776270741

Same. DynamoDB is almost never a good default choice unless you've thought very carefully about your current and future use cases. That's not to say it's always bad! At previous startups we did some amazing things with Dynamo.

noveltyaccount · 2026-04-15T14:02:39 1776261759

As soon as you need to do a JOIN, you're either rewriting a database or replatforming on Sqlite.
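To make the comparison concrete, this is roughly what the JOIN looks like when a database does it. A minimal sketch with SQLite via Python's standard library; the schema and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id TEXT PRIMARY KEY,
                         user_id TEXT NOT NULL REFERENCES users(id),
                         total INTEGER NOT NULL);
    INSERT INTO users  VALUES ('u1', 'ana'), ('u2', 'bo');
    INSERT INTO orders VALUES ('o1', 'u1', 30), ('o2', 'u1', 12);
""")


def totals_per_user(conn):
    """Total order value per user, including users with no orders."""
    # LEFT JOIN keeps users without orders; COALESCE turns their NULL sum into 0.
    cur = conn.execute("""
        SELECT u.name, COALESCE(SUM(o.total), 0)
        FROM users AS u
        LEFT JOIN orders AS o ON o.user_id = u.id
        GROUP BY u.id
        ORDER BY u.name
    """)
    return cur.fetchall()


print(totals_per_user(conn))
```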

goerch · 2026-04-15T17:19:53 1776273593

a) Just heard today: JOINs are bad for performance b) How many columns can (an Excel) table have: no need for JOINs

hunterpayne · 2026-04-15T23:53:05 1776297185

Wow, I'm sorry you have to work with such coworkers. For reference, joins are just an expensive use case. DBs do them about 10x faster than you can do them by hand. But if you need a join, you probably should either a) do it periodically and cache the result (making your data inconsistent) or b) just do it in a DB. Confusing caching the result with doing the join efficiently is an amazing misunderstanding of basic Computer Science.

datadrivenangel · 2026-04-15T18:22:10 1776277330

vlookups are bad for performance. recursive vlookups even more so.

pgtan · 2026-04-15T15:57:39 1776268659

Here are two checks using joins, one with sqlite, one with the join builtin of ksh93:

  check_empty_vhosts () {
    # Check which vhost adapter doesn't have any VTD mapped
    start_sqlite
    tosql "SELECT l.vios_name, l.vadapter_name
           FROM vios_vadapter AS l
           LEFT OUTER JOIN vios_wwn_disk_vadapter_vtd AS r
             USING (vadapter_name, vios_name)
           WHERE r.vadapter_name IS NULL
             AND r.vios_name IS NULL
             AND l.vadapter_name LIKE 'vhost%';"
    endsql
    getsql
    stop_sqlite
  }

  check_empty_vhosts_sh () {
    # same as above, but on the shell
    join -v 1 -t , -1 1 -2 1 \
      <(while IFS=, read vio host slot; do
          if [[ $host == vhost* ]]; then
            print ${vio}_$host,$slot
          fi
        done < $VIO_ADAPTER_SLOT | sort -t , -k 1) \
      <(while IFS=, read vio vhost vtd disk; do
          if [[ $vhost == vhost* ]]; then
            print ${vio}_$vhost
          fi
        done < $VIO_VHOST_VTD_DISK | sort -t , -k 1)
  }

bachmeier · 2026-04-15T15:58:29 1776268709

Based on what's in the article, it wouldn't take much to move these files to SQLite or any other database in the future.

Edit: I just submitted a link to Joe Armstrong's Minimum Viable Programs article from 2014. If the response to my comment is about the enterprise and imaginary scaling problems, realize that those situations don't apply to some programming problems.

locknitpicker · 2026-04-15T16:02:46 1776268966

> Based on what's in the article, it wouldn't take much to move these files to SQLite or any other database in the future.

Why waste time screwing around with ad-hoc file reads, then?

I mean, what exactly are you buying by rolling your own?

bachmeier · 2026-04-15T16:08:37 1776269317

You can avoid the overhead of working with the database. If you want to work with json data and prefer the advantages of text files, this solution will be better when you're starting out. I'm not going to argue in favor of a particular solution because that depends on what you're doing. One could turn the question around and ask what's special about SQLite.

pythonaut_16 · 2026-04-15T16:13:19 1776269599

If your language supports it, what is the overhead of working with SQLite?

What's special about SQLite is that it already solves most of the things you need for data persistence without adding the same kind of overhead or trade offs as Postgres or other persistence layers, and that it saves you from solving those problems yourself in your json text files...

Like by all means don't use SQLite in every project. I have projects where I just use files on the disk too. But it's kinda inane to pretend it's some kind of burdensome tool that adds so much overhead it's not worth it.

cleversomething · 2026-04-15T16:27:55 1776270475

> what's special about SQLite

Battle-tested, extremely performant, easier to use than a homegrown alternative?

By all means, hack around and make your own pseudo-database file system. Sounds like a fun weekend project. It doesn't sound easier or better or less costly than using SQLite in a production app though.

locknitpicker · 2026-04-15T16:18:25 1776269905

> You can avoid the overhead of working with the database.

What overhead?

SQLite is literally more performant than fread/fwrite.

cleversomething · 2026-04-15T16:24:49 1776270289

That's exactly what I was going to say. This seems more like a neat "look Ma, no database!" hobby project than an actual production recommendation.

ablob · 2026-04-15T16:20:07 1776270007

So you trade the overhead of SQL with the overhead of JSON?

whalesalad · 2026-04-15T16:06:38 1776269198

Reminds me of the infamous Robert Virding quote:

“Virding's First Rule of Programming: Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang.”

mrec · 2026-04-15T17:23:35 1776273815

In case you weren't aware, that in itself is riffing on Greenspun's tenth rule:

https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule

hackingonempty · 2026-04-15T17:03:59 1776272639

Probably more like a low-quality, poorly-tested reinvention of BerkeleyDB.

randyrand · 2026-04-15T16:22:58 1776270178

“You Aren’t Gonna Need It” - one of the most important software principles.

Wait until you actually need it.

dkarl · 2026-04-15T17:20:57 1776273657

I interpret YAGNI to mean that you shouldn't invest extra work and extra code complexity to create capabilities that you don't need.

In this case, I feel like using the filesystem directly is the opposite: doing much more difficult programming and creating more complex code, in order to do less.

It depends on how you weigh the cost of the additional dependency that lets you write simpler code, of course, but I think in this case adding a SQLite dependency is a lower long-term maintenance burden than writing code to make atomic file writes.
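For a sense of what "code to make atomic file writes" involves, here is a minimal sketch of the usual write-to-temp-then-rename pattern. It assumes a POSIX-style filesystem where a rename within one directory is atomic; the function and file names are hypothetical.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write data to path so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be in the same directory (same filesystem) as the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # push file contents to disk before renaming
        os.replace(tmp_path, path)  # rename over the destination in one step
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "state.json")
atomic_write_json(target, {"version": 1})
atomic_write_json(target, {"version": 2})
with open(target, encoding="utf-8") as f:
    print(json.load(f))
```

Even this sketch leaves out details a database handles for you, such as syncing the containing directory, coordinating concurrent writers, and updating several files together.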

The original post isn't about simplicity, though. It's about performance. They claim they achieved better performance by using the filesystem directly, which could (if they really need the extra performance) justify the extra challenge and code complexity.

goerch · 2026-04-15T18:01:02 1776276062

Is this what we do with education in general?

upmostly · 2026-04-15T16:39:12 1776271152

100%.

Premature optimisation I believe that's called.

I've seen it play out many times in engineering over the years.

hunterpayne · 2026-04-16T00:00:22 1776297622

Counterpoint, Meta is currently (and for the last decade) trying to rewrite MySQL so it is basically Postgres. They could just change their code so it works with Postgres and retrain their ops on Postgres. But for some reason they think it's easier to just rewrite MySQL. Now, that is almost certainly more about office politics than technical matters...but it could also be the case that they have so much code that only works with MySQL that it is true (seriously doubtful).

You are just mislabeling good architecture as 'premature optimization'. So I will give you another platitude... "There is nothing so permanent as a temporary software solution"

trgn · 2026-04-15T17:44:57 1776275097

im sure, but honestly, i would love to have a db engine that just writes/reads csv or json. does it exist?

banana_giraffe · 2026-04-15T17:51:05 1776275465

DuckDB can do exactly this, once you get the API working in your system, it becomes something simple like

    SELECT * FROM read_csv('example.csv');

Writing generally involves reading to an in-memory database, making whatever changes you want, then something like

    COPY new_table TO 'example.csv' (HEADER true, DELIMITER ',');

herpdyderp · 2026-04-15T17:47:46 1776275266

I wrote a CSV DB engine once! I can't remember why. For fun?

zabzonk · 2026-04-15T19:30:53 1776281453

Microsoft actually provide an ODBC CSV data source out of the box.

hunterpayne · 2026-04-16T00:01:49 1776297709

Postgres can do that as well.

akdev1l · 2026-04-15T18:29:12 1776277752

SQLite can do it

trgn · 2026-04-15T19:38:24 1776281904

its storage file is a csv? or do you mean import/export to csv?

akdev1l · 2026-04-17T04:25:49 1776399949

You can import csv files into in memory tables and query them or you can use the csv extensions

    $ sqlite3 :memory:
    .mode csv
    .import myfile.csv mytable
    SELECT * FROM mytable;

    $ sqlite3 :memory:
    SELECT *
    FROM csv_read('myfile.csv');
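For anyone who would rather do the import step from code than from the `sqlite3` shell, here is a minimal sketch using only Python's standard library. The sample data, the table name `mytable`, and the helper `load_csv` are made up for illustration, and storing every column as TEXT is a simplifying assumption.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, fileobj):
    """Load CSV rows (first row is the header) into a new table; return the row count."""
    reader = csv.reader(fileobj)
    header = next(reader)
    # Quote identifiers so unusual column names do not break the statements.
    quoted_table = '"{}"'.format(table.replace('"', '""'))
    columns = ", ".join('"{}" TEXT'.format(name.replace('"', '""')) for name in header)
    conn.execute("CREATE TABLE {} ({})".format(quoted_table, columns))
    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quoted_table, placeholders), rows
    )
    conn.commit()
    return len(rows)


# Hypothetical sample data standing in for myfile.csv.
sample = io.StringIO("id,name\n1,ada\n2,grace\n")
conn = sqlite3.connect(":memory:")
count = load_csv(conn, "mytable", sample)
names = [row[0] for row in conn.execute("SELECT name FROM mytable ORDER BY id")]
```

With a real file, `open("myfile.csv", newline="")` would take the place of the `io.StringIO` object.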

gorjusborg · 2026-04-15T13:46:21 1776260781

Only if you get there and need it.

z3ugma · 2026-04-15T13:58:56 1776261536

but it's so trivial to implement SQLite, in almost any app or language...there are sufficient ORMs to do the joins if you don't like working with SQL directly...the B-trees are built in and you don't need to reason about binary search, and your app doesn't have 300% test coverage with fuzzing like SQLite does

you should be squashing bugs related to your business logic, not core data storage. Local data storage on your one horizontally-scaling box is a solved problem using SQLite. Not to mention atomic backups?
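The "atomic backups" point can be illustrated with Python's built-in `sqlite3` module, whose `Connection.backup()` wraps SQLite's online backup API. This is only a rough sketch: the `notes` table is a placeholder, and both databases are in-memory so the example is self-contained.

```python
import sqlite3

# Source database with some data (in-memory here; normally a file path).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
src.executemany("INSERT INTO notes (body) VALUES (?)", [("first",), ("second",)])
src.commit()

# Destination; in practice this would be a file such as sqlite3.connect("backup.db").
dst = sqlite3.connect(":memory:")

# Copy a consistent snapshot of src into dst using SQLite's online backup API.
src.backup(dst)

backed_up = dst.execute("SELECT body FROM notes ORDER BY id").fetchall()
```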

hirvi74 · 2026-04-15T14:21:02 1776262862

Sqlite is also the only major database to receive DO-178B certification, which allows Sqlite to legally operate in avionic environments and roles.

moron4hire · 2026-04-15T14:08:31 1776262111

Came here to also throw in a vote for it being so much easier to just use SQLite. You get so much for so very little. There might be a one-time up-front learning effort for tweaking settings, but that is a lot less effort than what you're going to spend on fiddling with stupid issues with data files all day, every day, for the rest of the life of your project.

tracker1 · 2026-04-15T16:42:44 1776271364

Even then... I'd argue for at least LevelDB over raw jsonl files... and I say this as someone who would regularly do ETL and backups to jsonl file formats in prior jobs.

gorjusborg · 2026-04-15T14:12:40 1776262360

Honestly, there is zero chance you will implement anything close to sqlite.

What is more likely, if you are making good decisions, is that you'll reach a point where the simple approach will fail to meet your needs. If you use the same attitude again and choose the simplest solution based on your _need_, you'll have concrete knowledge and constraints that you can redesign for.

z3ugma · 2026-04-15T18:29:51 1776277791

not re-implement SQLite, I mean "use SQLite as your persistence layer in your program"

e.g. worry about what makes your app unique. Data storage is not what makes your app unique. Outsource thinking about that to SQLite

9rx · 2026-04-15T14:05:11 1776261911

> and your app doesn't have 300% test coverage with fuzzing like SQLite does

Surely it does? Otherwise you cannot trust the interface point with SQLite and you're no further ahead. SQLite being flawless doesn't mean much if you screw things up before getting to it.

RL2024 · 2026-04-15T14:29:04 1776263344

That's true but relying on a highly tested component like SQLite means that you can focus your tests on the interface and your business logic, i.e. you can test that you are persisting to your datastore rather than testing that your datastore implementation is valid.
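As a rough illustration of "test the interface, not the datastore": the sketch below only checks that a thin wrapper round-trips data through SQLite, not that SQLite itself works. `UserStore` and its methods are invented for this example.

```python
import sqlite3


class UserStore:
    """Hypothetical thin persistence layer over SQLite."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (email TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )

    def save(self, email, name):
        # INSERT OR REPLACE: saving the same email twice overwrites the name.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO users (email, name) VALUES (?, ?)",
                (email, name),
            )

    def get(self, email):
        row = self.conn.execute(
            "SELECT name FROM users WHERE email = ?", (email,)
        ).fetchone()
        return row[0] if row else None


# An in-memory database keeps the interface test fast and isolated.
store = UserStore(sqlite3.connect(":memory:"))
store.save("a@example.com", "Ada")
store.save("a@example.com", "Ada L.")
```

The assertions worth writing here are about the wrapper's contract, for example that a second `save` overwrites the first and that a missing key returns `None`.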

9rx · 2026-04-15T14:43:08 1776264188

Your business logic tests will already, by osmosis, exercise the backing data store in every conceivable way to the fundamental extent that is possible with testing given finite time. If that's not the case, your business logic tests have cases that have been overlooked. Choosing SQLite does mean that it will also be tested for code paths that your application will never touch, but who cares about that? It makes no difference if code that is never executed is theoretically buggy.

wmanley · 2026-04-15T15:45:09 1776267909

Business logic tests will rarely test what happens to your data if a machine loses power.
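Real power loss is hard to reproduce in a unit test (it needs fault injection below the application), but a related property can be checked cheaply: a transaction that fails halfway leaves no partial writes. A minimal sketch with Python's `sqlite3`; the `accounts` table and the `transfer` helper are hypothetical, and this does not simulate power loss itself.

```python
import sqlite3


def transfer(conn, amount, crash=False):
    """Move `amount` from account 'a' to 'b' inside one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = 'a'", (amount,)
        )
        if crash:
            raise RuntimeError("simulated failure between the two writes")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = 'b'", (amount,)
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

try:
    transfer(conn, 50, crash=True)
except RuntimeError:
    pass  # the half-finished transfer was rolled back

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```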

9rx · 2026-04-15T15:53:32 1776268412

Then your business logic contains unspecified behaviour. Maybe you have a business situation where power loss conditions being unspecified is perfectly acceptable, but if that is so it doesn't really matter what happens to your backing data store either.

upmostly · 2026-04-15T13:49:03 1776260943

Exactly. And most apps don't get there and therefore don't need it.

evanelias · 2026-04-15T14:19:29 1776262769

Your article completely ignores operational considerations: backups, schema changes, replication/HA. As well as security, i.e. your application has full permissions to completely destroy your data file.

Regardless of whether most apps have enough requests per second to "need" a database for performance reasons, these are extremely important topics for any app used by a real business.

z3ugma · 2026-04-06T18:54:48 1775501688

Didn't they already have one of these? Like the original FitBit was a screenless fitness band...

z3ugma · 2026-03-30T16:41:07 1774888867

You did all of this in one week?! Pretty cool. Even with LLM-assisted making, it shows that you have a lot of taste and architecture know-how. What would you say lets you think through useful abstractions like the design tokens?

z3ugma · 2026-03-30T15:47:06 1774885626

I have been using this library for a few months alongside Gemini 3.1 Fast

It's really useful to get an iteration loop going with an LLM.

The OCP CAD Viewer extension for VS Code helps make sure you can see and manipulate the resulting model

nakedneuron · 2026-03-30T16:10:09 1774887009

last time i tried, i didnt get the standalone mode to run.. there seems to have been an update in february, so i will give this another try when there's time..

(context: https://github.com/bernhard-42/vscode-ocp-cad-viewer/)

tonyarkles · 2026-03-30T20:43:48 1774903428

data point: standalone mode worked just fine for me a couple of weeks ago on an M4 MBP. Hadn't tried before so can't say if something got fixed.

z3ugma · 2026-03-22T04:01:07 1774152067

See also the recently-posted open source project https://github.com/ForestHubAI/boardsmith

"Text prompt → KiCad schematic, BOM, and firmware. No templates — real wired circuits with computed values."

z3ugma · 2026-03-20T15:37:52 1774021072

"Well There's Your Problem" on the collapse of the St Francis Dam, mentioned in Grady's video https://www.youtube.com/watch?v=hxLgM1vnuUA

Also I love when they refer to it as the "_First_ California Water Wars" in a grim realization of the future of water scarcity in the West

hamdingers · 2026-03-20T16:29:12 1774024152

There is no water scarcity in California, only misallocation. The vast majority of our water is heavily subsidized and used for agriculture, and a substantial amount of those crops are grown for export, yet agricultural exports make up an insignificant part of California's economy.

We could end all California water scarcity talk today, with no impact to food availability for Americans, by curtailing the international export of just two California crops: almonds and alfalfa.

SCUSKU · 2026-03-20T16:54:14 1774025654

Anecdotally, my friend's grandma was an almond farmer. As they drove past a river in the Central Valley, she exclaimed "Why is there water in that river?! Those could be watering my almond trees!"

dclowd9901 · 2026-03-20T20:08:48 1774037328

In Arizona we grow alfalfa as well -- it's mind boggling to me that in places where water is so scarce we use so much of it on such a low value crop.

crooked-v · 2026-03-20T20:29:34 1774038574

That alfalfa gets extensively exported as livestock feed... and alfalfa is literally mostly water by weight. So the arrangement is literally shipping out local groundwater in bulk to other countries.

zahlman · 2026-03-21T14:00:41 1774101641

I hear about the almonds a lot. Are they more water-intensive than other tree nuts? Are they not commonly grown elsewhere in the world? All I really know about them is that they seem kinda nice, but not really worth the cost.

kccqzy · 2026-03-20T17:18:50 1774027130

So why hasn’t that been done? Have some representatives and senators set limits on almond exports. Surely they wouldn’t be voted out in the next election given how farmers are outnumbered.

patmorgan23 · 2026-03-20T17:31:35 1774027895

Because farmers are making money off of exporting and have significant lobbying power

markdown · 2026-03-20T21:27:29 1774042049

Bold of you to assume that ordinary voters matter more than Billionaires like the Resnicks of The Wonderful Company.

coryrc · 2026-03-20T17:24:59 1774027499

Almonds are climate-appropriate product and valuable. Alfalfa can cheaply be grown off rainwater in the Midwest and it alone frees up sufficient water.

kenhwang · 2026-03-20T18:49:56 1774032596

The problem is alfalfa is expensive to transport (heavy due to desired moisture content). So while it can be cheaply grown in the Midwest, it can't be cheaply transported from the Midwest to where buyers of alfalfa are (typically overseas).

Alfalfa is also a staple for crop rotation, so any farming operation will still grow some alfalfa to maintain rotation for good soil health (or during bad condition seasons since it's hardier to poor conditions and not a permanent crop).

If alfalfa cannot be exported (through policy or economic conditions), the low price attracts more livestock production in-state (which would be even worse for water use).

Those things make it a hard crop to target for sustainability and export.

coryrc · 2026-03-20T22:34:40 1774046080

> it can't be cheaply transported from the Midwest to where buyers of alfalfa are

Trains.

Alfalfa isn't the only alternative, and they should switch to higher-value crops anyway. They would if they had to pay for water. We simply need to charge everybody for water usage.

kenhwang · 2026-03-20T23:31:45 1774049505

The problem is alfalfa is a high value crop and a water efficient crop relative to value.

So as water/weather gets more unpredictable and beef/dairy rises in price, alfalfa becomes even more attractive to grow.

weaksauce · 2026-03-20T16:52:37 1774025557

to put this to numbers... the exports are just about 0.5% of california's GDP. so yeah pretty much a rounding error.

chrisrogers · 2026-03-20T18:34:59 1774031699

0.5% is a far cry from a rounding error..

panzagl · 2026-03-20T18:53:02 1774032782

0.5% is like the literal definition of a rounding error.

philipwhiuk · 2026-03-21T13:45:33 1774100733

Also this https://youtu.be/5erVH-zk7Uk?si=kDEFEtltdyhp6Nen

z3ugma · 2026-03-19T20:42:32 1773952952

I will say I came upon this same design pattern to make all my chats into semantic Markdown that is backward compatible with markdown. I did:

````assistant

gemini/3.1-pro - 20260319T050611Z

Response from the assistant

````

with a similar block for tool calling. This can be parsed semantically as part of the conversation, but it is also rendered as a regular Markdown code block when needed

Helps me keep AI chats on the filesystem, as a valid document, but also add some more semantic meaning atop of Markdown
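A parser for this kind of block can be quite small. The sketch below is a guess at the format from the example above, not the commenter's actual code: it assumes four-backtick fences, a role name as the info string, and that the first non-blank line inside the block is a metadata line (model and timestamp). `parse_chat` and the sample document are hypothetical.

```python
import re

FENCE = "`" * 4  # four backticks, as in the example above

BLOCK_RE = re.compile(
    r"^" + FENCE + r"(?P<role>\w+)[ \t]*\n(?P<body>.*?)^" + FENCE + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def parse_chat(text):
    """Return one {"role", "meta", "content"} dict per fenced role block."""
    messages = []
    for match in BLOCK_RE.finditer(text):
        lines = match.group("body").strip().splitlines()
        # Assumption: the first line is metadata, the rest is the message.
        meta = lines[0].strip() if lines else ""
        content = "\n".join(lines[1:]).strip()
        messages.append(
            {"role": match.group("role"), "meta": meta, "content": content}
        )
    return messages


# Hypothetical two-message document in the fenced-role format.
doc = (
    "Some ordinary Markdown prose.\n\n"
    + FENCE + "user\n20260319T050600Z\nHi\n" + FENCE + "\n\n"
    + FENCE + "assistant\ngemini/3.1-pro - 20260319T050611Z\n"
    + "Response from the assistant\n" + FENCE + "\n"
)
messages = parse_chat(doc)
```

Because the fences are ordinary Markdown, any renderer that does not know about the roles still shows the blocks as plain code blocks.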

mncharity · 2026-03-20T16:37:43 1774024663

> AI chats as a valid document

So many formats, with different tradeoffs around readable/parsable/comments/etc. I wish there were a "universal" converter. With LLMs sometimes used to edit chat traces, I'd like ingestion from md/yaml, not merely a "render from message json".

So .json `[{"role": "user", "content": "Hi"}` <-> .md ` ```json\n[{"role": "user", "content": "Hi"}` <-> above ` ```user\nHi` <-> `# User\nHi` <-> ` ```chatML\n<|user|>\nHi` <-> .html rendered .md, but with elements like <think> and <file> escaped... etc.
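One leg of that chain, message JSON to `# User` style Markdown and back, might look like the sketch below. The function names are invented, and the reverse direction assumes message content never contains a line that looks like a `# Role` heading, which is exactly the kind of tradeoff mentioned above.

```python
import json
import re

HEADING_RE = re.compile(r"^# (\w+)[ \t]*$", re.MULTILINE)


def messages_to_md(messages):
    """Render [{"role", "content"}, ...] as '# Role' sections."""
    parts = [
        "# {}\n{}".format(m["role"].capitalize(), m["content"]) for m in messages
    ]
    return "\n\n".join(parts) + "\n"


def md_to_messages(md):
    """Inverse of messages_to_md (content with '# Word' lines is not handled)."""
    pieces = HEADING_RE.split(md)
    # split() yields [preamble, role1, body1, role2, body2, ...]
    return [
        {"role": role.lower(), "content": body.strip()}
        for role, body in zip(pieces[1::2], pieces[2::2])
    ]


original = json.loads(
    '[{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]'
)
md = messages_to_md(original)
round_tripped = md_to_messages(md)
```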

z3ugma · 2026-03-20T18:07:52 1774030072

Do you know the meme about carcinization ...how in nature, everything tends toward becoming a crab?

I think we are reinventing HTML from first principles. It's semantic structuring with a meaningful render

mncharity · 2026-03-20T23:52:11 1774050731

Hmm. HTML has always had goals and tradeoffs which are in tension with many uses. XML too. Witness the very many versions of "write this instead, and it becomes HTML" - long and widely used and valued. Perhaps we collectively might have done better, but we didn't. Turns out LLMs also find different formats significantly easier to use for different things.

As a tradeoff example, yesterday I again tripped on the KISS "CDATA doesn't support HEREDOC-like prefix whitespace removal". So does one indent, compromising payloads where leading ws is significant, or not, confusing humans and llms.
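To make the tradeoff concrete: a consumer can strip the common indentation itself after extracting the CDATA payload, at the cost of guessing which whitespace was significant. A small sketch with Python's standard library; the `<file>` element and its contents are made up for illustration.

```python
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical document: the payload is indented to match the surrounding markup.
doc = """<file name="example.py"><![CDATA[
    def greet():
        print("hi")
]]></file>"""

# CDATA is returned verbatim, including the indentation.
raw = ET.fromstring(doc).text

# Remove the whitespace prefix shared by all non-blank lines, heredoc style.
dedented = textwrap.dedent(raw).strip("\n")
```

This works for code-like payloads, but any payload whose shared leading whitespace is meaningful gets silently altered, which is the confusion the comment describes.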

Re reinvention and first principles, aside from engineering tradeoffs, it can be hard to understand design spaces and to be aware of related work. I suspect there's a missing literature to support these, but professional organizations have been AWOL, and research funding dysfunctional. And commercial conflicts of interest. And it's hard. But now coding LLMs are messing with "don't reinvent wheels" payoff tables. Perhaps we'll someday be able to be explicit about design space structure and design choice consequences too. And perhaps we're already getting transformatively more flexible around format extension and interoperation. TFA isn't just a new format - it's a github repo which will help teach LLMs how to do progressive execution of fenced code blocks, making the next format which does this potentially easier to create. "Merge in what X does, but <change request>". Yay?

IIUC, non-meme carcinization is something vaguely like "similar tradeoffs pressure towards similar forms in diverse contexts". LLMs might help us more easily understand tradeoffs, implement forms, and manage diversity?

z3ugma · 2026-03-12T15:48:34 1773330514

Nah read my own comment history. I tend to phrase things this way a lot too because it positions it as opinion/curiosity rather than arrogant / confident statement?

z3ugma · 2026-03-10T04:08:13 1773115693

This is such a well-written essay. Every line revealed the answer to the immediate question I had just thought of

lovecg · 2026-03-10T05:08:58 1773119338

I can’t get past all the LLM-isms. Do people really not care about AI-slopifying their writing? It’s like learning about bad kerning, you see it everywhere.

crakhamster01 · 2026-03-10T07:33:22 1773128002

I had a similar reaction to OP for a different post a few weeks back - I think some analysis on the health economy. Initially as I was reading I thought - "Wow, I've never read a financial article written so clearly". Everything in layman's terms. But as I continued to read, I began to notice the LLM-isms. Oversimplified concepts, "the honest truth" "like X for Y", etc.

Maybe the common factor here is not having deep/sufficient knowledge on the topic being discussed? For the article I mentioned, I feel like I was less focused on the strength of the writing and more on just understanding the content.

LLMs are very capable at simplifying concepts and meeting the reader at their level. Personally, I subscribe to the philosophy of - "if you couldn't be bothered to write it, I shouldn't bother to read it".

ajkjk · 2026-03-10T08:22:46 1773130966

Alternate theory... a few months into the LLMism phenomenon, people are starting to copy the LLM writing style without realizing it :(

amonith · 2026-03-10T10:52:16 1773139936

This happens to non-native English speakers a lot (like me). My style of writing is heavily influenced by everything I read. And since I also do research using LLMs, I'll probably sound more and more as an AI as well, just by reading its responses constantly.

I just don't know what's supposed to be natural writing anymore. It's not in the books, disappears from the internet, what's left? Some old blogs for now maybe.

crakhamster01 · 2026-03-10T15:08:37 1773155317

The wave of LLM-style writing taking over the internet is definitely a bit scary. Feels like a similar problem to GenAI code/style eventually dominating the data that LLMs are trained on.

But luckily there's a large body of well written books/blogs/talks/speeches out there. Also anecdotally, I feel like a lot of the "bad writing" I see online these days is usually in the tech sphere.

juuular · 2026-03-10T14:29:54 1773152994

Books definitely have natural writing, read more fiction! I recommend Children of Time by Adrian Tchaikovsky

weird-eye-issue · 2026-03-10T05:18:17 1773119897

I think you're just hallucinating because this does not come across as an AI article

lovecg · 2026-03-10T05:43:13 1773121393

I see quite a few:

“what X actually is”

“the X reality check”

Overuse of “real” and “genuine”:

> The real story is actually in the article. … And the real issue for Cursor … They have real "brand awareness", and they are genuinely better than the cheaper open weights models - for now at least. It's a real conundrum for them.

> … - these are genuinely massive expenses that dwarf inference costs.

This style just screams “Claude” to me.

hansvm · 2026-03-10T05:54:17 1773122057

It was almost certainly at least heavily edited with one. Ignoring the content, every single thing about the structure and style screams LLM.

lelanthran · 2026-03-10T06:58:32 1773125912

> I think you're just hallucinating because this does not come across as an AI article

It has enough tells in the correct frequency for me to consider it more than 50% generated.

NetOpWibby · 2026-03-10T05:21:19 1773120079

Name checks out

raincole · 2026-03-10T09:00:12 1773133212

It's really unfortunate that we call well-structured writing 'LLM-isms' now.

Erem · 2026-03-10T05:17:04 1773119824

I don’t see the usual tells in this essay

152334H · 2026-03-10T05:59:05 1773122345

People care, when they can tell.

Popular content is popular because it is above the threshold for average detection.

In a better world, platforms would empower defenders, by granting skilled human noticers flagging priority, and by adopting basic classifiers like Pangram.

Unfortunately, mainstream platforms have thus far not demonstrated strong interest in banning AI slop. This site in particular has actually taken moderation actions to unflag AI slop, on certain occasions...

rhubarbtree · 2026-03-10T07:03:02 1773126182

It is certainly very obvious a lot of the time. I wonder if we revisited the automated slop detection problem we’d be more successful now… it feels like there are a lot more tells and models have become more idiosyncratic.

weird-eye-issue · 2026-03-10T07:27:33 1773127653

Tons of companies do this already. It's not like this is a problem that nobody is constantly revisiting...

rhubarbtree · 2026-03-10T22:46:24 1773182784

What’s one company that has revisited this recently and what’s their detection rate on what sample?

weird-eye-issue · 2026-03-11T01:06:27 1773191187

Companies like Originality.ai are always updating their models and you could use a simple Google search to answer your questions.

rhubarbtree · 2026-03-11T23:49:16 1773272956

You could also have had the courtesy to put that in your original post. But let’s not get meta.

I did a quick test and it detected an AI summary of a random topic, even after two prompts to disguise it. So as expected it may have become a lot easier to detect.

weird-eye-issue · 2026-03-11T23:59:13 1773273553

There are literally hundreds of companies that are doing this. You could have the basic courtesy to do a Google search instead of asking.

sebastiennight · 2026-03-14T19:38:22 1773517102

This is an Internet forum and one of the ways such places are valuable is that it enables you to ask questions to other humans and allows those other humans, if they'd like, to answer.

You will get better results asking questions like GP's than Googling because you're asking the specific person who made a claim to quote an example, so you can judge from the specific example they provide, rather than the Google results. The best answers are often technically interesting niche tools which don't have great SEO.

Case in point: the platform you recommended does not show up anywhere on my first page of Duck.com results.