In the event someone is encountering suffix trees for the first time and thinkin...

abahgat · 2026-03-13T05:41:42 1773380502

I totally fell for the "obscene memory" trap myself. My first encounter with suffix trees outside of a textbook was for an ITA Software 'Instant Search' puzzle. The requirement was sub-0.1ms search on a large string database, I went straight for a generalized suffix tree. Then I realized they had asked for the solution to fit within a 1GB heap. :(

I wrote up the full 'war story' of how I had to profile the heap and optimize the node representation (shaving bytes off the edge storage) just to get it to boot without an OOM error: https://www.abahgat.com/blog/the-programming-puzzle-that-got...

It’s the most tangible example I've run into of where theoretical O(n) space complexity meets the reality of object pointer overhead.

xen0 · 2026-03-12T11:07:46 1773313666

I've never grokkeed suffix trees, but isn't possible for them to be O(n) in space (n total length of all strings)? Is there just an unacceptable constant factor overhead? I can imagine the pointer overhead being painful.

sillywabbit · 2026-03-12T13:37:08 1773322628

Like the other poster said, the rabbit hole continues with suffix arrays (https://en.wikipedia.org/wiki/Suffix_array#Space_efficiency), then compressed suffix arrays (https://en.wikipedia.org/wiki/Compressed_suffix_array).

Also explained by the creator of this: https://www.abahgat.com/project/suffix-tree/

> the human genome can be encoded as a 3GB string constructed out of an alphabet of four characters

> As of 2019, a suffix tree indexing the human genome using state of the art algorithms can easily occupy tens of gigabytes.

vintermann · 2026-03-12T08:37:37 1773304657

Yes, we get the same speed from suffix arrays these days, and much, much less memory usage.

But good luck visualizing what those algorithms do :)