I'm looking forward to it. The 9th is great in its own right and a lot of it is in the 11th. Alfred Newton's nearly 200 articles on bird species and a few classic essays by Macaulay come to mind offhand.
I feel exactly the same way about encyclopedias and dictionaries. And Encarta really was amazing. You'd be surprised how much modern criticism of the 11th amounts to "no entry on the Great War", except in earnest.
By the way, it looks like there's a bug where I can't search for articles when already inside one. To do so, I need to go back to home > articles and then search.
If you're reading an article, just go to the top and type in the left-hand search box. That will search for articles as well as text within articles. The right-hand box searches the text of the article you're reading.
I'm familiar with the Synopticon, which would be fun to structure.
I didn’t do OCR myself, except for the topic index and to fill in a few gaps. I started from existing Wikisource text and then built a pipeline around that: cleaning (headers, hyphenation, etc.), detecting article boundaries, reconstructing sections, and linking things back to the original page images. Most of the effort went into rendering the complex layouts and handling the cross-linking, not the initial ingestion.
Glad to go into more detail if you’re interested, but that’s the gist of it.
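To make the cleaning step a bit more concrete, here is a minimal sketch of what header stripping and de-hyphenation can look like. This is not the actual pipeline code: the function name `clean_page`, the `HEADER_RE` pattern, and the assumption that a running header is an all-caps first line with an optional page number are all made up for illustration.

```python
import re

# Assumed header shape (hypothetical): an all-caps line, optionally with a
# page number before or after it, e.g. "ZURICH 1057".
HEADER_RE = re.compile(r"^\s*(\d+\s+)?[A-Z][A-Z .,'-]*(\s+\d+)?\s*$")


def clean_page(raw: str) -> str:
    """Strip a running header and rejoin words broken across lines."""
    lines = raw.splitlines()
    # Drop the first line only if it looks like a running header.
    if lines and HEADER_RE.match(lines[0]):
        lines = lines[1:]
    text = "\n".join(lines)
    # Rejoin words split at a line end: "moun-\ntainous" -> "mountainous".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single newlines inside a paragraph into spaces; keep blank lines.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    return text.strip()


page = "ZURICH 1057\nThe canton is moun-\ntainous and well\nwatered.\n\nSecond paragraph."
print(clean_page(page))
```

The naive rejoin also merges genuine compounds that happen to break at a line end ("well-" / "known" becomes "wellknown"), so a real pass would need a dictionary check or a list of exceptions.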
Under the hood it’s not XML-TEI — it’s a relational/data-pipeline approach, with article boundaries, sections, contributors, cross-references, and source-page provenance all reconstructed into structured records. The text itself is public domain, but I haven’t released a bulk structured export yet.
Requests for dataset access have definitely been one of the themes of this thread, and I’m taking them seriously. If I do expose it, I’d want to do so in a form that preserves the structure rather than just dumping plain text.
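For anyone wondering what "structured records" could mean in practice, here is a rough sketch using SQLite. The table and column names (`article`, `section`, `xref`, `start_page`) are assumptions of mine for illustration, not the real schema, and the rows are placeholder data rather than actual articles or page numbers.

```python
import sqlite3

# Hypothetical schema: articles, their ordered sections, and cross-references.
SCHEMA = """
CREATE TABLE article (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    volume INTEGER,
    start_page INTEGER          -- provenance: where the article begins
);
CREATE TABLE section (
    id INTEGER PRIMARY KEY,
    article_id INTEGER NOT NULL REFERENCES article(id),
    position INTEGER NOT NULL,  -- order of the section within the article
    heading TEXT,
    body TEXT NOT NULL
);
CREATE TABLE xref (
    from_article_id INTEGER NOT NULL REFERENCES article(id),
    to_article_id INTEGER NOT NULL REFERENCES article(id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO article (id, title, volume, start_page) VALUES (?, ?, ?, ?)",
        [(1, "Example A", 1, 10), (2, "Example B", 1, 42)],
    )
    conn.executemany(
        "INSERT INTO section (article_id, position, heading, body) VALUES (?, ?, ?, ?)",
        [(1, 1, "History", "See Example B."), (2, 1, None, "Body text.")],
    )
    conn.execute("INSERT INTO xref (from_article_id, to_article_id) VALUES (1, 2)")
    conn.commit()
    return conn


def outgoing_xrefs(conn: sqlite3.Connection, title: str) -> list:
    """Return (title, volume, start_page) for each article the given one links to."""
    return conn.execute(
        """
        SELECT target.title, target.volume, target.start_page
        FROM xref
        JOIN article AS source ON source.id = xref.from_article_id
        JOIN article AS target ON target.id = xref.to_article_id
        WHERE source.title = ?
        ORDER BY target.title
        """,
        (title,),
    ).fetchall()


conn = build_demo_db()
print(outgoing_xrefs(conn, "Example A"))
```

An export along these lines would keep sections, cross-references, and page provenance queryable, which is what a plain-text dump loses.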
No doubt. That’s one of the reasons I find the 1911 edition interesting — the authors have more license to express their own opinions, which naturally reflect those current at the time.
"Not, of course, that there is any magic about the past. People were no cleverer then than they are now; they made as many mistakes as we. But not the same mistakes. They will not flatter us in the errors we are already committing; and their own errors, being now open and palpable, will not endanger us."
Yes, that’s one of the things I like most about it. The articles have a personal tone and are less homogenized.
You get that mix of geography, history, and sometimes quite opinionated description all in one place, which makes them much more readable, in my view. My introduction to this version discusses this and other related matters: https://britannica11.org/about.html
Excellent points. There are indeed two Zurich articles. One way to get to the city is to search for Zurich and open the second one, which goes to the city directly. The xref in Zurich (canton) is indeed a disambiguation bug (identically named articles); thanks for catching that.
I haven't tested the article search box on the article viewer in Firefox. I'll look into that as well.
Making the title linkable is a great idea and it will be implemented shortly. Thanks for catching all of this.
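For the curious, here is a small sketch of why identically named articles are awkward for cross-references, and one possible way to resolve them. It is not the site's actual code or the actual fix: the `Article` record, its `qualifier` field, and `resolve_xref` are hypothetical names for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Article:
    id: int
    title: str
    qualifier: str  # hypothetical disambiguator, e.g. "canton" or "city"


def build_index(articles):
    """Group articles by lower-cased title so name collisions stay visible."""
    index = defaultdict(list)
    for article in articles:
        index[article.title.lower()].append(article)
    return index


def resolve_xref(index, title, qualifier=None, exclude_id=None) -> Optional[Article]:
    """Return the single matching article, or None if the reference is ambiguous."""
    # An xref should never resolve to the article it appears in.
    candidates = [a for a in index.get(title.lower(), []) if a.id != exclude_id]
    if qualifier is not None:
        candidates = [a for a in candidates if a.qualifier == qualifier]
    # Refuse to guess: picking the first match is how a link lands on the wrong page.
    return candidates[0] if len(candidates) == 1 else None


index = build_index([Article(1, "Zurich", "canton"), Article(2, "Zurich", "city")])
print(resolve_xref(index, "Zurich", exclude_id=1))
```

Here a reference to "Zurich" from inside the canton article (id 1) resolves to the city, while the same reference with no context returns `None` and can be flagged for review instead of silently pointing somewhere.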