Worse-is-better in e-book formats

posted on April 03, 2009

It is the year 2009 and the most popular electronic book format in the world consists of HTML 3.2 in a Palm database. Why?

For those of you who aren't total e-book wonks, I'm talking about the Mobipocket format. The eponymous company behind it debuted in 2000, proved the most successful of the initial crop of PDA-centric e-book format vendors, and was acquired by Amazon in 2005. The format is now not only the most popular "device independent" commercial e-book format, but is also the format used for the vast majority of Kindle e-books. More devices support Mobipocket than any other commercially sold format, and although we don't have any actual sales data, I don't think there's much doubt that Amazon is selling more devices and books than any other vendor, at least in the US.

A lot of technical detail follows, but here's most of the conclusion: I think that Mobipocket has managed to hit a sort of technology/complexity engineering/usability sweet spot. It uses an (albeit bastardized) HTML for markup, which gives it an edge over eReader; it uses an extremely simple container format, which gives it an edge over LIT; and it uses an appallingly simple rendering model, which gives it an edge over EPUB. This is kind of difficult to explain without going into loads and loads of detail about the e-book formats, so bear with me if I get long-winded.

Here's a summary of how Mobipocket stacks up technically against the other commercial reflowable formats:

  • Mobipocket: Container is a Palm database, providing primarily a mapping between a numeric identifier and "record" content. Book text is a single stream of HTML 3.2-ish with proprietary extensions (mostly formatting-oriented), no separate style language, and proprietary compression in 4k chunks. Images are GIF and JPEG, which some viewers limit to 64k in size.
  • eReader: Container is a Palm database. Book text is a single stream of "PML," a proprietary, non-SGML, primarily formatting-oriented markup language. Images are PNG, limited to 64k.
  • Microsoft LIT: Container is an MS "ITOL/ITLS" HTML Help 2.0 file, providing all sorts of exciting features like LZX compression, binary representation of markup content, and arbitrary optional auxiliary data associated with each content stream. Book text is an arbitrary number of OEBPS 1.0 markup streams (essentially HTML 4.0 as well-formed XML) plus a subset of OEBPS 1.0 CSS (no contextual selectors). Images are GIF, JPEG, and PNG.
  • EPUB: Container is a ZIP file with some extra JAR-like metadata. Book text is an arbitrary number of XHTML 1.0 streams[1] with CSS 2 styling. Images are JPEG, PNG, and SVG.
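
To make the "mapping between a numeric identifier and record content" concrete, here is a minimal sketch of reading a Palm database. It assumes the commonly documented PDB layout (a 78-byte big-endian header followed by one 8-byte entry per record) and the `BOOKMOBI` type/creator tag that Mobipocket files carry. The names `build_pdb` and `read_pdb` are made up for illustration, and the toy image contains no real Mobipocket header record or compression.

```python
import struct

# 78-byte big-endian Palm database header. The field order follows the
# commonly documented PDB layout; the Python names are illustrative.
PDB_HEADER = struct.Struct(">32sHHIIIIII4s4sIIH")
# One 8-byte entry per record: data offset, attribute byte, 3-byte unique id.
RECORD_ENTRY = struct.Struct(">IB3s")


def build_pdb(name, type_creator, records):
    """Assemble a toy PDB image so the reader below has something to parse."""
    table_end = PDB_HEADER.size + RECORD_ENTRY.size * len(records)
    header = PDB_HEADER.pack(name, 0, 0, 0, 0, 0, 0, 0, 0,
                             type_creator[:4], type_creator[4:],
                             0, 0, len(records))
    entries, offset = b"", table_end
    for index, payload in enumerate(records):
        entries += RECORD_ENTRY.pack(offset, 0, index.to_bytes(3, "big"))
        offset += len(payload)
    return header + entries + b"".join(records)


def read_pdb(data):
    """Return (type+creator, [record bytes, ...]) from a PDB image."""
    fields = PDB_HEADER.unpack_from(data, 0)
    type_creator, count = fields[9] + fields[10], fields[13]
    offsets = [
        RECORD_ENTRY.unpack_from(data, PDB_HEADER.size + i * RECORD_ENTRY.size)[0]
        for i in range(count)
    ]
    offsets.append(len(data))  # the last record runs to the end of the file
    return type_creator, [data[offsets[i]:offsets[i + 1]] for i in range(count)]


image = build_pdb(b"Example Book", b"BOOKMOBI",
                  [b"header record", b"<p>Some text</p>"])
ident, records = read_pdb(image)
print(ident, len(records), records[1])
```

The whole container fits in two struct definitions and a slice, which is roughly the point: a record number goes in, a blob of bytes comes out.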

Mobipocket and eReader are at nearly the same level technically, but I think Mobipocket's success (and eReader's continued existence) in the face of LIT and EPUB is quite interesting. EPUB is still very new, but Microsoft first released Microsoft Reader in 2000, the same year that Mobipocket incorporated. LIT is a closed format which relies on quite a bit of MS-specific technology, but EPUB is an open standard and composed mostly of other existing and well-supported open standards. Cross-comparison is far from free of contaminating historical/environmental factors, but I think there is something to be learned from Mobipocket: that worse is still better.

Mobipocket and eReader were founded around the same time (eReader in 1998 as Peanut Press) and with essentially the same constraints. Both primarily targeted PDAs running PalmOS and thus needed a format which they could easily render on (by today's standards) woefully under-powered devices. eReader chose to create their own simple, easy-to-parse markup language called PML (Peanut Markup Language). Even after 10 years, the core language remains comparatively elegant, containing only a few dozen tags and no redundancy outside of deprecated features. It simply and directly supports most of the formatting features possible on a Palm PDA with no pretence of semantic connotation.

In contrast, Mobipocket chose to extend the existing HTML 3.2 markup language. It's somewhat difficult to understand all the reasons for this decision without knowing the politics involved or the existing state of HTML rendering on the Palm platform at the time, but certain things are clear. First, rendering HTML 3.2 is more difficult than rendering a PML-like language: HTML is much more complicated, is harder to parse, and contains many redundancies in how its elements map to formatting. Second, HTML 4 and CSS 2 were the cutting-edge standards in 2000, so choosing HTML 3.2 didn't provide much coherence with existing Web standards, a problem compounded by the addition of proprietary features implementing such things as page breaks and paragraph indentation.[2]

It is unclear to me how much Mobipocket's similarity to HTML aids strict interoperability, but the perception that they are interoperable clearly exists, and for the quick-and-dirty case, the interpretation of texts with no fancy formatting, the correspondence is simply good enough. The correspondence between HTML semantics and Mobipocket formatting is sufficiently weak that when I wrote Mobipocket-generation support for calibre I opted to treat "MobiML" as a completely distinct pure-formatting language. This approach allows for a higher degree of formatting fidelity (although not complete; Mobipocket's formatting limitations are many and baroque), but in 99% of cases, throwing some HTML at the Mobipocket renderer will produce output good enough to actually read the text.

One of the big simplicity wins a language like PML has over HTML is that it has an explicitly "flat" rendering and markup model. Tags introducing paragraphs completely determine the paragraph-level formatting of their contained text, the renderer either disregarding or disallowing any previously active formatting state. In contrast, all versions of HTML allow some degree of arbitrary block-level nesting in which the active formatting state combines and merges with the new state. This means that to start accurately rendering at some arbitrary point (at the destination of a hyperlink, or just where the user left off the last time they were reading) a reader application needs to be able to figure out the current formatting state at that point. There are a few solutions to this problem.
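
A rough sketch of the difference, under loud assumptions: the nested case uses a toy regex tag matcher rather than a real HTML parser, and the flat case uses an invented bracket syntax (`[center]`, `[indent]`) that is not real PML. The function names are hypothetical too. The point is only how much of the text each approach has to look at before it knows the formatting state at a given offset.

```python
import re

# Toy tag matcher: good enough for the demo, not a real HTML parser.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")
VOID = {"br", "hr", "img"}  # elements that never stay open


def nested_state_at(markup, offset):
    """Open elements at `offset`, found by replaying every tag before it."""
    stack = []
    for match in TAG.finditer(markup, 0, offset):
        closing, name = match.group(1), match.group(2).lower()
        if name in VOID:
            continue
        if not closing:
            stack.append(name)
        elif name in stack:
            while stack.pop() != name:  # unwind to the matching open tag
                pass
    return stack


def flat_state_at(markup, offset):
    """Paragraph style at `offset` in a hypothetical flat markup where every
    paragraph starts with a bracketed style name such as [center]."""
    start = markup.rfind("[", 0, offset)
    if start == -1:
        return "default"
    return markup[start + 1:markup.index("]", start)]


html = "<div><i>one <b>two</b> three</i></div><p>four</p>"
print(nested_state_at(html, html.index("three")))

flat = "[center]Title\n[indent]Body text here"
print(flat_state_at(flat, flat.index("text")))
```

The nested lookup has to replay every tag from the start of the stream, so its cost grows with how far into the book the reader jumps. The flat lookup only walks backwards to the nearest paragraph marker.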

One is to do what Microsoft LIT does and put lots of extra information in the container. The LIT container is compressed in 64k chunks, which allows full compression with random seeking to anything in the container with minimal decompression overhead. LIT contains indices of all the hyperlink target elements and all page-breaking elements which specify the positions of all their ancestor elements. These combined with LIT's simplified contextless CSS mean a full LIT renderer[3] can figure out the formatting state of anywhere in the book with a minimum amount of extraneous processing. Accurate rendering is just a few index lookups away!
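
The chunked-compression half of that trick is easy to sketch. The code below is an illustration only: it uses zlib as a stand-in (LIT actually uses LZX, and none of the real container structure appears here), and `compress_chunks` and `read_at` are hypothetical names. What it shows is that compressing fixed-size chunks independently lets a reader fetch any byte range while decompressing only the chunks that overlap it.

```python
import zlib

CHUNK = 64 * 1024  # uncompressed bytes per chunk, matching the 64k figure above


def compress_chunks(text, chunk_size=CHUNK):
    """Compress each fixed-size slice of `text` independently."""
    return [zlib.compress(text[i:i + chunk_size])
            for i in range(0, len(text), chunk_size)]


def read_at(chunks, offset, length, chunk_size=CHUNK):
    """Return `length` bytes from uncompressed `offset`, decompressing only
    the chunks that overlap the requested range."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    data = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    start = offset - first * chunk_size
    return data[start:start + length]


book = bytes(range(256)) * 1000  # 256,000 bytes of stand-in "book text"
chunks = compress_chunks(book)
print(len(chunks), read_at(chunks, 65530, 20) == book[65530:65550])
```

The cost is a slightly worse compression ratio than compressing the whole stream at once, since each chunk starts with an empty dictionary.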

The downside is that all this is very complicated. Microsoft is able to piggy-back off of their HTML Help support libraries, but third-party implementations would need to read the whole multilayered ITOL/ITLS archive goodness from scratch and incorporate the indices into their renderers. There are many third-party reader applications which can handle Mobipocket, but very few which handle LIT, and none that I know of which actually use all the extra information in the LIT container format.[4]

Another solution to the problem is to do what EPUB does and depend on having fast enough hardware that figuring out the appropriate rendering isn't an issue. EPUB is conceptually very simple: XHTML in a ZIP archive. On contemporary hardware with contemporary embeddable HTML renderers like WebKit this is almost trivial to implement; I believe Kovid Goyal put together the first version of the calibre EPUB viewer in about a week.

The downside is that your cellphone or e-ink display reader isn't quite so powerful. The EPUB specification places no limitation on CSS complexity or XHTML flow size, allowing, for example, a CSS sibling selector applied over a 100MB XHTML file. Not to mention that you have to decompress that 100MB flow all at once and keep the whole thing around in memory while rendering it. Ouch. There are a few ways to keep processing time generally sane while still rendering most markup correctly, but Adobe's implementation on the Sony Reader of simply refusing to render any flow larger than 300k has made that simple expedient the most common method.
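
For anyone producing EPUBs, a quick check against that 300k ceiling is straightforward with the standard library. This is a minimal sketch with two stated assumptions: it treats 300k as 300 × 1024 bytes of uncompressed markup, and it spots flows by file extension instead of reading the package manifest, which a careful tool would do. The function name `oversized_flows` is made up for the example.

```python
import io
import zipfile

LIMIT = 300 * 1024  # assumed byte value for the 300k limit mentioned above
FLOW_SUFFIXES = (".xhtml", ".html", ".htm")  # assumption: flows found by extension


def oversized_flows(epub, limit=LIMIT):
    """Return (name, uncompressed size) for each markup file over `limit`.

    `epub` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as archive:
        return [(info.filename, info.file_size)
                for info in archive.infolist()
                if info.filename.lower().endswith(FLOW_SUFFIXES)
                and info.file_size > limit]


# Build a throwaway archive in memory to exercise the check.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("small.xhtml", "<p>hi</p>")
    archive.writestr("big.xhtml", "x" * (LIMIT + 1))
    archive.writestr("cover.jpg", b"\0" * (LIMIT + 1))
sample.seek(0)

print(oversized_flows(sample))
```

Anything this reports would need to be split into smaller flows before the book stands a chance on a reader that enforces the limit.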

And then there's Mobipocket's solution: don't render the markup correctly. Yep, you heard me correctly. The text still displays, but if that hyperlink drops you in the markup right after the italics tag, then the text which was supposed to be in italics won't be. Oh, it has chunked compression like LIT so it can seek anywhere with minimal decompression overhead, but when it gets there it just starts rendering with what it's got.
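
Here is a toy model of that behaviour, not Mobipocket's actual renderer. It assumes a markup with only `<i>` and `<b>` tags, and `render_from` is a hypothetical name. Rendering starts at the requested offset with an empty formatting state, and only tags seen from that point on have any effect.

```python
import re

TAG = re.compile(r"<(/?)(i|b)>")  # toy markup: only <i> and <b> exist


def render_from(markup, offset):
    """Render from `offset` with an empty formatting state.

    Returns a list of (text, active styles) runs. Tags that opened before
    `offset` are never seen, so their styling is silently lost.
    """
    active, runs, pos = set(), [], offset
    for match in TAG.finditer(markup, offset):
        if match.start() > pos:
            runs.append((markup[pos:match.start()], frozenset(active)))
        if match.group(1):
            active.discard(match.group(2))  # stray close tags are ignored
        else:
            active.add(match.group(2))
        pos = match.end()
    if pos < len(markup):
        runs.append((markup[pos:], frozenset(active)))
    return runs


text = "<i>An italic phrase</i> then <b>bold</b>."
print(render_from(text, 0))                     # italics applied
print(render_from(text, text.index("italic")))  # same words, italics lost
```

Jumping into the middle of the italic phrase yields plain text until the next tag, after which everything renders normally again. No backward scan and no index are needed, whatever the offset.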

When I first realized this I was completely aghast. "It's wrong! They're letting text be rendered wrong!" But the more I thought about it, the more I came to feel that this was actually a brilliant decision. It means that rendering is always instantaneous, no matter where the user jumps to in the book. Although book-producers have to do some extra work if they want all their links to point to Mobipocket-sensible places in the markup, if they don't then the book is still readable. Pathological cases merely degrade rendering, not disrupt it. Contrast this with the EPUB experience-thus-far, where books which cannot be read on the Sony Reader are an unfortunately common occurrence.

Of course, I still completely detest the Mobipocket format and hope it dies in a fire. Seriously, a Palm database in 2009? What I'd like to see is a new version of the EPUB standard which takes the lessons learned from other formats more to heart. With a reduction in complexity, standardization of certain necessary size limits, and at least reader application guidelines for imposing user stylesheets, I think that EPUB can still be the format to beat going forward.

[1] Or technically DTBook, although I've never seen one in the wild and don't know which viewers, if any, support it.

[2] Although there is evidence some of these were added later. For example, if the Mobipocket file header indicates one of the earliest versions of the format then the <hr> tag induces an explicit page break.

[3] That is to say, Microsoft Reader.

[4] In fact, AFAIK I was the first person to even bother reverse-engineering them when I implemented LIT generation for calibre.
