February 19, 2008

Why It's Hard

Joel Spolsky, who used to work on Excel, answers the oft-asked question, "Why Are Microsoft Office File Formats So Complicated?" Why do they need a 394-page PDF to explain the file format and, even then, leave the spec filled with obscurity?

The Excel file format specification is remarkably obscure about this. It just says that the 1904 record indicates “if the 1904 date system is used.” Ah. A classic piece of useless specification. If you were a developer working with the Excel file format, and you found this in the file format specification, you might be justified in concluding that Microsoft is hiding something.

The crucial point, though, is that user interfaces are terrifically complex, and file formats tend to document every facet of the user interface and every facet of previous versions. If you have a workaround for a bug (as Excel worked around a leap year bug in Lotus 1-2-3), that bug workaround lives forever.
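
To make the immortal workaround concrete, here is a minimal sketch of converting a spreadsheet serial date to a calendar date. It assumes the commonly documented behavior: in the 1900 date system serial 1 is January 1, 1900 and serial 60 is a February 29, 1900 that never existed, while in the 1904 date system serial 0 is January 1, 1904. The function name and the `date1904` flag are mine, not anything from the specification.

```python
from datetime import date, timedelta


def serial_to_date(serial, date1904=False):
    """Convert a whole-number spreadsheet serial to a calendar date.

    Hypothetical helper: the name and the date1904 flag are illustrative.
    """
    if date1904:
        # 1904 date system: serial 0 is January 1, 1904.
        return date(1904, 1, 1) + timedelta(days=serial)
    if serial == 60:
        # The inherited leap year bug: a February 29, 1900 that never happened.
        raise ValueError("serial 60 is the fictitious February 29, 1900")
    if serial < 60:
        # Before the phantom day, serial 1 is January 1, 1900.
        return date(1899, 12, 31) + timedelta(days=serial)
    # After the phantom day, every serial is one higher than the true day count.
    return date(1899, 12, 30) + timedelta(days=serial)
```

Under these assumptions, `serial_to_date(1462)` and `serial_to_date(0, date1904=True)` both land on January 1, 1904. Two epochs and one phantom day already cost three special cases, and every reader of the file has to carry them.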

There's a Storyspace preference for Deena's Default Bug, implemented to support one document, by one author, a decade ago. It’s still there. It will be reimplemented forever. Software is like that.

If you're American and you write a date as "1/20", you probably mean "January 20". Which one? This one, and not January 20, 1943, even though that was a perfectly nice day (unless you were somewhere near Voronezh or planning the North Africa landings, anyway). But if you write 1/43, you probably mean "January, 1943." It's nice to have software that behaves sensibly when you do this, but each of these cases adds complexity, and adds even more edge cases.
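
Here is a rough sketch of that heuristic, to show how quickly the edge cases pile up. Everything in it is an assumption for illustration: the function name, the rule that a two-digit year means the most recent year with those digits, and the choice to return a `(year, month, day)` tuple with `day` set to `None` when only a month was meant.

```python
import calendar
from datetime import date


def interpret_short_date(text, today):
    """Guess what an American means by 'a/b' (hypothetical helper)."""
    first, second = (int(part) for part in text.split("/"))
    if not 1 <= first <= 12:
        raise ValueError("expected a month from 1 to 12 before the slash")

    # If the second number is a real day of that month this year, read month/day.
    days_in_month = calendar.monthrange(today.year, first)[1]
    if 1 <= second <= days_in_month:
        return (today.year, first, second)

    # Otherwise read month/two-digit-year, assuming the most recent such year.
    if not 0 <= second <= 99:
        raise ValueError("expected a day or a two-digit year after the slash")
    year = today.year - today.year % 100 + second
    if year > today.year:
        year -= 100
    return (year, first, None)
```

With `today` set to February 19, 2008, "1/20" comes back as `(2008, 1, 20)` and "1/43" as `(1943, 1, None)`. But "2/29" means February 29 in 2008 and February 1929 if you ask in 2007, and "2/30" always means February 1930. Each rule is sensible; together they are a thicket.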