Thursday, January 3, 2008

Format Obsolescence: Right Here Right Now?

time961 at Slashdot reports that:
In Service Pack 3 for Office 2003, Microsoft disabled support for many older file formats. If you have old Word, Excel, 1-2-3, Quattro, or Corel Draw documents, watch out!

Is this yet another format obsolescence horror story of the kind I discussed in an earlier post? Follow me below the fold for reassurance; this story is less scary than it seems.

The field of digital preservation has been heavily focussed on the problem of format obsolescence, paying little attention to the vast range of other threats to which digital content is vulnerable. I have long argued that the reason is that most people's experience of format obsolescence is heavily skewed; it comes from Microsoft's Office suite. Microsoft's business model depends almost entirely on driving its customers endlessly around the upgrade cycle, extracting more money from their existing customer base each time around the loop. They do this by deliberately introducing gratuitous format obsolescence. In the comments LuckyLuke58 makes my point:
Doubt it's really about security at all; I'm guessing it's probably more about 'nudging' the few people still using old versions of the software to upgrade: Those who currently exchange documents with users on newer versions will find suddenly they won't be able to send documents to anyone anymore without getting complaints that people can't open them. Deliberately making it too cumbersome and complex for most people to ever work around this, i.e. leaving it technically (but not really practically for almost everyone) an option, for now at least gives MS an excuse, while still taking a big step towards getting rid of support for those old formats entirely, which is not all that unreasonable I suppose for formats greater than 10 years old.

LuckyLuke58 gets the basic idea right. New instances of Office entering the installed base are set up to save documents in a format that older versions cannot understand. The only way to maintain compatibility is to use a deliberately awkward sequence of commands and ignore warnings. Every time someone with an older version gets one of these new documents, they get a forceful reminder of why they need to spend the money to get upgraded. This works particularly well in organizations, where the people with the power tend to have their computers upgraded most frequently. Telling your boss that you need an upgrade in order to read the documents he's sending you makes it hard for him to deny the request.

Because almost everyone encounters this kind of deliberate format obsolescence regularly, and because the cure for it (buy a more recent version of Office) is essentially forced upon them, they make two natural assumptions:

  • Format obsolescence happens when the software vendor says it does.

  • Format obsolescence happens frequently and regularly to all formats.

Both of these are wrong. The fact that Microsoft has ended support for old formats does not mean they can no longer be read. It just means that you can't use up-to-date versions of Microsoft's tools to read them. Microsoft's announcement hasn't magically removed the support for these formats from any preserved binaries of the pre-upgrade tools, and these can be run using emulation. The Open Source tools that support these formats still work (see my post on Format Obsolescence: Scenarios).

Even if Microsoft did have magic powers to tamper with old binaries and source, it is pretty much only Microsoft formats that are subject to rapid gratuitous format obsolescence. A business model dependent on driving the upgrade cycle to extract money from existing customers is something only a monopolist can sustain; everyone else needs to attract new customers. A reputation for frequent format obsolescence isn't a good way to do that. In fact, it isn't even a good way to keep existing customers. The formidable resistance Microsoft has encountered in trying to "standardize" OOXML in a way that allows them to continue to use proprietary lock-in and gratuitous format obsolescence to milk their customer base shows that even a monopolist's customers will reach their pain threshold eventually.

At first glance, this announcement from Microsoft appears to support those who think dealing with format obsolescence is the be-all and end-all of digital preservation. But it doesn't. Content preserved in these formats can still be rendered, and converted to more modern formats, using easily available tools. There are at least two ways to do this: open source tools, and emulated environments running preserved Microsoft tools. It's hard to construct a scenario in which either would stop working in the foreseeable future. And what has happened is not typical of formats in general; it's typical of Microsoft. It is true that access to older content will be a little less convenient, but that is what Microsoft is trying to achieve.


David. said...

In a further blow to those who think that format obsolescence is the scariest prospect for the future of digital content, The Register reports that outraged customers have forced Microsoft to back away from their attempt to render older Word formats inaccessible to users of the Office suite.

David. said...

The trainwreck continues - Microsoft is forced to apologize to Corel for implying that the reason support for their formats was disabled was that they were "insecure". The current story is that the reason was that Microsoft's code to parse the formats is insecure and they don't want to fix it. "A file format isn't insecure -- it's the code that reads the format that's more or less secure. The parsers we use for these older formats aren't as robust as the code we've written more recently, which is part of our decision to disable them by default."

The bottom line from Microsoft is now "... we are not removing your ability to read these files. If you need them, the parsers are still there. All we've changed is the default." The formats aren't very obsolete, after all.

The best comment I've seen so far on this comes from reader Brian Hoffman: "Remember when Microsoft was all up in arms when Massachusetts decided to standardize on OpenOffice and open document formats? [Massachusetts'] concern was that older documents may no longer be accessible at some future date if they continued to use a proprietary system and format. Looks like they were right and Microsoft should be embarrassed instead of spinning it as a feature."

David. said...

The Dutch Open Source foundation NLnet weighs in with a sensible suggestion: if Microsoft is going to attempt to render older document formats inaccessible, it should release the specifications for those formats so that others can make up the deficiency. Tip of the hat to SgtChaireBourne at /.

Chuck said...

So how much of this is a nasty side effect of the whole "own the road" business strategy? The concept of everyone having to pay a penny to a patent or trade secret holder every time they access a document is simply too seductive for any MBA to pass up, it seems. Too many people trying to own the patent on paper, sigh.

David. said...

There is considerable discussion on Groklaw of Microsoft's announcement that they are "making binary Office formats (.doc; .xls; .ppt) available under the Open Specification Promise", and that they are sponsoring an Open Source project to build translators for them to OOXML.

The first isn't actually news, since it has been possible for some time to obtain access to these specifications, albeit only on application to and approval by Microsoft.

The second is an attempt to get other people to duplicate work Microsoft has already done for Office 2007 but under very restrictive conditions. Microsoft does not appear to be funding the effort, or providing any information other than the specifications that have been available for some time.

Both are interesting for what they reveal about the pressure Microsoft is feeling as their customers rebel against the effort to extend Microsoft's control of proprietary standards into the ISO standards area. But neither is really likely to affect the future readability of documents in Office formats.