Wednesday, December 29, 2010

Migrating Microsoft Formats

Microsoft formats are routinely cited as examples where prophylactic format migration is required as part of a responsible digital preservation workflow. Over at Groklaw, PJ has a fascinating, long post up using SEC filings and Microsoft internal documents revealed in the Comes case to demonstrate that Microsoft's strategic use of incompatibility goes back at least 20 years and continues to this day. Her current example is the collaboration between Microsoft and Novell around Microsoft's "Open XML". This strategy poses problems that the advocates of format migration as a preservation strategy need to address. For details, follow me below the fold.


The essentials of Microsoft's incompatibility strategy remain as they were set out in a 1991 memo that PJ quotes:

Pursue a product development strategy that prevents IBM from claiming Windows compatibility. Prevent Windows applications from running correctly on OS/2.... Reposition OS/2 as impractical and incompatible in the minds of customers.

There are three essential goals:
  1. Customers using non-Microsoft software to inter-operate with Microsoft software get an inferior experience. This prevents customers from defecting to cheaper or free alternatives, such as Open Office.
  2. Customers using older Microsoft software to inter-operate with current Microsoft software get an inferior experience. This drives customers round the upgrade treadmill.
  3. Customers using newer Microsoft software to inter-operate with older Microsoft software get the same experience as with the older software. This removes an impediment to the upgrade treadmill.
In the Office space, it is essential to these goals that there not be a format into which the current Microsoft Office suite can save a document that can be opened by some other software and reproduce the full experience that the user of the current Microsoft Office Suite would see. If such a format existed, the strategy would fail because existing customers would defect to the alternative and stop buying upgrades.

It is easy to see that Microsoft would perceive the advent of ODF, and the enthusiasm customers expressed for a stable open standard, as an existential threat to their strategy. The response was a full-court press in favor of an alternate "standard", Open XML. This looked like an open XML standard, but it allowed for proprietary extensions. Even so, if Microsoft were the only source for an implementation, customers would not be convinced that it was really a standard. So Microsoft paid Novell to implement Open XML for Open Office. But, as PJ shows, the goal was not inter-operability, because Novell was not to implement the Microsoft proprietary extensions:

There are five milestones in the agreement, and they all say that some features are unsupported, meaning the extensions in the Microsoft product that aren't in the standard. Here's the fifth milestone:

MILESTONE #5
  • Novell OpenOffice can open Microsoft Office 2007-generated Open XML files without failures; M3 & 4 features supported; unsupported features are lost on open.
  • Novell OpenOffice can open Microsoft Office 2010-generated Open XML files without failures; M3 & 4 features supported; unsupported features are lost on open.
  • Novell OpenOffice can save files containing M5 features, scoped to those features supported in Novell OpenOffice, using the Open XML standard.
  • Novell OpenOffice can save files containing Novell-specific features using the Open XML standard.
"Unsupported features are lost on open." That's Microsoft's version of compatibility -- their stuff works better than yours.

In other words, Open Office's implementation of Open XML ensures that Open XML does not represent a threat to Microsoft's strategy, because it is not "a format into which the current Microsoft Office suite can save a document that can be opened by some other software and reproduce the full experience that the user of the current Microsoft Office Suite would see".

The problem this poses for advocates of format migration is that the whole idea of format migration as a means of preservation requires "a format into which the current Microsoft Office suite can save a document that can be opened by some other software and reproduce the full experience that the user of the current Microsoft Office Suite would see". That is how format migration escapes the trap of an obsolete, proprietary format: by migrating into some other format that reproduces the user's experience. Either such a format exists, or the advocates need to accept Jeff Rothenberg's deprecation of format migration as information-destroying.

Of course, each time Microsoft introduces a new version of the Office suite, an archive could step around the upgrade treadmill and migrate to the now-current Microsoft format (i.e. Open XML with a new set of proprietary extensions). But there is little point in doing so. The new extensions won't be present in the preserved content. And Microsoft, after their experience a few years ago of removing support for old formats, isn't going to remove support for the extensions that are present. Doing so would violate the third of their strategic goals. Thus, so long as Microsoft keeps doing a good job, there is no need for format migration. If Microsoft stops doing a good job, there isn't going to be a format to migrate to. Microsoft will have seen to that.

If Microsoft does stop doing a good job, one might hope for some public-spirited reverse-engineer to implement and open-source support for the proprietary extensions. If they did, again there would be no need for format migration. But, even if Microsoft released the specifications, a successful implementation is unlikely. Only emulation of the PC hardware and preservation of the bits of the operating system and the Office suite can reproduce the full experience of content in Open XML.

So we see there is no case in which migrating Microsoft formats is likely to be needed. And if it were needed, it is unlikely to succeed in reproducing the user's experience of the original content. In my view, the advocates of format migration as a preservation strategy need to explain why, for these canonical "widely used formats", their approach is (a) needed and (b) feasible.

3 comments:

  1. Are there really people who advocate "reproducing the user's experience of the original content" as a goal, or is that a straw man? Or wait, is that in fact the goal you are suggesting is the proper one?

    I don't know, this isn't my area, I'm just a casual observer.

    But it seems to me you can take that to extremes of fidelity of reproduction of "experience" that make it impossible for it ever to be accomplished -- and also entirely unnecessary and serving nobody's needs, indeed.

    [If you want to see this old document originally written on a C64, we'll ONLY let you see it on a C64. In fact, migrating formats is intended (regardless of whether it's a good strategy) to AVOID the need for just that, not to guarantee it, right? It's your alternate strategy of always keeping machines, OSs, and application software around to read the original files that seems like it's intending to allow a "reproduction of the original user's experience", right?

    Or to take matters to a ridiculous extreme, if you want to read Dickens, I'm sorry we'll only let you read it one chapter per week, and only incorporated in this magazine it was serialized in, and we will provide a horse-drawn carriage to transport you to our offices, that we maintain in order to ensure the reproducibility of the original user's experience.]

    But the real goal is probably more like "preserve access to the content, as close to how it was originally intended by the author as possible." (That is, focusing on fidelity of the content, not of the historical "user's experience").

    It still may very well be that trying to migrate preserved documents is not a good way to do that (too likely to fail, too expensive, there are other cheaper ways to do it, etc.) But arguing about whether "reproducing the user's experience of the original content" is possible seems beside the point to me.

  2. In general I agree with bibwild: making the best the enemy of the good is far too prevalent in digital preservation.

    But in this particular case, I don't. Let's assume that there is a format into which current Microsoft software can save documents which some other software can open without losing any significant aspect of "preserv[ing] access to the content". For the sake of argument, let's assume that this format is Open XML without the Microsoft-specific extensions. In other words, the way Novell's Open Office would open the document. Then users defecting to Open Office face no significant degradation of their user experience, and Microsoft's strategy has failed.

    Microsoft has to ensure that the parts of the user's experience encoded in the Microsoft-specific extensions, the parts that are lost in format migration, are significant enough to cause degradation sufficient to ensure that the vast majority of users continue to pay the upgrade costs. Thus it is likely that these parts of the user's experience are things we want to preserve even though in general we are not committed to preserving total fidelity to the original reader's experience.

  3. Hi David,

    I'd just like to say how much I appreciate your explanation of this problem. It gives great examples of some of the issues I have been raising for a while. I'm sure I will be using this post in future discussions.

    Thanks,

    Euan Cochrane
