In a press release Jiang is quoted as saying:
> In real data the redundancy can take on more complex forms, such as a text that talks about "raining" may also talk about 'umbrella' or other related things, or the bits in data may satisfy some mathematical equation, but the principle is the same - once we know the bits in data are dependent on each other in some way, we can use that knowledge to correct errors.

This work is theoretically interesting but of limited use in practical long-term preservation:
- The threat environment means that future cloud storage systems must be designed to encrypt data at rest (see, for example, Krste Asanović's keynote at FAST14). Effective encryption produces ciphertext that is indistinguishable from random data, leaving no redundancy to exploit. Indeed, most practical encryption systems compress their plaintexts before encryption to remove redundancy.
- The kind of redundancy Wang et al. are enhancing is intended to protect against "bit rot", but this is only one of the many threats to stored data. Cloud providers use erasure coding and other forms of redundancy to provide, for example, geographic dispersion to protect against catastrophic loss of a data center.
- The redundancy needed for protection is frequently less than the natural redundancy in the uncompressed file. The major threat to stored data is economic, so compressing files before erasure coding them for storage will typically reduce cost and thus enhance data survivability.
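
To make the first and last points concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not a model of any real storage system: random bytes from `os.urandom` stand in for well-encrypted data, the sample text is deliberately repetitive, and the 10+4 erasure-coding layout is a hypothetical choice made only to have an overhead factor to multiply by.

```python
import os
import zlib

# Hypothetical erasure-coding layout (an assumption for illustration):
# 10 data shards plus 4 parity shards, i.e. 1.4x storage overhead.
DATA_SHARDS = 10
PARITY_SHARDS = 4


def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more redundancy)."""
    return len(zlib.compress(data, 9)) / len(data)


def stored_bytes(payload_len: int) -> float:
    """Bytes on disk after erasure coding a payload with the overhead above."""
    return payload_len * (DATA_SHARDS + PARITY_SHARDS) / DATA_SHARDS


# Redundant plaintext: deliberately repetitive English-like text.
plaintext = b"It is raining today, so bring an umbrella and a raincoat. " * 2000

# Stand-in for ciphertext: good encryption output looks like uniform random bytes.
ciphertext_like = os.urandom(len(plaintext))

text_ratio = compressed_ratio(plaintext)
random_ratio = compressed_ratio(ciphertext_like)

# Storage cost of erasure coding the raw file versus compressing it first.
raw_cost = stored_bytes(len(plaintext))
compressed_cost = stored_bytes(len(zlib.compress(plaintext, 9)))

print(f"plaintext compresses to {text_ratio:.1%} of its size")
print(f"ciphertext-like data compresses to {random_ratio:.1%} of its size")
print(f"erasure-coded raw: {raw_cost:.0f} bytes, compressed first: {compressed_cost:.0f} bytes")
```

Real files are far less repetitive than this sample, so the savings will be smaller, but the direction is the same: data that looks random offers a decoder nothing to exploit, and removing natural redundancy before adding engineered redundancy reduces the bytes that have to be paid for.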