tag:blogger.com,1999:blog-4503292949532760618.post4391983579634794169..comments2024-03-28T07:23:23.408-07:00Comments on DSHR's Blog: Cloud vs. Local Storage CostsDavid.http://www.blogger.com/profile/14498131502038331594noreply@blogger.comBlogger7125tag:blogger.com,1999:blog-4503292949532760618.post-15213162284067988442012-11-06T09:10:15.104-08:002012-11-06T09:10:15.104-08:00I should have found this earlier. Back in January ...I should have found this earlier. Back in January Amar Kapadia posted a very informative, <a href="http://www.buildcloudstorage.com/2012/01/can-openstack-swift-hit-amazon-s3-like.html" rel="nofollow">detailed analysis of OpenStack Swift vs. S3 costs</a>.David.https://www.blogger.com/profile/14498131502038331594noreply@blogger.comtag:blogger.com,1999:blog-4503292949532760618.post-69387455940910590832012-07-20T14:24:39.965-07:002012-07-20T14:24:39.965-07:00Thank you, Edward.
As regards my use of 100 years...Thank you, Edward.<br /><br />As regards my use of 100 years, this is just a way of discussing storing data "for ever" and how assumptions about the future cost of storage technologies affect the projected cost of doing so. I do not assume that any one technology (or company) will survive for more than a small fraction of that time. <a href="http://blog.dshr.org/2012/02/talk-at-pda2012.html" rel="nofollow">Here</a>, among other places, you can find a discussion of the way my model follows a unit of data as it migrates from technology to technology or, equivalently, from provider to provider.<br /><br />I'm well aware of technologies such as MAID, having co-authored <a href="http://lockss.org/locksswiki/files/ISandT2008.pdf" rel="nofollow">a paper on data lifetimes on idle disks (PDF)</a>. They don't make a significant difference to the results I'm discussing here.<br /><br />I'm also well aware that <a href="http://dx.doi.org/10.1145/1047915.1047917" rel="nofollow">designing systems to resist attack</a> typically adds cost.David.https://www.blogger.com/profile/14498131502038331594noreply@blogger.comtag:blogger.com,1999:blog-4503292949532760618.post-10628695519462861062012-07-20T13:57:30.576-07:002012-07-20T13:57:30.576-07:00100 years is a long time; the only technology that...100 years is a long time; the only technology that I'd count on lasting that long is paper.<br /><br />100 years is also a long time to assume that Amazon will be in business. What if you had pinned your first hopes on CompuServe or AOL?<br /><br />There's a wide range of costs depending on how much and how fast you need access to the data. I've read about an architecture called MAID (massive array of idle disks) where you spin down most of your storage and spin it back up again only when you need it. 
Other non-RAID architectures would be worth looking at if your access times can be longer.<br /><br />Another unanticipated cost is the cost of securing this data against active attack; it's very different if you want a read-only legacy of boring data vs. a read/write collection of data of value to a motivated hacker.Edward Vielmettihttps://www.blogger.com/profile/07421049499752624699noreply@blogger.comtag:blogger.com,1999:blog-4503292949532760618.post-69432544475975611122012-06-28T17:58:09.441-07:002012-06-28T17:58:09.441-07:00Thanks for the comment, Fernando, but I am not ass...Thanks for the comment, Fernando, but I am <i>not</i> assuming SOHO technology here. For the alternative to S3, I am using BackBlaze's build costs for the 4U rackmount storage servers they use in their Petabyte scale data centers. Hands up anyone who has a 4U rackmount at home.<br /><br />And I am assuming that those build costs represent 1/3 of the total cost of ownership. The other 2/3 represents the costs of operating in the San Diego Supercomputer Center's Petabyte scale data center, as reported in their paper on SDSC's storage cost history. This proportion (1/3 hardware, 2/3 data center costs) roughly matches the <a href="http://blog.dshr.org/2011/03/how-few-copies.html" rel="nofollow">numbers reported by Vijay Gill for Google's data centers</a>. Does Google spend enough on physical security and safety for your requirements? More to the point, do they spend as much as Amazon?<br /><br />Of course, it is true that if you have much less to store than the 135TB example here, you might well decide to use SOHO technology, say putting a Drobo at a couple of your friends' houses. And they would be much less secure than in a data center. But they would also be even cheaper. You certainly wouldn't spend twice as much running them as you did buying them. So you could afford a much higher level of replication and still come out ahead of S3. That's the basic LOCKSS concept. 
The more copies you have the less care you have to take with each copy.David.https://www.blogger.com/profile/14498131502038331594noreply@blogger.comtag:blogger.com,1999:blog-4503292949532760618.post-86891166786830830362012-06-28T17:31:53.540-07:002012-06-28T17:31:53.540-07:00What about physical integrity and safety? I probab...What about physical integrity and safety? I probably don't want to keep my RAID-6 copies in the closet at home and the homes of some geographically distant friends. Theft, flood, fire, earthquake, war, neglect,... Large datacenters like Amazon's have substantial physical security and safety measures. What fraction of S3's costs is due to those measures?Fernando Pereirahttps://www.blogger.com/profile/05849361902113771573noreply@blogger.comtag:blogger.com,1999:blog-4503292949532760618.post-57954805161110257102012-06-27T16:33:13.873-07:002012-06-27T16:33:13.873-07:00S3 appears to maintain 3 geographically separated ...S3 appears to maintain 3 geographically separated replicas. <a href="https://aws.amazon.com/s3/#protecting" rel="nofollow">They say</a>:<br /><br />"Objects are redundantly stored on multiple devices across multiple facilities in an Amazon S3 Region. To help ensure durability, Amazon S3 PUT and COPY operations synchronously store your data across multiple facilities before returning SUCCESS."<br /><br />and:<br /><br />"Designed to sustain the concurrent loss of data in two facilities."<br /><br />In order for the local storage model to be comparable with S3, the model I used includes 3 geographically separated replicas. I said:<br /><br />"Maintaining three copies in RAID-6 local storage"<br /><br />I should have added "geographically separated" but that doesn't affect the numbers.<br /><br />I don't agree that S3 offers better reliability than the model I used for local storage. It is possible that it would offer better <i>availability</i>, as you suggest, due to better management and infrastructure. 
But <i>availability</i> rather than <i>reliability</i> just isn't that relevant to digital preservation.David.https://www.blogger.com/profile/14498131502038331594noreply@blogger.comtag:blogger.com,1999:blog-4503292949532760618.post-39390042630831446582012-06-27T16:06:07.392-07:002012-06-27T16:06:07.392-07:00Hmmm. Cost numbers without any associated availabi...Hmmm. Cost numbers without any associated availability/integrity numbers don't seem all that useful. (After all, if I don't care about availability or data loss, why do I need RAID?) Presumably S3 offers better availability due to multiple locations with no shared SPOFs (except for software).Geoffhttps://www.blogger.com/profile/07521391745343783288noreply@blogger.com