Ken Wood

US Library of Congress Update: Designing Storage Architectures for Digital Collections

Blog post created by Ken Wood on Oct 29, 2013

Thanks, Lauren Klein, for asking about the US Library of Congress meeting I was invited to. This meeting is an invitation-only event called Designing Storage Architectures for Digital Collections. The meeting's description is listed here.

 

Designing Storage Architectures for Digital Collections

September 23-24, 2013
Library of Congress, Washington, DC

The DSA meeting brings together technical and industry experts; LoC IT and subject matter experts; government specialists with an interest in preservation; decision-makers from a wide range of organizations with digital preservation requirements; and recognized authorities and practitioners of digital preservation.

There were a number of sessions, including the LoC's updates and situations, technology updates from vendors, other community updates and situations, and finally the session that I was a part of, Developments in Media (and other tape replacement technologies). The community speakers came from an eclectic collection of industry segments that are concerned with, and are huge contributors to, long-term digital data preservation strategies. Besides the LoC, which hosted the event, these community contributors and attendees included:

    • the National Archives and Records Administration
    • Los Alamos National Labs
    • the Academy of Motion Picture Arts & Sciences
    • Family Search
    • National Endowment for the Humanities
    • and several three-letter government agencies

 

My session capped off the event with three "lightning" presentations on alternative media (something other than electronic or magnetic recording) for long-term data preservation. Upstart companies Group 47 and Cuneiform Technologies both presented cases for their new technology based on commercializing Optical Tape technologies.



Probably the more famous of these two is Group 47, which is in the process of productizing DOTS (Digital Optical Technology System) technology. And of course, I was advocating optical discs, including Blu-ray storage discs and future follow-ons. The three of us then finished off the session with a 40-minute Q&A panel. Overall, it was a very successful event for me, and I'm hoping to get invited again to next year's event, which sounds promising. The meeting organizer did want to allocate more time to my presentation next year because she wants me to actually play my 31-year-old (next year it will be 32 years old) Pink Floyd "Dark Side of the Moon" audio CD.


Here are a few interesting factoids from my notes from the event.


  • End-to-end data integrity standards are needed. "This is an archive, we can't afford to lose anything!" - LoC
  • The write policy for these digital archives is "write once, read 3 more times" for a commit: once for a SHA-1 check and twice for each tape copy.
  • 3-5 year migration cycle schedule for tape media and device refresh, partly due to obsolescence.
  • Plans for device migrations in 10 years, before device data interfaces become obsolete and reach end of support.
  • Estimates for SSDs to replace HDDs: an investment of almost $1 trillion ($864B) would be needed to replace ALL HDDs, and an investment of over $100 billion to replace just enterprise HDDs. These are worldwide estimates.
  • For its most important data, the metadata, Family Search stores microfiche of catalogs and metadata.
  • Capacity growth is a bit of misinformation. Of the capacity purchased, about 30-40% is used for data growth and 60-70% is used for data migration. So of a 1PB purchase, only about 350TB represents data growth.
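To make the "write once, read back to verify" idea from the write-policy factoid concrete, here is a minimal Python sketch. It is an illustration only, not the LoC's actual workflow: the function names `sha1_of` and `commit` are hypothetical, plain directories stand in for tape copies, and the sketch does not try to reproduce the exact count of three reads. It only shows each copy being read back and compared against a SHA-1 digest before the commit is accepted.

```python
import hashlib
import shutil
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-1 and return the hex digest."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def commit(source: Path, copy_dirs: list[Path]) -> str:
    """Copy `source` into each directory, then verify every copy by reading it back.

    Hypothetical helper: directories stand in for tape copies.
    Returns the reference digest if every copy matches it.
    """
    expected = sha1_of(source)  # reference digest taken from the source
    for directory in copy_dirs:
        target = directory / source.name
        shutil.copyfile(source, target)  # write the copy
        if sha1_of(target) != expected:  # read the copy back and compare
            raise RuntimeError(f"digest mismatch on {target}")
    return expected
```

Calling `commit(Path("item.bin"), [Path("copy_a"), Path("copy_b")])` on existing directories would return the digest only if both copies read back identically, which is the point of not treating a write as committed until it has been re-read.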


This last factoid sticks with me for several reasons. I always say we need to break the migration cycle. Organizations are conditioned to fold migration cost into their data growth planning almost without thinking about it. When I asked the question behind this last bullet, it took the audience time to think about what they do in this regard. You could tell there was an "awkwardness" in the air, as if I had stated that "the king has no clothes" or something. My question was blunt: "Of your new capacity purchase, what percentage is used for new data and what percentage is used for migration of existing data?" It was the attendees who figured out that somehow breaking the migration cycle was needed. Mission accomplished.
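The arithmetic behind that capacity question can be sketched in a few lines of Python. This is only a back-of-the-envelope helper: the function name `split_purchase` is hypothetical, and it assumes decimal units (1PB = 1000TB) and the 30-40% growth share quoted in the factoid.

```python
def split_purchase(purchase_tb: float, growth_share: float) -> tuple[float, float]:
    """Return (TB for new data, TB for migration) for a capacity purchase.

    `growth_share` is the fraction of the purchase that holds new data;
    the remainder is assumed to be consumed by migrating existing data.
    """
    if not 0.0 <= growth_share <= 1.0:
        raise ValueError("growth_share must be between 0 and 1")
    growth_tb = purchase_tb * growth_share
    return growth_tb, purchase_tb - growth_tb


if __name__ == "__main__":
    # Assumption: 1PB is treated as 1000TB; shares span the quoted 30-40% range.
    for share in (0.30, 0.35, 0.40):
        growth, migration = split_purchase(1000.0, share)
        print(f"{share:.0%} growth share: {growth:.0f}TB new data, {migration:.0f}TB migration")
```

At the 35% midpoint, a 1PB purchase works out to roughly 350TB of new data and 650TB of migration, which matches the figure in the factoid.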



Outcomes