David Merrill

The Economics of Long-term Digital Retention (part 1 of 3)

Blog Post created by David Merrill on Dec 14, 2016

A few years ago, I wrote a paper showing the 100-year costs of digital data retention (aka archive); links to the 2014 research and the paper can be found here. The paper compared long-term retention costs across different types of media:

  • Traditional spinning disk
  • Tape
  • Optical media
  • Public Cloud (Amazon's Glacier)


I am working on an updated version of this paper and am expanding its scope to include:

  • Flash storage (SSD)
  • Hybrid cloud
  • Private Cloud


I hope to have this next version of the paper done by March 2017. This 3-part blog-series will be a way to document and explain the new findings before the formal version of the paper is released.


The economics of this type of storage (archive) differ from traditional storage economics for several reasons:

  • In this work, we are trying to show very long-term cost horizons, up to 100 years.
  • Even though we cannot predict the financial factors, vendors and technology of the future, we can look at the past 30-40 years of IT and project future trends, or 'cost slopes', that change how we plan for today (and perhaps the next 5-10 years)
  • We are looking at large amounts of digital content, usually measured in hundreds of TB, or many PB or EB.
  • This type of content tends to be file or object-based
  • Performance, retrieval, risk and compliance factors are all very different from operational data that is in the data center today


The models that come out of this type of work are helpful in showing ‘cross-over’ points over time. There are several schools of thought around the methods and media used for archive today. Perhaps tape is cheaper for the first decade or two, but then becomes economically unsustainable compared to optical at some future year. Seeing these cross-over points, especially across several public cloud offerings, can help IT strategists, planners and archivists determine the best technology option given the cost and performance profiles needed.
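To make the cross-over idea concrete, here is a minimal sketch of that kind of model. All of the dollar figures, refresh cycles and media profiles below are illustrative assumptions invented for the example, not real pricing from the paper:

```python
# Hypothetical cumulative-cost model for two archive media over 100 years.
# Every figure here is an illustrative assumption, not actual vendor pricing.

def cumulative_cost(years, acquisition, annual_opex, refresh_cost, refresh_cycle):
    """Cumulative cost per TB over `years`, refreshing media every `refresh_cycle` years."""
    total = acquisition
    costs = []
    for year in range(1, years + 1):
        total += annual_opex
        if year % refresh_cycle == 0:
            total += refresh_cost  # media replacement / remastering event
        costs.append(total)
    return costs

# Assumed profiles: tape is cheap to acquire but refreshed often;
# optical costs more up front but the media lives much longer.
tape = cumulative_cost(100, acquisition=20, annual_opex=2, refresh_cost=15, refresh_cycle=7)
optical = cumulative_cost(100, acquisition=60, annual_opex=1, refresh_cost=10, refresh_cycle=25)

# Cross-over point: the first year optical becomes cheaper than tape.
crossover = next((y for y, (t, o) in enumerate(zip(tape, optical), start=1) if o < t), None)
print(crossover)  # → 14
```

With these made-up inputs, tape starts cheaper but the repeated refresh events push it past optical in year 14; real cross-over years depend entirely on the cost profiles you plug in.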


In this analysis, we look at several key cost factors that constitute TCO over a very long period of time. The key costs that have to be considered include:

  • Depreciation expense of the hardware infrastructure that has to be replaced every 4-8 years
  • The expense of the media that needs to be replaced every 5-15 years
  • Migration and/or remastering costs of moving to the new medium every 5-15 years (or longer)
  • Environmental costs
  • Transport, storage and access costs (think Iron Mtn for tape vaulting)
  • Labor to manage, protect and index data
  • Network access to the data or long-term vaults
  • The introduction of cloud for long-term storage presents some new cost considerations:
    • Gets and puts
    • Over-usage tariffs
    • Additional network bandwidth
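The cost categories above can be folded into a single long-horizon TCO figure. The sketch below shows the arithmetic; every number (the per-category annual costs, refresh costs and cycle lengths) is a hypothetical placeholder chosen only to illustrate the structure:

```python
# Hypothetical 100-year TCO per PB, built from the cost categories above.
# All figures are illustrative assumptions ($k per PB), not real pricing.

HORIZON = 100  # years

annual_costs = {          # recurring costs, $k per PB per year (assumed)
    "environmental": 5,
    "transport_and_vaulting": 3,
    "labor": 12,
    "network": 4,
}

hardware_refresh_cost = 80   # $k per infrastructure replacement (assumed)
hardware_refresh_cycle = 6   # replaced every 4-8 years; 6 assumed here
media_refresh_cost = 40      # $k per media migration/remastering (assumed)
media_refresh_cycle = 10     # media replaced every 5-15 years; 10 assumed

tco = HORIZON * sum(annual_costs.values())
tco += (HORIZON // hardware_refresh_cycle) * hardware_refresh_cost
tco += (HORIZON // media_refresh_cycle) * media_refresh_cost
print(tco)  # → 4080
```

The point of the structure, not the numbers: over a century, the periodic hardware and media refresh events accumulate into a large share of total cost, which is what makes them decisive in cross-over comparisons.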


In addition to the above costs (which tend to be hard costs), there are several important considerations that turn into soft costs and need to be factored into long-term calculations:

  • Cost of retrieval, waiting for data to be available (Seconds, hours, days)
  • Cost of risk of losing the data, or not being able to read it in the future
  • The cost of buying ahead, or holding reserve capacity for future growth. Most companies need the ability to flex capacity up and down with an agile solution
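Soft costs like these can still be annualized into dollar figures so they sit alongside the hard costs in a model. Here is one way to sketch that; every input (retrieval counts, wait times, loss probability, archive value, carrying costs) is a hypothetical assumption for illustration:

```python
# Sketch of annualizing the "soft" cost considerations above.
# All inputs are hypothetical assumptions chosen for illustration.

retrievals_per_year = 50
hours_waiting_per_retrieval = 4     # e.g. a tape recall from an offsite vault (assumed)
cost_per_waiting_hour = 100         # $ of staff/business time (assumed)

annual_loss_probability = 0.001     # chance of lost or unreadable data (assumed)
value_of_archive = 5_000_000        # $ business value at risk (assumed)

capacity_bought_ahead_tb = 200      # reserve capacity held for growth (assumed)
carrying_cost_per_tb = 25           # $ per TB per year (assumed)

retrieval_cost = retrievals_per_year * hours_waiting_per_retrieval * cost_per_waiting_hour
risk_cost = annual_loss_probability * value_of_archive   # expected annual loss
buy_ahead_cost = capacity_bought_ahead_tb * carrying_cost_per_tb

print(retrieval_cost, risk_cost, buy_ahead_cost)
```

Treating risk as an expected annual loss (probability times value) is a simplification, but it lets very different concerns — waiting time, data loss, over-provisioning — be compared in the same units as the hard costs.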


In making TCO comparisons, not all long-term retention workloads are equal…

  • Search/Access frequency
  • Download rates, frequency
  • Data growth rate over time
  • Data sovereignty – are there rules or laws on where the data can be stored
  • Performance – writing & duplicating the data, retrievals
  • Vendor fatigue – how often are we likely to change vendors, technology or media over the next several decades?
  • Compliance risk, opportunity costs


The next blog entry will compare and contrast different media and architecture types using these total cost factors. Additionally, some actual cost modeling will be shown from a pair of recent case studies for clients with very long-term retention requirements for very large data stores.