
IT Economist


In my last post, I set up the problem/opportunity analysis to compare very long-term costs (up to 100 years) of different media types that are candidates for long-term archive. We will continue with that comparison.


In my original work on this topic in 2014, I built a table to show qualitative differences in the total costs of different media types over time. The upcoming version of this paper will include the same table, and here is a sneak peek at the results:



[Table: Relative Costs for 100 years. A qualitative comparison of media types; columns include Public Cloud and Private Cloud, and rows include HW Maintenance, Software Maintenance, CSP Fatigue, Usage Limit, Outage Risk and Risk of Loss, with ratings ranging from Very Low to Med High.]

The table above lends itself to an alignment graph (spider chart) to contrast the qualitative and economic benefits of each technology option. For this graphic, and for the sake of clarity, I am only showing the qualitative comparison of tape, optical storage and public cloud.




On the graph above, higher numbers are better. Now we can see volumetric views of how well three different media/architecture types perform against the backdrop of 10 comparison areas. The larger the surface area in the graph, the better the media/architecture meets the overall requirements.

  • Public cloud is best in the upper-right areas (low subscription rate, with maintenance, labor and environment built into the subscription price). But on-boarding and over-use fees, along with high network costs, make it shallow in the lower-left cost areas.
  • Tape is better in the lower-left areas.
  • Optical has a better, more balanced TCO footprint.

Not all of the above costs carry equal weight over time. For example, with Kryder's law in effect, the price of the media will be essentially near zero over time, whereas usage tariffs and migration costs will constitute the bulk of long-term costs. We can now take the spider diagram and overlay individual client requirements on top to get a basic fit-check.
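A weighted fit-check like this can be sketched in a few lines of code. The scores (1-5, higher is better), the comparison areas, and the client weights below are invented placeholders for illustration, not values from the paper:

```python
# Hypothetical qualitative scores per media type (1-5, higher is better).
scores = {
    "Tape":         {"Media price": 5, "Subscription": 5, "Network": 4, "Migration": 2, "Usage tariff": 5},
    "Optical":      {"Media price": 4, "Subscription": 5, "Network": 4, "Migration": 4, "Usage tariff": 5},
    "Public cloud": {"Media price": 5, "Subscription": 2, "Network": 2, "Migration": 5, "Usage tariff": 1},
}

# Client-specific weights bias the fit-check toward the long-term cost
# drivers (migration and usage tariffs, per the observation above).
weights = {"Media price": 0.5, "Subscription": 1.0, "Network": 1.0,
           "Migration": 1.5, "Usage tariff": 1.5}

def weighted_score(media):
    """Weighted sum across all comparison areas for one media type."""
    return sum(scores[media][area] * w for area, w in weights.items())

for media in scores:
    print(media, weighted_score(media))
```

With weights biased toward migration and tariffs, optical's balanced footprint scores highest in this toy example, which is exactly the sort of client-specific result the requirements overlay is meant to expose.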


With a qualitative comparison and biasing complete with a customer, we can now apply IT economics. These 10 areas can be computed, weighted, filtered and applied to a particular vertical market or customer needs basis. My 3rd and final blog entry on this subject will present some of the quantitative analysis comparing different media types and architectures over a long-term time horizon.

A few years ago, I wrote a paper showing 100-year costs of digital data retention (aka archive); links to the 2014 research and the paper can be found here. The paper showed long-term costs comparing different types of media for long-term retention:

  • Traditional spinning disk
  • Tape
  • Optical media
  • Public cloud (Amazon's Glacier)


I am working on an updated version of this paper, and am expanding my scope to include:

  • Flash storage (SSD)
  • Hybrid cloud
  • Private Cloud


I hope to have this next version of the paper done by March 2017. This 3-part blog-series will be a way to document and explain the new findings before the formal version of the paper is released.


The economics of this type of work (archive) differ from traditional storage economics for a couple of reasons:

  • In this work, we are trying to show very long term cost horizons, up to 100 years.
  • Even though we cannot predict the financial factors, vendors and technology of the future, we can look at the past 30-40 years of IT and then create future trends or 'cost slopes' that change how we make plans for today (and perhaps the next 5-10 years)
  • We are looking at large amounts of digital content, usually measured in hundreds of TB, or many PB and EB.
  • This type of content tends to be file or object-based
  • Performance, retrieval, risk and compliance factors are all very different from operational data that is in the data center today


The models that come out of this type of work are helpful to show 'cross-over' points over time. There are several trains of thought around the current methods and media used for archive today. Perhaps tape is cheaper to start for a decade or two, but then becomes economically unsustainable compared to optical at some future year. Seeing these cross-over points, especially with several public cloud offerings, can help IT strategists, planners and archivists determine the best technology option given the cost and performance profiles needed.
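To make the cross-over idea concrete, here is a minimal sketch. All the dollar figures and intervals below are invented for illustration; the point is only the shape of the computation:

```python
def cumulative_cost(years, annual_opex, media_capex, migration_cost, migration_interval):
    """Cumulative cost of a media type: up-front media spend, yearly
    operating cost, and a periodic migration/remastering charge."""
    total = media_capex
    for year in range(1, years + 1):
        total += annual_opex
        if year % migration_interval == 0:
            total += migration_cost
    return total

def crossover_year(horizon=100):
    """First year in which 'optical' becomes cheaper than 'tape',
    under purely hypothetical cost inputs."""
    for year in range(1, horizon + 1):
        tape = cumulative_cost(year, annual_opex=10, media_capex=50,
                               migration_cost=40, migration_interval=7)
        optical = cumulative_cost(year, annual_opex=6, media_capex=200,
                                  migration_cost=30, migration_interval=15)
        if optical < tape:
            return year
    return None

print(crossover_year())  # with these inputs, optical wins from year 21 on
```

Tape's cheaper start and more frequent migration charges eventually let the pricier-up-front option overtake it, which is the cross-over pattern described above.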


In this analysis, we look at several key cost factors that constitute TCO over a very long period of time. The key costs that have to be considered include:

  • Depreciation expense of the hardware infrastructure that has to be replaced every 4-8 years
  • The expense costs of the media that needs to be replaced every 5-15 years
  • Migration and/or remastering costs of moving to the new medium every 5-15 years (or longer)
  • Environmental costs
  • Transport, storage and access costs (think Iron Mtn for tape vaulting)
  • Labor to manage, protect and index data
  • Network access to the data or long-term vaults
  • The introduction of cloud for long-term storage presents some new cost considerations
    • Gets and puts
    • Over-usage tariffs
    • Additional network bandwidth
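The hard costs listed above can be rolled up into a simple horizon model. Every amount and interval below is a placeholder for illustration; a real model would use client-specific figures and discounting:

```python
HORIZON_YEARS = 100

# Placeholder cost events: a recurring amount and how often it recurs.
cost_events = {
    "hardware_refresh":  {"amount": 500, "every_years": 6},   # infrastructure replaced every 4-8 yrs
    "media_replacement": {"amount": 100, "every_years": 10},  # media replaced every 5-15 yrs
    "migration":         {"amount": 250, "every_years": 10},  # remastering onto the new medium
    "environmental":     {"amount": 20,  "every_years": 1},
    "labor":             {"amount": 40,  "every_years": 1},
    "network":           {"amount": 15,  "every_years": 1},
}

def long_term_tco(events, horizon=HORIZON_YEARS):
    """Sum each recurring cost event over the horizon (no discounting,
    for simplicity)."""
    return sum(e["amount"] * (horizon // e["every_years"]) for e in events.values())

print(long_term_tco(cost_events))
```

Even this crude roll-up shows why the periodic events (hardware refresh, migration) dominate a 100-year total rather than the initial purchase.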


In addition to the above costs (which tend to be hard costs), there are several important considerations that turn into soft costs that need to be factored into long-term calculations:

  • Cost of retrieval, waiting for data to be available (Seconds, hours, days)
  • Cost of risk of losing the data, or not being able to read it in the future
  • The cost of buying ahead, or holding reserve capacity for future growth. Most companies need the agility to flex capacity up and down.


In making TCO comparisons, not all long-term retention workloads are equal…

  • Search/Access frequency
  • Download rates, frequency
  • Data growth rate over time
  • Data sovereignty – are there rules or laws on where the data can be stored
  • Performance – writing & duplicating the data, retrievals
  • Vendor fatigue, how often are we likely to change vendors, technology or media over the next several decades?
  • Compliance risk, opportunity costs


The next blog entry will compare and contrast different media and architecture types using these total cost factors. Additionally, some actual cost modeling will be shown for a pair of recent case studies for clients with very long-term requirements for very large data stores.

An 11-Step Program

Originally posted by David Merrill on Sep 28, 2012


I know there is a 12-step program for addiction recovery, but that is not what I am writing about. Instead, I am writing about a structured, methodical approach to reducing the unit cost of IT. These methods and steps have been proven over the last 10-11 years with storage economics, but I am finding applicability to VM cost reduction, big data, unified computing, etc. My premise is that there are phases with predecessor relationships, and these phases or steps can reduce costs.


Over the past 6-8 weeks, I have personally conducted 20 economic workshops for clients in Asia. Some of these were with recurring-assessment customers, so I can trend their results and correlate their success to other customers that have had great success in reducing costs. I built a composite example of a total cost reduction roadmap from these recent customers, and summarized roughly 11 steps within 4 phases that have produced unit cost reduction of IT. Again, this composite is a storage example, but many of my VM TCO examples are close to this 11-step approach.


In this example we show TCO per TB/year, but this could also be the unit cost of a VM, VDI, transaction, etc. Notice the categories of costs. We track 34 costs, but there tend to be popular costs that most organizations are interested in measuring and reducing: depreciation, maintenance, labor, environmental costs, outage risks and migration. In achieving these unit costs in the data center, there are (and have been) 4 phases that organizations tend to pass through on this journey.

Phase 1 – Doing the Basics

  • Consolidation and centralization of data center assets
  • Implement disaster recovery capabilities
  • Tech updates to improve performance and replace aging and expensive assets
  • Moderate unit cost reduction in this phase (10%)

Phase 2 – The Golden Age of Virtualization

  • Virtualization applies to servers, desktops, storage (LUNs, arrays, file systems) and network
  • Now the ability to over-provision or over-subscribe capacity
  • Tiering and policy-based asset re-purposing (v-motion, policy-based management)
  • Improved management with new layer of abstraction
  • De-duplication and compression
  • Significant unit cost reductions (30-40%)

Phase 3 – Behavior Modification

  • Basics of best practices, operational changes
  • Chargeback to pay for what is used
  • Service catalogs and reference architectures to limit variability
  • Rewards and punishment systems to drive economic behavior
  • Self-provisioning
  • Moderate unit cost impact (20%) with these actions, but some of these activities are political and can impact organizations

Phase 4 – Challenging and Changing the Norms of Ownership and Locale

  • Utility computing
  • Scale up and scale down
  • Capacity on demand
  • Private, public and hybrid cloud
  • Remote services and sourcing
  • Cost reductions will depend on many parameters, and some reductions are masked by cost shifting to cloud providers
  • Unit cost impact in the 10-30% range

In the short term, there are many technical and operational investments that can have a dramatic impact on storage, VM and data center economics. Over time, technical and operational investments will have to give way to behavioral, consumption and remote options to further reduce data center costs. My next few blogs will dive deeper into these 4 phases and the 11-step program to reduce IT costs.
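Taking the midpoint of each phase's stated reduction, the compounding effect across the 4 phases can be sketched as follows (the compounding itself is my reading of the composite, not a claim from the workshops):

```python
# Midpoint unit-cost reductions from the 4 phases described above.
phase_reductions = [
    ("Phase 1: doing the basics",      0.10),
    ("Phase 2: virtualization",        0.35),  # midpoint of 30-40%
    ("Phase 3: behavior modification", 0.20),
    ("Phase 4: ownership and locale",  0.20),  # midpoint of 10-30%
]

def remaining_unit_cost(start=1.0):
    """Unit cost left after applying each phase's reduction in sequence."""
    cost = start
    for _, reduction in phase_reductions:
        cost *= 1.0 - reduction
    return cost

# Sequential phases compound to roughly a 63% total unit-cost reduction.
print(round(1.0 - remaining_unit_cost(), 2))
```

Note that sequential reductions multiply rather than add, which is why the four phases together fall well short of the 80-115% a naive sum would suggest.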

My previous 2 blogs have set up the scenarios that require a new, lower total cost per node when considering IoT architectures. Now for some proof points.


One of my observations around driving down the cost of a data lake node hinges on the trend to build VMs (or nodes) with large storage pools. 6-10 TB storage pools are not uncommon for an IoT data lake node, and if traditional VM designs are used (RAID, FC, higher-performance servers, data protection, etc.), the cost of these data lakes would be unsustainable, and we would see the IoT initiative crushed by the total cost of the platforms. So let's look at some of the cross-over points related to storage per node.



In the above graphic, we are comparing the cost of acquisition (price only) of Vblock, HP and HDS converged solutions to a net-new IoT node specially built for data lakes with all the lower-cost features mentioned in the previous blog. There are several cross-over points with the orange line (Hitachi Scale-out Platform, or HSP) based on different vendor solutions for this client, but even with different and improving converged solutions, the new IoT node is economically better for nodes with more than 2 TB of storage. Some of the CI solutions lost their economic advantage at only 500 GB per node. The take-away from this one example is that if the data lake nodes only require 500-2,000 GB per node, then the acquisition price may be OK with a traditional CI solution. Now we know that price is only a fraction of the total cost (about 25%), so even this graphic does not tell the entire story of the low TCO requirements for IoT nodes.
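The cross-over behavior can be reproduced with two toy price curves. The coefficients are invented, chosen only so the break-even lands near the 2 TB point observed above:

```python
def ci_node_price(gb):
    """Traditional converged-infrastructure node: lower base price,
    but expensive per-GB storage (RAID, FC, data protection)."""
    return 8000 + 4 * gb

def iot_node_price(gb):
    """Stripped-down data-lake node: higher base price, cheap per-GB storage."""
    return 12000 + 2 * gb

def crossover_gb(step=100, limit=20000):
    """Smallest capacity (GB) at which the IoT node is no more expensive."""
    for gb in range(0, limit + 1, step):
        if iot_node_price(gb) <= ci_node_price(gb):
            return gb
    return None

print(crossover_gb())  # 2000 GB: the IoT node wins above ~2 TB per node
```

Below the break-even, the CI node's lower base price dominates; above it, the per-GB slope does, which is why small-capacity nodes can still be "price OK" on traditional CI.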


IoT nodes, and specifically data lake nodes, will have high storage content; without the right design and economics, they might break the bank for the IoT infrastructure build-out.


Another way of looking for the sweet spot for low-cost IoT nodes is with the total cost view. One recent customer engagement allowed us to compare the total cost of traditional VMs, which for them averaged 115-135 GBP per VM/month. We set a target to deliver IoT nodes in the 20-30 GBP range.



Again, the sweet spot for these new types of IoT nodes was in the 1-4 TB per vCore range, at a TCO substantially lower than current architectures could provide.


When making plans for net-new IoT projects, platforms and technology, be sure to spend some time determining the best price and best total cost for these nodes. The rate of growth and the size of data lakes usually require a fundamentally lower unit-cost solution than what may be used for traditional virtual machine workloads in the data center.

In my last blog, I outlined the need to build, deploy and manage IoT Data Lake nodes that are 80-90% lower cost (not lower price) than traditional IT server and storage infrastructure. I will now present how this can be done.


First, we need to understand that processing and storage requirements in the data lake are very different from what we typically see in the data center

  • Transient data, state-less
  • Short shelf-life of the data, from a few minutes to a few days.
  • High volume, small objects, log files or M2M elements
  • Low processing throughput is usually OK
  • Virtual machine nodes may only have 1-2 vCPUs, but can have very large storage pools (1-20 TB each has been observed)
  • Commercial hypervisors and operating systems are not necessary


Now, how to build very low cost nodes and storage for the IoT Data Lake --

1. New systems and solutions are required of your IoT vendors

2. New operational and behavioral considerations are needed from the IT department


This table below summarizes various options, and what is required of the vendors and IT departments:


Moving to COTS hardware and open-source software, replacing RAID, not protecting the data... these are all radical ideas and concepts in order to meet a new total unit cost point for IoT nodes.


My next and final blog on IoT Data Lake economics will review some customer case study data, and the cross-over points of capacity and cost that drive us to look for new ways to reduce the total unit cost of IoT platforms.

I just got off an interesting IDC webcast on "Worldwide Internet of Things (IoT) 2017 Predictions". Some really good material was shown, with excellent alignment to what I am seeing in IT economics from my customers planning for on-premise IoT solutions. IDC presented material on "shifting economics" and a new "platform economy" necessary to drive IoT effectiveness. I will not re-hash the IDC material (you can watch it on your own), but will share some of my own similar and supportive findings on this topic.


Point 1: IoT, as we all know, is relatively new; the demands on capacity, processing locale (central or edge) and rate of growth are still being forecast. But we do know that data rates are vastly different from traditional IT. My first IoT economics observation is that the platform and infrastructure built specifically for IoT have to be radically different from an economic perspective. Key point: data lake systems (nodes, storage) need to be built with 80-90% of the traditional costs taken out. In my opinion (working with clients building 1st-generation IoT systems), if we do not find platform architectures with a significantly lower cost footprint, the rate of IoT growth may be unsustainable for most IT budgets.


Point 2: IoT platforms placed in the data center (or the cloud) are part of a larger IoT or M2M eco-system. This simple chart below shows that eco-system, with the sensors and machine-generating sources on the left, through a network transport, through data management and then to the applications and people that can use the results for business value.


In my work, the call for the 80-90% lower-cost platforms is in the 'data management' section of the picture above, and more specifically in the data lake infrastructure. This picture below is an expanded view of the data management section, with the data lake isolated from the regular DB/processing systems.




Point 3: The data lake infrastructure is probably the most volatile part of the data management IT infrastructure in terms of capacity, rate of growth, and scale-up and scale-out. If we simply take existing storage, hosts and VMs that exist elsewhere in the data center and make them the "data lake", the total costs will become unsustainable in a very short time. If we push these traditional systems and architectures to meet the explosive growth, the unsustainable costs may have a negative impact on the IoT system as a whole, and a negative ROI impact on the IoT initiative in general. That is why the node, storage and processing architectures of the data lake, within the data management systems, need to be a fraction of the total cost of other hosts and storage.


In my next blog, I will present several options that are generally available today to design, build and deploy data lake IoT platforms that are economically, radically different from what we tend to build in the data center today. Determining and building to these new cost targets is imperative for IoT projects to deliver positive ROI over time.

So I have re-hydrated some older posts on cloud economics from the ancient past... now some observations on what we see today.


First, people often ask me if, when, why, and under what conditions clouds provide better costs than in-house, on-premise operations. The simple answer is "it depends". There are many points of inflection based on workload, security, growth rates, access patterns, etc. But there are a few dimensions to consider before we start building total cost models.


  • The Performance Dimension - If the workload requires sub-second performance, then you may need to re-think some cloud design assumptions. Network latency is a function of the spend on high-speed connections to the cloud provider, which increases costs to meet performance
  • The Data Security Dimension - how to protect and secure against intrusion, hacks and malware with a CSP. What are the costs and risks?
  • Data Sovereignty - Some clients have legal requirements to keep data and processing within fixed geographic boundaries, and accommodating this with a CSP can add risk and/or costs
  • Access Patterns - this applies to cloud storage options. If recovery time is important to business resumption, then understanding and having RPO and RTO assurances may add to the cloud costs
  • Rate of Change / Elasticity - a powerful aspect of cloud computing is the elastic nature of the infrastructure. But growth is not free, so the rate of change (growth up and down) may have additional cost triggers that need to be understood
  • Time Horizon - the long-term or short-term nature of data processing and storage can often impact the PV benefits. If data and systems will be cloud-resident indefinitely, then long-term views of cloud migration or cloud transitions can be factored into a cost model.


I have worked with dozens of clients to help them determine, measure and calculate some of these above dimensions, and then put the results into a financial business case for the present-day. There are many options with cloud (private, hybrid and public) as well as optimizing the on-premise infrastructure to meet the current business demands. Graphs like the following help to cut through all the options and variations to give decision makers a better economic perspective of unit costs.




I still see many customers retracting from cloud moves they made 3-5 years ago due to costs. People tend to confuse a simple subscription rate with the total cost; over time, when all the get/put charges, migrations, uploads and other cost triggers are factored in (including the network on-ramp to a CSP), they can see the true cost of an old cloud decision. Some that have de-clouded will likely move back to a different cloud architecture, given some better optics into the total costs. Many more are evolving to hybrids in order to provide the best cost balance and to dictate the cost inflection points on their own terms.


An excellent (but somewhat aging) book on this subject is Cloudonomics by Joe Weinman. The book does not give details about total cost benefits, but focuses more on the business impact/benefits.

Cloud Storage Economics – Part 2 - Original Posts November and December 2010


My previous entry introduced some of the cost and architecture parameters that I have been investigating relative to cloud storage architectures. I have shown that, looking at cost of acquisition only, it is easy to see that DAS is the winner. Both TCA and TCDO measurements count only the CAPEX costs associated with storage ownership. But price is only a fraction of cost. Over the life of storage, the OPEX costs can reach 3-5X the purchase price, so it is worthwhile to look at TCO for cloud storage architectures and determine if there is a cross-over point between DAS and enterprise arrays.


First, let's take a look at how I break down the types of money between these 4 cost modeling methods:



In the previous blog models, we saw that there was not a cross-over point or cost parity between DAS and Enterprise when we look only at total raw capacity. There was a cross-over point for TCDO at 900 units, when we look at the costs to store data (not just raw capacity). When we add hard and soft costs to the TCDO view, we see that enterprise architectures can become economically better at a certain point of scale, total size, data types, data value, transaction workload, and processing value.


First, the TCDO plus traditional hard costs (such as power and labor): we can see a cross-over point at around 500 units.



Next, when we consider adding soft costs to the mix, the cross-over point is around 100 units.



Don't forget that both the X and Y axes are logarithmic. The differences in real dollars are much more significant than what is presented in the graph.




Cloud Storage Economics – Part 3

My previous 2 blogs have shown how price ≠ cost applies to standard SAN and NAS storage architectures, but also to cloud storage architectures. What we have concluded:


  • If you look only at total cost of acquisition (TCA) on a raw-TB basis, DAS will always win in your economic models
  • If we expand TCA and count only the primary data store (usable or written-to space), we start to see parity at roughly 1,000 units (nodes, TB, etc.)
  • When we apply hard costs such as power and labor to the models, the cross-over point is around 500 units
  • And finally, if we consider other soft costs (benefits that are qualitative but not necessarily showing up in a budget report, like carbon footprint and performance), then the cross-over point occurs at a few hundred units.

So therefore what?

    1. Don’t be seduced by price alone, even with small to medium cloud installations
    2. Visit with your storage vendor and solution architects to understand the node and storage scale-out, and where barriers exist with scale, availability and manageability.
    3. Understand all the costs that are important to your IT department; this has to be more than just capital costs. Apply all these costs to the cloud design at various scale points. Model the hard costs first. Secondary models that show other problem areas should be considered as well.
    4. Usually there are several cost centers that bear the burden for different costs. Don't be myopic in your cost analysis; look at the total costs for the company and not just your own cost center impact.
    5. Cross-over points do exist for enterprise-class and modular storage architectures
      • At scale
      • At performance levels
      • For availability
      • For total environmental costs and carbon emissions


There are economic cross-over points that can be defined and modeled for your own IT installation. Not everyone will have the same cost coefficients, but models and predictions (like the following) can be developed to calculate cross-over points for your cloud initiatives.

Economic Advantages of Pay-as-You-Grow or Utility Storage Services

by David Merrill on Feb 28, 2013


I am working in Australia for several weeks, and find that many sourcing companies (including HDS) have been in the Storage-as-a-Service business for several years. Most companies are aware of these offerings, and general acceptance seems to be higher here than in other parts of the world. Part of that may be that these national resources are here in-country, so the threat of data or systems moving off-continent seems less likely. The distinctions of utility services compared with traditional outsourcing are mostly well understood. Recently I met with a few customers who still have bad recollections of old-fashioned outsourcing and are skeptical that these new consumption methods are really a disguise for bulky, inflexible outsourcing deals. They also do not see how these options can reduce real costs.


In this blog, I will outline the theory of storage cost savings with a utility (scale up and down) pay-as-you-go storage service. Let’s just call this “storage utility” for now. And for this blog let’s focus on the CAPEX impact of savings/differences.

I will start by describing an overly simplistic, multi-year storage growth model. First, let’s look at the written-to data requirements of a company.


In the above graph, we see several points of interest in the demand curve:

-       Point A is a steady-state growth with new projects and new infrastructure.

-       Point B represents a new project (perhaps an analytics event) where a lot of new data is introduced (machine-generated data) to be analyzed. This might be data that can reside in a lower-tier of storage, but will be on-line for several months.

-       At point C, the burst mode data goes away. Perhaps it is deleted; perhaps it is put back to tape or an archive. But the total capacity demand for written-to data drops.

-       At point D, there is a merger or acquisition, and the storage/data demand grows rapidly for a sustained period of time.

Next, let’s look at a traditional purchase model that would be required to meet the above demand.


The top line represents a usable capacity rate needed to support the written-to levels in the first graph. In this chart let’s also assume:

-       Thin volumes have limited adoption in the environment.

-       A 5-year depreciation for assets.

-       Once assets are purchased, they stay around until the end of the 5-year depreciation term

-       There is a lag between demand and delivery. This is due to the time it takes to scope, engineer, bid, purchase and install the assets.

-       Engineering with reserve capacity (20%) is common for the storage management team.

-       Utilization of data (written-to) compared to allocated is at the industry average of 40%. Therefore, the white space or wasted capacity within what is allocated has to be added to the reserve capacity defined by the storage team.

As you can see, overall utilization is very poor. The spikes at the end of event B create pools of unutilized storage. As new projects come online, they want to have their own resources and not hand-me-down disk. Utilization rates that start at 30% in this model can easily drop to 15% in a short period of time. And finally, the written-to-raw ratio hovers around 1:6 (which would be a very, very good result for this model).
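The utilization stack in those assumptions reduces to simple arithmetic. The 2x raw overhead for RAID, spares and copies is my own assumption, added to land on the 1:6 written-to-raw ratio quoted above:

```python
def raw_needed(written_tb, utilization=0.40, reserve=0.20, raw_overhead=2.0):
    """Raw capacity needed for a given amount of written-to data."""
    allocated = written_tb / utilization   # only 40% of allocated space is written to
    usable = allocated * (1 + reserve)     # plus the 20% engineering reserve
    return usable * raw_overhead           # assumed RAID/spares/copies overhead

# 100 TB of written-to data drives roughly 600 TB of raw capacity (1:6).
print(raw_needed(100))
```

Each layer of the stack multiplies the one below it, which is how a modest-sounding reserve and utilization figure compound into a 6x raw footprint.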

Now let’s look at a storage utility approach to the same demand scenario. In this service:

-       Only thin provisioned volumes are delivered to the customer. In this example I have a conservative rate of 110% average oversubscription.

-       Capacity can scale up and down.

-       The lag between requirement and delivery is hours or days, not months.

-       There is no need for reserve capacity. The service provider keeps all the reserve so that the customer pays for only what they need.

-       White space within the allocated volumes may still exist, but over-provisioning will reduce most of this waste.


As you can see, the differences are tremendous. Not only is the total storage footprint different…

-       The written-to-raw ratio turns from 1:6 to around 3:5.

-       Very fast mean time to deliver provides positive impact to the projects.

-       Floor space, power and cooling costs are reduced by 35-50%.

-       With less equipment on the floor

  • Management costs are held in check
  • HW maintenance rates (even as part of the utility rate) are reduced

-       Agility in acquiring and de-commissioning IT assets can bring better business value: just-in-time OPEX spend in place of long-term CAPEX commitments.

If you subtract the capacity of the Storage Utility line (green) from the BAU line (brown), you get a sense of the difference in total capacity needed at a point in time to meet the business's data storage needs.
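Using the written-to-raw ratios quoted above (1:6 for the traditional model, roughly 3:5 for the utility), the scale of that difference is easy to check; the 300 TB written-to figure is just an example:

```python
def raw_capacity(written_tb, written_to_raw):
    """Raw capacity implied by a written-to-raw ratio given as (written, raw)."""
    written_part, raw_part = written_to_raw
    return written_tb * raw_part / written_part

bau = raw_capacity(300, (1, 6))      # traditional purchase model
utility = raw_capacity(300, (3, 5))  # storage utility model

# The utility model needs roughly 72% less raw capacity for the same data.
print(bau, utility, round(1 - utility / bau, 2))
```

That raw-capacity gap is what drives the floor space, power, cooling and maintenance reductions listed above.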

Moving from a CAPEX spend to this OPEX storage utility may also present some internal finance and accounting challenges, which we can discuss in the next blog. But for the present view: reducing infrastructure, having the agility to consume what you need when you need it, and having a variable-rate cost aligned to business needs are some of the key benefits of this type of IT service delivery. Other aspects, benefits and detriments will be covered in my next few blogs.


Cloud Economics from the IaaS Perspective

by David Merrill  on Jul 22, 2013


I came across an interesting article on how IaaS cloud provider economics work. While it is a simple article with basic economic concepts, it sheds some light on how commoditization and differentiation in the market will have to change in the near future.


A couple of observations from reading this article, depending on your point of view:

From a Cloud Provider:

  • There are many assumptions about the underlying storage and service architecture that providers are using to build and deploy cloud services. The article implies that all hardware and architectures are the same
  • HDS provides some highly-differentiating and economically-superior storage solutions to many IaaS service providers (Telco, Global SI, traditional cloud providers) that present a virtualized architecture that can scale-up and out with the growth requirements
  • Some of the tactics to “race to the bottom” in price tend to leave a sour taste in the mouth of consumers, as so many features and functions are additive and get shifted to the variable rate cost area

From a Cloud Consumer:

  • This article was written from the perspective of IaaS company economics, and how these companies are having to respond to commoditization and competitive offerings in their business. It was not created to help the consumer side of cloud economics find, evaluate and select the right cloud options that can actually reduce your costs.
  • As I have posted earlier, some cloud offerings and decisions may actually increase costs, so you need to understand all the options and pricing rates from the IaaS provider (as outlined in this article) and add in the other fixed and variable costs that will make up your new total cost of ownership.
  • What I liked about the article was the insight into pricing differences among various cloud vendors. These vendors are not non-profit organizations, so they have to be creative in the engineering and marketing of their solutions. This kind of transparency is refreshing, and insightful for those who want to know how the price options really work with IaaS vendors.
  • So beyond the economics of the vendors as outlined in this article, you, the consumer, have to be very aware of several factors and examine all costs (some hidden) when comparing and contrasting cloud offerings to a DIY approach. For example:
    • Network transmission between your site and the provider
    • Cost of changes, adds, or deletes
    • Transformation cost, or re-hosting, moving to a different provider (even though it may be a future cost)
    • Cost of latency, performance
    • Risk, in terms of off-site premise, protection, country/legal/compliance areas
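Folding these factors into a side-by-side comparison is straightforward arithmetic. A minimal sketch follows; every dollar figure and both helper functions are hypothetical placeholders, not vendor pricing:

```python
# Sketch: fold "hidden" cloud costs into a simple annual TCO comparison.
# All dollar figures are hypothetical placeholders, not vendor pricing.

def annual_cloud_tco(subscription, network_egress, change_fees,
                     migration_reserve, latency_penalty, risk_premium):
    """Sum the visible subscription plus the often-hidden variable costs."""
    return (subscription + network_egress + change_fees +
            migration_reserve + latency_penalty + risk_premium)

def annual_diy_tco(depreciation, maintenance, labor, power_cooling):
    """Typical owned-infrastructure cost elements."""
    return depreciation + maintenance + labor + power_cooling

cloud = annual_cloud_tco(subscription=120_000, network_egress=30_000,
                         change_fees=8_000, migration_reserve=15_000,
                         latency_penalty=5_000, risk_premium=10_000)
diy = annual_diy_tco(depreciation=90_000, maintenance=25_000,
                     labor=50_000, power_cooling=12_000)

print(f"Cloud: ${cloud:,}  DIY: ${diy:,}")
```

The point of the exercise is that the headline subscription rate is only one term in the cloud sum; the other five terms are where the surprises live.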

It will be fun to see how different providers adjust to new global competition in this area, but we cannot ignore the consumer side: the total cost requirements, and the capabilities that need to be assembled to meet local user requirements.

Intermediate Steps to Cloud, and Still Reduce Costs

by David Merrill on Aug 30, 2013


In my previous blog I discussed some of the investments and steps people can take to be cloud-ready, or cloud-enabled, without necessarily moving everything to an off-site or consumption based delivery model. There are key ingredients that can help to get cloud-ready. And by cloud-ready I mean the same technology and processes that cloud providers use to deliver superior price and cost models for their customers. Some of these key ingredients include:

  • Virtualized storage
  • Virtualized servers and networks
  • Unified or orchestrated software to manage virtual systems
  • Billing and metering
  • Self-service portals for provisioning, change requests
  • Very strong link from service catalogs (menus items) to the service delivery, SLA, OLA and eventual chargeback

From the point of view of reducing unit costs, these steps can be done internally or organically, within your current organization, and within the current capitalization processes. At HDS we have demonstrated cost reduction with enterprise-class storage virtualization for many years. We have thousands of customers with many stories of improvement in utilization, tiered distribution, and faster migrations that all add up to a lower unit cost.

Extending these options to a pay-as-you-go model can extend savings even further. And if your situation (security, compliance, latency) allows, off-site location of IT resources can move the unit cost needle down even more. This graphic shows some of the OPEX, CAPEX, and other cost saving results that were achieved from a recent analysis I did for a large European client.


The top bar was business as usual, measured in cost per SAPS. The bottom bar was moving into a private cloud offering with a pay-as-you-consume OPEX model. The customer was interested to know what they could achieve on their own (the middle bar) by implementing these advanced architecture elements, but without going to a vendor-assisted consumption/private cloud model. There was certainly space in the middle for unit cost reduction, but in the end they decided to go to the private cloud due to its superior unit cost reduction.

Private, Hybrid and Public storage cloud offerings are becoming very popular these days. We see more interest in the pay-as-you-grow OPEX model for storage. Even if you are not completely ready for this extended reach, you need to understand that you too can use the same ingredients as cloud providers to produce superior unit cost reductions, right now in your current storage environment.

Cloud Economics - Redux

Posted by David Merrill Employee Oct 14, 2016

A few years ago, I posted a series of blogs (on an older HDS blog site) on the economics of cloud computing. Some of these posts are 2-3 years old, but hold some historical significance in defining and understanding how cloud costs behave over time. I will be rehydrating a few of these older posts, and then, with that foundation set, expand on current trends and costs associated with storage and compute PaaS in the cloud.




Cloud Storage Economics - Original Posting, June 2014



Hybrid Cloud Economics


Over the last 5-6 years, I have done extensive modeling and writing about the economics of storage (and VMs) in the cloud. Like most vendors, we have worked to find defensive positions against public cloud offerings. These offerings carry an extremely low consumption price, which has been attractive to many, despite the variety of technical and business concerns with public clouds. Most of my own economic work has been to define the total costs of cloud options, as compared to hosting infrastructure in your own data center. Not surprisingly, there is always a cross-over point where the sweet spot begins or ends for any technology.


Over the years, I have built several public and HDS-use models to compare cloud options for:

  • Tier 4 very low cost storage
  • Archive and backup storage
  • Very long-term retention
  • Cross-over points for private and public cloud architectures
  • And a few others that I have probably forgotten…


In the early days of cloud adoption, there was a significant shift in costs by moving to the cloud, with questionable results as to whether hard costs really went down. Cloud computing introduced net-new costs that have to be considered in TCO calculations (on-boarding, over-use penalties, latency risk, etc.). I have used economic methods and simulations to show where the cross-over point exists for a given technology. Some of these cloud economic cross-over points can be measured in terms of:

  • Time
  • Growth rate
  • Access rates, frequency
  • Overall capacity
  • Elasticity
  • Risk

The graph below is a simulation showing total cumulative costs of owning an object store solution (HDS HCP) vs. consuming object storage capacity in a cloud offering. You can see the cloud shows economic superiority after 4 years, primarily due to the lack of migration or remastering needed with the cloud offering.
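The shape of this kind of simulation can be sketched in a few lines. All cost inputs below are illustrative assumptions, not actual HCP or cloud-provider pricing, and the helper function is hypothetical:

```python
# Sketch: cumulative-cost simulation to locate the cloud-vs-owned
# cross-over point. All inputs are illustrative assumptions, not
# actual HCP or cloud-provider pricing.

def cumulative_costs(years, owned_opex_per_yr, migration_cost,
                     migration_interval, cloud_onboarding, cloud_per_yr):
    """Return year-by-year cumulative cost lists for owned vs. cloud."""
    owned, cloud = [], []
    owned_total, cloud_total = 0.0, float(cloud_onboarding)
    for y in range(1, years + 1):
        owned_total += owned_opex_per_yr
        if y % migration_interval == 0:
            owned_total += migration_cost  # periodic remastering/migration
        cloud_total += cloud_per_yr
        owned.append(owned_total)
        cloud.append(cloud_total)
    return owned, cloud

owned, cloud = cumulative_costs(years=10, owned_opex_per_yr=90_000,
                                migration_cost=200_000, migration_interval=4,
                                cloud_onboarding=60_000, cloud_per_yr=80_000)

# First year in which the cloud's cumulative cost drops below owned
crossover = next((y for y, (o, c) in enumerate(zip(owned, cloud), 1)
                  if c < o), None)
print("cross-over year:", crossover)
```

With these particular inputs the owned curve jumps at each migration cycle, so the cloud curve crosses below it in year 4; different growth, access or elasticity assumptions move that point in either direction.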



Given cloud shortcomings, there have always existed situations where storage in the cloud made sense from an operational and economic perspective. There are also many situations where it does not make sense.


Recently, the discussion around cloud and DIY has been less adversarial and more cooperative. More customers want the best of both worlds in having fast access to some data, with low cost and elasticity for other kinds of data. The management needs to be unified, using open standards and avoiding vendor lock-in. Simply put, the time is right for highly integrated hybrid clouds. My next few blogs will outline the economic efficiencies of hybrids, and how the economics change with these new options.





In my last few posts, I have shown some trends and cost areas of virtual machines (whether in a converged, hyper-converged or DIY mode). One overlooked area of server cost optimization is where we can virtualize physical servers for applications that cannot run on a commercial hypervisor. We often see customers that have made good progress reducing costs with hypervisors, but don't tackle the costs of the existing non-virtual Wintel hosts. My own internal observation is that at least 70-80% of Wintel or Linux workloads are virtualized, but organizations may be unaware of cost reduction opportunities for the remaining physical servers.


I was recently involved with a customer project that had 950 VMs running on 100 hosts; we are working with them to reduce the unit cost per VM from $477/month to $95/month. There are lots of opportunities to reduce costs associated with these 950 VMs as we introduce advanced management, storage systems, higher-density servers, etc.


After the first round of VM cost optimization, the customer asked what we could do to reduce the costs of the non-virtual servers. This client has another 750 non-virtual Wintel (and some Linux) servers; the full operational costs of these systems topped $10M/year. Using similar cost estimating techniques as the VM TCO effort, we were able to show that by moving these physical servers to LPARs, HDS could save them $7.2M/year in labor, power, floor space, maintenance and depreciation. These 750 hosts could be re-packaged into fewer than 210 blade servers.
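As a quick sanity check on those figures, the implied consolidation ratio and residual run cost fall out directly (rounded; the engagement's per-cost-category split is not reproduced here):

```python
# Worked check of the consolidation figures from the engagement above.
# Only the three headline numbers come from the source; the derived
# ratios are simple arithmetic, rounded for readability.

physical_servers = 750
annual_cost = 10_000_000          # full operational cost, $/yr
annual_savings = 7_200_000        # modeled LPAR savings, $/yr
target_blades = 210               # "fewer than 210" blade servers

consolidation_ratio = physical_servers / target_blades
residual_cost = annual_cost - annual_savings
cost_per_server_before = annual_cost / physical_servers
cost_per_server_after = residual_cost / physical_servers

print(f"consolidation ratio ~{consolidation_ratio:.1f}:1")
print(f"residual run cost  ${residual_cost:,}/yr")
print(f"per-server cost    ${cost_per_server_before:,.0f} -> "
      f"${cost_per_server_after:,.0f} per year")
```

In other words, roughly a 3.6:1 repackaging, with the annual run cost per original server dropping from about $13,300 to about $3,700.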


Multi-tenancy with Logical Partitions (LPARs) is a feature of Hitachi's Unified Compute Platform (UCP) that can accommodate hypervisor workloads and logical partitions on the same blade server and SAN storage architecture. This is the power of server and workload multi-tenancy: leveraging the same architecture for physical server consolidation along with hypervisors.

  • Provides better asset utilization, with fewer stranded assets (CPU, memory)
  • A single pane of management and provisioning, which results in better management/labor cost for different types of workloads
  • Unified management also produces faster provisioning times
  • Fewer dedicated or stranded systems, which reduces HW maintenance, purchase cost, floor space, power, etc.


Some Interesting Links to LPAR capabilities


LPAR is a physical virtualization technology created by HDS that enables multiple operating system environments to run on a single physical blade, without a hypervisor.


LPAR, Multi-tenancy Case studies…


When the IT department is looking for the next area of low-hanging cost-reduction opportunities, you might want to consider what can be done with a server virtualization effort for the remaining hosts and applications that cannot yet move to a hypervisor.

In my previous blog, I showed some common cost ratios (total cost) on an average-VM basis. It should not be a surprise, though, that there is no such thing as a standard or common VM, so the costs cannot be too generalized either. Getting to a unit cost by size is important before any VM cost remediation plan gets started.


When we do a VM TCO baseline exercise, we need to know some basics: the total number of VMs, total number of hosts, age of the assets, total storage, etc. Doing a macro-level VM TCO measurement is pretty straightforward to calculate. The trick comes in understanding and measuring the unit cost by VM size. To this end, we ask customers to run a VM script so that we can get some basic information about the sizes and quantities of all VMs in the estate. This chart is one of the summaries that comes from that script (and some Excel pivot tables).



What we learn from this size and quantity distribution:

  • How many of the VMs are small, medium, large, etc.
  • In this work, we assigned VM size by the number of vCPUs per VM (1, 2, 4, 8, 16, etc.). It is also practical to define VM size by memory, or even storage (less likely)
  • Then, by size, we also measure the GB of storage for each VM in that size class. In the above graph, you can see where large and x-large VMs have very large storage requirements


To calculate the total cost of the VM estate, we have the client choose from 24 possible options (outlined here). With the total estate costs understood, we can now assign weighted costs per VM. This weighting is usually a combined function of vCPU, memory and storage per VM. The results for the above example look like the following:



Given the sample chart above, we can now understand the cost structures, and eventual action plans, to reduce VM total costs:

  • X-small and small VMs shown above have a unit configuration that may favor a hyper-converged solution. But with very low unit costs for these sizes ($30-$60 per month), it might be hard to justify moving to a separate infrastructure just to reduce costs
  • X-large and large VMs have such a high cost structure that there may be some unique opportunities to look at SMP or other HW options to deliver infrastructure at a lower unit cost.
    • Perhaps the type of storage used for these sizes could be re-evaluated
    • We see scale-out systems like Hadoop and IoT falling into some of these categories, and the storage architecture tends to be the source of the high costs
    • Larger VM systems could possibly look at options like object stores, or different backup schemes, to reduce unit costs
  • In this case, the jump in unit cost between medium and large was due to the fact that there was just one medium size, but four different sizes in the large family. There could be a good case for reducing the number of VM sizes by offering a standard VM catalog.
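The weighted-cost allocation described above can be sketched in a few lines. The weight coefficients and the VM inventory below are hypothetical placeholders; the actual methodology draws on 24 measured cost elements:

```python
# Sketch: allocate a known total estate cost across VM size classes using
# a combined vCPU/memory/storage weighting. The weights, inventory and
# estate cost are hypothetical, not a real client baseline.

# (size, count, vCPU, memory_GB, storage_GB) per VM class
estate = [
    ("x-small", 300,  1,   2,   50),
    ("small",   250,  2,   4,  100),
    ("medium",  200,  4,  16,  250),
    ("large",   150,  8,  64, 1000),
    ("x-large",  50, 16, 128, 4000),
]
total_estate_cost = 2_000_000      # $/yr, from the TCO baseline

def weight(vcpu, mem, stor):
    """Relative weight per VM: a blend of its resource footprint."""
    return 1.0 * vcpu + 0.05 * mem + 0.002 * stor

total_weight = sum(n * weight(v, m, s) for _, n, v, m, s in estate)
for size, n, v, m, s in estate:
    per_vm = total_estate_cost * weight(v, m, s) / total_weight
    print(f"{size:8s} ${per_vm / 12:,.0f}/month per VM")
```

Even with made-up coefficients, the shape matches the discussion above: small VMs land in the tens of dollars per month, while large and x-large VMs carry most of the estate cost because of their storage footprint.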


As we undertake a VM cost improvement program (which is very popular these days, given very attractive cloud pricing), it is important not to make too many generalizations. Getting an atomic view of VM definitions, VM cost structures and the reasons behind the cost categories will enable architects to pinpoint the cost problems, and thereby pinpoint better solutions for continuous improvement.

VM TCO Observations

Posted by David Merrill Employee Sep 19, 2016

As I mentioned in my last blog, HDS has worked with a lot of clients over the last few years to help them identify and reduce the cost of Virtual Machines (VMs). Enough time and assessment 'aging' has passed to now be able to report on trends, macro-observations and cost ratios.


First, let's look at some aggregate rates of how VM costs tend to stack up:




Now, a couple of caveats on the pie chart above:

  • There are 24 cost elements in our VM TCO methodology. The above chart is an aggregation of the most common or popular cost areas
  • The size of VMs impacts the TCO, so what you are seeing is the average cost distribution of the average VM in a client environment. Costs do differ by size (larger VMs have more memory and storage, so the HW depreciation expense will tend to be higher)
  • The geographic location of VMs also matters, so locations where power or labor rates are higher will skew 'average' results
  • VM costs are also related to the workload. An Oracle VM will have a different cost profile compared to a VDI or test/dev VM
  • The age of the hardware has a big impact on total costs. Older systems, with a shrinking book value, will see inflated rates for maintenance, power and cooling (on a per-VM basis)


With the caveats out of the way, let's talk about the cost distribution a little:

  1. The number one cost (in terms of % of TCO) tends to be labor. VMs are still fairly labor-intensive, with effort related to:
    • Troubleshooting
    • Standardization, catalogs, custom builds
    • Patch management
    • Performance management, backup, restore
    • Configuration management
    • Workload migration
  2. The second highest cost is DR and data protection (backup) of VMs. This includes managing the cluster, snaps, replication and backup schemes
  3. Next are the software costs, which include the hypervisor, OS, management tools, etc. Some software can be depreciated; other software is licensed annually or with a usage utility.
  4. Hardware depreciation (separating storage, software and servers) is next, if you count each separately. All combined, depreciation expense tends to be about 25-30% of VM TCO. As assets age and book values approach zero, the depreciation costs will shift to maintenance costs
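The depreciation-to-maintenance shift in point 4 can be illustrated with a simple model: straight-line depreciation over three years, then post-warranty maintenance kicking in from year 4. The host price, maintenance rate, escalation and VM density below are all hypothetical placeholders:

```python
# Sketch: how per-VM hardware cost shifts from depreciation to maintenance
# as assets age. Purchase price, maintenance rates and VM density are
# hypothetical placeholders, not survey data.

host_price = 60_000        # $ per host
depreciation_years = 3     # straight-line, matching 3 yrs bundled maintenance
vms_per_host = 20

def annual_hw_cost(year):
    """Return (depreciation, maintenance) for one host in a given year."""
    if year <= depreciation_years:
        return host_price / depreciation_years, 0.0
    # post-warranty maintenance, escalating ~15%/yr as the asset ages
    maintenance = host_price * 0.12 * 1.15 ** (year - depreciation_years - 1)
    return 0.0, maintenance

for year in range(1, 7):
    dep, maint = annual_hw_cost(year)
    print(f"year {year}: depreciation ${dep / vms_per_host:,.0f}/VM, "
          f"maintenance ${maint / vms_per_host:,.0f}/VM")
```

The depreciation line goes to zero after year 3, and a quietly growing maintenance line takes its place, which is exactly the per-VM inflation on older hardware noted in the caveats above.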


Breaking from the order of the top 4-5 cost areas, here are my observations on the rest:

  • Maintenance costs (hardware and software). Most vendors provide 3 years of HW maintenance with blades and storage. After year 3, the HW maintenance cost tends to quietly and quickly increase. Software maintenance or license fees tend to emerge in the second year of ownership.
  • Provisioning time: the time a project has to wait for a VM to be presented. This may include purchase, engineering, asset allocation, configuration, etc.
  • Environmentals: data center floor space, power and cooling for servers, storage and network equipment
  • Engineering time to certify and test (non-converged) VM hardware stacks
  • Network: both top-of-rack equipment, as well as WAN, SAN, and IP networks for local and remote systems
  • Risk: usually related to scheduled or unscheduled outages


Honorable mentions, even if not included in the graph:

  • Cost of waste: when we run a tool to see CPU and memory utilization, it is not uncommon to find that some 10-15% of total VMs are dead (not active). These are wasting hardware, licensing and maintenance dollars
  • Cost of performance: if VMs are under-performing, they can be the cause of slow systems, lost revenue, customer dissatisfaction, etc.
  • Cost of growth: how much reserve is needed on hand for un-forecast growth, or the time and effort to procure more assets. VM sprawl puts a lot of pressure on the cost of growth and the cost of waste.
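To put a rough number on the cost of waste, here is a sketch combining the 10-15% dead-VM range above with the $477/month unit cost from the earlier engagement example; the combination is illustrative, not a measured result:

```python
# Sketch: rough annual waste from "dead" (inactive) VMs. The 10-15% dead
# rate is the observation above; the estate size and unit cost are taken
# from the earlier 950-VM engagement example, combined here illustratively.

total_vms = 950
avg_cost_per_vm_month = 477          # $/VM/month, from the earlier example
dead_fractions = (0.10, 0.15)        # typical range found by utilization tools

for frac in dead_fractions:
    wasted = total_vms * frac * avg_cost_per_vm_month * 12
    print(f"{frac:.0%} dead -> ~${wasted:,.0f}/yr wasted")
```

Even at the low end of the range, that is over half a million dollars a year tied up in hardware, licensing and maintenance for VMs doing nothing.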


Another key observation is the relative TCO results by VM generation. When we work with a customer to see how they have deployed systems, we know what to expect in terms of overall costs. DIY VM systems usually carry the highest overall rate. First-generation CI systems (FlexPod, VCE) are next, with second-generation CI (UCP) at the next best (lower) rate. Advanced orchestration and provisioning tools (on top of the converged platforms) tend to provide the best (lowest) overall TCO.



In my next entry, I will talk about the process that we use to create an arm's-length VM TCO baseline for a customer environment. With a good baseline defined, IT architects and operations staff are then able to set tactical and strategic plans to reduce the unit costs of Virtual Machines. Every IT shop is different in its cost sensitivities, VM sizes and quantities, and historical deployments. We do baselines not to compare to others, but to help with an individual cost improvement (continuous improvement) program.