When flash storage technologies first came to market, data reduction technologies (deduplication and compression) were a critical feature. The math was simple: if you were going to buy a faster but more expensive storage medium, the only way to rationalize it was to maximize the effective storage capacity and lower the cost per gigabyte. Data reduction justified the raw capacity costs of flash when those costs were dramatically higher.
Enterprises are not moving to flash for the sake of data reduction; they are buying flash because of performance. It’s all about speed. Data reduction is critical to the cost model for flash storage, but it is a means, not an end. The business value of flash is performance, and the ultimate goal is to maximize that performance against cost.
The argument for data reduction goes something like this: if you could triple the effective capacity of flash, the cost per gigabyte would get closer to the price of spinning disk. Since flash is faster than spinning disk, you would get better performance for the same price. So why wouldn’t you just buy an all-flash storage array?
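The cost argument above is simple arithmetic: the effective cost per gigabyte is the raw cost divided by the data reduction ratio. A minimal sketch, with hypothetical prices chosen purely for illustration (real $/GB figures vary by vendor and year):

```python
# Hypothetical prices, for illustration only -- not real market figures.
FLASH_RAW_COST_PER_GB = 0.60   # assumed raw all-flash price, $/GB
DISK_RAW_COST_PER_GB = 0.25    # assumed spinning-disk price, $/GB

def effective_cost_per_gb(raw_cost_per_gb, reduction_ratio):
    """Effective $/GB after data reduction (e.g. 3.0 means 3:1)."""
    return raw_cost_per_gb / reduction_ratio

# At a 3:1 reduction ratio, flash's effective $/GB closes most of the
# gap to raw disk -- the premise of the argument above.
flash_effective = effective_cost_per_gb(FLASH_RAW_COST_PER_GB, 3.0)
print(f"Flash at 3:1: ${flash_effective:.2f}/GB "
      f"vs disk at ${DISK_RAW_COST_PER_GB:.2f}/GB")
```

Of course, the achieved reduction ratio depends entirely on the workload, which is why the ratio is a parameter here rather than a constant.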
From a pure cost-of-storage perspective this makes a lot of sense. But there’s a catch: data reduction technologies typically carry a performance cost on both reads and writes. There is no free lunch here.
For most data reduction technologies, whether you deduplicate inline or post-process, you have to offload the write processing of the data and/or have a caching mechanism that maximizes throughput. Some would argue that with deduplication you are writing less data, so writing deduplicated data is actually faster than writing non-deduplicated data. But this is entirely dependent on the speed of the deduplication process itself: if deduplication is the bottleneck, the faster write performance will not be reflected in actual system performance. Again, the goal is performance against cost, not deduplication in and of itself.
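To make the write-path trade-off concrete, here is a minimal sketch of inline deduplication; the class, chunk size, and in-memory stores are invented for illustration and do not reflect any vendor's implementation. Every write must compute a fingerprint before anything is stored, which is exactly where the bottleneck described above can appear:

```python
import hashlib

class InlineDedupStore:
    """Illustrative inline deduplication: chunks are fingerprinted on the
    write path and only previously unseen chunks are physically stored."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}   # fingerprint -> chunk bytes ("physical" store)
        self.files = {}    # file name -> ordered fingerprint list (metadata)

    def write(self, name, data):
        refs = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            # The inline fingerprint computation: every write pays this
            # CPU cost, whether or not the chunk turns out to be a duplicate.
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk   # store only new chunks
            refs.append(digest)
        self.files[name] = refs

store = InlineDedupStore()
store.write("a", b"x" * 8192)
store.write("b", b"x" * 8192)   # identical data: no new chunks stored
print(len(store.chunks))        # prints 1 -- one unique chunk for both files
```

Duplicate data skips the physical write entirely, but the hashing happens on every chunk regardless; if that step cannot keep up with the incoming stream, the "writing less data" advantage never materializes.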
With flash, reads can be naturally optimized whether the data is deduplicated or not, because flash pays no seek penalty for randomly placed data. But this is where the distinctions among data reduction technologies are important to understand. Reconstituting deduplicated data requires additional CPU resources and adds some latency. Some might argue that the read penalty is a wash because you are reading less data, but there is still a penalty. It’s not enough that the impact is a wash; the system has to deliver maximum performance.
Finally, how and when you use data reduction should be an option. Not all data types deduplicate well, and those workloads can grow metadata tables quite large, effectively reducing the benefit of data reduction. You should be able to turn data reduction on when the cost benefit outweighs the performance detriment, and turn it off when it does not. For low-latency applications and high-transaction environments in particular, the ability to toggle deduplication is imperative.
Once again, the goal in implementing flash technologies is performance against cost. Data reduction technologies can help lower cost, but the critical feature is performance. Ideally, enterprises are looking for the lowest cost with the highest performance. As the price of flash continues to drop, data reduction by itself is no longer the defining criterion for flash storage; the defining criteria are quickly becoming performance, cost and business value.