Storage Efficiency Trade-offs in an All-Flash World
In storage, as in life, you have to give something to get something. So when a storage vendor tells you they can provide some amazing benefit, it raises the question: what, dear sir, will you be asking me to give up in return?
With all-flash arrays this is an especially important consideration when the promise is increased capacity and a smaller data center footprint – all while achieving breakneck performance at sub-millisecond latency. Where’s the ‘give’ for all you get?
To take a deeper look at this topic I sat down with Patrick Allaire, one of our chief flash technology junkies.
Patrick, what’s the most common misconception you hear about storage efficiency and flash?
That you get every flash benefit, all the time. When I talk to customers, they assume every all-flash array will deliver dramatic data reduction at a million IOPS and sub-millisecond latency, all at once.
Some all-flash vendors are getting better about telling customers they may not always achieve a mythical 5:1 capacity increase, but they still aren’t explaining how efficiency technologies work or how those technologies can impact performance.
When you say efficiency impacts performance do you mean IOPS? Latency? Bandwidth?
All of them. When an array performs dedupe and compression, its CPUs have to crunch the data to reduce it. That takes time and system resources – not just to reduce the data, but also to rebuild it when it’s read back.
For a test workload you may not see issues, but with real data sets the CPUs and memory can become saturated quickly. At even 50% CPU saturation, IOPS drop off and latency goes up. You may achieve good data reduction, but good data reduction plus high IOPS plus low latency? Most architectures will see big problems.
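That saturation effect can be sketched with basic queueing math. The snippet below is a toy illustration using the textbook M/M/1 response-time formula – a huge simplification of a real array controller, not vendor data – but it shows why latency climbs steeply long before the CPUs hit 100%:

```python
def latency_multiplier(utilization: float) -> float:
    """M/M/1 queueing: mean response time scales as 1 / (1 - rho),
    where rho is the fraction of CPU capacity in use.
    A toy model -- real arrays are far more complex."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1 / (1 - utilization)

# Even at 50% saturation, mean latency has already doubled.
for rho in (0.5, 0.8, 0.9):
    print(f"rho={rho:.0%}: latency x{latency_multiplier(rho):.0f}")
```

At 50% utilization latency has doubled; at 90% it is ten times the unloaded figure, which is why inline data reduction can quietly eat the sub-millisecond promise.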
Why do you attribute all-flash slowdowns to the architecture?
Because the number of IOPS an array can deliver is driven by its CPUs and memory. Yes, other things can affect performance, but with inline data reduction it’s usually this part of the system architecture that becomes the bottleneck.
We’ve heard that some leading vendors are only getting ~8000 IOPS per CPU core. So unless you put a ton of processors and memory in the controller – which is expensive – or use a distributed architecture, the array will slow down.
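To put that ~8,000 IOPS-per-core figure in perspective, here is a back-of-the-envelope calculation. The per-core number comes from the interview; the linear-scaling assumption is ours, and it is optimistic – real controllers scale sub-linearly:

```python
import math

def cores_needed(target_iops: int, iops_per_core: int = 8_000) -> int:
    """Cores required to reach a target IOPS level with inline data
    reduction, assuming (optimistically) perfectly linear scaling."""
    return math.ceil(target_iops / iops_per_core)

print(cores_needed(1_000_000))  # 125 cores for a million IOPS
```

A million IOPS at 8,000 IOPS per core works out to 125 cores – exactly the “ton of processors and memory in the controller” Patrick is talking about.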
That takes us back to vendors not stating how their efficiency tech works.
Yes. A rep may tell you how compression and dedupe reduce data, but they often don’t tell you how those features use system resources, or at what load you can expect IOPS or response times to drop off. They also don’t tell you how much capacity their storage efficiency technologies themselves consume.
You need to understand how they handle metadata and what block size they use – both for the application and for the all-flash array’s data reduction. Both influence your performance and the extra capacity you will gain from data reduction.
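The block size and metadata interaction is easy to see with a toy fixed-block dedupe model. Everything here is hypothetical – the data set, the helper name, and the fixed-block scheme itself (real arrays use far more sophisticated chunking) – but the trade-off it shows is real: smaller blocks find more duplicates, while every unique block costs a metadata entry the system must store and look up:

```python
import hashlib

PAGE = 4096  # 4 KiB

def dedupe_stats(data: bytes, block_size: int):
    """Fixed-block dedupe: hash each block, keep one copy per unique hash.
    Returns (reduction_ratio, metadata_entries)."""
    unique = {hashlib.sha256(data[i:i + block_size]).digest()
              for i in range(0, len(data), block_size)}
    return len(data) / (len(unique) * block_size), len(unique)

# Hypothetical 64 KiB data set: several duplicate 4 KiB pages,
# but every 16 KiB window is distinct.
pages = [0, 1, 2, 0, 3, 1, 4, 5, 2, 6, 3, 7, 8, 4, 9, 5]
data = b"".join(bytes([p]) * PAGE for p in pages)

for bs in (PAGE, 4 * PAGE):
    ratio, entries = dedupe_stats(data, bs)
    print(f"{bs // 1024:>2} KiB blocks: {ratio:.2f}:1 reduction, "
          f"{entries} metadata entries")
```

With 4 KiB blocks this data set dedupes at 1.6:1 but needs 10 metadata entries; with 16 KiB blocks it gets no reduction at all. Matching the array’s block size to the application’s I/O pattern matters for both numbers the vendor quotes.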
So if efficiency can impact performance what should a customer do?
One, they can look for products with selectable services. If you need to balance performance against maximum storage, being able to turn services off helps: you can choose what you need – best speed or best efficiency. But if you choose speed, don’t buy on effective price! Don’t let someone tell you that you’ll get 2x or 5x more data stored!
The other is to distribute services. By moving data reduction services to the flash devices themselves, you remove the load from the controller and spread it across many devices. HDS does this: we run compression on our custom flash module (FMD) to distribute the load. This also lets us take advantage of dedicated CPU and memory resources, as well as hardware acceleration, to prevent slowdowns.
To compare, look at how many Intel cores are needed to deliver the same compression throughput as a rack full of Hitachi Accelerated Flash FMD DC2 (based on widely adopted storage compression algorithms and public performance results). In a world where you do everything on the controller, you only go as fast as the commodity server – and that can be very inefficient.
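Patrick’s warning about buying on “effective price” is worth making concrete. “Effective $/GB” is simply the raw $/GB divided by whatever data reduction ratio you assume, so a missed ratio inflates your real cost. The dollar figure below is made up; the 5:1 and 2:1 ratios echo the ones discussed above:

```python
def effective_price(raw_price_per_gb: float, reduction_ratio: float) -> float:
    """Effective $/GB = raw $/GB divided by the achieved data reduction."""
    return raw_price_per_gb / reduction_ratio

raw = 3.00  # hypothetical raw $/GB
quoted = effective_price(raw, 5.0)  # vendor assumes 5:1 -> $0.60/GB
actual = effective_price(raw, 2.0)  # workload achieves 2:1 -> $1.50/GB
print(f"quoted ${quoted:.2f}/GB vs actual ${actual:.2f}/GB "
      f"({actual / quoted:.1f}x over the quote)")
```

If the quoted 5:1 turns out to be 2:1 on your workload, your real cost per gigabyte is 2.5x what the quote implied.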
I know we’ve only scratched the surface, but any other key takeaways?
When talking to customers, I tell them the all-flash market is similar to the car market. A salesperson sells on one set of statistics (IOPS, effective $/GB), but once you own the car you may care about other things (response time and bandwidth). Make sure you look at all the factors and don’t buy on specs alone. Some key questions to ask:
- How many IOPS / how much bandwidth can an array deliver when data reduction is running?
- What latency can an array deliver when efficiency technologies are running?
- How does block size affect performance and how much benefit you get from data reduction?
- What “burn-in” period should you use to validate ongoing performance versus peak performance?