We’ve all heard the cliché, “If it sounds too good to be true, it probably is.” In many situations that turns out to be the case, but I’m not writing this blog to throw shade at data reduction. Actually, data reduction is an amalgamation of super useful technologies, like compression and deduplication, that yield additional “effective” storage capacity out of your existing usable capacity. This proposition is especially attractive when it comes to all-flash storage because data reduction can significantly lower your cost per GB. But there are questions you should be asking, and that’s what I want to call out. As a starting point, it’s important to understand the definitions used in data reduction marketing jargon, so here’s a primer.
- Data Reduction – Compression and deduplication technologies
- Total Efficiency – Compression, deduplication, snapshots, thin provisioning
- Raw Capacity – Total disk space available to the array
- Physical Capacity – Capacity available after formatting the media
- Usable Capacity – Physical capacity available after factoring in RAID data protection overhead and spare drive capacity
- Effective Capacity – Usable capacity available after deduplication and compression are applied
- Free Capacity – Unassigned space in a volume group or disk pool that can be used to create volumes
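To see how these definitions chain together, here’s a toy walk-through in Python. All the numbers are made up for illustration (100 TB raw, ~7% formatting overhead, RAID 6 in a 6+2 layout, some spare capacity, and an assumed 2.5:1 reduction ratio); the actual overheads vary by array and vendor, so treat this as arithmetic, not a sizing tool.

```python
# Hypothetical walk from raw capacity down to effective capacity.
raw_tb = 100.0
physical_tb = raw_tb * 0.93              # ~7% lost to media formatting (assumed)
usable_tb = physical_tb * (6 / 8) - 1.2  # RAID 6 (6+2) parity, minus spare capacity
reduction_ratio = 2.5                    # assumed compression + dedup ratio
effective_tb = usable_tb * reduction_ratio

print(f"usable: {usable_tb:.2f} TB, effective: {effective_tb:.2f} TB")
```

Notice that the marketing-friendly “effective” number is the last step of the chain, so it inherits every assumption above it, especially the reduction ratio.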
So you may be thinking: if compression and deduplication are the basis for data reduction, is one vendor’s compression and deduplication better than another’s? Flash vendors use very similar compression and deduplication technology. For example, most vendors rely on variants of the proven LZ77 compression algorithm. Deduplication schemes don’t vary much either, as the premise is the same: identify patterns and eliminate duplicate copies of data, inserting a pointer to the original data wherever a duplicate exists. So this leads to the question, “Won’t all vendors’ results be the same if they all use the same compression and deduplication technology for data reduction?” The answer is yes; if not exactly the same, the results will be very similar.
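The dedup premise described above can be sketched in a few lines of Python: hash each block, store unique content only once, and keep a pointer per logical block. Real arrays do this on fixed- or variable-size blocks with far more engineering, so this is only an illustration of the idea, not any vendor’s implementation.

```python
import hashlib

def dedupe(blocks):
    """Store each unique block once; duplicates become pointers (hashes)."""
    store = {}     # content hash -> the one stored copy of that block
    pointers = []  # one entry per logical block, pointing into the store
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep the first copy only
        pointers.append(digest)          # duplicates just reuse the pointer
    return store, pointers

blocks = [b"alpha", b"beta", b"alpha", b"alpha", b"gamma"]
store, pointers = dedupe(blocks)
print(f"logical blocks: {len(pointers)}, unique blocks stored: {len(store)}")
```

On this toy data, five logical blocks reduce to three stored blocks, and that ratio is entirely a property of the data, which is exactly why identical algorithms still produce different results in different shops.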
4 key considerations when evaluating data reduction:
24:1 is the most, baby! But this “bug” isn’t going to win any races…
The first is performance, or overall system IOPS. If you have already settled on an all-flash solution, there’s a good chance the reason was to deliver better performance and quality of service to your customers and the business. Regardless of what vendors claim, compression and deduplication can adversely affect system performance because those data reduction operations need to be handled either in silicon or in software. This is called the “data-reduction tax.” How big a “tax” you pay is directly related to how efficiently the vendor has implemented the solution. Moreover, everyone’s environment and use case is different, so results can vary widely regardless of vendor claims and guarantees.
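You can get a feel for the “tax” with a toy experiment: time a write path that stores blocks as-is versus one that runs each block through zlib first. The absolute numbers are meaningless (a real array does this in purpose-built hardware or a tuned software stack), but the gap illustrates why inline reduction has to cost cycles somewhere.

```python
import os
import time
import zlib

def write_path(block, compress):
    """Toy write path: optionally compress the block before 'persisting' it."""
    payload = zlib.compress(block) if compress else block
    return len(payload)

block = os.urandom(64 * 1024)  # one 64 KiB block of incompressible data

timings = {}
for compress in (False, True):
    start = time.perf_counter()
    for _ in range(200):
        write_path(block, compress)
    timings[compress] = time.perf_counter() - start

print(f"without reduction: {timings[False]*1000:.2f} ms, "
      f"with reduction: {timings[True]*1000:.2f} ms for 200 writes")
```

The efficiency of the implementation determines how small that gap is; it never determines whether the gap exists.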
The second big-ticket item to consider is the type of data you are going to be reducing. The fact of the matter is that some data types compress very well and others don’t at all. For example, remember the first time you tried to “zip” a PowerPoint presentation because the darn email system wouldn’t allow attachments over 10MB? After swearing and cussing at Outlook, you thought that zipping that big honkin’ presentation would be the answer. Then you learned that zipping that .ppt got you nothing! That was a rookie move. What you should know is that databases and VDI data compress very well. Audio, video, and encrypted data don’t compress well at all, so there would be very little data reduction benefit on those data types. The point I am making is that your data reduction benefit is going to vary directly with your data type.
The third item I’m going to ask you to consider is whether your vendor gives you a choice to configure your storage volumes with or without data reduction. If the answer is “no,” you should be concerned, and here’s why. In a “data reduction is always on” scenario, you don’t have the ability to balance data reduction against performance. This may be fine if your application can tolerate the latency inherent in an always-on scenario, but in most cases all-flash arrays are purchased to break performance barriers, not introduce them. I must point out that with the new Hitachi VSP Series and Storage Virtualization Operating System RF, the user has a choice to balance flash performance and efficiency right down to the LUN level. The result is a bespoke balance of IOPS and data efficiency tuned for each individual environment.
The fourth is probably the most important attribute, and one that few vendors are willing to discuss: is the data reduction guarantee or claim backed up by an availability guarantee? Did you know that availability is the number one selection criterion when purchasing an all-flash array? What good is a 7:1 efficiency ratio if you can’t get to the data? Hitachi Vantara stood out from the crowd by being the first to offer a 100% Data Availability Guarantee alongside its Total Efficiency Guarantee.
So in closing, here’s what I suggest when evaluating data reduction claims. Don’t be fooled by “my number is bigger than your number” claims. The results you will see are highly dependent on your data and workload. Work with the vendor to assess your environment with a sizing tool that provides a realistic expectation of results. Consider that you may not want to run compression and deduplication on certain workloads in order to maximize performance, so you will want the choice to turn data reduction on or off on different volumes within the same array. Also, beware of any vendor that promises you maximum flash performance with the highest data reduction ratios, because if it sounds too good to be true, it probably is.
For More Info on Hitachi Vantara Investment Protection and Total Data Efficiency:
Hitachi Vantara Up to 4:1 Total Efficiency Guarantee
Hitachi Vantara 100% Data Availability Guarantee
Hitachi Vantara Flash Assurance Program
You can read more great blogs in the Data Center Modernization Series here:
Nathan Muffin's blog – Data Center Modernization
Mark Adams's blog - Infrastructure Agility for Data Center Modernization
Summer Matheson's blog - Bundles are Better
Paula Phipps' blog - The Super Powers of DevOps to Transform Business
Richard Jew's blog - AI Operations for the Modern Data Center