This is the first of a series of animations/presentations I'm working on...
Greg Knieriemen great to see the materials getting out. I think that the message is pretty compelling. I'd like to add that the other thing we do is that we deduplicate the deduplication index/database... I think that this last part is pretty darn cool and speaks to our concern about returning as much capacity to the user as possible.
Moreover HDS licenses our offerings on used and not raw capacity which is testament to our confidence in the various efficiency features we put into our products. Some of these features include:
I could continue on, but I hope that we all get the point. I think that the fundamental challenge we'll all have to cope with, though, is the Jevons paradox.
Great video Greg Knieriemen. Thanks for sharing.
Agreed, great video. Thanks for sharing!
Great format, easy to understand and digest. Look forward to stalking you on Twitter as well.
Nice One Greg.
I watched the presentation and it was good, but a couple of things just stood out.
- There was really nothing about primary block deduplication. Other vendors offer this. I know the presentation is HNAS centric but maybe this is something to think about?
- The performance section only covers the process of deduplicating the data. The primary reason deduplication isn't recommended by the other vendors relates more to access time considerations for deduped data. This section of the presentation doesn't cover that.
Cris, there really isn't a performance penalty on reads due to how the file system works. Essentially there is no re-hydration for the read case.
As to the point on block, HNAS offers CIFS, NFS, iSCSI, etc. Further, I was under the impression that block arrays offering "primary de-duplication" come with a lot of conditions for actual usage. Can you confirm?
Michael, the only way there wouldn't be a performance penalty would be if all the unique blocks/chunks (whatever the unit is) and the corresponding lookup tables were stored in some higher tier of memory. If a unique block/chunk was on disk and had a high rate of access (which resulted in seeking), there would be an impact. This is where (historically) PAM gave NetApp arrays a performance benefit in deduplicated environments. That benefit, however, was affected by how skewed the requests were: if the PAM cache wasn't large enough, disk access would still be required (essentially a miss). For NetApp arrays without read cache this was very problematic, so it was important to get all the ratios right.
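To put rough numbers on the hit/miss point above, here is a minimal back-of-the-envelope sketch. The function name and the latency figures (0.1 ms for a cache hit, 8 ms for a disk read) are assumptions for illustration only, not measurements from any NetApp or HDS system.

```python
def effective_read_latency_ms(hit_ratio, cache_ms, disk_ms):
    """Average read latency when a fraction `hit_ratio` of reads is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Hits pay the cache latency, misses pay the disk latency.
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


# Hypothetical latencies, chosen only to show the shape of the curve.
CACHE_MS = 0.1
DISK_MS = 8.0

for ratio in (0.99, 0.90, 0.50):
    avg = effective_read_latency_ms(ratio, CACHE_MS, DISK_MS)
    print(f"hit ratio {ratio:.2f}: {avg:.3f} ms average read")
```

With these made-up figures, dropping from a 99% to a 90% hit ratio moves the average read from roughly 0.18 ms to roughly 0.89 ms, which is why the cache sizing ratios mattered so much.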
I suspect the platform (HNAS) would be intelligent enough to keep the lookup tables in memory and most of the frequently used blocks/chunks too. A file system adds a layer of intelligence with regard to access frequency and upstream caching efficiencies, so there is goodness here.
There are general restrictions (guidelines) on this feature's use depending on the platform or the data set that is deduped. This is also true for other space efficiency technologies such as compression. I know I keep saying this, but nothing is free in a closed system.
Actually not how it works, and we still get the benefit of no performance penalty on reads.
Essentially, writes are chunked inline through an FPGA. A background process wakes up when I/O load is low, iterates through the new chunks, computes a hash for each, and consolidates the matching chunks/blocks.
Since chunks = file system blocks, the read case is really about traversing the object nodes to reconstruct the complete file. (For background, HNAS uses an object-based file system where objects can point to multiple blocks.) A read is the same read whether one or more file system objects reference the same block, so there effectively isn't any need to rehydrate.
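For anyone who prefers code to prose, the flow described above can be sketched roughly as follows. This is a toy model in Python and not the HNAS implementation: the `BlockStore` class, its method names, the 4-byte block size, and the use of SHA-256 are all assumptions made for illustration, and each file is assumed to be written only once.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical tiny block size so the example is easy to follow


class BlockStore:
    """Toy model: a file is a list of block ids; duplicate blocks get shared."""

    def __init__(self):
        self.blocks = {}      # block id -> bytes
        self.refcount = {}    # block id -> number of references from files
        self.files = {}       # file name -> ordered list of block ids
        self.index = {}       # content hash -> canonical block id
        self.new_blocks = []  # block ids written since the last dedupe pass
        self.next_id = 0

    def write(self, name, data):
        # Write path: only chunk into blocks; no hashing happens here.
        ids = []
        for off in range(0, len(data), BLOCK_SIZE):
            bid = self.next_id
            self.next_id += 1
            self.blocks[bid] = data[off:off + BLOCK_SIZE]
            self.refcount[bid] = 1
            self.new_blocks.append(bid)
            ids.append(bid)
        self.files[name] = ids

    def dedupe_pass(self):
        # Background pass: hash new blocks and repoint duplicates at one copy.
        remap = {}
        for bid in self.new_blocks:
            digest = hashlib.sha256(self.blocks[bid]).digest()
            canonical = self.index.get(digest)
            if canonical is not None and self.blocks[canonical] == self.blocks[bid]:
                remap[bid] = canonical
            else:
                self.index[digest] = bid
        for name in list(self.files):
            self.files[name] = [remap.get(b, b) for b in self.files[name]]
        for dup, canonical in remap.items():
            self.refcount[canonical] += self.refcount.pop(dup)
            del self.blocks[dup]
        self.new_blocks = []

    def read(self, name):
        # Read path: follow the block pointers; identical before and after dedupe.
        return b"".join(self.blocks[b] for b in self.files[name])
```

Note that `read` never consults the hash index. It only follows block pointers, so a shared block is read exactly like an unshared one, which is the "no rehydration" point in this simplified model.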
More detail is available in my blog post here: http://bit.ly/16BuL09.
I haven't read this post, thanks. It's clearly not how NetApp and EMC work.
It explains how you get around the need to rehydrate on retrieval (which is cool) and avoid the issues commonly associated with re-hitting the same block (also cool). I'll need to read more about it (I don't do BlueArc) and think about it some more.
Also check out HNAS-HUS FM Primary Deduplication FAQ (072413) and HNAS Deduplication Best Practices Guide.pdf and HNAS Deduplication Estimator Tool Quick Start Guide (revised: 021513).pdf. They may help and I've pinged the engineering team to see if they can weigh in as well.
Michael - those links are in the SSP space, to which only HDS employees and SSP partners have access, despite some of those being non-restricted documents.
Jeff - something to think about - public documents only available in the Community in restricted spaces. Hmmm....
DOH! You're right we need to push all unrestricted docs into the more public forums.
I'll have a read during the week.
Any idea where non-HDS employees can access these files please?
You don't need to work for HDS, you just need SSP partner access. I've looked at the documents, I don't think there is anything that is deeply propitiatory in them. It looks like Michael Hay has already requested they are pushed to something less restrictive. I think you just need to wait.
I hope "propitiatory" there was a typo for "proprietary".
Well-written blog post Michael, thanks for the link. I'll be using this info in my future HNAS discussions!
Happy to oblige.
Most happy to help and provide some insightful nuggets.
In the latest draft of ISO/IEC 27040 (Information technology - Security techniques - Storage security) the topic of data reduction technologies (compression and deduplication) comes up. The primary concern is centered around encryption; in a nutshell, you want to invoke deduplication before you encrypt because the encryption completely randomizes the data, thereby eliminating the potential benefit. The other concern is that certain types of deduplication (and compression) can wreak havoc on your disaster recovery/business continuity solutions (i.e., assume the deduplication engine is taken out in a "smoking crater" event).
Is this something worth noting in our materials? When 27040 is published as an International Standard in about a year, we'll probably get questions about this.
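The ordering concern (deduplicate before you encrypt) can be illustrated with a small sketch. Everything here is an assumption for illustration: `toy_encrypt` is a hypothetical stand-in built from a SHA-256 keystream, not a real or secure cipher, and the sketch assumes each block is encrypted with its own random nonce.

```python
import hashlib
import os


def toy_encrypt(key, nonce, block):
    """Stand-in for a real cipher: XOR with a SHA-256 derived keystream. NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(block):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    # zip() stops at the end of the block, so any extra keystream is ignored.
    return bytes(a ^ b for a, b in zip(block, stream))


def blocks_after_dedupe(blocks):
    """Number of blocks a hash-based deduplicator would need to keep."""
    return len({hashlib.sha256(b).digest() for b in blocks})


key = os.urandom(32)

# 16 plaintext blocks, but only 2 distinct contents.
plain = [b"A" * 4096] * 8 + [b"B" * 4096] * 8

# Encrypt first: every block gets a fresh random nonce, so equal plaintext
# blocks no longer look equal to the deduplicator.
cipher = [toy_encrypt(key, os.urandom(16), block) for block in plain]

print("dedupe on plaintext keeps ", blocks_after_dedupe(plain), "blocks")
print("dedupe on ciphertext keeps", blocks_after_dedupe(cipher), "blocks")
```

In this toy setup the plaintext dedupes down to 2 blocks, while the ciphertext blocks all differ, so nothing is saved if encryption runs first.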
Lovely video. It would have been nice to have HNAS in the title; for a second I thought you guys had dedupe at the block level ... small hopes