Recently we received an inquiry from the field regarding HDI stubbing and the presence of the ‘x’ icon (in Windows Explorer). The specific issue at hand was “why does the ‘x’ icon remain even after I have recalled the file?” The issue is one of “optimization”.
Let us explain:
When HDI stubs a file it marks it internally with the ‘S’ bit (as displayed by xfs_wcheck). When a Windows user browses with Explorer, the ‘S’ bit is displayed as the ‘x’ icon. When a client requests data from a stubbed file, HDI optimizes the recall in two ways. First, it figures out what portion of the file you want to read. Then it performs a byte offset GET from HCP and reads back a 1MB chunk that includes the bytes your request needs. The 1MB chunk takes advantage of the fact that we can stream data back from HCP much more efficiently if we read in bigger chunks (“read ahead”).
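To make the arithmetic concrete, here is a minimal sketch of how a read request could be mapped to a chunk-sized byte range. It is an illustration only, not HDI's actual code: the names chunk_range and range_header are hypothetical, and we assume that chunks are aligned to 1MB boundaries and that a read crossing a boundary pulls in every chunk it touches. The “bytes=first-last” form is the standard HTTP Range syntax.

```python
CHUNK_SIZE = 1024 * 1024  # the 1MB recall unit described above


def chunk_range(offset, length, file_size, chunk_size=CHUNK_SIZE):
    """Return the inclusive (first_byte, last_byte) range to fetch for a read.

    Assumption: chunks are aligned to chunk_size boundaries.
    """
    if offset < 0 or length <= 0 or offset >= file_size:
        raise ValueError("read is outside the file")
    # Round the start of the read down to the beginning of its chunk.
    first = (offset // chunk_size) * chunk_size
    # Last byte the client actually asked for, clipped to the end of the file.
    last_needed = min(offset + length, file_size) - 1
    # Round up to the end of that chunk, but never past the end of the file.
    last = min((last_needed // chunk_size + 1) * chunk_size, file_size) - 1
    return first, last


def range_header(first, last):
    """Format an inclusive byte range in standard HTTP Range syntax."""
    return f"bytes={first}-{last}"
```

For example, a 4KB read at offset 1,500,000 of a 10MB file maps to the second 1MB chunk, so the whole chunk is fetched rather than just the 4KB the client asked for.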
What happens if my file is larger than 1MB? Does HDI recall the entire file? It does not. And for good reason: it has no idea if you want to read the entire file or not. Furthermore, it is possible that you open the file, seek to the middle or to the end, and start reading there. HDI optimizes the read only for the data that you are interested in. For very large files, if HDI were NOT doing this the performance consequences could be severe. The result of this optimization is that stubbed files can be in a semi-recalled state, with some portions of the file vacant while others have filled chunks. As long as the file is in this state HDI will keep the ‘S’ bit on. ‘S’ bit on == ‘x’ icon on.
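The semi-recalled state can be pictured with a small toy model. This is a sketch under our own assumptions (the StubbedFile class and its fields are hypothetical, and the chunks are assumed to be 1MB-aligned); it only mimics the behavior described above, where a read recalls the chunks it touches and nothing else.

```python
CHUNK_SIZE = 1024 * 1024  # the 1MB recall unit


class StubbedFile:
    """Toy model of a stubbed file; not HDI's internal data structure."""

    def __init__(self, size):
        self.size = size
        self.s_bit = True      # stub marker, shown as the 'x' icon in Explorer
        self.recalled = set()  # indexes of the chunks already recalled

    def total_chunks(self):
        # Ceiling division: a partial last chunk still counts as a chunk.
        return -(-self.size // CHUNK_SIZE)

    def read(self, offset, length):
        """Recall only the chunks overlapping the read; return the new ones."""
        end = min(offset + length, self.size)
        if offset < 0 or offset >= end:
            return []
        wanted = range(offset // CHUNK_SIZE, (end - 1) // CHUNK_SIZE + 1)
        new = [i for i in wanted if i not in self.recalled]
        self.recalled.update(new)
        # Note that a read never touches s_bit.
        return new
```

Seeking to the middle of a 5MB file and reading 100 bytes recalls one chunk out of five. The other four stay vacant and the ‘S’ bit stays on.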
What happens if the file is small enough to fit in a single 1MB chunk? When the file is accessed the chunk is recalled and the user IO request is satisfied, but the ‘S’ bit still remains on. The next time the same file is read, the IO is satisfied directly from cache. No recall is necessary. The ‘S’ bit remains on because HDI doesn’t check to see if it has recalled all the chunks for a file. If it were to do so it would essentially have to do this on EVERY file access, regardless of file size, creating a lot of overhead. All of this just to turn off the ‘S’ bit. There is no harm in leaving the bit set: all the client needs to know is that any read for any offset of the file will be satisfied. If the client wants to know how much of the file is cached/recalled there are easy ways to ascertain this. On Windows you can inspect the property “Size on disk”. On Linux you can display the allocated size of the file using “ls -s”.
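The same check can be scripted on a Linux client. The sketch below compares the logical size of a file with the space allocated for it, which is the number “ls -s” reports. The helper names are our own, and st_blocks is only available on Unix-like systems. We are also assuming that the allocated size seen by the client reflects the recalled chunks, as it does with “ls -s”.

```python
import os


def cached_fraction(size_bytes, allocated_bytes):
    """Rough fraction of the file that is resident, capped at 1.0."""
    if size_bytes == 0:
        return 1.0  # nothing to recall for an empty file
    return min(1.0, allocated_bytes / size_bytes)


def report(path):
    """Return (logical_size, allocated_bytes, fraction) for a file."""
    st = os.stat(path)
    # st_blocks counts 512-byte units, regardless of the filesystem block size.
    allocated = st.st_blocks * 512
    return st.st_size, allocated, cached_fraction(st.st_size, allocated)
```

A 4MB stub with one recalled chunk would show roughly 1MB allocated, or a fraction of about 0.25. Filesystem metadata and block rounding make this an estimate, not an exact count.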
So when will the ‘S’ bit reset? Whenever the file is updated/changed. When that happens HDI needs to replicate the new file to HCP. To do so it needs to ensure that the entire file has been recalled, because the file is written to HCP as a whole. In the case of files that are smaller than 1MB that check is satisfied immediately, the ‘S’ bit is turned off, and the file is queued for replication to HCP. For files that have “holes” in them, HDI recalls the missing chunks from HCP, merges them into the file, and turns off the ‘S’ bit.
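The update path can be sketched the same way. Again, this is a toy model and not HDI's implementation: on_update, its arguments, and the list used as a replication queue are all hypothetical, and we assume 1MB-aligned chunks.

```python
CHUNK_SIZE = 1024 * 1024  # the 1MB recall unit


def on_update(name, file_size, recalled, replication_queue):
    """Model what happens when a stubbed file is modified.

    recalled is the set of chunk indexes already on disk. Returns the chunks
    that had to be fetched and the new state of the 'S' bit.
    """
    total = -(-file_size // CHUNK_SIZE)  # ceiling division
    # Fill the "holes": every chunk not yet recalled must come from HCP.
    missing = sorted(set(range(total)) - recalled)
    recalled.update(missing)
    # The file is now complete, so the stub marker can be cleared...
    s_bit = False
    # ...and the whole file is queued for replication.
    replication_queue.append(name)
    return missing, s_bit
```

For a 300KB file whose single chunk is already cached, nothing is missing and the bit clears right away. For a 4MB file with only its second chunk cached, the other three chunks are recalled first.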