Hitachi Content Platform​

 HCP Namespace capacity utilisation breakdown

  • Object Storage
  • Hitachi Content Platform HCP
Malcolm Gibbs posted 03-23-2020 08:34

Hi, we have some large HCP namespaces used by Hitachi Data Ingestor (HDI).

 

The space utilised by the HCP namespace is significantly higher than the known current contents of the HDI file system. We suspect this may be caused by old versions and deleted versions.

 

Is there a way to report on the utilisation of an HCP namespace, including a breakdown of the space used by current versions, old versions and deleted versions? If it is not possible from the HCP Tenant Management Console, maybe somebody has some nice scripts out there that can extract an object listing and build a summary report.

 

Thanks

Malcolm


#HitachiContentPlatformHCP
Jonathan Chinitz

The short answer is no, at least not without doing some "gymnastics".

The HCP namespace that is used by HDI has 3 parts:

  • Data
  • System Restore (under the directory root)
  • System config, logs and other metadata (under the directories management and system)

All the other directories (which look like "hhh" where 'h' is a hex digit) are used for data. Each object in a data directory could have one or more versions.

If you want to know how much of the namespace is used for data, add up the sizes of the 'root', 'system' and 'management' folders and subtract that total from the total size of the namespace.

If you want to know how much of the namespace is used for old/deleted versions, subtract the size of the HDI filesystem from the result of that calculation.
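
If you want to script the first part, here is a minimal Python sketch against the namespace's REST gateway. The endpoint and credentials are placeholders, and the XML attribute names ('urlName', 'type', 'size') are from memory, so check them against a sample listing from your HCP release first:

    #!/usr/bin/env python3
    # Minimal sketch: total the bytes under the non-data directories
    # ('root', 'system', 'management') of an HDI-backed HCP namespace.
    # Subtract the result from the namespace's total usage (shown in the
    # Tenant Management Console) to get the bytes used for data.
    import base64
    import hashlib
    import urllib.request
    import xml.etree.ElementTree as ET

    BASE = "https://ns.tenant.hcp.example.com/rest"  # placeholder namespace URL
    USER, PASSWORD = "svc-report", "secret"          # placeholder credentials

    # HCP REST authentication header: base64(username):md5hex(password)
    AUTH = "HCP %s:%s" % (
        base64.b64encode(USER.encode()).decode(),
        hashlib.md5(PASSWORD.encode()).hexdigest(),
    )

    def listing(path):
        """GET a directory listing and return the parsed XML root."""
        req = urllib.request.Request(BASE + path, headers={"Authorization": AUTH})
        with urllib.request.urlopen(req) as resp:
            return ET.fromstring(resp.read())

    def dir_size(path):
        """Recursively total object sizes below a directory."""
        total = 0
        for entry in listing(path):
            name = entry.get("urlName")
            if entry.get("type") == "directory":
                total += dir_size(path + "/" + name)
            else:
                total += int(entry.get("size", "0"))
        return total

    overhead = sum(dir_size("/" + d) for d in ("root", "system", "management"))
    print("non-data overhead bytes:", overhead)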

 

 

Malcolm Gibbs

Jonathan,

 

Thanks for the response and I hope you are doing ok in these crazy times.

 

I understand your maths. We have already totalled up the size of the files in the HDI filesystem and compared it to the total size of the HCP namespace, and there is a large disparity, so we suspect we are keeping too many versions, or that deleted versions are killing us.

 

If we were to dump the metadata for all the objects in the 'Data' part and total up the space used by current versions and old/deleted versions, can you give us some advice on which HCP API call would be best for the dump, and which object properties we would look at to differentiate the versions?
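
Something along these lines is what I had in mind, reusing BASE/AUTH/listing() from the sketch above. The '?version=list' parameter and the attribute names are assumptions on my part that would need verifying against our HCP release:

    # Sketch: walk the data directories and, for each object, request its
    # version list to split the bytes into current vs. old/deleted versions.
    def classify(path, totals):
        """Accumulate current vs. non-current bytes into the totals dict."""
        for entry in listing(path):
            name = entry.get("urlName")
            if entry.get("type") == "directory":
                classify(path + "/" + name, totals)
                continue
            current = int(entry.get("size", "0"))
            # every stored version of this object, current one included
            all_versions = sum(
                int(v.get("size", "0"))
                for v in listing(path + "/" + name + "?version=list")
            )
            totals["current"] += current
            totals["old_or_deleted"] += all_versions - current

    totals = {"current": 0, "old_or_deleted": 0}
    for entry in listing(""):  # the data directories are the hex-named ones
        name = entry.get("urlName")
        if entry.get("type") == "directory" and name not in ("root", "system", "management"):
            classify("/" + name, totals)
    print(totals)

I realise one request per object will be slow at our scale, so perhaps the metadata query engine (the /query API), if it is enabled on our tenant, would be the better bulk option.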

 

Also, would the Hitachi Content Intelligence suite offer any assistance in reporting on these operational matters?

 

Thanks

Malcolm

Jonathan Chinitz

I think maybe you did not understand my reply. Example:

 

Let's say the namespace is 10 GB. Of that 10 GB, the system, root and management directories consume 1 GB. The size of the HDI filesystem is, let's say, 4 GB; this 4 GB represents the current/active versions of the objects in the namespace. So 10 - 1 - 4 = 5 GB, which is the amount of storage consumed by versions that are NOT current.
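
In code form, using the example figures above (placeholders, not measurements):

    # Example figures, in GB; substitute your own numbers.
    namespace_total = 10   # total namespace usage
    overhead = 1           # 'root' + 'system' + 'management'
    hdi_filesystem = 4     # current/active versions
    non_current = namespace_total - overhead - hdi_filesystem
    print(non_current, "GB held by non-current versions")  # -> 5 GB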

 

Make sense?

Malcolm Gibbs

Hi,

 

So in my case the HDI filesystem is 44 TB and the HCP namespace is 288 TB. I do not have access to total up system, root and management, but expect that to be small, so non-current versions account for about 244 TB. Do you think it is worth exploring further what the huge disparity is, or should we just pull back the history retention on the HDI?

 

By the way, here is our HDI schedule:

 

Thanks

Malcolm

Malcolm Gibbs

Sorry, the image did not paste. Here it is: HDI versions

Jonathan Chinitz

Each file/directory that you modify creates a version every hour, and you are keeping versions for 10 years, which is 3,650 days: a very long time. Even though the schedule indicates hourly, daily, weekly, etc., NONE of these versions ever gets deleted (there is no way to delete a specific version in HCP). All the HDI schedule does is mark certain versions as ones we can point to for a weekly or monthly view of the filesystem. So I am not at all surprised that your namespace is bloated with versions.

The only way to get rid of the versions is to change the schedule down to, let's say, hourly only (daily, weekly, monthly and yearly should all be ZERO). HDI will then set the version prune window to 2 days, if I am not mistaken. The next time Disposition and garbage collection run on the cluster, ALL versions older than 2 days will be purged and their storage reclaimed.

Malcolm Gibbs

From memory, when we set this up with the consultant, the HCP namespace setting "Prune versions older than ... days" for an HDI namespace was set to the 10-year period, and we were told that HDI would manually prune the hourly, daily, etc. schedules.

If that is not the case then I can understand what you are saying.

 

So even if we went for a more traditional schedule of daily and monthly versions, and kept monthly versions for 12 months, we would still potentially keep 365 versions of a file that changed daily (even though only 12 monthly versions would show in .history).
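
A back-of-envelope check of that, assuming HCP retains every version created inside the prune window and the HDI schedule only controls which of them are browsable:

    changes_per_day = 1
    prune_window_days = 365        # "keep monthly for 12 months"
    retained_on_hcp = changes_per_day * prune_window_days
    visible_in_history = 12        # monthly snapshots only
    print(retained_on_hcp, "versions on HCP,", visible_in_history, "visible in .history")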

 

Jonathan Chinitz

Correct. That is the way HCP Versioning works.

Malcolm Gibbs

Thanks for the assistance and clarification.