Hitachi Content Platform


 Service responsible for clearing out rehydrated content from storage tier

Data Conversion posted 12-04-2018 00:45

I have a situation where an HCP is running out of local managed storage. The customer doesn't want to add storage, but would rather tier unused content to AWS/Azure/Google cloud. The problem with HCP tiering is that the policy is based on how old the content is, measured from initial ingestion, and there is quite a bit of content that is static but frequently used. Because of this, I believe the frequently used content will eventually cause excessive rehydration, and thus cost, for the customer. For instance, say a piece of content was ingested into HCP 60 days ago and the storage tiering policy is set to tier content to the cloud after 30 days. This content is read frequently, say every other day. Once past the 30-day mark, I believe this piece of content will get into a thrashing situation where it is repeatedly "un-hydrated" (if that is an appropriate term), then rehydrated the next time it is read.
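To make the concern concrete, here is a rough toy simulation of that scenario in Python (my own simplified model of the behavior I am worried about, not HCP internals; the daily tiering pass and the read cadence are assumptions):

    TIER_AFTER = 30     # service plan: tier to cloud once content is 30 days old
    READ_INTERVAL = 2   # content is read every other day
    DAYS = 45           # simulate 45 days past ingestion

    local = True        # object starts on local (ingest-tier) storage
    rehydrations = 0

    for day in range(1, DAYS + 1):
        # Assumed daily Storage Tiering pass: anything past the cutoff is tiered.
        if day > TIER_AFTER and local:
            local = False            # "un-hydrated" / moved to the cloud tier
        # Read every other day; reading tiered content pulls it back locally.
        if day % READ_INTERVAL == 0 and not local:
            local = True
            rehydrations += 1

    print(f"{rehydrations} rehydrations in the {DAYS - TIER_AFTER} days past the cutoff")

Under these assumptions the object is rehydrated on essentially every read cycle once it passes the 30-day mark, which is exactly the churn (and cloud egress cost) I want to avoid.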

I want to come up with a way to deal with this, and I am thinking some manual management/monitoring might help here. Question: is the Storage Tiering service responsible for "un-hydrating" the content? If so, it might be feasible to turn on the Storage Tiering service for an initial bulk tiering of the oldest content, then turn it off. The content that is frequently used would then be rehydrated and remain on the HCP system. I realize that the Storage Tiering service is managed at the system level, but this HCP is ONLY used for HCP-AW backend storage.

 

Any thoughts on this, besides that it will be a pain to manage?


#Development
#HitachiContentPlatformHCP
Data Conversion

When you configure rehydration, you specify how long you want to keep data rehydrated, so HCP does not need to "un-hydrate" it every day.

Data Conversion

So what you are saying is that in the given example, content will be "un-hydrated" every 30 days regardless of how recently it has been accessed?

And also, to the original question, is it correct to say that the Storage Tiering service is what will "un-hydrate" content?

Data Conversion

The Storage Tiering and Protection services are what "un-hydrate" content.

When the object is rehydrated, it's kept on the primary storage for as long as the rehydration period specifies. Access time is irrelevant (HCP does not track access time).

Joshua Eddy

Clifford Grimm: No. Content would be rehydrated when it is read, and the rehydrated object can then be kept locally for X days; after those X days it would be re-tiered (to public cloud) per the service plan.

Here is the text from the help and screenshot:

For each storage tier, including the ingest tier, the service plan for a given namespace specifies:

Whether the data for each object stored on the tier is rehydrated (that is, restored on the ingest tier) upon being read from the tier, and if so, the number of days HCP is required to keep a rehydrated copy of object data on the ingest tier.

[screenshot: the rehydration option in the service plan tier settings]

Benjamin Clifford

Rehydration is an option and is not enabled by default. Rehydration works by marking a tiered object to be rehydrated when the object is first read. This means that the first read, and any read prior to rehydration completing, will be serviced directly from the cloud pool, while reads after the object has been rehydrated will be serviced from the local disk pool. Rehydration occurs in the background and is managed by the Storage Tiering service. Subsequent reads, after the object has been rehydrated, will extend the rehydration period (8.0+). When the time since the last read exceeds the rehydration period, Storage Tiering will mark the IF (internal file) on the local disk pool as garbage. It is not until GC inspects that object that the data will actually be deleted from disk and the capacity freed.
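A minimal sketch of that lifecycle, as described above (illustrative Python, not HCP code; the 30-day period is an assumed setting, and the read-extension behavior applies to 8.0+):

    from dataclasses import dataclass

    REHYDRATION_PERIOD = 30  # days an object stays local after its last read (assumed setting)

    @dataclass
    class TieredObject:
        rehydrated: bool = False
        last_read_day: int = 0
        marked_garbage: bool = False

    def read(obj, today):
        """First read (and any read before rehydration) is served from the
        cloud pool and triggers rehydration; later reads are served locally
        and (8.0+) extend the rehydration window."""
        obj.last_read_day = today
        if not obj.rehydrated:
            obj.rehydrated = True  # background copy-back, modeled as instantaneous here
            return "served from cloud pool (rehydration triggered)"
        return "served from local disk pool"

    def storage_tiering_pass(obj, today):
        """If the rehydration period has elapsed since the last read, the
        local copy (the IF) is marked as garbage; GC frees it later."""
        if obj.rehydrated and today - obj.last_read_day > REHYDRATION_PERIOD:
            obj.rehydrated = False
            obj.marked_garbage = True  # capacity is not freed until GC visits it

    obj = TieredObject()
    print(read(obj, today=0))            # from cloud, rehydration kicked off
    print(read(obj, today=10))           # from local disk, window extended
    storage_tiering_pass(obj, today=35)  # 25 days since last read: stays local
    storage_tiering_pass(obj, today=50)  # 40 days since last read: marked garbage
    print("marked garbage:", obj.marked_garbage)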

Tiering and Service Plans can be configured in ways that create quite a bit of churn in the form of additional object copies in various pools, and can create a tremendous amount of backlog for GC. When this happens, consumed capacity can bloat well beyond the expected capacity. This can be especially problematic on systems with high object counts, because GC has to visit every object on every pass and can take days or weeks to complete a pass when the system has billions of objects and a relatively high percentage of the data is considered garbage. For this reason it is important to carefully think through and plan Service Plan changes before implementing them, and to try to avoid frequent changes to Service Plans.
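A quick back-of-the-envelope calculation shows why passes stretch out at scale (the scan rate below is an assumed figure for illustration, not a measured HCP number):

    objects = 4_000_000_000  # a system holding billions of objects
    scan_rate = 10_000       # assumed objects GC can inspect per second

    seconds = objects / scan_rate
    print(f"one full GC pass ~ {seconds / 86_400:.1f} days")  # ~4.6 days

And that is just the time to inspect everything once; garbage found late in a pass has been holding capacity that whole time.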

8.2 plans to improve the situation by enhancing all services to do their own GC inline, rather than just marking objects as garbage for the GC service to clean up later. It will also add an option to perform interactive (user-requested) deletions inline.

Data Conversion

Thank you very much for the thorough answer!