Three Steps To Optimize Your Splunk Data Lifecycle with Hitachi Vantara

By Michael Pacheco posted 07-06-2020 02:47


The Double-Edged Sword Of Big Data

In today’s data-driven world, we are well aware that there is a great deal of untapped potential in our business data.  Machine data in particular has become a key strategic asset, as it holds valuable information about your customers, transactions, fraudulent activity, and system behaviors.  It’s no wonder that so many businesses turn to Splunk to help collect, store, investigate, and report on this machine data in real time to detect attacks and data breaches, as well as to improve incident response and satisfy regulatory compliance.

The benefits that Splunk can provide to your organization are clear.  We all want to be able to do more with Splunk by adopting new use cases and satisfying new lines of business.  So, the more data the merrier, then… right?  Well, yes.  Larger datasets provide the foundation for the most effective analytics that power the most intelligent business decisions.  However, as your datasets continue to grow, infrastructure costs and time spent on administration can skyrocket, and it becomes increasingly expensive and complex to manage your data and maintain compliance.

Like other Big Data technologies, Splunk quickly became successful at what it does by co-locating compute and storage.  These Big Data architecture models were designed to help you gain the insights that you need, as quickly as possible, from your very large datasets.  However, this legacy architecture often lacks the flexibility to meet today’s storage demands as compute must also be scaled to keep up with the ever-growing demands for storing machine data.  

So it’s a double-edged sword, with users finding themselves trying to balance the robust capabilities of Splunk against the finite infrastructure resources available to run the solution well.  Let’s dive into this a bit more.

In Today’s Data-Driven World, Microseconds Matter

The more Splunk data you have, the more supporting infrastructure you will have to manage.  These large environments pose even bigger challenges when the rubber meets the road during a system outage or node failure.  With traditional colocated compute and storage models, lots of Splunk nodes are likely sitting there idle just to satisfy boatloads of inactive data.  Not only does this waste money and valuable system resources, but this also translates to lots of time deploying new resources and other headaches when you need to recover from problems.  

Furthermore, with the huge datasets and large Splunk clusters that are now commonplace, system performance can tank, which makes it harder to collect and find the data that you need quickly.  Security and compliance violations in particular carry serious ramifications.  Microseconds matter!


I Like Big Data And I Can Not Lie

Big Data.  Like it or not, it’s everywhere.  Let’s take the world’s most-glaring example at the moment.  The world is currently facing a global pandemic with COVID-19.  Researchers and scientists are working feverishly to find a cure and are leveraging Big Data from hospitals and patients around the globe to identify infection trends and patterns to arm providers with the best data to make informed care decisions.  Imagine not having this data – or only having an isolated sample?  The results for the healthcare industry, and to our lives, would be disastrous.

So, more data, yes?  But we know this comes at a cost, too.  If your business ultimately does not have all the Splunk data it needs, most likely because it has become too expensive and cumbersome to retain, your organization is at risk.  You are at risk of not being able to find the data that you need to pinpoint a data breach, investigate a performance issue, or satisfy your compliance requirements.  Furthermore, as much as you may want to expand the value of Splunk for your organization, operating costs can become overly burdensome and your Splunk environments can become challenging to scale.


Unpredictable Data Growth Means High And Unpredictable Costs

None of us have a crystal ball.  Unfortunately, we can’t predict the future.  It’s difficult to know exactly how much your Splunk data will grow and how much additional compute and storage infrastructure you’ll need to sustain the growth.  Especially when colocating compute and storage, scaling storage capacity means that you’re also scaling compute, network, power, cooling, rack space, and administration.  Let’s not forget, too, that Splunk typically maintains three copies of data to minimize the risk of data loss.  So, that 3x multiplier factors into the ever-growing storage and infrastructure costs as well.  In fact, it is estimated that for every $1 you spend on Splunk, you will spend an average of $7 more on associated components.  It can often be a necessary evil.  But do the math: it adds up very quickly!
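
To make that math concrete, here is a rough back-of-the-envelope sketch in Python.  The daily ingest volume, retention period, and Splunk spend are hypothetical inputs chosen for illustration; the three-copy multiplier and the $7-per-$1 ratio are the figures quoted above.  The sketch ignores compression and index overhead, so treat the output as illustrative only.

```python
def replicated_storage_tb(daily_ingest_gb: float, retention_days: int, copies: int = 3) -> float:
    """Raw capacity (TB) needed to keep `copies` full copies of the retained data.

    Ignores compression and index overhead; uses 1 TB = 1000 GB.
    """
    return daily_ingest_gb * retention_days * copies / 1000


def total_spend(splunk_spend: float, associated_per_dollar: float = 7.0) -> float:
    """Splunk spend plus the estimated spend on associated components."""
    return splunk_spend * (1 + associated_per_dollar)


if __name__ == "__main__":
    # Hypothetical inputs: 500 GB/day ingested and retained for 90 days.
    print("TB with three copies:", replicated_storage_tb(500, 90))
    print("TB with one copy:    ", replicated_storage_tb(500, 90, copies=1))
    # Hypothetical Splunk spend of $100,000.
    print("Total spend ($):     ", total_spend(100_000))
```

Changing `copies` or `associated_per_dollar` shows how sensitive the total is to each assumption.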

Optimize Your Splunk Data Lifecycle

To ensure that you’re deriving the very best insights and are making the most intelligent decisions with Splunk that your organization needs to be successful, your machine data needs to be treated like any other corporate asset.  From fast ingestion of your data sources, to being able to quickly find the data you need whenever you need it, to ensuring that you can scale, manage, and retain your machine data for all your compliance and troubleshooting needs – the entire Splunk data lifecycle should be optimized to its full potential to ensure that you are getting the most from your data.  From Hot, to Warm, to Cold, to Frozen, it’s imperative to have the most efficient and cost-effective solutions to optimize your Splunk data lifecycle from end to end.

What is Hitachi Vantara's 3-Step approach to optimize your Splunk data lifecycle?  Watch this video to learn more.

1.    It Starts With Maximizing Compute And Storage Performance

Hitachi Unified Compute Platform (UCP) and Virtual Storage Platform (VSP) are the Dynamic Duo here for world-class performance and reliability.

With integrated compute, storage, network, and management software, UCP’s enterprise-class infrastructure is designed around simplicity, so you can focus on accelerating the value of Splunk for your organization.  With the flexibility to support any application at any scale, you can dynamically scale Splunk resources independently and simply provision Splunk resources on-demand for the fastest time to production.  

A whopping 97% of Splunk searches are performed over events that occurred within the past 24 hours.  This is your most active, Hot, and thus, most critical data.  Needless to say, you need this data fast (you need it NOW)  – and it has to be available all the time.  

Faster insights to your machine data mean a quicker path to resolution, and Hitachi Virtual Storage Platform is the ideal solution to turbocharge your Splunk Hot bucket.  VSP is the world’s fastest NVMe flash storage array.  With ultra-fast performance of up to 21 Million IOPS, and the lowest latency in its class (70 microseconds), your most critical Splunk data is ingested quickly and available immediately for the fastest real-time analytics.  With a pioneering 100% Data Availability Guarantee, lowest cost per IOPS in the industry, and the Hitachi Accelerated Fabric to accelerate Splunk workloads and store and access your data in real-time like nobody else in the industry can - this trifecta makes VSP the fastest, most proven, powerful, and reliable enterprise storage solution for your most critical Splunk data.

2.    Reduce Costs By Up To 60% Or More By Decoupling Compute And Storage

Sure – Hot data is the most critical.  But, what about less-frequently-accessed data like Warm, Cold, and Frozen data?  Well, all that data is very important, too - for many reasons.  Data retention and other compliance requirements are two major drivers.  You will certainly need to retain and access this data – considering that the very first order of business when an incident is discovered is to review historical data to determine when the incident started, what systems were affected, and what the business impact might be.

But should this less-active data reside on your more expensive primary Hot bucket?  Thankfully not, as that does not make economic sense, nor is it sustainable in the long run.  By tiering your Warm data to Hitachi Content Platform (HCP), the Worldwide Leading Object Storage solution, with Splunk SmartStore, you gain almost unlimited storage capacity cost-effectively at massive scale, and can also scale compute and storage resources independently to optimize resource utilization.  Your critical Splunk infrastructure resources are reserved for your most active (Hot) data, while the large bulk of your remaining (Warm) data is dynamically tiered between Splunk and HCP.  The best part is that you continue to maintain fast, uninterrupted, and seamless access to your data on demand whenever you need it.  If you happen to search for data that now resides in the Warm bucket on HCP, the SmartStore cache manager quickly retrieves a copy from HCP to the local cache so you are none the wiser and continue to get fast and reliable search results all the time.  Awesome!
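
The read path described above can be pictured with a toy read-through cache.  This is a minimal Python sketch of the general idea only, not Splunk's cache manager or any HCP API: the class name, the in-memory dictionary standing in for the object store, and the least-recently-used eviction rule are all assumptions made for illustration.

```python
from collections import OrderedDict


class ToyBucketCache:
    """Toy read-through cache in front of a remote object store (hypothetical names)."""

    def __init__(self, remote_store: dict, capacity: int):
        self.remote_store = remote_store   # stands in for the object store
        self.capacity = capacity           # max number of buckets held locally
        self.local = OrderedDict()         # bucket_id -> data, least recently used first
        self.remote_fetches = 0

    def read(self, bucket_id: str) -> bytes:
        if bucket_id in self.local:
            # Cache hit: mark as most recently used and serve the local copy.
            self.local.move_to_end(bucket_id)
            return self.local[bucket_id]
        # Cache miss: copy the bucket from the remote store into the local cache.
        data = self.remote_store[bucket_id]
        self.remote_fetches += 1
        self.local[bucket_id] = data
        if len(self.local) > self.capacity:
            # Evict the least recently used bucket; the remote copy is untouched.
            self.local.popitem(last=False)
        return data


if __name__ == "__main__":
    store = {"b1": b"events-1", "b2": b"events-2", "b3": b"events-3"}
    cache = ToyBucketCache(store, capacity=2)
    for bucket in ["b1", "b2", "b1", "b3"]:
        cache.read(bucket)
    print("remote fetches:", cache.remote_fetches)
    print("cached locally:", list(cache.local))
```

The searcher always calls `read` the same way; whether the data came from the local cache or the remote store is invisible to it, which is the "none the wiser" behavior described above.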

With the new ability to scale storage capacity as you see fit, without having to scale compute alongside it and vice versa, you’re now essentially doing more with less.  This allows more data to be ingested and reduces the Splunk infrastructure footprint while delivering even better performance.

Some other not-so-often realized benefits of tiering data with SmartStore to HCP:

•    With this large bulk of data now removed from Splunk, you can recover more quickly from hardware failures and data imbalances.

•    Decoupling compute and storage eliminates lots of Splunk cluster overhead and provides the ability to quickly restore your cluster by bootstrapping indexes from HCP.  Since this bootstrapping process is initially a metadata-only replication operation, it completes faster than having to download the entire contents of the buckets.

•    This integration between Splunk and HCP delivers superior data reliability and durability without requiring the typical 3x data redundancy in Splunk.  A win-win!

Why Hitachi Content Platform for Splunk?  As the Worldwide Leading Object Storage solution for the fourth consecutive year, HCP really sets the bar for object storage and, amongst other things, delivers:

•    5x lower storage costs than public cloud
•    Exabyte scale
•    Support for hundreds of nodes and trillions of objects
•    Performance for the most demanding workloads
•    100% S3 compatibility
•    The most-robust compliance feature-set in the industry
•    Flexible deployment options:  software-defined, virtual appliance, physical appliance
•    Lower TCO
•    Greater ROI
•    Hybrid cloud tiering

Worth mentioning also is that, with Splunk SmartStore, the concept of Cold buckets goes away.  Since Warm data already resides on a very cost-effective storage tier on HCP, Hot data ages to Warm and then directly to Frozen.  Speaking of Frozen, Hitachi Vantara has got you covered there as well for scenarios where you opt to freeze data as part of your Splunk strategy.  In Splunk, once data becomes Frozen, it is removed from the index and no longer searchable.  If it’s not searchable in Splunk, what good is the data in that state?  Only Hitachi Content Intelligence can index and search Frozen data for even longer-term retention to streamline your most-demanding compliance and analytics requirements.

In a nutshell, by decoupling compute and storage, you can retain Splunk data for longer periods at much lower costs while all of your data remains searchable – from Hot, to Warm, to Cold, and even Frozen!  Only Hitachi Vantara can deliver this comprehensive data flexibility when it comes to Splunk.
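
The two bucket lifecycles mentioned above can be summarized as a small lookup.  This Python sketch only models the order of the stages named in this article; the list and function names are hypothetical, and it says nothing about the aging rules Splunk actually applies.

```python
from typing import Optional

# Order of bucket stages as described in this article.
CLASSIC_LIFECYCLE = ["hot", "warm", "cold", "frozen"]
SMARTSTORE_LIFECYCLE = ["hot", "warm", "frozen"]  # no Cold stage with SmartStore


def next_stage(stage: str, smartstore: bool) -> Optional[str]:
    """Return the stage a bucket moves to next, or None if it is already Frozen."""
    lifecycle = SMARTSTORE_LIFECYCLE if smartstore else CLASSIC_LIFECYCLE
    idx = lifecycle.index(stage)  # raises ValueError for a stage not in this lifecycle
    return lifecycle[idx + 1] if idx + 1 < len(lifecycle) else None


if __name__ == "__main__":
    print(next_stage("warm", smartstore=False))
    print(next_stage("warm", smartstore=True))
```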


3.    Predict Your Infrastructure Costs With Confidence

Remember that crystal ball?  It’s here.  EverFlex from Hitachi Vantara is a simple and elastic way to acquire all that you need for Splunk – including compute, network, storage, products, services, and more.  With the flexibility to purchase, lease, pay only for what you consume, or use as a Hitachi-managed service, EverFlex gives you the best of both worlds - with pricing that’s predictable for usage that’s flexible.  

This takes some of the upfront guesswork out of the equation when planning for new Splunk deployments or expansions.  Since you’re now able to start really small, and expand on demand as needed, you gain better control of your capital and operational expenses, which will undoubtedly serve to improve your bottom line.

Hitachi Vantara Knows Data Better Than Anyone

We are your one-stop shop for all your Splunk needs.  We’re not just a piece of your Splunk solution.  We’re all the pieces – with a comprehensive portfolio that will reduce Splunk costs and boost application performance to optimize your Splunk data lifecycle from end to end.

Want to learn more?  Check out the following resources:
•    Three Steps to Optimize Your Splunk Data Lifecycle –
•    Three Steps to Optimize Your Splunk Data Lifecycle – Webinar

Thanks for reading!

Michael Pacheco
Senior Solutions Marketing Manager, Hitachi Vantara