Richard Jew

AI Operations for the Modern Data Center

Blog Post created by Richard Jew on May 15, 2018

Data center modernization isn’t complete without the right IT Operations Management (ITOM) tools to keep your data center running smoothly. Today’s data center operations are under constant change, with new systems, technologies and applications being added, moved and fine-tuned. Most ITOM tools offer a domain-specific view into the infrastructure, and that view can be further restricted by vendor-specific approaches. If you’re looking at a siloed view of your data center, it is difficult to ensure your applications are running at peak performance across all the infrastructure and devices needed to support them.

 

To address these IT operations challenges, Gartner has been promoting the need for AI Operations (AIOps), or Artificial Intelligence for IT Operations, where machine learning and big data provide a new holistic view into IT infrastructure for improved data center monitoring, service management and automation. Let’s see if Gartner is onto something here.

 

Gartner: AI Ops Platform*

[Image: Gartner AIOps platform diagram]

 

AI Operations starts with gathering large and varied data sets: telemetry data from across disparate systems (applications, servers, network, storage, etc.). Machine learning (ML) algorithms then mine this data for insights that can be used to optimize across these infrastructure systems. For example, suppose an online retailer wants to assess its readiness for Cyber Monday workloads. With domain-specific ITOM tools, it would get only a siloed view (for example, server only or storage only) into its IT operations, which would limit its insights. AI Operations tools aggregate analysis across multiple data sources, providing a broader, more complete view into the IT infrastructure that can be used to improve data center monitoring and planning.

 

In addition to monitoring, AI Operations can improve other IT operations processes, for example by decreasing the time and effort required to identify and avoid availability or performance problems. Ideally, you are notified that a data path between a server and a shared storage port is saturated, and then quickly receive a recommended alternative path, with plenty of time to move the applications that may be overloading the saturated path. Compare this with an approach where an administrator receives separate notices about performance problems on networking and storage ports, and then needs to confirm the two issues are related before trying to find an acceptable solution. AI Operations offers the opportunity to use machine learning to identify interconnected resource trends and dependencies, so problems can be analyzed quickly, in contrast to manual, siloed approaches that are typically based on trial and error.
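To make the contrast concrete, here is a minimal sketch of correlating separate network and storage alerts by the data path they share, then listing alternative paths. It is an illustration only, not how any Hitachi or Gartner-referenced product works: the `Alert` type, the `DATA_PATHS` topology, the resource names and the 300-second window are all assumptions made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    domain: str      # e.g. "network" or "storage"
    resource: str    # hypothetical resource identifier
    timestamp: int   # seconds since an arbitrary reference point


# Hypothetical topology: each data path lists the resources it traverses.
DATA_PATHS = {
    "path-a": ["server-1", "switch-port-3", "storage-port-1A"],
    "path-b": ["server-1", "switch-port-7", "storage-port-2B"],
}


def correlate(alerts, paths, window_seconds=300):
    """Group alerts that sit on the same data path and occur close together."""
    by_path = defaultdict(list)
    for alert in alerts:
        for path_name, resources in paths.items():
            if alert.resource in resources:
                by_path[path_name].append(alert)

    incidents = {}
    for path_name, path_alerts in by_path.items():
        domains = {a.domain for a in path_alerts}
        times = [a.timestamp for a in path_alerts]
        # Report one combined incident only when several domains are involved
        # and all alerts fall inside the same time window.
        if len(domains) > 1 and max(times) - min(times) <= window_seconds:
            incidents[path_name] = sorted(path_alerts, key=lambda a: a.timestamp)
    return incidents


def suggest_alternatives(affected_path, paths, incidents):
    """Return paths with no correlated incident as candidates for rerouting."""
    return [name for name in paths
            if name != affected_path and name not in incidents]


alerts = [
    Alert("network", "switch-port-3", 100),
    Alert("storage", "storage-port-1A", 160),
]
incidents = correlate(alerts, DATA_PATHS)
for path_name, grouped in incidents.items():
    print(path_name, [a.resource for a in grouped])
    print("alternatives:", suggest_alternatives(path_name, DATA_PATHS, incidents))
```

Instead of two unrelated notices, the administrator sees a single incident on one path plus a candidate alternative. A real tool would learn the topology and thresholds from telemetry rather than from a hard-coded dictionary.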

 

Hitachi Vantara’s recent update to its Agile Data Infrastructure and Intelligent Operations portfolio illustrates how these new machine learning and big data approaches can transform IT operations. The new releases of, and integration between, Hitachi Infrastructure Analytics Advisor (HIAA) and Hitachi Automation Director (HAD) provide new AI Operations capabilities to establish intelligent operations and the foundation for autonomous management practices:

 

  • Predictive Analytics – New ML algorithms and custom risk profiles assess future resource (virtual machine, server or storage) requirements while incorporating all resource interdependencies. The resulting forecast is more complete and accurate because it covers performance and capacity as well as the requirements of all dependent resources on the same data path. This helps ensure you upgrade all the right data path resources with the proper configurations when adding a new application workload.
  • Enhanced Root Cause Analysis – New AI heuristic engine diagnoses problems across the data path faster (4x) and provides prescriptive analytics recommendations. By suggesting resolutions to common problems, it greatly reduces the effort and expertise required to troubleshoot performance bottlenecks while further lowering mean time to repair (MTTR).
  • Cross Product Integration – New integration between HIAA, HAD and Hitachi Data Instance Director (HDID) enables new opportunities for AI-enhanced management practices. HIAA can now directly execute QoS commands or suggested problem resolutions (i.e., the required resource configuration changes) seamlessly through HAD’s automated management workflows. Through its HDID integration, HAD incorporates new data protection policies, such as snapshots and clones, into its automated provisioning processes for improved resource orchestration based on both QoS and data resiliency best practices.
  • Improved Management Integration – Enhanced REST APIs provide increased flexibility to integrate HAD into existing management frameworks and practices.  For example, HAD can easily be integrated with IT Service Management (ITSM) ticketing systems, such as ServiceNow, to incorporate the right authentication process or be tied into a broader automated management workflow.
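As a rough illustration of the Predictive Analytics idea above (forecasting every resource on a data path together, not one domain at a time), the sketch below fits a simple least-squares trend to each resource and flags those expected to cross a risk threshold. This is not HIAA’s algorithm; the linear model, the utilization figures, the resource names and the 85% threshold are all assumptions chosen for the example.

```python
def linear_forecast(history, periods_ahead):
    """Fit a least-squares line to evenly spaced samples and extrapolate."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    denom = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history)) / denom
    intercept = mean_y - slope * mean_x
    # The last observed sample sits at x = n - 1.
    return intercept + slope * (n - 1 + periods_ahead)


# Hypothetical monthly utilization (%) for each resource on one data path.
path_utilization = {
    "vm-cluster": [40, 42, 44, 46, 48, 50],
    "server-hba": [30, 30, 31, 31, 32, 32],
    "storage-port": [60, 64, 68, 72, 76, 80],
}

THRESHOLD = 85  # assumed risk profile: flag anything forecast above 85%


def resources_at_risk(utilization, periods_ahead, threshold):
    """Return resources whose forecast utilization exceeds the threshold."""
    at_risk = {}
    for name, history in utilization.items():
        forecast = linear_forecast(history, periods_ahead)
        if forecast > threshold:
            at_risk[name] = round(forecast, 1)
    return at_risk


print(resources_at_risk(path_utilization, periods_ahead=3, threshold=THRESHOLD))
```

Evaluating the whole path in one pass is the point: a server-only view would show plenty of headroom here, while the storage port on the same path is the resource trending toward the limit.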

 

These updates help deliver on Hitachi’s AI Operations approach for intelligent operations, which is based on four key data center management steps that provide enhanced analytics and automation with a continuous feedback loop:

 

Hitachi's AI Operations Approach for Intelligent Operations

[Image: Hitachi AI Operations approach diagram]

  • Alert: Utilize ML to continuously monitor across multiple domains (virtual machines, servers, network and storage) and be alerted quickly to performance anomalies while ensuring service levels for business applications. This helps filter out unwanted noise and events, so you can stay focused on avoiding problems that might affect your users.
  • Analyze: Leverage algorithms to identify historical trends, patterns or changing application workloads to be better informed on how to optimize resources on the data path or increase utilization of underutilized resources.
  • Recommend: Provide new insights to quickly identify the root cause of problems or analyze evolving requirements to optimally plan for new configurations that may be required for data center expansion.
  • Resolve: Drive action with integrated workflows or orchestration to streamline adaptive configuration changes or necessary problem fixes.
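The four steps above can be sketched as a small pipeline. This is a toy model of the loop, not Hitachi’s implementation: the z-score anomaly test, the function names, the sample response times and the "rebalance" action are all hypothetical stand-ins for the ML and orchestration a real product would use.

```python
from statistics import mean, stdev


def alert(samples, latest, z_threshold=3.0):
    """Alert: flag the latest sample if it deviates strongly from the baseline."""
    baseline, spread = mean(samples), stdev(samples)
    if spread == 0:
        return latest != baseline
    return abs(latest - baseline) / spread > z_threshold


def analyze(metrics_by_resource, latest_by_resource):
    """Analyze: list the resources on the data path that look anomalous."""
    return [
        name
        for name, samples in metrics_by_resource.items()
        if alert(samples, latest_by_resource[name])
    ]


def recommend(anomalous):
    """Recommend: map each anomalous resource to a hypothetical remediation."""
    return {name: f"rebalance workload away from {name}" for name in anomalous}


def resolve(recommendations, execute):
    """Resolve: hand each recommendation to an automation callback."""
    return [execute(name, action) for name, action in recommendations.items()]


# Hypothetical response-time history (ms) per resource, plus the newest reading.
history = {
    "storage-port-1A": [5.0, 5.2, 4.9, 5.1, 5.0],
    "switch-port-3": [1.0, 1.1, 0.9, 1.0, 1.0],
}
latest = {"storage-port-1A": 9.5, "switch-port-3": 1.05}

anomalous = analyze(history, latest)
actions = recommend(anomalous)
log = resolve(actions, lambda name, action: f"queued: {action}")
print(log)
```

The small wobble on the switch port is filtered out as noise, while the storage port spike flows through to a recommended action. Feeding the outcome of each resolved action back into the history is what would close the continuous feedback loop.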

 

These new integrated operational capabilities can help you better analyze, plan and execute the changes necessary to optimize IT operations. This helps ensure data center systems are running efficiently and at the right cost, which is the real promise of AI Operations. Whether it’s highlighting new trends, identifying problems faster or improving delivery of new resources, AI Operations’ greatest impact is helping IT administrators do their jobs better with the right insights, so they can focus on projects that have a strategic impact on their business.

 

You can read more great blogs in this series here:

Hu Yoshida's blog - Data Center Modernization - Transforming Data Center Focus from Infrastructure to Information

Nathan Muffin's blog - Data Center Modernization

Mark Adams's blog - Infrastructure Agility for Data Center Modernization

Summer Matheson's blog - Bundles are Better

Paula Phipps' blog - The Super Powers of DevOps to Transform Business

 


 

*Sources

AIOps Platform Enabling Continuous Insights Across IT Operations Management (ITOM)

Market Guide for AIOps Platforms - Gartner, August 2017
