Putting Your Unstructured Data to Work

By Jeffrey Lundberg posted 03-02-2021 16:52

Over the past 3 years, I have accumulated 45GB of data on my internal hard drive. In my first 10 years with Hitachi Vantara, I accumulated 83GB of data. Doing the math, that’s 15GB/year over the last 3 years, versus ~8GB/year over the first 10 years. My own data growth has nearly doubled in the past 3 years, and this is just from presentations, documents, images and videos like this one.
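The back-of-the-envelope math above can be checked in a few lines of Python (figures taken straight from this post):

```python
# Growth-rate check using the numbers cited above.
recent_gb, recent_years = 45, 3     # last 3 years
earlier_gb, earlier_years = 83, 10  # first 10 years

recent_rate = recent_gb / recent_years      # GB per year, recent
earlier_rate = earlier_gb / earlier_years   # GB per year, earlier
growth_factor = recent_rate / earlier_rate

print(f"Recent rate:   {recent_rate:.1f} GB/year")
print(f"Earlier rate:  {earlier_rate:.1f} GB/year")
print(f"Growth factor: {growth_factor:.2f}x")  # ~1.81x, i.e. "nearly doubled"
```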
I’m sure that those of you dealing with machine data, IoT and other more data-intensive workloads are seeing far more growth than I am. Like a lot of you out there, my older data just sits there doing nothing until I want to reference it for some obscure purpose. At my scale this costs essentially nothing, and being able to search my trove of older data still offers me some value. But data hoarding at an organizational level is expensive while returning little to the business, and trawling through petabytes of your organization’s data for useful information isn’t something that humans can do alone. Yes, there is value in things like compliance-based retention and referential access. But when you bring AI, machine learning and other analytics tools to bear, you can put this lazy data to work and get considerably more value out of it.

Technologies like AI and ML can be game changers. They can help you reduce costs, improve your customers’ experiences with your organization and generate new insights and intelligence that can help you find the next big opportunity. But these tools aren’t as simple as just downloading some free software. Each workflow stage has unique compute, storage, and networking needs as shown in the image below.
Ingestion requires the ability to collect data from diverse sources over multiple protocols. Preparation demands high throughput, inference needs low latency, and model training requires both. Across all of these stages is the need for massive scale and automated data management at a cost that won’t break the bank. Standing up a separate environment for each stage leads to infrastructure silos and creates data management challenges that drive up both costs and time to results. Simply throwing more compute power at these applications won’t cut it. Yes, GPUs can shrink your compute infrastructure, but if storage performance can’t keep up, your compute investment sits idle, waiting for data.
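That last point is easy to quantify with a simple streaming model: GPUs wait whenever storage can't deliver data as fast as they can consume it. The throughput numbers below are hypothetical, purely for illustration:

```python
def gpu_utilization(storage_gbps: float, consume_gbps: float) -> float:
    """Fraction of time GPUs do useful work when they may be data-starved.

    Assumes a simple streaming model: if storage delivers data slower
    than the GPUs consume it, the GPUs idle for the difference.
    """
    return min(1.0, storage_gbps / consume_gbps)

# Hypothetical cluster that consumes training data at 10 GB/s:
print(f"{gpu_utilization(4, 10):.0%} busy")   # 4 GB/s storage -> 40% busy
print(f"{gpu_utilization(12, 10):.0%} busy")  # 12 GB/s storage -> 100% busy
```

In this toy model, more than half of an expensive GPU investment can sit idle simply because the storage tier is undersized.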

This is why we have created the newest offering from Hitachi Vantara: Hitachi Content Software for File.
Hitachi Content Software for File is a high-performance storage solution for AI, ML, analytics and other GPU-accelerated workloads. It gives you the speed of a distributed file system (DFS) with the capacity and cloud capabilities of an object store. Its support for file and object protocols makes data ingestion easy. The DFS provides both high performance and low latency for data preparation, model training and inference. The object store provides massive storage capacity at a lower cost and offers powerful data management automation driven by metadata.
This unique integration of a distributed file system and an object store gives you a single solution with an appliance-like experience. It creates a single pool of capacity that scales compute and storage independently and linearly, with intelligent, metadata-driven automation that moves data between on-premises and public cloud storage for cost, compliance and business continuity purposes. And it comes from a trusted vendor, so you can jumpstart or accelerate your artificial intelligence, machine learning, analytics and other GPU-accelerated workflow projects.
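To make the idea of metadata-driven placement concrete, here is a minimal sketch of a tiering rule. The tier names, thresholds and metadata fields are illustrative assumptions, not HCSF's actual policy engine:

```python
from datetime import datetime, timedelta

def tier_for(metadata: dict, now: datetime) -> str:
    """Pick a storage tier from object metadata (illustrative rule only).

    - Objects under legal hold go to a compliance archive.
    - Objects untouched for 90+ days move to cheap object capacity.
    - Everything else stays on the hot, low-latency file tier.
    """
    if metadata.get("legal_hold"):
        return "compliance-archive"
    if now - metadata["last_access"] > timedelta(days=90):
        return "cloud-object"   # cheap capacity tier
    return "flash-file"         # hot, low-latency tier

now = datetime(2021, 3, 1)
meta = {"last_access": datetime(2020, 10, 1), "legal_hold": False}
print(tier_for(meta, now))  # cloud-object
```

The point is simply that placement decisions key off metadata (access time, hold flags) rather than requiring an administrator to move data by hand.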

To put this in perspective, have a look at the image below. An autonomous vehicle company was able to cut 76 hours from its data validation, data transformation and model training pipeline. In a world where the winners are those who can run more models with more advanced algorithms, Hitachi Content Software for File is invaluable.
What's that? You aren't training AIs for self-driving vehicles? Fair enough. Are you doing real-time analysis for point-of-sale fraud detection? Maybe you have Hadoop, Spark, Teradata or other big data environments. Perhaps you are in the business of high-frequency trading or market analysis. Are you doing genomics or neural research? Bio-imaging like microscopy or digital pathology? You might be trying to stream video for a cloud DVR platform with over an exabyte of data. The bottom line: if you need blazing speed and massive capacity at a price that will help you stay competitive, talk to your Hitachi Vantara representative today. Or, if you're feeling shy today, check out the datasheet, solution profile and other details on our website.
