
Hu's Place

308 posts
Hu Yoshida

Black Holes and DataOps

Posted by Hu Yoshida Employee Jun 14, 2019

This April, scientists achieved something long considered impossible: capturing an image of a black hole.



This was considered impossible because a black hole is a region of space where the pull of gravity is so strong that nothing can escape, not even light particles, and without light particles it could only be seen as a black void. (Black holes are thought to be created when a massive star collapses into itself.) Until now, scientists had confirmed the presence of a black hole by its gravitational effect on surrounding material and nearby stars. So how were they able to get this image?


It was accomplished through the extreme use of DataOps.


A team of international astronomers set out to capture an image of a black hole’s silhouette. The challenge was to capture an image of the hot, glowing gas falling into a black hole from millions of light-years away. The team also included computer scientists, who helped achieve this feat by improving upon an existing radio astronomy technique for high-resolution imaging and developing algorithms to reconstruct images from very sparse data.



A planet-scale array of eight ground-based radio telescopes forged through international collaboration, called the Event Horizon Telescope (EHT), was designed to capture images of a black hole. The EHT links telescopes located at high-altitude sites around the globe to form an Earth-sized virtual telescope with unprecedented sensitivity and resolution. Although the telescopes are not physically connected, they are able to synchronize their recorded data with atomic clocks which precisely time their observations. Each telescope produced enormous amounts of data, roughly 350 terabytes per day, which was stored on high-performance hard drives and flown to highly specialized supercomputers, known as correlators, at the Max Planck Institute for Radio Astronomy and the MIT Haystack Observatory to be combined. The data was then painstakingly converted into an image using novel computational tools developed by the collaboration of astronomers and data scientists.


The problem for the data scientists was that the data was still very sparse. While the telescopes collected a massive amount of data, it was just a tiny sampling of the photons that were being emitted. Reconstructing the image from such sparse data was a massive challenge: there were an infinite number of images that could match the data. Algorithms had to be built to sort through the plausible possibilities, cleansing and curating the data to put the puzzle together. This was a classic DataOps challenge.
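The heart of that challenge can be sketched in a few lines of Python (a toy illustration only, not the EHT team's actual CHIRP algorithm; the matrix sizes and the minimum-norm regularizer here are illustrative assumptions): with far fewer measurements than pixels, infinitely many images fit the data, and a regularizer must select a plausible one.

```python
import numpy as np

# Toy sketch of sparse-data reconstruction (NOT the EHT's actual CHIRP
# algorithm): 16 measurements constrain a 64-pixel "image", so
# infinitely many images are consistent with the data.
rng = np.random.default_rng(0)

n_pixels = 64      # unknowns in a 1-D stand-in for an image
n_samples = 16     # sparse measurements (n_samples << n_pixels)

true_image = np.zeros(n_pixels)
true_image[20:28] = 1.0                        # a simple bright feature

A = rng.normal(size=(n_samples, n_pixels))     # measurement operator
y = A @ true_image                             # the sparse observations

# Of all images consistent with y, the pseudoinverse picks the one with
# the smallest norm -- a crude regularizer standing in for the EHT's
# far more sophisticated priors.
recovered = np.linalg.pinv(A) @ y

# The reconstruction honors every measurement we took...
assert np.allclose(A @ recovered, y)
# ...even though 16 numbers cannot uniquely pin down 64 pixels, which is
# why the choice of prior, not the raw data alone, selects the image.
```

The point of the sketch is the assertion at the end: the data alone is satisfied by infinitely many candidates, so algorithm design, cleansing, and curation decide which image emerges.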


Capturing this historic image is just the beginning of the innovations that will come out of this project. The algorithms that were used to develop this image will be very valuable in other applications, like analyzing MRIs. What is exciting is the possibility of new applications that could be developed in the future, using these algorithms to reconstruct objects or concepts out of extremely sparse data.


Another exciting aspect of this project was the involvement of graduate students. Can you imagine what these graduate students will be able to contribute as they carry this experience into their future careers? Here is a picture of one of those graduate students, Katie Bouman, who worked in MIT’s computer vision lab as part of a postdoctoral fellowship, collaborating on algorithm improvement across many different fields and computer vision applications. The picture was taken when she first saw the results of her efforts.

The global big data market is expected to grow at a CAGR of 22.4% during the forecast period and reach USD 200 billion by 2024 according to MarketWatch estimates. Data analytics is expected to be the key driver for this market. However, the stocks of big data analytics vendors have been tanking in a way that is reminiscent of the dot com bust.


It was only a week ago that I blogged about the cloud data management company MapR closing its headquarters and laying off 122 employees. Now Cloudera, another cloud big data management company, has announced reduced earnings and a reduced outlook, which drove its stock down over 38% to around $5. It was only in January of this year that Cloudera and Hortonworks, two of the biggest players in the Hadoop big data space, announced an all-stock merger, which was expected to give new life to these companies in the big data analytics market.


So, what is happening? It didn’t help that the CEO of MapR resigned abruptly and the CEO of Cloudera retired as the latest earnings report was released; changing CEOs will require several quarters for these companies to rebuild credibility. Analysts point out that they face strong competition from the hyperscale cloud vendors, AWS, Azure, and Google Cloud, which are able to provide more services and applications. The Infrastructure as a Service play has already been won by the large public cloud providers, and now the race is on for big data analytics services. The recent acquisitions of Tableau by Salesforce and Looker by Google are indicative of a trend of public cloud providers moving to provide end-to-end big data analytics solutions across multiple clouds.



While hybrid cloud provides an opportunity to augment public cloud offerings, companies like MapR and Cloudera will have upfront development and support costs that will impact cash flow. Cloudera is developing the Cloudera Data Platform (CDP) with Hortonworks to support multi-function analytics, from streaming and big data ingest to IoT and machine learning, across every conceivable cloud delivery mechanism: private cloud, public cloud, multi-cloud, hybrid, on-prem, and containerized deployments, all with a common metadata catalog and schema. Some analysts are speculating that customers want to move to the cloud faster than Cloudera can take them. These customers do not have the luxury of waiting for and trialing CDP while other options are available today.


Hitachi Vantara customers have several options for accelerating their movement to the cloud and big data analytics for structured and unstructured data. One approach is to develop a data lake with Pentaho and other best-of-breed data ingestion and data orchestration tools for big data analytics that can span multiple cloud delivery platforms with a common metadata catalog and schema. Pentaho’s low-code approach can simplify and accelerate the implementation of big data analytics. Hitachi Vantara has taken this approach internally for our enterprise data that needs to reside within our private cloud.


Another option from Hitachi Vantara is to use REAN Cloud, a global Cloud Systems Integrator (CSI), Managed Service Provider (MSP), and Premier Consulting Partner in the Amazon Web Services (AWS) Partner Network (APN), as well as a Microsoft Azure Silver Partner. REAN Cloud offers consulting and professional services, including cloud strategy, assessment, cloud migration, and implementation, to realize our customers’ vision. The REAN Cloud Accelerated Migration Program (RAMP) can shrink migration to the public cloud from a matter of weeks to days with automated services and migration consulting expertise. Migrating to the hyperscale cloud vendors enables the use of their menu of analytics tools. REAN Cloud includes 47Lining, an AWS Advanced Consulting Partner with the Big Data Competency designation. 47Lining develops big data solutions and delivers big data managed services built from underlying AWS building blocks like Amazon Redshift, Kinesis, S3, DynamoDB, Machine Learning, and Elastic MapReduce.


A full transition to the cloud has proved more challenging than anticipated and many companies are looking to hybrid cloud solutions to transition to the cloud at their own pace and at a lower risk and cost. Companies are looking for DataOps tools and platforms, and systems integrators that can help them create data lakes and deliver big data analytics in a timely manner. They want proven vendors who will be with them for the long term and who already have the platforms and services for hybrid cloud and big data analytics that can work within the ecosystem of public and private clouds.



“Social Innovation addresses the world’s social and environmental needs. It’s bigger than an individual or company. Social Innovation requires businesses and the entire society to work together toward a common goal. The goal of Hitachi’s Social Innovation can be summed up with two simple words: ‘Powering Good.’”


We are in the beginnings of a new revolution that is described as the Cognitive revolution. What differentiates this from all previous revolutions, like the industrial revolution and the information revolution, is that it goes beyond the ability of technology to augment our physical capabilities to build things or to communicate things. The cognitive revolution is defined by technology’s ability to augment the cognitive potential of humans. This will be more disruptive to society than all previous revolutions.


The cognitive revolution is viewed by Hitachi as an opportunity to further its corporate goal for Social Innovation which addresses the world’s social and environmental needs. The key technologies driving this revolution are cognitive technologies like AI, ML/DL, NLP, AR/VR, Video Analytics. AI, machine learning and deep learning will help to solve many societal problems, like climate change, crime, disease, and the challenges of enhancing the quality of life and standard of living in megacities.


These types of calculations involve evaluating the probabilities of many possible choices: combinatorial optimization problems. The number of possibilities grows exponentially and outpaces the compute capabilities of today’s CPUs and GPUs. One way to address this is through the use of quantum computers.


Quantum computing takes advantage of the strange ability of subatomic particles to exist in more than one state at a time. I am not about to provide a tutorial here on quantum computing, except to refer you to Wikipedia. Suffice it to say that a classical, or regular, computer contains a processor that manipulates strings made up of bits. These bits can only have a value of either 0 or 1, depending on the electrical charge applied to them. A quantum computer replaces these binary bits with quantum bits, or qubits for short. These are quantum particles whose multiple states exist simultaneously in superposition. Their information may be stored as the spin property of the particle, or its momentum, or even its location. A qubit is not limited like regular binary bits to two states; it can exist in combinations of states, which gives the quantum computer its exponential processing power. A quantum computer composed of 500 qubits would have the potential to do 2^500 calculations in a single step. This is an awesome number: 2^500 is far greater than the number of atoms in the known universe.

This effect of superposition allows a qubit to perform many times more calculations at a time than a standard computer, which makes it ideal for combinatorial optimization problems. While quantum computers are suited for these types of problems, they are not replacements for standard computers, which are better suited for transactional workloads or playing YouTube videos.
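A minimal sketch (a classical simulation, with an arbitrarily chosen qubit count) makes the exponential-state claim concrete: describing n qubits classically requires tracking 2^n complex amplitudes.

```python
import numpy as np

# Describing n qubits classically requires 2**n complex amplitudes.
n_qubits = 10
n_amplitudes = 2 ** n_qubits        # 1,024 amplitudes for just 10 qubits

# An equal superposition gives every basis state the same amplitude;
# measurement probabilities are squared magnitudes and must sum to 1.
state = np.full(n_amplitudes, 1 / np.sqrt(n_amplitudes), dtype=complex)
probabilities = np.abs(state) ** 2
assert abs(probabilities.sum() - 1.0) < 1e-12

print(f"{n_qubits} qubits -> {n_amplitudes} simultaneous amplitudes")
# At 500 qubits the vector would need 2**500 entries, far more than the
# number of atoms in the known universe -- hence no classical simulation.
```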


In 2015, Google and NASA reported that their new 1097-qubit D-Wave quantum computer had solved an optimization problem in a few seconds. That’s 100 million times faster than a regular computer chip. They claimed that a problem their D-Wave 2X machine processed inside one second would take a classical computer 10,000 years to solve.


Quantum computers are available from a few vendors for test and development purposes, but many depend on superconducting materials, which require temperatures near absolute zero to minimize the movement of electrons and atoms. The D-Wave quantum computer referenced above is cooled to 15 millikelvin, which is approximately 180 times colder than interstellar space. It is not likely that you will see a D-Wave quantum computer in your data center soon, even if you could afford it.


Hitachi has come up with an innovative way to solve combinatorial optimization problems, which are key to solving many societal issues. Hitachi’s innovation is the use of a complementary metal oxide semiconductor (CMOS) circuit similar to what we use in CPUs today. This computer can solve these problems without the need for a quantum computer and superconducting materials. The key is a mathematical model called the Ising model, which is used to research magnetic properties in the field of statistical mechanics. The Ising model consists of spins arranged in a lattice, where each spin can be oriented up or down. Each spin interacts with the spins of its neighbors, and these interactions determine the total energy, H.

Solving combinatorial optimization problems usually consists of testing various combinations and searching for the best solution. Instead of testing every combination, we first map the problem we want to solve onto the Ising model. By conducting a process called annealing, we converge the spin orientations and thereby minimize the energy, H, of the Ising model. (Annealing is a method for removing distortions from the inside of iron and steel by heating the metal to high temperatures and then cooling it slowly.) The computer that Hitachi invented reproduces this convergence behavior in a CMOS circuit. The Ising model onto which the combinatorial optimization problem was mapped converges to a state that expends a minimal amount of energy, and this minimal state represents an optimal solution to the combinatorial problem, without the need to test every combination.
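The annealing idea can be sketched in software (a simulated-annealing toy on a small ferromagnetic ring; this illustrates the Ising convergence described above, not Hitachi's CMOS circuit, and all sizes, schedules, and parameter values are assumptions):

```python
import math
import random

# Spins s_i in {-1, +1} on a ring; energy H = -J * sum(s_i * s_{i+1}).
# The ground state (all spins aligned) has energy -J * n.
def energy(spins, J=1.0):
    n = len(spins)
    return -J * sum(spins[i] * spins[(i + 1) % n] for i in range(n))

def anneal(n=20, steps=5000, t_start=5.0, t_end=0.01, J=1.0, seed=42):
    rng = random.Random(seed)
    spins = [rng.choice([-1, 1]) for _ in range(n)]
    for step in range(steps):
        # Geometric cooling schedule, analogous to slowly cooling metal.
        t = t_start * (t_end / t_start) ** (step / steps)
        i = rng.randrange(n)
        # Energy change from flipping spin i (only its neighbors matter).
        delta = 2 * J * spins[i] * (spins[i - 1] + spins[(i + 1) % n])
        # Metropolis rule: always accept downhill moves, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            spins[i] *= -1
    return spins

result = anneal()
print("final energy:", energy(result))   # settles near the ground state
```

Hitachi's CMOS circuit performs this convergence in hardware rather than in a software loop, but the mapping is the same: a low-energy spin configuration encodes a good solution to the original combinatorial problem.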


This CMOS annealing computer with the Ising model is available in our research lab and is solving problems today. Engineers here in Santa Clara code the problems in Python and upload them to the research lab cloud in Japan, where the researchers do the mapping to the Ising model. Researchers are working on developing a higher-level language to make the model more accessible; the current mapping process is similar to the use of machine code before the advent of higher-level languages on standard computers. The use of CMOS annealing will make it possible to deliver an Ising computer that solves combinatorial optimization problems at costs similar to those of standard computers.


Meanwhile, a number of companies and research institutes around the world are working on “universal” quantum computers. These use quantum gates to handle quantum bits, or qubits, in more complex ways. This enables them to run more sophisticated algorithms than CMOS annealing or quantum annealing machines and to address a broader range of applications. Hitachi continues to invest in quantum computer research at the Hitachi Cambridge Laboratory (“HCL”), working in collaboration with academic partners at the University of Cambridge and University College London. However, today’s quantum computers need to maintain near-zero temperatures and remain free from magnetic interference, thermal noise, and mechanical vibration in order for qubits to maintain superposition (the dual states of both 0 and 1), which forms the basis of quantum calculations. This means that commercial quantum computers are still some years away.


There is a dark side to quantum computers. Their ability to factor large numbers into primes would allow them to break the asymmetric encryption used by internet communication schemes like SSL, and their blazing speed could brute-force the symmetric encryption used for data at rest. If we believe that quantum computers will be available in the next 10 years, we need to think about protecting the data that we encrypt today, to ensure that it remains protected beyond the time that quantum computers become available. While researchers are looking at how to prevent this with future quantum-proof encryption schemes, NIST recommends that we increase the key sizes of symmetric encryption and hashing algorithms for data at rest to make brute-force attacks harder. The common key size in use today is 256 bits, which represents 2^256 possible keys, an astronomically high number of combinations; however, a quantum computer with enough qubits could search that space far faster. The National Security Agency (NSA) proposes a “Rule of Two” to double-encrypt sensitive data with two keys that are generated completely independently.
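The back-of-envelope arithmetic behind the key-size recommendation can be sketched as follows (the halving comes from Grover's algorithm, which searches N keys in roughly the square root of N steps; the function name here is mine, not a standard API):

```python
# Grover's algorithm searches 2**n keys in ~2**(n/2) steps, so an n-bit
# symmetric key offers only ~n/2 bits of security against a quantum
# brute-force attacker.
def quantum_effective_bits(key_bits: int) -> int:
    return key_bits // 2

for key_bits in (128, 256, 512):
    print(f"{key_bits}-bit key -> ~{quantum_effective_bits(key_bits)} bits "
          "of security against a Grover-accelerated search")

# A 256-bit key retains ~128 bits of quantum-resistant strength, which is
# why increasing key sizes (and layering two independent encryptions, per
# the NSA's "Rule of Two") hedges against future quantum computers.
assert quantum_effective_bits(256) == 128
```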


In the meantime, Hitachi will be helping our customers solve many of the problems that are beyond the capabilities of standard computers with their innovative CMOS Annealing computers. These computers will be used for “powering good” helping to solve many societal issues.



Our Chief Marketing Officer, Jonathan Martin, published a blog to call attention to the fact that the old ways of data management are getting in the way of transforming data into value. He called for customers to shift their strategies toward more collaborative, unified, and automated processes that better leverage data to deliver outcomes. The way to accomplish this is through DataOps, which he defines as data management for the AI era.


“It unleashes data’s ultimate potential by automating the processes that enable the right data to get to the right place at the right time, all while ensuring it remains secure and accessible only to authorized employees. We know it works because we use DataOps in our own operations.”


In the second half of last year, we applied DataOps to consolidate our enterprise reporting systems into one data lake. Thirty-two data domain owners from governance, pricing, services, procurement, partners, sales, marketing, supply planning, and IT were involved. Collaboration was the key as they focused on the four pillars of our data management program.


Under the leadership of Ram Roa, Vice President, Technology and Analytics, and with the collaboration of the domain users, the data lake was in production in seven months and the data management program was in place. This was a major accomplishment given the lack of documentation, reliance on legacy tribal knowledge, lack of awareness of the differences between operational and analytic reporting, multiple definitions from different domains, lack of basic data discipline, and whole new technology stacks for volume and scale processing. Key to the success of a data lake or DataOps program is the active participation of the business domain owners. You can’t sit back and let someone else make decisions for you about security, data definitions, and business rules, and you can’t make these decisions in isolation. Once you define and agree on the business requirements, the data engineers, analysts, and architects can help with DataOps tools to automate and deliver data quality, business intelligence, and data architectures.


Business owners reported immediate benefits from this implementation. Fatima Hamad, Sr. Director, Strategic Pricing and Analytics, reported: “Leveraging our new Data Lake, we are enabling our Sales, Finance and Business users and customers to effectively track our performance and predict our forward-looking business health. Using the new, data-driven pricing guidance and approvals capabilities will allow more quotes to be approved at the account, district and country levels, while driving better pricing consistency, execution and outcomes for Hitachi Vantara.”


We ran measurements across various use cases, including Pricing Analytics, Master Data of Accounts, Demand Planning, and Data Integration Hub / Data-as-a-Service (Supply Chain Services). The results speak for themselves:

  • Improved data connectivity across 11 data sources from core to cloud
  • The Data Catalog had over 10k data tables with 350k attributes
  • Scalable processing with over 1000 ETL jobs a day
  • Platform scalability enabled 1.8M queries over 6 months
  • Enhanced analytics for 110+ users with 2.2k executions over 6 months 


DataOps is the Art of Harmonizing People, Process and Technology


DataOps is a process-oriented methodology that combines existing tools like data warehouses and data lakes with new technologies like AI and ML. And, like DevOps, it merges analytic, data, and business teams together to improve quality and predictability of business decisions and reduce time to value. One of the key enablers for the DataOps process was Pentaho’s data integration and pipeline orchestration. However, the primary key to success is leadership and the active participation of the business domain users, working with the data engineers, data scientists, and data analysts.


Greater harmony between your people, processes and technology means greater value from your data. Visit our website to see how you can gain your DataOps advantage today.

Following the revenue misses announced by Pure and NetApp the prior week, Dell and Nutanix reported missed revenue targets on May 30.


Dell Technologies’ fiscal Q1 revenue of $21.9 billion missed the Wall Street consensus of $22.44 billion. Dell’s Infrastructure Solutions Group revenue for the first quarter was $8.2 billion, a 5 percent decrease year over year. This was driven by a 1 percent decrease in storage revenue, to $4.0 billion, and a 9 percent decrease in server and networking revenue, to $4.2 billion.


Nutanix reported its Q3 FY19 earnings, missing revenue estimates at $287.2M versus the estimated $297.2M, down slightly year over year and down 14% quarter over quarter. Product revenues were down 22% quarter over quarter at $184.8M, showing that hyperconverged infrastructure storage was not immune to the slowdown.


An International Data Corporation (IDC) Worldwide Quarterly Enterprise Storage Systems Tracker report published on March 4, 2019, indicated that customers are moving to hybrid storage and that enterprises are starting to look at their total storage environment and at the operational aspects of their data to maximize their business outcomes. That report also indicated that revenues for original design manufacturers (ODMs) selling directly to hyperscale datacenters (public cloud) declined 1.5% year over year in 4Q18 to $2.7 billion due to significant existing capacity; enterprises are not rushing to the public cloud for storage capacity. The latest results from Nutanix also show weakness in the hyperconverged storage market. As a result, all areas of storage, from all-flash-array vendors like Pure, to midrange storage vendors like NetApp and Dell, to hyperconverged storage vendors, have seen a decline in sales.


Several of the vendors indicated that their sales cycles have extended. Dell reported that server/network/storage was down “due to linearity of orders,” which appears to mean that deals did not accelerate toward the end of the quarter as they usually do. This is not surprising, since Dell is going through a massive product effort to consolidate its multiple midrange products (Unity, XtremIO, SC, and ScaleIO) into a single product, which it says will be delivered this year.


The sales organizations also came under criticism. NetApp blamed part of its shortfall on suboptimal sales resource allocation. Pure expanded its sales force to go after larger accounts, but the results have not followed. Nutanix referenced “sales friction” due to changes in Americas sales leadership and a lack of familiarity with its new subscription model.


Enterprises are beginning to extend their purchase decisions, thinking twice about their storage options and evaluating hybrid storage solutions that give them more choice. This indicates that they realize that their problem is not about storing data, but about unlocking the information that exists in the data they have. This takes DataOps to help them eliminate data silos and dark data, and consolidate onto data lakes that are curated and cataloged for easier search and analytics. Fast All Flash Arrays and low cost hyperconverged arrays will not solve that problem.


Yesterday, MapR, a pioneer in cloud data management since 2009, announced that it plans to cut 122 jobs and close its headquarters, which is located just a few miles from our location in Silicon Valley. Just a few years ago, MapR was considered one of the unicorns (startups valued at a billion dollars or more) in the booming Big Data Analytics market. MarketWatch estimates that the global big data market will grow at a CAGR of 22.4% during the forecast period and reach USD 200 billion by 2024. MapR is not the only Big Data Analytics vendor that is laying off workers. Cloudera and Hortonworks recently merged and went through layoffs earlier this year.


Some analysts attribute this to a natural consolidation of the “surplus of enterprise Hadoop companies” after the hype and frenzy of the VC community reached its peak. The cost of sales and services is also very high, and the pay-per-use model puts a squeeze on cash flow. Monetizing a business based upon open-source software is challenging. Analysts are also predicting that the Big Data Analytics ecosystem will converge around AWS, Azure, and Google Cloud, and that many of these smaller companies will be acquired or displaced by the large public cloud vendors.


While layoffs are disturbing, there should be plenty of opportunities for these workers to find work, especially since most companies are looking to Big Data Analytics to build competitive advantage. I am hopeful that these smaller companies survive, since we need them as part of our hybrid cloud ecosystem. While we count on public cloud for our operational systems like marketing and CRM, we keep our enterprise systems like finance and supply chain on private cloud, and we integrate many of these Big Data Analytics products with our own products like Pentaho to provide best-of-breed solutions. Our enterprise data systems need to be behind our firewalls to ensure security and governance of our critical enterprise data. Many Big Data Analytics products also enable movement of applications and data across public and private clouds and leverage public cloud services like containers and serverless computing. We incorporate many of these products in our DataOps offerings.


There should be plenty of opportunities for these companies to grow and prosper if they can survive the early startup costs. Since unicorns are funded by VC investors, many will be bought out as soon as they become successful. However, the people who work in these startups, even those that don't survive, will continue to work and bring the innovative skills they developed in Big Data Analytics to enrich the overall market. 


The May 28, 2019, Wall Street Journal reports that data challenges are halting AI projects. It quoted IBM executive Arvind Krishna as saying, “Data-related challenges are a top reason IBM clients have halted or canceled artificial-intelligence projects.” He added, “About 80% of the work with an AI project is collecting and preparing data. Some companies aren’t prepared for the cost and work associated with that going in.”


This is not a criticism of IBM’s AI tools; our AI tools would have the same problems if the data were not collected and curated properly. This is supported by a report this month from Forrester Research Inc., which found that data quality is among the biggest AI project challenges. The report said that companies pursuing such projects generally lack an expert understanding of what data is needed for machine-learning models and struggle with preparing data in a way that benefits those systems.


At Hitachi Vantara, we appreciate the importance of preparing data for analytics, and we include that in our DataOps initiatives. DataOps is a framework of tools and collaborative techniques that enable data engineering organizations to deliver rapid, comprehensive, and curated data to their users. It is the intersection of data engineering, data integration, data governance, and data security, and it attempts to unify all the roles and responsibilities in the data engineering domain by applying collaborative techniques to a team that includes the data scientists and the business analysts. We have a number of tools, like Pentaho Data Integration (PDI) and Hitachi Content Platform (HCP), but we also include other best-of-breed tools to fit different analytic and reporting requirements.


It’s time to press your DataOps advantage.

The latest International Data Corporation (IDC) Worldwide Quarterly Enterprise Storage Systems Tracker was published on March 4, 2019. It showed that vendor revenue in the worldwide enterprise storage systems market is still increasing: up 7.4% year over year to $14.5 billion during the fourth quarter of 2018 (4Q18). Total capacity shipments were up 1.7% year over year to 92.5 exabytes during the quarter. The total All Flash Array (AFA) market generated just over $2.73 billion in revenue during the quarter, up 37.6% year over year, and the Hybrid Flash Array (HFA) market was worth slightly more than $3.06 billion in revenue, up 13.4% from 4Q17.


Revenue generated by the group of original design manufacturers (ODMs) selling directly to hyperscale datacenters (public cloud) did decline 1.5% year over year in 4Q18, to $2.7 billion, due to significant existing capacity. The report noted the increasing trend toward hybrid clouds, as enterprise customers place a higher priority on ensuring that storage systems support both a hybrid cloud model and increasingly data-thirsty on-premises compute platforms. OEM vendors selling dedicated storage arrays are addressing demand from businesses investing in both on-premises and public cloud infrastructure. The move to hybrid storage means that enterprises are starting to look at their total storage environment and at the operational aspects of their data to maximize their business outcomes.


As a result, the revenue misses reported this week by Pure and NetApp were not surprising.


On Wednesday, May 22, 2019, Pure Storage announced disappointing Q1 results and reduced its fiscal year guidance. The stock tumbled more than 20% in after-hours and early next-day trading following the release of the report. Pure Storage is simply that: purely storage, and its prospects are directly tied to the storage market, as that is the only thing it sells. It is even more restricted in that it is an all-flash play, which is less than 19% of the $14.5 billion enterprise storage market in 4Q18. As companies start to look at their total data environments, pure-play companies such as Pure will not be as relevant to customers in the future.


After the market close on May 22, 2019, NetApp announced disappointing Q4 and full fiscal year 2019 results, missing consensus revenue and earnings-per-share estimates and providing lower-than-expected guidance for both revenue and EPS for the upcoming quarter. NetApp blamed its revenue performance on a variety of issues: suboptimal sales resource allocation, a declining OEM business, and decreased ELA renewals, as well as currency and macroeconomic headwinds and extended purchase decisions and sales cycles. While NetApp has a broader portfolio than Pure, it is still primarily a midrange storage play with a lot of legacy storage in the market.


Customers expect more than a place to store their data. While a faster flash storage array can shave milliseconds off an I/O response time, it doesn’t help your bottom line if the right data is not in the right place at the right time. The fact that enterprises are extending their purchase decisions, thinking twice about purpose built OEM solutions, and evaluating hybrid storage solutions, indicates that they realize that their problem is not about storing data, but about unlocking the information that exists in the data they have. This takes DataOps.


DataOps is needed to understand the meaning of data as well as the technologies that are applied to the data so that data engineers can move, automate and transform the essential data that data consumers need. Hitachi Vantara offers a proven, end-to-end, DataOps methodology that lets businesses deliver better quality, superior management of data and reduced cycle time for analytics. At Hitachi Vantara we empower our customers to realize their DataOps advantage through a unique combination of industry expertise and integrated systems.

Hu Yoshida

AI and Solomon's Code

Posted by Hu Yoshida Employee May 19, 2019

There once was a king of Israel named Solomon, who was the wisest man on earth. One day he was asked to rule between two women, both claiming to be the mother of a child. The arguments on both sides were equally compelling. How would he decide this case? He ordered that the child be cut in half so that each woman would have an equal portion of the child. One mother agreed, while the other pleaded that the baby be spared and given to the care of the other woman. In this way King Solomon determined who was the real mother.


If we had submitted this to an AI machine would the decision have been any different?


Solomon’s Code is a book written by Olaf Groth and Mark Nitzberg and published in November of last year, so it is fairly up to date on recent happenings in the AI world. “It is a thought provoking examination of Artificial Intelligence and how it will reshape human values, trust, and power around the World.” I highly recommend that you read this book to understand the potential impact AI will have on our lives, for good or bad.


The book begins with the story of Ava, who is living with AI in the not too distant future. AI has calculated her probability of developing cancer like her mother and has prescribed a course of treatment tied to sensors in her refrigerator, toilet, and ActiSensor mattress. Her wearable personal assistant senses her moods. The insurance company and her doctor put together a complete treatment plan that considers everything from her emotional well-being and her work activities to even the friends she associates with. Her personal assistant makes decisions for her as to where she goes to eat, what music she listens to, and whom she calls for support.


As we cede more of our daily decisions to AI, what are we really giving up? Do AI systems have biases? If AI models are developed by data scientists whose personality, interests and values may be different than an agricultural worker or a factory worker, how will that influence the AI results? What data is being used to train the AI model? Does it make a difference if the data is from China or the United Kingdom?


The story of Solomon is a cautionary tale. He built a magnificent kingdom, but the kingdom imploded due to his own sins, and it was followed by an era of violence and social unrest. “The gift of wisdom was squandered, and society paid the price.”


The Introduction to this book ends with this statement.


“Humanity’s innate undaunted desire to explore, develop, and advance will continue to spawn transformative new applications of artificial intelligence. The genie is out of the bottle, despite the unknown risks and rewards that might come of it. If we endeavor to build a machine that facilitates our higher development - rather than the other way around - we must maintain a focus on the subtle ways AI will transform values, trust, and power. And to do that, we must understand what AI can tell us about humanity itself, with all its rich global diversity, its critical challenges, and its remarkable potential.”


This book was of particular interest to me since Hitachi’s core strategy is built around Social Innovation, where we operate our business to create three value propositions: improving customers’ social values, environmental values, and economic values. In order to do this we must be focused on understanding the transformative power of technologies like AI, for good or bad.



Rule-based fraud detection software is being replaced or augmented by machine-learning algorithms that do a better job of recognizing fraud patterns that can be correlated across several data sources. DataOps is required to engineer and prepare the data so that the machine learning algorithms can be efficient and effective.


Fraud detection software has traditionally been based on rules-based models. A 2016 CyberSource report claimed that over 90% of online fraud detection platforms use transaction rules to detect suspicious transactions, which are then directed to a human for review. We’ve all received that phone call from our credit card company asking if we made a purchase in some foreign city.


This traditional approach of using rules or logic statements to query transactions is still used by many banks and payment gateways today, and the bad guys are having a field day. In the past 10 years, incidents of fraud have escalated thanks to new technologies, like mobile, that have been adopted by banks to better serve their customers. These new technologies open up new risks such as phishing, identity theft, card skimming, viruses and Trojans, spyware and adware, social engineering, website cloning, cyber stalking, and vishing (if you have a mobile phone, you’ve likely had to contend with the increasing number and sophistication of vishing scams). Criminal gangs use malware and phishing emails as a means to compromise customers’ security and personal details to commit fraud. Fraudsters can easily game a rules-based system. Rules-based systems are also prone to false positives, which can drive away good customers. They become unwieldy as more exceptions and changes are added, and they are overwhelmed by today’s sheer volume and variety of new data sources.
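A rules engine of the kind described here can be sketched in a few lines. The field names and thresholds below are hypothetical, chosen only to show why fixed rules are easy for a fraudster to game:

```python
# A minimal, hypothetical rules-based fraud check. The thresholds and
# transaction fields are illustrative, not from any real banking system.

def is_suspicious(txn: dict) -> bool:
    """Flag a transaction for human review using fixed rules."""
    rules = [
        txn["amount"] > 5000,                   # large purchase
        txn["country"] != txn["home_country"],  # foreign transaction
        txn["attempts_last_hour"] > 3,          # rapid retries
    ]
    return any(rules)

# A fraudster who keeps each purchase just under the threshold, in the
# home country, with spaced-out attempts, slips through every rule.
evasive = {"amount": 4999, "country": "US", "home_country": "US",
           "attempts_last_hour": 1}
print(is_suspicious(evasive))  # False: the rules miss it
```

Every exception then becomes another rule, which is exactly how these systems grow unwieldy.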


For this reason, many financial institutions are converting their fraud detection systems to machine learning and advanced analytics, letting the data detect fraudulent activity. Today’s analytic tools with modern compute and storage systems can analyze huge volumes of data in real time, integrate and visualize an intricate network of unstructured and structured data, generate meaningful insights, and provide real-time fraud detection.


However, in the rush to do this, many of these systems have been poorly architected to address the total analytics pipeline. This is where DataOps comes into play. A Big Data Analytics pipeline - from ingestion of data to embedded analytics - consists of three steps:


  1. Data Engineering: The first step is flexible data on-boarding that accelerates time to value. This requires a product that can ETL (Extract, Transform, Load) the data from the acquisition application - which may be a transactional database or sensor data - and load it using a data format that can be processed by an analytics platform. Regulated data also needs to show lineage: a history of where the data came from and what has been done with it. This requires another product for data governance.
  2. Data Preparation: Data integration that is intuitive and powerful. Data typically goes through transforms to put it into an appropriate format; this can be called data engineering and preparation, and is colloquially known as data wrangling. The data wrangling part requires another set of products.
  3. Analytics: Integrated analytics to drive business insights. This will require analytic products that may be specific to the data scientist or analyst depending on their preference for analytic models and programming languages.
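The three steps above can be sketched end to end in miniature. The CSV layout, field names, and the "analysis" below are hypothetical stand-ins for the real ETL, wrangling, and analytics products being discussed:

```python
# A schematic sketch of the three pipeline stages (on-board -> prepare ->
# analyze) using only the standard library. The data is invented.

import csv, io, statistics

raw = "txn_id,amount,currency\n1,100.0,USD\n2,,USD\n3,250.5,USD\n"

# 1. Data engineering: on-board the raw feed into records (stand-in for ETL).
records = list(csv.DictReader(io.StringIO(raw)))

# 2. Data preparation ("wrangling"): coerce types, drop incomplete rows.
prepared = [float(r["amount"]) for r in records if r["amount"]]

# 3. Analytics: derive an insight from the prepared data.
print(f"mean transaction: {statistics.mean(prepared):.2f}")  # mean transaction: 175.25
```

In a real deployment each stage is a separate product with its own data hand-off, which is precisely where cost and brittleness creep in.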


A data pipeline that is architected around so many piece parts will be costly, hard to manage, and very brittle as data moves from product to product.


Hitachi Vantara’s Pentaho Business Analytics can address DataOps for the entire Big Data Analytics pipeline with one flexible orchestration platform that can integrate different products and enable teams of data scientists, engineers, and analysts to train, tune, test and deploy predictive models.


Pentaho is open source-based and has a library of PDI (Pentaho Data Integration) connectors that can ingest structured and unstructured data, including MQTT (Message Queue Telemetry Transport) data flows from sensors. A variety of data sources, processing engines, and targets like Spark, Cloudera, Hortonworks, MapR, Cassandra, Greenplum, Microsoft and Google Cloud are supported. It also has a data science pack that allows you to operationalize models trained in Python, Scala, R, Spark, and Weka, and it supports deep learning through a TensorFlow step. And since it is open, it can interface with products like Tableau if the user prefers them. Pentaho provides an intuitive drag-and-drop interface to simplify the creation of analytic data pipelines. For a complete list of the PDI connectors, data sources and targets, languages, and analytics, see the Pentaho Data Sheet.


Pentaho enables the DataOps team to streamline the data engineering, data preparation and analytics process and enables more citizen data scientists, whom Gartner defines in “Citizen Data Science Augments Data Discovery and Simplifies Data Science”: a person who creates or generates models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics. Pentaho’s approach to DataOps has made it easier for non-specialists to create robust analytics data pipelines. It enables analytic and BI tools to extend their reach to incorporate easier accessibility to both data and analytics. Citizen data scientists are “power users” who can perform both simple and moderately sophisticated analytical tasks that would previously have required more expertise. They do not replace the data science experts, as they do not have the specific, advanced data science expertise to do so, but they certainly bring their individual expertise around the business problems and innovations that are relevant.


In fraud detection, the data and scenarios are changing faster than a rules-based system can keep track of, leading to a rise in false positive and false negative rates that is making these systems no longer useful. The increasing volume of data can mire a rules-based system, while machine learning gets smarter as it processes more data. Machine learning can solve this problem since it is probabilistic and uses statistical models rather than deterministic rules. The machine learning models need to be trained using historic data; the creation of rules is replaced by the engineering of features, which are input variables related to trends in historic data. In a world where data sources, compute platforms, and use cases are changing rapidly, unexpected changes in data structure and semantics (known as data drift) require a DataOps platform like Pentaho Machine Learning Orchestration to ensure the efficiency and effectiveness of machine learning.
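The shift from rules to engineered features and statistical scores might look like this in miniature. The features and weights here are invented for illustration; a real model would learn its weights from labeled historic transactions rather than having them hard-coded:

```python
# Toy illustration: engineered features feed a statistical model that
# outputs a fraud probability instead of a hard rules-based verdict.
# Weights are hard-coded for illustration only; in practice they are
# trained on historic labeled data.

import math

def features(txn: dict) -> list:
    # Feature engineering: turn raw fields into model inputs.
    return [
        txn["amount"] / txn["avg_amount"],  # deviation from the user's norm
        float(txn["is_foreign"]),           # unfamiliar location
        txn["txns_last_hour"],              # transaction velocity
    ]

def fraud_probability(txn: dict, weights=(1.2, 0.8, 0.5), bias=-4.0) -> float:
    z = bias + sum(w * x for w, x in zip(weights, features(txn)))
    return 1 / (1 + math.exp(-z))           # logistic score in [0, 1]

typical = {"amount": 40, "avg_amount": 50, "is_foreign": False, "txns_last_hour": 1}
odd = {"amount": 400, "avg_amount": 50, "is_foreign": True, "txns_last_hour": 5}
print(fraud_probability(typical) < 0.5 < fraud_probability(odd))  # True
```

Because the output is a probability, the review threshold can be tuned to trade off false positives against missed fraud, something a hard rule cannot do.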


You can visit our website for a hands-on demo of building a data pipeline with Pentaho and see how easy Pentaho makes it to “listen to the data.”

Power of two.png

It’s been just over two months since our strategic partner Cisco and Hitachi Vantara announced the further strengthening of our 15+ year relationship with the launch of our jointly developed Cisco and Hitachi Adaptive Solutions for Converged Infrastructure.

I thought it would be good to provide you with some recent updates as well as answer some questions that have come up from our customers and partners.

Why this solution now?

Well, with all sincerity, it’s all about you. At Hitachi Vantara, we realize that to best serve our customers and enable them to meet their IT and business and data management objectives, we must embrace and recognize complementary technologies.

In this case, we chose to partner with Cisco for its industry-leading technologies, combined with our customer-proven Hitachi Virtual Storage Platform (VSP) all-flash and hybrid arrays and AI operations software - the result is a comprehensive converged solution for truly demanding virtualized workloads and enterprise applications.

It’s this “Power of Two” philosophy that also encompasses a company-wide and executive commitment in the partnership to benefit our customers, for the long term.

This is especially critical in the dynamic business environment that our customers face today - from compliance demands to skills shortages and resource constraints.

According to Enterprise Strategy Group research, “38% of organizations have a problematic shortage of existing skills in IT architecture/planning.”

In a previous blog I wrote about our Continuous Business Operations capability that enables customers to achieve strict zero RTO/RPO requirements.

We’ve extended this capability to Cisco and Hitachi Adaptive Solutions for Converged Infrastructure, specifically for VMware vSphere virtualized environments.

We’ve made disaster recovery orchestration much easier for customers via Hitachi Data Instance Director, which removes the complexity and simplifies Global Active Device deployment to a series of clicks (versus complex CLI and scripting).

Cisco Hitachi Adaptive .png

Cisco and Hitachi Adaptive Solutions for Converged Infrastructure: Meet in the Channel for Flexibility

Cisco and Hitachi Vantara have a select group of channel partners that can customize this solution to specific customer requirements, thereby enabling you to choose the validated Cisco networking, Cisco servers, and Hitachi Virtual Storage Platform configurations that best fit your needs, all with the assurance of a fully qualified and supported solution by Hitachi and Cisco.

I invite you to join my colleague Tony Huynh, our solutions marketing manager for Hitachi Vantara, as he teams up with the Enterprise Strategy Group for an upcoming webinar on May 15th, 2019, from 9 AM to 10 AM PST, where we will discuss this and other items of interest.

Webinar Registration:

ESG Analyst Report:

Read this ESG white paper to learn how Cisco and Hitachi Adaptive Solutions for Converged Infrastructure can help organizations achieve digital transformation milestones on a reliable, secure infrastructure that ensures access to their data.

More information on Cisco and Hitachi Adaptive Solutions for Converged Infrastructure

Last year around this same time, I posted a blog about the women of Hitachi Vantara, and featured four of the women that I worked with on a regular basis here in Santa Clara. This year, I thought I would like to introduce three other women who I have known and worked with internationally. While these three women represent different countries and cultures, they all share the same attributes of the four that I profiled last year. They all know how to lead, innovate, and succeed.



Merete Soby has been very successful as the Country Manager for Hitachi Vantara in Denmark for the past 11 years. When she joined what was then Hitachi Data Systems-Denmark, we were a solid storage company with 15-20% market share. Within 5 years, under her leadership, HDS Denmark grew market share to 45-50% and became the number 1 storage vendor in the Danish market. Over the years, the Hitachi Vantara team in Denmark has won many big-name accounts and created a strong winning culture in the company. The journey continued with new solutions, expanding beyond storage to converged solutions, solutions for unstructured data such as HNAS and object storage, analytics solutions, and REAN cloud services.


When I visit Denmark and talk to people in the industry, they always have great things to say about our team in Denmark. The first thing they comment on is the team’s commitment to customer support and engagement. A lot of credit is given to Merete who is described as an engaging, passionate, involved executor who empowers people to become better at what they do. When one of her team members that I worked with fell ill, Merete sought me out at a busy conference to assure me of that team member’s recovery, showing her awareness and concern for people and relationships.


I asked Merete if she ever felt limited in her career because she was a woman. She replied that she did not feel limited. “I believe that due to my relative young age, first as sales manager (26 years old) and later on as country manager in HDS (32 years old) I felt I needed to be a bit better and more prepared in every aspect of my business, but not directly because of my gender.”


Merete is a mom to three kids: 11-year-old twins and an 8-year-old boy. Her children have made her very focused on having the right work-life balance, which she feels has increased her performance at work. She says that she does not mentor her children directly: “I show them how to behave and act in life by my own behavior. I show them to prioritize family and our values, by living them myself.” I believe that same philosophy extends to her leadership at work.



Basak Candan joined the Hitachi Vantara team in Turkey two and a half years ago as Office Manager and Marketing Coordinator. Last September she was promoted to Field Marketing Manager for Turkey and the Middle East. She has the awesome responsibility of driving end-to-end field marketing planning and execution in Turkey and the Middle East, working very closely with the sales teams to win new business and grow revenue. She is also taking the lead in the Emerging Marketing team to support Brand Leadership Programs, ensuring that we build consistent and relevant messages across Emerging EMEA markets for our entire portfolio.


I recently worked with Basak when I was invited to participate in the World Cities Congress in Istanbul. She helped me prepare for my panel discussions at the conference, arranged for me to visit customers in Istanbul and Ankara, and helped me understand the marketing environment in Turkey. On the day before I was to fly to Ankara, she noticed that I did not have a top coat and expressed concern for my well-being. That evening, much to my surprise, the hotel concierge delivered a top coat to my room, on loan for my trip to Ankara. I was very touched by Basak’s concern, creativity, and attention to detail.


Basak told me that if anyone had asked her as a child what she wanted to be when she grew up, it probably would not have been anything to do with technology. Before joining Hitachi, she held different sales and marketing roles in a number of large luxury hotel chains for more than 7 years. Her formal education was in hospitality and marketing, but she has been able to transfer those skills into a technology career.


Thanks to some strong, positive, influential women in her life who steered her in that direction, the real transformation started for her when she began working at Hitachi Vantara. She said working with Hitachi Vantara on storage, cloud, IoT, and Big Data Analytics was like discovering a new planet. With the help of her Hitachi manager, she applied to Boğaziçi University, which is among the top 200 universities in the world, and was accepted into a Digital Marketing and Communication Program. There she worked on a project analyzing JetBlue Airways’ marketing campaigns and how the airline could digitally transform its marketing. Her project was judged by a jury and won a special prize. That gave her encouragement to grow and show her power in technical marketing. Basak typifies the self-starter: someone capable of recognizing and seizing new opportunities. Self-starters immerse themselves in new endeavors and remain passionate about pursuing their vocation and honing their skills.



When I need help with a tough technical question about 3 data center disaster recovery or the latest mainframe features for Geographically Dispersed Parallel Sysplex™, I call on the expert, Ros Schulman, who is always up on the latest technologies and business processes for disaster recovery.


Ros has been with Hitachi Data Systems and Hitachi Vantara for over 20 years. In the last 9 years she has filled director level roles in product management, technical sales support, business development, and technical marketing with extensive skill sets around Data Protection (Replication and Backup), Business Continuity and Resiliency. Her experience in analyzing customer requirements, technologies and industry trends have helped to maximize revenue growth in these areas. She is always in demand to speak at customer events and industry forums.


Ros was born in London and went to school there. She started her career as a computer operator at the age of 18 in local government and later became a system programmer on MVS at a time when very few women were in that field. She later moved to the United States and continued her technical career working for both the vendor and customer sides. When I joined Hitachi Data Systems, Ros was already recognized as the technical expert in operating systems and disaster recovery. She is passionate about our storage and systems technology and is generous in sharing her experience and insights with others. She is not shy. I have seen this petite lady going toe-to-toe with several heavyweight MVS systems programmers debating the benefits of different systems.


When I asked her what her advice would be for women considering a technical career, she said, “It’s something you have to be passionate about. I still believe it’s much harder to move ahead, so you have to be willing to love what you do. I would also recommend that you take some business classes, as in today’s digital age, you need a lot more than just technical skills. My motivation is learning and growing; this industry fascinates me. When I started, we used disk drives that were 20MB in size and mainframes had less than 2GB of memory, and look where we are today. I do not know of another career where things have changed so radically and continue to change, and have now been embraced in every facet of our lives.”


It is one thing to have the knowledge and skills to be technical. However, it requires passion and enthusiasm to excel in a technical area; and be recognized as the go-to expert. Ros Schulman is my go-to expert.


Hitachi Vantara recognizes the value of diversity. The Women of Hitachi play an important role in defining our culture and contributing to our success as a technology company. Women are well represented in our sales and marketing organizations, as well as in product management and technical support roles. Our CIO, CFO, and Chief Human Resource Officer are women. Women account for more than 25% of our IT team – just over the industry average – according to CIO Renée McKaskle.


A recent Wall Street Journal blog reports that:


“She (Renee McKaskle) credits the Hitachi Inc. subsidiary’s “double” bottom-line goal, saying “a healthy bottom line is important but doing what is right for society is important, too.”

To that end, she said, the company supports several global and local diversity initiatives, including women’s summits and mentoring programs.

“These programs have been critical to forging the diversity we have in place today, with positive indicators that this will continue to increase,” she added.”


One of the things I enjoy most about my job is the ability to work with wide variety of people, and see them in action, celebrate their successes, and hear their stories. I hope you enjoyed hearing about these women who have inspired me and will perhaps inspire you as well.

According to the Harvard Business Review, "Cross-industry studies show that on average, less than half of an organization’s structured data is actively used in making decisions—and less than 1% of its unstructured data is analyzed or used at all. More than 70% of employees have access to data they should not, and 80% of analysts’ time is spent simply discovering and preparing data. Data breaches are common, rogue data sets propagate in silos, and companies’ data technology often isn’t up to the demands put on it." That was in a report back in 2017. What has changed since then?


Few Data Management Frameworks are Business Focused

Data management has been around since the beginning of IT, and a lot of technology has been focused on big data deployments, governance, best practices, tools, etc. However, large data hubs over the last 25 years (e.g., data warehouses, master data management, data lakes, Hadoop, Salesforce and ERP) have resulted in more data silos that are not easily understood, related, or shared. Few, if any, data management frameworks are business focused: frameworks that not only promote efficient use of data and allocation of resources, but also curate the data to understand its meaning, as well as the technologies applied to it, so that data engineers can move and transform the essential data that data consumers need.


Introducing DataOps

Today more customers are focusing on the operational aspects of data rather than on the fundamentals of capturing, storing and protecting data. Following the success of DevOps (a set of practices that automates the processes between software development and IT teams so that they can build, test, and release software faster and more reliably), companies are now focusing on DataOps. DataOps is best described by Andy Palmer, who coined the term in 2015: “The framework of tools and culture that allow data engineering organizations to deliver rapid, comprehensive and curated data to their users … [it] is the intersection of data engineering, data integration, data quality and data security. Fundamentally, DataOps is an umbrella term that attempts to unify all the roles and responsibilities in the data engineering domain by applying collaborative techniques to a team. Its mission is to deliver data by aligning the burden of testing together with various integration and deployment tasks.”


At Hitachi Vantara we have been applying our technologies to DataOps in four areas: Hitachi Content Platform, Pentaho, Enterprise IT Infrastructure, and REAN Cloud.


  • HCP: Object storage for unstructured data through our Hitachi Content Platform and Hitachi Content Intelligence software. Object storage with rich metadata, content intelligence, data integration, and analytics orchestration tools enables business executives to identify data sources, data quality issues, types of analysis, and the new work practices needed to use those insights.

HCP DataOps.png


  • Pentaho: Pentaho streamlines the entire machine learning workflow and enables teams of data scientists, engineers and analysts to train, tune, test and deploy predictive models.

Pentaho DataOps.png

  • IT Infrastructure: Secure Enterprise IT Infrastructure that extends across edge to core to Cloud, based on REST APIs for easy integration with third party vendors. This gives us the opportunity to not only connect with other vendor’s management stacks like ServiceNow, but also apply analytics and machine learning and automate deployment of resources through REST APIs.


IT Data Ops.png


  • REAN Cloud: A cloud-agnostic managed services platform for DataOps in the cloud, with highly differentiated offerings to migrate applications to the cloud and modernize applications to leverage cloud offerings for data warehouse modernization, predictive agile analytics, and real-time IoT. REAN Cloud also provides ongoing managed services.

REAN Data Ops.png


  • Big Data systems are becoming a center of gravity in terms of storage, access and operations.
  • Businesses are looking to DataOps to speed up the process of turning data into business outcomes.
  • DataOps is needed to understand the meaning of the data as well as the technologies that are applied to the data so that data engineers can move, automate and transform the essential data that data consumers need.
  • Hitachi Vantara provides DataOps tools and platforms through:
    • Hitachi Content Platform,
    • Pentaho data integration and analytics orchestration,
    • Infrastructure analytics and automation,
    • REAN Cloud migration, modernization, and managed services.


Blak Hole Pic.png

Grad Student Katie Bouman uses DataOps to capture first picture of a black hole.

Hu Yoshida

Tastes Like Chicken

Posted by Hu Yoshida Employee Apr 2, 2019

In 2050, the population of the world is expected to be 9 billion, versus 7 billion today. The challenge will be to feed 2 billion more people with less arable land, less water, and fewer farmers. One of the increasing demands will be for protein as people demand richer foods.


If you are over 50 years old, you may remember a satirical American comic strip, written and drawn by Al Capp, that appeared in many newspapers in the United States, Canada and Europe, featuring a fictional clan of hillbillies in the impoverished mountain village of Dogpatch, USA. In one episode of the series, the young hero Li'l Abner discovers the Shmoo in a hidden valley and introduces them to Dogpatch and the rest of the world. The Shmoo was a lovable creature that laid eggs, gave milk, loved to be eaten, and tasted like any meat desired: chicken when fried, steak when broiled, pork when roasted, and catfish when baked. They multiplied like rabbits but required no feed or water, only air to breathe. The perfect solution to world hunger.




Today we have something close to the Shmoo: the broiler chicken. In 1957 the average chicken weighed about 1 kg, or 2.2 lbs. Today a commercially grown broiler chicken weighs 9.3 lbs. after 8 weeks. It takes only 2.5 lbs. of feed and 468 gallons of water to produce one lb. of chicken meat, which is much more efficient than producing a lb. of pork or beef, with much less waste, less space and lower CO2 emissions.




It appears that chicken will be the meat for the masses.


IoT will help to increase agricultural efficiency, reduce spoilage, and increase the freshness and nutritional content of healthy foods. The problem will not be the production of food, but how to build the infrastructure that provides equal access to that food for all 9 billion people. According to a BofA Merrill Lynch Global Investment Strategy report, populous countries like Nigeria, Pakistan and Kenya spend 47 to 57% of their household expenditure on food, compared to 7% in the US and the UK.


Food Costs.png


To finish the story of the Shmoo: the Shmoo became so popular that people no longer needed to go to the stores to buy food. This caused a series of events reminiscent of the Wall Street Crash of 1929, and the Captains of Industry banded together to exterminate the Shmoo. Two of the Shmoos managed to escape back to their hidden valley in the mountains. Wikipedia described the Shmoo sequence as “massively popular, both as a commentary on the state of society and a classic allegory of greed and corruption tarnishing all that is good and innocent in the world. In their very few subsequent appearances in Li'l Abner, Shmoos are also identified by the U.S. military as a major economic threat to national security.”


Mr. Higashihara, our Hitachi CEO, always reminds us of the Light and Shadow of Digital Transformation. With every advancement in digital transformation we must be mindful of the possible shadows which may negate our vision for social innovation.

Data Hole.png

The internet is normally accessed like a pyramid, where a URL may be accessed by hundreds, thousands, even millions of users. Now, with the Internet of Things, we have a multitude of “things” sending millions of records to the internet. In a sense the internet is being turned inside out, with millions more data points being ingested than served up, thanks to the sensors that enable IoT.


An IoT device like an autonomous vehicle may have hundreds of sensors generating thousands of GB of data. It is estimated that a single autonomous car will collect over 4,000 GB of data per day! The reason for this large amount of data is that IoT devices are concerned with change. In order to track change, the data must be collected as a time series, where new data is always added, never updated. This allows us to measure change and analyze how something has changed in the past, how it is changing in the present, and predict how it may change in the future. By focusing on change, we can understand how a system, process, or behavior changes over time and automate the response to future changes.
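The append-only, change-focused view described here can be illustrated with a toy sensor series. The readings and the spike threshold below are made up for illustration:

```python
# Sketch of an append-only time series: each reading is a (timestamp,
# value) pair, and analysis focuses on deltas between readings rather
# than on the latest state. The temperature values are invented.

from datetime import datetime, timedelta

t0 = datetime(2019, 4, 1)
# Append-only: new points are added at the end, never updated in place.
series = [(t0 + timedelta(minutes=i), temp)
          for i, temp in enumerate([20.0, 20.4, 21.1, 24.8, 25.0])]

# Measure the change between consecutive readings.
deltas = [b[1] - a[1] for a, b in zip(series, series[1:])]

# Flag intervals where change exceeds a (hypothetical) alert threshold.
spikes = [series[i + 1][0] for i, d in enumerate(deltas) if d > 2.0]
print(spikes)  # the single large jump, at minute 3
```

The same delta logic, run continuously over incoming points, is what lets a system automate its response to change rather than just store state.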


The downside is that time series data is generated very rapidly, faster than transactional or NoSQL databases can normally absorb it. This has spawned a rapidly developing market for time series databases (TSDBs). TSDBs are fine-tuned for time series data, and this tuning yields higher ingest rates, faster queries at scale, and better data compression. TSDBs also include functions and operations common to time series analysis, such as data retention policies, continuous queries, and flexible time aggregations, which improve the experience of working with time series data. You know that time series databases are mainstream when AWS gets in the game: AWS has announced Amazon Timestream, a fast, scalable, fully managed time series database service for IoT and operational applications that makes it easy to store and analyze trillions of events per day at 1/10th the cost of relational databases. The following chart from DB-Engines, November 2018, shows the growing acceptance of TSDBs compared to other forms of databases.
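Downsampling and flexible time aggregation, two of the TSDB features mentioned above, are easy to illustrate in plain Python. This is a toy sketch of the concept, not the query language of any specific TSDB; real engines do this server-side, at scale, during ingest or query.

```python
from collections import defaultdict

def downsample(points, bucket_seconds, aggregate):
    """Group (epoch_seconds, value) points into fixed time buckets
    and reduce each bucket with the given aggregate function --
    the kind of flexible time aggregation a TSDB provides natively."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return sorted((start, aggregate(vals)) for start, vals in buckets.items())

# One reading every 10 seconds, downsampled to 1-minute averages.
raw = [(1560000000 + 10 * i, float(i)) for i in range(12)]
minute_avg = downsample(raw, 60, lambda vs: sum(vs) / len(vs))
```

Swapping the aggregate function (`max`, `min`, a percentile) changes the rollup without touching the raw data, which is also how retention policies work: keep raw points briefly, keep coarse rollups for years.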



[Image: TSDB graph (DB-Engines popularity trend by database category, November 2018)]


If an autonomous automobile can generate 4,000 GB of data per day, imagine what a more complex system like an oil refinery would produce. We recently worked with a large oil refinery in Europe that had thousands of sensors installed on equipment, including heat exchange networks, power plants, pipelines, and many other systems, collectively generating millions of data points every second. Their operators, process engineers, IT staff, and data scientists were collecting the data manually from these systems, as well as from Oracle, SQL Server, and SAP, and used tools such as Excel to derive their insights. Data silos inhibited collaboration between management, scientists, engineers, and IT, resulting in short-sighted and/or incorrect decisions. This was neither efficient, reusable, nor scalable.

[Image: Oil Refinery]


A TSDB, OpenTSDB, was used to collect all the sensor data into a data lake. Pentaho Data Integration was used to connect to OpenTSDB, eliminating the need for third-party vendors and leveraging distributed compute. OpenTSDB has extensive, REST-based, open APIs, which gave our Pentaho engineers huge flexibility to retrieve data extremely fast and parse it within Pentaho. The kinds of analytics used varied from simple correlation and visualization to machine learning for predicting values. That being said, Pentaho's value proposition was more on the data integration side: the data acquisition, extraction, and blending that consume 80% of a data scientist's time, more than mining and modeling. Pentaho also enabled the process engineers, IT, and data scientists to work as a team, and enabled business users with self-service consumption of operational data. This not only led to better decisions, but also reduced the lead time from two days to less than ten minutes.
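To give a feel for the REST APIs mentioned above, here is a sketch of a query against OpenTSDB's HTTP endpoint (`POST /api/query`), the same kind of call a Pentaho Data Integration step would issue. The host, metric name, and tag below are hypothetical placeholders, not details from the refinery project.

```python
import json
import urllib.request

# Hypothetical OpenTSDB endpoint; 4242 is OpenTSDB's default port.
OPENTSDB_URL = "http://opentsdb.example.com:4242/api/query"

# A query payload in OpenTSDB's JSON format: relative start time,
# an aggregator to combine matching series, and server-side downsampling.
query = {
    "start": "1h-ago",
    "queries": [{
        "aggregator": "avg",
        "metric": "refinery.heatx.temp",  # hypothetical metric name
        "downsample": "10m-avg",
        "tags": {"unit": "*"},
    }],
}

def fetch(url=OPENTSDB_URL, payload=query):
    """POST the query and return the decoded JSON response
    (a list of series, each with a 'dps' map of timestamp -> value)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because the filtering, aggregation, and downsampling happen inside OpenTSDB, the caller receives a small, already-rolled-up result rather than millions of raw points, which is what made the fast retrieval into Pentaho practical.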


Time series data are simply measurements or events that are tracked, monitored, downsampled, and aggregated over time. This could be server metrics, application performance monitoring, network data, sensor data, events, clicks, trades in a market, and many other types of analytics data. Time series data can be analyzed to understand the underlying structure and function that produce the observations. A mathematical model can be developed to explain the data in such a way that prediction, monitoring, or control can occur. As the internet turns inside out with time series databases, Hitachi Vantara's Pentaho will be there to scale with the explosion of data and provide integration, analysis, and visualization for greater insights into current and new time series applications.
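The prediction idea can be made concrete with the simplest possible model: forecast the next observation from the recent past. A moving average is a deliberate stand-in here for whatever model a real project would fit; the point is only that an explanatory model of past observations is what enables prediction, monitoring, or control.

```python
def moving_average_forecast(values, window=3):
    """Predict the next observation as the mean of the last `window`
    points -- a minimal example of modeling a time series to predict
    its future values."""
    if len(values) < window:
        raise ValueError("need at least `window` observations")
    recent = values[-window:]
    return sum(recent) / window

# Hypothetical temperature history; predict the next reading.
history = [21.0, 21.4, 22.1, 22.9, 23.0]
prediction = moving_average_forecast(history)
```

A production system would replace this with a proper forecasting or anomaly-detection model, but the workflow is the same: fit on the append-only history, then compare each new point against the model's expectation.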

