
Pentaho 9.1 Delivers the Promise of Multicloud

By Anand Sagar Rao Vala posted 11-03-2020 21:40

by Anand Rao Vala
Principal Product Marketing Manager, Pentaho and Lumada DataOps Software

The move to the cloud continues to accelerate at a stunning pace, with more organizations exploring options and migrating their data infrastructure to AWS, Microsoft Azure, and Google Cloud. Cloud deployments of Hadoop have outstripped on-premise deployments by a factor of five, and Dataproc, Google’s implementation of Hadoop in the cloud, is growing even faster, at a rate exceeding 60% in just the last year. The pressure of global events is pushing these trends along even further.

As organizations embrace the many benefits of the cloud, they also quickly discover challenges. For some, it has been shocking service bills. Others struggle to marry the multiple cloud services in use after a merger or when conducting business across borders.

What is the answer when one wants to, or must, keep multiple cloud providers, but also hopes to blend the data from each to generate powerful insights?

That is the premise of Pentaho 9.1, a solution purpose-built to help Hitachi Vantara clients take full advantage of the cloud's potential, supporting their transition from on-premise to the cloud at a pace that matches their timetable and achieves their goals. Pentaho 9.1 enables organizations to choose the service, or services, that best suit their operational, budgetary, and management needs.

The Multicloud without Borders
No one cloud solution is right for every organization. In fact, many organizations will be best served by working with multiple cloud providers. To that end, Pentaho 9.1 offers support for all of the most preferred platforms including AWS, Azure, and now Google Dataproc, the Hadoop and Spark cluster for Google Cloud.

More significantly, Pentaho 9.1 makes it possible to create a data infrastructure that is independent of any single cloud provider. Pentaho 9.1 ends cloud lock-in by giving organizations the flexibility to create data pipelines and transformations across whichever combination of services benefits them the most.

As we’ll explain below, Pentaho Data Integration is tightly knit with analytics functionality, so organizations no longer need to navigate the complexities inherent in having multiple vendors for different parts of their data operations. Pentaho 9.1 features exclusive adaptive execution capabilities which further make it possible to switch execution engines without changing the underlying pipelines. The Hadoop infrastructure can be on-premise or upgraded to the cloud. By abstracting away the transformation, it is then possible to use Pentaho native execution or Spark as needed. Now when a developer creates pipelines on their laptop and then moves to a production environment in the cloud, there is no need for extensive re-testing because all that is changed is the execution engine.
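To make the idea of swapping execution engines more concrete, here is a minimal Python sketch. It is not Pentaho code and does not use any Pentaho API: the step functions, `LocalEngine`, and `PooledEngine` are hypothetical names invented for illustration, and a thread pool merely stands in for a cluster engine such as Spark. The point it tries to show is that the pipeline definition stays untouched while only the engine changes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[Row], Row]


def normalize_email(row: Row) -> Row:
    # Trim whitespace and lowercase the email field.
    return {**row, "email": row["email"].strip().lower()}


def add_domain(row: Row) -> Row:
    # Derive the domain from the (already normalized) email.
    return {**row, "domain": row["email"].split("@")[1]}


# The pipeline is just an ordered list of steps; it names no engine.
PIPELINE: List[Step] = [normalize_email, add_domain]


def apply_steps(row: Row, steps: List[Step]) -> Row:
    for step in steps:
        row = step(row)
    return row


class LocalEngine:
    """Hypothetical engine: runs every row in the calling thread."""

    def run(self, steps: List[Step], rows: List[Row]) -> List[Row]:
        return [apply_steps(row, steps) for row in rows]


class PooledEngine:
    """Hypothetical engine: a thread pool standing in for a cluster."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def run(self, steps: List[Step], rows: List[Row]) -> List[Row]:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # pool.map preserves input order in its results.
            return list(pool.map(lambda row: apply_steps(row, steps), rows))


rows = [{"email": "  Ada@Example.COM "}, {"email": "bob@test.org"}]

# Same pipeline, two engines: only the object calling run() differs.
local_result = LocalEngine().run(PIPELINE, rows)
pooled_result = PooledEngine().run(PIPELINE, rows)
```

In this toy setup both engines return identical rows, which mirrors the claim that a pipeline built on a laptop should not need extensive re-testing when only the execution engine is switched.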

This unprecedented freedom will power faster execution of data integration projects while significantly reducing complexity and management costs. A large bank with $250B+ of assets under management has been able to develop a five-year plan to migrate from on-premise to Google Cloud Hadoop, thanks to Pentaho 9.1 Dataproc support, and execute their move at a measured pace that enables them to add capability without disrupting what works.

Data Discovery in the Multicloud
The data discovery that precedes the building of a data pipeline is essential, infamously frustrating, and a critical hurdle that must be overcome to make effective use of data. In fact, Forrester reported that proper curation of data alone increased the use of big data within organizations by 50%.

To meet this need, Hitachi Vantara tightened the bond between Pentaho Data Integration and Lumada Data Catalog to bring meaningful analytics insights ever closer to hand within multicloud environments by providing visibility into the data at a very abstract level.

Instead of forcing users to know precise file names, login information, or file location names, the Lumada Data Catalog makes it possible to search for data based on intuitive business nomenclature. When a new dataset is exposed to the catalog, the content is analyzed and profiled, differentiating, for example, between customer information and product information so that each can be mapped to similar data within the catalog for easy access.

This is a significant advantage that is unique to Pentaho 9.1. When building a pipeline with Pentaho, one no longer needs to endure the time-consuming process of naming the database, table, schema, and other minutiae. Instead, one simply picks and chooses columns, for example, customer phone number or email address. With this level of abstraction, it is then possible to move that data later, for example to lower-cost storage, without breaking the pipeline. The catalog keeps track of the change and operations continue undisturbed.
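The indirection described above can be sketched in a few lines of Python. This is an illustration only, not the Lumada Data Catalog API: the `Catalog` class, the `Location` record, the `build_query` helper, and all store, table, and column names are hypothetical. The sketch assumes the pipeline refers to data by business term and asks a catalog for the physical location at run time.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Hypothetical physical location of one column."""
    store: str
    table: str
    column: str


class Catalog:
    """Hypothetical catalog mapping business terms to locations."""

    def __init__(self) -> None:
        self._entries: Dict[str, Location] = {}

    def register(self, term: str, location: Location) -> None:
        # Registering an existing term again records its new location.
        self._entries[term] = location

    def resolve(self, term: str) -> Location:
        return self._entries[term]


def build_query(catalog: Catalog, terms: List[str]) -> str:
    """Build a query from business terms, not physical names."""
    locations = [catalog.resolve(term) for term in terms]
    tables = {(loc.store, loc.table) for loc in locations}
    if len(tables) != 1:
        raise ValueError("this sketch only handles a single table")
    store, table = next(iter(tables))
    columns = ", ".join(loc.column for loc in locations)
    return f"SELECT {columns} FROM {store}.{table}"


catalog = Catalog()
catalog.register("customer email", Location("onprem_db", "customers", "email_addr"))
catalog.register("customer phone", Location("onprem_db", "customers", "phone_no"))

# The pipeline only ever mentions business terms.
pipeline_terms = ["customer email", "customer phone"]
before = build_query(catalog, pipeline_terms)

# The data moves to cheaper storage; only the catalog entries change.
catalog.register("customer email", Location("cloud_archive", "customers", "email_addr"))
catalog.register("customer phone", Location("cloud_archive", "customers", "phone_no"))
after = build_query(catalog, pipeline_terms)
```

Here `pipeline_terms` never changes, yet `before` targets the on-premise store and `after` targets the cloud one, which is the behavior the paragraph above attributes to the catalog.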

A Streamlined Upgrade Path to Pentaho 9.1
These enhancements promise to have a significant impact on an organization’s ability to realize the full potential of the cloud, and we have also dramatically shortened the path to get you there. The upgrade path to Pentaho 9.1 is streamlined, resulting in significantly reduced downtime for testing and production. For many organizations, the upgrade experience will be as effortless and automated as a typical Windows service pack.

One of the key innovations for Pentaho 9.1 is that the upgrade will preserve invaluable, pre-existing customizations and connectors, automatically transitioning those elements to the new location. The net result is a dramatic reduction of the impact on the production environment, with minimal testing needed both before and after the upgrade.

Putting Pentaho 9.1 to Work
The ultimate reward for creating a seamlessly coordinated multicloud environment is to put those far-reaching data pipelines to work. Beginning in December 2020, Hitachi Vantara will offer an updated web companion to Pentaho. This reimagined Dataflow Manager will empower data analysts or data scientists with less technical expertise to leverage the web-based data pipeline templates created by data engineers. Its interface will further enable them to perform low-level customization, execute and monitor pipelines to completion, and review logs for errors and metrics without assistance from IT engineers.

The new Dataflow Manager promises to dramatically expand access and utilization, freeing more people to perform ad-hoc data analysis, exploration, and experimentation in a containerized setting that uses a cluster of cloud servers for execution, all without burdening the regular operational pipelines or IT staff. A large US bank with $250B+ in assets, holder of one of the largest datasets in the financial sector, has already begun enabling business users to self-serve data flows this way and looks forward to significantly lowering operating costs and improving governance thanks to this solution.

And that really is the crux of the story of Pentaho 9.1: greater flexibility to make the most of the cloud while expanding the organization's overall ability to leverage data.

If you are interested in more details, please contact your account representative or reach out to us.




