Author: Rakesh Saha, Sr. Product Manager, Pentaho Product Line
In the world of enterprise IT, managing data across multiple clouds is the new normal, whether it results from a deliberate strategy or from shadow IT teams doing their own thing. Enterprises are not only moving data to the cloud at an unprecedented pace; they are also adopting cloud platforms from several vendors at the same time, for sound business and technical reasons. That means IT leaders need a plan to manage multiple clouds uniformly, and that plan has to go beyond maintaining resource-utilization views. Left unchecked, multi-cloud sprawl can put your data assets at tremendous risk.
According to a Forrester Research study, 65 percent of IT leaders believe that “data integration becomes more complex in the public cloud.” To put that in perspective, these cloud data integration challenges ranked behind only security and compliance challenges.
With Pentaho 8.1, we continue to make our data integration and analytics platform more cloud-friendly, so that enterprises can build data pipelines on, and with data in, any of the leading cloud platforms without added complexity. Following our support for AWS and then Microsoft Azure, Pentaho 8.1 now supports Google Cloud Platform.
By supporting Google Cloud, Pentaho 8.1 takes a significant step toward helping customers with their multi-cloud strategies, giving them even more choice of public cloud vendor for their data management.
Pentaho 8.1 also delivers new capabilities that directly and indirectly support multi-cloud data strategies. With Pentaho, for example, you can:
- Visually manage data across multi-cloud storage environments, now including Google Cloud Storage (see Figure 1)
- Bulk-load data into Google BigQuery (see Figure 4)
- Visualize and analyze data in Google BigQuery
- Elastically deploy Pentaho in the cloud to scale up and down based on workload
- Use Spark in the cloud (AWS EMR) for visual data processing
- Upload and download data files to and from Google Drive
Figure 1: Job spanning on-premise to multi-cloud
Each cloud platform offers its own services, but data integration platforms like Pentaho also need to support a set of common components, like those shown in Figure 2. What further differentiates us from the vendors’ own data integration tools is our flexible deployment architecture. You can use Pentaho to access and process data where it lives, whether that is in the cloud or on premises, and whether in AWS, Azure, or Google Cloud Platform, rather than moving data around, which reduces latency.
Figure 2: Job spanning on-premise to multi-cloud
Pentaho can now also move files in any data format from on premises to one cloud vendor and then on to another, thanks to its seamless integration of different cloud storage technologies through VFS (see Figure 3). Pentaho encapsulates security and other integration details, making it easy to load data into the appropriate cloud data management or warehouse services with both new and existing capabilities.
Figure 3: Data Loading to Google Cloud Storage
Figure 4: Data loading to Google BigQuery
Once loaded into cloud data warehouses, data can be consumed both by data pipelines running in Pentaho Data Integration and directly by data analysts using Pentaho Business Analytics. With all these cloud data sources and our data management services, we can support end-to-end ETL and analytics solutions and help solve even more problems.
With the emergence of multi-cloud IT deployments, data professionals need to work with data they understand and trust, and now more than ever they need a platform that harmonizes data through transformation processes across different cloud and on-premises environments. Data integration platforms like Pentaho have an enormous role to play for these enterprises and for our cloud future. Pentaho’s multi-cloud capabilities squarely address this enterprise need, especially with the new capabilities introduced in the 8.1 release.