Champions Corner

Introduction Organizations are working with a lot of data to boost insights and drive critical business decisions. However, they often struggle to create a common data vocabulary, which leads to inconsistent business definitions across the organization's data. A common data vocabulary is a key component of a successful data governance program. Data catalog tools like Lumada Data Catalog use features like Business Glossary to enable companies to develop a common data vocabulary. There are many benefits to building a business glossary, and many reasons why an organization should invest in one. However, building a business glossary ...
In the Southern Apennine region, the District Basin Authority governs the physical environment and protects its water resources. It is responsible for monitoring the appropriate use of these resources, forecasting the region’s water supply, and preventing natural disasters and human-made hazards, such as illegal abstraction, discharges, and spills. With Hitachi Vantara, the Authority is developing a system for sampling data from the field through specific multiparametric sensors, including video, thermographic, lidar, and others. Integrated with a GIS system and meteorological forecasts, that data flows into a big data analytics and data science system that ...
Scope of this document The main purpose of this document is to guide you (our customers) through all the steps required to successfully deploy Pentaho server using Kubernetes. It will: introduce Pentaho options for container-based deployment; provide a definitive guide from zero to deployment stage; and share best practices for approaching your Kubernetes deployment. Introduction Nowadays, the majority of enterprises are deploying their applications in the cloud to benefit from the different *aaS offerings (IaaS, SaaS, and PaaS) due to their flexibility and seamless workload ...
Scope of this document The main purpose of this document is to guide you (our customers) through all the steps required to successfully deploy Pentaho Data Integration using Kubernetes. It will: introduce Pentaho options for container-based deployment; provide a definitive guide from zero to deployment stage; and share best practices for approaching your Kubernetes deployment. Introduction Nowadays, the majority of enterprises are deploying their applications in the cloud to benefit from the different *aaS offerings (IaaS, SaaS, and PaaS) due to their flexibility and seamless ...
by Anand Rao Vala, Principal Product Marketing Manager, Pentaho and Lumada DataOps Software The move to the cloud continues to accelerate at a stunning pace, with more organizations exploring options and migrating their data infrastructure to AWS, Microsoft Azure, and Google Cloud. Deployments of Hadoop in the cloud outstripped on-premise by a factor of five, with DataProc, Google’s implementation of Hadoop in the cloud, growing even faster than that, exceeding 60% in just the last year. The pressure of global events is further pushing the accelerator on these trends. As organizations embrace the many benefits of the cloud, they also quickly discover challenges. ...
Recently I had the opportunity to give a keynote speech during the IIoT World Days Virtual Event on one of my favorite subjects: the economics of data. You can see a recording of this presentation here. Manufacturing organizations are notorious for storing lots and lots of data from the shop floor, quality, suppliers, maintenance and business systems, while only using a small percentage of this to provide real value to the company. Most organizations lack a methodology for determining the economic value of their data and analytics. But data and analytics are unique economic assets that not only never wear out, but can actually appreciate, ...
Written by Gwyn Evans and Elena Salova When building machine learning solutions, data scientists are often following their noses to select appropriate features, modelling techniques and hyperparameters to tackle the problem at hand. In practice, this involves drawing on experience (and code!) from previous projects, incorporating domain knowledge from subject matter experts, and many (many, many, many, many, many….!?!) iterative development cycles. Whether the experimentation phase is fun, excruciatingly tedious or somewhere in between, one thing is for sure; this development process does not lend itself well to fast solution prototyping, ...
Overview A small data analytics company, specializing in retail analytics, collected sales transaction data from one of the world’s largest retailers. The data was collected every hour from each of the 10 stores in one geographical region and stored in relational data marts. Every night the data from the previous day was transformed and processed from all the data marts and pushed to a reporting data warehouse. The team that handled the daily data ingestion used Pentaho to implement and manage the ETL processes. This approach worked so well that the analytics company expanded the contract to include approximately 150 of the retailer's stores across the United ...
Organizations that want to modernize applications and migrate to the cloud are often held back by the difficulty of migrating data from legacy tabular databases to modern NoSQL data platforms like MongoDB. To help organizations speed this process, Hitachi Vantara is working with MongoDB and supporting the announced general availability of its modernization solution, which will allow customers to more easily move data from traditional data platforms like Oracle to MongoDB. This recent announcement from MongoDB includes a Migration Scorecard to evaluate the suitability of MongoDB for both new applications and application migrations. It scores MongoDB and other ...
Organizations seek to run any workload from any location without the burden of re-architecting or refactoring applications, including data integration pipelines. For storage, they want to leverage their existing on-premise Hadoop investments and provide a seamless experience to data consumers when they migrate to the cloud to take advantage of the usability, scalability and elasticity of cloud-native solutions. Also, Hadoop in the cloud, with the complicated setup already taken care of, is ready to be used immediately and on demand. A quick primer on multi-cluster support What is multi-cluster support? The transition of Hadoop from on-premise to cloud ...
In this day and age, with plenty of new technology buzzwords, it's becoming increasingly difficult to keep up with trends. So how do you separate signal from noise? One way is to watch for when a buzzword lands on Gartner's Hype Cycle. If you follow Gartner's Data Management Hype Cycle, you will find buzzwords like DBPaaS, Catalog, Streaming and DataOps. True to their names, they all create a buzz in the market. People focus on how cool they are and often overlook their weaknesses. To demystify “DataOps”, we engaged with Matt Aslett from 451 Research to do a "State of DataOps" survey. We surveyed 300 companies across North America targeting ...
Big data has become a household name over the last decade and Hadoop is synonymous with it. If you have a big data problem, the answer has been Hadoop. As we start the 2020 decade, let us take stock of how Hadoop has changed over the last decade and how to modernize it for the future. The rumors of Hadoop’s demise in the market started over the last few years. In 2017, O'Reilly's Strata Hadoop World conference changed its name to Strata Data to expand beyond Hadoop, signaling market fatigue with Hadoop. In 2018, Hortonworks joined forces with its arch-rival Cloudera to create a single company which took ...

PMI v1.5 is Here!

Plugin Machine Intelligence Version 1.5 is Available! The Plugin Machine Intelligence version 1.5 is now available for download and use from the Pentaho Marketplace from Hitachi Vantara Labs. Version 1.5 includes several key features and add-ons, including: TensorFlow support for Deep Learning and other Machine Learning algorithms; Transfer learning; eXtreme Boosting Classifier and Regressor algorithm; Spark MLlib 2.4; and, as part of the PMI data science suite, PMI Visuals for data exploration. You can install the PMI v1.5 plugin, as well as the PMI Visualization plugin (see this blog for more details), directly from the Marketplace ...
PMI Visualization – 3D Exploration and Scatter Plot Matrix There is a new Plugin Machine Intelligence (PMI) plugin available from the Pentaho Marketplace for data science data exploration called “PMI Visualization” from Hitachi Vantara Labs. This is a separate plugin from the core PMI plugin for machine learning, but is part of the PMI suite of data science and machine learning tools. This PMI feature uses a Spoon perspective to visualize your data from just about any step in your transformation. It provides a 3D viewing perspective of your data and what I call “flying through your data”, and a Scatter Plot Matrix of every data point pairing of your ...
by Arik Pelkey, Sr. Director, Product Marketing “Can you please just send me the data?” Last September, our chairman, Toshiaki Tokunaga, issued a “Better Together” blog reiterating Hitachi Vantara’s vision of helping our customers extract more value from their data and improve their businesses, while powering good for our society. It sounds trite, but to become a more data-driven enterprise and society, analytics data pipelines are often the name of the game. This is where the latest release of our Pentaho suite, Pentaho 9.0, comes in. Pentaho 9.0 is an exciting and shining example of how ...
Hidden Results from Machine Learning Models – Blog 2 of 2 As mentioned in the first blog of this 2-part series, there are hidden results that need to be interpreted and extrapolated from the initial or assumed prediction of your results. The previous blog showed that a binary class can have 3 results: YES, NO, and a state that should be interpreted as UNKNOWN, regardless of the Class max prob answer. This is an area of research called "learning to abstain from making a prediction" which is closely related to "active learning machine learning" where, in a semi-supervised scenario, a classifier can identify which cases/examples ...