Eliminating Machine Learning Model Management Complexity
Last year, in 4-Steps to Machine Learning with Pentaho, we looked at how the Pentaho Data Integration (PDI) product provides an ideal platform for operationalizing machine learning pipelines – i.e. processes that ingest raw source data and take it through a series of transformations culminating in actionable predictions from predictive machine learning models. The enterprise-grade features in PDI provide a robust and maintainable way to encode the tedious data preparation and feature engineering tasks that data scientists often write (and re-write) code for, accelerating the deployment of machine learning pipelines and models.
“According to our research, two-thirds of organizations do not have an automated
process to update their predictive analytics models seamlessly. As a result, less than
one-quarter of machine learning models are updated daily, approximately one third
are updated weekly, and just over half are updated monthly. Out of date models can
create a significant risk to organizations.”
- David Menninger, SVP & Research Director, Ventana Research
It is well known that, once operationalized, machine learning models need to be updated periodically to account for changes in the underlying distribution of the data they are used to predict on. That is, model predictions can become less accurate over time as the nature of the data changes. How frequently models need updating is application dependent, and can itself change over time. This necessitates the ability to automatically monitor the performance of models and, if necessary, swap the current model for a better-performing alternative. There should also be facilities for applying business rules that can trigger the rebuilding of all models, or manual intervention, if performance drops dramatically across the board. These activities fall under the umbrella of what is referred to as model management. In the original diagram for the 4-Steps to Machine Learning with Pentaho blog, the last step was entitled “Update Models.” Here we expand that step to detail the underlying activities needed to manage models automatically, and relabel it "Machine Learning Model Management" (MLMM). The MLMM step comprises the 4-Steps to Machine Learning Model Management – Monitor, Evaluate, Compare and Rebuild all Models – to cover what we are describing here. The expanded concept now looks like this diagram.
The 4-Steps to Machine Learning Model Management, as highlighted, are Monitor, Evaluate, Compare and Rebuild. Each of these steps implements a phase of a "Champion / Challenger" strategy. Applied to machine learning, the idea of a Champion / Challenger strategy is to compare two or more models against each other and promote the one that performs best. There can be only one Champion – in our case, the model that is currently deployed – and there can be one or more Challengers – in our case, other models that are trained differently, use different algorithms and so forth, but all run against the same dataset. The implementation of the Champion / Challenger strategy for MLMM goes like this:
- Monitor - constant monitoring of all of the models is needed to track the predictive accuracy of each model in the Champion / Challenger strategy. Detecting degraded model performance can actually be a positive signal for your business strategy: it indicates that the characteristics of the underlying data have changed, often because the behavior your models were designed to influence has in fact changed. In our retail fraud prediction scenario, the degradation of the current Champion's performance is due to a change in the nature of the data since the model was trained. The predictions worked and prevented further fraudulent transactions, so new fraud techniques are being used that the current Champion was never trained to predict.
- Evaluate - an evaluation of the current Champion needs to be performed to produce metrics of the model's current accuracy. This evaluation yields performance metrics for the current situation, in both visual and programmatic form, that can be used to determine what is happening. Based on business rules, if accuracy has dropped below a defined threshold, this event can trigger notifications of the degraded performance or initiate automated remediation. In our retail fraud prediction scenario, since the characteristics of the data have changed, the Champion's accuracy has degraded. The evaluation metrics can be used to determine whether model retraining, tuning and/or a new algorithm is needed. All models in the Champion / Challenger strategy should be evaluated against the same data to ensure a fair comparison.
- Compare - by comparing the performance metrics of all the models from the evaluation step, the Champion and the Challengers can be ranked to determine which model performs best at this time. Since the Champion and all the Challengers were most likely built and trained against the initial state of the data, these models will need to be rebuilt.
- Rebuild - by rebuilding (retraining) all the models against the current state of the data, the best performing model on the current data can be promoted to Champion. The new Champion can then be hot-swapped into the environment – deployed or redeployed – using a PDI transformation to orchestrate the action.
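The four steps above can be sketched as a simple control loop. The sketch below is a hypothetical, engine-agnostic outline only – the model names, AUC scores, and the `evaluate`/`rebuild` callables are illustrative stand-ins for what the PDI/PMI steps would actually supply:

```python
# Hypothetical sketch of the Monitor / Evaluate / Compare / Rebuild loop.
# Model names, scores and the evaluate/rebuild callables are stand-ins
# for the PDI/PMI steps that would do the real work.

def manage_models(models, evaluate, rebuild):
    """models: dict of role name -> model, where 'champion' is the deployed one."""
    # Monitor + Evaluate: score every model on the same current dataset
    scores = {name: evaluate(model) for name, model in models.items()}
    # Compare: rank the champion and challengers by their current score
    best = max(scores, key=scores.get)
    # Rebuild: retrain everything on current data, then promote the winner
    rebuilt = {name: rebuild(model) for name, model in models.items()}
    return best, rebuilt[best], scores

# Toy demonstration: static numbers stand in for AUC measured on fresh data
models = {"champion": "rules_model", "challenger_1": "tree_model",
          "challenger_2": "logistic_model"}
auc = {"rules_model": 0.71, "tree_model": 0.78, "logistic_model": 0.69}
winner, promoted, scores = manage_models(models, evaluate=lambda m: auc[m],
                                         rebuild=lambda m: m)
print(winner)  # challenger_1 outperforms the current champion
```

In a real deployment, `evaluate` would score each model against the same current dataset and `rebuild` would retrain it, both orchestrated by PDI transformations.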
The 4-Steps to Machine Learning Model Management form a continuous process, usually scheduled to run on a periodic basis. This blog describes how to implement a Champion / Challenger strategy using PDI as both the machine learning platform and the model management orchestrator.
The new functionality that brings this set of supervised machine learning capabilities and model management enablers to PDI is called Plug-in Machine Intelligence (PMI). PMI provides a suite of PDI steps that give direct access to various supervised machine learning algorithms, which can be designed directly into your PDI data flow transformations with no coding. Users can download the PMI plugin from the Hitachi Vantara Marketplace or directly from the Marketplace feature in PDI (automatic download and install). The motivation for PMI is:
- To make machine learning easier to use by combining it with our data integration tool as a suite of easy-to-consume steps that do not require writing code, and that guide the developer through their usage. These supervised machine learning steps work “out-of-the-box” by applying a number of “under-the-cover” pre-processing operations and algorithm-specific "last-mile data prep" to the incoming dataset. Default settings work well for many applications, while advanced settings remain available for the power user and data scientist.
- To combine machine learning and data integration in one tool/platform. This powerful coupling allows the PMI steps to receive row data as seamlessly as any other step in PDI. No more jumping between multiple tools with inconsistent data passing methods, or complex and tedious performance evaluation manipulation.
- To be extensible. PMI provides access to 12 supervised classifiers and regressors “out-of-the-box”. The majority of these 12 algorithms are available in each of the four underlying execution engines that PMI currently supports: WEKA, Python scikit-learn, R MLR and Spark MLlib. New algorithms and execution engines can easily be added to the PMI framework via its dynamic step generation feature.
A more detailed introduction of the Plug-in Machine Intelligence plug-in can be found in this accompanying blog.
PMI also provides a unified evaluation framework: the ability to output a comprehensive set of performance evaluation metrics that can be used to facilitate model management. We call it unified because data shuffling, splitting and the computation of evaluation metrics are performed in the same way regardless of which underlying execution engine is used. Again, no coding is required, which translates into significant savings in time and effort for the practitioner. Evaluation metrics computed by PMI for supervised learning include: percent correct, F-measure, area under the ROC curve (AUC), area under the precision-recall curve (AUPRC), and the root mean squared error (RMSE) and mean absolute error (MAE) of the class probability estimates in the case of classification problems. Such metrics provide the input to model management mechanisms that can decide whether a given “challenger” model (maintained in parallel to the current “champion”) should be deployed, whether the champion and all challengers should be re-built on current historical data, or whether something fundamental has changed in the system and manual intervention is needed to diagnose data processing problems or to investigate new models/parameter settings. It is this unified evaluation framework that enables PDI to do model management.
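To make two of these metrics concrete, here is a minimal pure-Python sketch of how percent correct and the MAE/RMSE of class probability estimates are conventionally computed for a binary classification problem. This illustrates the standard formulas only – it is not PMI's internal code, and the small fraud-flag arrays are made-up demonstration data:

```python
import math

def percent_correct(y_true, y_pred):
    # fraction of predictions matching the actual class, as a percentage
    return 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def prob_estimate_errors(y_true, p_positive):
    # error of each class probability estimate: |actual 0/1 label - p(positive)|,
    # which equals 1 - predicted probability of the true class
    errors = [abs(t - p) for t, p in zip(y_true, p_positive)]
    mae = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

# 1 = fraudulent transaction, 0 = legitimate (illustrative values)
actual = [1, 0, 1, 0]
predicted = [1, 0, 0, 0]
prob_fraud = [0.9, 0.2, 0.4, 0.1]

print(percent_correct(actual, predicted))        # 75.0
print(prob_estimate_errors(actual, prob_fraud))  # MAE 0.25, RMSE ~0.324
```

Note that RMSE penalizes the one badly miscalibrated estimate (0.4 for an actual fraud) more heavily than MAE does, which is why PMI reports both.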
Implementing MLMM in PDI
The PDI transformations below are also included in the PMI plugin download, complete with the sample datasets.
The following figure shows a PDI transformation for (re)building models and evaluating their performance on the retail fraud application introduced in the 4-Steps to Machine Learning with Pentaho blog. It also shows some of the evaluation metrics produced under a two-thirds training / one-third test split of the data. These statistics can easily be visualized within PDI via the Data Exploration Tool (DET), or the transformation can be used as a data service for driving reports and dashboards in the Business Analytics (BA) server.
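For reference, a two-thirds / one-third holdout split is conceptually just a shuffle followed by a cut. The pure-Python sketch below illustrates the idea; in PMI the shuffling and splitting are configured on the evaluation step rather than hand-coded, and the fixed seed here is only for reproducibility of the illustration:

```python
import random

def two_thirds_split(rows, seed=42):
    # shuffle a copy so the holdout split is not biased by row order
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = (2 * len(rows)) // 3        # two-thirds boundary
    return rows[:cut], rows[cut:]     # (training set, test set)

train, test = two_thirds_split(range(9))
print(len(train), len(test))          # 6 3
```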
The following figure shows a PDI transformation that implements champion/challenger monitoring of model performance. In this example, an evaluation metric of interest (area under the ROC curve) is computed for three static models: the current champion and two challengers. Models are arranged on the file system such that the current champion always resides in one directory and the challenger models in a separate directory. If the best challenger achieves a higher AUC score than the current champion, it is copied to the champion directory. In this way, models can be hot-swapped on the fly in the running environment.
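The directory-based promotion in that transformation can be sketched as follows. This is a hypothetical stand-alone illustration: the directory layout (a single model file in the champion directory) mirrors the setup described above, while the file names and the externally supplied `auc_scores` mapping are made up – in PDI those scores would come from the evaluation step:

```python
import shutil
import tempfile
from pathlib import Path

def promote_if_better(champion_dir, challenger_dir, auc_scores):
    """auc_scores maps each model file name to its AUC on current data."""
    champion_dir = Path(champion_dir)
    champion = next(champion_dir.iterdir())           # the one deployed model
    challengers = list(Path(challenger_dir).iterdir())
    best = max(challengers, key=lambda p: auc_scores[p.name])
    if auc_scores[best.name] > auc_scores[champion.name]:
        champion.unlink()                             # retire the old champion
        shutil.copy2(best, champion_dir / best.name)  # hot-swap challenger in
        return best.name
    return champion.name                              # champion keeps its spot

# Demonstration with throwaway files standing in for serialized models
root = Path(tempfile.mkdtemp())
(root / "champion").mkdir()
(root / "challengers").mkdir()
(root / "champion" / "model_a.bin").write_text("champion model")
(root / "challengers" / "model_b.bin").write_text("challenger model")
deployed = promote_if_better(root / "champion", root / "challengers",
                             {"model_a.bin": 0.71, "model_b.bin": 0.78})
print(deployed)  # model_b.bin - the challenger beat the champion
```

Because the serving environment always loads whatever model file sits in the champion directory, the copy itself is the hot-swap.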
PMI makes it very easy to build processes for model management. Its no-coding access to heterogeneous algorithms and automation of “last mile” algorithm-specific data transformations, combined with the enterprise-grade features of PDI – such as data blending, governance, lineage and versioning – result in a robust platform for addressing the needs of citizen data scientists and modern machine intelligence deployments.
Installation documentation for your specific platform, a developer's guide, and the sample transformations and datasets used in this blog can be found here. The sample transformations and datasets are provided for demonstration and educational purposes.
It is important to point out that this initiative is not formally supported by Hitachi Vantara, and there are currently no plans on the Enterprise Edition roadmap to support PMI. This experimental feature is recommended for testing only and should not be used in production environments. PMI is supported by Hitachi Vantara Labs and the community. Hitachi Vantara Labs was created to formally test new ideas, explore emerging technologies and, as much as possible, share our prototypes with the community and users through the Hitachi Vantara Marketplace. We like to refer to this as "providing early access to advanced capabilities". Our hope is that the community and users of these advanced capabilities will help us improve them and recommend additional use cases. Hitachi Vantara has forward-thinking customers and users, so we hope you will download, install and test this plugin. We would appreciate any and all comments, ideas and opinions.