Last summer, one of my healthcare clients asked my team if there was a better way to project the costs of surgery. Patients who experience complications go through supplementary care during recovery, increasing the overall cost of care. In healthcare, the set of services to treat a clinical condition from start to finish is defined as an episode of care.
Everyone from providers to patients desires the best outcome in an episode of care. However, outcomes from major resections of vital organs or joint replacements can be unpredictable. Successful surgery depends on many factors, such as the patient’s condition before and during surgery.
Machine learning is a strong candidate for problems like this, providing insights on outcomes that depend on many interacting factors. The ability to predict outcomes from surgery can improve patient experiences and allow practices to better manage the costs of care.
The machine learning solution for this article was developed with Plugin Machine Intelligence (PMI) for PDI and uses publicly available surgery data from the University of California, Irvine’s data archive. In the dataset, one of the variables flags whether or not a patient survived beyond one year after surgery. With this in hand, I instructed the machine learning algorithms to predict survivability. The resulting solution features a family of tree algorithms whose structure visualizes how the machine came to a certain prediction.
A map of the decision tree is illustrated in Fig. 1. Navigating the map is straightforward. Each node (depicted as a brown circle) represents a medical condition that the algorithm determined was important when it classified patients. Each path ends with a leaf (depicted as a green rectangle) that represents one of the two classifications: patients who are predicted to survive and those who are not.
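To make the node-and-leaf idea concrete, here is a minimal pure-Python sketch of how a tree like this routes a single patient record to a prediction. The feature names, thresholds, and structure below are hypothetical illustrations, not the actual tree PMI produced:

```python
# Toy sketch of how a decision tree routes a patient record to a leaf.
# Feature names and thresholds are hypothetical, for illustration only.

def classify(patient):
    """Walk a hand-built tree: each node tests one medical condition,
    each leaf returns a predicted outcome."""
    if patient["fvc"] < 2.5:          # node: low forced vital capacity
        if patient["diabetes"]:       # node: comorbidity check
            return "did not survive"  # leaf
        return "survived"             # leaf
    if patient["tnm_size"] >= 3:      # node: large original tumor
        return "did not survive"      # leaf
    return "survived"                 # leaf

# A patient with low FVC and diabetes follows the leftmost path:
print(classify({"fvc": 2.1, "diabetes": True, "tnm_size": 1}))
# -> did not survive
```

Reading a prediction off the tree is just following one such chain of if-tests from the root to a leaf, which is why tree models are considered easy to interpret.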
One path in the tree is quite interesting (Fig. 3). The algorithm is able to predict accurately even when the medical conditions seem fairly benign compared to other paths.
The algorithm correctly predicted that three patients would not survive the year following their operations. This set of patients had diabetes and experienced weakness before surgery, which elevated their risk. Other serious conditions such as hemoptysis and dyspnea were not observed. Hypothetically, surgeons may have looked at the conditions of these patients and determined that the risks were low enough to proceed with surgery. They did not have an objective way to weigh how different factors contributed to the overall risk of each patient.
By studying the data, the model determined that certain conditions are particularly good at inferring the outcomes. In machine learning terminology, this is called feature importance. Forced vital capacity (FVC) and TNM are the features that appear most often in the decision tree. Forced vital capacity is the maximum volume of air a person is able to exhale. This is one of the metrics doctors use to diagnose patients and determine the severity of respiratory illnesses. TNM codes classify the size of the original tumor observed in cancer patients. The two features appear most frequently because they are closely linked to the severity of the illness.
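A back-of-the-envelope way to see why some features dominate a tree: decision tree learners typically rank candidate splits by how much they reduce uncertainty (information gain). The sketch below computes this on a tiny fabricated sample, where one feature tracks the outcome closely and another does not; the records and feature names are invented for illustration:

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(rows, labels, feature):
    """Entropy reduction from splitting the rows on a boolean feature."""
    yes = [l for r, l in zip(rows, labels) if r[feature]]
    no = [l for r, l in zip(rows, labels) if not r[feature]]
    n = len(labels)
    split = (len(yes) / n) * entropy(yes) + (len(no) / n) * entropy(no)
    return entropy(labels) - split

# Fabricated toy records: low_fvc tracks the outcome exactly, smoker does not.
rows = [
    {"low_fvc": True,  "smoker": True},
    {"low_fvc": True,  "smoker": False},
    {"low_fvc": False, "smoker": True},
    {"low_fvc": False, "smoker": False},
]
labels = ["died", "died", "survived", "survived"]

gains = {f: information_gain(rows, labels, f) for f in ("low_fvc", "smoker")}
print(gains)  # low_fvc gains a full bit; smoker gains nothing
```

Features with consistently high gain, like FVC and TNM in the article's model, get chosen for splits over and over, which is exactly what "feature importance" summarizes.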
One strength of machine learning is its ability to learn, adjusting the weight of each condition as new data streams in. Here is how the decision tree algorithm adapted when given a different set of data.
The algorithm is able to learn on its own as the underlying data changes. The hierarchy of the decision tree changed, with diagnosis appearing as the root node. Compared to before, a new feature, FEV1, plays a major role in classifying patients. FEV1 is similar to FVC but measures the maximum volume of air a patient can exhale in one second. The two metrics are used in conjunction for diagnosis. The algorithm adapts in order to maintain its predictive capabilities.
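A crude sketch of this adaptation: when the tree is re-learned, the root is simply whichever feature best separates the outcomes in the current data, so a shift in the data can promote a different feature to the root. The records, feature names, and error-count criterion below are fabricated simplifications, not the actual PMI model:

```python
# Toy sketch: the root feature is re-chosen whenever the data changes.
# Records, feature names, and the split criterion are fabricated.

def misclassified(rows, feature):
    """Errors left after splitting on a boolean feature and predicting
    the majority outcome within each branch."""
    errors = 0
    for branch in (True, False):
        outcomes = [r["died"] for r in rows if r[feature] == branch]
        if outcomes:
            errors += min(outcomes.count(True), outcomes.count(False))
    return errors

def best_root(rows, features):
    """Feature whose split leaves the fewest misclassifications."""
    return min(features, key=lambda f: misclassified(rows, f))

features = ("low_fvc", "low_fev1")
batch1 = [{"low_fvc": True,  "low_fev1": True,  "died": True},
          {"low_fvc": False, "low_fev1": False, "died": False},
          {"low_fvc": False, "low_fev1": True,  "died": False}]
batch2 = [{"low_fvc": True,  "low_fev1": True,  "died": True},
          {"low_fvc": True,  "low_fev1": False, "died": False},
          {"low_fvc": False, "low_fev1": False, "died": False}]

print(best_root(batch1, features), best_root(batch2, features))
```

In the first batch the FVC-style feature separates outcomes cleanly and becomes the root; in the second batch the FEV1-style feature does, mirroring how the article's tree reorganized itself around new data.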
Hitachi Vantara Labs recently unveiled Plugin Machine Intelligence (PMI) for PDI which vastly accelerates development of machine learning models. From a data scientist’s point of view, what makes this solution unique is that the complete stack was developed with very minimal coding.
Before the announcement of PMI, I had been developing the machine learning solution via traditional methods, writing many lines of code. Careful code management was required so that if issues arose, my colleagues or I could address them. All of these nuances are taken care of by PMI under the hood. The complexity of managing machine learning models and the benefits of PMI are explained in Mark Hall and Ken Wood’s article “4-Steps to Machine Learning Model Management”.
From a business analyst’s point of view, PMI makes machine learning a natural addition to one’s analytical toolkit. Analysts often have deep insight into their domains. Once they are familiar with how to articulate the kinds of questions machine learning is designed to answer, analysts can produce powerful insights by employing machine intelligence models. This is because PDI + PMI are fundamentally visual tools: drag-and-drop steps to manage data and machine learning models.
Fig. 7 Neural Network Editor in PMI: Artificial neural network used in gradient boosted tree algorithm
The synergy produced by these technologies is explained in Hu Yoshida’s article, “Orchestrating Machine Learning Models and Improving Business Outcomes”. Yoshida notes, “tools can be used in a data pipeline built in Pentaho to help improve business outcomes and reduce risk by making it easier to update models in response to continual change. Improved transparency gives people inside organizations better insights and confidence in their algorithms.” I can attest to this from my experience working on the platform.
The PMI toolkit allows people to explore the capabilities of machine learning and see their relevance to solving specific business problems. With PMI, machine learning is no longer a mysterious black box. Machine intelligence is now available for everyone.