Hospital Readmissions: Decision Trees
Project
58 min.
Intermediate
zipped_data/hospital_readmissions.csv.zip
Build a decision tree and random forest model to determine whether or not a patient will get readmitted based on clinical data.

Decision Trees


Introduction

Alright, let's move into building our models. We are going to start with decision trees and then make the very logical transition into random forests.


The nice part about decision trees is that you already know how they work, you just don't know you know! A decision tree is an algorithmic approach to arriving at a final decision using if/else statements. Imagine you're trying to diagnose an illness based on a set of symptoms. You might start with a symptom, say, "Is there a fever?" If yes, you then ask if the patient has a cough. Depending on that answer, you continue down a path of inquiry until you reach a probable diagnosis. This process of decision-making, based on asking a series of questions, is the essence of a decision tree in data science.



At each "node" of the tree, the algorithm evaluates a feature and decides on which path to take based on the outcome. This continues until we reach a "leaf" node, which provides the final decision (or prediction).


While the tree metaphor might seem simplistic, decision trees are powerful tools and form the foundation for more complex algorithms (such as random forests and XGBoost). They are also highly transparent and can be visualized and interpreted easily. This makes decision trees particularly appealing for scenarios where understanding the reasoning behind predictions is highly important, such as medicine!



It is important to keep in mind that the same principles of machine learning and data science apply here. We will train, test, deploy, avoid data leakage, and use familiar metrics such as precision, recall, and area under the ROC curve (AUROC) to evaluate our models. There will be some new metrics as well that are more specific to decision trees, but don't lose sight of the big picture (or... don't lose the forest for the trees...).

First let's import the data!
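Below is a minimal sketch of what that import cell might look like; it assumes the zip archive contains a single CSV file, which pandas can read directly.

```python
import pandas as pd

# pandas can read a CSV straight out of a .zip archive that holds one file
df = pd.read_csv("zipped_data/hospital_readmissions.csv.zip")

print(df.shape)
df.head()
```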

This is the same preprocessing code from the earlier data exploration section. Make sure you run this cell, even though it is a repeat of earlier content. If you would like to try different preprocessing techniques, however, you can do so here.
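As a rough, hypothetical stand-in for that cell (the specific steps and column values here are assumptions, not the original code), a common approach is to map the yes/no target to 1/0 and one-hot encode the remaining categorical columns:

```python
# Hypothetical stand-in for the earlier preprocessing cell.
# Assumes the target column 'readmitted' holds "yes"/"no" strings and that the
# remaining object-dtype columns are categorical.
df["readmitted"] = df["readmitted"].map({"no": 0, "yes": 1})

categorical_cols = df.select_dtypes(include="object").columns
df = pd.get_dummies(df, columns=categorical_cols, drop_first=True)
```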

Build Decision Trees

Let's start with just making a decision tree using sklearn so you can see what they are all about and observe the building block of what will become the random forest.


In this lesson, we will use all of our data to train and test the model. Because decision trees can tell you which features they used in their decision-making process, you will oftentimes start with all of your features when using decision trees and random forests.


We will start by using pandas to create our predictor features (all of the columns minus readmitted) and our target feature (readmitted). We will then split the data into training and testing sets, as per usual.
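Assuming the preprocessed DataFrame is called df, that cell looks something like this:

```python
from sklearn.model_selection import train_test_split

# Predictors: every column except the target; target: readmitted
X = df.drop(columns=["readmitted"])
y = df["readmitted"]

# 80/20 train/test split (random_state fixed only for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```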

Now we will train a decision tree classifier. The goal here is not to build the optimal decision tree, but rather just to illustrate practically how to make one. Let's briefly discuss the 'under the hood' mechanics of how decision tree models actually make their decisions. It is important to understand this principle well, as it is the foundation for so many other machine learning models.


Gini Impurity

We know that decision trees start at the top of the tree, or root node. This node is determined by the feature whose split gives the lowest 'Gini impurity' (information gain is another metric you can use, but we will use Gini here).


For a binary classification problem like ours, Gini impurity is a number between 0 and 0.5 that gives the likelihood that a randomly chosen sample from a node would be misclassified if it were labeled at random according to the class distribution at that node. For example, if we introduced a new patient into the dataset and tried to predict readmission at a node with a Gini impurity of 0.5, we would essentially be flipping a coin.



The goal of the decision tree in training is thus to minimize the Gini impurity. It does this by starting at the root and traversing down the tree, making decisions at each node. Once it reaches a leaf node (or the final output), it returns a predicted class based on the maximum number of samples at that particular leaf.


The process of deciding where to split a node involves calculating the Gini impurity for every potential split of the data, and the one that results in the largest decrease in impurity is chosen.


In a nutshell, the model will iteratively try to make a split based on every feature at each node in the tree using the Gini impurity metric as the deciding factor. Here is a great video that discusses how Gini impurity is calculated.
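To make the idea concrete, here is a small, illustrative sketch of how the impurity of a node, and of a candidate split, can be computed by hand (the function names are my own; scikit-learn does all of this internally):

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum over classes of p_k squared."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_gini(left_labels, right_labels):
    """Weighted impurity of a candidate split, used to compare possible splits."""
    n_left, n_right = len(left_labels), len(right_labels)
    n = n_left + n_right
    return (n_left / n) * gini_impurity(left_labels) + (n_right / n) * gini_impurity(right_labels)

print(gini_impurity([0, 1, 0, 1]))                # 0.5 -> perfectly mixed node, a coin flip
print(gini_impurity([1, 1, 1, 1]))                # 0.0 -> pure node
print(weighted_gini([0, 0, 0, 1], [1, 1, 1, 0]))  # 0.375 -> this split reduces impurity
```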


Decision Tree Hyperparameters

There are three main hyperparameters that you will want to tune for decision trees. You can do this manually or via automated methods, which we will discuss in the random forests section, but it is important to at least have an idea of what these hyperparameters mean (a short sketch of how they are set in scikit-learn follows the list):


  1. Max Depth: This parameter sets the maximum depth of the decision tree, controlling how deep the tree can go. A deeper tree can capture more complex patterns but risks overfitting by capturing noise. Conversely, a shallow tree may be too simplistic, failing to capture important patterns in the data. Balancing this parameter helps in managing the trade-off between underfitting and overfitting.
  2. Min Samples Split: This parameter specifies the minimum number of samples required to split an internal node. A higher value prevents the tree from making splits based on small sample sizes, which can capture noise rather than meaningful patterns. This helps ensure that splits are made only when there is sufficient data, contributing to better decisions.
  3. Min Samples Leaf: This parameter sets the minimum number of samples required to be at a leaf node. By enforcing a minimum number of samples per leaf, you reduce the risk of leaves making decisions based on a very small subset of the data, which can be unreliable. This helps in smoothing the model and making it less sensitive to noise.
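As a quick illustration, here is how those three hyperparameters are passed to scikit-learn's DecisionTreeClassifier (the specific values below are arbitrary placeholders, not recommendations):

```python
from sklearn.tree import DecisionTreeClassifier

# Illustrative values only; good settings depend on your data and are usually
# found by tuning rather than chosen up front.
tree_clf = DecisionTreeClassifier(
    max_depth=5,           # cap how deep the tree can grow
    min_samples_split=50,  # need at least 50 samples at a node to split it
    min_samples_leaf=20,   # every leaf must contain at least 20 samples
    random_state=42,
)
```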



OK! Let's now do it. As always, notice that the code to do these things is easy. It really is the conceptual understanding that is important. First, let's make a Decision Tree with a max_depth of 3. We will also fit the Decision Tree with the training data and make some predictions using the test set.
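A sketch of that cell (the variable name dt is my own choice):

```python
from sklearn.tree import DecisionTreeClassifier

# Build a shallow tree, fit it on the training data, and predict on the test set
dt = DecisionTreeClassifier(max_depth=3, random_state=42)
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)
```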

Now we can get the evaluation metrics from our decision tree. To get the AUROC, we have to get the probabilities using the predict_proba method from scikit-learn. predict_proba returns the probability estimate for each class instead of a vector of 1s and 0s denoting the class label.



Once we have that, we can use the probabilities to calculate the AUROC and the binary predictions to build the classification report, using the built-in methods from sklearn.
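Roughly, that cell looks like this (assuming the fitted tree is called dt, as above):

```python
from sklearn.metrics import classification_report, roc_auc_score

# predict_proba returns one column per class; keep the probability of class 1 (readmitted)
y_prob = dt.predict_proba(X_test)[:, 1]

print(classification_report(y_test, y_pred))
print("AUROC:", roc_auc_score(y_test, y_prob))
```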

Even though we have talked about the interpretation of precision and recall values ad nauseam before on Code Grand Rounds, it is still very important to look at these and know what's going on, so here's a detailed breakdown:


Classification Report

  1. Precision for Class 0: Out of all the instances predicted as not readmitted, 61% were correct.
  2. Precision for Class 1: For those predicted as readmitted, 60% were correct. This suggests that a notable number of predictions for this class might be false positives.
  3. Recall for Class 0: Among the truly not readmitted instances, the model correctly identified 71%. This implies that the model missed roughly 29% of them, incorrectly classifying them as readmitted.
  4. Recall for Class 1: For the actual readmitted instances, the model managed to identify 49% correctly. This indicates room for improvement in detecting readmissions, given the relatively low recall for this class.
  5. Accuracy: The overall prediction accuracy of the model is 61%. This means that for every 100 predictions, the model got 61 of them right.


AUROC

  1. The AUROC score was 0.6274. While this is better than chance (0.5), it signifies that the model has only a moderate capability to differentiate between patients who will be readmitted and those who will not.


Reading Decision Trees

One of the nice things about decision trees is that they are very interpretable, meaning we can directly visualize the branches the model made. The buzzword for this is Explainable AI. Let's plot the decision tree with matplotlib.
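A sketch of the plotting cell, using scikit-learn's plot_tree together with matplotlib (the figure size and class labels are my own choices):

```python
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plt.figure(figsize=(20, 10))
plot_tree(
    dt,
    feature_names=list(X.columns),                # label splits with the column names
    class_names=["not readmitted", "readmitted"],
    filled=True,                                  # color nodes by majority class
)
plt.show()
```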

This can admittedly be a lot when you look at it for the first time, so let's break it down, starting from the root (top) node.


  1. We see the first feature was n_inpatient (number of inpatient visits), with a Gini impurity of 0.498. There are 20,000 samples at this level of the tree, which makes sense because we did an 80/20 train/test split with 25,000 total observations.
  2. Of these 20,000, 10,588 were not readmitted and 9,412 were readmitted. The root node is saying: 'If you have 0 inpatient visits (n_inpatient < 0.5), move to the left, into the not readmitted branch.'
  3. On the other hand, if you have 1 or more inpatient visits (n_inpatient > 0.5), move to the right of the tree, to the readmitted side.


The logic is the same for all the nodes at the other branches. After the initial population has been funneled from the root, on the left branch, if a person had 0 outpatient visits they go to the left (not readmitted), and if they had 1 or more outpatient visits they go to the right (readmitted, with a very high Gini impurity that is basically a coin toss).


By looking at the right side of the plot, we can see that the model clearly treated a higher number of inpatient visits in the past year as a very important predictor of being readmitted, which makes sense. On the left side, the model treated having fewer outpatient visits and not taking any diabetes medications as important predictors of not being readmitted, which also makes sense.


So while these kinds of trees can be a lot at first glance, when you really start to break them down they are actually quite intuitive and easy to read. The ability to see which features are important within your dataset is also a very important aspect of decision trees. You can use a decision tree to aid in feature selection, similarly to how we used recursive feature elimination in our Logistic Regression lectures.
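For instance, the fitted tree exposes impurity-based feature importances that you could inspect when narrowing down features (a small sketch using the tree trained above):

```python
import pandas as pd

# Impurity-based importances from the fitted tree, largest first
importances = pd.Series(dt.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```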


Moving On

The nice thing about decision trees is that they can easily be made more powerful by using multiple decision trees at once in the same model. We will see you in the next lesson on Random Forests, where we will show you how this is done.