Python Correlation Heatmap Tutorial
If you are reading this blog, I am sure you have already seen heatmaps. They are beautiful, yet they reveal just about as much as they conceal. When done right, they are easily readable. When not, they are still great to look at, just not as functional. In this post, we are going to take a look at one of the many great uses of heatmaps, the correlation heatmap. Correlation matrices are an essential tool of exploratory data analysis.
Correlation heatmaps contain the same information in a visually appealing way. What's more, they show at a glance which variables are correlated, to what degree, and in which direction, and they alert us to potential multicollinearity problems. Let’s see how we can work with Seaborn in Python to create a basic correlation heatmap. For our purposes, we are going to use the Ames housing dataset available on Kaggle.com. This dataset contains over 30 features that potentially affect the variance in sales price, our y-variable. Since Seaborn is built on the Matplotlib data visualization library and it is often easier to use the two in combination, besides the usual imports we are going to import matplotlib.pyplot as well. The following code creates the correlation
matrix between all the features we are examining and our y-variable.

    dataframe.corr()

Basic Seaborn Heatmap

    sns.heatmap(dataframe.corr());

About as pretty as useless.

Seaborn is easy to use, hard to navigate. It comes with a flood of inbuilt features, and excessive documentation. It can be hard to figure out exactly which arguments to use if you do not want all the bells and whistles. Let’s make our basic heatmap functional with as little effort as possible. Take a look at the list of the Seaborn heatmap arguments:
    # Increase the size of the heatmap.
    plt.figure(figsize=(16, 6))
    # Store heatmap object in a variable to easily access it when you want to include more features (such as title).

A diverging color palette that has markedly different colors at the two ends of the value range, with a pale, almost colorless midpoint, works much better with correlation heatmaps than the default colormap. While illustrating this statement, let’s add one more little detail: how to save a heatmap to a png file with all the x- and y-labels (xticklabels and yticklabels) visible.

    plt.figure(figsize=(16, 6))

Stronger correlation on both ends of the spectrum pops out in darker shades, weaker correlation in lighter ones.

Triangle Correlation Heatmap

Take a look at any of the correlation heatmaps above. If you cut away half of it along the diagonal line marked by 1s, you would not lose any information. Let’s cut the heatmap in half, then, and keep only the lower triangle.
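Before halving the heatmap, here is a minimal sketch of the diverging-palette heatmap and PNG export described above. It is only an illustration under a few assumptions: the data is synthetic rather than the Ames dataset, the column names are made up, and the output file name heatmap.png, the dpi value, and the Agg backend are my own choices. The bbox_inches='tight' argument is what keeps the tick labels from being cropped in the saved file.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the housing data (hypothetical column names).
rng = np.random.default_rng(0)
area = rng.normal(1500, 300, 200)
age = rng.normal(40, 15, 200)
price = 100 * area - 500 * age + rng.normal(0, 20000, 200)
dataframe = pd.DataFrame({"Living Area": area, "Age": age, "Sale Price": price})

corr = dataframe.corr()

plt.figure(figsize=(16, 6))
# vmin/vmax pin the color scale to the full correlation range, so the pale
# midpoint of the diverging 'BrBG' palette sits at 0.
heatmap = sns.heatmap(corr, vmin=-1, vmax=1, annot=True, cmap="BrBG")
heatmap.set_title("Correlation Heatmap", fontdict={"fontsize": 18}, pad=12)

# bbox_inches='tight' expands the saved area to include all tick labels.
out_path = Path("heatmap.png")  # assumed file name
plt.savefig(out_path, dpi=300, bbox_inches="tight")
plt.close()
```

With real data you would replace the synthetic dataframe with your own; if it contains non-numeric columns, recent pandas versions may need dataframe.corr(numeric_only=True).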
Let’s use the np.triu() numpy function to isolate the upper triangle of a matrix while turning all the values in the lower triangle into 0. (The np.tril() function would do the same, only for the lower triangle.) The np.ones_like() function builds a matrix of the same shape filled with 1s, so after np.triu() the upper triangle (diagonal included) holds 1 and everything below it holds 0. With a boolean dtype, these become True and False, which is exactly what the mask argument of the heatmap expects.

    np.triu(np.ones_like(dataframe.corr()))

    plt.figure(figsize=(16, 6))
    # Define the mask to set the values in the upper triangle to True.
    mask = np.triu(np.ones_like(dataframe.corr(), dtype=bool))
    heatmap = sns.heatmap(dataframe.corr(), mask=mask, vmin=-1, vmax=1, annot=True, cmap='BrBG')
    heatmap.set_title('Triangle Correlation Heatmap', fontdict={'fontsize':18}, pad=16);

Correlation of Independent Variables with the Dependent Variable

Often, however, what we want to create is a colored map that shows the strength of the correlation between every independent variable that we want to include in our model and the dependent variable. The following code returns the correlation of all features with ‘Sale Price’, a single dependent variable, sorted by ‘Sale Price’ in descending order.

    dataframe.corr()[['Sale Price']].sort_values(by='Sale Price', ascending=False)

Let’s use it as the data in our heatmap.

    plt.figure(figsize=(8, 12))
    heatmap = sns.heatmap(dataframe.corr()[['Sale Price']].sort_values(by='Sale Price', ascending=False), vmin=-1, vmax=1, annot=True, cmap='BrBG')
    heatmap.set_title('Features Correlating with Sales Price', fontdict={'fontsize':18}, pad=16);

I hope you found what you were looking for in this article.
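One last aside, for readers who want to see what the triangle mask actually contains without downloading the dataset. The sketch below uses a tiny made-up dataframe (the columns a, b and c are hypothetical) and only numpy and pandas. It prints the boolean mask and then blanks out the masked cells with DataFrame.mask(), which roughly mirrors what the heatmap hides.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical data set: three numeric columns.
dataframe = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.5],
    "c": [4.0, 3.0, 2.0, 1.0],
})
corr = dataframe.corr()

# True marks the cells to hide: the upper triangle plus the diagonal.
mask = np.triu(np.ones_like(corr, dtype=bool))
print(mask)
# [[ True  True  True]
#  [False  True  True]
#  [False False  True]]

# The same idea without seaborn: masked cells become NaN, so only the
# correlations below the diagonal remain.
lower_only = corr.mask(mask)
print(lower_only.round(2))
```

Note that np.triu() keeps the diagonal, so the row of 1s is hidden as well; passing k=1 to np.triu() would leave the diagonal visible instead.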