Compute Contributions with Shap - Summarize Them With Shapash
Shapash uses the Shap backend to compute Shapley contributions, so that users in a hurry can display results with only a few lines of code.
However, for details we recommend referring to the excellent Shap library itself.
This tutorial shows how to use contributions computed with Shap, including precalculated ones, in Shapash.
Contents:
- Build a Binary Classifier
- Use Shap KernelExplainer
- Compile Shapash SmartExplainer
- Display local_plot
- to_pandas export
We use Kaggle’s Titanic dataset.
[1]:
import pandas as pd
from category_encoders import OrdinalEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import shap
[2]:
from shapash.data.data_loader import data_loading
[3]:
titan_df, titan_dict = data_loading('titanic')
del titan_df['Name']
[4]:
titan_df.head()
[4]:
PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch | Fare | Embarked | Title
---|---|---|---|---|---|---|---|---|---
1 | 0 | Third class | male | 22.0 | 1 | 0 | 7.25 | Southampton | Mr
2 | 1 | First class | female | 38.0 | 1 | 0 | 71.28 | Cherbourg | Mrs
3 | 1 | Third class | female | 26.0 | 0 | 0 | 7.92 | Southampton | Miss
4 | 1 | First class | female | 35.0 | 1 | 0 | 53.10 | Southampton | Mrs
5 | 0 | Third class | male | 35.0 | 0 | 0 | 8.05 | Southampton | Mr
Create Classification Model
[5]:
y = titan_df['Survived']
X = titan_df.drop('Survived', axis=1)
[6]:
varcat=['Pclass','Sex','Embarked','Title']
[7]:
categ_encoding = OrdinalEncoder(
    cols=varcat,
    handle_unknown='ignore',
    return_df=True
).fit(X)
X = categ_encoding.transform(X)
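Shapash later receives this fitted encoder through `preprocessing=categ_encoding`; as we understand it, this lets it map the integer codes back to the original labels (for example "female" rather than 2) when displaying contributions. The sketch below is a hypothetical, stdlib-only illustration of that encode/decode round trip. The helpers `fit_ordinal`, `transform_ordinal` and `inverse_ordinal` are invented for this example, are not the `category_encoders` API, and mapping unseen categories to -1 is an assumption of the sketch.

```python
# Hypothetical minimal ordinal encoder, for illustration only.
# NOT category_encoders.OrdinalEncoder: it only mimics the idea that each
# category gets an integer code starting at 1 and can be decoded back.

def fit_ordinal(rows, cols):
    """Learn one {category: code} mapping per categorical column."""
    mapping = {}
    for col in cols:
        codes = {}
        for row in rows:
            value = row[col]
            if value not in codes:
                codes[value] = len(codes) + 1  # codes start at 1
        mapping[col] = codes
    return mapping


def transform_ordinal(rows, mapping):
    """Replace categorical values by their integer code (-1 if unseen)."""
    out = []
    for row in rows:
        new_row = dict(row)
        for col, codes in mapping.items():
            new_row[col] = codes.get(row[col], -1)
        out.append(new_row)
    return out


def inverse_ordinal(rows, mapping):
    """Map integer codes back to the original labels (None if unknown)."""
    inverse = {col: {code: cat for cat, code in codes.items()}
               for col, codes in mapping.items()}
    out = []
    for row in rows:
        new_row = dict(row)
        for col, back in inverse.items():
            new_row[col] = back.get(row[col])
        out.append(new_row)
    return out


passengers = [
    {"Sex": "male", "Embarked": "Southampton", "Age": 22.0},
    {"Sex": "female", "Embarked": "Cherbourg", "Age": 38.0},
    {"Sex": "female", "Embarked": "Southampton", "Age": 26.0},
]
mapping = fit_ordinal(passengers, ["Sex", "Embarked"])
encoded = transform_ordinal(passengers, mapping)
print(encoded[1])  # {'Sex': 2, 'Embarked': 2, 'Age': 38.0}
print(inverse_ordinal(encoded, mapping) == passengers)  # True
```

Numeric columns such as `Age` pass through untouched, which mirrors the `cols=varcat` restriction in the cell above.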
Train/test split + Random Forest fit
[8]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.75, random_state=1)
rf = RandomForestClassifier(n_estimators=100,min_samples_leaf=3)
rf.fit(Xtrain, ytrain)
[8]:
RandomForestClassifier(min_samples_leaf=3)
[9]:
ypred = pd.DataFrame(rf.predict(Xtest),columns=['pred'],index=Xtest.index)
Use Shapash With Shapley Contributions
[10]:
from shapash import SmartExplainer
Different ways to compute Shapley values with Shap
Let Shapash choose the method for you
[11]:
xpl = SmartExplainer(
    model=rf,
    backend='shap',
    preprocessing=categ_encoding,
    features_dict=titan_dict
)
xpl.compile(
    y_pred=ypred,
    y_target=ytest,  # Optional: lets Shapash display true vs. predicted values
    x=Xtest
)
INFO: Shap explainer type - <shap.explainers._exact.ExactExplainer object at 0x7f0c1702dca0>
ExactExplainer explainer: 224it [00:42, 4.62it/s]
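The log shows that Shap picked its exact algorithm. As far as we know, recent Shap versions make that automatic choice when a tabular masker has only a few features (10 or fewer); here the model sees 8 features (Pclass, Sex, Age, SibSp, Parch, Fare, Embarked, Title), i.e. 2^8 = 256 coalitions per passenger. The stdlib-only sketch below enumerates every coalition S and applies the Shapley formula phi_i = sum over S of |S|!(M-|S|-1)!/M! * (v(S ∪ {i}) - v(S)), where v(S) averages the model output over background rows with the features outside S taken from the background. It is a from-scratch illustration of exact Shapley values with an independent masker, not Shap's optimized implementation; `toy_model` and its data are hypothetical.

```python
from itertools import combinations
from math import factorial


def coalition_value(model, x, background, coalition):
    """v(S): mean model output when features in S come from x
    and all other features come from each background row."""
    total = 0.0
    for b in background:
        z = [x[j] if j in coalition else b[j] for j in range(len(x))]
        total += model(z)
    return total / len(background)


def exact_shapley(model, x, background):
    """Exact Shapley values by enumerating every coalition (2**M of them)."""
    m = len(x)
    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi[i] += weight * gain
    return phi


def toy_model(z):  # hypothetical model with an interaction term
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]


background = [[0.0, 0.0, 0.0], [1.0, 2.0, 1.0]]
x = [1.0, 1.0, 2.0]

phi = exact_shapley(toy_model, x, background)
base = coalition_value(toy_model, x, background, set())  # expected value
print([round(p, 4) for p in phi])
# Efficiency: contributions sum to f(x) minus the expected value.
print(abs(sum(phi) - (toy_model(x) - base)) < 1e-9)  # True
```

The cost grows as 2^M model evaluations per feature and per background row, which is why the exact method is only practical for a handful of features.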
Let Shap choose the method for you and provide the masker you want
[12]:
xpl = SmartExplainer(
    model=rf,
    backend='shap',
    explainer_args={'model': rf.predict_proba, 'masker': Xtest},
    preprocessing=categ_encoding,
    features_dict=titan_dict
)
xpl.compile(
    y_pred=ypred,
    y_target=ytest,  # Optional: lets Shapash display true vs. predicted values
    x=Xtest
)
INFO: Shap explainer type - <shap.explainers._exact.ExactExplainer object at 0x7f0c1702d910>
ExactExplainer explainer: 224it [00:36, 4.36it/s]
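The masker supplies the background rows that replace "missing" features, so it fixes the baseline (expected value) that contributions are measured against. As far as we know, Shap wraps a DataFrame masker such as `Xtest` in an independent masker that keeps at most 100 background rows by default. The hypothetical sketch below uses an additive toy model, for which the Shapley value under an independent masker reduces to w_i * (x_i - mean of feature i over the background), to show that two different maskers give two different baselines and two different sets of contributions for the same prediction.

```python
# Hypothetical additive model f(z) = sum_i w_i * z_i + c (not the Titanic forest).
WEIGHTS = [0.8, -1.5, 0.3]
INTERCEPT = 0.1


def model(z):
    return sum(w * v for w, v in zip(WEIGHTS, z)) + INTERCEPT


def base_value(background):
    """Expected model output over the masker's background rows."""
    return sum(model(b) for b in background) / len(background)


def additive_shapley(x, background):
    """Shapley values of the additive model under an independent masker:
    phi_i = w_i * (x_i - mean of feature i over the background)."""
    n = len(background)
    means = [sum(b[i] for b in background) / n for i in range(len(x))]
    return [w * (xi - mi) for w, xi, mi in zip(WEIGHTS, x, means)]


x = [1.0, 2.0, 3.0]
masker_a = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]  # background with feature means of 1
masker_b = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]  # background identical to x

for name, background in [("A", masker_a), ("B", masker_b)]:
    phi = additive_shapley(x, background)
    # In both cases sum(phi) equals model(x) - base_value(background), up to rounding.
    print(name, round(base_value(background), 3), [round(p, 3) for p in phi])
```

With masker B every contribution is zero: when the background already looks exactly like the explained row, no feature "moves" the prediction away from the baseline.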
Tell Shap what to do
[13]:
xpl = SmartExplainer(
    model=rf,
    backend='shap',
    explainer_args={
        'explainer': shap.explainers.PermutationExplainer,
        'model': rf.predict_proba,
        'masker': Xtest
    },
    preprocessing=categ_encoding,
    features_dict=titan_dict
)
xpl.compile(
    y_pred=ypred,
    y_target=ytest,  # Optional: lets Shapash display true vs. predicted values
    x=Xtest
)
INFO: Shap explainer type - shap.explainers.PermutationExplainer()
PermutationExplainer explainer: 224it [03:04, 1.14it/s]
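A permutation explainer replaces the 2^M coalition enumeration by random feature orderings: features are switched from background to actual values one at a time, and each feature is credited with the change it causes. The stdlib-only sketch below is a plain Monte Carlo version of that idea. As we understand it, Shap's `PermutationExplainer` also walks each permutation forwards and backwards (antithetic sampling), which this sketch omits; `toy_model` and the data are hypothetical. With only 8 Titanic features the exact run above was in fact faster (00:36 versus 03:04 in the logs), so sampling pays off mainly when M is large.

```python
import random


def coalition_value(model, x, background, coalition):
    """v(S): mean model output when features in S come from x
    and the others come from each background row."""
    total = 0.0
    for b in background:
        z = [x[j] if j in coalition else b[j] for j in range(len(x))]
        total += model(z)
    return total / len(background)


def permutation_shapley(model, x, background, n_permutations=50, seed=0):
    """Monte Carlo Shapley estimate: average each feature's marginal
    contribution over random feature orderings."""
    rng = random.Random(seed)
    m = len(x)
    phi = [0.0] * m
    order = list(range(m))
    for _ in range(n_permutations):
        rng.shuffle(order)
        coalition = set()
        previous = coalition_value(model, x, background, coalition)
        for i in order:  # M + 1 evaluations per ordering instead of 2**M
            coalition.add(i)
            current = coalition_value(model, x, background, coalition)
            phi[i] += current - previous
            previous = current
    return [p / n_permutations for p in phi]


def toy_model(z):  # hypothetical model with an interaction term
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]


background = [[0.0, 0.0, 0.0], [1.0, 2.0, 1.0]]
x = [1.0, 1.0, 2.0]
phi = permutation_shapley(toy_model, x, background, n_permutations=200)
print([round(p, 3) for p in phi])
```

Along each ordering the marginal contributions telescope to v(all features) - v(no feature), so every estimate satisfies the efficiency property exactly; only the split of that total between interacting features is approximate.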
Use the contributions parameter of the compile method to declare Shapley contributions
[14]:
xpl = SmartExplainer(
    model=rf,
    preprocessing=categ_encoding,
    features_dict=titan_dict
)
# Summarize the test set into 50 k-means centroids used as background data
masker = pd.DataFrame(shap.kmeans(Xtest, 50).data, columns=Xtest.columns)
explainer = shap.explainers.PermutationExplainer(model=rf.predict_proba, masker=masker)
shap_contrib = explainer.shap_values(Xtest)
xpl.compile(
    contributions=shap_contrib,  # Shapley contributions precomputed with Shap
    y_pred=ypred,
    y_target=ytest,  # Optional: lets Shapash display true vs. predicted values
    x=Xtest
)
PermutationExplainer explainer: 224it [00:23, 5.51it/s]
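`shap.kmeans(Xtest, 50)` replaces the full test set by 50 centroids, so each masked evaluation averages over 50 background rows instead of every test row; this is consistent with the much shorter run time above (00:23 versus 03:04 with the full `Xtest` as masker). As far as we know, `shap.kmeans` also computes a weight (cluster size) per centroid, which taking only `.data` drops, so each centroid counts equally in the masker. The sketch below is a plain Lloyd's k-means in stdlib Python that returns centroids and their weights; the deterministic initialization on the first k rows is an assumption of the sketch, not what Shap does internally.

```python
import math


def kmeans_summary(rows, k, n_iter=20):
    """Summarize rows into k centroids plus a weight (cluster size) each.

    Simplified Lloyd's algorithm, initialized on the first k rows
    (an assumption of this sketch, not shap.kmeans internals)."""
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: attach every row to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(row, centroids[c]))
                  for row in rows]
        # Update step: move each centroid to the mean of its rows.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    weights = [labels.count(c) for c in range(k)]
    return centroids, weights


rows = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [0.0, 0.5]]
centroids, weights = kmeans_summary(rows, k=2)
print(centroids, weights)
```

A weighted background would give the 3-row cluster more influence on the baseline than the 2-row one; an unweighted DataFrame of centroids treats them alike.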