Compute Contributions with Shap - Summarize Them With Shapash

Shapash uses the Shap backend to compute Shapley contributions, so that users in a hurry can display results with only a few lines of code.

That said, we recommend that you refer to the excellent Shap library.

This tutorial shows how to use contributions precalculated with Shap in Shapash.

Contents:
- Build a Binary Classifier
- Use Shap KernelExplainer
- Compile Shapash SmartExplainer
- Display local_plot
- to_pandas export

We use Kaggle’s Titanic dataset.

[1]:
import pandas as pd
from category_encoders import OrdinalEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import shap
[2]:
from shapash.data.data_loader import data_loading
[3]:
titan_df, titan_dict = data_loading('titanic')
del titan_df['Name']
[4]:
titan_df.head()
[4]:
             Survived       Pclass     Sex   Age  SibSp  Parch   Fare     Embarked  Title
PassengerId
1                   0  Third class    male  22.0      1      0   7.25  Southampton     Mr
2                   1  First class  female  38.0      1      0  71.28    Cherbourg    Mrs
3                   1  Third class  female  26.0      0      0   7.92  Southampton   Miss
4                   1  First class  female  35.0      1      0  53.10  Southampton    Mrs
5                   0  Third class    male  35.0      0      0   8.05  Southampton     Mr

Create Classification Model

[5]:
y = titan_df['Survived']
X = titan_df.drop('Survived', axis=1)
[6]:
varcat=['Pclass','Sex','Embarked','Title']
[7]:
categ_encoding = OrdinalEncoder(cols=varcat,
                                handle_unknown='ignore',
                                return_df=True).fit(X)
X = categ_encoding.transform(X)
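
To make the encoding step more concrete, here is a minimal plain-Python sketch of what an ordinal encoding does and why keeping the fitted encoder is useful: the mapping can be inverted, so encoded values can be displayed again with their original labels. This is not the category_encoders implementation. The helper names (`fit_ordinal_mapping`, `encode`, `decode`), the codes starting at 1 and the -1 code for unseen categories are assumptions made for illustration only.

```python
def fit_ordinal_mapping(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            # Assumption for this sketch: codes start at 1.
            mapping[value] = len(mapping) + 1
    return mapping


def encode(values, mapping, unknown=-1):
    """Replace categories by their integer code; unseen categories get `unknown`."""
    return [mapping.get(value, unknown) for value in values]


def decode(codes, mapping):
    """Invert the mapping to recover the original labels from the codes."""
    inverse = {code: value for value, code in mapping.items()}
    return [inverse.get(code) for code in codes]


embarked = ["Southampton", "Cherbourg", "Southampton", "Queenstown"]
mapping = fit_ordinal_mapping(embarked)
codes = encode(embarked, mapping)
labels = decode(codes, mapping)
print(mapping)
print(codes)
print(labels)
```

The model is trained on the integer codes, while the inverse mapping is what lets a tool present results with the readable category names.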

Train Test split + Random Forest fit

[8]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.75, random_state=1)

rf = RandomForestClassifier(n_estimators=100,min_samples_leaf=3)
rf.fit(Xtrain, ytrain)
[8]:
RandomForestClassifier(min_samples_leaf=3)
[9]:
ypred = pd.DataFrame(rf.predict(Xtest),columns=['pred'],index=Xtest.index)

Use Shapash With Shapley Contributions

[10]:
from shapash import SmartExplainer

Different ways to compute Shapley values with Shap

Let Shapash choose the method for you

[11]:
xpl = SmartExplainer(
    model=rf,
    backend='shap',
    preprocessing=categ_encoding,
    features_dict=titan_dict
)
xpl.compile(
    y_pred=ypred,
    y_target=ytest, # Optional: allows to display True Values vs Predicted Values
    x=Xtest
)
INFO: Shap explainer type - <shap.explainers._exact.ExactExplainer object at 0x7f0c1702dca0>
ExactExplainer explainer: 224it [00:42,  4.62it/s]
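
The log above reports an exact explainer. As a rough illustration of what an exact Shapley computation involves, the sketch below enumerates every feature subset for a tiny toy function. It is not Shap's implementation: `exact_shapley` and `toy_model` are hypothetical names, and "absent" features are replaced by a single baseline row, which is a simplifying assumption (Shap relies on a masker built from background data).

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values of predict at x; absent features take baseline values."""
    n = len(x)

    def value(subset):
        # Features in `subset` come from x, all the others from baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight of a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        contributions.append(phi)
    return contributions


def toy_model(features):
    # Hypothetical scoring function with interactions between features.
    return 2.0 * features[0] + features[1] * features[2] + 0.5 * features[0] * features[1]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print(phi)
print(sum(phi), toy_model(x) - toy_model(baseline))
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline. The number of subsets doubles with each added feature, which is why exact enumeration is only practical for a small number of features.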

Let Shap choose the method for you and give the masker you want

[12]:
xpl = SmartExplainer(
    model=rf,
    backend='shap',
    explainer_args={'model': rf.predict_proba, 'masker': Xtest},
    preprocessing=categ_encoding,
    features_dict=titan_dict
)
xpl.compile(
    y_pred=ypred,
    y_target=ytest, # Optional: allows to display True Values vs Predicted Values
    x=Xtest
)
INFO: Shap explainer type - <shap.explainers._exact.ExactExplainer object at 0x7f0c1702d910>
ExactExplainer explainer: 224it [00:36,  4.36it/s]

Tell Shap what to do

[13]:
xpl = SmartExplainer(
    model=rf,
    backend='shap',
    explainer_args={'explainer': shap.explainers.PermutationExplainer, 'model': rf.predict_proba, 'masker': Xtest},
    preprocessing=categ_encoding,
    features_dict=titan_dict
)
xpl.compile(
    y_pred=ypred,
    y_target=ytest, # Optional: allows to display True Values vs Predicted Values
    x=Xtest
)
INFO: Shap explainer type - shap.explainers.PermutationExplainer()
PermutationExplainer explainer: 224it [03:04,  1.14it/s]
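
For intuition about permutation-based estimation, the sketch below approximates Shapley values by averaging each feature's marginal effect over random feature orderings. It is a simplified illustration, not Shap's PermutationExplainer: `permutation_shapley` and `toy_model` are hypothetical names, and a single baseline row stands in for the masker, which is an assumption of this sketch.

```python
import random


def permutation_shapley(predict, x, baseline, n_permutations=2000, seed=0):
    """Estimate Shapley values by averaging marginal effects over random orderings."""
    rng = random.Random(seed)
    n = len(x)
    totals = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)
        previous = predict(current)
        for i in order:
            # Switch feature i from its baseline value to its real value.
            current[i] = x[i]
            score = predict(current)
            totals[i] += score - previous
            previous = score
    return [total / n_permutations for total in totals]


def toy_model(features):
    # Hypothetical scoring function with interactions between features.
    return 2.0 * features[0] + features[1] * features[2] + 0.5 * features[0] * features[1]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
estimates = permutation_shapley(toy_model, x, baseline)
print(estimates)
print(sum(estimates), toy_model(x) - toy_model(baseline))
```

Within each ordering the marginal effects telescope, so the estimates always sum to the difference between the prediction for `x` and the prediction for the baseline. The individual values only approach the exact Shapley values as the number of sampled orderings grows.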

Use the contributions parameter of the compile method to declare Shapley contributions

[14]:
xpl = SmartExplainer(
    model=rf,
    preprocessing=categ_encoding,
    features_dict=titan_dict
)

masker = pd.DataFrame(shap.kmeans(Xtest, 50).data, columns=Xtest.columns)
explainer = shap.explainers.PermutationExplainer(model=rf.predict_proba, masker=masker)
shap_contrib = explainer.shap_values(Xtest)

xpl.compile(
    contributions=shap_contrib, # Shap Contributions pd.DataFrame
    y_pred=ypred,
    y_target=ytest, # Optional: allows to display True Values vs Predicted Values
    x=Xtest
)
PermutationExplainer explainer: 224it [00:23,  5.51it/s]