Compute Contributions with Shap - Summarize Them With Shapash¶

Shapash uses the Shap backend to compute Shapley contributions, which suits users in a hurry who want to display results with only a few lines of code.

We nevertheless recommend referring to the excellent Shap library directly.

This tutorial shows how to use contributions precalculated with Shap in Shapash.

Contents:

- Build a Binary Classifier
- Use Shap KernelExplainer
- Compile Shapash SmartExplainer
- Display local_plot
- to_pandas export

We use Kaggle’s Titanic dataset.

[1]:

import pandas as pd
from category_encoders import OrdinalEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import shap

[2]:

from shapash.data.data_loader import data_loading

[3]:

titan_df, titan_dict = data_loading('titanic')
del titan_df['Name']

[4]:

titan_df.head()

[4]:

PassengerId	Survived	Pclass	Sex	Age	SibSp	Parch	Fare	Embarked	Title
1	0	Third class	male	22.0	1	0	7.25	Southampton	Mr
2	1	First class	female	38.0	1	0	71.28	Cherbourg	Mrs
3	1	Third class	female	26.0	0	0	7.92	Southampton	Miss
4	1	First class	female	35.0	1	0	53.10	Southampton	Mrs
5	0	Third class	male	35.0	0	0	8.05	Southampton	Mr

Create Classification Model¶

[5]:

y = titan_df['Survived']
X = titan_df.drop('Survived', axis=1)

[6]:

varcat=['Pclass','Sex','Embarked','Title']

[7]:

categ_encoding = OrdinalEncoder(cols=varcat,
                                handle_unknown='ignore',
                                return_df=True).fit(X)
X = categ_encoding.transform(X)
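
To make the encoding step concrete, here is a minimal pure-Python sketch of what an ordinal encoding does to a categorical column. It assumes that integer codes are assigned in order of first appearance, starting at 1; the helper `fit_ordinal_mapping` is hypothetical and is not part of category_encoders, whose exact numbering may differ.

```python
def fit_ordinal_mapping(values):
    """Assign an integer code to each distinct category.

    Assumption for illustration: codes follow order of first appearance
    and start at 1. This is a hypothetical helper, not the library API.
    """
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping) + 1
    return mapping


# Values of the 'Sex' column for the five rows shown in the table above
sex = ["male", "female", "female", "female", "male"]

mapping = fit_ordinal_mapping(sex)
encoded = [mapping[v] for v in sex]

# Inverting the mapping recovers the original labels from the codes
inverse = {code: label for label, code in mapping.items()}
decoded = [inverse[c] for c in encoded]

print(mapping)
print(encoded)
print(decoded)
```

Keeping the fitted encoder around matters because the same object is passed later as `preprocessing`, so the integer codes can be related back to readable labels.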

Train Test split + Random Forest fit

[8]:

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.75, random_state=1)

rf = RandomForestClassifier(n_estimators=100, min_samples_leaf=3)
rf.fit(Xtrain, ytrain)

[8]:

RandomForestClassifier(min_samples_leaf=3)


[9]:

ypred = pd.DataFrame(rf.predict(Xtest), columns=['pred'], index=Xtest.index)

Use Shapash With Shapley Contributions¶

[10]:

from shapash import SmartExplainer

Different ways to compute Shapley values with Shap¶

Let Shapash choose the method for you¶

[11]:

xpl = SmartExplainer(
    model=rf,
    backend='shap',
    preprocessing=categ_encoding,
    features_dict=titan_dict
)
xpl.compile(
    y_pred=ypred,
    y_target=ytest, # Optional: allows to display True Values vs Predicted Values
    x=Xtest
)

INFO: Shap explainer type - <shap.explainers._exact.ExactExplainer object at 0x7f0c1702dca0>

ExactExplainer explainer: 224it [00:42,  4.62it/s]
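
The log above reports that an ExactExplainer was selected. As a rough illustration of what "exact" Shapley values mean, the sketch below enumerates every coalition of features for a tiny toy model. It is a textbook computation under stated assumptions, not Shap's implementation: `exact_shapley`, `toy_model`, `instance` and `baseline` are all hypothetical names, and "missing" features are simply replaced by a fixed baseline.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition of features."""
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for a coalition of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of feature i to coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_model(x):
    # Hypothetical model: one additive term and one interaction term
    return 2.0 * x[0] + x[1] * x[2]


instance = [1.0, 2.0, 3.0]   # row to explain (made-up values)
baseline = [0.0, 0.0, 0.0]   # values used for "missing" features


def value_fn(coalition):
    # Features in the coalition keep the instance value, others the baseline
    x = [instance[j] if j in coalition else baseline[j] for j in range(3)]
    return toy_model(x)


phi = exact_shapley(value_fn, 3)
print(phi)
print(sum(phi), toy_model(instance) - toy_model(baseline))
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline. The number of coalitions doubles with each added feature, which is one reason exact enumeration is only practical for a small number of features.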

Let Shap choose the method for you and provide the masker you want¶

[12]:

xpl = SmartExplainer(
    model=rf,
    backend='shap',
    explainer_args={'model': rf.predict_proba, 'masker': Xtest},
    preprocessing=categ_encoding,
    features_dict=titan_dict
)
xpl.compile(
    y_pred=ypred,
    y_target=ytest, # Optional: allows to display True Values vs Predicted Values
    x=Xtest
)

INFO: Shap explainer type - <shap.explainers._exact.ExactExplainer object at 0x7f0c1702d910>

ExactExplainer explainer: 224it [00:36,  4.36it/s]
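
A masker supplies the background data that stands in for features treated as "missing" when a coalition is evaluated. The sketch below illustrates that idea in pure Python: features outside the coalition are replaced by the values of each background row and the model outputs are averaged. This is a simplified illustration, not Shap's masker code; `masked_prediction`, `toy_model` and the data are hypothetical.

```python
def masked_prediction(model, instance, coalition, background):
    """Average model output when features outside `coalition` are
    replaced by the corresponding values of each background row."""
    total = 0.0
    for row in background:
        x = [instance[j] if j in coalition else row[j]
             for j in range(len(instance))]
        total += model(x)
    return total / len(background)


def toy_model(x):
    # Hypothetical model used only for this illustration
    return x[0] + 10.0 * x[1]


instance = [1.0, 1.0]
background = [[0.0, 0.0], [2.0, 4.0]]   # made-up background rows

only_first = masked_prediction(toy_model, instance, {0}, background)
nothing_known = masked_prediction(toy_model, instance, set(), background)
everything_known = masked_prediction(toy_model, instance, {0, 1}, background)

print(only_first, nothing_known, everything_known)
```

Because every coalition is evaluated against every background row, the choice and size of the background data affect both the resulting contributions and the computation time.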

Tell Shap what to do¶

[13]:

xpl = SmartExplainer(
    model=rf,
    backend='shap',
    explainer_args={'explainer': shap.explainers.PermutationExplainer, 'model': rf.predict_proba, 'masker': Xtest},
    preprocessing=categ_encoding,
    features_dict=titan_dict
)
xpl.compile(
    y_pred=ypred,
    y_target=ytest, # Optional: allows to display True Values vs Predicted Values
    x=Xtest
)

INFO: Shap explainer type - shap.explainers.PermutationExplainer()

PermutationExplainer explainer: 224it [03:04,  1.14it/s]
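
For intuition about permutation-based estimation, the following sketch approximates Shapley values by averaging marginal contributions over random feature orderings. It is a plain Monte Carlo version written for illustration; Shap's PermutationExplainer is more elaborate and this is not its code. The names `permutation_shapley`, `toy_model`, `instance` and `baseline` are hypothetical.

```python
import random


def permutation_shapley(value_fn, n_features, n_permutations=200, seed=0):
    """Estimate Shapley values from random feature orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n_features
    for _ in range(n_permutations):
        order = list(range(n_features))
        rng.shuffle(order)
        coalition = set()
        previous = value_fn(coalition)
        for i in order:
            # Add feature i and record how much the value changes
            coalition.add(i)
            current = value_fn(coalition)
            phi[i] += current - previous
            previous = current
    return [p / n_permutations for p in phi]


def toy_model(x):
    # Hypothetical model: one additive term and one interaction term
    return 2.0 * x[0] + x[1] * x[2]


instance = [1.0, 2.0, 3.0]   # made-up row to explain
baseline = [0.0, 0.0, 0.0]   # values used for "missing" features


def value_fn(coalition):
    x = [instance[j] if j in coalition else baseline[j] for j in range(3)]
    return toy_model(x)


phi = permutation_shapley(value_fn, 3)
print(phi)
print(sum(phi))
```

Each ordering distributes the full difference between the instance prediction and the baseline prediction across the features, so the estimates always sum to that difference; only the split between interacting features varies with the sampled orderings.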

Use the contributions parameter of the compile method to declare Shapley contributions¶

[14]:

xpl = SmartExplainer(
    model=rf,
    preprocessing=categ_encoding,
    features_dict=titan_dict
)

masker = pd.DataFrame(shap.kmeans(Xtest, 50).data, columns=Xtest.columns)
explainer = shap.explainers.PermutationExplainer(model=rf.predict_proba, masker=masker)
shap_contrib = explainer.shap_values(Xtest)

xpl.compile(
    contributions=shap_contrib, # Shap Contributions pd.DataFrame
    y_pred=ypred,
    y_target=ytest, # Optional: allows to display True Values vs Predicted Values
    x=Xtest
)

PermutationExplainer explainer: 224it [00:23,  5.51it/s]
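
When an explainer is built on `predict_proba`, the raw output is often a NumPy array with one slice per class rather than a DataFrame. The sketch below shows one way to turn such an array into one DataFrame per class that shares its index and columns with the explained rows. It assumes an array of shape (n_samples, n_features, n_classes), which depends on the Shap version; the values, index and column subset here are made up for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for an explainer output: 4 samples, 3 features, 2 classes.
# Random values are used only so that the example runs on its own.
rng = np.random.default_rng(0)
raw = rng.normal(size=(4, 3, 2))

# Hypothetical index and columns; in the tutorial they would come from Xtest
index = pd.Index([11, 12, 13, 14], name="PassengerId")
columns = ["Pclass", "Sex", "Age"]

# One DataFrame of contributions per class, aligned with the explained rows
per_class = [
    pd.DataFrame(raw[:, :, k], index=index, columns=columns)
    for k in range(raw.shape[2])
]

print(len(per_class))
print(per_class[1].head())
```

Checking the shape and alignment of the contributions before calling `compile` makes it easier to spot a mismatch between the explained rows and the values passed in.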