Shapash in Jupyter - Overview

This tutorial shows how Shapash works in a Jupyter notebook through a simple use case.

Contents:
- Build a regressor
- Compile the Shapash SmartExplainer
- Display global and local explainability
- Export local summarized explainability with the to_pandas method
- Save the Shapash object in a pickle file

Data from Kaggle House Prices

[1]:

import pandas as pd
from category_encoders import OrdinalEncoder
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split

Building a Supervised Model

[2]:

from shapash.data.data_loader import data_loading
house_df, house_dict = data_loading('house_prices')

[3]:

# Separate the target column from the feature columns
y_df = house_df['SalePrice'].to_frame()
X_df = house_df[house_df.columns.difference(['SalePrice'])]

[4]:

house_df.head()

[4]:

	MSSubClass	MSZoning	LotArea	Street	LotShape	LandContour	Utilities	LotConfig	LandSlope	Neighborhood	...	EnclosedPorch	3SsnPorch	ScreenPorch	PoolArea	MiscVal	MoSold	YrSold	SaleType	SaleCondition	SalePrice
Id
1	2-Story 1946 & Newer	Residential Low Density	8450	Paved	Regular	Near Flat/Level	All public Utilities (E,G,W,& S)	Inside lot	Gentle slope	College Creek	...	0	0	0	0	0	2	2008	Warranty Deed - Conventional	Normal Sale	208500
2	1-Story 1946 & Newer All Styles	Residential Low Density	9600	Paved	Regular	Near Flat/Level	All public Utilities (E,G,W,& S)	Frontage on 2 sides of property	Gentle slope	Veenker	...	0	0	0	0	0	5	2007	Warranty Deed - Conventional	Normal Sale	181500
3	2-Story 1946 & Newer	Residential Low Density	11250	Paved	Slightly irregular	Near Flat/Level	All public Utilities (E,G,W,& S)	Inside lot	Gentle slope	College Creek	...	0	0	0	0	0	9	2008	Warranty Deed - Conventional	Normal Sale	223500
4	2-Story 1945 & Older	Residential Low Density	9550	Paved	Slightly irregular	Near Flat/Level	All public Utilities (E,G,W,& S)	Corner lot	Gentle slope	Crawford	...	272	0	0	0	0	2	2006	Warranty Deed - Conventional	Abnormal Sale	140000
5	2-Story 1946 & Newer	Residential Low Density	14260	Paved	Slightly irregular	Near Flat/Level	All public Utilities (E,G,W,& S)	Frontage on 2 sides of property	Gentle slope	Northridge	...	0	0	0	0	0	12	2008	Warranty Deed - Conventional	Normal Sale	250000

5 rows × 73 columns

Encoding Categorical Features

[5]:

from category_encoders import OrdinalEncoder

categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']

encoder = OrdinalEncoder(
    cols=categorical_features,
    handle_unknown='ignore',
    return_df=True).fit(X_df)

X_df=encoder.transform(X_df)
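
As a rough illustration of what ordinal encoding does, the sketch below maps each category to an integer code and back again using plain dictionaries. It does not use category_encoders: the helpers `fit_ordinal`, `transform`, and `inverse_transform` are hypothetical, the code numbering and the -1 value for unseen categories are assumptions, and the sample values come from the LotShape column shown above. The reverse mapping is the piece that lets an explainer show original labels instead of integer codes.

```python
def fit_ordinal(values):
    """Assign integer codes 1, 2, ... to categories in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


def transform(values, mapping, unknown=-1):
    """Replace each category by its code; unseen categories get `unknown`."""
    return [mapping.get(value, unknown) for value in values]


def inverse_transform(codes, mapping):
    """Recover the original categories from their integer codes."""
    reverse = {code: value for value, code in mapping.items()}
    return [reverse.get(code) for code in codes]


# LotShape values of the first five rows displayed by house_df.head()
lot_shape = ["Regular", "Regular", "Slightly irregular",
             "Slightly irregular", "Slightly irregular"]

mapping = fit_ordinal(lot_shape)
codes = transform(lot_shape, mapping)
restored = inverse_transform(codes, mapping)

print(mapping)
print(codes)
print(restored == lot_shape)
```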

Train / Test Split

[6]:

Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75, random_state=1)

Model Fitting

[7]:

regressor = LGBMRegressor(n_estimators=200).fit(Xtrain,ytrain)

[8]:

y_pred = pd.DataFrame(regressor.predict(Xtest),columns=['pred'],index=Xtest.index)

Understand my model with Shapash

Declare and Compile SmartExplainer

[9]:

from shapash import SmartExplainer

[10]:

xpl = SmartExplainer(
    model=regressor,
    preprocessing=encoder,  # Optional: lets the compile step use the encoder's inverse_transform method
    features_dict=house_dict  # Optional: dict that maps feature names to readable labels
)

[11]:

xpl.compile(
    x=Xtest,
    y_pred=y_pred,
    y_target=ytest,  # Optional: enables the display of true values vs. predicted values
)

Backend: Shap TreeExplainer

Display features importance

[12]:

xpl.plot.features_importance()

[Figure: global features importance plot]

Focus on a specific subset

You can use the features_importance method to compare the feature contributions within a subset of rows to the global features importance.

[13]:

subset = [ 168, 54, 995, 799, 310, 322, 1374,
          1106, 232, 645, 1170, 1229, 703, 66,
          886, 160, 191, 1183, 1037, 991, 482,
          725, 410, 59, 28, 719, 337, 36]
xpl.plot.features_importance(selection=subset)

[Figure: features importance plot, subset compared with global]
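
One way to picture this comparison: if a feature's importance is taken as its share of the total absolute contribution, the same calculation can be run on all rows and on a subset of rows. The sketch below does this on a tiny made-up contribution table. The normalization rule, the `importance` helper, the GrLivArea column name, and every number are assumptions for illustration, not Shapash's internal code.

```python
def importance(contributions, rows=None):
    """Share of the total absolute contribution per feature, over the given row ids."""
    selected = contributions if rows is None else {r: contributions[r] for r in rows}
    totals = {}
    for row in selected.values():
        for feature, value in row.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    grand_total = sum(totals.values())
    return {feature: total / grand_total for feature, total in totals.items()}


# Hypothetical per-row contributions, keyed by row index
contributions = {
    168: {"OverallQual": 300.0, "GrLivArea": -100.0},
    54: {"OverallQual": -100.0, "GrLivArea": 300.0},
    995: {"OverallQual": 200.0, "GrLivArea": 0.0},
}

global_importance = importance(contributions)
subset_importance = importance(contributions, rows=[168, 995])

print(global_importance)
print(subset_importance)
```

In this toy table OverallQual weighs more in the subset than it does globally, which is the kind of gap the subset plot is meant to reveal.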

Understand how a feature contributes

The contribution_plot method shows how a single feature affects the prediction.
The type of plot depends on the type of the feature.
You can use the feature name, feature label, or feature number to specify which feature you want to analyze.

[14]:

xpl.plot.contribution_plot("OverallQual")

[Figure: contribution plot for OverallQual]

[15]:

xpl.plot.contribution_plot("Second floor square feet")

[Figure: contribution plot for Second floor square feet]
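
The two calls above identify a feature once by its column name and once by its label. A minimal sketch of how such a lookup could work is shown below. The `resolve_feature` helper is hypothetical, and the short `columns` list and the name-to-label pairs are an assumed excerpt used only for illustration; this is not how Shapash necessarily resolves features internally.

```python
def resolve_feature(identifier, columns, features_dict):
    """Return the column name for a feature given by name, label, or position."""
    if isinstance(identifier, int):
        return columns[identifier]
    if identifier in columns:
        return identifier
    # Invert the name -> label dictionary to look up labels
    label_to_name = {label: name for name, label in features_dict.items()}
    if identifier in label_to_name:
        return label_to_name[identifier]
    raise ValueError(f"Unknown feature: {identifier!r}")


# Assumed excerpt; the real dataset has many more columns
columns = ["2ndFlrSF", "GrLivArea", "OverallQual"]
features_dict = {
    "2ndFlrSF": "Second floor square feet",
    "GrLivArea": "Ground living area square feet",
    "OverallQual": "Overall material and finish of the house",
}

print(resolve_feature("OverallQual", columns, features_dict))
print(resolve_feature("Second floor square feet", columns, features_dict))
print(resolve_feature(1, columns, features_dict))
```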

Display a Summarized but Explicit local explainability

Filter method

Use the filter method to specify how to summarize local explainability. Four parameters customize the summary:
- max_contrib: maximum number of criteria to display
- threshold: minimum contribution (in absolute value) required to display a criterion
- positive: True displays only positive contributions, False displays only negative ones (default None displays both)
- features_to_hide: list of features you don't want to display

[16]:

xpl.filter(max_contrib=8,threshold=100)
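
To make the four parameters concrete, here is a minimal sketch of the same rules applied to one row of contributions. The `filter_contributions` helper is hypothetical, the contribution values are partly made up, and the choices to treat the threshold as inclusive and to rank by absolute value are assumptions; Shapash's own implementation may differ in such details.

```python
def filter_contributions(contributions, max_contrib=None, threshold=None,
                         positive=None, features_to_hide=None):
    """Summarize one row's contributions using filter-style rules."""
    hidden = set(features_to_hide or [])
    kept = [(f, c) for f, c in contributions.items() if f not in hidden]
    if threshold is not None:
        # Assumption: a contribution exactly at the threshold is kept
        kept = [(f, c) for f, c in kept if abs(c) >= threshold]
    if positive is True:
        kept = [(f, c) for f, c in kept if c > 0]
    elif positive is False:
        kept = [(f, c) for f, c in kept if c < 0]
    # Largest absolute contributions first
    kept.sort(key=lambda item: abs(item[1]), reverse=True)
    if max_contrib is not None:
        kept = kept[:max_contrib]
    return kept


# Illustrative contributions for a single prediction (values are hypothetical)
row = {
    "GrLivArea": 13710.4,
    "OverallQual": 12776.3,
    "TotalBsmtSF": -5103.03,
    "YearBuilt": 80.0,
    "LotArea": -950.0,
}

result = filter_contributions(row, max_contrib=8, threshold=100)
print(result)
print(filter_contributions(row, threshold=100, positive=False,
                           features_to_hide=["LotArea"]))
```

With `max_contrib=8` and `threshold=100`, only YearBuilt is dropped here, because its absolute contribution falls below the threshold.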

Display local plot, applying your filter

You can use the row_num, index, or query parameter to specify which prediction you want to explain.

[17]:

xpl.plot.local_plot(index=560)

[Figure: local explanation plot for index 560]

Save your Explainer & Export results

Export your local explanation to a pandas DataFrame

The to_pandas method accepts the same parameters as the filter method.

[18]:

summary_df = xpl.to_pandas(
    max_contrib=3,  # Maximum number of features to show in the summary
    threshold=5000,  # Minimum absolute contribution to keep a feature
)

[19]:

summary_df.head()

[19]:

	pred	feature_1	value_1	contribution_1	feature_2	value_2	contribution_2	feature_3	value_3	contribution_3
259	209141.256921	Ground living area square feet	1792	13710.4	Overall material and finish of the house	7	12776.3	Total square feet of basement area	963	-5103.03
268	178734.474531	Ground living area square feet	2192	29747	Overall material and finish of the house	5	-26151.3	Overall condition of the house	8	9190.84
289	113950.844570	Overall material and finish of the house	5	-24730	Ground living area square feet	900	-16342.6	Total square feet of basement area	882	-5922.64
650	74957.162142	Overall material and finish of the house	4	-33927.7	Ground living area square feet	630	-23234.4	Total square feet of basement area	630	-11687.9
1234	135305.243500	Overall material and finish of the house	5	-25445.7	Ground living area square feet	1188	-11476.6	Condition of sale	Abnormal Sale	-5071.82
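
The layout of this table (one `feature_i` / `value_i` / `contribution_i` triple per retained feature) can be reproduced with a short sketch. The `summarize_row` helper is hypothetical and only mimics the output shape. The first three features reuse the numbers from row 259 above, while the fourth entry is invented to show a contribution that falls below the threshold.

```python
def summarize_row(pred, values, contributions, max_contrib=3, threshold=0.0):
    """Build one wide summary row: pred, then feature_i / value_i / contribution_i."""
    ranked = sorted(
        (item for item in contributions.items() if abs(item[1]) >= threshold),
        key=lambda item: abs(item[1]),
        reverse=True,
    )[:max_contrib]
    row = {"pred": pred}
    for rank, (feature, contribution) in enumerate(ranked, start=1):
        row[f"feature_{rank}"] = feature
        row[f"value_{rank}"] = values[feature]
        row[f"contribution_{rank}"] = contribution
    return row


values = {
    "Ground living area square feet": 1792,
    "Overall material and finish of the house": 7,
    "Total square feet of basement area": 963,
    "Hypothetical extra feature": 1,  # invented for illustration
}
contributions = {
    "Ground living area square feet": 13710.4,
    "Overall material and finish of the house": 12776.3,
    "Total square feet of basement area": -5103.03,
    "Hypothetical extra feature": 1200.0,  # below the 5000 threshold
}

row = summarize_row(209141.256921, values, contributions,
                    max_contrib=3, threshold=5000)
print(row)
```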

Save your explainer in a Pickle File

You can save the SmartExplainer object in a pickle file to make new plots later or to launch the WebApp again.

[20]:

xpl.save('./xpl.pkl')
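
For readers unfamiliar with pickle files, the sketch below shows the general save-then-reload pattern using only Python's standard `pickle` module. The `state` dictionary is a hypothetical stand-in for the explainer; reloading a real SmartExplainer should go through Shapash's own load method, whose exact call may depend on the installed version.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the explainer's state; a real SmartExplainer holds much more
state = {
    "features_dict": {"OverallQual": "Overall material and finish of the house"},
    "filter": {"max_contrib": 8, "threshold": 100},
}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "xpl.pkl")
    # Write the object to disk in binary mode
    with open(path, "wb") as handle:
        pickle.dump(state, handle)
    # Read it back, as a later session would
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(restored == state)
```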