Contributions comparing plot¶

compare_plot is a method that displays scatter plot of contributions of several individuals. The purpose of these representations is to understand where the difference of predictions of several indivuals stems from.

This tutorial presents the different parameters you can use in compare_plot to tune output.

Contents: - Loading dataset and fitting a model.

Regression case: Specify the target modality to display.
Input parameters
Classification case

Data from Kaggle: House Prices

[1]:

import pandas as pd
from catboost import CatBoostRegressor
from sklearn.model_selection import train_test_split

Building Supervized Model¶

First Step : Load house prices data¶

[2]:

from shapash.data.data_loader import data_loading
house_df, house_dict = data_loading('house_prices')
y_df = house_df['SalePrice'].to_frame()
X_df = house_df[house_df.columns.difference(['SalePrice'])]

[3]:

X_df.head()

[3]:

	1stFlrSF	2ndFlrSF	3SsnPorch	BedroomAbvGr	BldgType	BsmtCond	BsmtExposure	BsmtFinSF1	BsmtFinSF2	BsmtFinType1	...	SaleType	ScreenPorch	Street	TotRmsAbvGrd	TotalBsmtSF	Utilities	WoodDeckSF	YearBuilt	YearRemodAdd	YrSold
Id
1	856	854	0	3	Single-family Detached	Typical - slight dampness allowed	No Exposure/No Basement	706	0	Good Living Quarters	...	Warranty Deed - Conventional	0	Paved	8	856	All public Utilities (E,G,W,& S)	0	2003	2003	2008
2	1262	0	0	3	Single-family Detached	Typical - slight dampness allowed	Good Exposure	978	0	Average Living Quarters	...	Warranty Deed - Conventional	0	Paved	6	1262	All public Utilities (E,G,W,& S)	298	1976	1976	2007
3	920	866	0	3	Single-family Detached	Typical - slight dampness allowed	Mimimum Exposure	486	0	Good Living Quarters	...	Warranty Deed - Conventional	0	Paved	6	920	All public Utilities (E,G,W,& S)	0	2001	2002	2008
4	961	756	0	3	Single-family Detached	Good	No Exposure/No Basement	216	0	Average Living Quarters	...	Warranty Deed - Conventional	0	Paved	7	756	All public Utilities (E,G,W,& S)	0	1915	1970	2006
5	1145	1053	0	4	Single-family Detached	Typical - slight dampness allowed	Average Exposure	655	0	Good Living Quarters	...	Warranty Deed - Conventional	0	Paved	9	1145	All public Utilities (E,G,W,& S)	192	2000	2000	2008

5 rows × 72 columns

Second step : Encode the categorical variables¶

[4]:

from category_encoders import OrdinalEncoder

categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']

encoder = OrdinalEncoder(
    cols=categorical_features,
    handle_unknown='ignore',
    return_df=True).fit(X_df)

X_df = encoder.transform(X_df)

Regression case¶

Third step : Get your dataset ready and fit your model¶

[5]:

Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75, random_state=1)

[6]:

regressor = CatBoostRegressor(n_estimators=50).fit(Xtrain, ytrain, verbose=False)

[7]:

y_pred = pd.DataFrame(regressor.predict(Xtest), columns=['pred'], index=Xtest.index)

Declare and compile your SmartExplainer explainer¶

[8]:

from shapash import SmartExplainer

[9]:

xpl = SmartExplainer(
    model=regressor,
    preprocessing=encoder, # Optional: compile step can use inverse_transform method
    features_dict=house_dict  # Optional parameter, dict specifies label for features name
)

[10]:

house_dict['MSZoning']

[10]:

'General zoning classification'

[11]:

xpl.compile(
    x=Xtest,
    y_pred=y_pred # Optional
)

Backend: Shap TreeExplainer

Compare_plot¶

Now that your explainer is ready, you can use the compare_plot to understand how two (or more) individuals are different.

For example, if you want to compare the first two individuals of the Xtest dataset, you have several ways to do it :

you can use the row_num parameter by using row_num = [0, 1]
You can also directly use the indexes, by index = [Xtest.index[0], Xtest.index[1]]
You can also use directly the index numbers : index = [259, 268]

The result of each the methods above is the same :

[12]:

xpl.plot.compare_plot(index=[Xtest.index[0], Xtest.index[1]])

../../_images/tutorials_plots_and_charts_tuto-plot04-compare_plot_20_0.png

In this example, we can see that the ‘Ground living area square feet’ contributes a lot more for Id 268 than Id 259.

We can see more details of a specific point on hover.

Number of features displayed¶

By default, the number of features displayed by the compare_plot is 20. You can modify it with the max_features parameter. You can also compare more than 2 individuals:

[13]:

xpl.plot.compare_plot(row_num=[0, 1, 2, 3, 4], max_features=8)

../../_images/tutorials_plots_and_charts_tuto-plot04-compare_plot_24_0.png

You can also decide whether or not showing the prediction in subtitle, with the show_predict parameter.

[14]:

xpl.plot.compare_plot(row_num=[0, 1], show_predict=False, max_features=100)

../../_images/tutorials_plots_and_charts_tuto-plot04-compare_plot_26_0.png

Classification case¶

Transform our use case into classification:

[15]:

from sklearn.ensemble import RandomForestClassifier

[16]:

ytrain['PriceClass'] = ytrain['SalePrice'].apply(lambda x: 1 if x < 150000 else (3 if x > 300000 else 2))
label_dict = { 1 : 'Cheap', 2 : 'Moderately Expensive', 3 : 'Expensive' }

[17]:

clf = RandomForestClassifier(n_estimators=50).fit(Xtrain,ytrain['PriceClass'])
y_pred_clf = pd.DataFrame(clf.predict(Xtest), columns=['pred'], index=Xtest.index)

Declare new SmartExplainer dedicated to classification problem¶

[18]:

xplclf = SmartExplainer(
    model=clf,
    preprocessing=encoder,
    features_dict=house_dict,
    label_dict=label_dict      # Optional parameters: display explicit output
)

[19]:

xplclf.compile(
    x=Xtest,
    y_pred=y_pred_clf
)

Backend: Shap TreeExplainer

Use label parameter of compare_plot parameter to select the explanation you want¶

with label parameter, you can specify explicit label or label number.

[20]:

xplclf.plot.compare_plot(row_num=[0, 1], label=1) # Equivalent to label = 'Cheap'

../../_images/tutorials_plots_and_charts_tuto-plot04-compare_plot_35_0.png

By default, if label parameter isn’t mentioned, the last label will be used.

[21]:

xplclf.plot.compare_plot(row_num=[0, 1, 2, 3], max_features=10)

../../_images/tutorials_plots_and_charts_tuto-plot04-compare_plot_37_0.png