How to use filter and local_plot methods¶

This tutorial presents the parameters you can use to summarize and display local explanations. It also shows how to export this summary into a pandas DataFrame.

Contents:
- Work with the filter and local_plot methods to tune the output
- Display positive or negative contributions
- Hide masked contributions or the prediction
- Hide specific features
- Use the query parameter to select a row without an index or row number
- Classification: how to select the label value to display
- Print the summary parameters
- Export local explanations with to_pandas

Data from Kaggle House Prices

[1]:

import pandas as pd
from category_encoders import OrdinalEncoder
from catboost import CatBoostRegressor, CatBoostClassifier
from sklearn.model_selection import train_test_split

Building a Supervised Model¶

[2]:

from shapash.data.data_loader import data_loading
house_df, house_dict = data_loading('house_prices')
y_df = house_df['SalePrice'].to_frame()
X_df = house_df[house_df.columns.difference(['SalePrice'])]

[3]:

# OrdinalEncoder is already imported in the first cell
categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']

# Encode categorical columns as integers; the fitted encoder is reused by SmartExplainer
encoder = OrdinalEncoder(
    cols=categorical_features,
    handle_unknown='ignore',
    return_df=True).fit(X_df)

X_df = encoder.transform(X_df)
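The fitted encoder is passed to SmartExplainer later so that encoded values can be shown in their original form. To make that round trip concrete, here is a minimal stdlib sketch of ordinal encoding. It is only loosely modelled on category_encoders.OrdinalEncoder: the helper names (fit_ordinal, transform, inverse_transform), codes starting at 1 in order of first appearance, and the -1 sentinel for unseen categories are assumptions of this sketch, not the library's documented behaviour.

```python
# Minimal stdlib sketch of ordinal encoding (hypothetical helpers, not the
# category_encoders implementation).
from typing import Dict, List, Optional, Sequence

UNKNOWN = -1  # assumed sentinel for categories unseen during fit


def fit_ordinal(values: Sequence[str]) -> Dict[str, int]:
    """Map each distinct category to an integer, in order of first appearance."""
    mapping: Dict[str, int] = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping) + 1
    return mapping


def transform(values: Sequence[str], mapping: Dict[str, int]) -> List[int]:
    """Encode categories; unseen ones get the UNKNOWN sentinel."""
    return [mapping.get(v, UNKNOWN) for v in values]


def inverse_transform(codes: Sequence[int], mapping: Dict[str, int]) -> List[Optional[str]]:
    """Decode integers back to categories; the sentinel decodes to None."""
    inverse = {code: cat for cat, code in mapping.items()}
    return [inverse.get(c) for c in codes]


mapping = fit_ordinal(["Regular", "Slightly irregular", "Regular"])
codes = transform(["Slightly irregular", "Regular", "Moderately Irregular"], mapping)
print(mapping)  # {'Regular': 1, 'Slightly irregular': 2}
print(codes)    # [2, 1, -1]
print(inverse_transform(codes, mapping))  # ['Slightly irregular', 'Regular', None]
```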

[4]:

Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75, random_state=1)

[5]:

regressor = CatBoostRegressor(n_estimators=50).fit(Xtrain,ytrain,verbose=False)

[6]:

y_pred = pd.DataFrame(regressor.predict(Xtest),columns=['pred'],index=Xtest.index)

Work with the filter and local_plot methods¶

First step: declare and compile a SmartExplainer¶

[7]:

from shapash import SmartExplainer

[8]:

xpl = SmartExplainer(
    model=regressor,
    preprocessing=encoder, # Optional: lets the compile step use the encoder's inverse_transform method
    features_dict=house_dict, # Optional: dict mapping feature names to readable labels
)

[9]:

xpl.compile(
    x=Xtest,
    y_pred=y_pred # Optional
)

Backend: Shap TreeExplainer

Filter method¶

Use the filter method to specify how local explanations are summarized. Four parameters let you customize the summary:
- max_contrib: maximum number of contributions to display
- threshold: minimum absolute value a contribution must reach to be displayed
- positive: True to display only positive contributions, False to display only negative ones (default None: both)
- features_to_hide: list of features you don’t want to display
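The way these four parameters combine can be sketched for a single observation. The function below is hypothetical and written in plain Python: it assumes features are hidden first, then the sign and threshold conditions are applied, and max_contrib finally keeps the largest absolute contributions. Shapash's own masking code may differ in details, for example whether the threshold is inclusive. The contribution values are made up.

```python
# Hypothetical sketch of how filter parameters could combine for one row;
# not Shapash's own code.
from typing import Dict, List, Optional, Sequence, Tuple


def summarize_contributions(
    contributions: Dict[str, float],
    max_contrib: Optional[int] = None,
    threshold: Optional[float] = None,
    positive: Optional[bool] = None,
    features_to_hide: Optional[Sequence[str]] = None,
) -> List[Tuple[str, float]]:
    """Return the (feature, contribution) pairs kept for display."""
    hidden = set(features_to_hide or [])
    kept = [(f, c) for f, c in contributions.items() if f not in hidden]
    if positive is True:
        kept = [(f, c) for f, c in kept if c > 0]
    elif positive is False:
        kept = [(f, c) for f, c in kept if c < 0]
    if threshold is not None:
        # Assumed inclusive: |contribution| >= threshold is kept
        kept = [(f, c) for f, c in kept if abs(c) >= threshold]
    # Largest absolute contributions first, then truncate to max_contrib
    kept.sort(key=lambda fc: abs(fc[1]), reverse=True)
    return kept[:max_contrib] if max_contrib is not None else kept


contribs = {"GrLivArea": 12000.0, "OverallQual": -8000.0, "GarageType": 500.0, "LotArea": -15000.0}
print(summarize_contributions(contribs, max_contrib=2, threshold=1000))
# [('LotArea', -15000.0), ('GrLivArea', 12000.0)]
print(summarize_contributions(contribs, positive=False, features_to_hide=["LotArea"]))
# [('OverallQual', -8000.0)]
```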

[10]:

xpl.filter(max_contrib=5)

Local_plot¶

[11]:

xpl.plot.local_plot(index=268)

[Figure: local_plot output for index 268]

Threshold parameter to focus on significant contributions¶

[12]:

xpl.filter(max_contrib=5,threshold=10000)
xpl.plot.local_plot(index=268)

[Figure: local_plot output for index 268 with threshold=10000]

Don’t display masked contributions¶

[13]:

xpl.plot.local_plot(index=268,show_masked=False)

[Figure: local_plot output for index 268 without masked contributions]

You can also hide the predicted value with the parameter show_predict=False.
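Masked contributions are not simply dropped from the chart by default. A plausible way to summarise them, assumed here rather than taken from Shapash's source, is to sum the masked contributions separately by sign. The hypothetical masked_totals helper illustrates that idea on made-up values.

```python
# Hypothetical sketch: total the contributions a filter did NOT keep,
# separately for positive and negative signs.
from typing import Dict, Iterable


def masked_totals(contributions: Dict[str, float], kept: Iterable[str]) -> Dict[str, float]:
    kept_set = set(kept)
    masked = [c for f, c in contributions.items() if f not in kept_set]
    return {
        "hidden_positive": sum(c for c in masked if c > 0),
        "hidden_negative": sum(c for c in masked if c < 0),
    }


contribs = {"GrLivArea": 12000.0, "OverallQual": -8000.0, "GarageType": 500.0, "LotArea": -15000.0}
print(masked_totals(contribs, kept=["GrLivArea", "LotArea"]))
# {'hidden_positive': 500.0, 'hidden_negative': -8000.0}
```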

Focus on negative contributions¶

[14]:

xpl.filter(max_contrib=8,positive=False)
xpl.plot.local_plot(index=268)

[Figure: local_plot output for index 268, negative contributions only]

You can also focus on positive contributions using positive=True.

Hide specific features¶

You may want to hide some features because:
- some features can be too complex
- end users don’t want to see unnecessary information

Use the features_to_hide parameter of the filter method:

[15]:

xpl.filter(max_contrib=8,positive=False,features_to_hide=['BsmtFullBath','GarageType'])
xpl.plot.local_plot(index=268)

[Figure: local_plot output for index 268 with BsmtFullBath and GarageType hidden]

Select a row with a query¶

You can select a row with an index or a row number. You can also use a query:

[16]:

xpl.filter(max_contrib=3,positive=False)
xpl.plot.local_plot(query="LotArea == 8400 and LotShape == 'Regular' and TotalBsmtSF == 720")

[Figure: local_plot output for the row selected by the query]
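A query is a boolean expression over feature values. The query above compares LotShape with 'Regular', a decoded category rather than an ordinal code, which suggests it is evaluated on the original (pre-encoding) values. The stdlib sketch below is a rough analogue restricted to equality conditions; select_rows and the two rows are hypothetical.

```python
# Hypothetical stdlib analogue of query-based row selection: keep the row
# indices whose decoded values satisfy every equality condition.
from typing import Dict, List


def select_rows(rows: Dict[int, Dict[str, object]], **conditions: object) -> List[int]:
    return [
        idx for idx, row in rows.items()
        if all(row.get(col) == val for col, val in conditions.items())
    ]


# Made-up rows, not taken from the dataset
rows = {
    1: {"LotArea": 8400, "LotShape": "Regular", "TotalBsmtSF": 720},
    2: {"LotArea": 8400, "LotShape": "Regular", "TotalBsmtSF": 882},
}
print(select_rows(rows, LotArea=8400, LotShape="Regular", TotalBsmtSF=720))  # [1]
print(select_rows(rows, LotArea=8400, LotShape="Regular"))  # [1, 2]
```

Since a local explanation describes a single observation, it is worth tightening the conditions until they match exactly one row.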

Classification Case¶

Transform our use case into a classification problem:

[17]:

ytrain['PriceClass'] = ytrain['SalePrice'].apply(lambda x: 1 if x < 150000 else (3 if x > 300000 else 2))
label_dict = { 1 : 'Cheap', 2 : 'Moderately Expensive', 3 : 'Expensive' }
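The binning rule in the lambda above can be checked at its boundaries with a small standalone function. price_class is a hypothetical name; the thresholds and label_dict come from the cell above. Because the comparisons are strict, both 150000 and 300000 fall into class 2.

```python
# Standalone version of the binning rule: < 150000 -> 1, > 300000 -> 3, else 2
LABEL_DICT = {1: "Cheap", 2: "Moderately Expensive", 3: "Expensive"}


def price_class(price: float) -> int:
    if price < 150000:
        return 1
    if price > 300000:
        return 3
    return 2


for price in (149999, 150000, 300000, 300001):
    print(price, LABEL_DICT[price_class(price)])
# 149999 Cheap
# 150000 Moderately Expensive
# 300000 Moderately Expensive
# 300001 Expensive
```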

[18]:

clf = CatBoostClassifier(n_estimators=50).fit(Xtrain,ytrain['PriceClass'],verbose=False)
y_pred_clf = pd.DataFrame(clf.predict(Xtest),columns=['pred'],index=Xtest.index)

Declare a new SmartExplainer dedicated to the classification problem¶

[19]:

xplclf = SmartExplainer(
    model=clf,
    preprocessing=encoder,
    features_dict=house_dict,
    label_dict=label_dict # Optional: display explicit class names in the output
)

[20]:

xplclf.compile(
    x=Xtest,
    y_pred=y_pred_clf
)

Backend: Shap TreeExplainer

Use the label parameter of local_plot to select the explanation you want¶

With the label parameter, you can specify either an explicit label or a label number.
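Because label_dict maps class values to display names, an explicit label can be translated back to its class value by inverting the dictionary. label_to_class is a hypothetical helper that only illustrates this mapping; it does not reproduce how Shapash resolves the label argument internally.

```python
# Invert label_dict to recover the class value behind a display name
from typing import Dict

label_dict = {1: "Cheap", 2: "Moderately Expensive", 3: "Expensive"}


def label_to_class(label_name: str, label_dict: Dict[int, str]) -> int:
    """Return the class value whose display name is label_name (KeyError if unknown)."""
    inverse = {name: code for code, name in label_dict.items()}
    return inverse[label_name]


print(label_to_class("Moderately Expensive", label_dict))  # 2
```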

[21]:

xplclf.filter(max_contrib=7,positive=True)
xplclf.plot.local_plot(index=268,label='Moderately Expensive')

[Figure: local_plot output for index 268, label 'Moderately Expensive']

See the summary parameters¶

[22]:

xplclf.mask_params

[22]:

{'features_to_hide': None,
 'threshold': None,
 'positive': True,
 'max_contrib': 7}

Export explanations¶

Export your local explanations to a pd.DataFrame with the to_pandas method¶

- The to_pandas method accepts the same parameters as the filter method.
- If you don’t specify any parameter, to_pandas uses the parameters you passed to the filter method.
- For a classification problem, proba=True adds the predicted probability to the output.
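The exported DataFrame has one row per observation: pred, proba, and for each of the max_contrib slots a feature_i, value_i and contribution_i column. This accounts for the 23 columns shown in the output (2 + 3 × 7). The hypothetical flatten_summary function sketches that layout in plain Python, using None where pandas shows NaN; it illustrates the column structure only and is not the to_pandas implementation.

```python
# Hypothetical sketch of the wide layout produced by to_pandas for one row
from typing import Dict, Optional, Sequence, Tuple


def flatten_summary(
    pred: str,
    proba: Optional[float],
    kept: Sequence[Tuple[str, object, float]],
    max_contrib: int,
) -> Dict[str, object]:
    row: Dict[str, object] = {"pred": pred}
    if proba is not None:
        row["proba"] = proba
    for i in range(max_contrib):
        # Pad empty slots with None (NaN in a pandas DataFrame)
        feature, value, contrib = kept[i] if i < len(kept) else (None, None, None)
        row[f"feature_{i + 1}"] = feature
        row[f"value_{i + 1}"] = value
        row[f"contribution_{i + 1}"] = contrib
    return row


row = flatten_summary(
    "Expensive", 0.000623,
    [("Type of sale", "Court Officer Deed/Estate", 0.114506)],
    max_contrib=3,
)
print(len(row))          # 11
print(row["feature_2"])  # None
```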

[23]:

summary_df= xplclf.to_pandas(proba=True)

to_pandas params: {'features_to_hide': None, 'threshold': None, 'positive': True, 'max_contrib': 7}

[24]:

summary_df.head()

[24]:

	pred	proba	feature_1	value_1	contribution_1	feature_2	value_2	contribution_2	feature_3	value_3	...	contribution_4	feature_5	value_5	contribution_5	feature_6	value_6	contribution_6	feature_7	value_7	contribution_7
259	Moderately Expensive	0.994917	Ground living area square feet	1792	0.309308	Interior finish of the garage?	Rough Finished	0.275467	Size of garage in square feet	564	...	0.182722	Physical locations within Ames city limits	College Creek	0.170888	Overall material and finish of the house	7	0.164045	Height of the basement	Good (90-99 inches)	0.139618
268	Moderately Expensive	0.876916	Second floor square feet	720	0.183251	Full bathrooms above grade	2	0.155086	Ground living area square feet	2192	...	0.143119	Type 1 finished square feet	378	0.142439	First Floor square feet	1052	0.127817	Half baths above grade	1	0.127717
289	Cheap	0.997304	Ground living area square feet	900	0.818922	Size of garage in square feet	280	0.561631	Total square feet of basement area	882	...	0.349033	Full bathrooms above grade	1	0.324806	Overall material and finish of the house	5	0.318031	First Floor square feet	900	0.247826
650	Cheap	0.998653	Ground living area square feet	630	0.816398	Size of garage in square feet	0	0.587745	Total square feet of basement area	630	...	0.355685	Overall material and finish of the house	4	0.317549	Full bathrooms above grade	1	0.31303	General zoning classification	Residential Medium Density	0.178395
1234	Cheap	0.852389	Ground living area square feet	1188	0.942118	Remodel date	1959	0.423368	Overall material and finish of the house	5	...	0.373812	Number of fireplaces	0	0.168725	Rating of basement finished area	Average Rec Room	0.130175	Wood deck area in square feet	0	0.12249

5 rows × 23 columns

You can also compute, for the whole dataset, the probability of one particular target class and display the explainability elements associated with that class.

[25]:

# Create a one-column DataFrame that assigns class 3 ('Expensive') to every row
constantpred = pd.DataFrame(3, index=Xtest.index, columns=['pred'])
xplclf.add(y_pred=constantpred)
summary_df = xplclf.to_pandas(proba=True, max_contrib=3, threshold=0.1, positive=True)

[26]:

summary_df.head()

[26]:

	pred	proba	feature_1	value_1	contribution_1	feature_2	value_2	contribution_2	feature_3	value_3	contribution_3
259	Expensive	0.003081	Ground living area square feet	1792	0.327986	Overall material and finish of the house	7	0.197494	Rating of basement finished area	Good Living Quarters	0.181953
268	Expensive	0.007627	Ground living area square feet	2192	0.825571	Wood deck area in square feet	262	0.251474	Remodel date	1997	0.157067
289	Expensive	0.000024	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
650	Expensive	0.000056	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1234	Expensive	0.000623	Type of sale	Court Officer Deed/Estate	0.114506	NaN	NaN	NaN	NaN	NaN	NaN

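NB: to_pandas returns NaN in the feature, value and contribution columns when fewer contributions than max_contrib meet your conditions; a row with no qualifying contribution is NaN in all of them.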
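Forcing y_pred to class 3 changes what proba means: it becomes the probability of 'Expensive' for every row rather than the probability of each row's predicted class, which is why rows 289 and 650 show probabilities close to zero. The sketch below contrasts the two readings on made-up probability distributions; proba_for_class and predicted_class are hypothetical helpers.

```python
# Contrast "probability of a fixed class" with "most probable class"
from typing import Dict, Mapping


def proba_for_class(probas: Mapping[int, Mapping[int, float]], target: int) -> Dict[int, float]:
    """Probability of the fixed target class for every row."""
    return {idx: dist[target] for idx, dist in probas.items()}


def predicted_class(probas: Mapping[int, Mapping[int, float]]) -> Dict[int, int]:
    """Most probable class for every row."""
    return {idx: max(dist, key=dist.get) for idx, dist in probas.items()}


# Made-up distributions over classes 1, 2, 3
probas = {
    10: {1: 0.70, 2: 0.25, 3: 0.05},
    11: {1: 0.02, 2: 0.18, 3: 0.80},
}
print(proba_for_class(probas, target=3))  # {10: 0.05, 11: 0.8}
print(predicted_class(probas))            # {10: 1, 11: 3}
```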