Shapash model in production - Overview¶

In this tutorial, you will learn how to create a Shapash SmartPredictor to make predictions and get local explanations in production, using a simple use case.

This tutorial walks through each step, from training the model to deploying the Shapash SmartPredictor. A more detailed tutorial covers the SmartPredictor object in depth.

Contents:
- Build a Regressor
- Compile Shapash SmartExplainer
- From Shapash SmartExplainer to SmartPredictor
- Save Shapash SmartPredictor Object in pickle file
- Make a prediction

Data from Kaggle House Prices

[1]:

import pandas as pd
from category_encoders import OrdinalEncoder
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split

Step 1 : Exploration and training of the model¶

Building Supervised Model¶

In this section, we train a supervised Machine Learning model on the House Prices data.

[2]:

from shapash.data.data_loader import data_loading
house_df, house_dict = data_loading('house_prices')

[3]:

y_df = house_df['SalePrice'].to_frame()
X_df = house_df[house_df.columns.difference(['SalePrice'])]

Preprocessing step¶

Encoding Categorical Features

[4]:

from category_encoders import OrdinalEncoder

categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']

encoder = OrdinalEncoder(cols=categorical_features,
                         handle_unknown='ignore',
                         return_df=True).fit(X_df)

X_encoded = encoder.transform(X_df)
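
To make the encoding step concrete, here is a minimal pure-Python sketch of the idea behind ordinal encoding: each distinct category of a column is mapped to an integer. The helper names fit_ordinal_mapping and encode are hypothetical, and the conventions used (codes start at 1, unseen categories get -1) are assumptions chosen for illustration. They do not describe how category_encoders behaves with the settings used above.

```python
def fit_ordinal_mapping(values):
    """Assign an integer code to each distinct category, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1  # assumption: codes start at 1
    return mapping


def encode(values, mapping, unknown=-1):
    """Replace each category by its code; unseen categories get the `unknown` code."""
    return [mapping.get(value, unknown) for value in values]


# Hypothetical values for one categorical column
street = ["Pave", "Grvl", "Pave", "Pave"]
street_mapping = fit_ordinal_mapping(street)
encoded_street = encode(street, street_mapping)

# A category never seen during fitting falls back to the `unknown` code
encoded_new = encode(["Grvl", "Dirt"], street_mapping)
```

The mapping is learned once and reused on new data, which is why the fitted encoder is later passed to Shapash as the preprocessing.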

Train / Test Split¶

[5]:

Xtrain, Xtest, ytrain, ytest = train_test_split(X_encoded, y_df, train_size=0.75, random_state=1)
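
As a rough illustration of what train_size=0.75 means, the sketch below shuffles row positions with a fixed seed and keeps the first 75% for training. The function split_indices is hypothetical and does not reproduce the exact permutation drawn by train_test_split. The row count of 1460 is an assumption, chosen because 75% of it matches the 1095 training rows reported in the LightGBM log.

```python
import random


def split_indices(n_rows, train_size, seed):
    """Shuffle row positions and cut them into a train part and a test part."""
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)  # fixed seed makes the split repeatable
    n_train = int(n_rows * train_size)      # number of rows kept for training
    return positions[:n_train], positions[n_train:]


# Assumption: 1460 rows in total, so 1095 go to training and 365 to testing
train_idx, test_idx = split_indices(n_rows=1460, train_size=0.75, seed=1)
```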

Model Fitting¶

[6]:

regressor = LGBMRegressor(n_estimators=200).fit(Xtrain, ytrain)

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002116 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2986
[LightGBM] [Info] Number of data points in the train set: 1095, number of used features: 66
[LightGBM] [Info] Start training from score 182319.757078

[7]:

y_pred = pd.DataFrame(regressor.predict(Xtest), columns=['pred'], index=Xtest.index)

Understand my model with shapash¶

In this section, we use the SmartExplainer object from Shapash.
- It allows users to understand how the model works with the specified data.
- This object must be used only for the data mining step. Shapash provides another object for deployment.
- This tutorial does not explore all the possibilities of the SmartExplainer; other tutorials do.

Declare and Compile SmartExplainer¶

[8]:

from shapash import SmartExplainer

Use wording on feature names to better understand results¶

Here, we use a wording to rename our feature labels with more understandable terms. This makes local explainability more operational and understandable for users.
- To do this, we use the house_dict dictionary, which maps a description to each feature.
- We can then pass it as the features_dict parameter of the SmartExplainer.

[9]:

xpl = SmartExplainer(
    model=regressor,
    preprocessing=encoder, # Optional: compile step can use inverse_transform method
    features_dict=house_dict
)
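
The idea behind a features dictionary can be sketched in a few lines: a plain mapping from technical column names to readable labels. The excerpt below is a hypothetical sample, not the full content of house_dict, and the fallback to the technical name when no label exists is an assumption made for this sketch.

```python
# Hypothetical excerpt of a features dictionary (technical name -> readable label)
features_dict = {
    "GrLivArea": "Ground living area square feet",
    "OverallQual": "Overall material and finish of the house",
}


def label_for(feature, features_dict):
    """Return the readable label, falling back to the technical name if none is defined."""
    return features_dict.get(feature, feature)


labels = [label_for(name, features_dict) for name in ["GrLivArea", "OverallQual", "YrSold"]]
```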

compile(): This method is the first step to understanding the model and its predictions. It sorts the contributions, reverses the preprocessing steps, and performs all the calculations needed to display plots quickly and summarize explanations efficiently. (See the SmartExplainer documentation and tutorials.)

[10]:

xpl.compile(
    x=Xtest,
    y_pred=y_pred,
    y_target=ytest, # Optional: allows to display True Values vs Predicted Values
)

INFO: Shap explainer type - <shap.explainers._tree.TreeExplainer object at 0x1209e3f20>

Understand results of your trained model¶

We can now easily get a first summary of the explanation of the model results.
- Here, we chose to get the 3 most contributive features for each prediction.
- We used a wording to make feature names more understandable in an operational setting.

[11]:

xpl.to_pandas(max_contrib=3).head()

[11]:

	pred	feature_1	value_1	contribution_1	feature_2	value_2	contribution_2	feature_3	value_3	contribution_3
259	211538.742157	Ground living area square feet	1792	13995.651927	Overall material and finish of the house	7	13539.441353	Total square feet of basement area	963	-5652.206854
268	178786.677257	Ground living area square feet	2192	27967.966278	Overall material and finish of the house	5	-26133.987559	Overall condition of the house	8	7799.924798
289	111985.324660	Overall material and finish of the house	5	-25571.348315	Ground living area square feet	900	-16006.763921	Total square feet of basement area	882	-5456.989325
650	73456.522515	Overall material and finish of the house	4	-34517.073676	Ground living area square feet	630	-21350.707866	Total square feet of basement area	630	-12699.371236
1234	136249.557316	Overall material and finish of the house	5	-26469.235405	Ground living area square feet	1188	-10980.550285	Condition of sale	Abnormal Sale	-5240.009373

Step 2 : SmartPredictor in production¶

Switch from SmartExplainer to SmartPredictor¶

When you are satisfied with your results and the explainability given by Shapash, you can use the SmartPredictor object for deployment.
- In this section, we learn how to easily switch from a SmartExplainer to a SmartPredictor.
- SmartPredictor allows you to make predictions, and to detail and summarize contributions on new data automatically.
- It keeps only the attributes needed for deployment, so it is lighter than the SmartExplainer object.
- SmartPredictor performs additional consistency checks before deployment.
- SmartPredictor lets you configure how explanations are summarized to suit your use cases.
- It can be used through an API or in batch mode.

[12]:

predictor = xpl.to_smartpredictor()

Save and Load your SmartPredictor¶

You can easily save your SmartPredictor object to a pickle file and load it back.

Save your SmartPredictor in Pickle File¶

[13]:

predictor.save('./predictor.pkl')

Load your SmartPredictor from Pickle File¶

[14]:

from shapash.utils.load_smartpredictor import load_smartpredictor

[15]:

predictor_load = load_smartpredictor('./predictor.pkl')
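
For readers unfamiliar with pickle, the round trip can be sketched with Python's standard pickle module. The dictionary below is only a stand-in for a fitted object, and its keys are hypothetical. The sketch makes no claim about what the SmartPredictor file actually contains.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted object (hypothetical content, for illustration only)
state = {
    "features_dict": {"GrLivArea": "Ground living area square feet"},
    "max_contrib": 3,
}

path = os.path.join(tempfile.mkdtemp(), "predictor_sketch.pkl")

# Serialize the object to disk
with open(path, "wb") as f:
    pickle.dump(state, f)

# Read it back; only unpickle files that come from a trusted source
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Because unpickling can execute arbitrary code, pickle files should be treated like executable artifacts when they are shipped to production.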

Make a prediction with your SmartPredictor¶

To make new predictions and summarize the local explainability of your model on new datasets, you can use the add_input method of the SmartPredictor.
- The add_input method is the first step: it adds a dataset for prediction and explainability.
- It checks the structure of the dataset, and of the prediction and the contributions if they are specified.
- It applies the preprocessing specified at initialization and reorders the features to match the order used by the model. (See the documentation of this method.)
- In API mode, this method can handle data given as dictionaries, such as those received from a GET or a POST request.
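
The kind of structure check and reordering described above can be sketched as follows for a dictionary payload. The function prepare_row, the feature order, and the payload are all hypothetical. This is a simplified illustration of the idea, not the checks that add_input actually performs.

```python
def prepare_row(payload, expected_features):
    """Check a dict payload and return its values in the order the model expects."""
    missing = [name for name in expected_features if name not in payload]
    if missing:
        raise ValueError(f"Missing features: {missing}")
    return [payload[name] for name in expected_features]


# Hypothetical feature order used by the model
expected_features = ["GrLivArea", "OverallQual", "TotalBsmtSF"]

# Hypothetical payload, as it could arrive from a request (keys in any order)
payload = {"TotalBsmtSF": 856, "GrLivArea": 1710, "OverallQual": 7}

row = prepare_row(payload, expected_features)
```

Failing early on a missing feature is a deliberate choice in this sketch: a silent default could produce a prediction that looks valid but is not.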

Add data¶

The x input of the add_input method doesn’t have to be encoded: add_input applies the preprocessing.

[16]:

predictor_load.add_input(x=X_df, ypred=y_df)

INFO: Shap explainer type - <shap.explainers._tree.TreeExplainer object at 0x120bb8140>

Make prediction¶

By checking the attribute data["ypred"], we can see that ypred is the one given to the add_input method. In this example, the values passed as ypred are the SalePrice column stored in y_df. If ypred is not specified, it is computed automatically by the method.

[17]:

predictor_load.data["ypred"].head()

[17]:

	SalePrice
Id
1	208500
2	181500
3	223500
4	140000
5	250000

Get detailed explainability associated with the prediction¶

You can use the detail_contributions method to see the detailed contribution of each feature for each row of your new dataset.
- For classification problems, it automatically associates contributions with the right predicted label.
- The predicted label can be computed automatically by the method, or you can specify a ypred with the add_input method.

[18]:

detailed_contributions = predictor_load.detail_contributions()

INFO: Shap explainer type - <shap.explainers._tree.TreeExplainer object at 0x120bb8140>

[19]:

detailed_contributions.head()

[19]:

	SalePrice	1stFlrSF	2ndFlrSF	3SsnPorch	BedroomAbvGr	BldgType	BsmtCond	BsmtExposure	BsmtFinSF1	BsmtFinSF2	...	SaleType	ScreenPorch	Street	TotRmsAbvGrd	TotalBsmtSF	Utilities	WoodDeckSF	YearBuilt	YearRemodAdd	YrSold
Id
1	208500	-864.302666	1089.429010	0.0	337.521166	-1.949170	156.111469	-361.262389	605.499503	-62.668440	...	-121.386448	-340.892806	0.0	-353.001743	-4739.814200	0.0	-595.965510	3880.341978	2553.054173	-181.631619
2	181500	3350.844933	-584.369097	0.0	205.516384	-5.940831	123.975700	3533.453295	4220.333952	-73.558110	...	-163.670756	-245.501182	0.0	-614.916748	3362.791194	0.0	2428.197528	960.455476	-3867.310294	503.314916
3	223500	-1262.628672	324.396157	0.0	337.141678	-1.949170	155.617454	517.916355	897.597531	-67.811766	...	-130.962786	-370.460989	0.0	-224.931089	-5755.026659	0.0	-560.741090	3005.687892	2812.095218	-439.790704
4	140000	-1480.790566	76.480568	0.0	288.091047	-9.575975	315.446640	-688.845236	-2484.213094	-96.560334	...	-107.290727	-332.888191	0.0	-576.092859	-6153.292222	0.0	-713.047102	-4324.383049	-4434.789606	1122.301231
5	250000	-9853.708726	-1625.957100	0.0	-528.745169	0.831491	100.194985	374.714720	-1575.576971	-74.604628	...	-737.865156	-365.374274	0.0	-4273.313084	-4544.413358	0.0	-271.376836	1685.459458	1548.598137	-352.416569

5 rows × 73 columns
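
Contributions of this kind are typically additive: with SHAP values, a baseline value plus the sum of a row's contributions gives the model output for that row. The sketch below uses made-up numbers, not values from the table above. Note that the identity holds for the model's own prediction, not for a ypred supplied separately.

```python
import math

base_value = 180000.0  # hypothetical baseline (average model output)

# Hypothetical per-feature contributions for a single row
contributions = {
    "GrLivArea": 14000.0,
    "OverallQual": 13500.0,
    "TotalBsmtSF": -5600.0,
}

# Baseline plus all contributions gives back the model output for the row
reconstructed = base_value + sum(contributions.values())
```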

Summarize explainability of the predictions¶

You can use the summarize method to summarize your local explainability.
This summary can be configured with the modify_mask method so that the explainability meets your operational needs.
When you initialize the SmartPredictor, you can also specify:
- postprocessing: to apply a wording to several values of your dataset.
- label_dict: to rename your labels for classification problems.
- features_dict: to rename your features.

[20]:

predictor_load.modify_mask(max_contrib=3)

[21]:

explanation = predictor_load.summarize()

For example, here we chose to build a summary with the 3 most contributive features for each row of the dataset.
- As you can see below, the wording defined in the first step of this tutorial has been kept by the SmartPredictor and is used by the summarize method.

[22]:

explanation.head()

[22]:

	SalePrice	feature_1	value_1	contribution_1	feature_2	value_2	contribution_2	feature_3	value_3	contribution_3
1	208500	Overall material and finish of the house	7	6920.473837	Total square feet of basement area	856	-4739.8142	Original construction date	2003	3880.341978
2	181500	Overall material and finish of the house	6	-12277.905073	Ground living area square feet	1262	-9032.425476	Overall condition of the house	8	4246.185116
3	223500	Ground living area square feet	1786	16380.55428	Overall material and finish of the house	7	10036.003238	Size of garage in square feet	608	6193.320015
4	140000	Total square feet of basement area	756	-6153.292222	Size of garage in square feet	642	5581.668158	Overall material and finish of the house	7	5283.39957
5	250000	Overall material and finish of the house	8	59198.997538	Ground living area square feet	2198	15518.770218	Size of garage in square feet	836	12725.437383
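
The effect of max_contrib=3 can be sketched as a ranking by absolute contribution, which is consistent with the ordering visible in the summary above. The function top_contributions is hypothetical and is not Shapash's implementation. The values are a few contributions for Id 1 copied from the outputs above, and the technical name OverallQual for "Overall material and finish of the house" is an assumption.

```python
def top_contributions(contributions, max_contrib):
    """Keep the `max_contrib` features with the largest absolute contribution."""
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:max_contrib]


# A few contributions for Id 1, copied from the outputs above
row_1 = {
    "1stFlrSF": -864.302666,
    "2ndFlrSF": 1089.429010,
    "TotalBsmtSF": -4739.814200,
    "YearBuilt": 3880.341978,
    "OverallQual": 6920.473837,  # assumed technical name for this feature
}

summary = top_contributions(row_1, max_contrib=3)
```

Ranking on the absolute value keeps strongly negative contributions, such as the basement area here, in the summary alongside positive ones.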