Shapash model in production - Overview

In this tutorial, you will learn how to create a Shapash SmartPredictor to make predictions and get local explanations in production, using a simple use case.

This tutorial describes the steps from model training to SmartPredictor deployment. A more detailed tutorial covers the SmartPredictor object in depth.

Contents:
- Build a Regressor
- Compile Shapash SmartExplainer
- From Shapash SmartExplainer to SmartPredictor
- Save Shapash SmartPredictor Object in pickle file
- Make a prediction

Data from Kaggle House Prices

import pandas as pd
from category_encoders import OrdinalEncoder
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split

Step 1 : Exploration and training of the model

Building Supervised Model

In this section, we train a supervised Machine Learning model on the House Prices data.

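from shapash.data.data_loader import data_loading
house_df, house_dict = data_loading('house_prices')
y_df = house_df['SalePrice'].to_frame()  # target as a one-column DataFrame
X_df = house_df[house_df.columns.difference(['SalePrice'])]  # all other columns as features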

Preprocessing step

Encoding Categorical Features

from category_encoders import OrdinalEncoder

categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']

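# Fit the encoder on the raw features, then encode them (X_encoded is used in the split below)
encoder = OrdinalEncoder(cols=categorical_features).fit(X_df)
X_encoded = encoder.transform(X_df)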
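
To make the encoding step concrete, here is a minimal pure-Python sketch of what an ordinal encoding does: each distinct category is replaced by an integer, and the mapping is kept so the codes can be turned back into categories. The function names, the toy `street` column, and the conventions used (codes starting at 1, -1 for unseen categories) are assumptions made for this illustration. They are not the internals of category_encoders.

```python
def fit_ordinal_mapping(values):
    """Assign 1, 2, 3, ... to categories in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


def transform(values, mapping, unknown=-1):
    """Replace each category by its code; unseen categories get `unknown`."""
    return [mapping.get(value, unknown) for value in values]


def inverse_transform(codes, mapping):
    """Turn codes back into the original categories."""
    reverse = {code: category for category, code in mapping.items()}
    return [reverse.get(code) for code in codes]


# Hypothetical toy column, not taken from the tutorial's dataset.
street = ["Pave", "Grvl", "Pave", "Pave"]
mapping = fit_ordinal_mapping(street)
encoded = transform(street, mapping)
decoded = inverse_transform(encoded, mapping)
```

Keeping the mapping is what allows a later step to display original category values instead of integer codes.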


Train / Test Split

Xtrain, Xtest, ytrain, ytest = train_test_split(X_encoded, y_df, train_size=0.75, random_state=1)

Model Fitting

regressor = LGBMRegressor(n_estimators=200).fit(Xtrain, ytrain)
y_pred = pd.DataFrame(regressor.predict(Xtest), columns=['pred'], index=Xtest.index)

Understand my model with shapash

In this section, we use the SmartExplainer object from Shapash.
- It allows users to understand how the model works with the specified data.
- This object must be used only for the data mining step. Shapash provides another object for deployment.
- This tutorial does not explore the possibilities of the SmartExplainer; other tutorials do.

Declare and Compile SmartExplainer

from shapash import SmartExplainer

Use wording on feature names to better understand results

Here, we use a wording to rename our feature labels with more understandable terms. This makes local explainability more operational and easier for users to read.
- To do this, we use the house_dict dictionary, which maps a description to each feature.
- We can then pass it as the features_dict parameter of the SmartExplainer.
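
As an illustration of what such a wording does, the sketch below renames technical feature names with a dictionary. The two name/label pairs and the sample values can be read off the output tables of this tutorial. The helper `to_label`, the two-entry excerpt, and the fallback to the technical name are assumptions made for the example, not Shapash code.

```python
# Two-entry excerpt in the spirit of house_dict (the real dictionary covers every feature).
features_dict = {
    "TotalBsmtSF": "Total square feet of basement area",
    "YearBuilt": "Original construction date",
}


def to_label(feature_name, wording):
    """Return the readable label, falling back to the technical name."""
    return wording.get(feature_name, feature_name)


# Sample contributions keyed by technical name.
contributions = {"TotalBsmtSF": -5165.50, "YearBuilt": 3870.96, "YrSold": 17.48}

# Same values, keyed by readable label where one is defined.
labelled = {to_label(name, features_dict): value for name, value in contributions.items()}
```

Here `YrSold` keeps its technical name only because it was left out of the excerpt.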

xpl = SmartExplainer(
    model=regressor,
    preprocessing=encoder, # Optional: compile step can use inverse_transform method
    features_dict=house_dict) # Optional: readable labels for the features

compile(): This method is the first step to understand the model and its predictions. It sorts the contributions, reverses the preprocessing steps, and performs all the calculations needed for a quick display of plots and an efficient summary of the explanation. (see SmartExplainer documentation and tutorials)

xpl.compile(x=Xtest, y_pred=y_pred,
    y_target=ytest) # Optional: allows to display True Values vs Predicted Values
Backend: Shap TreeExplainer

Understand results of your trained model

Then, we can easily get a first summary of the explanation of the model results.
- Here, we chose to get the 3 most contributive features for each prediction.
- We used a wording to make feature names more understandable in an operational setting.

| | pred | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 259 | 209141.256921 | Ground living area square feet | 1792 | 13710.4 | Overall material and finish of the house | 7 | 12776.3 | Total square feet of basement area | 963 | -5103.03 |
| 268 | 178734.474531 | Ground living area square feet | 2192 | 29747 | Overall material and finish of the house | 5 | -26151.3 | Overall condition of the house | 8 | 9190.84 |
| 289 | 113950.844570 | Overall material and finish of the house | 5 | -24730 | Ground living area square feet | 900 | -16342.6 | Total square feet of basement area | 882 | -5922.64 |
| 650 | 74957.162142 | Overall material and finish of the house | 4 | -33927.7 | Ground living area square feet | 630 | -23234.4 | Total square feet of basement area | 630 | -11687.9 |
| 1234 | 135305.243500 | Overall material and finish of the house | 5 | -25445.7 | Ground living area square feet | 1188 | -11476.6 | Condition of sale | Abnormal Sale | -5071.82 |

Step 2 : SmartPredictor in production

Switch from SmartExplainer to SmartPredictor

Once you are satisfied with your results and the explainability given by Shapash, you can use the SmartPredictor object for deployment.
- In this section, we learn how to switch from a SmartExplainer to a SmartPredictor.
- SmartPredictor lets you make predictions, then detail and summarize contributions on new data automatically.
- It keeps only the attributes needed for deployment, which makes it lighter than the SmartExplainer object.
- SmartPredictor performs additional consistency checks before deployment.
- SmartPredictor lets you configure how the summary is built to suit your use case.
- It can be used through an API or in batch mode.

predictor = xpl.to_smartpredictor()

Save and Load your SmartPredictor

You can easily save your SmartPredictor object to a pickle file and load it back.

Save your SmartPredictor in Pickle File

predictor.save('./predictor.pkl')

Load your SmartPredictor in Pickle File

from shapash.utils.load_smartpredictor import load_smartpredictor
predictor_load = load_smartpredictor('./predictor.pkl')
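
For readers unfamiliar with pickle, the following sketch shows the general save-then-load round trip using only the standard library. A plain dictionary stands in for the predictor object, and the file is written to a temporary directory. Both are assumptions made for the illustration and say nothing about how Shapash stores its objects internally.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a deployable object.
predictor_state = {
    "features_dict": {"YearBuilt": "Original construction date"},
    "features_order": ["TotalBsmtSF", "YearBuilt"],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "predictor.pkl")

    # Save: serialize the object to a binary file.
    with open(path, "wb") as file:
        pickle.dump(predictor_state, file)

    # Load: rebuild an equal object from the file.
    with open(path, "rb") as file:
        restored_state = pickle.load(file)
```

Unpickling can execute arbitrary code, so only load pickle files that come from a source you trust.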

Make a prediction with your SmartPredictor

To make new predictions and summarize the local explainability of your model on new datasets, you can use the add_input method of the SmartPredictor.
- The add_input method is the first step: it adds a dataset for prediction and explainability.
- It checks the structure of the dataset, and of the prediction and the contributions if they are specified.
- It applies the preprocessing specified at initialization and reorders the features to match the order used by the model. (see the documentation of this method)
- In API mode, this method can handle dictionary data, such as data received from a GET or a POST request.
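
The kind of structure check and feature reordering described above can be sketched in a few lines. This is an illustration under stated assumptions, not the Shapash implementation: `prepare_input`, the two-feature model, and the choice to reject unexpected keys are all hypothetical.

```python
def prepare_input(payload, expected_features):
    """Validate a dict payload and return its values in the model's feature order."""
    missing = [name for name in expected_features if name not in payload]
    if missing:
        raise ValueError(f"Missing features: {missing}")
    unexpected = [name for name in payload if name not in expected_features]
    if unexpected:
        raise ValueError(f"Unexpected features: {unexpected}")
    # Reorder the values to match the order the model was trained with.
    return [payload[name] for name in expected_features]


# Hypothetical model that expects exactly two features, in this order.
expected_features = ["TotalBsmtSF", "YearBuilt"]

# A payload as it might arrive from an API request, keys in arbitrary order.
payload = {"YearBuilt": 2003, "TotalBsmtSF": 856}
row = prepare_input(payload, expected_features)
```

Failing early on a malformed payload is usually preferable to letting the model predict on misaligned columns.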

Add data

The x input of the add_input method doesn't have to be encoded: add_input applies the preprocessing.

predictor_load.add_input(x=X_df, ypred=y_df)

Make prediction

Then, by checking the attribute data["ypred"], we can see that ypred is the one given to the add_input method. If ypred is not specified, it is computed automatically by the method.

predictor_load.data["ypred"].head()

1 208500
2 181500
3 223500
4 140000
5 250000

Get detailed explainability associated with the prediction

You can use the detail_contributions method to see the detailed contribution of each feature for each row of your new dataset.
- For classification problems, it automatically associates contributions with the right predicted label.
- The predicted label can be computed automatically by the method, or you can specify a ypred with the add_input method.

detailed_contributions = predictor_load.detail_contributions()
SalePrice 1stFlrSF 2ndFlrSF 3SsnPorch BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 ... SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
1 208500 -1104.994176 1281.445856 0.0 375.679661 12.259902 157.224629 -233.025420 -738.445396 -59.294761 ... -104.645827 -351.621116 0.0 -498.228775 -5165.503476 0.0 -944.040092 3870.961681 2219.313761 17.478037
2 181500 2249.403962 -655.861167 0.0 123.907278 -9.270166 139.431860 2699.247506 5102.469936 -84.771341 ... -153.842142 -236.526862 0.0 -705.112993 2988.981279 0.0 2090.785074 323.902986 -3861.776078 424.382977
3 223500 -1426.795115 -616.113112 0.0 369.536957 9.210944 199.213726 1032.288162 -92.179454 -93.169310 ... -91.178667 -280.832451 0.0 -324.734175 -5338.340597 0.0 -777.746743 3837.761102 2192.921648 -98.965041
4 140000 -653.873832 121.459865 0.0 307.677892 9.720006 252.786934 -530.156452 -2987.649814 -77.039912 ... -114.608224 -338.435699 0.0 -635.065828 -6548.453864 0.0 -974.503140 -3386.361210 -5232.537839 1633.763619
5 250000 -9531.577733 -1097.620788 0.0 -1574.988323 7.453569 130.470247 623.939546 -2396.572526 -92.929525 ... -481.118248 -366.250007 0.0 -4733.603060 -4675.706762 0.0 165.653455 2334.652063 1355.358932 -395.126541

5 rows × 73 columns
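
The tutorial's example is a regression, so the classification behavior mentioned above is not visible in this output. As a rough sketch of the idea, the code below keeps, for each row, the contributions computed for the label predicted for that row. The data layout, the labels, and the function name are hypothetical and chosen only for the illustration.

```python
def contributions_for_predicted_label(contributions_by_label, predicted_labels):
    """For each row, keep the contributions computed for its predicted label."""
    selected = []
    for row_index, label in enumerate(predicted_labels):
        selected.append(contributions_by_label[label][row_index])
    return selected


# Hypothetical two-class problem with two rows: one list of per-row contributions per label.
contributions_by_label = {
    "cheap": [{"area": 0.40, "quality": 0.10}, {"area": -0.30, "quality": -0.20}],
    "expensive": [{"area": -0.40, "quality": -0.10}, {"area": 0.30, "quality": 0.20}],
}
predicted_labels = ["cheap", "expensive"]

selected = contributions_for_predicted_label(contributions_by_label, predicted_labels)
```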

Summarize explainability of the predictions

  • You can use the summarize method to summarize your local explainability

  • This summary can be configured with modify_mask method so that you have explainability that meets your operational needs.

  • When you initialize the SmartPredictor, you can also specify:
      - postprocessing: to apply a wording to several values of your dataset.
      - label_dict: to rename your labels for classification problems.
      - features_dict: to rename your features.

explanation = predictor_load.summarize()

For example, here we build a summary with the 3 most contributive features for each row of the dataset. As you can see below, the wording defined in the first step of this tutorial has been kept by the SmartPredictor and used in the summarize method.

| | SalePrice | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 208500 | Overall material and finish of the house | 7 | 8248.82 | Total square feet of basement area | 856 | -5165.5 | Original construction date | 2003 | 3870.96 |
| 2 | 181500 | Overall material and finish of the house | 6 | -14419.4 | Ground living area square feet | 1262 | -9238.07 | Overall condition of the house | 8 | 6371.61 |
| 3 | 223500 | Ground living area square feet | 1786 | 15880.4 | Overall material and finish of the house | 7 | 9651.28 | Size of garage in square feet | 608 | 6259.46 |
| 4 | 140000 | Total square feet of basement area | 756 | -6548.45 | Remodel date | 1970 | -5232.54 | Size of garage in square feet | 642 | 4384.29 |
| 5 | 250000 | Overall material and finish of the house | 8 | 55722.1 | Ground living area square feet | 2198 | 17176.5 | Size of garage in square feet | 836 | 14907.7 |
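
The summary tables in this tutorial appear to be ordered by the absolute value of each contribution. Under that assumption, picking the most contributive features for one row can be sketched as follows. The function `top_contributions` is hypothetical, and the input is only the subset of row 1 contributions visible in the detailed output, so the result differs from the full summary for that row.

```python
def top_contributions(row, max_contrib=3):
    """Return the (feature, contribution) pairs with the largest absolute contribution."""
    ranked = sorted(row.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:max_contrib]


# Subset of the row 1 contributions shown in the detailed output.
row_1 = {
    "1stFlrSF": -1104.994176,
    "2ndFlrSF": 1281.445856,
    "TotalBsmtSF": -5165.503476,
    "YearBuilt": 3870.961681,
    "YearRemodAdd": 2219.313761,
}

top3 = top_contributions(row_1)
```

Ranking by absolute value keeps strong negative contributions, such as the basement area here, alongside strong positive ones.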