Shapash model in production - Overview

In this tutorial, you will learn how to create a Shapash SmartPredictor to make predictions and produce local explanations in production, using a simple use case.

This tutorial covers each step, from training the model to deploying the Shapash SmartPredictor. A more detailed tutorial covers the SmartPredictor object in depth.

Contents:
  • Build a Regressor
  • Compile Shapash SmartExplainer
  • From Shapash SmartExplainer to SmartPredictor
  • Save Shapash SmartPredictor object in a pickle file
  • Make a prediction

Data from Kaggle House Prices

[1]:
import pandas as pd
from category_encoders import OrdinalEncoder
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split

Step 1 : Exploration and training of the model

Building a Supervised Model

In this section, we train a supervised Machine Learning model on the House Prices data.

[2]:
from shapash.data.data_loader import data_loading
house_df, house_dict = data_loading('house_prices')
[3]:
y_df=house_df['SalePrice'].to_frame()
X_df=house_df[house_df.columns.difference(['SalePrice'])]

Preprocessing step

Encoding Categorical Features

[4]:
from category_encoders import OrdinalEncoder

categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']

encoder = OrdinalEncoder(cols=categorical_features,
                         handle_unknown='ignore',
                         return_df=True).fit(X_df)

X_encoded=encoder.transform(X_df)
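
To make the encoding step concrete, here is a minimal sketch of what an ordinal encoding does, written in plain Python. It is not the category_encoders implementation: the function names, the sample values and the choice of -1 for unseen categories are assumptions made for illustration. It also shows why the mapping can be inverted, which is what lets the original values be displayed later on.

```python
# Minimal sketch of ordinal encoding (illustrative; not category_encoders' code).
def fit_ordinal_mapping(values):
    """Assign 1, 2, 3, ... to categories in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


def encode(values, mapping, unknown=-1):
    """Replace each category with its integer code; unseen categories get `unknown`."""
    return [mapping.get(value, unknown) for value in values]


def decode(codes, mapping):
    """Invert the mapping to recover the original categories."""
    inverse = {code: value for value, code in mapping.items()}
    return [inverse.get(code) for code in codes]


# Hypothetical values for a single categorical column.
street = ["Pave", "Grvl", "Pave", "Pave"]
mapping = fit_ordinal_mapping(street)
codes = encode(street, mapping)

print(mapping)                 # {'Pave': 1, 'Grvl': 2}
print(codes)                   # [1, 2, 1, 1]
print(decode(codes, mapping))  # ['Pave', 'Grvl', 'Pave', 'Pave']
```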

Train / Test Split

[5]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X_encoded, y_df, train_size=0.75, random_state=1)

Model Fitting

[6]:
regressor = LGBMRegressor(n_estimators=200).fit(Xtrain, ytrain)
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002116 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2986
[LightGBM] [Info] Number of data points in the train set: 1095, number of used features: 66
[LightGBM] [Info] Start training from score 182319.757078
[7]:
y_pred = pd.DataFrame(regressor.predict(Xtest), columns=['pred'], index=Xtest.index)

Understand the model with Shapash

In this section, we use the SmartExplainer object from Shapash.
  • It allows users to understand how the model works with the specified data.
  • This object should only be used during the exploration (data mining) step. Shapash provides another object for deployment.
  • This tutorial does not explore the possibilities of the SmartExplainer; other tutorials do.

Declare and Compile SmartExplainer

[8]:
from shapash import SmartExplainer

Use wording on feature names to better understand results

Here, we use a wording to rename our feature labels with more understandable terms. This is useful to make local explainability more operational and understandable for users.
  • To do this, we use the house_dict dictionary, which maps a description to each feature.
  • We can then pass it as the features_dict parameter of the SmartExplainer.

[9]:
xpl = SmartExplainer(
    model=regressor,
    preprocessing=encoder, # Optional: compile step can use inverse_transform method
    features_dict=house_dict
)
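
As a rough illustration of what a feature wording is, the sketch below maps technical column names to readable labels. The two pairs shown can be read off the outputs of this tutorial; the helper to_label and its fallback to the raw column name are assumptions of this sketch, not Shapash code.

```python
# Subset of a feature dictionary: technical column name -> readable label.
features_dict = {
    "TotalBsmtSF": "Total square feet of basement area",
    "YearBuilt": "Original construction date",
}


def to_label(column, features_dict):
    """Return the readable label for a column, or the column name if none is defined."""
    return features_dict.get(column, column)


# "1stFlrSF" has no entry in this subset, so it keeps its technical name.
columns = ["TotalBsmtSF", "YearBuilt", "1stFlrSF"]
labels = [to_label(column, features_dict) for column in columns]
print(labels)
```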

compile(): This method is the first step to understanding the model and its predictions. It sorts the contributions, reverses the preprocessing steps, and performs all the calculations needed to display plots quickly and summarize explanations efficiently. (See the SmartExplainer documentation and tutorials.)

[10]:
xpl.compile(
    x=Xtest,
    y_pred=y_pred,
    y_target=ytest, # Optional: allows to display True Values vs Predicted Values
)
INFO: Shap explainer type - <shap.explainers._tree.TreeExplainer object at 0x1209e3f20>

Understand results of your trained model

We can then easily get a first summary of the explanation of the model results.
  • Here, we choose to get the 3 most contributive features for each prediction.
  • We use a wording to make feature names more understandable in an operational setting.

[11]:
xpl.to_pandas(max_contrib=3).head()
[11]:
| | pred | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 259 | 211538.742157 | Ground living area square feet | 1792 | 13995.651927 | Overall material and finish of the house | 7 | 13539.441353 | Total square feet of basement area | 963 | -5652.206854 |
| 268 | 178786.677257 | Ground living area square feet | 2192 | 27967.966278 | Overall material and finish of the house | 5 | -26133.987559 | Overall condition of the house | 8 | 7799.924798 |
| 289 | 111985.324660 | Overall material and finish of the house | 5 | -25571.348315 | Ground living area square feet | 900 | -16006.763921 | Total square feet of basement area | 882 | -5456.989325 |
| 650 | 73456.522515 | Overall material and finish of the house | 4 | -34517.073676 | Ground living area square feet | 630 | -21350.707866 | Total square feet of basement area | 630 | -12699.371236 |
| 1234 | 136249.557316 | Overall material and finish of the house | 5 | -26469.235405 | Ground living area square feet | 1188 | -10980.550285 | Condition of sale | Abnormal Sale | -5240.009373 |
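
In the output above, the features appear ordered by the absolute value of their contribution. A minimal sketch of that selection, assuming a ranking by absolute magnitude, could look like the following. The first three contributions are those of row 259; the two "hypothetical" entries are made up to show what gets filtered out, and top_contributions is an illustrative helper, not a Shapash function.

```python
def top_contributions(contributions, max_contrib=3):
    """Return the `max_contrib` (feature, contribution) pairs with the largest absolute value."""
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:max_contrib]


row_259 = {
    "Hypothetical feature A": 1200.0,   # illustrative only
    "Total square feet of basement area": -5652.206854,
    "Ground living area square feet": 13995.651927,
    "Hypothetical feature B": -300.0,   # illustrative only
    "Overall material and finish of the house": 13539.441353,
}

top_3 = top_contributions(row_259, max_contrib=3)
for feature, contribution in top_3:
    print(feature, contribution)
```

Note that a negative contribution can outrank a positive one: what matters in this sketch is the size of the effect, not its sign.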

Step 2 : SmartPredictor in production

Switch from SmartExplainer to SmartPredictor

When you are satisfied with your results and the explainability given by Shapash, you can use the SmartPredictor object for deployment.
  • In this section, we learn how to easily switch from a SmartExplainer to a SmartPredictor.
  • SmartPredictor allows you to make predictions, and to detail and summarize contributions on new data automatically.
  • It only keeps the attributes needed for deployment, so it is lighter than the SmartExplainer object.
  • SmartPredictor performs additional consistency checks before deployment.
  • SmartPredictor allows you to configure how explanations are summarized to suit your use cases.
  • It can be used through an API or in batch mode.

[12]:
predictor = xpl.to_smartpredictor()

Save and Load your SmartPredictor

You can easily save your SmartPredictor object to a pickle file and load it back.

Save your SmartPredictor to a pickle file

[13]:
predictor.save('./predictor.pkl')

Load your SmartPredictor from a pickle file

[14]:
from shapash.utils.load_smartpredictor import load_smartpredictor
[15]:
predictor_load = load_smartpredictor('./predictor.pkl')
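
For readers unfamiliar with pickle, the save and load steps above can be pictured as the round trip below, written with the standard library only. This is a sketch: the state dictionary is a hypothetical stand-in for a deployable object, and the helpers save and load are not the Shapash functions.

```python
import os
import pickle
import tempfile


def save(obj, path):
    """Serialize an object to a pickle file."""
    with open(path, "wb") as file:
        pickle.dump(obj, file)


def load(path):
    """Deserialize an object from a pickle file."""
    with open(path, "rb") as file:
        return pickle.load(file)


# Hypothetical stand-in for the attributes a deployable object might carry.
state = {
    "features": ["TotalBsmtSF", "YearBuilt"],
    "features_dict": {"YearBuilt": "Original construction date"},
}

# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "predictor.pkl")
    save(state, path)
    restored = load(path)

print(restored == state)  # True
```

Loading a pickle file can execute arbitrary code, so only load files that come from a source you trust.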

Make a prediction with your SmartPredictor

To make new predictions and summarize the local explainability of your model on new datasets, you can use the add_input method of the SmartPredictor.
  • The add_input method is the first step to add a dataset for prediction and explainability.
  • It checks the structure of the dataset, and of the prediction and contributions if specified.
  • It applies the preprocessing specified at initialization and reorders the features to match the order used by the model. (See the documentation of this method.)
  • In API mode, this method can handle dictionary data, which can be received from a GET or a POST request.
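
The structure check and the reordering described above can be pictured with the following sketch, which turns a dictionary payload (as an API might receive it) into a row ordered like the model's features. Everything here is an assumption made for illustration: prepare_row is a hypothetical helper, the feature list is a small subset, and the values are illustrative. It is not how add_input is implemented.

```python
def prepare_row(payload, model_features):
    """Check a dict payload and return its values in the model's feature order."""
    missing = [name for name in model_features if name not in payload]
    if missing:
        raise ValueError(f"Missing features: {missing}")
    unexpected = [name for name in payload if name not in model_features]
    if unexpected:
        raise ValueError(f"Unexpected features: {unexpected}")
    return [payload[name] for name in model_features]


# Hypothetical subset of the features, in the order expected by the model.
model_features = ["GrLivArea", "OverallQual", "TotalBsmtSF"]

# Payload keys may arrive in any order; values are illustrative.
payload = {"TotalBsmtSF": 856, "GrLivArea": 1710, "OverallQual": 7}

row = prepare_row(payload, model_features)
print(row)  # [1710, 7, 856]
```

Failing early on a missing or unexpected key is a design choice of this sketch: in production, a clear error at input time is usually easier to diagnose than a silently misaligned prediction.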

Add data

The x input of the add_input method does not have to be encoded: add_input applies the preprocessing.

[16]:
predictor_load.add_input(x=X_df, ypred=y_df)
INFO: Shap explainer type - <shap.explainers._tree.TreeExplainer object at 0x120bb8140>

Make prediction

Then, we can check that ypred is the one given to the add_input method by inspecting the attribute data["ypred"]. If ypred is not specified, it is computed automatically.

[17]:
predictor_load.data["ypred"].head()
[17]:
SalePrice
Id
1 208500
2 181500
3 223500
4 140000
5 250000

Get detailed explainability associated with the prediction

You can use the detail_contributions method to see the detailed contributions of each feature for each row of your new dataset.
  • For classification problems, it automatically associates contributions with the right predicted label.
  • The predicted label can be computed automatically by the method, or you can specify a ypred with the add_input method.

[18]:
detailed_contributions = predictor_load.detail_contributions()
INFO: Shap explainer type - <shap.explainers._tree.TreeExplainer object at 0x120bb8140>
[19]:
detailed_contributions.head()
[19]:
SalePrice 1stFlrSF 2ndFlrSF 3SsnPorch BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 ... SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
Id
1 208500 -864.302666 1089.429010 0.0 337.521166 -1.949170 156.111469 -361.262389 605.499503 -62.668440 ... -121.386448 -340.892806 0.0 -353.001743 -4739.814200 0.0 -595.965510 3880.341978 2553.054173 -181.631619
2 181500 3350.844933 -584.369097 0.0 205.516384 -5.940831 123.975700 3533.453295 4220.333952 -73.558110 ... -163.670756 -245.501182 0.0 -614.916748 3362.791194 0.0 2428.197528 960.455476 -3867.310294 503.314916
3 223500 -1262.628672 324.396157 0.0 337.141678 -1.949170 155.617454 517.916355 897.597531 -67.811766 ... -130.962786 -370.460989 0.0 -224.931089 -5755.026659 0.0 -560.741090 3005.687892 2812.095218 -439.790704
4 140000 -1480.790566 76.480568 0.0 288.091047 -9.575975 315.446640 -688.845236 -2484.213094 -96.560334 ... -107.290727 -332.888191 0.0 -576.092859 -6153.292222 0.0 -713.047102 -4324.383049 -4434.789606 1122.301231
5 250000 -9853.708726 -1625.957100 0.0 -528.745169 0.831491 100.194985 374.714720 -1575.576971 -74.604628 ... -737.865156 -365.374274 0.0 -4273.313084 -4544.413358 0.0 -271.376836 1685.459458 1548.598137 -352.416569

5 rows × 73 columns
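
This tutorial is a regression case, but the classification behaviour described above can be sketched as follows: contributions exist per label, and the ones kept are those of the predicted label, either supplied by the user or taken as the most probable one. The function name, the labels and all the numbers are hypothetical; this is not Shapash code.

```python
def contributions_for_label(contributions_by_label, probabilities, ypred=None):
    """Return (label, contributions) for `ypred`, or for the most probable label if `ypred` is None."""
    label = ypred if ypred is not None else max(probabilities, key=probabilities.get)
    return label, contributions_by_label[label]


# Hypothetical outputs for one row of a two-class problem.
probabilities = {"cheap": 0.2, "expensive": 0.8}
contributions_by_label = {
    "cheap": {"GrLivArea": -0.25, "OverallQual": -0.15},
    "expensive": {"GrLivArea": 0.25, "OverallQual": 0.15},
}

# Label computed automatically from the probabilities.
label, contributions = contributions_for_label(contributions_by_label, probabilities)
print(label, contributions)

# Label supplied by the user.
forced_label, forced_contributions = contributions_for_label(
    contributions_by_label, probabilities, ypred="cheap"
)
print(forced_label, forced_contributions)
```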

Summarize explainability of the predictions

  • You can use the summarize method to summarize your local explainability

  • This summary can be configured with modify_mask method so that you have explainability that meets your operational needs.

  • When you initialize the SmartPredictor, you can also specify:
      - postprocessing: to apply a wording to several values of your dataset.
      - label_dict: to rename your labels for classification problems.
      - features_dict: to rename your features.

[20]:
predictor_load.modify_mask(max_contrib=3)
[21]:
explanation = predictor_load.summarize()

For example, here we choose to build a summary with the 3 most contributive features for each prediction. As you can see below, the wording defined in the first step of this tutorial has been kept by the SmartPredictor and used in the summarize method.

[22]:
explanation.head()
[22]:
| | SalePrice | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 208500 | Overall material and finish of the house | 7 | 6920.473837 | Total square feet of basement area | 856 | -4739.8142 | Original construction date | 2003 | 3880.341978 |
| 2 | 181500 | Overall material and finish of the house | 6 | -12277.905073 | Ground living area square feet | 1262 | -9032.425476 | Overall condition of the house | 8 | 4246.185116 |
| 3 | 223500 | Ground living area square feet | 1786 | 16380.55428 | Overall material and finish of the house | 7 | 10036.003238 | Size of garage in square feet | 608 | 6193.320015 |
| 4 | 140000 | Total square feet of basement area | 756 | -6153.292222 | Size of garage in square feet | 642 | 5581.668158 | Overall material and finish of the house | 7 | 5283.39957 |
| 5 | 250000 | Overall material and finish of the house | 8 | 59198.997538 | Ground living area square feet | 2198 | 15518.770218 | Size of garage in square feet | 836 | 12725.437383 |
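
The flat layout of the summary above (one feature_k, value_k, contribution_k triple per rank) can be sketched as follows, using the values of the first row. flatten_summary is a hypothetical helper written for this illustration; it only reproduces the shape of the output, not the way Shapash builds it.

```python
def flatten_summary(prediction, ranked, target="SalePrice"):
    """Build one flat summary row from ranked (feature, value, contribution) triples."""
    row = {target: prediction}
    for rank, (feature, value, contribution) in enumerate(ranked, start=1):
        row[f"feature_{rank}"] = feature
        row[f"value_{rank}"] = value
        row[f"contribution_{rank}"] = contribution
    return row


# Top 3 contributions for Id 1, as shown in the summary table.
ranked_id_1 = [
    ("Overall material and finish of the house", 7, 6920.473837),
    ("Total square feet of basement area", 856, -4739.8142),
    ("Original construction date", 2003, 3880.341978),
]

summary_row = flatten_summary(208500, ranked_id_1)
for key, value in summary_row.items():
    print(key, value)
```

A flat row like this is convenient in production because it serializes directly to JSON or to a single line of a batch output file.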