Shapash model in production - Overview

With this tutorial you will understand how to create a Shapash SmartPredictor to make predictions and get local explanations in production, using a simple use case.

This tutorial describes the steps from training the model to deploying a Shapash SmartPredictor. A more detailed tutorial covers the SmartPredictor object in depth.

Contents:
  • Build a regressor
  • Compile a Shapash SmartExplainer
  • Switch from Shapash SmartExplainer to SmartPredictor
  • Save the Shapash SmartPredictor object in a pickle file
  • Make a prediction

Data from Kaggle House Prices

[1]:
import pandas as pd
from category_encoders import OrdinalEncoder
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split

Step 1: Exploration and training of the model

Building a Supervised Model

In this section, we train a supervised machine learning model on the House Prices data.

[2]:
from shapash.data.data_loader import data_loading
house_df, house_dict = data_loading('house_prices')
[3]:
y_df=house_df['SalePrice'].to_frame()
X_df=house_df[house_df.columns.difference(['SalePrice'])]

Preprocessing step

Encoding Categorical Features

[4]:
from category_encoders import OrdinalEncoder

categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']

encoder = OrdinalEncoder(cols=categorical_features,
                         handle_unknown='ignore',
                         return_df=True).fit(X_df)

X_encoded=encoder.transform(X_df)
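To make the encoding step concrete, here is a minimal pure-Python sketch of what an ordinal encoder does: each category seen during fitting gets an integer code, and the mapping can be inverted to recover the original labels, which is the kind of reverse preprocessing the compile step can rely on. The function names, the 1-based codes and the `-1` code for unseen categories are assumptions for illustration, not category_encoders' exact behaviour.

```python
from typing import Dict, List


def fit_ordinal(values: List[str]) -> Dict[str, int]:
    """Assign a 1-based integer code to each category, in order of first appearance."""
    mapping: Dict[str, int] = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping) + 1
    return mapping


def transform(values: List[str], mapping: Dict[str, int], unknown: int = -1) -> List[int]:
    """Encode values; categories unseen at fit time get the (assumed) `unknown` code."""
    return [mapping.get(v, unknown) for v in values]


def inverse_transform(codes: List[int], mapping: Dict[str, int]) -> List[str]:
    """Map codes back to their labels, as a reverse-preprocessing step would."""
    reverse = {code: label for label, code in mapping.items()}
    return [reverse.get(c, "<unknown>") for c in codes]


# Toy values for one categorical column (illustrative only)
mapping = fit_ordinal(["Normal", "Abnorml", "Normal", "Partial"])
codes = transform(["Partial", "Normal", "Family"], mapping)
print(codes)                               # [3, 1, -1]
print(inverse_transform(codes, mapping))   # ['Partial', 'Normal', '<unknown>']
```

Fitting the mapping on the full X_df, as the cell above does, means every category present in the data gets a code; only genuinely new categories at prediction time fall back to the unknown code.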

Train / Test Split

[5]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X_encoded, y_df, train_size=0.75, random_state=1)
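The idea behind `train_size=0.75, random_state=1` can be sketched in plain Python: shuffle the row positions with a seeded generator, then cut them into a training part and a test part. This is an illustrative sketch only; `split_indices` is a hypothetical helper and will not reproduce scikit-learn's exact split for the same seed.

```python
import random
from typing import List, Tuple


def split_indices(n_rows: int, train_size: float = 0.75, seed: int = 1) -> Tuple[List[int], List[int]]:
    """Shuffle row positions with a fixed seed and cut them into train / test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded RNG: the same split on every run
    n_train = int(n_rows * train_size)    # floor of the requested training share
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(100)
print(len(train_idx), len(test_idx))  # 75 25
```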

Model Fitting

[6]:
regressor = LGBMRegressor(n_estimators=200).fit(Xtrain, ytrain)
[7]:
y_pred = pd.DataFrame(regressor.predict(Xtest), columns=['pred'], index=Xtest.index)

Understand my model with Shapash

In this section, we use the SmartExplainer object from Shapash.
  • It allows users to understand how the model works on the specified data.
  • This object should only be used during the data mining step. Shapash provides another object for deployment.
  • This tutorial does not explore the possibilities of the SmartExplainer; other tutorials do.

Declare and Compile SmartExplainer

[8]:
from shapash import SmartExplainer

Use a wording on feature names to make results easier to understand

Here, we use a wording to rename our feature labels with more understandable terms. This makes local explainability more operational and easier for users to understand.
  • To do this, we use the house_dict dictionary, which maps a description to each feature.
  • We then pass it as the features_dict parameter of the SmartExplainer.
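Conceptually, applying a wording is a dictionary lookup with a fallback to the technical name. The sketch below assumes a features_dict shaped like house_dict (column name to description); `apply_wording` is a hypothetical helper, and the two entries are an assumed mapping chosen to match labels that appear in this tutorial's outputs.

```python
from typing import Dict, List


def apply_wording(columns: List[str], features_dict: Dict[str, str]) -> List[str]:
    """Replace technical column names by their description, keeping unmapped names."""
    return [features_dict.get(col, col) for col in columns]


# Assumed entries in the spirit of house_dict
features_dict = {
    "GrLivArea": "Ground living area square feet",
    "OverallQual": "Overall material and finish of the house",
}
labels = apply_wording(["GrLivArea", "OverallQual", "YrSold"], features_dict)
print(labels)
```

Falling back to the original name means a partial dictionary never hides a feature; it only leaves it with its technical label.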

[9]:
xpl = SmartExplainer(
    model=regressor,
    preprocessing=encoder, # Optional: compile step can use inverse_transform method
    features_dict=house_dict
)

compile(): this method is the first step to understand the model and its predictions. It sorts the contributions, reverses the preprocessing steps and performs all the calculations needed for a quick display of plots and an efficient summary of the explanation (see the SmartExplainer documentation and tutorials).

[10]:
xpl.compile(x=Xtest,
            y_pred=y_pred,
            y_target=ytest,  # Optional: allows displaying true values vs. predicted values
            )
Backend: Shap TreeExplainer

Understand results of your trained model

Then, we can easily get a first summary of the explanation of the model results.
  • Here, we chose to get the 3 most contributive features for each prediction.
  • We used a wording to make feature names more understandable in an operational setting.
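The shape of this summary (feature_i, value_i, contribution_i) can be reproduced for a single row with a short sketch: rank the contributions by absolute value, keep the first max_contrib, and apply the wording. This is an illustrative model of the output format, not Shapash's implementation; `summarize_row` is hypothetical, and the inputs loosely follow row 259 of the to_pandas output, with the `YrSold` entry invented.

```python
from typing import Dict


def summarize_row(values: Dict[str, object], contribs: Dict[str, float],
                  features_dict: Dict[str, str], max_contrib: int = 3) -> Dict[str, object]:
    """Flatten the max_contrib largest contributions (by absolute value) into one record."""
    # Rank features from most to least influential, whatever the sign of the contribution
    ranked = sorted(contribs, key=lambda f: abs(contribs[f]), reverse=True)[:max_contrib]
    summary: Dict[str, object] = {}
    for i, feature in enumerate(ranked, start=1):
        summary[f"feature_{i}"] = features_dict.get(feature, feature)  # apply the wording
        summary[f"value_{i}"] = values[feature]
        summary[f"contribution_{i}"] = contribs[feature]
    return summary


# Hypothetical inputs for one house
values = {"GrLivArea": 1792, "OverallQual": 7, "TotalBsmtSF": 963, "YrSold": 2008}
contribs = {"GrLivArea": 13710.4, "OverallQual": 12776.3, "TotalBsmtSF": -5103.03, "YrSold": 17.5}
features_dict = {
    "GrLivArea": "Ground living area square feet",
    "OverallQual": "Overall material and finish of the house",
    "TotalBsmtSF": "Total square feet of basement area",
}
summary = summarize_row(values, contribs, features_dict)
print(summary)
```

Ranking by absolute value is what lets a strongly negative contribution such as the basement area one still appear among the top features.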

[11]:
xpl.to_pandas(max_contrib=3).head()
[11]:
Id | pred | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3
259 | 209141.256921 | Ground living area square feet | 1792 | 13710.4 | Overall material and finish of the house | 7 | 12776.3 | Total square feet of basement area | 963 | -5103.03
268 | 178734.474531 | Ground living area square feet | 2192 | 29747 | Overall material and finish of the house | 5 | -26151.3 | Overall condition of the house | 8 | 9190.84
289 | 113950.844570 | Overall material and finish of the house | 5 | -24730 | Ground living area square feet | 900 | -16342.6 | Total square feet of basement area | 882 | -5922.64
650 | 74957.162142 | Overall material and finish of the house | 4 | -33927.7 | Ground living area square feet | 630 | -23234.4 | Total square feet of basement area | 630 | -11687.9
1234 | 135305.243500 | Overall material and finish of the house | 5 | -25445.7 | Ground living area square feet | 1188 | -11476.6 | Condition of sale | Abnormal Sale | -5071.82

Step 2: SmartPredictor in production

Switch from SmartExplainer to SmartPredictor

When you are satisfied with your results and the explainability provided by Shapash, you can use the SmartPredictor object for deployment.
  • In this section, we learn how to easily switch from SmartExplainer to SmartPredictor.
  • SmartPredictor allows you to make predictions and to detail and summarize contributions on new data automatically.
  • It only keeps the attributes needed for deployment, so it is lighter than the SmartExplainer object.
  • SmartPredictor performs additional consistency checks before deployment.
  • SmartPredictor allows you to configure the summary to suit your use cases.
  • It can be used through an API or in batch mode.

[12]:
predictor = xpl.to_smartpredictor()

Save and Load your SmartPredictor

You can easily save and load your SmartPredictor object as a pickle file.

Save your SmartPredictor to a pickle file

[13]:
predictor.save('./predictor.pkl')

Load your SmartPredictor from a pickle file

[14]:
from shapash.utils.load_smartpredictor import load_smartpredictor
[15]:
predictor_load = load_smartpredictor('./predictor.pkl')
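Conceptually, saving and loading comes down to pickling a Python object. The sketch below round-trips a hypothetical `LightPredictor` dataclass (not Shapash's class) through a temporary file with the standard `pickle` module, to illustrate why a deployment object that holds only what prediction needs is cheap to store and ship.

```python
import pickle
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class LightPredictor:
    """Hypothetical stand-in for a deployment object: only what prediction needs."""
    columns: List[str]
    features_dict: Dict[str, str] = field(default_factory=dict)


def save(obj: object, path: Path) -> None:
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load(path: Path) -> object:
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "predictor.pkl"
    save(LightPredictor(columns=["GrLivArea", "OverallQual"]), path)
    restored = load(path)  # fully loaded into memory before the directory is removed
print(restored)
```

As with any pickle file, only load predictors from a trusted source: unpickling can execute arbitrary code.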

Make a prediction with your SmartPredictor

To make new predictions and summarize the local explainability of your model on new datasets, use the add_input method of the SmartPredictor.
  • The add_input method is the first step to add a dataset for prediction and explainability.
  • It checks the structure of the dataset, the prediction and the contributions, if specified.
  • It applies the preprocessing specified at initialisation and reorders the features to match the order used by the model (see the documentation of this method).
  • In API mode, this method can handle dictionaries received from a GET or POST request.
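The structure checks and feature reordering can be pictured with a small sketch that accepts either one dictionary (such as a JSON request body) or a list of them, verifies that exactly the expected features are present, and emits values in the model's column order. `prepare_input` is a hypothetical helper for illustration, not how add_input is implemented.

```python
from typing import Dict, List, Union

Row = Dict[str, object]


def prepare_input(x: Union[Row, List[Row]], model_columns: List[str]) -> List[List[object]]:
    """Accept one dict or a list of dicts, check the features, reorder them."""
    rows = [x] if isinstance(x, dict) else list(x)
    expected = set(model_columns)
    for i, row in enumerate(rows):
        missing = expected - set(row)
        extra = set(row) - expected
        if missing or extra:
            raise ValueError(f"row {i}: missing={sorted(missing)}, unexpected={sorted(extra)}")
    # Emit values in the column order the model was trained with
    return [[row[col] for col in model_columns] for row in rows]


model_columns = ["GrLivArea", "OverallQual", "TotalBsmtSF"]
print(prepare_input({"OverallQual": 7, "TotalBsmtSF": 963, "GrLivArea": 1792}, model_columns))
# [[1792, 7, 963]]
```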

Add data

The x input of the add_input method doesn’t have to be encoded: add_input applies the preprocessing itself.

[16]:
predictor_load.add_input(x=X_df, ypred=y_df)

Make a prediction

We can check that ypred is the one passed to the add_input method by inspecting the data["ypred"] attribute. If ypred is not specified, the method computes it automatically.
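The "use the given ypred, otherwise predict" behaviour can be sketched as a small function taking a prediction callable. `resolve_ypred` and `toy_model` are hypothetical names; the length check is an assumed example of the kind of consistency check described above.

```python
from typing import Callable, List, Optional, Sequence

Rows = Sequence[Sequence[float]]


def resolve_ypred(x: Rows, predict: Callable[[Rows], List[float]],
                  ypred: Optional[List[float]] = None) -> List[float]:
    """Use the caller's predictions if given (after a length check), else call the model."""
    if ypred is None:
        return predict(x)
    if len(ypred) != len(x):
        raise ValueError("ypred must have one value per row of x")
    return list(ypred)


def toy_model(rows: Rows) -> List[float]:
    return [100.0 * r[0] for r in rows]  # hypothetical stand-in for a fitted regressor


x = [[1.0], [2.5]]
print(resolve_ypred(x, toy_model))                  # [100.0, 250.0]
print(resolve_ypred(x, toy_model, [90.0, 260.0]))   # [90.0, 260.0]
```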

[17]:
predictor_load.data["ypred"].head()
[17]:
SalePrice
Id
1 208500
2 181500
3 223500
4 140000
5 250000

Get detailed explainability associated with the prediction

You can use the detail_contributions method to see the detailed contributions of each feature for each row of your new dataset.
  • For classification problems, it automatically associates contributions with the right predicted label.
  • The predicted label can be computed automatically by the method, or you can specify a ypred with the add_input method.
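For classification, associating contributions with the predicted label amounts to picking, for each row, the contribution vector computed for that row's predicted class. The sketch below assumes a hypothetical layout with one list of per-row contributions per class label; the labels, features and numbers are invented for illustration.

```python
from typing import Dict, List


def contributions_for_prediction(per_class: Dict[str, List[Dict[str, float]]],
                                 predicted: List[str]) -> List[Dict[str, float]]:
    """For each row, keep the contribution vector of the class that was predicted."""
    return [per_class[label][i] for i, label in enumerate(predicted)]


# Hypothetical contributions for 2 rows and 2 classes
per_class = {
    "yes": [{"age": 0.30, "income": -0.10}, {"age": -0.05, "income": 0.20}],
    "no": [{"age": -0.30, "income": 0.10}, {"age": 0.05, "income": -0.20}],
}
selected = contributions_for_prediction(per_class, ["no", "yes"])
print(selected)  # [{'age': -0.3, 'income': 0.1}, {'age': -0.05, 'income': 0.2}]
```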

[18]:
detailed_contributions = predictor_load.detail_contributions()
[19]:
detailed_contributions.head()
[19]:
SalePrice 1stFlrSF 2ndFlrSF 3SsnPorch BedroomAbvGr BldgType BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 ... SaleType ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF YearBuilt YearRemodAdd YrSold
Id
1 208500 -1104.994176 1281.445856 0.0 375.679661 12.259902 157.224629 -233.025420 -738.445396 -59.294761 ... -104.645827 -351.621116 0.0 -498.228775 -5165.503476 0.0 -944.040092 3870.961681 2219.313761 17.478037
2 181500 2249.403962 -655.861167 0.0 123.907278 -9.270166 139.431860 2699.247506 5102.469936 -84.771341 ... -153.842142 -236.526862 0.0 -705.112993 2988.981279 0.0 2090.785074 323.902986 -3861.776078 424.382977
3 223500 -1426.795115 -616.113112 0.0 369.536957 9.210944 199.213726 1032.288162 -92.179454 -93.169310 ... -91.178667 -280.832451 0.0 -324.734175 -5338.340597 0.0 -777.746743 3837.761102 2192.921648 -98.965041
4 140000 -653.873832 121.459865 0.0 307.677892 9.720006 252.786934 -530.156452 -2987.649814 -77.039912 ... -114.608224 -338.435699 0.0 -635.065828 -6548.453864 0.0 -974.503140 -3386.361210 -5232.537839 1633.763619
5 250000 -9531.577733 -1097.620788 0.0 -1574.988323 7.453569 130.470247 623.939546 -2396.572526 -92.929525 ... -481.118248 -366.250007 0.0 -4733.603060 -4675.706762 0.0 165.653455 2334.652063 1355.358932 -395.126541

5 rows × 73 columns

Summarize explainability of the predictions

  • You can use the summarize method to summarize your local explainability.

  • This summary can be configured with the modify_mask method so that the explainability meets your operational needs.

  • When you initialize the SmartPredictor, you can also specify:
      • postprocessing: to apply a wording to several values of your dataset.
      • label_dict: to rename your labels for classification problems.
      • features_dict: to rename your features.
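A mask can be pictured as a filter applied to each row's ranked contributions, and postprocessing as a lookup from raw values to a wording. The sketch below is a hypothetical plain-Python model of both ideas, not Shapash's implementation: `apply_mask`, its `threshold` parameter and `apply_postprocessing` are illustrative names, and only `max_contrib` comes from this tutorial. The mapping of the raw code `Abnorml` to `Abnormal Sale` is likewise an assumed example.

```python
from typing import Dict, List, Optional, Tuple

Contribution = Tuple[str, float]


def apply_mask(ranked: List[Contribution], max_contrib: Optional[int] = None,
               threshold: Optional[float] = None) -> List[Contribution]:
    """Filter an already-ranked list of (feature, contribution) pairs."""
    kept = ranked
    if threshold is not None:
        kept = [(f, c) for f, c in kept if abs(c) >= threshold]  # drop small effects
    if max_contrib is not None:
        kept = kept[:max_contrib]  # keep only the top entries
    return kept


def apply_postprocessing(value: object, wording: Dict[object, str]) -> object:
    """Replace a raw value by its wording when one is defined."""
    return wording.get(value, value)


# Contributions already sorted by absolute value (hypothetical row)
ranked = [("OverallQual", 8248.82), ("TotalBsmtSF", -5165.5),
          ("YearBuilt", 3870.96), ("YrSold", 17.48)]
print(apply_mask(ranked, max_contrib=3))
print(apply_mask(ranked, threshold=5000))  # [('OverallQual', 8248.82), ('TotalBsmtSF', -5165.5)]
print(apply_postprocessing("Abnorml", {"Abnorml": "Abnormal Sale"}))  # Abnormal Sale
```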

[20]:
predictor_load.modify_mask(max_contrib=3)
[21]:
explanation = predictor_load.summarize()

For example, here we chose to build a summary with the 3 most contributive features for each row of the dataset. As you can see below, the wording defined in the first step of this tutorial has been kept by the SmartPredictor and used in the summarize method.

[22]:
explanation.head()
[22]:
Id | SalePrice | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3
1 | 208500 | Overall material and finish of the house | 7 | 8248.82 | Total square feet of basement area | 856 | -5165.5 | Original construction date | 2003 | 3870.96
2 | 181500 | Overall material and finish of the house | 6 | -14419.4 | Ground living area square feet | 1262 | -9238.07 | Overall condition of the house | 8 | 6371.61
3 | 223500 | Ground living area square feet | 1786 | 15880.4 | Overall material and finish of the house | 7 | 9651.28 | Size of garage in square feet | 608 | 6259.46
4 | 140000 | Total square feet of basement area | 756 | -6548.45 | Remodel date | 1970 | -5232.54 | Size of garage in square feet | 642 | 4384.29
5 | 250000 | Overall material and finish of the house | 8 | 55722.1 | Ground living area square feet | 2198 | 17176.5 | Size of garage in square feet | 836 | 14907.7