Shapash model in production - Overview¶
In this tutorial, you will learn how to create a Shapash SmartPredictor to make predictions and get local explanations in production, using a simple use case.
This tutorial describes the different steps, from training the model to deploying the Shapash SmartPredictor. A more detailed tutorial covers the SmartPredictor object in depth.
Contents:
- Build a Regressor
- Compile Shapash SmartExplainer
- From Shapash SmartExplainer to SmartPredictor
- Save Shapash SmartPredictor Object in pickle file
- Make a prediction
Data from Kaggle House Prices
[1]:
import pandas as pd
from category_encoders import OrdinalEncoder
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split
Step 1 : Exploration and training of the model¶
Building a Supervised Model¶
In this section, we train a supervised Machine Learning model on the House Prices data.
[2]:
from shapash.data.data_loader import data_loading
house_df, house_dict = data_loading('house_prices')
[3]:
y_df=house_df['SalePrice'].to_frame()
X_df=house_df[house_df.columns.difference(['SalePrice'])]
Preprocessing step¶
Encoding Categorical Features
[4]:
from category_encoders import OrdinalEncoder
categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']
encoder = OrdinalEncoder(cols=categorical_features,
                         handle_unknown='ignore',
                         return_df=True).fit(X_df)
X_encoded=encoder.transform(X_df)
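To make the encoding step concrete, here is a minimal plain-Python sketch of what an ordinal encoding does: each category is replaced by an integer code, and the mapping can be reversed to recover the original values (which is why the encoder is later passed to Shapash). This is an illustration only, not the category_encoders implementation; the helper names, the codes starting at 1, and the -1 code for unseen categories are assumptions made for this sketch.

```python
# Minimal ordinal-encoding sketch (illustration only, not category_encoders code).

def fit_ordinal_mapping(values):
    """Assign integer codes 1, 2, 3... to categories in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


def transform(values, mapping, unknown_code=-1):
    """Replace each category by its code; unseen categories get unknown_code."""
    return [mapping.get(value, unknown_code) for value in values]


def inverse_transform(codes, mapping):
    """Map codes back to the original categories (None for unknown codes)."""
    reverse = {code: value for value, code in mapping.items()}
    return [reverse.get(code) for code in codes]


# Hypothetical values for a categorical column.
train_values = ["Normal", "Abnormal", "Normal", "Partial"]
mapping = fit_ordinal_mapping(train_values)

# "Family" was not seen during fitting, so it receives the unknown code.
codes = transform(["Partial", "Normal", "Family"], mapping)
decoded = inverse_transform(codes, mapping)

print(mapping)
print(codes)
print(decoded)
```

The inverse mapping is the part that matters for explainability: it lets a summary show "Abnormal" instead of an opaque integer.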
Train / Test Split¶
[5]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X_encoded, y_df, train_size=0.75, random_state=1)
Model Fitting¶
[6]:
regressor = LGBMRegressor(n_estimators=200).fit(Xtrain, ytrain)
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.002116 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2986
[LightGBM] [Info] Number of data points in the train set: 1095, number of used features: 66
[LightGBM] [Info] Start training from score 182319.757078
[7]:
y_pred = pd.DataFrame(regressor.predict(Xtest), columns=['pred'], index=Xtest.index)
Understand my model with shapash¶
In this section, we use the SmartExplainer object from Shapash.
- It allows users to understand how the model works with the specified data.
- This object should be used only for the data mining step. Shapash provides another object for deployment.
- This tutorial does not explore the possibilities of the SmartExplainer; other tutorials do.
Declare and Compile SmartExplainer¶
[8]:
from shapash import SmartExplainer
Use wording on feature names to better understand results¶
Here, we use a wording to rename our feature labels with more understandable terms. This makes local explainability more operational and understandable for users.
- To do this, we use the house_dict dictionary, which maps a description to each feature.
- We can then pass it as the features_dict parameter of the SmartExplainer.
[9]:
xpl = SmartExplainer(
    model=regressor,
    preprocessing=encoder,  # Optional: compile step can use inverse_transform method
    features_dict=house_dict
)
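As a rough picture of what a features dictionary provides, the sketch below maps technical column names to readable labels. The dictionary excerpt is hypothetical (the real house_dict is loaded by data_loading), and falling back to the technical name when no label exists is an assumption of this sketch, not a documented guarantee.

```python
# Hypothetical excerpt of a features dictionary: technical name -> readable label.
features_dict = {
    "GrLivArea": "Ground living area square feet",
    "OverallQual": "Overall material and finish of the house",
}


def readable_label(column, labels):
    """Return the readable label, falling back to the technical name if none exists."""
    return labels.get(column, column)


# "MyNewFeature" is a made-up column with no entry in the dictionary.
columns = ["GrLivArea", "OverallQual", "MyNewFeature"]
display_names = [readable_label(column, features_dict) for column in columns]
print(display_names)
```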
compile(): this method is the first step toward understanding the model and its predictions. It sorts the contributions, reverses the preprocessing steps, and performs all the calculations needed to display plots quickly and summarize explanations efficiently. (See the SmartExplainer documentation and tutorials.)
[10]:
xpl.compile(x=Xtest,
            y_pred=y_pred,
            y_target=ytest,  # Optional: allows to display True Values vs Predicted Values
            )
INFO: Shap explainer type - <shap.explainers._tree.TreeExplainer object at 0x1209e3f20>
Understand results of your trained model¶
Then, we can easily get a first summary of the explanation of the model results.
- Here, we chose to get the 3 most contributive features for each prediction.
- We used a wording to make feature names more understandable in an operational context.
[11]:
xpl.to_pandas(max_contrib=3).head()
[11]:
| | pred | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 259 | 211538.742157 | Ground living area square feet | 1792 | 13995.651927 | Overall material and finish of the house | 7 | 13539.441353 | Total square feet of basement area | 963 | -5652.206854 |
| 268 | 178786.677257 | Ground living area square feet | 2192 | 27967.966278 | Overall material and finish of the house | 5 | -26133.987559 | Overall condition of the house | 8 | 7799.924798 |
| 289 | 111985.324660 | Overall material and finish of the house | 5 | -25571.348315 | Ground living area square feet | 900 | -16006.763921 | Total square feet of basement area | 882 | -5456.989325 |
| 650 | 73456.522515 | Overall material and finish of the house | 4 | -34517.073676 | Ground living area square feet | 630 | -21350.707866 | Total square feet of basement area | 630 | -12699.371236 |
| 1234 | 136249.557316 | Overall material and finish of the house | 5 | -26469.235405 | Ground living area square feet | 1188 | -10980.550285 | Condition of sale | Abnormal Sale | -5240.009373 |
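The wide layout above (feature_k, value_k, contribution_k) can be pictured as ranking each row's contributions by absolute value and keeping the first few. The sketch below is a plain-Python illustration of that idea, not Shapash's implementation: summarize_row is a hypothetical helper, and the input numbers are rounded, illustrative values.

```python
def summarize_row(values, contributions, labels, max_contrib=3):
    """Build a flat summary of the max_contrib largest contributions (by absolute value)."""
    ranked = sorted(contributions, key=lambda name: abs(contributions[name]), reverse=True)
    summary = {}
    for rank, name in enumerate(ranked[:max_contrib], start=1):
        summary[f"feature_{rank}"] = labels.get(name, name)
        summary[f"value_{rank}"] = values[name]
        summary[f"contribution_{rank}"] = contributions[name]
    return summary


# Illustrative inputs for a single house (rounded, partly made up).
values = {"GrLivArea": 1792, "OverallQual": 7, "TotalBsmtSF": 963, "YrSold": 2008}
contributions = {
    "GrLivArea": 13995.7,
    "OverallQual": 13539.4,
    "TotalBsmtSF": -5652.2,
    "YrSold": 120.0,
}
labels = {
    "GrLivArea": "Ground living area square feet",
    "OverallQual": "Overall material and finish of the house",
    "TotalBsmtSF": "Total square feet of basement area",
}

summary = summarize_row(values, contributions, labels, max_contrib=3)
for key, value in summary.items():
    print(key, "=", value)
```

Ranking by absolute value keeps strong negative effects (here the basement area) in the summary instead of dropping them in favor of small positive ones.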
Step 2 : SmartPredictor in production¶
Switch from SmartExplainer to SmartPredictor¶
When you are satisfied with your results and the explainability given by Shapash, you can use the SmartPredictor object for deployment.
- In this section, we learn how to easily switch from a SmartExplainer to a SmartPredictor.
- SmartPredictor allows you to make predictions, and to detail and summarize contributions on new data automatically.
- It only keeps the attributes needed for deployment, so it is lighter than the SmartExplainer object.
- SmartPredictor performs additional consistency checks before deployment.
- SmartPredictor allows you to configure the summary to suit your use cases.
- It can be used with an API or in batch mode.
[12]:
predictor = xpl.to_smartpredictor()
Save and Load your SmartPredictor¶
You can easily save your SmartPredictor object to a pickle file and load it back.
Save your SmartPredictor to a Pickle File¶
[13]:
predictor.save('./predictor.pkl')
Load your SmartPredictor from a Pickle File¶
[14]:
from shapash.utils.load_smartpredictor import load_smartpredictor
[15]:
predictor_load = load_smartpredictor('./predictor.pkl')
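For readers unfamiliar with pickle, the sketch below shows the general save/load round trip with the standard library, using a plain dictionary as a hypothetical stand-in for a deployable object. The helper names and the sanity check on the loaded content are assumptions of this sketch; it does not reproduce what predictor.save or load_smartpredictor do internally.

```python
import os
import pickle
import tempfile

# Keys this sketch expects to find in a saved state (hypothetical).
REQUIRED_KEYS = {"features", "features_dict"}


def save_state(state, path):
    """Serialize the state to a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(state, handle)


def load_state(path):
    """Load a pickle file and check that it contains what we expect."""
    with open(path, "rb") as handle:
        state = pickle.load(handle)
    if not isinstance(state, dict) or not REQUIRED_KEYS.issubset(state):
        raise ValueError("Unexpected content in pickle file")
    return state


original = {
    "features": ["GrLivArea", "OverallQual"],
    "features_dict": {"GrLivArea": "Ground living area square feet"},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "state.pkl")
    save_state(original, path)
    restored = load_state(path)

print(restored == original)
```

Unpickling can execute arbitrary code, so in production only load pickle files that come from a trusted source.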
Make a prediction with your SmartPredictor¶
To make new predictions and summarize the local explainability of your model on new datasets, you can use the add_input method of the SmartPredictor.
- The add_input method is the first step: it adds a dataset for prediction and explainability.
- It checks the structure of the dataset, and of the prediction and the contributions if they are specified.
- It applies the preprocessing specified at initialization and reorders the features to match the order used by the model. (See the documentation of this method.)
- In API mode, this method can handle dictionary data, which can be received from a GET or a POST request.
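The structure check and feature reordering described above can be pictured with a small plain-Python sketch for a dictionary payload, such as the body of a POST request. prepare_row and the three-feature model are hypothetical, and ignoring extra keys is a design choice of this sketch; this is not the logic of add_input itself.

```python
import json


def prepare_row(payload, model_features):
    """Check a dict payload and return its values in the model's feature order."""
    missing = [name for name in model_features if name not in payload]
    if missing:
        raise ValueError(f"Missing features: {missing}")
    # Extra keys in the payload are ignored in this sketch.
    return {name: payload[name] for name in model_features}


# Hypothetical feature order expected by the model.
model_features = ["GrLivArea", "OverallQual", "TotalBsmtSF"]

# Hypothetical request body: keys arrive in arbitrary order.
body = '{"OverallQual": 7, "TotalBsmtSF": 963, "GrLivArea": 1792}'
row = prepare_row(json.loads(body), model_features)
print(list(row))

# A payload with a missing feature is rejected before any prediction is attempted.
try:
    prepare_row({"GrLivArea": 1792}, model_features)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```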
Add data¶
The x input of the add_input method does not have to be encoded: add_input applies the preprocessing.
[16]:
predictor_load.add_input(x=X_df, ypred=y_df)
INFO: Shap explainer type - <shap.explainers._tree.TreeExplainer object at 0x120bb8140>
Make prediction¶
Then, we can check that ypred is the one passed to the add_input method by inspecting data["ypred"]. If ypred is not specified, it is computed automatically.
[17]:
predictor_load.data["ypred"].head()
[17]:
| Id | SalePrice |
|---|---|
| 1 | 208500 |
| 2 | 181500 |
| 3 | 223500 |
| 4 | 140000 |
| 5 | 250000 |
Get detailed explainability associated with the prediction¶
You can use the detail_contributions method to see the detailed contributions of each feature for each row of your new dataset.
- For classification problems, it automatically associates contributions with the right predicted label.
- The predicted label can be computed automatically by the method, or you can specify a ypred with the add_input method.
[18]:
detailed_contributions = predictor_load.detail_contributions()
INFO: Shap explainer type - <shap.explainers._tree.TreeExplainer object at 0x120bb8140>
[19]:
detailed_contributions.head()
[19]:
| Id | SalePrice | 1stFlrSF | 2ndFlrSF | 3SsnPorch | BedroomAbvGr | BldgType | BsmtCond | BsmtExposure | BsmtFinSF1 | BsmtFinSF2 | ... | SaleType | ScreenPorch | Street | TotRmsAbvGrd | TotalBsmtSF | Utilities | WoodDeckSF | YearBuilt | YearRemodAdd | YrSold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 208500 | -864.302666 | 1089.429010 | 0.0 | 337.521166 | -1.949170 | 156.111469 | -361.262389 | 605.499503 | -62.668440 | ... | -121.386448 | -340.892806 | 0.0 | -353.001743 | -4739.814200 | 0.0 | -595.965510 | 3880.341978 | 2553.054173 | -181.631619 |
| 2 | 181500 | 3350.844933 | -584.369097 | 0.0 | 205.516384 | -5.940831 | 123.975700 | 3533.453295 | 4220.333952 | -73.558110 | ... | -163.670756 | -245.501182 | 0.0 | -614.916748 | 3362.791194 | 0.0 | 2428.197528 | 960.455476 | -3867.310294 | 503.314916 |
| 3 | 223500 | -1262.628672 | 324.396157 | 0.0 | 337.141678 | -1.949170 | 155.617454 | 517.916355 | 897.597531 | -67.811766 | ... | -130.962786 | -370.460989 | 0.0 | -224.931089 | -5755.026659 | 0.0 | -560.741090 | 3005.687892 | 2812.095218 | -439.790704 |
| 4 | 140000 | -1480.790566 | 76.480568 | 0.0 | 288.091047 | -9.575975 | 315.446640 | -688.845236 | -2484.213094 | -96.560334 | ... | -107.290727 | -332.888191 | 0.0 | -576.092859 | -6153.292222 | 0.0 | -713.047102 | -4324.383049 | -4434.789606 | 1122.301231 |
| 5 | 250000 | -9853.708726 | -1625.957100 | 0.0 | -528.745169 | 0.831491 | 100.194985 | 374.714720 | -1575.576971 | -74.604628 | ... | -737.865156 | -365.374274 | 0.0 | -4273.313084 | -4544.413358 | 0.0 | -271.376836 | 1685.459458 | 1548.598137 | -352.416569 |
5 rows × 73 columns
Summarize explainability of the predictions¶
You can use the summarize method to summarize your local explainability.
This summary can be configured with the modify_mask method so that the explainability meets your operational needs.
When you initialize the SmartPredictor, you can also specify:
- postprocessing: to apply a wording to several values of your dataset.
- label_dict: to rename your labels for classification problems.
- features_dict: to rename your features.
[20]:
predictor_load.modify_mask(max_contrib=3)
[21]:
explanation = predictor_load.summarize()
For example, here, we chose to build a summary with the 3 most contributive features for each row of the dataset. As you can see below, the wording defined in the first step of this tutorial has been kept by the SmartPredictor and used in the summarize method.
[22]:
explanation.head()
[22]:
| | SalePrice | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 208500 | Overall material and finish of the house | 7 | 6920.473837 | Total square feet of basement area | 856 | -4739.8142 | Original construction date | 2003 | 3880.341978 |
| 2 | 181500 | Overall material and finish of the house | 6 | -12277.905073 | Ground living area square feet | 1262 | -9032.425476 | Overall condition of the house | 8 | 4246.185116 |
| 3 | 223500 | Ground living area square feet | 1786 | 16380.55428 | Overall material and finish of the house | 7 | 10036.003238 | Size of garage in square feet | 608 | 6193.320015 |
| 4 | 140000 | Total square feet of basement area | 756 | -6153.292222 | Size of garage in square feet | 642 | 5581.668158 | Overall material and finish of the house | 7 | 5283.39957 |
| 5 | 250000 | Overall material and finish of the house | 8 | 59198.997538 | Ground living area square feet | 2198 | 15518.770218 | Size of garage in square feet | 836 | 12725.437383 |
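To give an idea of how a summary mask can be tuned to operational needs, here is a plain-Python sketch of common filtering rules: hiding features, dropping small contributions, keeping one sign only, and limiting the number of features. Only max_contrib appears in this tutorial; the other parameter names (features_to_hide, threshold, positive) and the apply_mask helper are assumptions of this sketch, so check the modify_mask documentation before relying on them. The contribution values are rounded and illustrative.

```python
def apply_mask(contributions, features_to_hide=(), threshold=None,
               positive=None, max_contrib=None):
    """Filter and rank one row's contributions according to simple mask rules."""
    # Drop features that should never appear in the summary.
    kept = {name: c for name, c in contributions.items() if name not in features_to_hide}
    # Drop contributions whose absolute value is below the threshold.
    if threshold is not None:
        kept = {name: c for name, c in kept.items() if abs(c) >= threshold}
    # Keep only positive (True) or only negative (False) contributions.
    if positive is not None:
        kept = {name: c for name, c in kept.items() if (c >= 0) == positive}
    # Rank by absolute value and keep the largest ones.
    ranked = sorted(kept.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked if max_contrib is None else ranked[:max_contrib]


# Illustrative contributions for one house.
contributions = {
    "OverallQual": 59199.0,
    "GrLivArea": 15518.8,
    "GarageArea": 12725.4,
    "1stFlrSF": -9853.7,
    "TotRmsAbvGrd": -4273.3,
}

top_without_area = apply_mask(contributions, features_to_hide=["GrLivArea"],
                              threshold=5000, max_contrib=2)
negative_only = apply_mask(contributions, positive=False)

print(top_without_area)
print(negative_only)
```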