Shapash model in production - Overview¶
With this tutorial you: Understand how to create a Shapash SmartPredictor to make predictions and provide local explanations in production, with a simple use case.
This tutorial describes the different steps from training the model to deploying the Shapash SmartPredictor. A more detailed tutorial is available if you want to learn more about the SmartPredictor object.
Contents:
- Build a Regressor
- Compile Shapash SmartExplainer
- From Shapash SmartExplainer to SmartPredictor
- Save the Shapash SmartPredictor object in a pickle file
- Make a prediction
Data from Kaggle House Prices
[1]:
import pandas as pd
from category_encoders import OrdinalEncoder
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split
Step 1 : Exploration and training of the model¶
Building a Supervised Model¶
In this section, we train a supervised Machine Learning model on our House Prices data.
[2]:
from shapash.data.data_loader import data_loading
house_df, house_dict = data_loading('house_prices')
[3]:
y_df = house_df['SalePrice'].to_frame()
X_df = house_df[house_df.columns.difference(['SalePrice'])]
Preprocessing step¶
Encoding Categorical Features
[4]:
from category_encoders import OrdinalEncoder
categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']
encoder = OrdinalEncoder(cols=categorical_features,
                         handle_unknown='ignore',
                         return_df=True).fit(X_df)
X_encoded = encoder.transform(X_df)
Train / Test Split¶
[5]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X_encoded, y_df, train_size=0.75, random_state=1)
Model Fitting¶
[6]:
regressor = LGBMRegressor(n_estimators=200).fit(Xtrain, ytrain)
[7]:
y_pred = pd.DataFrame(regressor.predict(Xtest), columns=['pred'], index=Xtest.index)
Understand my model with shapash¶
In this section, we use the SmartExplainer object from shapash.
- It allows users to understand how the model works with the specified data.
- This object should only be used during the data-mining step; Shapash provides another object for deployment.
- This tutorial does not explore all the possibilities of the SmartExplainer; see the other tutorials for more.
Declare and Compile SmartExplainer¶
[8]:
from shapash import SmartExplainer
Use wording on feature names to better understand results¶
Here, we use a wording to rename our feature labels with more understandable terms. This is useful to make our local explainability more operational and understandable for users.
- To do this, we use the house_dict dictionary, which maps a description to each feature.
- We can then pass it as the features_dict parameter of the SmartExplainer.
[9]:
xpl = SmartExplainer(
    model=regressor,
    preprocessing=encoder,  # Optional: compile step can use inverse_transform method
    features_dict=house_dict
)
compile() This method is the first step to understand the model and its predictions. It sorts the contributions, reverses the preprocessing steps and performs all the calculations necessary for a quick display of plots and an efficient summary of the explanation. (see SmartExplainer documentation and tutorials)
[10]:
xpl.compile(x=Xtest,
            y_pred=y_pred,
            y_target=ytest,  # Optional: allows to display True Values vs Predicted Values
            )
Backend: Shap TreeExplainer
Understand results of your trained model¶
Then, we can easily get a first summary of the explanation of the model results.
- Here, we chose to get the 3 most contributive features for each prediction.
- We used a wording to make feature names more understandable in an operational context.
[11]:
xpl.to_pandas(max_contrib=3).head()
[11]:
|      | pred | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
|------|------|-----------|---------|----------------|-----------|---------|----------------|-----------|---------|----------------|
| 259 | 209141.256921 | Ground living area square feet | 1792 | 13710.4 | Overall material and finish of the house | 7 | 12776.3 | Total square feet of basement area | 963 | -5103.03 |
| 268 | 178734.474531 | Ground living area square feet | 2192 | 29747 | Overall material and finish of the house | 5 | -26151.3 | Overall condition of the house | 8 | 9190.84 |
| 289 | 113950.844570 | Overall material and finish of the house | 5 | -24730 | Ground living area square feet | 900 | -16342.6 | Total square feet of basement area | 882 | -5922.64 |
| 650 | 74957.162142 | Overall material and finish of the house | 4 | -33927.7 | Ground living area square feet | 630 | -23234.4 | Total square feet of basement area | 630 | -11687.9 |
| 1234 | 135305.243500 | Overall material and finish of the house | 5 | -25445.7 | Ground living area square feet | 1188 | -11476.6 | Condition of sale | Abnormal Sale | -5071.82 |
Step 2 : SmartPredictor in production¶
Switch from SmartExplainer to SmartPredictor¶
When you are satisfied with your results and the explainability given by Shapash, you can use the SmartPredictor object for deployment.
- In this section, we learn how to easily switch from SmartExplainer to SmartPredictor.
- SmartPredictor allows you to make predictions, and to detail and summarize contributions on new data automatically.
- It keeps only the attributes needed for deployment, so it is lighter than the SmartExplainer object.
- SmartPredictor performs additional consistency checks before deployment.
- SmartPredictor lets you configure the summary to suit your use cases.
- It can be used with an API or in batch mode.
[12]:
predictor = xpl.to_smartpredictor()
Save and Load your SmartPredictor¶
You can easily save and load your SmartPredictor object as a pickle file.
Save your SmartPredictor to a Pickle File¶
[13]:
predictor.save('./predictor.pkl')
Load your SmartPredictor from a Pickle File¶
[14]:
from shapash.utils.load_smartpredictor import load_smartpredictor
[15]:
predictor_load = load_smartpredictor('./predictor.pkl')
Make a prediction with your SmartPredictor¶
In order to make new predictions and summarize the local explainability of your model on new datasets, you can use the add_input method of the SmartPredictor.
- The add_input method is the first step to add a dataset for prediction and explainability.
- It checks the structure of the dataset, and of the prediction and the contributions if specified.
- It applies the preprocessing specified at initialisation and reorders the features to the order used by the model. (see the documentation of this method)
- In API mode, this method can handle dictionary data received from a GET or a POST request.
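As a simplified illustration of the API-mode scenario (the feature names and values below are invented, and the commented calls assume the predictor_load object built earlier in this tutorial), a JSON payload received from a request can be turned into a one-row DataFrame before being passed to add_input:

```python
import pandas as pd

# Hypothetical payload received from a POST request (only a few features shown).
payload = {
    "1stFlrSF": 896,
    "2ndFlrSF": 0,
    "YearBuilt": 1961,
    "SaleCondition": "Normal",
}

# Build a one-row DataFrame matching the tabular structure add_input expects.
x_new = pd.DataFrame([payload])
print(x_new.shape)  # (1, 4): one row, one column per feature

# predictor_load.add_input(x=x_new)
# contributions = predictor_load.detail_contributions()
```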
Add data¶
The x input of the add_input method doesn't have to be encoded: add_input applies the preprocessing itself.
[16]:
predictor_load.add_input(x=X_df, ypred=y_df)
Make prediction¶
We can check that ypred is the one given to the add_input method by looking at the data["ypred"] attribute. If not specified, it is computed automatically by the method.
[17]:
predictor_load.data["ypred"].head()
[17]:
| Id | SalePrice |
|----|-----------|
| 1 | 208500 |
| 2 | 181500 |
| 3 | 223500 |
| 4 | 140000 |
| 5 | 250000 |
Get detailed explainability associated to the prediction¶
You can use the detail_contributions method to see the detailed contributions of each of your features for each row of your new dataset.
- For classification problems, it automatically associates contributions with the right predicted label.
- The predicted label can be computed automatically in the method, or you can specify ypred with the add_input method.
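To make the label-alignment point concrete, here is a toy pure-pandas sketch (not shapash internals; the contribution values are invented) of how per-class contributions can be aligned with the predicted label of a hypothetical binary classifier:

```python
import numpy as np
import pandas as pd

# Invented per-class contribution tables for a binary classifier:
# two features, three rows. For binary SHAP values the two classes mirror each other.
contrib_class_0 = pd.DataFrame({"feat_a": [0.2, -0.1, 0.4], "feat_b": [-0.3, 0.5, 0.1]})
contrib_class_1 = -contrib_class_0

y_pred = np.array([1, 0, 1])  # predicted label for each row

# Row by row, keep the contributions of the predicted class: this selection
# is what detail_contributions performs for you automatically.
aligned = pd.DataFrame(
    np.where(y_pred[:, None] == 1, contrib_class_1.values, contrib_class_0.values),
    columns=contrib_class_0.columns,
)
```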
[18]:
detailed_contributions = predictor_load.detail_contributions()
[19]:
detailed_contributions.head()
[19]:
| Id | SalePrice | 1stFlrSF | 2ndFlrSF | 3SsnPorch | BedroomAbvGr | BldgType | BsmtCond | BsmtExposure | BsmtFinSF1 | BsmtFinSF2 | ... | SaleType | ScreenPorch | Street | TotRmsAbvGrd | TotalBsmtSF | Utilities | WoodDeckSF | YearBuilt | YearRemodAdd | YrSold |
|----|-----------|----------|----------|-----------|--------------|----------|----------|--------------|------------|------------|-----|----------|-------------|--------|--------------|-------------|-----------|------------|-----------|--------------|--------|
| 1 | 208500 | -1104.994176 | 1281.445856 | 0.0 | 375.679661 | 12.259902 | 157.224629 | -233.025420 | -738.445396 | -59.294761 | ... | -104.645827 | -351.621116 | 0.0 | -498.228775 | -5165.503476 | 0.0 | -944.040092 | 3870.961681 | 2219.313761 | 17.478037 |
| 2 | 181500 | 2249.403962 | -655.861167 | 0.0 | 123.907278 | -9.270166 | 139.431860 | 2699.247506 | 5102.469936 | -84.771341 | ... | -153.842142 | -236.526862 | 0.0 | -705.112993 | 2988.981279 | 0.0 | 2090.785074 | 323.902986 | -3861.776078 | 424.382977 |
| 3 | 223500 | -1426.795115 | -616.113112 | 0.0 | 369.536957 | 9.210944 | 199.213726 | 1032.288162 | -92.179454 | -93.169310 | ... | -91.178667 | -280.832451 | 0.0 | -324.734175 | -5338.340597 | 0.0 | -777.746743 | 3837.761102 | 2192.921648 | -98.965041 |
| 4 | 140000 | -653.873832 | 121.459865 | 0.0 | 307.677892 | 9.720006 | 252.786934 | -530.156452 | -2987.649814 | -77.039912 | ... | -114.608224 | -338.435699 | 0.0 | -635.065828 | -6548.453864 | 0.0 | -974.503140 | -3386.361210 | -5232.537839 | 1633.763619 |
| 5 | 250000 | -9531.577733 | -1097.620788 | 0.0 | -1574.988323 | 7.453569 | 130.470247 | 623.939546 | -2396.572526 | -92.929525 | ... | -481.118248 | -366.250007 | 0.0 | -4733.603060 | -4675.706762 | 0.0 | 165.653455 | 2334.652063 | 1355.358932 | -395.126541 |
5 rows × 73 columns
Summarize explainability of the predictions¶
You can use the summarize method to summarize your local explainability.
This summary can be configured with the modify_mask method so that the explainability meets your operational needs.
When you initialize the SmartPredictor, you can also specify:
- postprocessing: to apply a wording to several values of your dataset.
- label_dict: to rename your labels for classification problems.
- features_dict: to rename your features.
[20]:
predictor_load.modify_mask(max_contrib=3)
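As a rough pure-pandas sketch of the kind of filtering max_contrib configures (this is not shapash's actual implementation, and the contribution values are invented), keeping the k most contributive features of a row means selecting the k largest contributions in absolute value:

```python
import pandas as pd

# Toy contribution table: one column per feature, one row per prediction.
contribs = pd.DataFrame(
    {"GrLivArea": [15000.0, -9000.0], "OverallQual": [8000.0, -14000.0], "YrSold": [100.0, -50.0]},
    index=[1, 2],
)

max_contrib = 2  # analogous to modify_mask(max_contrib=2)

# For each row, keep the names of the max_contrib features with the largest
# absolute contribution, ordered from most to least important.
summary = pd.Series(
    [row.abs().nlargest(max_contrib).index.tolist() for _, row in contribs.iterrows()],
    index=contribs.index,
)
print(summary.loc[1])  # ['GrLivArea', 'OverallQual']
```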
[21]:
explanation = predictor_load.summarize()
For example, here we chose to build a summary with the 3 most contributive features of your dataset. As you can see below, the wording defined in the first step of this tutorial has been kept by the SmartPredictor and is used in the summarize method.
[22]:
explanation.head()
[22]:
| Id | SalePrice | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
|----|-----------|-----------|---------|----------------|-----------|---------|----------------|-----------|---------|----------------|
| 1 | 208500 | Overall material and finish of the house | 7 | 8248.82 | Total square feet of basement area | 856 | -5165.5 | Original construction date | 2003 | 3870.96 |
| 2 | 181500 | Overall material and finish of the house | 6 | -14419.4 | Ground living area square feet | 1262 | -9238.07 | Overall condition of the house | 8 | 6371.61 |
| 3 | 223500 | Ground living area square feet | 1786 | 15880.4 | Overall material and finish of the house | 7 | 9651.28 | Size of garage in square feet | 608 | 6259.46 |
| 4 | 140000 | Total square feet of basement area | 756 | -6548.45 | Remodel date | 1970 | -5232.54 | Size of garage in square feet | 642 | 4384.29 |
| 5 | 250000 | Overall material and finish of the house | 8 | 55722.1 | Ground living area square feet | 2198 | 17176.5 | Size of garage in square feet | 836 | 14907.7 |