Shapash model in production - Overview¶
With this tutorial you: Understand how to create a Shapash SmartPredictor to make predictions and provide local explanations in production, with a simple use case.
This tutorial describes the different steps from training the model to deploying the Shapash SmartPredictor. A more detailed tutorial is available if you want to learn more about the SmartPredictor object.
Contents:
- Build a Regressor
- Compile Shapash SmartExplainer
- From Shapash SmartExplainer to SmartPredictor
- Save the Shapash SmartPredictor object in a pickle file
- Make a prediction
Data from Kaggle House Prices
[1]:
import pandas as pd
from category_encoders import OrdinalEncoder
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split
Step 1 : Exploration and training of the model¶
Building a Supervised Model¶
In this section, we train a supervised Machine Learning model on our House Prices data.
[2]:
from shapash.data.data_loader import data_loading
house_df, house_dict = data_loading('house_prices')
[3]:
y_df = house_df['SalePrice'].to_frame()
X_df = house_df[house_df.columns.difference(['SalePrice'])]
Preprocessing step¶
Encoding Categorical Features
[4]:
from category_encoders import OrdinalEncoder
categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']
encoder = OrdinalEncoder(cols=categorical_features,
                         handle_unknown='ignore',
                         return_df=True).fit(X_df)
X_encoded = encoder.transform(X_df)
Train / Test Split¶
[5]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X_encoded, y_df, train_size=0.75, random_state=1)
Model Fitting¶
[6]:
regressor = LGBMRegressor(n_estimators=200).fit(Xtrain, ytrain)
[7]:
y_pred = pd.DataFrame(regressor.predict(Xtest), columns=['pred'], index=Xtest.index)
Understand my model with shapash¶
In this section, we use the SmartExplainer object from shapash.
- It allows users to understand how the model works with the specified data.
- This object should only be used during the data-mining step; Shapash provides another object for deployment.
- This tutorial does not explore all the possibilities of the SmartExplainer; see the other tutorials for more.
Declare and Compile SmartExplainer¶
[8]:
from shapash import SmartExplainer
Use wording on feature names to better understand results¶
Here, we use a wording to rename our feature labels with more understandable terms. This is useful to make our local explainability more operational and understandable for users.
- To do this, we use the house_dict dictionary, which maps a description to each feature.
- We can then pass it as the features_dict parameter of the SmartExplainer.
[9]:
xpl = SmartExplainer(
    model=regressor,
    preprocessing=encoder,  # Optional: compile step can use inverse_transform method
    features_dict=house_dict
)
compile() This method is the first step to understand the model and its predictions. It sorts the contributions, reverses the preprocessing steps and performs all the calculations necessary for a quick display of plots and an efficient summary of the explanation. (see SmartExplainer documentation and tutorials)
[10]:
xpl.compile(x=Xtest,
            y_pred=y_pred,
            y_target=ytest,  # Optional: allows to display True Values vs Predicted Values
            )
Backend: Shap TreeExplainer
Understand results of your trained model¶
Then, we can easily get a first summary of the explanation of the model results.
- Here, we chose to get the 3 most contributive features for each prediction.
- We used a wording to make feature names more understandable in an operational context.
[11]:
xpl.to_pandas(max_contrib=3).head()
[11]:
|      | pred | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
|------|------|-----------|---------|----------------|-----------|---------|----------------|-----------|---------|----------------|
| 259 | 209141.256921 | Ground living area square feet | 1792 | 13710.4 | Overall material and finish of the house | 7 | 12776.3 | Total square feet of basement area | 963 | -5103.03 |
| 268 | 178734.474531 | Ground living area square feet | 2192 | 29747 | Overall material and finish of the house | 5 | -26151.3 | Overall condition of the house | 8 | 9190.84 |
| 289 | 113950.844570 | Overall material and finish of the house | 5 | -24730 | Ground living area square feet | 900 | -16342.6 | Total square feet of basement area | 882 | -5922.64 |
| 650 | 74957.162142 | Overall material and finish of the house | 4 | -33927.7 | Ground living area square feet | 630 | -23234.4 | Total square feet of basement area | 630 | -11687.9 |
| 1234 | 135305.243500 | Overall material and finish of the house | 5 | -25445.7 | Ground living area square feet | 1188 | -11476.6 | Condition of sale | Abnormal Sale | -5071.82 |
Step 2 : SmartPredictor in production¶
Switch from SmartExplainer to SmartPredictor¶
When you are satisfied with your results and the explainability given by Shapash, you can use the SmartPredictor object for deployment.
- In this section, we learn how to easily switch from SmartExplainer to SmartPredictor.
- SmartPredictor allows you to make predictions, and to detail and summarize contributions on new data automatically.
- It keeps only the attributes needed for deployment, so it is lighter than the SmartExplainer object.
- SmartPredictor performs additional consistency checks before deployment.
- SmartPredictor lets you configure the summary to suit your use cases.
- It can be used with an API or in batch mode.
[12]:
predictor = xpl.to_smartpredictor()
Save and Load your SmartPredictor¶
You can easily save and load your SmartPredictor object as a pickle file.
Save your SmartPredictor to a Pickle File¶
[13]:
predictor.save('./predictor.pkl')
Load your SmartPredictor from a Pickle File¶
[14]:
from shapash.utils.load_smartpredictor import load_smartpredictor
[15]:
predictor_load = load_smartpredictor('./predictor.pkl')
Make a prediction with your SmartPredictor¶
In order to make new predictions and summarize the local explainability of your model on new datasets, you can use the add_input method of the SmartPredictor.
- The add_input method is the first step to add a dataset for prediction and explainability.
- It checks the structure of the dataset, and of the prediction and the contributions if specified.
- It applies the preprocessing specified at initialisation and reorders the features to the order used by the model. (see the documentation of this method)
- In API mode, this method can handle dictionary data received from a GET or a POST request.
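As a simplified illustration of the API-mode scenario (the feature names and values below are invented, and the commented calls assume the predictor_load object built earlier in this tutorial), a JSON payload received from a request can be turned into a one-row DataFrame before being passed to add_input:

```python
import pandas as pd

# Hypothetical payload received from a POST request (only a few features shown).
payload = {
    "1stFlrSF": 896,
    "2ndFlrSF": 0,
    "YearBuilt": 1961,
    "SaleCondition": "Normal",
}

# Build a one-row DataFrame matching the tabular structure add_input expects.
x_new = pd.DataFrame([payload])
print(x_new.shape)  # (1, 4): one row, one column per feature

# predictor_load.add_input(x=x_new)
# contributions = predictor_load.detail_contributions()
```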
Add data¶
The x input of the add_input method doesn't have to be encoded: add_input applies the preprocessing itself.
[16]:
predictor_load.add_input(x=X_df, ypred=y_df)
Make prediction¶
We can check that ypred is the one given to the add_input method by looking at the data["ypred"] attribute. If not specified, it is computed automatically by the method.
[17]:
predictor_load.data["ypred"].head()
[17]:
| Id | SalePrice |
|----|-----------|
| 1 | 208500 |
| 2 | 181500 |
| 3 | 223500 |
| 4 | 140000 |
| 5 | 250000 |
Get detailed explainability associated to the prediction¶
You can use the detail_contributions method to see the detailed contributions of each of your features for each row of your new dataset.
- For classification problems, it automatically associates contributions with the right predicted label.
- The predicted label can be computed automatically in the method, or you can specify ypred with the add_input method.
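To make the label-alignment point concrete, here is a toy pure-pandas sketch (not shapash internals; the contribution values are invented) of how per-class contributions can be aligned with the predicted label of a hypothetical binary classifier:

```python
import numpy as np
import pandas as pd

# Invented per-class contribution tables for a binary classifier:
# two features, three rows. For binary SHAP values the two classes mirror each other.
contrib_class_0 = pd.DataFrame({"feat_a": [0.2, -0.1, 0.4], "feat_b": [-0.3, 0.5, 0.1]})
contrib_class_1 = -contrib_class_0

y_pred = np.array([1, 0, 1])  # predicted label for each row

# Row by row, keep the contributions of the predicted class: this selection
# is what detail_contributions performs for you automatically.
aligned = pd.DataFrame(
    np.where(y_pred[:, None] == 1, contrib_class_1.values, contrib_class_0.values),
    columns=contrib_class_0.columns,
)
```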
[18]:
detailed_contributions = predictor_load.detail_contributions()
[19]:
detailed_contributions.head()
[19]:
| Id | SalePrice | 1stFlrSF | 2ndFlrSF | 3SsnPorch | BedroomAbvGr | BldgType | BsmtCond | BsmtExposure | BsmtFinSF1 | BsmtFinSF2 | ... | SaleType | ScreenPorch | Street | TotRmsAbvGrd | TotalBsmtSF | Utilities | WoodDeckSF | YearBuilt | YearRemodAdd | YrSold |
|----|-----------|----------|----------|-----------|--------------|----------|----------|--------------|------------|------------|-----|----------|-------------|--------|--------------|-------------|-----------|------------|-----------|--------------|--------|
| 1 | 208500 | -1104.994176 | 1281.445856 | 0.0 | 375.679661 | 12.259902 | 157.224629 | -233.025420 | -738.445396 | -59.294761 | ... | -104.645827 | -351.621116 | 0.0 | -498.228775 | -5165.503476 | 0.0 | -944.040092 | 3870.961681 | 2219.313761 | 17.478037 |
| 2 | 181500 | 2249.403962 | -655.861167 | 0.0 | 123.907278 | -9.270166 | 139.431860 | 2699.247506 | 5102.469936 | -84.771341 | ... | -153.842142 | -236.526862 | 0.0 | -705.112993 | 2988.981279 | 0.0 | 2090.785074 | 323.902986 | -3861.776078 | 424.382977 |
| 3 | 223500 | -1426.795115 | -616.113112 | 0.0 | 369.536957 | 9.210944 | 199.213726 | 1032.288162 | -92.179454 | -93.169310 | ... | -91.178667 | -280.832451 | 0.0 | -324.734175 | -5338.340597 | 0.0 | -777.746743 | 3837.761102 | 2192.921648 | -98.965041 |
| 4 | 140000 | -653.873832 | 121.459865 | 0.0 | 307.677892 | 9.720006 | 252.786934 | -530.156452 | -2987.649814 | -77.039912 | ... | -114.608224 | -338.435699 | 0.0 | -635.065828 | -6548.453864 | 0.0 | -974.503140 | -3386.361210 | -5232.537839 | 1633.763619 |
| 5 | 250000 | -9531.577733 | -1097.620788 | 0.0 | -1574.988323 | 7.453569 | 130.470247 | 623.939546 | -2396.572526 | -92.929525 | ... | -481.118248 | -366.250007 | 0.0 | -4733.603060 | -4675.706762 | 0.0 | 165.653455 | 2334.652063 | 1355.358932 | -395.126541 |
5 rows × 73 columns
Summarize explainability of the predictions¶
You can use the summarize method to summarize your local explainability.
This summary can be configured with the modify_mask method so that the explainability meets your operational needs.
When you initialize the SmartPredictor, you can also specify:
- postprocessing: to apply a wording to several values of your dataset.
- label_dict: to rename your labels for classification problems.
- features_dict: to rename your features.
[20]:
predictor_load.modify_mask(max_contrib=3)
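As a rough pure-pandas sketch of the kind of filtering max_contrib configures (this is not shapash's actual implementation, and the contribution values are invented), keeping the k most contributive features of a row means selecting the k largest contributions in absolute value:

```python
import pandas as pd

# Toy contribution table: one column per feature, one row per prediction.
contribs = pd.DataFrame(
    {"GrLivArea": [15000.0, -9000.0], "OverallQual": [8000.0, -14000.0], "YrSold": [100.0, -50.0]},
    index=[1, 2],
)

max_contrib = 2  # analogous to modify_mask(max_contrib=2)

# For each row, keep the names of the max_contrib features with the largest
# absolute contribution, ordered from most to least important.
summary = pd.Series(
    [row.abs().nlargest(max_contrib).index.tolist() for _, row in contribs.iterrows()],
    index=contribs.index,
)
print(summary.loc[1])  # ['GrLivArea', 'OverallQual']
```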
[21]:
explanation = predictor_load.summarize()
For example, here we chose to build a summary with the 3 most contributive features of your dataset. As you can see below, the wording defined in the first step of this tutorial has been kept by the SmartPredictor and is used in the summarize method.
[22]:
explanation.head()
[22]:
| Id | SalePrice | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
|----|-----------|-----------|---------|----------------|-----------|---------|----------------|-----------|---------|----------------|
| 1 | 208500 | Overall material and finish of the house | 7 | 8248.82 | Total square feet of basement area | 856 | -5165.5 | Original construction date | 2003 | 3870.96 |
| 2 | 181500 | Overall material and finish of the house | 6 | -14419.4 | Ground living area square feet | 1262 | -9238.07 | Overall condition of the house | 8 | 6371.61 |
| 3 | 223500 | Ground living area square feet | 1786 | 15880.4 | Overall material and finish of the house | 7 | 9651.28 | Size of garage in square feet | 608 | 6259.46 |
| 4 | 140000 | Total square feet of basement area | 756 | -6548.45 | Remodel date | 1970 | -5232.54 | Size of garage in square feet | 642 | 4384.29 |
| 5 | 250000 | Overall material and finish of the house | 8 | 55722.1 | Ground living area square feet | 2198 | 17176.5 | Size of garage in square feet | 836 | 14907.7 |