From model training to deployment - an introduction to the SmartPredictor object

Shapash provides a SmartPredictor object for making predictions and computing local explanations in a deployment context. It returns a summary of the local explanation of each prediction, and users can configure that summary to suit their use case. The object is dedicated to deployment: it is lighter than the SmartExplainer object and performs additional consistency checks. SmartPredictor can be used behind an API or in batch mode.

This tutorial helps you get started with the Shapash SmartPredictor object.

Contents:

  • Build a SmartPredictor

  • Save and load a SmartPredictor

  • Add input

  • Use labels and wording

  • Summarize the explanation

We use Kaggle’s Titanic dataset.

Step 1: Exploration and training of the model

Import Dataset

First, we need to import a dataset. Here we use the well-known Titanic dataset from Kaggle.

[1]:
import numpy as np
import pandas as pd
from category_encoders import OrdinalEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import shap
[2]:
from shapash.explainer.smart_predictor import SmartPredictor
from shapash.utils.load_smartpredictor import load_smartpredictor
from shapash.data.data_loader import data_loading
[3]:
titan_df, titan_dict = data_loading('titanic')
del titan_df['Name']
[4]:
titan_df.head()
[4]:
Survived Pclass Sex Age SibSp Parch Fare Embarked Title
PassengerId
1 0 Third class male 22.0 1 0 7.25 Southampton Mr
2 1 First class female 38.0 1 0 71.28 Cherbourg Mrs
3 1 Third class female 26.0 0 0 7.92 Southampton Miss
4 1 First class female 35.0 1 0 53.10 Southampton Mrs
5 0 Third class male 35.0 0 0 8.05 Southampton Mr

Create Classification Model

In this section, we train a supervised machine learning model on our data. In this example, we face a classification problem.

[5]:
y = titan_df['Survived']
X = titan_df.drop('Survived', axis=1)
[6]:
varcat=['Pclass', 'Sex', 'Embarked', 'Title']

Preprocessing Step

Encoding Categorical Features

[7]:
categ_encoding = OrdinalEncoder(
    cols=varcat,
    handle_unknown='ignore',
    return_df=True,
).fit(X)
X = categ_encoding.transform(X)

Train Test split + Random Forest fit

[8]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.75, random_state=1)

rf = RandomForestClassifier(n_estimators=100, min_samples_leaf=3)
rf.fit(Xtrain, ytrain)
[8]:
RandomForestClassifier(min_samples_leaf=3)
[9]:
ypred=pd.DataFrame(rf.predict(Xtest), columns=['pred'], index=Xtest.index)

Explore the results of your trained model with SmartExplainer

[10]:
from shapash import SmartExplainer

Use Label and Wording

Here, we use labels and wording to make the explanations easier to understand:

  • features_dict: lets users rename the features of their dataset.

  • label_dict: lets users rename the predicted labels in classification problems.

  • postprocessing: lets users apply wording to the values of selected features.

[11]:
feature_dict = {
                'Pclass': 'Ticket class',
                 'Sex': 'Sex',
                 'Age': 'Age',
                 'SibSp': 'Relatives such as brother or wife',
                 'Parch': 'Relatives like children or parents',
                 'Fare': 'Passenger fare',
                 'Embarked': 'Port of embarkation',
                 'Title': 'Title of passenger'
               }
[12]:
label_dict = {0: "Not Survived", 1: "Survived"}
[13]:
postprocessing = {"Pclass": {'type': 'transcoding', 'rule': { 'First class': '1st class', 'Second class': '2nd class', "Third class": "3rd class"}}}
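
To make the effect of a 'transcoding' rule concrete, here is a minimal sketch in plain Python. The helper apply_transcoding is hypothetical and is not part of Shapash; it assumes that a value without a matching rule is left unchanged.

```python
# Same structure as the postprocessing dictionary defined above.
postprocessing = {
    "Pclass": {
        "type": "transcoding",
        "rule": {
            "First class": "1st class",
            "Second class": "2nd class",
            "Third class": "3rd class",
        },
    }
}


def apply_transcoding(feature, value, rules):
    """Hypothetical helper (not a Shapash function): reword one value.

    Assumption: values with no matching rule are returned unchanged.
    """
    spec = rules.get(feature)
    if spec is None or spec["type"] != "transcoding":
        return value
    return spec["rule"].get(value, value)


reworded = apply_transcoding("Pclass", "First class", postprocessing)
untouched = apply_transcoding("Sex", "female", postprocessing)
print(reworded, untouched)
```

Only the displayed wording changes; the model still receives the encoded values produced by the preprocessing step.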

Define a SmartExplainer

[14]:
xpl = SmartExplainer(
    model=rf,
    preprocessing=categ_encoding,
    postprocessing=postprocessing,
    label_dict=label_dict,
    features_dict=feature_dict
)

compile(): this method is the first step toward understanding the model and its predictions. It sorts the contributions, reverses the preprocessing steps, and performs all the calculations needed to display plots quickly and summarize explanations efficiently (see the SmartExplainer documentation and tutorials).

[15]:
xpl.compile(x=Xtest, y_pred=ypred)
Backend: Shap TreeExplainer

Understand results of your trained model with SmartExplainer

We can easily get a first summary of the explanation of the model results:

  • We keep the 3 most contributive features for each prediction.

  • We use wording to make feature names easier to understand in an operational setting.

  • We rename the predicted label to show a more explicit prediction.

  • We apply post-processing to transform the values of some features.

[16]:
xpl.to_pandas(max_contrib=3).head()
[16]:
pred feature_1 value_1 contribution_1 feature_2 value_2 contribution_2 feature_3 value_3 contribution_3
863 Survived Title of passenger Mrs 0.170394 Sex female 0.168492 Ticket class 1st class 0.110185
224 Not Survived Title of passenger Mr 0.0913801 Sex male 0.0835547 Passenger fare 7.9 0.0654677
85 Survived Title of passenger Miss 0.204571 Sex female 0.169965 Ticket class 2nd class 0.102774
681 Survived Title of passenger Miss 0.193106 Sex female 0.153766 Port of embarkation Queenstown 0.13015
536 Survived Title of passenger Miss 0.206245 Ticket class 2nd class 0.128066 Sex female 0.112756

Step 2: SmartPredictor in production

to_smartpredictor()

  • It lets users switch from a SmartExplainer, used for data mining, to a SmartPredictor.

  • It keeps only the attributes needed for deployment, so the result is lighter than the SmartExplainer object.

  • The SmartPredictor performs additional consistency checks before deployment.

  • This object is dedicated to deployment.

In this section, we learn how to initialize a SmartPredictor:

  • It makes new predictions and summarizes their explainability, configured to fit your operational needs.

  • It can be used with an API or in batch mode.

  • It accepts input data as DataFrames or dictionaries.

Switch from SmartExplainer Object to SmartPredictor Object

[17]:
predictor = xpl.to_smartpredictor()

Save your predictor to a pickle file

[18]:
predictor.save('./predictor.pkl')

Load your predictor from a pickle file

[19]:
predictor_load = load_smartpredictor('./predictor.pkl')

Make a prediction with your SmartPredictor

  • Once our SmartPredictor has been initialized, we can compute new predictions and explain them.

  • First, we specify a new dataset, which can be a pandas.DataFrame or a dictionary (useful when you use an API in your deployment process).

  • We use the add_input method of the SmartPredictor. (see the documentation of this method)

Add data

[20]:
person_x = {'Pclass': 'First class',
             'Sex': 'female',
             'Age': 36,
             'SibSp': 1,
             'Parch': 0,
             'Fare': 7.25,
             'Embarked': 'Cherbourg',
             'Title': 'Miss'
           }
[21]:
predictor_load.add_input(x=person_x)

If you don’t specify a ypred in the add_input method, SmartPredictor uses its predict method to assign the predicted value to ypred automatically.

Make prediction

Let’s display ypred, which was computed automatically by the add_input method.

[22]:
predictor_load.data["ypred"]
[22]:
ypred proba
0 Survived 0.744009

The predict_proba method of SmartPredictor computes the probabilities associated with each label.

[23]:
prediction_proba = predictor_load.predict_proba()
[24]:
prediction_proba
[24]:
class_0 class_1
0 0.255991 0.744009
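
The link between these probabilities and the displayed ypred can be sketched in plain Python. This is an illustration that assumes the predicted label is the most probable class, which is how a random forest classifier decides; it is not Shapash code, and the variable names are made up for the example.

```python
# Probabilities copied from the predict_proba output above.
proba = {0: 0.255991, 1: 0.744009}
label_dict = {0: "Not Survived", 1: "Survived"}

# Assumption: the predicted class is the one with the highest probability.
predicted_class = max(proba, key=proba.get)
predicted_label = label_dict[predicted_class]
predicted_proba = proba[predicted_class]

print(predicted_label, predicted_proba)
```

The label and probability obtained this way match the ypred and proba columns displayed earlier for this passenger.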

Get the detailed explainability associated with the prediction

  • You can use the detail_contributions method to get the contribution of each feature for each row of your new dataset.

  • For classification problems, it automatically associates the contributions with the predicted label.

  • The predicted labels are computed automatically, or you can specify a ypred with the add_input method.

[25]:
detailed_contributions = predictor_load.detail_contributions()

The ypred has already been renamed with the value that we’ve given in the label_dict.

[26]:
detailed_contributions
[26]:
ypred proba Pclass Sex Age SibSp Parch Fare Embarked Title
0 Survived 0.744009 0.0950201 0.153742 -0.0111338 0.0192229 -0.00411547 -0.0879743 0.0316235 0.177685

Summarize the explainability of the predictions

  • You can use the summarize method to summarize your local explainability.

  • This summary can be configured with the modify_mask method to suit your use case.

  • When you initialize the SmartPredictor, you can also specify:

      - postprocessing: to apply wording to some values of your dataset.
      - label_dict: to rename your labels in classification problems.
      - features_dict: to rename your features.

We use the modify_mask method to keep only the 4 most contributive features in our local summary.

[27]:
predictor_load.modify_mask(max_contrib=4)
[28]:
explanation = predictor_load.summarize()

  • The mapping dictionary given to the SmartExplainer object renames the ‘Title’ feature to ‘Title of passenger’.

  • The value of the ‘Ticket class’ feature has been reworded: ‘First class’ became ‘1st class’.

  • Our explanation focuses on the 4 most contributive features.

[29]:
explanation
[29]:
ypred proba feature_1 value_1 contribution_1 feature_2 value_2 contribution_2 feature_3 value_3 contribution_3 feature_4 value_4 contribution_4
0 Survived 0.744009 Title of passenger Miss 0.177685 Sex female 0.153742 Ticket class 1st class 0.0950201 Passenger fare 7.25 -0.0879743
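
The summary above is consistent with ranking features by the absolute value of their contribution, since ‘Passenger fare’ appears in fourth place with a negative contribution. A minimal sketch of that ranking, using the contributions displayed earlier, follows. The function top_contributions is hypothetical and only illustrates the idea; it is not Shapash's implementation.

```python
# Contributions for the 'Survived' label, copied from the detailed output above.
contributions = {
    "Pclass": 0.0950201,
    "Sex": 0.153742,
    "Age": -0.0111338,
    "SibSp": 0.0192229,
    "Parch": -0.00411547,
    "Fare": -0.0879743,
    "Embarked": 0.0316235,
    "Title": 0.177685,
}


def top_contributions(contribs, max_contrib):
    """Hypothetical helper: keep the max_contrib largest contributions.

    Assumption: features are ranked by absolute contribution.
    """
    ranked = sorted(contribs.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:max_contrib]


top4 = top_contributions(contributions, 4)
for name, value in top4:
    print(name, value)
```

The four features selected this way are the same ones, in the same order, as in the summarize output.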

Classification - choose the predicted value and customize the summary

Configure summary: define the predicted label

You can change the ypred or the x passed to the add_input method to get a new prediction and a new summary of your explainability.

[30]:
predictor_load.add_input(x=person_x, ypred=pd.DataFrame({"ypred": [0]}))
[31]:
predictor_load.modify_mask(max_contrib=3)
[32]:
explanation = predictor_load.summarize()

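The displayed contributions and the summary adapt when the predicted value changes from 1 to 0: the probability shown is now that of the ‘Not Survived’ label, and the signs of the contributions are reversed.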

[33]:
explanation
[33]:
ypred proba feature_1 value_1 contribution_1 feature_2 value_2 contribution_2 feature_3 value_3 contribution_3
0 Not Survived 0.255991 Title of passenger Miss -0.177685 Sex female -0.153742 Ticket class 1st class -0.0950201
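
In this binary example, the values displayed for ‘Not Survived’ mirror those displayed for ‘Survived’. The sketch below reproduces that relationship in plain Python from the numbers shown in the outputs; it describes what these outputs show and makes no claim about how Shapash computes it internally.

```python
# Contributions and probability for class 1 ('Survived'), copied from the outputs above.
contrib_class_1 = {
    "Pclass": 0.0950201,
    "Sex": 0.153742,
    "Age": -0.0111338,
    "SibSp": 0.0192229,
    "Parch": -0.00411547,
    "Fare": -0.0879743,
    "Embarked": 0.0316235,
    "Title": 0.177685,
}
proba_class_1 = 0.744009

# With two classes, the probabilities sum to 1, so the class 0 view is the mirror image.
proba_class_0 = 1 - proba_class_1
contrib_class_0 = {name: -value for name, value in contrib_class_1.items()}

print(round(proba_class_0, 6), contrib_class_0["Title"], contrib_class_0["Sex"])
```

A feature that pushed the prediction toward ‘Survived’ therefore pushes it away from ‘Not Survived’ by the same amount.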

Configure summary: mask one feature, select positive contributions

  • The modify_mask method lets you configure the parameters of your explainability summary.

  • Here, we hide one feature from the explanation and keep only the features with positive contributions.

[34]:
predictor_load.modify_mask(features_to_hide=["Fare"], positive=True)
[35]:
explanation = predictor_load.summarize()
[36]:
explanation
[36]:
ypred proba feature_1 value_1 contribution_1 feature_2 value_2 contribution_2
0 Not Survived 0.255991 Age 36 0.0111338 Relatives like children or parents 0 0.00411547

Configure summary: the threshold parameter

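We keep only the features whose contribution is greater than 0.01. The output below suggests that the settings from the previous modify_mask call (hidden Fare, positive contributions only) still apply.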

[37]:
predictor_load.modify_mask(threshold=0.01)
[38]:
explanation = predictor_load.summarize()
[39]:
explanation
[39]:
ypred proba feature_1 value_1 contribution_1
0 Not Survived 0.255991 Age 36 0.0111338
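
The last two summaries can be reproduced with a small filtering sketch in plain Python. The function filter_contributions is hypothetical: its parameter names mirror those of modify_mask, but the order of the filters and the comparison against the absolute value are assumptions made for illustration, not Shapash's implementation.

```python
# Contributions for class 0 ('Not Survived'): the opposite of the class 1 values shown earlier.
contrib_class_0 = {
    "Pclass": -0.0950201,
    "Sex": -0.153742,
    "Age": 0.0111338,
    "SibSp": -0.0192229,
    "Parch": 0.00411547,
    "Fare": 0.0879743,
    "Embarked": -0.0316235,
    "Title": -0.177685,
}


def filter_contributions(contribs, features_to_hide=(), positive=None,
                         threshold=None, max_contrib=None):
    """Hypothetical helper that mimics the mask parameters used above."""
    kept = {n: v for n, v in contribs.items() if n not in features_to_hide}
    if positive is True:
        kept = {n: v for n, v in kept.items() if v > 0}
    elif positive is False:
        kept = {n: v for n, v in kept.items() if v < 0}
    if threshold is not None:
        # Assumption: the threshold applies to the absolute contribution.
        kept = {n: v for n, v in kept.items() if abs(v) > threshold}
    ranked = sorted(kept.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked if max_contrib is None else ranked[:max_contrib]


# Hide Fare and keep positive contributions (max_contrib=3 was set earlier).
hidden_and_positive = filter_contributions(
    contrib_class_0, features_to_hide=["Fare"], positive=True, max_contrib=3
)
# Same settings plus the 0.01 threshold.
with_threshold = filter_contributions(
    contrib_class_0, features_to_hide=["Fare"], positive=True,
    threshold=0.01, max_contrib=3
)
print(hidden_and_positive)
print(with_threshold)
```

The first call keeps Age and Parch and the second keeps only Age, which matches the two summaries displayed above.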