From model training to deployment - an introduction to the SmartPredictor object

Shapash provides a SmartPredictor object that makes predictions and provides local explainability for operational needs in a deployment context. It gives a summary of the local explanation of each prediction, and users can configure this summary to suit their use. SmartPredictor is dedicated to deployment: it is lighter than the SmartExplainer object and performs additional consistency checks. It can be used behind an API or in batch mode.

This tutorial provides more information to help you get started with the SmartPredictor object of Shapash.

Contents:
- Build a SmartPredictor
- Save and load a SmartPredictor
- Add input
- Use label and wording
- Summarize explanation

We used Kaggle’s Titanic dataset.

Step 1: Exploration and training of the model

Import Dataset

First, we need to import a dataset. Here we use the well-known Titanic dataset from Kaggle.

[1]:

import numpy as np
import pandas as pd
from category_encoders import OrdinalEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import shap

[2]:

from shapash.explainer.smart_predictor import SmartPredictor
from shapash.utils.load_smartpredictor import load_smartpredictor
from shapash.data.data_loader import data_loading

[3]:

titan_df, titan_dict = data_loading('titanic')
del titan_df['Name']

[4]:

titan_df.head()

[4]:

	Survived	Pclass	Sex	Age	SibSp	Parch	Fare	Embarked	Title
PassengerId
1	0	Third class	male	22.0	1	0	7.25	Southampton	Mr
2	1	First class	female	38.0	1	0	71.28	Cherbourg	Mrs
3	1	Third class	female	26.0	0	0	7.92	Southampton	Miss
4	1	First class	female	35.0	1	0	53.10	Southampton	Mrs
5	0	Third class	male	35.0	0	0	8.05	Southampton	Mr

Create Classification Model

In this section, we train a supervised Machine Learning model on our data. In our example, we are faced with a classification problem.

[5]:

y = titan_df['Survived']
X = titan_df.drop('Survived', axis=1)

[6]:

varcat=['Pclass', 'Sex', 'Embarked', 'Title']

Preprocessing Step

Encoding Categorical Features

[7]:

categ_encoding = OrdinalEncoder(cols=varcat,
                                handle_unknown='ignore',
                                return_df=True).fit(X)
X = categ_encoding.transform(X)
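To make the encoding step concrete, here is a minimal pure-Python sketch of ordinal encoding applied to the Pclass values from the first rows shown above. It is an illustration, not the category_encoders implementation: the helper names are hypothetical, and numbering categories from 1 in order of first appearance, with unseen categories mapped to -1, is an assumption of this sketch.

```python
# Minimal sketch of ordinal encoding (not the category_encoders implementation).
# Assumptions: categories are numbered 1..n in order of first appearance,
# and categories never seen during fitting are mapped to -1.

def fit_ordinal(values):
    """Build a category -> integer mapping in order of first appearance."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping) + 1
    return mapping


def transform_ordinal(values, mapping, unknown=-1):
    """Encode values with the mapping; unseen categories get `unknown`."""
    return [mapping.get(v, unknown) for v in values]


# Pclass values of the first five passengers in titan_df.head().
pclass = ["Third class", "First class", "Third class", "First class", "Third class"]
mapping = fit_ordinal(pclass)
print(mapping)  # {'Third class': 1, 'First class': 2}
print(transform_ordinal(["First class", "Second class"], mapping))  # [2, -1]
```

Because the model only sees integers after this step, the explanations need the reverse mapping to display readable values again, which is part of what compile() takes care of.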

Train Test split + Random Forest fit

[8]:

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.75, random_state=1)

rf = RandomForestClassifier(n_estimators=100, min_samples_leaf=3)
rf.fit(Xtrain, ytrain)

[8]:

RandomForestClassifier(min_samples_leaf=3)

[9]:

ypred=pd.DataFrame(rf.predict(Xtest), columns=['pred'], index=Xtest.index)

Explore your trained model results with SmartExplainer

[10]:

from shapash import SmartExplainer

Use Label and Wording

Here, we use labels and wording to get more understandable explainability:
- features_dict: allows users to rename the features of their dataset
- label_dict: allows users, in classification problems, to rename the predicted labels
- postprocessing: allows users to apply some wording to the values of selected features

[11]:

feature_dict = {
    'Pclass': 'Ticket class',
    'Sex': 'Sex',
    'Age': 'Age',
    'SibSp': 'Relatives such as brother or wife',
    'Parch': 'Relatives like children or parents',
    'Fare': 'Passenger fare',
    'Embarked': 'Port of embarkation',
    'Title': 'Title of passenger'
}

[12]:

label_dict = {0: "Not Survived", 1: "Survived"}

[13]:

postprocessing = {"Pclass": {'type': 'transcoding', 'rule': { 'First class': '1st class', 'Second class': '2nd class', "Third class": "3rd class"}}}
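These three mappings only change how features, values and labels are displayed. The sketch below applies them by hand to a raw (feature, value) pair and to a predicted label. `apply_wording` and the `demo_*` names are hypothetical, chosen so they do not overwrite the notebook's own variables. Keeping a value unchanged when it is absent from a transcoding rule is an assumption, not documented Shapash behavior.

```python
# Hypothetical illustration of the display mappings defined above.
# Shapash applies them internally; this only shows their effect.
demo_features = {"Pclass": "Ticket class", "Title": "Title of passenger"}
demo_labels = {0: "Not Survived", 1: "Survived"}
demo_postprocessing = {
    "Pclass": {
        "type": "transcoding",
        "rule": {"First class": "1st class", "Second class": "2nd class", "Third class": "3rd class"},
    }
}


def apply_wording(feature, value):
    """Return the displayed (feature name, value) for a raw pair."""
    rule = demo_postprocessing.get(feature)
    if rule is not None and rule["type"] == "transcoding":
        # Assumption: values missing from the rule are kept unchanged.
        value = rule["rule"].get(value, value)
    # Features without an entry keep their original name.
    return demo_features.get(feature, feature), value


print(apply_wording("Pclass", "First class"))  # ('Ticket class', '1st class')
print(apply_wording("Title", "Miss"))          # ('Title of passenger', 'Miss')
print(demo_labels[1])                          # Survived
```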

Define a SmartExplainer

[14]:

xpl = SmartExplainer(
    model=rf,
    preprocessing=categ_encoding,
    postprocessing=postprocessing,
    label_dict=label_dict,
    features_dict=feature_dict
)

compile(): this method is the first step to understanding the model and its predictions. It sorts the contributions, reverses the preprocessing steps, and performs all the calculations needed for a quick display of plots and an efficient summary of the explanation (see the SmartExplainer documentation and tutorials).
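Reversing the preprocessing means, for an ordinal encoding, inverting the category-to-code mapping so that explanations show "First class" rather than an integer. The mapping below is hypothetical (the real one is learned by the fitted encoder), and `inverse_ordinal` is an illustrative helper, not a Shapash function.

```python
# Minimal sketch of "reverse preprocessing" for an ordinal encoding.
# The mapping is hypothetical; the real one comes from the fitted encoder.
demo_encoding = {"Pclass": {"Third class": 1, "First class": 2, "Second class": 3}}


def inverse_ordinal(column, codes, encoding):
    """Map integer codes of `column` back to their category labels."""
    inverse = {code: label for label, code in encoding[column].items()}
    # Codes that were never learned (e.g. -1 for unknown values) become None.
    return [inverse.get(c) for c in codes]


print(inverse_ordinal("Pclass", [2, 1, 3, -1], demo_encoding))
# ['First class', 'Third class', 'Second class', None]
```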

[15]:

xpl.compile(x=Xtest, y_pred=ypred)

Backend: Shap TreeExplainer

Understand results of your trained model with SmartExplainer

We can easily get a first summary of the explanation of the model results:
- We choose to get the 3 most contributive features for each prediction.
- We use wording to make feature names more understandable in an operational context.
- We rename the predicted label to show a more explicit prediction.
- We apply post-processing to transform the values of some features.
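The summary has a flat layout: for each rank i, up to max_contrib, it holds a feature_i, value_i and contribution_i column. The sketch below builds such a row in plain Python from the three triples reported for passenger 863 in the output of cell [16]. `flatten_summary` is a hypothetical helper, not Shapash's implementation.

```python
# Hypothetical sketch of the flat summary layout: one feature_i / value_i /
# contribution_i triple per rank, up to max_contrib.
def flatten_summary(pred, ranked, max_contrib):
    """Build one summary row from (feature, value, contribution) triples, strongest first."""
    row = {"pred": pred}
    for i, (feature, value, contribution) in enumerate(ranked[:max_contrib], start=1):
        row[f"feature_{i}"] = feature
        row[f"value_{i}"] = value
        row[f"contribution_{i}"] = contribution
    return row


# Triples of passenger 863, copied from the output of cell [16].
ranked_863 = [
    ("Title of passenger", "Mrs", 0.170394),
    ("Sex", "female", 0.168492),
    ("Ticket class", "1st class", 0.110185),
]
row = flatten_summary("Survived", ranked_863, max_contrib=2)
print(list(row))
# ['pred', 'feature_1', 'value_1', 'contribution_1', 'feature_2', 'value_2', 'contribution_2']
```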

[16]:

xpl.to_pandas(max_contrib=3).head()

[16]:

	pred	feature_1	value_1	contribution_1	feature_2	value_2	contribution_2	feature_3	value_3	contribution_3
863	Survived	Title of passenger	Mrs	0.170394	Sex	female	0.168492	Ticket class	1st class	0.110185
224	Not Survived	Title of passenger	Mr	0.0913801	Sex	male	0.0835547	Passenger fare	7.9	0.0654677
85	Survived	Title of passenger	Miss	0.204571	Sex	female	0.169965	Ticket class	2nd class	0.102774
681	Survived	Title of passenger	Miss	0.193106	Sex	female	0.153766	Port of embarkation	Queenstown	0.13015
536	Survived	Title of passenger	Miss	0.206245	Ticket class	2nd class	0.128066	Sex	female	0.112756

Step 2: SmartPredictor in production

to_smartpredictor():
- It allows users to switch from a SmartExplainer, used for data mining, to a SmartPredictor.
- It keeps only the attributes needed for deployment, so the object is lighter than the SmartExplainer.
- SmartPredictor performs additional consistency checks before deployment.
- This object is dedicated to deployment.

In this section, we learn how to initialize a SmartPredictor:
- It makes new predictions and summarizes their explainability, configured to suit your operational needs.
- SmartPredictor can be used with an API or in batch mode.
- It handles pandas DataFrames and dictionaries as input data.
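SmartPredictor's consistency checks are internal to Shapash. To give a rough idea of what checking a dictionary input can involve, here is a hypothetical `check_input` function that only verifies that the keys match an expected feature list. The function, the EXPECTED_FEATURES list and the error format are assumptions for illustration; the real checks may cover more.

```python
# Hypothetical consistency check for a dictionary input (not Shapash's code).
# Assumption: the expected features are the Titanic columns used in this tutorial.
EXPECTED_FEATURES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked", "Title"]


def check_input(x):
    """Return `x` unchanged, or raise ValueError if its keys do not match."""
    if not isinstance(x, dict):
        raise ValueError("input must be a dict")
    missing = [f for f in EXPECTED_FEATURES if f not in x]
    unexpected = [k for k in x if k not in EXPECTED_FEATURES]
    if missing or unexpected:
        raise ValueError(f"missing={missing}, unexpected={unexpected}")
    return x
```

With the person_x dictionary defined later in this tutorial, `check_input(person_x)` returns it unchanged, while a dictionary missing some features raises a ValueError listing them.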

Switch from SmartExplainer Object to SmartPredictor Object

[17]:

predictor = xpl.to_smartpredictor()

Save your predictor in a pickle file

[18]:

predictor.save('./predictor.pkl')

Load your predictor from a pickle file

[19]:

predictor_load = load_smartpredictor('./predictor.pkl')
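The `.pkl` extension suggests that save and load_smartpredictor rely on Python's pickle format; that is an assumption here. The sketch below shows the generic pickle round trip on a stand-in object (a `SimpleNamespace` holding two wording dictionaries, with hypothetical `demo_*` names) to make the save/load contract clear: the loaded object is an equal but distinct copy, and the classes it refers to must be importable where it is loaded.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace

# Hypothetical stand-in for a predictor: only its wording attributes.
demo_predictor = SimpleNamespace(
    label_dict={0: "Not Survived", 1: "Survived"},
    features_dict={"Title": "Title of passenger"},
)
demo_path = os.path.join(tempfile.mkdtemp(), "predictor.pkl")

# Save: serialize the whole object to a binary file.
with open(demo_path, "wb") as f:
    pickle.dump(demo_predictor, f)

# Load: rebuild an equivalent object from the file.
with open(demo_path, "rb") as f:
    demo_loaded = pickle.load(f)

print(demo_loaded.label_dict[1])  # Survived
```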

Make a prediction with your SmartPredictor

Once our SmartPredictor has been initialized, we can compute new predictions and explain them.
First, we specify a new dataset, which can be a pandas.DataFrame or a dictionary (useful when you use an API in your deployment process).
We use the add_input method of the SmartPredictor (see the documentation of this method).

Add data

[20]:

person_x = {'Pclass': 'First class',
             'Sex': 'female',
             'Age': 36,
             'SibSp': 1,
             'Parch': 0,
             'Fare': 7.25,
             'Embarked': 'Cherbourg',
             'Title': 'Miss'
           }

[21]:

predictor_load.add_input(x=person_x)

If you don’t specify a ypred in the add_input method, SmartPredictor uses its predict method to compute the predicted value automatically and assign it to ypred.
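For a scikit-learn classifier such as RandomForestClassifier, the predicted class is the one with the highest probability. The sketch below reproduces the automatic ypred from the probabilities reported for person_x in cell [24], then renames it with label_dict. The `demo_*` names are hypothetical, so the notebook's variables are not overwritten.

```python
# Minimal sketch: the predicted class is the one with the highest probability,
# then label_dict turns it into a readable label. Probabilities from cell [24].
demo_labels = {0: "Not Survived", 1: "Survived"}
demo_proba = [0.255991, 0.744009]  # class_0, class_1

demo_ypred = max(range(len(demo_proba)), key=lambda k: demo_proba[k])
print(demo_ypred, demo_labels[demo_ypred], demo_proba[demo_ypred])  # 1 Survived 0.744009
```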

Make prediction

Let’s display ypred, which has been computed automatically by the add_input method.

[22]:

predictor_load.data["ypred"]

[22]:

	ypred	proba
0	Survived	0.744009

The predict_proba method of SmartPredictor computes the probabilities associated with each label.

[23]:

prediction_proba = predictor_load.predict_proba()

[24]:

prediction_proba

[24]:

	class_0	class_1
0	0.255991	0.744009
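A quick sanity check on the two outputs above: for this binary classifier the class probabilities sum to 1, and the proba column shown in data["ypred"] is the probability of the predicted class. The numbers are copied from the outputs of cells [22] and [24].

```python
import math

# Values copied from the outputs of cells [22] and [24].
p0, p1 = 0.255991, 0.744009
ypred_proba = 0.744009

# The two class probabilities sum to 1 (up to rounding in the display) ...
assert math.isclose(p0 + p1, 1.0, abs_tol=1e-6)
# ... and the proba column of data["ypred"] is the probability of the predicted class.
assert ypred_proba == max(p0, p1)
print(round(p0 + p1, 6))  # 1.0
```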

Get detailed explainability associated with the prediction

You can use the detail_contributions method to get the detailed contributions of each feature for each row of your new dataset.
For classification problems, it automatically associates the contributions with the predicted label.
The predicted labels are computed automatically, or you can specify a ypred with the add_input method.

[25]:

detailed_contributions = predictor_load.detail_contributions()

The ypred value has already been renamed using the label_dict we provided.

[26]:

detailed_contributions

[26]:

	ypred	proba	Pclass	Sex	Age	SibSp	Parch	Fare	Embarked	Title
0	Survived	0.744009	0.0950201	0.153742	-0.0111338	0.0192229	-0.00411547	-0.0879743	0.0316235	0.177685

Summarize explainability of the predictions

You can use the summarize method to summarize your local explainability.
This summary can be configured with the modify_mask method to suit your use case.
When you initialize the SmartPredictor, you can also specify:
- postprocessing: to apply wording to several values of your dataset
- label_dict: to rename your labels for classification problems
- features_dict: to rename your features

We use the modify_mask method to get only the 4 most contributive features in our local summary.
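To illustrate how the mask parameters can interact, here is a hypothetical re-implementation in plain Python, not Shapash's code. It assumes that features are ranked by absolute contribution, which is consistent with Fare (-0.088) appearing before Embarked (0.032) in the summary of cell [29]. It also assumes that `positive` keeps only positive or only negative values, and that `threshold` applies to the absolute contribution. Applied to the class-1 contributions of cell [26], it returns the same four features as the summary.

```python
# Hypothetical sketch of the summary mask (not Shapash's implementation).
def summarize_row(contribs, features_to_hide=(), positive=None, threshold=None, max_contrib=None):
    """Return (feature, contribution) pairs kept by the mask, strongest first."""
    # Drop hidden features.
    kept = {f: c for f, c in contribs.items() if f not in features_to_hide}
    # Assumption: positive=True keeps positive values, positive=False negative ones.
    if positive is True:
        kept = {f: c for f, c in kept.items() if c > 0}
    elif positive is False:
        kept = {f: c for f, c in kept.items() if c < 0}
    # Assumption: the threshold applies to the absolute contribution.
    if threshold is not None:
        kept = {f: c for f, c in kept.items() if abs(c) >= threshold}
    # Rank by absolute contribution, strongest first.
    ranked = sorted(kept.items(), key=lambda fc: abs(fc[1]), reverse=True)
    return ranked if max_contrib is None else ranked[:max_contrib]


# Contributions for class 1 ("Survived"), copied from the output of cell [26].
contribs = {"Pclass": 0.0950201, "Sex": 0.153742, "Age": -0.0111338, "SibSp": 0.0192229,
            "Parch": -0.00411547, "Fare": -0.0879743, "Embarked": 0.0316235, "Title": 0.177685}

print([f for f, _ in summarize_row(contribs, max_contrib=4)])
# ['Title', 'Sex', 'Pclass', 'Fare']
```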

[27]:

predictor_load.modify_mask(max_contrib=4)

[28]:

explanation = predictor_load.summarize()

The features_dict mapping given to the SmartExplainer object renames the ‘Title’ feature to ‘Title of passenger’ (and ‘Pclass’ to ‘Ticket class’).
The postprocessing rule applied wording to the values of the ‘Pclass’ feature: ‘First class’ became ‘1st class’.
Our explainability is focused on the 4 most contributive features.

[29]:

explanation

[29]:

	ypred	proba	feature_1	value_1	contribution_1	feature_2	value_2	contribution_2	feature_3	value_3	contribution_3	feature_4	value_4	contribution_4
0	Survived	0.744009	Title of passenger	Miss	0.177685	Sex	female	0.153742	Ticket class	1st class	0.0950201	Passenger fare	7.25	-0.0879743

Classification - choose the predicted value and customize the summary

Configure summary: define the predicted label

You can change the ypred or the x given in the add_input method to make a new prediction and a new summary of its explainability.

[30]:

predictor_load.add_input(x=person_x, ypred=pd.DataFrame({"ypred": [0]}))

[31]:

predictor_load.modify_mask(max_contrib=3)

[32]:

explanation = predictor_load.summarize()

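The displayed contributions and summary adapt to the change of predicted value from 1 to 0: proba now shows the probability of class 0 (0.255991), and each contribution has the opposite sign of its class-1 value, so the same three features are ranked first.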
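This sign flip is consistent with the probabilities summing to 1. In a binary problem P(Not Survived) = 1 - P(Survived), and SHAP values are linear in the explained output, so each class-0 contribution is the opposite of the corresponding class-1 contribution. The check below compares the values copied from cells [26] and [33].

```python
# Class-1 contributions (cell [26]) and class-0 contributions (cell [33])
# for the three features shown in the class-0 summary.
class_1 = {"Title": 0.177685, "Sex": 0.153742, "Pclass": 0.0950201}
class_0 = {"Title": -0.177685, "Sex": -0.153742, "Pclass": -0.0950201}

# Each class-0 contribution is the opposite of the class-1 one ...
assert all(class_0[f] == -class_1[f] for f in class_1)
# ... so the ranking by absolute contribution is unchanged.
rank = lambda d: sorted(d, key=lambda f: abs(d[f]), reverse=True)
print(rank(class_0))  # ['Title', 'Sex', 'Pclass']
```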

[33]:

explanation

[33]:

	ypred	proba	feature_1	value_1	contribution_1	feature_2	value_2	contribution_2	feature_3	value_3	contribution_3
0	Not Survived	0.255991	Title of passenger	Miss	-0.177685	Sex	female	-0.153742	Ticket class	1st class	-0.0950201

Configure summary: mask one feature, select positive contributions

The modify_mask method allows us to configure the summary parameters of your explainability.
Here, we hide the Fare feature from our explainability and keep only the features with positive contributions.
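The result in cell [36] can be reproduced by hand from the class-1 contributions of cell [26]. The class-0 contributions are their opposites, and once Fare is hidden only Age and Parch remain positive. Without hiding it, Fare (0.088 for class 0) would have been the strongest positive contribution. This snippet is a hand-written illustration, not a Shapash call.

```python
# Class-0 contributions: opposites of the class-1 values from cell [26].
class_1 = {"Pclass": 0.0950201, "Sex": 0.153742, "Age": -0.0111338, "SibSp": 0.0192229,
           "Parch": -0.00411547, "Fare": -0.0879743, "Embarked": 0.0316235, "Title": 0.177685}
class_0 = {f: -c for f, c in class_1.items()}

# features_to_hide=["Fare"] and positive=True, then rank by absolute contribution.
# max_contrib=3 is assumed to still be active from cell [31]; it does not change the result.
kept = {f: c for f, c in class_0.items() if f != "Fare" and c > 0}
ranked = sorted(kept.items(), key=lambda fc: abs(fc[1]), reverse=True)[:3]
print(ranked)  # [('Age', 0.0111338), ('Parch', 0.00411547)]
```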

[34]:

predictor_load.modify_mask(features_to_hide=["Fare"], positive=True)

[35]:

explanation = predictor_load.summarize()

[36]:

explanation

[36]:

	ypred	proba	feature_1	value_1	contribution_1	feature_2	value_2	contribution_2
0	Not Survived	0.255991	Age	36	0.0111338	Relatives like children or parents	0	0.00411547

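Configure summary: the threshold parameter

We display only the features whose contributions are greater than 0.01. The settings from the previous cells (Fare hidden, positive contributions only) are still in effect, so only Age remains.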
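The hand-written illustration can be continued with the two contributions left by the previous step: the 0.01 threshold keeps Age (0.0111) and drops Parch (0.0041), as in cell [39]. Comparing the absolute value with `>=` is an assumption. Because only positive contributions are left, the sign does not matter here.

```python
# Positive class-0 contributions left after hiding Fare (values from cell [36]).
remaining = {"Age": 0.0111338, "Parch": 0.00411547}

# threshold=0.01: keep contributions whose absolute value reaches the threshold
# (assumed comparison; equivalent here since both values are positive).
kept = [f for f, c in remaining.items() if abs(c) >= 0.01]
print(kept)  # ['Age']
```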

[37]:

predictor_load.modify_mask(threshold=0.01)

[38]:

explanation = predictor_load.summarize()

[39]:

explanation

[39]:

	ypred	proba	feature_1	value_1	contribution_1
0	Not Survived	0.255991	Age	36	0.0111338