From model training to deployment - an introduction to the SmartPredictor object
Shapash provides a SmartPredictor object that makes predictions and provides local explainability for operational needs in a deployment context. It gives a summary of the local explanation of each prediction, and users can configure this summary to suit their use case. SmartPredictor is dedicated to deployment: it is lighter than the SmartExplainer object and performs additional consistency checks. It can be used with an API or in batch mode.
This tutorial provides more information to help you get started with Shapash's SmartPredictor object.
Contents:
- Build a SmartPredictor
- Save and load a SmartPredictor
- Add input
- Use labels and wording
- Summarize the explanation
We use Kaggle's Titanic dataset.
Step 1: Exploration and training of the model
Import Dataset
First, we need to import a dataset. Here we use the well-known Titanic dataset from Kaggle.
[1]:
import numpy as np
import pandas as pd
from category_encoders import OrdinalEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import shap
[2]:
from shapash.explainer.smart_predictor import SmartPredictor
from shapash.utils.load_smartpredictor import load_smartpredictor
from shapash.data.data_loader import data_loading
[3]:
titan_df, titan_dict = data_loading('titanic')
del titan_df['Name']
[4]:
titan_df.head()
[4]:
PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch | Fare | Embarked | Title |
---|---|---|---|---|---|---|---|---|---|
1 | 0 | Third class | male | 22.0 | 1 | 0 | 7.25 | Southampton | Mr |
2 | 1 | First class | female | 38.0 | 1 | 0 | 71.28 | Cherbourg | Mrs |
3 | 1 | Third class | female | 26.0 | 0 | 0 | 7.92 | Southampton | Miss |
4 | 1 | First class | female | 35.0 | 1 | 0 | 53.10 | Southampton | Mrs |
5 | 0 | Third class | male | 35.0 | 0 | 0 | 8.05 | Southampton | Mr |
Create Classification Model
In this section, we train a supervised machine learning model on our data. Here we face a classification problem: predicting whether a passenger survived.
[5]:
y = titan_df['Survived']
X = titan_df.drop('Survived', axis=1)
[6]:
varcat=['Pclass', 'Sex', 'Embarked', 'Title']
Preprocessing Step
Encoding Categorical Features
[7]:
# Fit an ordinal encoder on the categorical columns (backslashes are not needed inside parentheses)
categ_encoding = OrdinalEncoder(cols=varcat,
                                handle_unknown='ignore',
                                return_df=True).fit(X)
X = categ_encoding.transform(X)
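To make the encoding step concrete, here is a simplified, standard-library-only sketch of what an ordinal encoder does: each category of a column is mapped to an integer code. The helpers `fit_ordinal` and `transform_ordinal` and the `-1` sentinel for unseen values are hypothetical; category_encoders' `OrdinalEncoder` has its own rules for ordering categories and handling unknown values, so treat the codes below as illustrative only.

```python
# Simplified sketch of ordinal encoding (hypothetical helpers, not the
# category_encoders implementation).

def fit_ordinal(rows, cols):
    """Learn a category -> integer mapping per column, in order of first appearance."""
    mapping = {}
    for col in cols:
        codes = {}
        for row in rows:
            value = row[col]
            if value not in codes:
                codes[value] = len(codes) + 1  # codes start at 1
        mapping[col] = codes
    return mapping


def transform_ordinal(rows, mapping, unknown=-1):
    """Replace categorical values by their codes; unseen values get `unknown`."""
    encoded = []
    for row in rows:
        new_row = dict(row)
        for col, codes in mapping.items():
            new_row[col] = codes.get(row[col], unknown)
        encoded.append(new_row)
    return encoded


passengers = [
    {"Pclass": "Third class", "Sex": "male", "Age": 22.0},
    {"Pclass": "First class", "Sex": "female", "Age": 38.0},
    {"Pclass": "Third class", "Sex": "female", "Age": 26.0},
]
ordinal_mapping = fit_ordinal(passengers, cols=["Pclass", "Sex"])
encoded = transform_ordinal(passengers, ordinal_mapping)
print(encoded[0])  # {'Pclass': 1, 'Sex': 1, 'Age': 22.0}

# A category never seen during fitting falls back to the sentinel value.
print(transform_ordinal([{"Pclass": "Second class", "Sex": "male", "Age": 40.0}], ordinal_mapping))
# [{'Pclass': -1, 'Sex': 1, 'Age': 40.0}]
```

Numeric columns such as Age pass through untouched, which matches how only the columns listed in `varcat` are encoded above.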
Train Test split + Random Forest fit
[8]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.75, random_state=1)
rf = RandomForestClassifier(n_estimators=100, min_samples_leaf=3)
rf.fit(Xtrain, ytrain)
[8]:
RandomForestClassifier(min_samples_leaf=3)
[9]:
ypred=pd.DataFrame(rf.predict(Xtest), columns=['pred'], index=Xtest.index)
Explore your trained model's results with SmartExplainer
[10]:
from shapash import SmartExplainer
Use Label and Wording
Here, we use labels and wording to make the explanation easier to understand:
- features_dict: lets users rename the features of their dataset
- label_dict: lets users rename the predicted labels in classification problems
- postprocessing: lets users reword the values of selected features
[11]:
feature_dict = {
'Pclass': 'Ticket class',
'Sex': 'Sex',
'Age': 'Age',
'SibSp': 'Relatives such as brother or wife',
'Parch': 'Relatives like children or parents',
'Fare': 'Passenger fare',
'Embarked': 'Port of embarkation',
'Title': 'Title of passenger'
}
[12]:
label_dict = {0: "Not Survived", 1: "Survived"}
[13]:
postprocessing = {"Pclass": {'type': 'transcoding', 'rule': { 'First class': '1st class', 'Second class': '2nd class', "Third class": "3rd class"}}}
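These three dictionaries only change how the explanation is displayed, not the model inputs. The sketch below applies them to one summary row taken from the to_pandas output shown later in this tutorial (passenger 863). `apply_wording` is an illustrative helper, not a Shapash function, and it only handles the `'transcoding'` rule type used here; the `demo_` names are used so the tutorial's own dictionaries are not overwritten.

```python
# Illustrative helper (not part of Shapash): apply features_dict,
# label_dict and a 'transcoding' postprocessing rule to one summary row.

demo_features_dict = {"Pclass": "Ticket class", "Title": "Title of passenger"}
demo_label_dict = {0: "Not Survived", 1: "Survived"}
demo_postprocessing = {
    "Pclass": {"type": "transcoding",
               "rule": {"First class": "1st class",
                        "Second class": "2nd class",
                        "Third class": "3rd class"}}
}


def apply_wording(pred, features, features_dict, label_dict, postprocessing):
    """Return (label, [(display_name, display_value, contribution), ...])."""
    label = label_dict.get(pred, pred)
    worded = []
    for name, value, contribution in features:
        rule = postprocessing.get(name)
        if rule is not None and rule["type"] == "transcoding":
            # Values missing from the rule are left unchanged.
            value = rule["rule"].get(value, value)
        # Features missing from features_dict keep their original name.
        worded.append((features_dict.get(name, name), value, contribution))
    return label, worded


row = [("Title", "Mrs", 0.170394), ("Sex", "female", 0.168492), ("Pclass", "First class", 0.110185)]
label, worded = apply_wording(1, row, demo_features_dict, demo_label_dict, demo_postprocessing)
print(label)   # Survived
print(worded)  # [('Title of passenger', 'Mrs', 0.170394), ('Sex', 'female', 0.168492), ('Ticket class', '1st class', 0.110185)]
```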
Define a SmartExplainer
[14]:
xpl = SmartExplainer(
model=rf,
preprocessing=categ_encoding,
postprocessing=postprocessing,
label_dict=label_dict,
features_dict=feature_dict
)
compile(): this method is the first step to understanding the model and its predictions. It sorts the contributions, reverses the preprocessing steps, and performs all the calculations needed for a quick display of plots and an efficient summary of the explanation (see the SmartExplainer documentation and tutorials).
[15]:
xpl.compile(x=Xtest, y_pred=ypred)
Backend: Shap TreeExplainer
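The "reverse preprocessing" performed by compile() means turning encoded values back into the original categories, so that the explanation shows 'First class' rather than an integer code. A minimal sketch, assuming the encoder's mapping is available as a plain dictionary; the mapping values and the `inverse_ordinal` helper are hypothetical and do not reflect how Shapash stores or inverts the encoder internally.

```python
# Hypothetical ordinal mapping, e.g. as learned by an encoder.
pclass_mapping = {"Third class": 1, "First class": 2, "Second class": 3}


def inverse_ordinal(codes, mapping):
    """Map integer codes back to their original categories."""
    reverse = {code: category for category, code in mapping.items()}
    # Codes that were never learned (e.g. an 'unknown' sentinel) map to None.
    return [reverse.get(code) for code in codes]


print(inverse_ordinal([2, 1, 3, -1], pclass_mapping))
# ['First class', 'Third class', 'Second class', None]
```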
Understand results of your trained model with SmartExplainer
We can easily get a first summary of the explanation of the model results:
- We keep the 3 most contributive features for each prediction.
- We use wording to make feature names more understandable in an operational setting.
- We rename the predicted labels to make the predictions more explicit.
- We apply postprocessing to transform the values of some features.
[16]:
xpl.to_pandas(max_contrib=3).head()
[16]:
PassengerId | pred | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
---|---|---|---|---|---|---|---|---|---|---|
863 | Survived | Title of passenger | Mrs | 0.170394 | Sex | female | 0.168492 | Ticket class | 1st class | 0.110185 |
224 | Not Survived | Title of passenger | Mr | 0.0913801 | Sex | male | 0.0835547 | Passenger fare | 7.9 | 0.0654677 |
85 | Survived | Title of passenger | Miss | 0.204571 | Sex | female | 0.169965 | Ticket class | 2nd class | 0.102774 |
681 | Survived | Title of passenger | Miss | 0.193106 | Sex | female | 0.153766 | Port of embarkation | Queenstown | 0.13015 |
536 | Survived | Title of passenger | Miss | 0.206245 | Ticket class | 2nd class | 0.128066 | Sex | female | 0.112756 |
Step 2: SmartPredictor in production
to_smartpredictor():
- lets users switch from the SmartExplainer, used for data mining, to a SmartPredictor;
- keeps only the attributes needed for deployment, so the result is lighter than the SmartExplainer object;
- makes the SmartPredictor perform additional consistency checks before deployment;
- produces an object dedicated to deployment.
In this section, we learn how to initialize a SmartPredictor. A SmartPredictor:
- makes new predictions and summarizes their explainability, configured to suit your operational needs;
- can be used with an API or in batch mode;
- accepts input data as dataframes or dictionaries.
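This tutorial does not list the consistency checks the SmartPredictor performs, so the sketch below only illustrates the kind of input validation a deployment object can run before predicting: required columns present, no unexpected columns, and categorical values seen during training. The `check_input` helper, its messages, and the `expected_values` table are hypothetical and are not SmartPredictor's actual checks.

```python
# Hypothetical input validation, illustrating the idea of consistency checks.
expected_columns = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked", "Title"]
expected_values = {"Sex": {"male", "female"},
                   "Embarked": {"Southampton", "Cherbourg", "Queenstown"}}


def check_input(x, columns, categories):
    """Return a list of problems found in a single input record (a dict)."""
    problems = []
    missing = [c for c in columns if c not in x]
    extra = [c for c in x if c not in columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    for col, allowed in categories.items():
        if col in x and x[col] not in allowed:
            problems.append(f"unknown value for {col}: {x[col]!r}")
    return problems


record = {"Pclass": "First class", "Sex": "female", "Age": 36, "SibSp": 1,
          "Parch": 0, "Fare": 7.25, "Embarked": "Cherbourg", "Title": "Miss"}
print(check_input(record, expected_columns, expected_values))  # []

bad = dict(record, Embarked="Paris")  # unknown port
del bad["Fare"]                       # missing column
print(check_input(bad, expected_columns, expected_values))
# ["missing columns: ['Fare']", "unknown value for Embarked: 'Paris'"]
```

Returning a list of problems, rather than raising on the first one, lets a caller report every issue in a single API response.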
Switch from SmartExplainer Object to SmartPredictor Object
[17]:
predictor = xpl.to_smartpredictor()
Save your predictor to a pickle file
[18]:
predictor.save('./predictor.pkl')
Load your predictor from a pickle file
[19]:
predictor_load = load_smartpredictor('./predictor.pkl')
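As a rough illustration of what a save/load round trip through a pickle file involves, the sketch below pickles a small configuration dictionary with the standard library. The `config` dictionary is a hypothetical stand-in; this is not how `predictor.save` or `load_smartpredictor` are implemented internally. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the configuration a deployed predictor carries.
config = {
    "label_dict": {0: "Not Survived", 1: "Survived"},
    "features_to_hide": ["Fare"],
    "max_contrib": 3,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "predictor_config.pkl")
    with open(path, "wb") as f:
        pickle.dump(config, f)     # save to a pickle file
    with open(path, "rb") as f:
        restored = pickle.load(f)  # load it back

print(restored == config)  # True
```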
Make a prediction with your SmartPredictor
Once our SmartPredictor has been initialized, we can compute new predictions and explain them.
First, we specify a new dataset, which can be a pandas.DataFrame or a dictionary (useful when you use an API in your deployment process).
We use the add_input method of the SmartPredictor (see the documentation of this method).
Add data
[20]:
person_x = {'Pclass': 'First class',
'Sex': 'female',
'Age': 36,
'SibSp': 1,
'Parch': 0,
'Fare': 7.25,
'Embarked': 'Cherbourg',
'Title': 'Miss'
}
[21]:
predictor_load.add_input(x=person_x)
If you don't specify a ypred in the add_input method, SmartPredictor uses its predict method to compute the predicted value and assigns it to ypred automatically.
Make prediction
Let's display ypred, which was computed automatically by the add_input method.
[22]:
predictor_load.data["ypred"]
[22]:
index | ypred | proba |
---|---|---|
0 | Survived | 0.744009 |
The predict_proba method of SmartPredictor computes the probabilities associated with each label.
[23]:
prediction_proba = predictor_load.predict_proba()
[24]:
prediction_proba
[24]:
index | class_0 | class_1 |
---|---|---|
0 | 0.255991 | 0.744009 |
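The ypred and proba columns can be read from these class probabilities: for scikit-learn classifiers such as RandomForestClassifier, predict returns the class with the highest predicted probability, and the proba column shown above is the probability of that class. The sketch below reproduces those values; `to_ypred` is a hypothetical helper, and its optional `ypred` argument mirrors the case, covered later in this tutorial, where you pass your own predicted label to add_input.

```python
# Hypothetical helper mirroring the ypred / proba columns shown above.
demo_label_dict = {0: "Not Survived", 1: "Survived"}


def to_ypred(class_probas, label_dict, ypred=None):
    """Return (label, probability) for the predicted class.

    If ypred is None, the most probable class is used, as a classifier's
    predict method would; otherwise the given class is reported.
    """
    if ypred is None:
        ypred = max(range(len(class_probas)), key=lambda k: class_probas[k])
    return label_dict.get(ypred, ypred), class_probas[ypred]


probas = [0.255991, 0.744009]  # class_0, class_1 from the table above
print(to_ypred(probas, demo_label_dict))           # ('Survived', 0.744009)
print(to_ypred(probas, demo_label_dict, ypred=0))  # ('Not Survived', 0.255991)
```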
Get detailed explainability associated with the prediction
You can use the detail_contributions method to get the detailed contribution of each feature for each row of your new dataset.
For classification problems, it automatically associates contributions with the right predicted label.
The predicted labels are computed automatically, or you can specify a ypred with the add_input method.
[25]:
detailed_contributions = predictor_load.detail_contributions()
ypred has already been renamed with the values given in label_dict.
[26]:
detailed_contributions
[26]:
index | ypred | proba | Pclass | Sex | Age | SibSp | Parch | Fare | Embarked | Title |
---|---|---|---|---|---|---|---|---|---|---|
0 | Survived | 0.744009 | 0.0950201 | 0.153742 | -0.0111338 | 0.0192229 | -0.00411547 | -0.0879743 | 0.0316235 | 0.177685 |
Summarize the explainability of the predictions
You can use the summarize method to summarize your local explainability.
This summary can be configured with the modify_mask method to suit your use case.
When you initialize the SmartPredictor, you can also specify:
- postprocessing: to apply wording to some values of your dataset
- label_dict: to rename your labels in classification problems
- features_dict: to rename your features
We use the modify_mask method to keep only the 4 most contributive features in our local summary.
[27]:
predictor_load.modify_mask(max_contrib=4)
[28]:
explanation = predictor_load.summarize()
The mapping dictionary given to the SmartExplainer object (features_dict) renames the ‘Title’ feature to ‘Title of passenger’.
The values of the ‘Pclass’ feature (displayed as ‘Ticket class’) have been reworded by postprocessing: ‘First class’ became ‘1st class’.
Our explanation focuses on the 4 most contributive features.
[29]:
explanation
[29]:
index | ypred | proba | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 | feature_4 | value_4 | contribution_4 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Survived | 0.744009 | Title of passenger | Miss | 0.177685 | Sex | female | 0.153742 | Ticket class | 1st class | 0.0950201 | Passenger fare | 7.25 | -0.0879743 |
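The four features kept above are those with the largest contributions in absolute value, which is why ‘Passenger fare’ appears despite its negative contribution. A minimal sketch of that ranking, assuming the summary sorts features by absolute contribution as the outputs suggest; `top_contributions` is a hypothetical helper, and the values are copied from the detailed contributions table above.

```python
# Contributions for the 'Survived' class, copied from the detailed
# contributions table above (keys use the original feature names).
survived_contribs = {
    "Pclass": 0.0950201, "Sex": 0.153742, "Age": -0.0111338, "SibSp": 0.0192229,
    "Parch": -0.00411547, "Fare": -0.0879743, "Embarked": 0.0316235, "Title": 0.177685,
}


def top_contributions(contribs, max_contrib):
    """Keep the max_contrib features with the largest absolute contribution."""
    ranked = sorted(contribs.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:max_contrib]


for name, value in top_contributions(survived_contribs, max_contrib=4):
    print(name, value)
# Title 0.177685
# Sex 0.153742
# Pclass 0.0950201
# Fare -0.0879743
```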
Classification - choose the predicted value and customize the summary
Configure summary: define the predicted label
You can change the ypred or the x given to the add_input method to get a new prediction and a new summary of its explanation.
[30]:
predictor_load.add_input(x=person_x, ypred=pd.DataFrame({"ypred": [0]}))
[31]:
predictor_load.modify_mask(max_contrib=3)
[32]:
explanation = predictor_load.summarize()
The displayed contributions and summary adapt to the new predicted value: ypred changed from 1 (Survived) to 0 (Not Survived), so proba now shows the probability of class 0.
[33]:
explanation
[33]:
index | ypred | proba | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Not Survived | 0.255991 | Title of passenger | Miss | -0.177685 | Sex | female | -0.153742 | Ticket class | 1st class | -0.0950201 |
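In this binary problem, the class-0 contributions are exactly the class-1 contributions with the opposite sign (compare 0.177685 and -0.177685 for ‘Title of passenger’). This is what we would expect from SHAP values: since P(class 0) = 1 - P(class 1), each feature's effect on one probability is the negative of its effect on the other. A sketch of that selection step, using a hypothetical helper that only covers the binary case:

```python
def contributions_for_class(class1_contribs, predicted_class):
    """Binary case: class-0 contributions are the negated class-1 contributions."""
    if predicted_class == 1:
        return dict(class1_contribs)
    return {name: -value for name, value in class1_contribs.items()}


# Top-3 class-1 contributions from the earlier summary.
class1 = {"Title": 0.177685, "Sex": 0.153742, "Pclass": 0.0950201}
print(contributions_for_class(class1, predicted_class=0))
# {'Title': -0.177685, 'Sex': -0.153742, 'Pclass': -0.0950201}
```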
Configure summary: mask one feature, select positive contributions
The modify_mask method lets you configure the summary parameters of your explanation.
Here, we hide the Fare feature from the explanation and keep only the features with positive contributions.
[34]:
predictor_load.modify_mask(features_to_hide=["Fare"], positive=True)
[35]:
explanation = predictor_load.summarize()
[36]:
explanation
[36]:
index | ypred | proba | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 |
---|---|---|---|---|---|---|---|---|
0 | Not Survived | 0.255991 | Age | 36 | 0.0111338 | Relatives like children or parents | 0 | 0.00411547 |
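Configure summary: the threshold parameter
We display only the features whose contributions are greater than 0.01. The settings from the previous modify_mask calls (Fare hidden, positive contributions only) still apply, which is why only Age remains.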
[37]:
predictor_load.modify_mask(threshold=0.01)
[38]:
explanation = predictor_load.summarize()
[39]:
explanation
[39]:
index | ypred | proba | feature_1 | value_1 | contribution_1 |
---|---|---|---|---|---|
0 | Not Survived | 0.255991 | Age | 36 | 0.0111338 |
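Because the settings accumulate across modify_mask calls, the last summary combines several filters: Fare hidden, positive contributions only, a 0.01 threshold, and at most 3 features. The sketch below reproduces the last two outputs from the class-0 contributions (the class-1 values of the detailed contributions table with the sign flipped). `summarize_mask` is a hypothetical helper, and the order in which it applies the filters is an assumption; Shapash may implement them differently.

```python
# Class-0 contributions: class-1 values from the detailed table, sign flipped.
class0 = {
    "Pclass": -0.0950201, "Sex": -0.153742, "Age": 0.0111338, "SibSp": -0.0192229,
    "Parch": 0.00411547, "Fare": 0.0879743, "Embarked": -0.0316235, "Title": -0.177685,
}


def summarize_mask(contribs, max_contrib=None, features_to_hide=(), positive=None, threshold=None):
    """Filter and rank contributions the way the modify_mask examples suggest."""
    kept = {k: v for k, v in contribs.items() if k not in features_to_hide}
    if positive is True:
        kept = {k: v for k, v in kept.items() if v > 0}
    elif positive is False:
        kept = {k: v for k, v in kept.items() if v < 0}
    if threshold is not None:
        kept = {k: v for k, v in kept.items() if abs(v) > threshold}
    ranked = sorted(kept.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked if max_contrib is None else ranked[:max_contrib]


print(summarize_mask(class0, max_contrib=3, features_to_hide=["Fare"], positive=True))
# [('Age', 0.0111338), ('Parch', 0.00411547)]
print(summarize_mask(class0, max_contrib=3, features_to_hide=["Fare"], positive=True, threshold=0.01))
# [('Age', 0.0111338)]
```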