Postprocessing parameter in compile method¶
The compile method builds the explainer you need for your model. Among the parameters of this workflow is postprocessing, which this tutorial explains; in the code below it is passed when the SmartExplainer is declared. This parameter lets you modify the displayed dataset in several ways for clearer visualization. The tutorial presents the different modifications you can apply and the syntax for each.
Contents:
- Loading dataset and fitting a model.
- Creating our SmartExplainer and compiling it without postprocessing.
- New SmartExplainer with postprocessing parameter.
Data from Kaggle: Titanic
[1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
Building Supervised Model¶
First step: Importing our dataset¶
[2]:
from shapash.data.data_loader import data_loading
titanic_df, titanic_dict = data_loading('titanic')
y_df=titanic_df['Survived']
X_df=titanic_df[titanic_df.columns.difference(['Survived'])]
[3]:
titanic_df.head()
[3]:
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Fare | Embarked | Title |
---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | Third class | Braund Owen Harris | male | 22.0 | 1 | 0 | 7.25 | Southampton | Mr |
2 | 1 | First class | Cumings John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | 71.28 | Cherbourg | Mrs |
3 | 1 | Third class | Heikkinen Laina | female | 26.0 | 0 | 0 | 7.92 | Southampton | Miss |
4 | 1 | First class | Futrelle Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 53.10 | Southampton | Mrs |
5 | 0 | Third class | Allen William Henry | male | 35.0 | 0 | 0 | 8.05 | Southampton | Mr |
Second step: Encoding our categorical variables¶
[4]:
from category_encoders import OrdinalEncoder
categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']
encoder = OrdinalEncoder(
    cols=categorical_features,
    handle_unknown='ignore',
    return_df=True).fit(X_df)
X_df = encoder.transform(X_df)
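To make the encoding step concrete, here is a minimal sketch in plain Python of what an ordinal encoding and its inverse amount to. It is an illustration only, not the category_encoders implementation: the helper names `fit_ordinal`, `encode` and `decode` are hypothetical, and numbering the categories from 1 in order of first appearance is an assumption.

```python
# Illustrative only: a toy ordinal encoding, not category_encoders' code.

def fit_ordinal(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1  # assumption: codes start at 1
    return mapping

def encode(values, mapping):
    """Replace each category by its integer code."""
    return [mapping[value] for value in values]

def decode(codes, mapping):
    """Invert the mapping to recover the original labels."""
    inverse = {code: value for value, code in mapping.items()}
    return [inverse[code] for code in codes]

embarked = ["Southampton", "Cherbourg", "Southampton", "Queenstown"]
mapping = fit_ordinal(embarked)
codes = encode(embarked, mapping)
restored = decode(codes, mapping)
print(codes)
print(restored)
```

Being able to invert the mapping is what allows the original labels, rather than the integer codes, to be displayed once the encoder is passed to the explainer.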
Third step: Train/test split and fitting our model¶
[5]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75, random_state=1)
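As a quick check on the split sizes, the arithmetic below reproduces the 223 test rows that appear later in this tutorial. It assumes that the Titanic dataset loaded here has 891 rows (the size of the Kaggle training file) and that the train size is rounded down, with the test set taking the remainder. Both points are assumptions, not something stated in this tutorial.

```python
import math

n_samples = 891      # assumption: row count of the Kaggle Titanic training file
train_size = 0.75    # fraction passed to train_test_split above

n_train = math.floor(train_size * n_samples)  # assumed rounding: floor
n_test = n_samples - n_train                  # the test set takes the remainder

print(n_train, n_test)
```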
[6]:
classifier = RandomForestClassifier(n_estimators=200).fit(Xtrain, ytrain)
[7]:
y_pred = pd.DataFrame(classifier.predict(Xtest), columns=['pred'], index=Xtest.index) # Predictions
Fourth step: Declaring our Explainer¶
[8]:
from shapash import SmartExplainer
[9]:
xpl = SmartExplainer(
    model=classifier,
    preprocessing=encoder,  # Optional: compile step can use inverse_transform method
    features_dict=titanic_dict  # Optional parameter, dict specifies label for features name
)
Compiling without postprocessing parameter¶
After declaring our explainer, we need to compile it on our model and data so that it has the information required to explain the predictions.
[10]:
xpl.compile(x=Xtest, y_pred=y_pred)
Backend: Shap TreeExplainer
We can now use our explainer to understand model predictions, through plots or data. We can also retrieve our original dataset, as it was before preprocessing.
[11]:
xpl.x_init
[11]:
PassengerId | Age | Embarked | Fare | Name | Parch | Pclass | Sex | SibSp | Title |
---|---|---|---|---|---|---|---|---|---|
863 | 48.0 | Southampton | 25.93 | Swift Frederick Joel (Margaret Welles Barron) | 0 | First class | female | 0 | Mrs |
224 | 29.5 | Southampton | 7.90 | Nenkoff Christo | 0 | Third class | male | 0 | Mr |
85 | 17.0 | Southampton | 10.50 | Ilett Bertha | 0 | Second class | female | 0 | Miss |
681 | 29.5 | Queenstown | 8.14 | Peters Katie | 0 | Third class | female | 0 | Miss |
536 | 7.0 | Southampton | 26.25 | Hart Eva Miriam | 2 | Second class | female | 0 | Miss |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
507 | 33.0 | Southampton | 26.00 | Quick Frederick Charles (Jane Richards) | 2 | Second class | female | 0 | Mrs |
468 | 56.0 | Southampton | 26.55 | Smart John Montgomery | 0 | First class | male | 0 | Mr |
741 | 29.5 | Southampton | 30.00 | Hawksford Walter James | 0 | First class | male | 0 | Mr |
355 | 29.5 | Cherbourg | 7.22 | Yousif Wazli | 0 | Third class | male | 0 | Mr |
450 | 52.0 | Southampton | 30.50 | Peuchen Arthur Godfrey | 0 | First class | male | 0 | Major |
223 rows × 9 columns
All the analyses you can run from here are covered in this tutorial: Tutorial
Compiling with postprocessing parameter¶
Here, however, we want to add postprocessing to our data to make it easier to read and to improve explainability.
The syntax for the postprocessing parameter is as follows:
postprocess = {
    'name_of_the_feature': {'type': 'type_of_modification', 'rule': 'rule_to_apply'},
    'second_name_of_features': {'type': 'type_of_modification', 'rule': 'rule_to_apply'},
    ...
}
Five types of modification are available:
prefix: adds a string at the beginning of the feature's values. The syntax is:
{'features_name': {'type': 'prefix',
                   'rule': 'Example : '}
}
suffix: adds a string at the end of the feature's values. The syntax is similar:
{'features_name': {'type': 'suffix',
                   'rule': ' is an example'}
}
transcoding: a mapping that replaces the values of a categorical variable. The syntax is:
{'features_name': {'type': 'transcoding',
                   'rule': {'old_name1': 'new_name1',
                            'old_name2': 'new_name2',
                            ...
                            }
                   }
}
Values that are not listed in the mapping are left unchanged.
regex: modifies strings with a regular expression, where 'in' is the pattern to match and 'out' is its replacement:
{'features_name': {'type': 'regex',
                   'rule': {'in': '^M',
                            'out': 'm'
                            }
                   }
}
case: changes the case of a feature's values. Use 'rule': 'lower' to convert everything to lowercase, or 'rule': 'upper' to convert everything to uppercase. The syntax is:
{'features_name': {'type': 'case',
                   'rule': 'upper'}
}
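To make the five rule types concrete, the following plain-Python sketch mimics their effect on a list of values. `apply_rule` is a hypothetical helper written for this illustration, not Shapash's implementation; in particular, converting values to strings for prefix and suffix is an assumption based on the outputs shown in this tutorial.

```python
import re

def apply_rule(values, spec):
    """Apply one spec ({'type': ..., 'rule': ...}) to a list of values."""
    kind, rule = spec["type"], spec["rule"]
    if kind == "prefix":
        return [rule + str(v) for v in values]      # assumption: values become strings
    if kind == "suffix":
        return [str(v) + rule for v in values]
    if kind == "transcoding":
        return [rule.get(v, v) for v in values]     # unmapped values pass through
    if kind == "regex":
        return [re.sub(rule["in"], rule["out"], v) for v in values]
    if kind == "case":
        if rule not in ("lower", "upper"):
            raise ValueError(f"unknown case rule: {rule!r}")
        return [v.lower() if rule == "lower" else v.upper() for v in values]
    raise ValueError(f"unknown type: {kind!r}")

titles = ["Mr", "Mrs", "Miss", "Major"]
with_prefix = apply_rule(titles, {"type": "prefix", "rule": "Example : "})
with_regex = apply_rule(titles, {"type": "regex", "rule": {"in": "^M", "out": "m"}})
transcoded = apply_rule(["male", "female", "unknown"],
                        {"type": "transcoding", "rule": {"male": "Man", "female": "Woman"}})
print(with_prefix)
print(with_regex)
print(transcoded)
```

In the regex example only the leading `M` is replaced, because `^` anchors the pattern at the start of each string.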
Of course, you don’t have to modify all features. Let’s give an example.
[12]:
postprocess = {
    'Age': {'type': 'suffix',
            'rule': ' years old'  # Adding 'years old' at the end
            },
    'Sex': {'type': 'transcoding',
            'rule': {'male': 'Man',
                     'female': 'Woman'}
            },
    'Pclass': {'type': 'regex',
               'rule': {'in': ' class$',
                        'out': ''}  # Deleting 'class' word at the end
               },
    'Fare': {'type': 'prefix',
             'rule': '$'  # Adding $ at the beginning
             },
    'Embarked': {'type': 'case',
                 'rule': 'upper'
                 }
}
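A typo in a type name is easy to make when writing such a dict by hand. The sketch below checks that each entry has exactly the 'type' and 'rule' keys and that the type is one of the five names listed in this tutorial. `check_postprocess` is a hypothetical helper written for this illustration; it is not part of Shapash and says nothing about the checks Shapash itself performs.

```python
# Hypothetical helper, not part of Shapash: checks the shape of a postprocess dict.
ALLOWED_TYPES = {"prefix", "suffix", "transcoding", "regex", "case"}

def check_postprocess(postprocess):
    """Return a list of problems; an empty list means the shape looks fine."""
    problems = []
    for feature, spec in postprocess.items():
        if not isinstance(spec, dict) or set(spec) != {"type", "rule"}:
            problems.append(f"{feature}: expected exactly the keys 'type' and 'rule'")
            continue
        if spec["type"] not in ALLOWED_TYPES:
            problems.append(f"{feature}: unknown type {spec['type']!r}")
    return problems

good = {"Age": {"type": "suffix", "rule": " years old"}}
bad = {"Age": {"type": "sufix", "rule": " years old"},  # misspelled type
       "Fare": {"rule": "$"}}                           # missing 'type' key

print(check_postprocess(good))
print(check_postprocess(bad))
```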
You can now pass this postprocess dict through the postprocessing parameter, here when declaring the SmartExplainer:
[13]:
xpl_postprocess = SmartExplainer(
    model=classifier,
    postprocessing=postprocess,
    preprocessing=encoder,  # Optional: compile step can use inverse_transform method
    features_dict=titanic_dict
)
[14]:
xpl_postprocess.compile(
    x=Xtest,
    y_pred=y_pred,  # Optional
)
Backend: Shap TreeExplainer
You can now view your dataset with the modifications applied.
[15]:
xpl_postprocess.x_init
[15]:
PassengerId | Age | Embarked | Fare | Name | Parch | Pclass | Sex | SibSp | Title |
---|---|---|---|---|---|---|---|---|---|
863 | 48.0 years old | SOUTHAMPTON | $25.93 | Swift Frederick Joel (Margaret Welles Barron) | 0 | First | Woman | 0 | Mrs |
224 | 29.5 years old | SOUTHAMPTON | $7.9 | Nenkoff Christo | 0 | Third | Man | 0 | Mr |
85 | 17.0 years old | SOUTHAMPTON | $10.5 | Ilett Bertha | 0 | Second | Woman | 0 | Miss |
681 | 29.5 years old | QUEENSTOWN | $8.14 | Peters Katie | 0 | Third | Woman | 0 | Miss |
536 | 7.0 years old | SOUTHAMPTON | $26.25 | Hart Eva Miriam | 2 | Second | Woman | 0 | Miss |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
507 | 33.0 years old | SOUTHAMPTON | $26.0 | Quick Frederick Charles (Jane Richards) | 2 | Second | Woman | 0 | Mrs |
468 | 56.0 years old | SOUTHAMPTON | $26.55 | Smart John Montgomery | 0 | First | Man | 0 | Mr |
741 | 29.5 years old | SOUTHAMPTON | $30.0 | Hawksford Walter James | 0 | First | Man | 0 | Mr |
355 | 29.5 years old | CHERBOURG | $7.22 | Yousif Wazli | 0 | Third | Man | 0 | Mr |
450 | 52.0 years old | SOUTHAMPTON | $30.5 | Peuchen Arthur Godfrey | 0 | First | Man | 0 | Major |
223 rows × 9 columns
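Note that the fares are displayed as $7.9 and $26.0 rather than $7.90 and $26.00. This is consistent with the prefix being concatenated to the default string form of each number. The short sketch below illustrates that assumption in plain Python; it is not taken from Shapash's code.

```python
fares = [25.93, 7.90, 10.50, 26.00]

# Assumption: the prefix is concatenated to the plain string form of each number.
as_displayed = ["$" + str(fare) for fare in fares]

# For comparison, fixed two-decimal formatting (not what the table above shows).
two_decimals = [f"${fare:.2f}" for fare in fares]

print(as_displayed)
print(two_decimals)
```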
The postprocessing modifications are also applied in all the plots.
The main purpose of postprocessing is to make the data easier to understand, especially in outputs where values are not shown under their feature name, such as the result of the to_pandas() method, which orders each row's features by importance.
[17]:
xpl_postprocess.to_pandas()
to_pandas params: {'features_to_hide': None, 'threshold': None, 'positive': None, 'max_contrib': 20}
[17]:
PassengerId | pred | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 | ... | contribution_6 | feature_7 | value_7 | contribution_7 | feature_8 | value_8 | contribution_8 | feature_9 | value_9 | contribution_9 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
863 | 1 | Title of passenger | Mrs | 0.163479 | Sex | Woman | 0.154309 | Ticket class | First | 0.130221 | ... | 0.0406219 | Name, First name | Swift Frederick Joel (Margaret Welles Barron) | -0.0381955 | Port of embarkation | SOUTHAMPTON | -0.0147327 | Relatives like children or parents | 0 | -0.00538103 |
224 | 0 | Title of passenger | Mr | 0.094038 | Sex | Man | 0.0696282 | Age | 29.5 years old | 0.0658556 | ... | 0.0151605 | Relatives such as brother or wife | 0 | -0.00855039 | Relatives like children or parents | 0 | 0.00124433 | Name, First name | Nenkoff Christo | -0.000577095 |
85 | 1 | Title of passenger | Miss | 0.190529 | Sex | Woman | 0.135507 | Ticket class | Second | 0.0809714 | ... | -0.025286 | Relatives like children or parents | 0 | -0.0238222 | Relatives such as brother or wife | 0 | 0.0209045 | Age | 17.0 years old | -0.00702283 |
681 | 1 | Title of passenger | Miss | 0.237477 | Port of embarkation | QUEENSTOWN | 0.143451 | Sex | Woman | 0.127931 | ... | 0.0243567 | Relatives like children or parents | 0 | 0.0165205 | Passenger fare | $8.14 | -0.0109633 | Age | 29.5 years old | 0.00327866 |
536 | 1 | Title of passenger | Miss | 0.210166 | Ticket class | Second | 0.168247 | Sex | Woman | 0.0876445 | ... | 0.0147503 | Relatives like children or parents | 2 | 0.0125069 | Port of embarkation | SOUTHAMPTON | -0.0119119 | Name, First name | Hart Eva Miriam | 0.00654165 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
507 | 1 | Title of passenger | Mrs | 0.215332 | Sex | Woman | 0.194419 | Ticket class | Second | 0.166437 | ... | -0.0079185 | Relatives like children or parents | 2 | 0.00407485 | Age | 33.0 years old | -0.00263589 | Name, First name | Quick Frederick Charles (Jane Richards) | 0.00162901 |
468 | 0 | Sex | Man | 0.100602 | Passenger fare | $26.55 | -0.099794 | Title of passenger | Mr | 0.0967768 | ... | 0.0243706 | Port of embarkation | SOUTHAMPTON | 0.0124424 | Relatives such as brother or wife | 0 | -0.0108301 | Relatives like children or parents | 0 | -0.00332632 |
741 | 0 | Title of passenger | Mr | 0.131861 | Sex | Man | 0.110845 | Age | 29.5 years old | 0.104878 | ... | 0.0339308 | Relatives such as brother or wife | 0 | -0.00715564 | Name, First name | Hawksford Walter James | 0.00165882 | Relatives like children or parents | 0 | -0.00137946 |
355 | 0 | Title of passenger | Mr | 0.12679 | Sex | Man | 0.0933251 | Age | 29.5 years old | 0.0717939 | ... | -0.0271103 | Name, First name | Yousif Wazli | 0.0163174 | Relatives such as brother or wife | 0 | -0.0108501 | Relatives like children or parents | 0 | -0.000543508 |
450 | 0 | Sex | Man | 0.13572 | Title of passenger | Major | -0.0723023 | Age | 52.0 years old | 0.0690373 | ... | 0.027384 | Relatives such as brother or wife | 0 | -0.0134144 | Relatives like children or parents | 0 | 0.00256623 | Name, First name | Peuchen Arthur Godfrey | 0.00229483 |
223 rows × 28 columns
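The ordering in this table can be read as a ranking of each row's features by the magnitude of their contribution. The sketch below reproduces that ranking for passenger 863, using only the six entries visible in the table above. Ranking by absolute value is an assumption inferred from the displayed numbers, not a statement about how to_pandas() is implemented.

```python
# (feature label, displayed value, contribution) for passenger 863, in arbitrary order.
# Only the six entries visible in the table are used; ranks 4 to 6 are elided there.
row_863 = [
    ("Port of embarkation", "SOUTHAMPTON", -0.0147327),
    ("Sex", "Woman", 0.154309),
    ("Name, First name", "Swift Frederick Joel (Margaret Welles Barron)", -0.0381955),
    ("Title of passenger", "Mrs", 0.163479),
    ("Relatives like children or parents", "0", -0.00538103),
    ("Ticket class", "First", 0.130221),
]

# Assumption: features are ranked by the absolute value of their contribution.
ranked = sorted(row_863, key=lambda item: abs(item[2]), reverse=True)
ranked_features = [feature for feature, _, _ in ranked]

for feature, value, contribution in ranked:
    print(feature, "|", value, "|", contribution)
```

Because the value columns carry no feature-specific header, a postprocessed value such as "Woman" or "SOUTHAMPTON" is easier to interpret at a glance than an encoded or raw one.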