Postprocessing parameter in compile method

The compile method creates the explainer you need for your model. Among the parameters available when building an explainer is postprocessing, which this tutorial explains (in the code below, it is passed when the SmartExplainer is declared). This parameter lets you modify the dataset's values with several techniques so that they are easier to read in tables and plots. This tutorial presents the different ways you can modify the data and the syntax for each.

Contents:

- Loading dataset and fitting a model.
- Creating our SmartExplainer and compiling it without postprocessing.
- New SmartExplainer with postprocessing parameter.

Data from Kaggle: Titanic

[1]:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

Building Supervised Model

First step: Importing our dataset

[2]:

from shapash.data.data_loader import data_loading
titanic_df, titanic_dict = data_loading('titanic')
y_df=titanic_df['Survived']
X_df=titanic_df[titanic_df.columns.difference(['Survived'])]

[3]:

titanic_df.head()

[3]:

	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Fare	Embarked	Title
PassengerId
1	0	Third class	Braund Owen Harris	male	22.0	1	0	7.25	Southampton	Mr
2	1	First class	Cumings John Bradley (Florence Briggs Thayer)	female	38.0	1	0	71.28	Cherbourg	Mrs
3	1	Third class	Heikkinen Laina	female	26.0	0	0	7.92	Southampton	Miss
4	1	First class	Futrelle Jacques Heath (Lily May Peel)	female	35.0	1	0	53.10	Southampton	Mrs
5	0	Third class	Allen William Henry	male	35.0	0	0	8.05	Southampton	Mr

Second step: Encoding our categorical variables

[4]:

from category_encoders import OrdinalEncoder

categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']

encoder = OrdinalEncoder(
    cols=categorical_features,
    handle_unknown='ignore',
    return_df=True).fit(X_df)

X_df = encoder.transform(X_df)

Third step: Train/test split and fitting our model

[5]:

Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75, random_state=1)

[6]:

classifier = RandomForestClassifier(n_estimators=200).fit(Xtrain, ytrain)

[7]:

y_pred = pd.DataFrame(classifier.predict(Xtest), columns=['pred'], index=Xtest.index) # Predictions

Fourth step: Declaring our Explainer

[8]:

from shapash import SmartExplainer

[9]:

xpl = SmartExplainer(
    model=classifier,
    preprocessing=encoder,        # Optional: lets the compile step use the encoder's inverse_transform method
    features_dict=titanic_dict    # Optional: dict mapping feature names to display labels
)

Compiling without postprocessing parameter

After declaring our explainer, we need to compile it on our data so that it can compute the information used to explain the model's predictions.

[10]:

xpl.compile(x=Xtest, y_pred=y_pred)

Backend: Shap TreeExplainer

We can now use our explainer to understand the model's predictions, through plots or data. We can also retrieve our original dataset, as it was before preprocessing.

[11]:

xpl.x_init

[11]:

	Age	Embarked	Fare	Name	Parch	Pclass	Sex	SibSp	Title
PassengerId
863	48.0	Southampton	25.93	Swift Frederick Joel (Margaret Welles Barron)	0	First class	female	0	Mrs
224	29.5	Southampton	7.90	Nenkoff Christo	0	Third class	male	0	Mr
85	17.0	Southampton	10.50	Ilett Bertha	0	Second class	female	0	Miss
681	29.5	Queenstown	8.14	Peters Katie	0	Third class	female	0	Miss
536	7.0	Southampton	26.25	Hart Eva Miriam	2	Second class	female	0	Miss
...	...	...	...	...	...	...	...	...	...
507	33.0	Southampton	26.00	Quick Frederick Charles (Jane Richards)	2	Second class	female	0	Mrs
468	56.0	Southampton	26.55	Smart John Montgomery	0	First class	male	0	Mr
741	29.5	Southampton	30.00	Hawksford Walter James	0	First class	male	0	Mr
355	29.5	Cherbourg	7.22	Yousif Wazli	0	Third class	male	0	Mr
450	52.0	Southampton	30.50	Peuchen Arthur Godfrey	0	First class	male	0	Major

223 rows × 9 columns

All the analyses you can perform are covered in this tutorial: Tutorial

Compiling with postprocessing parameter

Here, however, we want to add postprocessing to our data to make them easier to understand and to improve explainability.

The syntax for the postprocessing parameter is as follows:

postprocess = {
    'name_of_the_feature': {'type': 'type_of_modification', 'rule': 'rule_to_apply'},
    'second_name_of_features': {'type': 'type_of_modification', 'rule': 'rule_to_apply'},
    ...
}

Five types of modification are available:

1. prefix: adds a string at the beginning of the values. The syntax is:

{'features_name': {'type': 'prefix',
                     'rule': 'Example : '}
}

2. suffix: adds a string at the end of the values. The syntax is similar:

{'features_name': {'type': 'suffix',
                     'rule': ' is an example'}
}

3. transcoding: a mapping that replaces the values of a categorical variable. The syntax is:

{'features_name': {'type': 'transcoding',
                     'rule': {'old_name1': 'new_name1',
                              'old_name2': 'new_name2',
                              ...
                             }
                    }
}

Values that are not listed in the mapping are left unchanged.

4. regex: modifies strings with a regular expression, replacing what matches 'in' with 'out'. The syntax is:

{'features_name': {'type': 'regex',
                     'rule': {'in': '^M',
                              'out': 'm'
                             }
                    }
}

5. case: changes the case of a feature's values, either to lowercase with 'rule': 'lower' or to uppercase with 'rule': 'upper'. The syntax is:

{'features_name': {'type': 'case',
                     'rule': 'upper'}
}
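To make the five rule types concrete, here is a minimal pure-Python sketch of what each one does to a single value. The `apply_rule` function is a hypothetical helper written for this illustration only; it is not part of Shapash, and Shapash's own implementation may differ in details such as how numbers are converted to strings.

```python
import re


def apply_rule(value, spec):
    """Emulate one postprocessing entry on a single value (illustration only)."""
    kind, rule = spec["type"], spec["rule"]
    if kind == "prefix":
        return rule + str(value)
    if kind == "suffix":
        return str(value) + rule
    if kind == "transcoding":
        # Values missing from the mapping are returned unchanged.
        return rule.get(value, value)
    if kind == "regex":
        # Replace every match of the 'in' pattern with the 'out' string.
        return re.sub(rule["in"], rule["out"], str(value))
    if kind == "case":
        if rule == "lower":
            return str(value).lower()
        if rule == "upper":
            return str(value).upper()
        raise ValueError(f"Unknown case rule: {rule!r}")
    raise ValueError(f"Unknown postprocessing type: {kind!r}")


examples = [
    (7.25, {"type": "prefix", "rule": "$"}),
    (22.0, {"type": "suffix", "rule": " years old"}),
    ("male", {"type": "transcoding", "rule": {"male": "Man", "female": "Woman"}}),
    ("unknown", {"type": "transcoding", "rule": {"male": "Man"}}),  # not in the mapping
    ("Third class", {"type": "regex", "rule": {"in": " class$", "out": ""}}),
    ("Southampton", {"type": "case", "rule": "upper"}),
]

results = [apply_rule(value, spec) for value, spec in examples]
for (value, spec), result in zip(examples, results):
    print(f"{spec['type']:<12} {value!r} -> {result!r}")
```

Here `results` evaluates to `['$7.25', '22.0 years old', 'Man', 'unknown', 'Third', 'SOUTHAMPTON']`. Note that the fourth example, whose value is absent from the transcoding mapping, comes back unchanged.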

Of course, you don’t have to modify all features. Let’s give an example.

[12]:

postprocess = {
    'Age': {'type': 'suffix',
            'rule': ' years old' # Adding 'years old' at the end
           },
    'Sex': {'type': 'transcoding',
            'rule': {'male': 'Man',
                     'female': 'Woman'}
           },
    'Pclass': {'type': 'regex',
               'rule': {'in': ' class$',
                        'out': ''} # Deleting 'class' word at the end
              },
    'Fare': {'type': 'prefix',
             'rule': '$' # Adding $ at the beginning
            },
    'Embarked': {'type': 'case',
                 'rule': 'upper'
                }
}
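Because a postprocessing dict is written by hand, a quick structural check before building the explainer can catch typos early. The sketch below is one possible way to do this; `check_postprocess` is a hypothetical helper, not a Shapash function, and it assumes that keys are raw column names and that only the five types described above are allowed. Shapash may perform its own validation, which this sketch does not reproduce.

```python
ALLOWED_TYPES = {"prefix", "suffix", "transcoding", "regex", "case"}


def check_postprocess(postprocess, columns):
    """Return a list of problems found in a postprocessing dict (empty if none)."""
    problems = []
    for feature, spec in postprocess.items():
        if feature not in columns:
            problems.append(f"{feature}: not a column of the dataset")
        if not isinstance(spec, dict) or set(spec) != {"type", "rule"}:
            problems.append(f"{feature}: expected exactly the keys 'type' and 'rule'")
            continue
        kind, rule = spec["type"], spec["rule"]
        if kind not in ALLOWED_TYPES:
            problems.append(f"{feature}: unknown type {kind!r}")
        elif kind in ("prefix", "suffix") and not isinstance(rule, str):
            problems.append(f"{feature}: {kind} rule must be a string")
        elif kind == "transcoding" and not isinstance(rule, dict):
            problems.append(f"{feature}: transcoding rule must be a dict")
        elif kind == "regex" and not (isinstance(rule, dict) and set(rule) == {"in", "out"}):
            problems.append(f"{feature}: regex rule must be a dict with keys 'in' and 'out'")
        elif kind == "case" and rule not in ("lower", "upper"):
            problems.append(f"{feature}: case rule must be 'lower' or 'upper'")
    return problems


# Column names of the Titanic features used in this tutorial.
columns = ["Age", "Embarked", "Fare", "Name", "Parch", "Pclass", "Sex", "SibSp", "Title"]

good = {
    "Age": {"type": "suffix", "rule": " years old"},
    "Embarked": {"type": "case", "rule": "upper"},
}
bad = {
    "Embarqued": {"type": "case", "rule": "upper"},   # misspelled column name
    "Pclass": {"type": "regex", "rule": " class$"},   # rule should be a dict
    "Fare": {"type": "round", "rule": 2},             # type not among the five
}

good_problems = check_postprocess(good, columns)
bad_problems = check_postprocess(bad, columns)

print(good_problems)
for problem in bad_problems:
    print(problem)
```

The first dict produces no problems, while each entry of the second one is reported once.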

You can now pass this postprocess dict through the postprocessing parameter:

[13]:

xpl_postprocess = SmartExplainer(
    model=classifier,
    postprocessing=postprocess,
    preprocessing=encoder,       # Optional: compile step can use inverse_transform method
    features_dict=titanic_dict
)

[14]:

xpl_postprocess.compile(
    x=Xtest,
    y_pred=y_pred, # Optional
)

Backend: Shap TreeExplainer

You can now view your dataset with the modifications applied.

[15]:

xpl_postprocess.x_init

[15]:

	Age	Embarked	Fare	Name	Parch	Pclass	Sex	SibSp	Title
PassengerId
863	48.0 years old	SOUTHAMPTON	$25.93	Swift Frederick Joel (Margaret Welles Barron)	0	First	Woman	0	Mrs
224	29.5 years old	SOUTHAMPTON	$7.9	Nenkoff Christo	0	Third	Man	0	Mr
85	17.0 years old	SOUTHAMPTON	$10.5	Ilett Bertha	0	Second	Woman	0	Miss
681	29.5 years old	QUEENSTOWN	$8.14	Peters Katie	0	Third	Woman	0	Miss
536	7.0 years old	SOUTHAMPTON	$26.25	Hart Eva Miriam	2	Second	Woman	0	Miss
...	...	...	...	...	...	...	...	...	...
507	33.0 years old	SOUTHAMPTON	$26.0	Quick Frederick Charles (Jane Richards)	2	Second	Woman	0	Mrs
468	56.0 years old	SOUTHAMPTON	$26.55	Smart John Montgomery	0	First	Man	0	Mr
741	29.5 years old	SOUTHAMPTON	$30.0	Hawksford Walter James	0	First	Man	0	Mr
355	29.5 years old	CHERBOURG	$7.22	Yousif Wazli	0	Third	Man	0	Mr
450	52.0 years old	SOUTHAMPTON	$30.5	Peuchen Arthur Godfrey	0	First	Man	0	Major

223 rows × 9 columns

The postprocessing modifications are applied to all the plots as well.

The main purpose of postprocessing is to make the data easier to understand, especially when columns are not named after the features, as in the output of the to_pandas() method, which orders the features by importance.

[17]:

xpl_postprocess.to_pandas()

to_pandas params: {'features_to_hide': None, 'threshold': None, 'positive': None, 'max_contrib': 20}

[17]:

	pred	feature_1	value_1	contribution_1	feature_2	value_2	contribution_2	feature_3	value_3	contribution_3	...	contribution_6	feature_7	value_7	contribution_7	feature_8	value_8	contribution_8	feature_9	value_9	contribution_9
863	1	Title of passenger	Mrs	0.163479	Sex	Woman	0.154309	Ticket class	First	0.130221	...	0.0406219	Name, First name	Swift Frederick Joel (Margaret Welles Barron)	-0.0381955	Port of embarkation	SOUTHAMPTON	-0.0147327	Relatives like children or parents	0	-0.00538103
224	0	Title of passenger	Mr	0.094038	Sex	Man	0.0696282	Age	29.5 years old	0.0658556	...	0.0151605	Relatives such as brother or wife	0	-0.00855039	Relatives like children or parents	0	0.00124433	Name, First name	Nenkoff Christo	-0.000577095
85	1	Title of passenger	Miss	0.190529	Sex	Woman	0.135507	Ticket class	Second	0.0809714	...	-0.025286	Relatives like children or parents	0	-0.0238222	Relatives such as brother or wife	0	0.0209045	Age	17.0 years old	-0.00702283
681	1	Title of passenger	Miss	0.237477	Port of embarkation	QUEENSTOWN	0.143451	Sex	Woman	0.127931	...	0.0243567	Relatives like children or parents	0	0.0165205	Passenger fare	$8.14	-0.0109633	Age	29.5 years old	0.00327866
536	1	Title of passenger	Miss	0.210166	Ticket class	Second	0.168247	Sex	Woman	0.0876445	...	0.0147503	Relatives like children or parents	2	0.0125069	Port of embarkation	SOUTHAMPTON	-0.0119119	Name, First name	Hart Eva Miriam	0.00654165
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
507	1	Title of passenger	Mrs	0.215332	Sex	Woman	0.194419	Ticket class	Second	0.166437	...	-0.0079185	Relatives like children or parents	2	0.00407485	Age	33.0 years old	-0.00263589	Name, First name	Quick Frederick Charles (Jane Richards)	0.00162901
468	0	Sex	Man	0.100602	Passenger fare	$26.55	-0.099794	Title of passenger	Mr	0.0967768	...	0.0243706	Port of embarkation	SOUTHAMPTON	0.0124424	Relatives such as brother or wife	0	-0.0108301	Relatives like children or parents	0	-0.00332632
741	0	Title of passenger	Mr	0.131861	Sex	Man	0.110845	Age	29.5 years old	0.104878	...	0.0339308	Relatives such as brother or wife	0	-0.00715564	Name, First name	Hawksford Walter James	0.00165882	Relatives like children or parents	0	-0.00137946
355	0	Title of passenger	Mr	0.12679	Sex	Man	0.0933251	Age	29.5 years old	0.0717939	...	-0.0271103	Name, First name	Yousif Wazli	0.0163174	Relatives such as brother or wife	0	-0.0108501	Relatives like children or parents	0	-0.000543508
450	0	Sex	Man	0.13572	Title of passenger	Major	-0.0723023	Age	52.0 years old	0.0690373	...	0.027384	Relatives such as brother or wife	0	-0.0134144	Relatives like children or parents	0	0.00256623	Name, First name	Peuchen Arthur Godfrey	0.00229483

223 rows × 28 columns
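The per-row ordering in this table can be sketched in a few lines. The snippet below takes the three leading entries shown above for passenger 468, supplied in arbitrary order, ranks them by the absolute value of their contribution and flattens them into feature_i / value_i / contribution_i fields. Ranking by absolute value is an assumption inferred from the rows displayed above, and `rank_row` is a hypothetical helper, not the Shapash implementation.

```python
def rank_row(entries):
    """Order (feature, value, contribution) triples by decreasing |contribution|
    and flatten them into feature_i / value_i / contribution_i fields."""
    ranked = sorted(entries, key=lambda entry: abs(entry[2]), reverse=True)
    row = {}
    for i, (feature, value, contribution) in enumerate(ranked, start=1):
        row[f"feature_{i}"] = feature
        row[f"value_{i}"] = value  # already postprocessed, e.g. '$26.55'
        row[f"contribution_{i}"] = contribution
    return row


# Three entries taken from the row of passenger 468 in the table above.
entries_468 = [
    ("Title of passenger", "Mr", 0.0967768),
    ("Passenger fare", "$26.55", -0.099794),
    ("Sex", "Man", 0.100602),
]

row_468 = rank_row(entries_468)
print(row_468)
```

The resulting order (Sex, Passenger fare, Title of passenger) matches the first three columns of that row. Since the value_i columns carry no feature name in their header, readable values such as '$26.55' or 'Man' are what make each cell understandable on its own.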