SmartPredictor Object

The SmartPredictor object allows you to:

compute predictions
configure the summary of the local explanation
deploy the interpretability of your model for operational needs

It can be used in API mode and batch mode.

class shapash.explainer.smart_predictor.SmartPredictor(features_dict, model, columns_dict, backend, features_types, label_dict=None, preprocessing=None, postprocessing=None, features_groups=None, mask_params=None)

Bases: object

The SmartPredictor class is a lighter object than SmartExplainer, with additional consistency checks.

The SmartPredictor object is provided to deploy the summary of local explanations for operational needs.

Switching from SmartExplainer to SmartPredictor allows users to reproduce the same results automatically on datasets with the right structure.

SmartPredictor is designed to make new results understandable:

It checks consistency of all parameters
It applies preprocessing and postprocessing
It computes the model’s contributions
It makes predictions
It summarizes local explainability

This class allows users to automatically summarize the results of their model on new datasets (prediction, chaining of preprocessing and postprocessing, explainability). The SmartPredictor methods are described below.

SmartPredictor attributes:

features_dict: dict: Dictionary mapping technical feature names to domain names.
model: model object: Model used to check the different target values and to estimate predict_proba.
backend: str or backend object: Backend (explainer) used to compute contributions.
columns_dict: dict: Dictionary mapping integer column numbers (in the same order as the training dataset) to technical feature names.
features_types: dict: Dictionary mapping each feature to the type it requires.
label_dict: dict (optional): Dictionary mapping integer labels to domain names (classification - target values).
preprocessing: category_encoders, ColumnTransformer, list or dict (optional): The processing applied to the original data.
postprocessing: dict (optional): Dictionary of postprocessing modifications to apply to the x_init dataframe.
_case: string: String that indicates whether the model is used for a classification or a regression problem.
_classes: list, None: List of labels if the model is used for a classification problem, None otherwise.
mask_params: dict (optional): Dictionary that specifies how to summarize the explainability.
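
As a rough illustration of the shapes these dictionaries take, the sketch below builds them by hand for a hypothetical two-feature classifier. The feature names, labels and type strings are assumptions chosen for illustration (Sex and Pclass echo the sample output further down), and the mask_params keys are assumed to mirror the modify_mask parameters. In practice, xpl.to_smartpredictor() fills these in for you.

```python
# Hypothetical dictionaries for a two-feature classifier.
# All names and values are illustrative assumptions, not shapash defaults.

# Column position in the training dataset -> technical feature name.
columns_dict = {0: "Sex", 1: "Pclass"}

# Technical feature name -> domain (human-readable) name.
features_dict = {"Sex": "Passenger sex", "Pclass": "Ticket class"}

# Technical feature name -> type expected for the raw input (assumed format).
features_types = {"Sex": "object", "Pclass": "int64"}

# Integer label -> domain name (classification only).
label_dict = {0: "Died", 1: "Survived"}

# How the summary should filter contributions (keys assumed to match modify_mask).
mask_params = {
    "features_to_hide": None,
    "threshold": None,
    "positive": None,
    "max_contrib": 2,
}

# Resolve column 1 to its domain name through both mappings.
domain_name = features_dict[columns_dict[1]]
print(domain_name)  # Ticket class
```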

How to declare a new SmartPredictor object?

Example

>>> from shapash.explainer.smart_predictor import SmartPredictor
>>> predictor = SmartPredictor(features_dict=my_features_dict,
...                            model=my_model,
...                            backend=my_backend,
...                            columns_dict=my_columns_dict,
...                            features_types=my_features_type_dict,
...                            label_dict=my_label_dict,
...                            preprocessing=my_preprocess,
...                            postprocessing=my_postprocess)

or, with the most common syntax:

>>> predictor = xpl.to_smartpredictor()

xpl, explainer: object: SmartExplainer instance to point to.

add_input(x=None, ypred=None, contributions=None)

The add_input method is the first step: it adds a dataset for prediction and explainability.

add_input applies the following to the x parameter:

consistency checks
the preprocessing and postprocessing specified during initialisation
feature reordering into the order the model expects

If you don’t specify ypred or contributions, add_input computes them. A parameter can be omitted if it has already been defined: for example, you can specify a ypred without redeclaring a dataset x that was already added. If you declare a new input x, all previously stored parameters are cleared.
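
To make the checks and the reordering concrete, here is a minimal pure-Python sketch for a single input row passed as a dict. The helper check_and_reorder is hypothetical, and features_types holds Python types here for simplicity; it mimics the documented steps and is not how shapash implements them.

```python
def check_and_reorder(x, columns_dict, features_types):
    """Validate one raw input row (dict) and return it in model order.

    Hypothetical helper: it mimics the documented steps, not shapash internals.
    """
    expected = [columns_dict[i] for i in sorted(columns_dict)]

    # Consistency check 1: the input must contain exactly the expected features.
    missing = set(expected) - set(x)
    extra = set(x) - set(expected)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, unexpected={sorted(extra)}")

    # Consistency check 2: each value must have the declared Python type.
    for name in expected:
        if not isinstance(x[name], features_types[name]):
            raise TypeError(f"{name!r} should be {features_types[name].__name__}")

    # Reordering: dicts preserve insertion order, so rebuild in model order.
    return {name: x[name] for name in expected}


columns_dict = {0: "Sex", 1: "Pclass"}
features_types = {"Sex": str, "Pclass": int}

row = check_and_reorder({"Pclass": 3, "Sex": "male"}, columns_dict, features_types)
print(list(row))  # ['Sex', 'Pclass']
```

Failing early on a missing or unexpected column is the point of the consistency checks: a model fed columns in the wrong order can return plausible but wrong predictions without raising any error.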

Example

>>> predictor.add_input(x=xtest_df)
>>> predictor.add_input(ypred=ytest_df)

Parameters

x (dict, pandas.DataFrame (optional)) – Raw dataset used by the model to perform the prediction (not preprocessed).
ypred (pandas.DataFrame (optional)) – User-specified prediction values.
contributions (pandas.DataFrame (regression) or list (classification) (optional)) – local contributions aggregated if the preprocessing part requires it (e.g. one-hot encoding).

detail_contributions(contributions=None, use_groups=None)

The detail_contributions method associates each predicted value with its contributions, using the ypred specified in add_input or computed automatically.
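
For classification, where contributions come as a list, one way to picture this association is the sketch below. It assumes one table of contributions per class, in class order, and integer labels that index directly into that list. The helper name and the data are hypothetical.

```python
def contributions_for_prediction(ypred, contributions_per_class):
    """Pick, for each row, the contributions of the class that was predicted.

    ypred: list of predicted labels (assumed to be list indices), one per row.
    contributions_per_class: list (one entry per class) of per-row dicts.
    """
    detailed = []
    for row_idx, label in enumerate(ypred):
        row = {"ypred": label}
        # Keep only the contributions computed for the predicted class.
        row.update(contributions_per_class[label][row_idx])
        detailed.append(row)
    return detailed


ypred = [0, 1]
contributions_per_class = [
    [{"Sex": 0.32, "Pclass": 0.16}, {"Sex": -0.59, "Pclass": -0.37}],  # class 0
    [{"Sex": -0.32, "Pclass": -0.16}, {"Sex": 0.59, "Pclass": 0.37}],  # class 1
]

detailed = contributions_for_prediction(ypred, contributions_per_class)
for row in detailed:
    print(row)
```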

Parameters

contributions (object (optional)) – Local contributions, or list of local contributions.
use_groups (bool (optional)) – Whether or not to compute groups of features contributions.

Returns

A dataset with ypred and the associated contributions.

Return type

pandas.DataFrame

Example

>>> predictor.add_input(x=xtest_df)
>>> predictor.detail_contributions()

modify_mask(features_to_hide=None, threshold=None, positive=None, max_contrib=None)

This method allows users to modify the mask_params values. Every parameter is optional: modify_mask changes only the values that are passed to it.

Use this method to configure the summary displayed by the summarize method.

Parameters

features_to_hide (list, optional (default: None)) – List of strings, containing features to hide.
threshold (float, optional (default: None)) – Absolute threshold below which any contribution is hidden.
positive (bool, optional (default: None)) – If True, hide negative values. If False, hide positive values. If None, hide nothing.
max_contrib (int, optional (default: None)) – Maximum number of contributions to show.
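
The combined effect of these four parameters on a single row can be sketched in plain Python. The order in which the filters are applied and the ranking by absolute value are assumptions made for this illustration; apply_mask is a hypothetical helper, not part of shapash.

```python
def apply_mask(contributions, features_to_hide=None, threshold=None,
               positive=None, max_contrib=None):
    """Filter one row's contributions (feature -> value) and rank them.

    Illustrative only: the filter order and the ranking by absolute value
    are assumptions, not a description of shapash internals.
    """
    hidden = set(features_to_hide or [])
    kept = []
    for feature, value in contributions.items():
        if feature in hidden:
            continue  # explicitly hidden feature
        if threshold is not None and abs(value) < threshold:
            continue  # contribution too small in absolute value
        if positive is True and value < 0:
            continue  # positive=True hides negative contributions
        if positive is False and value > 0:
            continue  # positive=False hides positive contributions
        kept.append((feature, value))

    # Strongest contributions first, then truncate to max_contrib.
    kept.sort(key=lambda item: abs(item[1]), reverse=True)
    return kept if max_contrib is None else kept[:max_contrib]


row = {"Sex": -0.49, "Pclass": 0.26, "Age": 0.03, "Fare": -0.10}

print(apply_mask(row, max_contrib=1))                  # [('Sex', -0.49)]
print(apply_mask(row, threshold=0.05, positive=True))  # [('Pclass', 0.26)]
print(apply_mask(row, features_to_hide=["Sex"], max_contrib=2))
```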

Examples

>>> predictor.modify_mask(max_contrib=1)
>>> summary_df = predictor.summarize()
>>> summary_df
   pred     proba  feature_1  value_1  contribution_1
0     0  0.756416        Sex      1.0        0.322308
1     3  0.628911        Sex      2.0        0.585475
2     0  0.543308        Sex      2.0       -0.486667

predict()

The predict method computes the predicted value for each row of the x defined in add_input.

Returns: A dataset with predicted values for each x row.
Return type: pandas.DataFrame

Example

>>> predictor.add_input(x=xtest_df)
>>> predictor.predict()

predict_proba()

The predict_proba method computes the predicted probabilities for each row of the x defined in add_input.

Returns: A dataset with the probabilities of every label if there is no ypred data, otherwise a dataset with ypred and its associated probability.
Return type: pandas.DataFrame
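
The two possible shapes of the result can be sketched as follows, using lists of dicts in place of a DataFrame. The helper proba_output and the class_<label> column names are hypothetical and only illustrate the distinction described above.

```python
def proba_output(probas, classes, ypred=None):
    """Shape class probabilities as described for predict_proba.

    probas: list of per-row probability lists, aligned with `classes`.
    Column names are illustrative assumptions.
    """
    if ypred is None:
        # No ypred: expose the probability of every label.
        return [
            {f"class_{label}": p for label, p in zip(classes, row)}
            for row in probas
        ]
    # ypred available: keep only the probability of the predicted label.
    return [
        {"ypred": label, "proba": row[classes.index(label)]}
        for label, row in zip(ypred, probas)
    ]


classes = [0, 1]
probas = [[0.76, 0.24], [0.37, 0.63]]

all_probas = proba_output(probas, classes)
with_ypred = proba_output(probas, classes, ypred=[0, 1])
print(all_probas)
print(with_ypred)
```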

Example

>>> predictor.add_input(x=xtest_df)
>>> predictor.predict_proba()

save(path)

The save method stores the SmartPredictor object on disk as a pickle file. Saving is useful because you don’t have to recompile to display results later.

The load_smartpredictor function loads a saved SmartPredictor object (see the example below).

Parameters: path (str) – File path to store the pickle file

Example

>>> predictor.save('path_to_pkl/predictor.pkl')
>>> from shapash.utils.load_smartpredictor import load_smartpredictor
>>> predictor_load = load_smartpredictor('path_to_pkl/predictor.pkl')

A sidecar manifest path + ".manifest.json" is written alongside the pickle with the shapash version, model framework version, and a schema fingerprint. load_smartpredictor uses it to detect version skew or schema tampering at load time. The pickle remains a valid standalone artifact: predictors saved by older versions of shapash without a manifest still load, with a DeprecationWarning.

summarize(use_groups=None)

The summarize method displays the summary of local explainability. It can be configured with the modify_mask method so that the summary suits your needs.

If the user doesn’t call modify_mask, summarize uses the mask_params specified when the SmartPredictor was initialised.

In the classification case, summarize reports:

the predicted values, either specified by the user or computed automatically (with the add_input method)
the probabilities from predict_proba associated with those predicted values
the contributions, ranked and filtered as specified with the modify_mask method
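
The layout of one summary row can be pictured with the sketch below, which flattens already ranked contributions into the pred, proba, feature_N, value_N and contribution_N columns seen in the examples further down. The helper summary_row is hypothetical. The values are taken from the first row of the sample output.

```python
def summary_row(pred, proba, ranked, x_row):
    """Flatten ranked (feature, contribution) pairs into one wide record.

    Hypothetical helper: it only reproduces the column layout of the summary.
    """
    record = {"pred": pred, "proba": proba}
    for rank, (feature, contribution) in enumerate(ranked, start=1):
        record[f"feature_{rank}"] = feature
        record[f"value_{rank}"] = x_row[feature]  # value of that feature in x
        record[f"contribution_{rank}"] = contribution
    return record


row = summary_row(
    pred=0,
    proba=0.756416,
    ranked=[("Sex", 0.322308), ("Pclass", 0.155069)],
    x_row={"Sex": 1.0, "Pclass": 3.0},
)
print(row)
```

Limiting the ranked list to a single pair yields the three-column layout shown after modify_mask(max_contrib=1).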

Parameters

use_groups (bool (optional)) – Whether or not to compute groups of features contributions.

Returns

The selected explanation for each row (classification case).

Return type

pandas.DataFrame

Examples

>>> summary_df = predictor.summarize()
>>> summary_df
   pred     proba  feature_1  value_1  contribution_1  feature_2  value_2  contribution_2
0     0  0.756416        Sex      1.0        0.322308     Pclass      3.0        0.155069
1     3  0.628911        Sex      2.0        0.585475     Pclass      1.0        0.370504
2     0  0.543308        Sex      2.0       -0.486667     Pclass      3.0        0.255072

>>> predictor.modify_mask(max_contrib=1)
>>> summary_df = predictor.summarize()
>>> summary_df
   pred     proba  feature_1  value_1  contribution_1
0     0  0.756416        Sex      1.0        0.322308
1     3  0.628911        Sex      2.0        0.585475
2     0  0.543308        Sex      2.0       -0.486667