SmartPredictor Object

The SmartPredictor object allows you to:
  • compute predictions

  • configure summary of the local explanation

  • deploy interpretability of your model for operational needs

It can be used in API mode and batch mode.

class shapash.explainer.smart_predictor.SmartPredictor(features_dict, model, columns_dict, backend, features_types, label_dict=None, preprocessing=None, postprocessing=None, features_groups=None, mask_params=None)

Bases: object

The SmartPredictor class is a lighter object than SmartExplainer, with additional consistency checks.

The SmartPredictor object is designed to deploy the summary of local explanations for operational needs.

Switching from SmartExplainer to SmartPredictor allows users to reproduce the same results automatically on new datasets with the right structure.

SmartPredictor is designed to make new results understandable:
  • It checks consistency of all parameters

  • It applies preprocessing and postprocessing

  • It computes the model's contributions

  • It makes predictions

  • It summarizes local explainability

This class allows the user to automatically summarize the results of their model on new datasets (prediction, preprocessing and postprocessing linking, explainability). The SmartPredictor has several methods, described below.

SmartPredictor attributes:

features_dict: dict

Dictionary mapping technical feature names to domain names.

model: model object

Model used to estimate the target values and, for classification, the probabilities returned by predict_proba.

backend: str or backend object

Backend (explainer) used to compute the contributions.

columns_dict: dict

Dictionary mapping integer column numbers (in the same order as the training dataset) to technical feature names.

features_types: dict

Dictionary mapping each feature to the type it must have.

label_dict: dict (optional)

Dictionary mapping integer labels to domain names (classification - target values).

preprocessing: category_encoders, ColumnTransformer, list or dict (optional)

The preprocessing applied to the original data.

postprocessing: dict (optional)

Dictionary of postprocessing modifications to apply to the x_init DataFrame.

_case: string

String indicating whether the model is used for a classification or a regression problem.

_classes: list, None

List of labels if the model is used for a classification problem, None otherwise.

mask_params: dict (optional)

Dictionary that specifies how to summarize the explainability.
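
As an illustration of the attributes above, the dictionaries for a small, Titanic-like classification model could look as follows, together with a minimal check that they describe the same features. The feature names, the dtype strings, the mask_params values and the check_dicts helper are hypothetical. shapash runs its own consistency checks, and this sketch does not reproduce them.

```python
# Hypothetical attribute dictionaries for a small classification model.
columns_dict = {0: "Pclass", 1: "Sex", 2: "Age"}  # column position -> technical name
features_dict = {"Pclass": "Ticket class", "Sex": "Sex", "Age": "Age"}  # technical -> domain name
features_types = {"Pclass": "int64", "Sex": "object", "Age": "float64"}  # technical name -> dtype
label_dict = {0: "Not survived", 1: "Survived"}  # integer label -> domain name
mask_params = {  # how the summary is filtered (hypothetical values)
    "features_to_hide": None,
    "threshold": None,
    "positive": None,
    "max_contrib": 3,
}


def check_dicts(columns_dict, features_dict, features_types):
    """Hypothetical helper: check that the dictionaries describe the same features."""
    names = set(columns_dict.values())
    if set(features_dict) != names:
        raise ValueError("features_dict keys do not match columns_dict values")
    if set(features_types) != names:
        raise ValueError("features_types keys do not match columns_dict values")
    # Positions must be 0..n-1 so that the model's column order is fully defined.
    if sorted(columns_dict) != list(range(len(columns_dict))):
        raise ValueError("columns_dict keys must be 0..n-1")
    return True


check_dicts(columns_dict, features_dict, features_types)
```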

How to declare a new SmartPredictor object?

Example

>>> from shapash.explainer.smart_predictor import SmartPredictor
>>> predictor = SmartPredictor(features_dict=my_features_dict,
...                            model=my_model,
...                            backend=my_backend,
...                            columns_dict=my_columns_dict,
...                            features_types=my_features_type_dict,
...                            label_dict=my_label_dict,
...                            preprocessing=my_preprocess,
...                            postprocessing=my_postprocess)

or, with the most common syntax:

>>> predictor = xpl.to_smartpredictor()

Here, xpl is the SmartExplainer instance from which the SmartPredictor is built.

add_input(x=None, ypred=None, contributions=None)

The add_input method is the first step: it adds a dataset for prediction and explainability.

add_input applies the following to the x parameter:
  • consistency checks

  • the preprocessing and postprocessing specified at initialisation

  • reordering of the features into the order expected by the model

If ypred or contributions are not specified, add_input computes them. A parameter can be omitted if it has already been defined: for example, the user can specify ypred without redefining the dataset x already provided. If the user declares a new input x, all stored parameters are reset.
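
The stateful behaviour described above (reordering x, keeping earlier inputs, resetting everything when a new x arrives) can be sketched with a toy class. ToyPredictor is hypothetical and is not shapash's implementation: x is a plain dict of columns instead of a DataFrame, and the toy only stores ypred and contributions instead of computing them.

```python
class ToyPredictor:
    """Hypothetical stand-in mimicking the add_input contract described above.

    No model, preprocessing, prediction or contribution computation happens here.
    """

    def __init__(self, columns_dict):
        self.columns_dict = columns_dict  # column position -> feature name
        self.data = {}  # stored x, ypred and contributions

    def add_input(self, x=None, ypred=None, contributions=None):
        if x is not None:
            ordered = [self.columns_dict[i] for i in sorted(self.columns_dict)]
            missing = set(ordered) - set(x)
            if missing:
                raise ValueError(f"missing features: {sorted(missing)}")
            # A new x replaces everything stored so far; its columns are
            # reordered into the order the model was trained on.
            self.data = {"x": {name: x[name] for name in ordered}}
        elif "x" not in self.data:
            raise ValueError("x must be provided at least once")
        # The real method computes missing ypred/contributions; this toy only stores them.
        if ypred is not None:
            self.data["ypred"] = ypred
        if contributions is not None:
            self.data["contributions"] = contributions


predictor = ToyPredictor({0: "Pclass", 1: "Sex"})
predictor.add_input(x={"Sex": [1, 2], "Pclass": [3, 1]})  # columns are reordered
predictor.add_input(ypred=[0, 1])  # x is kept, ypred is added
print(list(predictor.data["x"]))  # ['Pclass', 'Sex']
predictor.add_input(x={"Pclass": [2], "Sex": [1]})  # new x: ypred is cleared
print("ypred" in predictor.data)  # False
```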

Example

>>> predictor.add_input(x=xtest_df)
>>> predictor.add_input(ypred=ytest_df)

Parameters
  • x (dict, pandas.DataFrame (optional)) – Raw dataset used by the model to perform the prediction (not preprocessed).

  • ypred (pandas.DataFrame (optional)) – User-specified prediction values.

  • contributions (pandas.DataFrame (regression) or list (classification) (optional)) – Local contributions, aggregated if the preprocessing requires it (e.g. one-hot encoding).

detail_contributions(contributions=None, use_groups=None)

The detail_contributions method associates each predicted value (ypred, either specified in add_input or computed automatically) with its corresponding contributions.

Parameters
  • contributions (object (optional)) – Local contributions, or list of local contributions.

  • use_groups (bool (optional)) – Whether to compute contributions for groups of features.

Returns

A DataFrame containing ypred and the associated contributions.

Return type

pandas.DataFrame

Example

>>> predictor.add_input(x=xtest_df)
>>> predictor.detail_contributions()
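
For classification, contributions come as a list with one table per class, so associating contributions with predictions amounts to picking, for each row, the table of the predicted class (in regression there is a single table and no selection is needed). A pure-Python sketch of that selection, assuming plain lists and dicts instead of DataFrames; the function name and data layout are hypothetical.

```python
def detail_contributions_sketch(ypred, contributions, classes):
    """Hypothetical sketch: keep, for each row, the contributions of its predicted class.

    ypred: predicted labels, one per row.
    contributions: one entry per class (same order as `classes`); each entry
        is a list of rows, each row a dict {feature_name: contribution}.
    """
    detailed = []
    for i, label in enumerate(ypred):
        class_index = classes.index(label)  # table of the predicted class
        row = {"ypred": label}
        row.update(contributions[class_index][i])
        detailed.append(row)
    return detailed


classes = [0, 1]
contributions = [
    [{"Sex": -0.3, "Age": 0.1}, {"Sex": 0.5, "Age": -0.2}],  # class 0
    [{"Sex": 0.3, "Age": -0.1}, {"Sex": -0.5, "Age": 0.2}],  # class 1
]
ypred = [1, 0]
print(detail_contributions_sketch(ypred, contributions, classes))
# [{'ypred': 1, 'Sex': 0.3, 'Age': -0.1}, {'ypred': 0, 'Sex': 0.5, 'Age': -0.2}]
```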

modify_mask(features_to_hide=None, threshold=None, positive=None, max_contrib=None)

This method allows users to modify the mask_params values. Each parameter is optional: modify_mask modifies only the values passed as arguments.

Use this method to configure the summary displayed by the summarize method.

Parameters
  • features_to_hide (list, optional (default: None)) – List of strings, containing features to hide.

  • threshold (float, optional (default: None)) – Absolute threshold below which any contribution is hidden.

  • positive (bool, optional (default: None)) – If True, hide negative values; if False, hide positive values; if None, hide nothing.

  • max_contrib (int, optional (default: None)) – Maximum number of contributions to show.
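
The effect of these four parameters on one row of contributions can be sketched in plain Python. apply_mask is a hypothetical helper, not shapash's code; contributions are ranked by decreasing absolute value here, and the order in which the filters are combined is an assumption.

```python
def apply_mask(contributions, features_to_hide=None, threshold=None,
               positive=None, max_contrib=None):
    """Hypothetical sketch of the mask_params filters for a single row.

    contributions: dict {feature_name: contribution}.
    Returns (feature_name, contribution) pairs, ranked by decreasing absolute value.
    """
    items = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    if features_to_hide:
        items = [(f, c) for f, c in items if f not in features_to_hide]
    if threshold is not None:
        # Hide contributions whose absolute value is below the threshold.
        items = [(f, c) for f, c in items if abs(c) >= threshold]
    if positive is True:
        items = [(f, c) for f, c in items if c >= 0]  # hide negative values
    elif positive is False:
        items = [(f, c) for f, c in items if c <= 0]  # hide positive values
    if max_contrib is not None:
        items = items[:max_contrib]  # keep the strongest contributions only
    return items


row = {"Sex": 0.32, "Pclass": 0.16, "Age": -0.05, "Fare": 0.01}
print(apply_mask(row, max_contrib=1))  # [('Sex', 0.32)]
print(apply_mask(row, threshold=0.1))  # [('Sex', 0.32), ('Pclass', 0.16)]
print(apply_mask(row, positive=False))  # [('Age', -0.05)]
```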

Examples

>>> predictor.modify_mask(max_contrib=1)
>>> summary_df = predictor.summarize()
>>> summary_df
   pred     proba  feature_1  value_1  contribution_1
0     0  0.756416        Sex      1.0        0.322308
1     3  0.628911        Sex      2.0        0.585475
2     0  0.543308        Sex      2.0       -0.486667

predict()

The predict method computes the predicted values for each row of x defined in add_input.

Returns

A DataFrame with the predicted value for each row of x.

Return type

pandas.DataFrame

Example

>>> predictor.add_input(x=xtest_df)
>>> predictor.predict()

predict_proba()

The predict_proba method computes the predicted probabilities for each row of x defined in add_input.

Returns

If no ypred data is available, a DataFrame with the probabilities of each label; otherwise, a DataFrame with ypred and the associated probability.

Return type

pandas.DataFrame

Example

>>> predictor.add_input(x=xtest_df)
>>> predictor.predict_proba()
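
The two return shapes described above can be sketched on a small probability table. proba_sketch is a hypothetical helper using lists and dicts in place of DataFrames; the real output column names are not reproduced here.

```python
def proba_sketch(probas, classes, ypred=None):
    """Hypothetical sketch of the two output shapes of predict_proba.

    probas: one list per row, with one probability per class in `classes` order.
    """
    if ypred is None:
        # No ypred: the probability of every label for every row.
        return [dict(zip(classes, row)) for row in probas]
    # With ypred: the predicted label and its associated probability only.
    return [
        {"ypred": label, "proba": row[classes.index(label)]}
        for label, row in zip(ypred, probas)
    ]


classes = [0, 1]
probas = [[0.75, 0.25], [0.4, 0.6]]
print(proba_sketch(probas, classes))
# [{0: 0.75, 1: 0.25}, {0: 0.4, 1: 0.6}]
print(proba_sketch(probas, classes, ypred=[0, 1]))
# [{'ypred': 0, 'proba': 0.75}, {'ypred': 1, 'proba': 0.6}]
```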

save(path)

The save method allows users to save the SmartPredictor object to disk as a pickle file. This is useful because you do not have to recompile to display results later.

The load_smartpredictor function loads a saved SmartPredictor object (see the example below).

Parameters

path (str) – File path to store the pickle file

Example

>>> predictor.save('path_to_pkl/predictor.pkl')
>>> from shapash.utils.load_smartpredictor import load_smartpredictor
>>> predictor_load = load_smartpredictor('path_to_pkl/predictor.pkl')
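
Since the object is stored as a pickle file, the save/load round trip can be sketched with the standard-library pickle module. A types.SimpleNamespace stands in for a SmartPredictor, and whether save and load_smartpredictor do more than a plain pickle.dump/pickle.load is not covered here. Unpickling can execute arbitrary code, so only load pickle files from trusted sources.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace

# Hypothetical stand-in for a SmartPredictor: any picklable object behaves the same way.
predictor = SimpleNamespace(
    features_dict={"Sex": "Sex", "Pclass": "Ticket class"},
    mask_params={"max_contrib": 1},
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "predictor.pkl")
    with open(path, "wb") as f:  # write the object to a pickle file
        pickle.dump(predictor, f)
    with open(path, "rb") as f:  # read it back from disk
        predictor_load = pickle.load(f)

print(predictor_load == predictor)  # True: same attributes after the round trip
print(predictor_load is predictor)  # False: a new object was rebuilt from disk
```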

summarize(use_groups=None)

The summarize method displays the summary of local explainability. It can be configured with the modify_mask method to tailor the summary to your needs.

If the user doesn’t use modify_mask, the summarize method uses the mask_params specified at initialisation of the SmartPredictor.

In the classification case, the summarize method summarizes the explainability corresponding to:
  • the predicted values specified by the user or automatically computed (with add_input method)

  • the right probabilities from predict_proba associated to the right predicted values

  • the right contributions, ranked and filtered as specified with the modify_mask method

Parameters

use_groups (bool (optional)) – Whether to compute contributions for groups of features.

Returns

The selected explanation of each row, for the classification case.

Return type

pandas.DataFrame

Examples

>>> summary_df = predictor.summarize()
>>> summary_df
   pred     proba  feature_1  value_1  contribution_1  feature_2  value_2  contribution_2
0     0  0.756416        Sex      1.0        0.322308     Pclass      3.0        0.155069
1     3  0.628911        Sex      2.0        0.585475     Pclass      1.0        0.370504
2     0  0.543308        Sex      2.0       -0.486667     Pclass      3.0        0.255072
>>> predictor.modify_mask(max_contrib=1)
>>> summary_df = predictor.summarize()
>>> summary_df
   pred     proba  feature_1  value_1  contribution_1
0     0  0.756416        Sex      1.0        0.322308
1     3  0.628911        Sex      2.0        0.585475
2     0  0.543308        Sex      2.0       -0.486667
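
The wide layout of these summary tables (pred, proba, then feature_i, value_i and contribution_i for each kept contribution) can be sketched for a single row in plain Python. summarize_row and its list/dict inputs are hypothetical. The values are taken from the first row of the example above.

```python
def summarize_row(pred, proba, x_row, ranked):
    """Hypothetical sketch: flatten one row into the wide summary layout.

    x_row: dict {feature_name: value} for the row being explained.
    ranked: (feature_name, contribution) pairs, already ranked and filtered
        according to mask_params.
    """
    summary = {"pred": pred, "proba": proba}
    for i, (feature, contribution) in enumerate(ranked, start=1):
        summary[f"feature_{i}"] = feature
        summary[f"value_{i}"] = x_row[feature]
        summary[f"contribution_{i}"] = contribution
    return summary


row = summarize_row(
    pred=0,
    proba=0.756416,
    x_row={"Sex": 1.0, "Pclass": 3.0},
    ranked=[("Sex", 0.322308), ("Pclass", 0.155069)],
)
print(list(row))
# ['pred', 'proba', 'feature_1', 'value_1', 'contribution_1', 'feature_2', 'value_2', 'contribution_2']
```

Keeping only the first pair in ranked gives the one-contribution layout produced after modify_mask(max_contrib=1).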