SmartExplainer Object

SmartExplainer declaration and data preparation

class shapash.explainer.smart_explainer.SmartExplainer(model, backend='shap', preprocessing=None, postprocessing=None, features_groups=None, features_dict=None, label_dict=None, title_story: Optional[str] = None, palette_name=None, colors_dict=None, **backend_kwargs)[source]

Bases: object

The SmartExplainer class is the main object of the Shapash library. It allows data scientists to perform many operations that make the results more understandable: linking encoders, models, predictions, label dictionaries and datasets. SmartExplainer exposes several methods, which are described below.

Parameters
  • model (model object) – Model used for consistency checks. The model object can also be used by some methods to compute predict and predict_proba values.

  • backend (str or shapash.backend object (default: 'shap')) – Select which computation method to use to compute contributions and feature importance. Possible values are 'shap' or 'lime'. Default is 'shap'. It is also possible to pass a backend class inheriting from shapash.backend.BaseBackend.

  • preprocessing (category_encoders, ColumnTransformer, list, dict, optional (default: None)) – Different types of preprocessing are available: a single category_encoders encoder (OrdinalEncoder/OnehotEncoder/BaseNEncoder/BinaryEncoder/TargetEncoder), a single ColumnTransformer with scikit-learn encoding or category_encoders transformers, a list of several category_encoders with optional (dict, list of dict), a list with a single ColumnTransformer with optional (dict, list of dict), a dict, or a list of dict.

  • postprocessing (dict, optional (default: None)) –

    Dictionary of postprocessing modifications to apply to the x_init dataframe. Keys are feature names (or numbers, or labels referencing feature names); each entry modifies the dataset feature by feature. Different types of postprocessing are available, with the following syntax: one key per feature, 5 different types of modification:
    >>> {
    'feature1': {'type': 'prefix', 'rule': 'age: '},
    'feature2': {'type': 'suffix', 'rule': '$/week'},
    'feature3': {'type': 'transcoding', 'rule': {'code1': 'single', 'code2': 'married'}},
    'feature4': {'type': 'regex', 'rule': {'in': 'AND', 'out': ' & '}},
    'feature5': {'type': 'case', 'rule': 'lower'}
    }
    Only one transformation per feature is possible.

  • features_groups (dict, optional (default: None)) – Dictionary of features that should be grouped together. This option allows computing and displaying the contributions and importance of each group of features. Features that are grouped together are still displayed in the webapp when clicking on a group. >>> { 'feature_group_1': ['feature3', 'feature7', 'feature24'], 'feature_group_2': ['feature1', 'feature12'] }

  • features_dict (dict) – Dictionary mapping technical feature names to domain names.

  • label_dict (dict) – Dictionary mapping integer labels to domain names (classification - target values).

  • title_story (str (default: None)) – The default title is empty. You can specify a custom title which can be used in the webapp or in other methods.

  • palette_name (str) – Name of the palette used for the colors of the report (refer to style folder).

  • colors_dict (dict) – Dictionary containing every color palette. You can use this parameter to change any color in the graphs.

  • **backend_kwargs (dict) – Keyword parameters to be passed to the backend.
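The five postprocessing rule types accepted by the postprocessing parameter can be illustrated with a pure-Python sketch (a simplified, hypothetical stand-in for how such rules transform a value, not Shapash's actual implementation):

```python
import re

def apply_rule(value, rule):
    """Apply one simplified postprocessing rule to a single value (hypothetical helper)."""
    kind = rule["type"]
    if kind == "prefix":
        return rule["rule"] + str(value)        # prepend a string
    if kind == "suffix":
        return str(value) + rule["rule"]        # append a string
    if kind == "transcoding":
        return rule["rule"].get(value, value)   # map codes to readable labels
    if kind == "regex":
        return re.sub(rule["rule"]["in"], rule["rule"]["out"], str(value))
    if kind == "case":
        return str(value).lower() if rule["rule"] == "lower" else str(value).upper()
    raise ValueError(f"unknown rule type: {kind}")

print(apply_rule(42, {"type": "prefix", "rule": "age: "}))                     # age: 42
print(apply_rule("code1", {"type": "transcoding",
                           "rule": {"code1": "single", "code2": "married"}}))  # single
```

Each feature accepts exactly one rule, matching the "only one transformation per feature" constraint above.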

data

The data dictionary has 3 entries. Each key returns a pd.DataFrame (regression) or a list of pd.DataFrame (classification; the length of the list equals the number of labels). All pd.DataFrames have the same shape (n_samples, n_features). In the regression case, the data can be regarded as a single array of shape (n_samples, n_features, 3).

data['contrib_sorted']: pandas.DataFrame (regression) or list of pandas.DataFrame (classification)

Contains the local contributions of the prediction set, with a common line index. Columns are 'contrib_1', 'contrib_2', … and contain the top contributions for each line from left to right. In multi-class problems, this is a list of contributions, one for each class.

data[‘var_dict’]: pandas.DataFrame (regression) or list of pandas.DataFrame (classification)

Must contain only ints. It gives, for each line, the list of the most important features regarding the local decomposition. In order to save space, columns are denoted by integers, the conversion being done with the columns_dict member. In multi-class problems, this is a list of dataframes, one for each class.

data[‘x_sorted’]: pandas.DataFrame (regression) or list of pandas.DataFrame (classification)

It gives, for each line, the list of the most important feature values regarding the local decomposition. These values can only be understood with respect to data['var_dict'].

Type

dict
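To make the relationship between the three entries concrete, here is a small pure-Python sketch (hypothetical, using plain lists and dicts instead of pandas objects) that builds the three sorted structures for a single row from raw contributions:

```python
# Hypothetical single-row illustration of data['contrib_sorted'],
# data['var_dict'] and data['x_sorted']: sort features by absolute
# contribution, descending.
contributions = {"age": 0.10, "income": -0.45, "city": 0.02}   # feature -> local contribution
x_row         = {"age": 37,   "income": 52000, "city": "Lyon"}
columns_dict  = {0: "age", 1: "income", 2: "city"}             # int -> technical name
inv_columns   = {v: k for k, v in columns_dict.items()}

order = sorted(contributions, key=lambda f: abs(contributions[f]), reverse=True)

contrib_sorted = [contributions[f] for f in order]   # one line of data['contrib_sorted']
var_dict       = [inv_columns[f] for f in order]     # one line of data['var_dict'] (ints only)
x_sorted       = [x_row[f] for f in order]           # one line of data['x_sorted']

print(contrib_sorted)  # [-0.45, 0.1, 0.02]
print(var_dict)        # [1, 0, 2]
print(x_sorted)        # [52000, 37, 'Lyon']
```

Reading the three lists in parallel recovers, for this row, the top features, their integer codes and their values, which is exactly how the three entries are meant to be combined.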

backend_name

Backend name, if the backend passed is a string.

x_encoded

preprocessed dataset used by the model to perform the prediction.

Type

pandas.DataFrame

x_init

x_encoded dataset after inverse transformation, with optional postprocessing modifications applied.

Type

pandas.DataFrame

x_contrib_plot

x_encoded dataset after inverse transformation, without postprocessing; used for contribution_plot.

Type

pandas.DataFrame

y_pred

User-specified prediction values.

Type

pandas.DataFrame

contributions

local contributions aggregated if the preprocessing part requires it (e.g. one-hot encoding).

Type

pandas.DataFrame (regression) or list (classification)

features_dict

Dictionary mapping technical feature names to domain names.

Type

dict

inv_features_dict

Inverse features_dict mapping.

Type

dict
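The inverse mapping can be obtained from features_dict with a simple dict comprehension (a sketch, assuming the mapping is one-to-one; the feature names are illustrative):

```python
features_dict = {"Pclass": "Ticket class", "Sex": "Sex"}  # technical name -> domain name
inv_features_dict = {v: k for k, v in features_dict.items()}
print(inv_features_dict["Ticket class"])  # Pclass
```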

label_dict

Dictionary mapping integer labels to domain names (classification - target values).

Type

dict

inv_label_dict

Inverse label_dict mapping.

Type

dict

columns_dict

Dictionary mapping integer column number to technical feature names.

Type

dict

plot

Helper object containing all plotting functions (Bridge pattern).

Type

object

model

Model used for consistency checks; can also be used to estimate predict and predict_proba values.

Type

model object

features_desc

Dictionary referencing the number of distinct feature values in x_init.

Type

dict

features_imp

Features importance values

Type

pandas.Series (regression) or list (classification)

local_neighbors

Dictionary of values to be displayed on the local_neighbors plot. The key is "norm_shap" (normalized contribution values of the instance and its neighbors).

Type

dict

features_stability

Dictionary of arrays to be displayed on the stability plot. The keys are “amplitude” (average contributions values for selected instances) and “stability” (stability metric across neighborhood)

Type

dict

preprocessing

The preprocessing applied to the original data.

Type

category_encoders, ColumnTransformer, list or dict

postprocessing

Dictionary of postprocessing modifications to apply to the x_init dataframe.

Type

dict

y_target

Target values

Type

pandas.Series or pandas.DataFrame, optional (default: None)

Example

>>> xpl = SmartExplainer(model, features_dict=featd, label_dict=labeld)
>>> xpl.compile(x=x_encoded, y_target=y)
>>> xpl.plot.features_importance()
add(y_pred=None, y_target=None, label_dict=None, features_dict=None, title_story: Optional[str] = None, additional_data=None, additional_features_dict=None)[source]

The add method allows the user to add a label_dict, features_dict or y_pred without compiling again (which can take a few moments). y_pred can be used in plots to color scatter points and is needed by the to_pandas method. label_dict and features_dict allow displaying clearer results.

Parameters
  • y_pred (pandas.Series or pandas.DataFrame, optional (default: None)) – Prediction values (1 column only). The index must be identical to the index of x_init.

  • label_dict (dict, optional (default: None)) – Dictionary mapping integer labels to domain names.

  • features_dict (dict, optional (default: None)) – Dictionary mapping technical feature names to domain names.

  • title_story (str (default: None)) – The default title is empty. You can specify a custom title which can be used in the webapp or in other methods.

  • y_target (pandas.Series or pandas.DataFrame, optional (default: None)) – Target values (1 column only). The index must be identical to the index of x_init. This is an interesting parameter for outputs on prediction

  • additional_data (pandas.DataFrame, optional (default: None)) – Additional dataset of features outside the model. The index must be identical to the index of x_init. This is an interesting parameter for visualisation and filtering in the Shapash SmartApp.

  • additional_features_dict (dict) – Dictionary mapping technical feature names to domain names for additional data.

compile(x, contributions=None, y_pred=None, y_target=None, additional_data=None, additional_features_dict=None)[source]

The compile method is the first step to understanding the model and its predictions. It sorts contributions, reverses the preprocessing steps, and performs all the calculations necessary for a quick display of plots and an efficient summary of the explanation. This step can take a few moments with large datasets.

Parameters
  • x (pandas.DataFrame) – Prediction set. IMPORTANT: this should be the raw prediction set, whose values are seen by the end user. x is a preprocessed dataset: Shapash can apply the model to it.

  • contributions (pandas.DataFrame, np.ndarray or list) – Single or multiple contributions (multi-class) to handle. If pandas.DataFrame, the index and columns should be shared with the prediction set. If np.ndarray, index and columns are generated according to the x dataset.

  • y_pred (pandas.Series or pandas.DataFrame, optional (default: None)) – Prediction values (1 column only). The index must be identical to the index of x_init. This is an interesting parameter for more explicit outputs. Shapash lets users define their own predict, as they may wish to set their own threshold (classification)

  • y_target (pandas.Series or pandas.DataFrame, optional (default: None)) – Target values (1 column only). The index must be identical to the index of x_init. This is an interesting parameter for outputs on prediction

  • additional_data (pandas.DataFrame, optional (default: None)) – Additional dataset of features outside the model. The index must be identical to the index of x_init. This is an interesting parameter for visualisation and filtering in the Shapash SmartApp.

  • additional_features_dict (dict) – Dictionary mapping technical feature names to domain names for additional data.

Example

>>> xpl.compile(x=x_test)
filter(features_to_hide=None, threshold=None, positive=None, max_contrib=None, display_groups=None)[source]

The filter method is an important method that summarizes local explainability using user-defined parameters corresponding to the use case. The filter method is used together with the local_plot method of SmartPlotter to see the concrete result of this summary as a local contribution bar chart. Please see the local_plot tutorial for a concrete example of how these two methods are combined.

Parameters
  • features_to_hide (list, optional (default: None)) – List of strings containing features to hide.

  • threshold (float, optional (default: None)) – Absolute threshold below which any contribution is hidden.

  • positive (bool, optional (default: None)) – If True, hide negative values; if False, hide positive values; if None, hide nothing.

  • max_contrib (int, optional (default: None)) – Maximum number of contributions to show.

  • display_groups (bool (default: None)) – Whether or not to display groups of features. This option is only useful if groups of features are declared when compiling SmartExplainer object.
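The combined effect of these parameters on a single row can be sketched as follows (a hypothetical pure-Python stand-in for the per-row filtering semantics, not Shapash's implementation):

```python
def summarize_row(contribs, features_to_hide=None, threshold=None,
                  positive=None, max_contrib=None):
    """Keep (feature, contribution) pairs matching the filter parameters,
    sorted by absolute contribution, descending. Illustrative helper only."""
    hidden = set(features_to_hide or [])
    kept = []
    for feat, contrib in contribs.items():
        if feat in hidden:
            continue
        if threshold is not None and abs(contrib) < threshold:
            continue
        if positive is True and contrib < 0:      # hide negative values
            continue
        if positive is False and contrib > 0:     # hide positive values
            continue
        kept.append((feat, contrib))
    kept.sort(key=lambda fc: abs(fc[1]), reverse=True)
    return kept[:max_contrib] if max_contrib is not None else kept

row = {"age": 0.10, "income": -0.45, "city": 0.02, "job": 0.30}
print(summarize_row(row, threshold=0.05, max_contrib=2))
# [('income', -0.45), ('job', 0.3)]
```

Shapash applies the equivalent filtering to the whole prediction set at once; this stand-in only shows what each parameter does to one row.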

generate_report(output_file, project_info_file, x_train=None, y_train=None, y_test=None, title_story=None, title_description=None, metrics=None, working_dir=None, notebook_path=None, kernel_name=None)[source]

This method generates an HTML report containing different information about the project. It analyzes the data and the model used in order to provide interesting insights that can be shared in HTML format. It requires a project info yml file in which various information about the project can appear.

Parameters
  • output_file (str) – Path to the HTML file to write.

  • project_info_file (str) – Path to the file used to display some information about the project in the report.

  • x_train (pd.DataFrame, optional) – DataFrame used for training the model.

  • y_train (pd.Series or pd.DataFrame, optional) – Series of labels in the training set.

  • y_test (pd.Series or pd.DataFrame, optional) – Series of labels in the test set.

  • title_story (str, optional) – Report title.

  • title_description (str, optional) – Report title description (as written just below the title).

  • metrics (list of dict, optional) – Metrics used in the model performance section. Each dict contains the following keys: 'path' (path to the metric function, e.g. 'sklearn.metrics.mean_absolute_error'), 'name' (optional, name of the metric as displayed in the report), and 'use_proba_values' (optional, False (default) or True if the metric uses proba values instead of predicted values). For example, metrics=[{'name': 'F1 score', 'path': 'sklearn.metrics.f1_score'}]

  • working_dir (str, optional) – Working directory in which the notebook used to create the report will be generated, and where the objects used to execute it will be saved. This parameter can be useful if one wants to create a custom report and debug the notebook used to generate the HTML report. If None, a temporary directory is used.

  • notebook_path (str, optional) – Path to the notebook used to generate the report. If None, the Shapash base report notebook will be used.

  • kernel_name (str, optional) – Name of the kernel used to generate the report. This parameter can be useful if you have multiple Jupyter kernels and the method does not use the right kernel by default.

Examples

>>> xpl.generate_report(
        output_file='report.html',
        project_info_file='utils/project_info.yml',
        x_train=x_train,
        y_train=y_train,
        y_test=y_test,
        title_story="House prices project report",
        title_description="This document is a data science report of the kaggle house prices project."
        metrics=[
            {
                'path': 'sklearn.metrics.mean_squared_error',
                'name': 'Mean squared error',  # Optional : name that will be displayed next to the metric
            },
            {
                'path': 'sklearn.metrics.mean_absolute_error',
                'name': 'Mean absolute error',
            }
        ]
    )
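The 'path' entries in the metrics list are dotted import paths to metric functions; resolving such a path can be sketched with importlib. This is a hedged illustration of the mechanism, not Shapash's code, and it uses a stdlib function as a stand-in since sklearn may not be installed:

```python
import importlib

def resolve_metric(path):
    """Import a function from a dotted path such as 'sklearn.metrics.f1_score'."""
    module_name, func_name = path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), func_name)

# Stand-in example with a stdlib function instead of a sklearn metric:
sqrt = resolve_metric("math.sqrt")
print(sqrt(16.0))  # 4.0
```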
classmethod load(path)[source]

The load method allows a Shapash user to use a pickled SmartExplainer. To use this method you must first declare your SmartExplainer object. See the example below.

Parameters
  • path (str) – File path of the pickle file.

Example

>>> xpl = SmartExplainer.load('path_to_pkl/xpl.pkl')
run_app(port: Optional[int] = None, host: Optional[str] = None, title_story: Optional[str] = None, settings: Optional[dict] = None) shapash.utils.threading.CustomThread[source]

The run_app method launches the interpretability web app associated with the shapash object. It can be used directly in a Jupyter notebook; the link to the webapp is mentioned directly in the Jupyter output. Use the object.kill() method to kill the current instance. Examples are presented in the web_app tutorial (please check the tutorial part of this doc).

Parameters
  • port (int (default: None)) – The port is 8050 by default. You can specify a custom port for your webapp.

  • host (str (default: None)) – The default host is '0.0.0.0'. You can specify a custom IP address for your webapp.

  • title_story (str (default: None)) – The default title is empty. You can specify a custom title for your webapp (can be reused in other methods like in a report, …)

  • settings (dict (default: None)) – A dict describing the default webapp settings values to be used. Possible settings (dict keys) are 'rows', 'points', 'violin' and 'features'. Values should be positive ints.

Returns

Return the thread instance of your server.

Return type

CustomThread

Example

>>> app = xpl.run_app()
>>> app.kill()
save(path)[source]

The save method allows the user to save a SmartExplainer object to disk using a pickle file. It can be useful: you don't have to recompile to display results later.

Parameters
  • path (str) – File path to store the pickle file.

Example

>>> xpl.save('path_to_pkl/xpl.pkl')
to_pandas(features_to_hide=None, threshold=None, positive=None, max_contrib=None, proba=False, use_groups=None)[source]

The to_pandas method exports a summary of local explainability. It proposes a set of parameters to summarize the explainability of each point. If the user does not specify any, the to_pandas method uses the parameters specified during the last execution of the filter method. In the classification case, to_pandas summarizes the explainability corresponding to the predicted values specified by the user (with the compile or add method); the proba parameter adds the corresponding predict_proba value for each point. In the classification case, there are two ways to use this method: provide a real prediction set to explain, or focus on a constant target value and look at the proba and explainability corresponding to each point (in that case, specify a constant pd.Series with the add or compile method). Examples are presented in the local_plot tutorial (please check the tutorial part of this doc).

Parameters
  • features_to_hide (list, optional (default: None)) – List of strings containing features to hide.

  • threshold (float, optional (default: None)) – Absolute threshold below which any contribution is hidden.

  • positive (bool, optional (default: None)) – If True, hide negative values; otherwise hide positive values. If None, hide nothing.

  • max_contrib (int, optional (default: 5)) – Number of contributions to show in the pandas DataFrame.

  • proba (bool, optional (default: False)) – Add proba to the output DataFrame.

  • use_groups (bool, optional (default: None)) – Whether or not to use groups of feature contributions (only available if the features_groups parameter was not empty when calling the compile method).

Returns

  • Selected explanation of each row (classification case).

Return type

pandas.DataFrame

Examples

>>> summary_df = xpl.to_pandas(max_contrib=2,proba=True)
>>> summary_df
   pred     proba  feature_1  value_1  contribution_1  feature_2  value_2  contribution_2
0     0  0.756416        Sex      1.0        0.322308     Pclass      3.0        0.155069
1     3  0.628911        Sex      2.0        0.585475     Pclass      1.0        0.370504
2     0  0.543308        Sex      2.0       -0.486667     Pclass      3.0        0.255072
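The shape of the summary above can be sketched by flattening the top max_contrib contributions of a row into feature_k/value_k/contribution_k columns (a hypothetical pure-Python stand-in, not Shapash's implementation; the feature values are illustrative):

```python
def row_summary(contribs, values, max_contrib=2):
    """Flatten the top contributions of one row into a summary dict,
    mimicking one line of the to_pandas output. Illustrative helper only."""
    order = sorted(contribs, key=lambda f: abs(contribs[f]), reverse=True)[:max_contrib]
    out = {}
    for k, feat in enumerate(order, start=1):
        out[f"feature_{k}"] = feat
        out[f"value_{k}"] = values[feat]
        out[f"contribution_{k}"] = contribs[feat]
    return out

contribs = {"Sex": 0.322308, "Pclass": 0.155069, "Age": 0.01}
values   = {"Sex": 1.0, "Pclass": 3.0, "Age": 22.0}
print(row_summary(contribs, values))
# {'feature_1': 'Sex', 'value_1': 1.0, 'contribution_1': 0.322308,
#  'feature_2': 'Pclass', 'value_2': 3.0, 'contribution_2': 0.155069}
```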

The Plot Methods

class shapash.explainer.smart_plotter.SmartPlotter(explainer)[source]

Bases: object

SmartPlotter is a Bridge pattern decoupling plotting functions from SmartExplainer. The SmartPlotter class includes all the methods used to display graphics. Each SmartPlotter method is easy to use from a SmartExplainer object; just use the following syntax.

Attributes
  • explainer (object) – SmartExplainer instance to point to.

Example

>>> xpl.plot.my_plot_method(param=value)
compacity_plot(selection=None, max_points=2000, force=False, approx=0.9, nb_features=5, file_name=None, auto_open=False)[source]

The compacity_plot has the main objective of determining whether a small subset of features can be extracted to provide a simpler explanation of the model. Indeed, having too many features might negatively affect the model's explainability and make it harder to understand. The following two plots are proposed:

  • We identify the minimum number of required features (based on the top contribution values) that well approximate the model and thus provide accurate explanations. In particular, the prediction with the chosen subset needs to be close enough (see distance definition below) to the one obtained with all features.

  • We determine, conversely, how close we get to the output with all features by using only a subset of them.

Distance definition:

  • For regression:

\[distance = \frac{|output_{allFeatures} - output_{currentFeatures}|}{|output_{allFeatures}|}\]

  • For classification:

\[distance = |output_{allFeatures} - output_{currentFeatures}|\]
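The two distance definitions above translate directly into code; here is a minimal sketch (the function and argument names are illustrative, not part of the Shapash API):

```python
def subset_distance(output_all, output_subset, classification=False):
    """Distance between the full-feature output and the feature-subset output,
    following the two definitions above (illustrative helper)."""
    if classification:
        return abs(output_all - output_subset)
    return abs(output_all - output_subset) / abs(output_all)  # relative, regression

print(subset_distance(10.0, 9.0))                        # 0.1
print(subset_distance(0.8, 0.7, classification=True))    # ~0.1 (floating point)
```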
Parameters
  • selection (list) – List of indices, a subset of the input DataFrame used to compute the compacity statistics.

  • max_points (int, optional) – Maximum number of points to plot in the compacity plot, by default 2000.

  • force (bool, optional) – If True, force recomputation of the stability values, by default False.

  • approx (float, optional) – How close we want to be from model with all features, by default 0.9 (=90%)

  • nb_features (int, optional) – Number of features used, by default 5

  • file_name (string, optional) – Specify the save path of html files. If it is not provided, no file will be saved, by default None

  • auto_open (bool, optional) – open automatically the plot, by default False

compare_plot(index=None, row_num=None, label=None, max_features=20, width=900, height=550, show_predict=True, file_name=None, auto_open=True)[source]

Plotly comparison plot of several individuals' contributions. Plots contributions feature by feature. Allows seeing the differences in contributions between two or more individuals, with each individual represented by a unique line.

Parameters
  • index (list) – 1st option to select individual rows. List of ints referencing row indices.

  • row_num (list) – 2nd option to select individual rows. List of ints corresponding to the row numbers of individuals (starting at 0).

  • label (int or string (default: None)) – If the label is of string type, check if it can be changed to integer to select the good dataframe object.

  • max_features (int (optional, default: 20)) – Number of contributions to show. If greater than the total of features, shows all.

  • width (int (default: 900)) – Plotly figure - layout width.

  • height (int (default: 550)) – Plotly figure - layout height.

  • show_predict (boolean (default: True)) – Shows predict or predict_proba value.

  • file_name (string (optional)) – File name to use to save the plotly bar chart. If None the bar chart will not be saved.

  • auto_open (boolean (optional)) – Indicates whether to open the bar plot or not.

Returns

Comparison plot of the contributions of the different individuals.

Return type

Plotly Figure Object

Example

>>> xpl.plot.compare_plot(row_num=[0, 1, 2])
contribution_plot(col, selection=None, label=- 1, violin_maxf=10, max_points=2000, proba=True, width=900, height=600, file_name=None, auto_open=False, zoom=False)[source]

The contribution_plot method displays a Plotly scatter or violin plot of a selected feature. It represents the contribution of the selected feature to the predicted value. This plot allows the user to understand how the value of a feature affects a prediction. The type of plot (violin/scatter) is automatically selected; it depends on the feature to be analyzed, the type of use case (regression/classification) and the presence of a predicted values attribute. A sample is taken if the number of points to be displayed is too large. Using the col parameter, the user can specify the column number, name or column label of the feature. The contribution_plot tutorial offers many examples (please check the tutorial part of this doc).

Parameters
  • col (string or int) – Name, label name or column number of the column whose contributions we want to plot.

  • selection (list (optional)) – List of indices, a subset of the input DataFrame that we want to plot.

  • label (int or string (default: -1)) – If the label is of string type, check if it can be changed to integer to select the good dataframe object.

  • violin_maxf (int (optional, default: 10)) – Maximum number of modalities to plot as a violin. If the feature specified with the col argument has more modalities than violin_maxf, a scatter plot is chosen.

  • max_points (int (optional, default: 2000)) – Maximum number of points to plot in the contribution plot. If the input dataset is bigger than max_points, a sample limits the number of points to plot. NB: you can also limit the number of points using the 'selection' parameter.

  • proba (bool (optional, default: True)) – use predict_proba to color plot (classification case)

  • width (Int (default: 900)) – Plotly figure - layout width

  • height (Int (default: 600)) – Plotly figure - layout height

  • file_name (string (optional)) – File name to use to save the plotly bar chart. If None the bar chart will not be saved.

  • auto_open (Boolean (optional)) – Indicate whether to open the bar plot or not.

  • zoom (bool (default=False)) – graph is currently zoomed

Return type

Plotly Figure Object

Example

>>> xpl.plot.contribution_plot(0)
features_importance(max_features=20, selection=None, label=- 1, group_name=None, display_groups=True, force=False, width=900, height=500, file_name=None, auto_open=False, zoom=False)[source]

features_importance displays a Plotly features importance plot. In the multiclass case, features_importance focuses on a label value; the user specifies the label value using the label parameter. The selection parameter allows the user to compare a subset to the global features importance. The features_importance tutorial offers several examples (please check the tutorial part of this doc).

Parameters
  • max_features (int (optional, default: 20)) – This argument limits the number of bars in the features importance plot. If max_features is 20, the plot shows the 20 most important features.

  • selection (list (optional, default: None)) – This argument allows representing the importance calculated with a subset. Subset features importance is compared to the global importance in the plot. The argument must contain a list of indices, the subset of the input DataFrame that we want to plot.

  • label (integer or string (default -1)) – If the label is of string type, check if it can be changed to integer to select the good dataframe object.

  • group_name (str (optional, default None)) – Allows to display the features importance of the variables that are grouped together inside a group of features. This parameter is only available if the SmartExplainer object has been compiled using the features_groups optional parameter and should correspond to a key of features_groups dictionary.

  • display_groups (bool (default True)) – If groups of features are declared in SmartExplainer object, this parameter allows to specify whether or not to display them.

  • force (bool (optional, default: False)) – If True, force recomputation of features importance even if it has already been computed.

  • width (Int (default: 900)) – Plotly figure - layout width

  • height (Int (default: 500)) – Plotly figure - layout height

  • file_name (string (optional)) – File name to use to save the plotly bar chart. If None the bar chart will not be saved.

  • auto_open (Boolean (optional)) – Indicate whether to open the bar plot or not.

  • zoom (bool (default=False)) – graph is currently zoomed

Return type

Plotly Figure Object

Example

>>> xpl.plot.features_importance()
local_neighbors_plot(index, max_features=10, file_name=None, auto_open=False)[source]

The local_neighbors_plot has the main objective of increasing confidence in interpreting the contribution values of a selected instance. This plot analyzes the local neighborhood of the instance and compares its contribution values with those of its neighbors. Intuitively, for similar instances, we would expect similar contributions. The neighbors are selected as follows:

  • We select the top N neighbors of the instance (using L1 norm + variance normalization).

  • We discard neighbors whose model output is too different (see equations below) from the instance output.

  • We discard additional neighbors if their distance to the instance is bigger than a predefined value (to remove outliers).

In this neighborhood, we would expect instances to have similar SHAP values. If not, one might need to be cautious when interpreting SHAP values. The difference between outputs is measured with the following distance definition:

  • For regression:

\[distance = \frac{|output_{allFeatures} - output_{currentFeatures}|}{|output_{allFeatures}|}\]

  • For classification:

\[distance = |output_{allFeatures} - output_{currentFeatures}|\]
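The neighbor selection steps above can be sketched in pure Python: rank candidate rows by L1 distance on variance-normalized features, then discard those whose model output differs too much. This is a simplified, hypothetical stand-in (names, data and the cutoff value are illustrative, not Shapash's actual algorithm):

```python
def select_neighbors(x, outputs, idx, n_neighbors=2, max_output_dist=0.5):
    """Pick the rows closest to x[idx] by L1 norm on variance-normalized
    features, then discard rows whose output is too different."""
    n_feat = len(x[0])
    means = [sum(row[j] for row in x) / len(x) for j in range(n_feat)]
    # population variance per feature (fall back to 1.0 for constant features)
    var = [sum((row[j] - means[j]) ** 2 for row in x) / len(x) or 1.0
           for j in range(n_feat)]

    def l1(a, b):
        return sum(abs(a[j] - b[j]) / var[j] ** 0.5 for j in range(n_feat))

    candidates = sorted((i for i in range(len(x)) if i != idx),
                        key=lambda i: l1(x[i], x[idx]))
    top = candidates[:n_neighbors]
    # discard neighbors whose model output differs too much from the instance
    return [i for i in top if abs(outputs[i] - outputs[idx]) <= max_output_dist]

x = [[1.0, 10.0], [1.1, 11.0], [5.0, 50.0], [1.2, 9.0]]
outputs = [0.5, 0.55, 2.0, 1.5]
print(select_neighbors(x, outputs, idx=0))  # [1]
```

Row 3 is close in feature space but is dropped because its output (1.5) is too far from the instance's output (0.5), which is exactly the second filtering step described above.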
Parameters
  • index (int) – Index of the row of the input DataFrame used to display contribution values in the neighborhood.

  • max_features (int, optional) – Maximum number of displayed features, by default 10

  • file_name (string, optional) – Specify the save path of html files. If it is not provided, no file will be saved, by default None

  • auto_open (bool, optional) – open automatically the plot, by default False

Returns

The figure that will be displayed

Return type

fig

local_plot(index=None, row_num=None, query=None, label=None, show_masked=True, show_predict=True, display_groups=None, yaxis_max_label=12, width=900, height=550, file_name=None, auto_open=False, zoom=False)[source]

The local_plot method displays the local contributions of an individual in the dataset. The plot returned is a summary of local explainability. You can use the filter method beforehand to modify the parameters of this summary. Preprocessing is used here to make this graph more intelligible. The index, row_num or query parameter can be used to select the local explanation to display. The local_plot tutorial offers a lot of examples (please check the tutorial part of this doc).

Parameters
  • index – 1st option, to select a row whose local contribution will be displayed. Use this parameter to select a row by index.

  • row_num (int (default: None)) – 2nd option, specify the row number to select the row whose local contribution will be displayed.

  • query (string) – 3rd option: Boolean condition that must filter only one line of the prediction set before plotting.

  • label (integer or string (default None)) – If the label is of string type, check if it can be changed to integer to select the good dataframe object.

  • show_masked (bool (default: True)) – Show the sum of the contributions of the hidden variables.

  • show_predict (bool (default: True)) – show predict or predict proba value

  • yaxis_max_label (int) – Maximum number of variables to display labels on the y axis

  • display_groups (bool (default: None)) – Whether or not to display groups of features. This option is only useful if groups of features are declared when compiling SmartExplainer object.

  • width (Int (default: 900)) – Plotly figure - layout width

  • height (Int (default: 550)) – Plotly figure - layout height

  • file_name (string (optional)) – File name to use to save the plotly bar chart. If None the bar chart will not be saved.

  • auto_open (Boolean (optional)) – Indicate whether to open the bar plot or not.

  • zoom (bool (default=False)) – graph is currently zoomed

Returns

Input arrays updated with masked contributions.

Return type

Plotly Figure Object

Example

>>> xpl.plot.local_plot(row_num=0)
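The show_masked option described above can be illustrated with a minimal sketch (this is a conceptual illustration, not Shapash's internal code; feature names and contribution values are made up):

```python
# Conceptual sketch: the "hidden" bar shown when show_masked=True is simply
# the sum of the contributions filtered out of the summary.
contributions = {"age": 0.42, "income": -0.15, "city": 0.07, "tenure": -0.03}

def masked_contribution(contribs, displayed):
    """Sum the contributions of features not kept in the display."""
    return sum(v for k, v in contribs.items() if k not in displayed)

displayed = {"age", "income"}  # e.g. after keeping only the top 2 features
hidden = masked_contribution(contributions, displayed)
print(round(hidden, 2))  # 0.04
```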
stability_plot(selection=None, max_points=500, force=False, max_features=10, distribution='none', file_name=None, auto_open=False)[source]

The stability_plot has the main objective of increasing confidence in contribution values and helping determine whether we can trust an explanation. The idea behind local stability is the following: if instances are very similar, then one would expect the explanations to be similar as well. Therefore, locally stable explanations are an important factor that helps build trust around a particular explanation method.

The generated graphs can take multiple forms, but they all analyze the same two aspects: for each feature we look at Amplitude vs. Variability; in other terms, how important the feature is on average vs. how the feature impact changes in the instance neighborhood. The average importance of the feature is the average SHAP value of the feature across all considered instances.

The neighborhood is defined as follows:

  • We select the top N neighbors for each instance (using L1 norm + variance normalization)

  • We discard neighbors whose model output is too different (see equations below) from the instance output

  • We discard additional neighbors if their distance to the instance is bigger than a predefined value (to remove outliers)

The difference between outputs is measured with the following distance definition:

  • For regression:

\[distance = \frac{|output_{allFeatures} - output_{currentFeatures}|}{|output_{allFeatures}|}\]

  • For classification:

\[distance = |output_{allFeatures} - output_{currentFeatures}|\]
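The two distance definitions above can be sketched in plain Python (an illustration of the formulas, not Shapash's internal code):

```python
# Relative distance used for regression: normalize by the full-model output.
def regression_distance(output_all, output_current):
    return abs(output_all - output_current) / abs(output_all)

# Absolute distance used for classification (outputs are probabilities).
def classification_distance(output_all, output_current):
    return abs(output_all - output_current)

print(regression_distance(10.0, 9.0))  # 0.1
print(round(classification_distance(0.8, 0.7), 10))  # 0.1
```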
Parameters
  • selection (list) – List of indices, a subset of the input DataFrame used to compute stability statistics

  • max_points (int, optional) – Maximum number of points to plot, by default 500

  • force (bool, optional) – If True, force the computation of stability values, by default False

  • distribution (str, optional) – Add the distribution of variability for each feature, by default ‘none’. The other values, ‘boxplot’ and ‘violin’, specify the type of plot

  • file_name (string, optional) – Specify the save path of html files. If it is not provided, no file will be saved, by default None

  • auto_open (bool, optional) – Open the plot automatically, by default False

Returns

  • If single instance

    • plot – Normalized contribution values of instance and neighbors

  • If multiple instances

    • if distribution == “none”: Mean amplitude of each feature contribution vs. mean variability across neighbors

    • if distribution == “boxplot”: Distribution of contributions of each feature in instances neighborhoods. Graph type is box plot

    • if distribution == “violin”: Distribution of contributions of each feature in instances neighborhoods. Graph type is violin plot
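The two quantities plotted for each feature, amplitude and variability, can be sketched as follows (contribution values are made up for illustration; this is not Shapash's internal code):

```python
# Amplitude: mean |contribution| of a feature over the considered instances.
# Variability: spread of the contribution across an instance's neighbors.
from statistics import mean, stdev

neighbor_contribs = {  # feature -> contributions over instance + neighbors
    "age": [0.40, 0.42, 0.38],
    "income": [-0.10, 0.05, -0.20],
}

amplitude = {f: mean(abs(c) for c in v) for f, v in neighbor_contribs.items()}
variability = {f: stdev(v) for f, v in neighbor_contribs.items()}

print(round(amplitude["age"], 6), round(variability["age"], 6))  # 0.4 0.02
```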

top_interactions_plot(nb_top_interactions=5, selection=None, violin_maxf=10, max_points=500, width=900, height=600, file_name=None, auto_open=False)[source]

Displays a dynamic plot with the nb_top_interactions most important interactions existing between two variables. The most important interactions are determined by computing the sum of all absolute SHAP interaction values between all existing pairs of variables. A button allows selecting and displaying the corresponding feature values and their SHAP contribution values. :param nb_top_interactions: Number of top interactions to display. :type nb_top_interactions: int :param selection: List of indices, a subset of the input DataFrame that we want to plot :type selection: list (optional) :param violin_maxf: Maximum number of modalities to plot as a violin plot. If the feature specified with the col argument

has more modalities than violin_maxf, a scatter plot will be chosen

Parameters
  • max_points (int (optional, default: 500)) – Maximum number of points to plot in the contribution plot. If the input dataset is bigger than max_points, a sample limits the number of points to plot. NB: you can also limit the number using the ‘selection’ parameter.

  • width (Int (default: 900)) – Plotly figure - layout width

  • height (Int (default: 600)) – Plotly figure - layout height

  • file_name (string (optional)) – File name to use to save the plotly bar chart. If None the bar chart will not be saved.

  • auto_open (Boolean (optional)) – Indicate whether to open the bar plot or not.

Return type

go.Figure

Example

>>> xpl.plot.top_interactions_plot()
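The ranking described above (summing absolute interaction values over all pairs of variables) can be sketched in plain Python. The 3-D interaction array below (samples × features × features) is toy data, and this is an illustration of the idea rather than Shapash's internal code:

```python
from itertools import combinations

features = ["age", "income", "city"]
# interactions[s][i][j]: SHAP interaction value between features i and j, sample s
interactions = [
    [[0.0, 0.3, 0.1], [0.3, 0.0, -0.2], [0.1, -0.2, 0.0]],
    [[0.0, -0.4, 0.0], [-0.4, 0.0, 0.1], [0.0, 0.1, 0.0]],
]

def rank_pairs(interactions, features):
    """Score each pair of features by the sum of |interaction| over samples."""
    scores = {}
    for i, j in combinations(range(len(features)), 2):
        scores[(features[i], features[j])] = sum(abs(s[i][j]) for s in interactions)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_pairs(interactions, features)[0][0])  # ('age', 'income')
```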
class shapash.explainer.consistency.Consistency[source]

Bases: object

Consistency class

consistency_plot(selection=None, max_features=20)[source]

The Consistency_plot has the main objective of comparing explainability methods.

Because explainability methods are different from each other, they may not give the same explanation to the same instance. Then, which method should be selected? Answering this question is tough. This method compares methods with each other and evaluates how close the explanations are to each other. The idea behind this is pretty simple: if underlying assumptions lead to similar results, we would be more confident in using those methods. If not, careful consideration should be taken in the interpretation of the explanations.

Parameters
  • selection (list) – List of indices, a subset of the input DataFrame used to compute consistency statistics, by default None

  • max_features (int, optional) – Maximum number of displayed features, by default 20
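The closeness between explanations that consistency_plot evaluates can be sketched with a simple distance between two methods' contribution vectors for the same instance (method names and values are illustrative, not Shapash internals):

```python
import math

# Hypothetical contributions for one instance from two explainability methods.
shap_contribs = [0.40, -0.10, 0.05]
lime_contribs = [0.35, -0.12, 0.10]

def l2_distance(a, b):
    """Euclidean distance between two contribution vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# A small distance suggests the two methods broadly agree on this instance.
print(round(l2_distance(shap_contribs, lime_contribs), 3))  # 0.073
```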