SmartExplainer Object
SmartExplainer declaration and data preparation
- class shapash.explainer.smart_explainer.SmartExplainer(model, backend='shap', preprocessing=None, postprocessing=None, features_groups=None, features_dict=None, label_dict=None, title_story: Optional[str] = None, palette_name=None, colors_dict=None, **backend_kwargs)
Bases: object
The main class of the Shapash library, designed to make machine learning model results more interpretable and understandable.
SmartExplainer links together the model, encoders, datasets, predictions, and label dictionaries. It provides a variety of methods to visualize and analyze model explanations both in notebooks and in the Shapash WebApp.
- Parameters
model (object) – The model to be explained. Used for consistency checks and, in some cases, to compute predict and predict_proba values.
backend (str or shapash.backend.BaseBackend, default='shap') – Defines the backend used to compute feature contributions and importances. Options: - ‘shap’: use SHAP as backend. - ‘lime’: use LIME as backend. You can also pass a custom backend class that inherits from shapash.backend.BaseBackend.
preprocessing (category_encoders, ColumnTransformer, list, dict, optional (default: None)) – Preprocessing applied to the raw input data. Several forms are accepted: - A single category_encoders encoder (OrdinalEncoder/OneHotEncoder/BaseNEncoder/BinaryEncoder/TargetEncoder) - A single ColumnTransformer with scikit-learn encoders or category_encoders transformers - A list of several category_encoders encoders, optionally combined with a dict or a list of dicts - A list with a single ColumnTransformer, optionally combined with a dict or a list of dicts - A dict - A list of dicts
postprocessing (dict, optional (default: None)) –
Dictionary of postprocessing modifications to apply to the x_init DataFrame, feature by feature. Keys are feature names (or column numbers, or labels that reference feature names); each value describes the modification applied to that feature. Five types of modification are available: 'prefix', 'suffix', 'transcoding', 'regex', and 'case'. Only one transformation per feature is possible. The syntax is:
>>> {
...     'feature1': {'type': 'prefix', 'rule': 'age: '},
...     'feature2': {'type': 'suffix', 'rule': '$/week '},
...     'feature3': {'type': 'transcoding', 'rule': {'code1': 'single', 'code2': 'married'}},
...     'feature4': {'type': 'regex', 'rule': {'in': 'AND', 'out': ' & '}},
...     'feature5': {'type': 'case', 'rule': 'lower'},
... }
features_groups (dict, optional (default: None)) –
Groups of features to be aggregated together in plots and importance computations. Each key defines a group name, and its value is a list of feature names. This option computes and displays the contributions and importance of each group; the features inside a group are still displayed in the WebApp when you click on that group.
Example:
>>> {
...     'feature_group_1': ['feature3', 'feature7', 'feature24'],
...     'feature_group_2': ['feature1', 'feature12'],
... }
features_dict (dict, optional) – Mapping from technical feature names to domain-specific (readable) names.
label_dict (dict, optional) – Mapping from numeric labels to human-readable class names (for classification tasks).
title_story (str, optional) – Custom title used in visualizations and reports. Default is empty.
palette_name (str, optional) – Name of the color palette used for visualizations (see the style folder for options).
colors_dict (dict, optional) – Dictionary containing the full color palette configuration. Can be used to override default plot colors.
**backend_kwargs (dict) – Additional keyword arguments passed to the backend.
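The five postprocessing rule types can be illustrated with a small stand-alone function. This is a minimal sketch, not Shapash's implementation: `apply_postprocessing_rule` is a hypothetical helper that transforms a single displayed value, and both the handling of codes missing from a transcoding mapping and the 'upper' case rule are assumptions.

```python
import re


def apply_postprocessing_rule(value, spec):
    """Apply one postprocessing rule to one displayed value (hypothetical helper)."""
    kind, rule = spec["type"], spec["rule"]
    if kind == "prefix":
        return f"{rule}{value}"
    if kind == "suffix":
        return f"{value}{rule}"
    if kind == "transcoding":
        # Assumption: codes missing from the mapping are left unchanged
        return rule.get(value, value)
    if kind == "regex":
        # Replace every match of the 'in' pattern with the 'out' string
        return re.sub(rule["in"], rule["out"], str(value))
    if kind == "case":
        # 'lower' comes from the documented example; 'upper' is an assumption
        if rule == "lower":
            return str(value).lower()
        if rule == "upper":
            return str(value).upper()
        raise ValueError(f"unknown case rule: {rule!r}")
    raise ValueError(f"unknown rule type: {kind!r}")


postprocessing = {
    "feature1": {"type": "prefix", "rule": "age: "},
    "feature3": {"type": "transcoding", "rule": {"code1": "single", "code2": "married"}},
    "feature4": {"type": "regex", "rule": {"in": "AND", "out": " & "}},
}
row = {"feature1": 42, "feature3": "code2", "feature4": "tomatoANDbasil", "feature9": 7}

# Features without a rule are displayed unchanged
displayed = {
    name: apply_postprocessing_rule(value, postprocessing[name]) if name in postprocessing else value
    for name, value in row.items()
}
print(displayed)
```

Because each feature maps to a single rule dictionary, the "one transformation per feature" constraint falls out of the data structure itself.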
- data
Data dictionary with 3 entries. Each key returns a pd.DataFrame (regression) or a list of pd.DataFrame (classification, with one DataFrame per label). All DataFrames have the same shape (n_samples, n_features). In the regression case, data should be regarded as a single array of shape (n_samples, n_features, 3).
- data[‘contrib_sorted’]: pandas.DataFrame (regression) or list of pandas.DataFrame (classification)
Contains the local contributions of the prediction set, with a common row index. Columns are ‘contrib_1’, ‘contrib_2’, … and contain the top contributions for each row, from left to right. In multi-class problems, this is a list of DataFrames, one for each class.
- data[‘var_dict’]: pandas.DataFrame (regression) or list of pandas.DataFrame (classification)
Must contain only ints. It gives, for each row, the list of the most important features with respect to the local decomposition. To save space, columns are denoted by integers; the conversion is done with the columns_dict member. In multi-class problems, this is a list of DataFrames, one for each class.
- data[‘x_sorted’]: pandas.DataFrame (regression) or list of pandas.DataFrame (classification)
It gives, for each row, the values of the most important features with respect to the local decomposition. These values can only be understood with respect to data[‘var_dict’].
- Type
dict
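To make the relationship between the three data entries concrete, the sketch below builds them for a single regression row. It assumes that features are ranked by absolute contribution (largest first) and uses plain lists instead of DataFrames; `rank_row` is a hypothetical helper, not part of Shapash.

```python
def rank_row(contribs, values, columns_dict):
    """Build one row of contrib_sorted, var_dict and x_sorted (hypothetical helper).

    contribs and values are lists aligned with the model's feature order.
    Assumption: features are ranked by absolute contribution, largest first.
    """
    order = sorted(range(len(contribs)), key=lambda i: abs(contribs[i]), reverse=True)
    contrib_sorted = [contribs[i] for i in order]  # like a row of data['contrib_sorted']
    var_dict = order                               # like a row of data['var_dict'] (ints only)
    x_sorted = [values[i] for i in order]          # like a row of data['x_sorted']
    # x_sorted and contrib_sorted only make sense once var_dict is decoded
    names = [columns_dict[i] for i in var_dict]
    return contrib_sorted, var_dict, x_sorted, names


columns_dict = {0: "Age", 1: "Fare", 2: "Sex"}
contrib_sorted, var_dict, x_sorted, names = rank_row(
    contribs=[0.10, -0.40, 0.25], values=[29, 7.25, "male"], columns_dict=columns_dict
)
print(list(zip(names, x_sorted, contrib_sorted)))
```

Storing integers in var_dict and decoding them through columns_dict is what keeps the three arrays the same shape while avoiding repeated feature-name strings.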
- backend_name
Name of the backend if specified as a string.
- Type
str
- x_encoded
Preprocessed dataset used by the model.
- Type
pandas.DataFrame
- x_init
Inverse-transformed dataset (after preprocessing) with optional postprocessing.
- Type
pandas.DataFrame
- x_contrib_plot
Inverse-transformed dataset without postprocessing (used for plots).
- Type
pandas.DataFrame
- y_pred
Model predictions.
- Type
pandas.DataFrame
- contributions
Local feature contributions. Aggregated if preprocessing expands features (e.g., one-hot encoding).
- Type
pandas.DataFrame or list
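When one-hot encoding expands a feature into several columns, the per-column contributions have to be folded back onto the original feature. The sketch below assumes this is done by summing them, which is consistent with the additivity of SHAP-style contributions; `aggregate_contributions` and the column mapping are hypothetical. The same summation is also one way to picture how a features_groups entry yields a single group contribution.

```python
def aggregate_contributions(encoded_contribs, encoded_to_original):
    """Sum the contributions of encoded columns per original feature (hypothetical)."""
    aggregated = {}
    for column, value in encoded_contribs.items():
        # Columns absent from the mapping were not expanded by the encoder
        original = encoded_to_original.get(column, column)
        aggregated[original] = aggregated.get(original, 0.0) + value
    return aggregated


encoded_contribs = {
    "Age": 0.12,
    "Sex_male": -0.30,
    "Sex_female": 0.05,
    "Embarked_S": 0.02,
    "Embarked_C": -0.01,
}
mapping = {
    "Sex_male": "Sex",
    "Sex_female": "Sex",
    "Embarked_S": "Embarked",
    "Embarked_C": "Embarked",
}
agg = aggregate_contributions(encoded_contribs, mapping)
print({name: round(value, 4) for name, value in agg.items()})
```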
- features_dict
Mapping from technical feature names to domain names.
- Type
dict
- inv_features_dict
Reverse mapping of features_dict.
- Type
dict
- label_dict
Mapping from numeric labels to class names.
- Type
dict
- inv_label_dict
Reverse mapping of label_dict.
- Type
dict
- columns_dict
Mapping from feature index to technical feature name.
- Type
dict
- plot
Object providing access to all plotting functions.
- Type
shapash.explainer.smart_plotter.SmartPlotter
- model
The model being explained.
- Type
object
- features_desc
Number of unique values per feature in x_init.
- Type
dict
- features_imp
Computed feature importance values.
- Type
pandas.Series or list
- local_neighbors
Data displayed in local neighbor plots (normalized SHAP values, etc.).
- Type
dict
- features_stability
Data used for stability plots, including: - ‘amplitude’: average contribution values for selected instances. - ‘stability’: metric assessing stability across neighborhoods.
- Type
dict
- preprocessing
Preprocessing transformations applied to raw input data.
- Type
category_encoders object, ColumnTransformer, list, or dict
- postprocessing
Postprocessing rules applied after inverse preprocessing.
- Type
dict
- y_target
True target values.
- Type
pandas.Series or pandas.DataFrame, optional
Example
>>> xpl = SmartExplainer(model, features_dict=featd, label_dict=labeld)
>>> xpl.compile(x=x_encoded, y_target=y)
>>> xpl.plot.features_importance()
- add(y_pred=None, proba_values=None, y_target=None, label_dict=None, features_dict=None, title_story: Optional[str] = None, columns_order=None, additional_data=None, additional_features_dict=None)
Add or update metadata and outputs without recompiling the explainer.
The add method lets users attach or modify supplementary information such as predictions, label or feature dictionaries, and display options without rerunning the full compile() process (which can be time-consuming for large datasets).
It can be used to: - Add or update y_pred (used to color plots or export results). - Add or update label_dict and features_dict for clearer labels in visualizations. - Include additional data or adjust column display order in the WebApp.
- Parameters
y_pred (pandas.Series or pandas.DataFrame, optional) – Model predictions (one column only). Must have the same index as x_init. Used in plots (e.g., to color scatter plots) and in export methods like to_pandas().
proba_values (pandas.Series or pandas.DataFrame, optional) – Prediction probabilities (one column only). Must have the same index as x_init. Useful for visualizations or probabilistic outputs.
y_target (pandas.Series or pandas.DataFrame, optional) – True target values (one column only). Must have the same index as x_init. Used for comparison and performance-oriented visualizations.
label_dict (dict, optional) – Mapping of integer labels to domain names (for classification targets). Enables clearer class naming in plots and tables.
features_dict (dict, optional) – Mapping of technical feature names to human-readable (domain) names. Improves interpretability of plots and exported data.
title_story (str, optional) – Custom title for reports or visualizations. Default is empty.
columns_order (list or str, optional) – Defines the display order of columns in the dataset. - If a list is provided, it specifies the exact order of columns. Columns not included will be appended automatically. - If set to ‘additional_data_first’, additional columns appear first. - If set to ‘additional_data_last’, additional columns appear last. Especially useful for controlling display order in the Shapash SmartApp.
additional_data (pandas.DataFrame, optional) – Extra dataset containing features outside the model. Must have the same index as x_init. Useful for filtering and enrichment in the Shapash WebApp.
additional_features_dict (dict, optional) – Dictionary mapping technical feature names to human-readable names for columns in additional_data.
Example
>>> # Add predictions and friendly feature names after compiling
>>> xpl.add(y_pred=preds, features_dict=feat_dict)
>>> xpl.plot.local_plot(index=5)
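The columns_order options can be summarized with a small ordering function. This is a sketch under stated assumptions: `order_columns` is hypothetical, it works on plain lists of column names, and the behaviour when columns_order is None (model columns first, then additional columns) is assumed rather than documented.

```python
def order_columns(model_columns, additional_columns, columns_order=None):
    """Return the display order of columns for the documented options (hypothetical)."""
    model_columns, additional_columns = list(model_columns), list(additional_columns)
    if columns_order is None:
        # Assumption: without an explicit order, model columns come first
        return model_columns + additional_columns
    if columns_order == "additional_data_first":
        return additional_columns + model_columns
    if columns_order == "additional_data_last":
        return model_columns + additional_columns
    if isinstance(columns_order, list):
        # Listed columns first, in the given order; the others are appended
        remaining = [c for c in model_columns + additional_columns if c not in columns_order]
        return list(columns_order) + remaining
    raise ValueError(f"unsupported columns_order: {columns_order!r}")


model_cols, extra_cols = ["Age", "Fare", "Sex"], ["City", "Customer_id"]
print(order_columns(model_cols, extra_cols, "additional_data_first"))
print(order_columns(model_cols, extra_cols, ["Sex", "City"]))
```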
- compile(x, contributions=None, y_pred=None, proba_values=None, y_target=None, columns_order=None, additional_data=None, additional_features_dict=None)
Prepare and structure all data needed for interpreting the model and its predictions.
The compile method is the first essential step to make your model explainable with Shapash. It organizes the model’s inputs, outputs, and contributions into a consistent format, applies inverse preprocessing, and computes all elements required for visualization and summaries.
Depending on dataset size and backend, this step may take some time.
- Parameters
x (pandas.DataFrame) – Prediction dataset: the preprocessed (encoded) data the model is applied to, as stored in x_encoded. Shapash inverts the preprocessing to recover the values seen by the end user (x_init), and uses this dataset to compute and align explanations.
contributions (pandas.DataFrame, numpy.ndarray, or list, optional) – Local feature contributions for each sample. - If a DataFrame, its index and columns must match those of x. - If a numpy.ndarray, Shapash will automatically generate the corresponding index and column names based on x. - In multi-class settings, provide a list of contributions (one per class).
y_pred (pandas.Series or pandas.DataFrame, optional) – Model predictions. Must have the same index as x_init. Useful for customizing predicted values, for example when applying a custom threshold in classification tasks.
proba_values (pandas.Series or pandas.DataFrame, optional) – Prediction probabilities. Must have the same index as x_init. Useful for visualizations and for comparing probabilities across classes.
y_target (pandas.Series or pandas.DataFrame, optional) – True target values used for comparison or performance display. Must have the same index as x_init.
columns_order (list or str, optional) – Defines the display order of columns in the dataset. - If a list is provided, it specifies the exact order of columns. Any columns not included in the list will be added automatically. - If set to ‘additional_data_first’, all additional columns are placed first. - If set to ‘additional_data_last’, all additional columns are placed last. This option helps control column order in the Shapash WebApp and SmartApp.
additional_data (pandas.DataFrame, optional) – Additional features not used by the model but relevant for visualization or filtering in the WebApp. Must have the same index as x_init.
additional_features_dict (dict, optional) – Mapping of additional feature names (technical names) to user-friendly domain names, used to improve readability in plots and dashboards.
Example
>>> xpl.compile(x=x_test)
>>> xpl.plot.features_importance()
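The y_pred parameter of compile() is described as a way to apply a custom classification threshold. A minimal sketch of deriving such predictions from class-1 probabilities is shown below; `predict_with_threshold` is hypothetical, and plain dicts keyed by row label stand in for the pandas.Series that compile() expects (the keys play the role of the index that must match x_init).

```python
def predict_with_threshold(proba_class1, threshold=0.5):
    """Map class-1 probabilities to 0/1 labels, keeping the row keys (hypothetical)."""
    return {idx: int(p >= threshold) for idx, p in proba_class1.items()}


# Keys stand in for the index shared with x_init
proba = {101: 0.12, 102: 0.35, 103: 0.81}
y_pred = predict_with_threshold(proba, threshold=0.3)
print(y_pred)
```

With pandas, an expression such as `(proba_series >= 0.3).astype(int)` produces the same labels while keeping the original index, so the result has the shape compile() asks for.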
- filter(features_to_hide=None, threshold=None, positive=None, max_contrib=None, display_groups=None)
Apply filtering rules to summarize local explainability results.
The filter method allows users to control which feature contributions are displayed or hidden when visualizing local explanations. It is typically used in combination with the local_plot method of SmartPlotter to display a filtered local contribution bar chart.
For detailed examples, see the Local Plot tutorial in the Shapash documentation.
- Parameters
features_to_hide (list of str, optional) – List of feature names to hide from the visualization. These can be individual feature names or group names if display_groups=True.
threshold (float, optional) – Absolute value threshold below which contributions are hidden. For example, setting threshold=0.01 hides all features with contribution magnitudes smaller than 0.01.
positive (bool, optional) – Defines whether to hide contributions by sign: - If True, hides negative contributions. - If False, hides positive contributions. - If None (default), all contributions are displayed.
max_contrib (int, optional) – Maximum number of contributions to display. Only the top max_contrib features (by absolute contribution) will be shown.
display_groups (bool, optional) – If True, feature groups defined in features_groups are displayed and filtered together. If False, only individual features are considered. By default, this is automatically set to True if feature groups are defined.
Notes
The filtering configuration is stored in self.mask_params.
The resulting filtered contributions are available in self.masked_contributions.
Example
>>> # Hide specific features and small contributions
>>> xpl.filter(features_to_hide=['Age', 'Gender'], threshold=0.01, max_contrib=10)
>>> xpl.plot.local_plot(index=5)
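The effect of the filter parameters on one row of contributions can be sketched as follows. `filter_row` is hypothetical, and the order in which the rules are applied (hidden features, then threshold, then sign, then the top max_contrib by absolute value) is an assumption made for illustration.

```python
def filter_row(contribs, features_to_hide=None, threshold=None, positive=None, max_contrib=None):
    """Apply filter()-like rules to one row of contributions (hypothetical helper).

    contribs maps feature name -> contribution. The rule order is an assumption.
    """
    hidden = set(features_to_hide or [])
    kept = {name: value for name, value in contribs.items() if name not in hidden}
    if threshold is not None:
        # Hide contributions whose magnitude is below the threshold
        kept = {k: v for k, v in kept.items() if abs(v) >= threshold}
    if positive is True:
        # Hide negative contributions
        kept = {k: v for k, v in kept.items() if v >= 0}
    elif positive is False:
        # Hide positive contributions
        kept = {k: v for k, v in kept.items() if v <= 0}
    # Keep the largest contributions by absolute value
    ranked = sorted(kept.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked if max_contrib is None else ranked[:max_contrib]


row = {"Age": 0.30, "Gender": 0.50, "Fare": -0.20, "Pclass": 0.005, "Embarked": 0.04}
print(filter_row(row, features_to_hide=["Age", "Gender"], threshold=0.01, max_contrib=10))
print(filter_row(row, positive=True, max_contrib=2))
```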
- generate_report(output_file, project_info_file, x_train=None, y_train=None, y_test=None, title_story=None, title_description=None, metrics=None, working_dir=None, notebook_path=None, kernel_name=None, max_points=200, display_interaction_plot=False, nb_top_interactions=5)
Generate an interactive HTML report summarizing the model and its explainability.
This method produces a comprehensive HTML report containing visual and textual insights about the project, dataset, and model performance. It leverages a predefined or custom Jupyter notebook template to analyze the model, generate plots, compute metrics, and export the final report.
A project information YAML file is required to describe key project details (e.g., model name, author, date, context).
- Parameters
output_file (str) – Path to the output HTML file where the report will be saved.
project_info_file (str) – Path to a YAML file containing project metadata to be displayed in the report (e.g., project name, author, date, description).
x_train (pandas.DataFrame, optional) – Training dataset used to fit the model. Used for generating feature summaries and training-related analyses.
y_train (pandas.Series or pandas.DataFrame, optional) – Target values corresponding to x_train.
y_test (pandas.Series or pandas.DataFrame, optional) – Target values for the test dataset.
title_story (str, optional) – Title displayed at the top of the report.
title_description (str, optional) – Short descriptive text displayed below the main title.
metrics (list of dict, optional) – List of metrics to compute and display in the performance section. Each dictionary should include: - 'path': str, the import path of the metric function (e.g., "sklearn.metrics.f1_score") - 'name': str, optional, the display name for the metric - 'use_proba_values': bool, optional; if True, predicted probabilities are used instead of labels. Example: metrics=[{'name': 'F1 score', 'path': 'sklearn.metrics.f1_score'}]
working_dir (str, optional) – Directory used to temporarily store generated files (e.g., notebook, outputs). If None, a temporary directory is automatically created and deleted after report generation.
notebook_path (str, optional) – Path to a custom notebook used as a template for generating the report. If None, the default Shapash report notebook is used.
kernel_name (str, optional) – Name of the Jupyter kernel to use for report execution. Useful when multiple kernels are available and the default one is incorrect.
max_points (int, optional, default=200) – Maximum number of points displayed in contribution plots.
display_interaction_plot (bool, optional, default=False) – If True, includes interaction plots in the report. (Note: this can increase computation time.)
nb_top_interactions (int, optional, default=5) – Number of top feature interactions to include in the report.
- Returns
The report is saved as an HTML file at the specified output_file location.
- Return type
None
- Raises
AssertionError – If the SmartExplainer instance is not compiled before report generation.
Exception – If an unexpected error occurs during report execution or export.
Notes
The method internally executes a notebook that generates the report content.
Temporary files are automatically cleaned up unless a custom working_dir is provided.
Interaction plots can be disabled to optimize runtime performance.
Example
>>> xpl.generate_report(
...     output_file="report.html",
...     project_info_file="utils/project_info.yml",
...     x_train=x_train,
...     y_train=y_train,
...     y_test=y_test,
...     title_story="House Prices Project Report",
...     title_description="Comprehensive interpretability analysis for the Kaggle house prices dataset.",
...     metrics=[
...         {"path": "sklearn.metrics.mean_squared_error", "name": "Mean Squared Error"},
...         {"path": "sklearn.metrics.mean_absolute_error", "name": "Mean Absolute Error"},
...     ],
...     display_interaction_plot=True,
...     nb_top_interactions=5,
... )
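Each metrics entry identifies the metric function by its dotted import path. The sketch below shows one way such a path can be resolved with the standard library; `resolve_metric` is hypothetical, and `math.dist` is used only so the example runs without scikit-learn installed.

```python
import importlib


def resolve_metric(entry):
    """Turn {'path': 'pkg.module.func', 'name': ...} into (display_name, callable)."""
    module_path, _, func_name = entry["path"].rpartition(".")
    func = getattr(importlib.import_module(module_path), func_name)
    # Fall back to the function name when no display name is given
    return entry.get("name", func_name), func


name, metric = resolve_metric({"path": "math.dist", "name": "Euclidean distance"})
print(name, metric([3.0, 0.0], [0.0, 4.0]))  # called like metric(y_true, y_pred)
```

The same function would resolve "sklearn.metrics.f1_score" to scikit-learn's f1_score when that package is available.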
- classmethod load(path)
Load a previously saved SmartExplainer object from a pickle file.
This class method restores a SmartExplainer instance that was saved using the save method. It allows users to quickly reload a compiled explainer without repeating the full preprocessing and explanation steps.
- Parameters
path (str) – File path to the pickle file containing the saved SmartExplainer object.
- Returns
A reloaded SmartExplainer instance identical to the one saved on disk.
- Return type
SmartExplainer
- Raises
ValueError – If the provided file does not contain a valid SmartExplainer object.
Example
>>> xpl = SmartExplainer.load("path_to_file/xpl.pkl")
>>> xpl.plot.features_importance()
- run_app(port: Optional[int] = None, host: Optional[str] = None, title_story: Optional[str] = None, settings: Optional[dict] = None) → shapash.utils.custom_thread.CustomThread
Launch the Shapash interpretability WebApp associated with this SmartExplainer.
This method starts the interactive Shapash WebApp that enables users to explore model predictions, feature importances, and local explanations directly in their browser. It can be called directly from a Jupyter notebook — the application link will appear in the notebook output.
To stop the running app, use the .kill() method on the returned object.
Examples of usage are provided in the WebApp tutorial in the Shapash documentation.
- Parameters
port (int, optional) – Port number for the WebApp server. Defaults to 8050 if not specified.
host (str, optional) – Host address for the WebApp server. Defaults to “0.0.0.0”, allowing external access.
title_story (str, optional) – Custom title to display in the WebApp interface. This title can also be reused in reports or other visualizations.
settings (dict, optional) – Dictionary specifying default configuration values for the WebApp. Possible keys include: - ‘rows’ : int — number of rows displayed by default - ‘points’ : int — number of points in scatter plots - ‘violin’ : int — number of points in violin plots - ‘features’ : int — number of features shown in graphs All values must be positive integers.
- Returns
A thread instance running the WebApp server.
- Return type
CustomThread
- Raises
ValueError – If the SmartExplainer has not been compiled before launching the app.
Example
>>> # Launch the WebApp in a Jupyter notebook
>>> app = xpl.run_app(port=8050)
>>> # Stop the app
>>> app.kill()
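Because all settings values must be positive integers, it can help to validate the dictionary before launching the app. `validate_settings` is a hypothetical helper; rejecting keys other than the four documented ones is an assumption of this sketch, not documented Shapash behaviour.

```python
ALLOWED_SETTINGS = {"rows", "points", "violin", "features"}


def validate_settings(settings):
    """Check a run_app settings dict before launching the app (hypothetical helper)."""
    # Assumption: keys outside the four documented ones are rejected
    unknown = set(settings) - ALLOWED_SETTINGS
    if unknown:
        raise ValueError(f"unknown settings keys: {sorted(unknown)}")
    for key, value in settings.items():
        # bool is a subclass of int, so it is rejected explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{key!r} must be a positive integer, got {value!r}")
    return settings


print(validate_settings({"rows": 500, "points": 1000, "features": 15}))
```

For example: `app = xpl.run_app(port=8050, settings=validate_settings({'rows': 500, 'features': 15}))`.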
- save(path)
Save the SmartExplainer object to disk as a pickle file.
This method serializes the current SmartExplainer instance and saves it to a .pkl file. It allows users to reload an explainer later without recompiling, which is especially useful for large datasets or models.
- Parameters
path (str) – Destination file path where the pickle file will be saved.
Notes
The smartapp attribute is removed before saving to avoid serialization issues.
The saved object can be reloaded using the load method.
Example
>>> xpl.save("path_to_file/xpl.pkl")
>>> xpl_loaded = SmartExplainer.load("path_to_file/xpl.pkl")
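The save notes mention that the smartapp attribute is removed before pickling, and load raises a ValueError for invalid files. The sketch below illustrates both ideas with the standard pickle module on a hypothetical `ToyExplainer` class (not SmartExplainer). For simplicity, and so it works wherever the class is defined, it pickles a tagged copy of the instance's attributes rather than the object itself.

```python
import os
import pickle
import tempfile
import threading


class ToyExplainer:
    """Hypothetical stand-in for an explainer; not the SmartExplainer class."""

    def __init__(self):
        self.features_imp = {"Age": 0.6, "Fare": 0.4}
        # A lock cannot be pickled, much like a running app object
        self.smartapp = threading.Lock()

    def save(self, path):
        # Copy the attributes and drop the one that cannot be pickled
        state = dict(self.__dict__)
        state.pop("smartapp", None)
        with open(path, "wb") as f:
            pickle.dump({"kind": "ToyExplainer", "state": state}, f)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            payload = pickle.load(f)
        if not isinstance(payload, dict) or payload.get("kind") != "ToyExplainer":
            raise ValueError("file does not contain a valid ToyExplainer object")
        obj = cls.__new__(cls)  # skip __init__ and restore the saved attributes
        obj.__dict__.update(payload["state"])
        return obj


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xpl.pkl")
    ToyExplainer().save(path)
    restored = ToyExplainer.load(path)
print(restored.features_imp, hasattr(restored, "smartapp"))
```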
- to_pandas(features_to_hide=None, threshold=None, positive=None, max_contrib=None, proba=False, use_groups=None)
Export a summarized view of local explainability results as a pandas DataFrame.
The to_pandas method summarizes the local contributions of each feature for every sample, returning a DataFrame that combines predictions, probabilities (if applicable), and the top feature contributions.
If no filtering parameters are provided, the method automatically reuses the configuration from the most recent call to the filter method.
In classification tasks, this summary corresponds to the predicted values specified by the user (using either compile() or add()). You can also choose to include prediction probabilities using the proba parameter.
There are two main usage modes in classification: 1. Provide a real prediction set to explain. 2. Focus on a constant target value and analyze its explainability and associated probabilities (using a constant pd.Series passed during compile() or add()).
See the Local Plot tutorial for detailed examples.
- Parameters
features_to_hide (list of str, optional) – List of feature names to hide from the output summary.
threshold (float, optional) – Absolute value threshold below which feature contributions are hidden.
positive (bool, optional) – Determines which contribution signs to hide: - True: hide negative values. - False: hide positive values. - None (default): show all contributions.
max_contrib (int, optional) – Maximum number of top feature contributions to include for each sample. Default is 5.
proba (bool, optional) – If True, adds predicted probability values to the output DataFrame. Default is False.
use_groups (bool, optional) – If True, aggregates feature contributions by groups defined in features_groups (if available). Default automatically activates grouping if features_groups were defined during compile().
- Returns
A DataFrame summarizing local explanations for each sample. Columns typically include: - Predicted class or value (pred) - Probability (proba, if proba=True) - Top N feature names, values, and corresponding contributions
- Return type
pandas.DataFrame
- Raises
ValueError – If predictions (y_pred) are missing. Use compile() or add() before calling this method.
Example
>>> # Export a summary of local explanations with probabilities
>>> summary_df = xpl.to_pandas(max_contrib=2, proba=True)
>>> summary_df.head()
   pred     proba  feature_1  value_1  contribution_1  feature_2  value_2  contribution_2
0     0  0.756416        Sex      1.0        0.322308     Pclass      3.0        0.155069
1     3  0.628911        Sex      2.0        0.585475     Pclass      1.0        0.370504
2     0  0.543308        Sex      2.0       -0.486667     Pclass      3.0        0.255072
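The column layout of the to_pandas() output (pred, optional proba, then feature_i, value_i, contribution_i) can be reproduced for one sample with plain dicts. `summary_row` is hypothetical, and ranking the top features by absolute contribution is an assumption.

```python
def summary_row(pred, contribs, values, max_contrib=5, proba=None):
    """Lay out one to_pandas()-style row as a dict (hypothetical helper)."""
    row = {"pred": pred}
    if proba is not None:
        row["proba"] = proba
    # Assumption: the top features are those with the largest absolute contribution
    ranked = sorted(contribs, key=lambda name: abs(contribs[name]), reverse=True)
    for i, name in enumerate(ranked[:max_contrib], start=1):
        row[f"feature_{i}"] = name
        row[f"value_{i}"] = values[name]
        row[f"contribution_{i}"] = contribs[name]
    return row


row = summary_row(
    pred=0,
    contribs={"Sex": 0.32, "Pclass": 0.16, "Age": -0.05},
    values={"Sex": 1.0, "Pclass": 3.0, "Age": 22.0},
    max_contrib=2,
    proba=0.76,
)
print(list(row))
```

A list of such dicts passed to pandas.DataFrame would yield a table with the same columns as the example above.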
The Plot Methods
- class shapash.explainer.smart_plotter.SmartPlotter(explainer, colors_dict=None)
Bases: object
SmartPlotter implements a Bridge pattern that decouples the plotting functions from SmartExplainer. The SmartPlotter class includes all the methods used to display graphics. Each SmartPlotter method is called from a SmartExplainer object using the syntax in the example below.
Attributes
explainer (object) – SmartExplainer instance to point to.
Example
>>> xpl.plot.my_plot_method(param=value)
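The xpl.plot.my_plot_method(...) syntax relies on composition: the explainer holds a plotter object, and the plotter keeps a reference back to the explainer whose data it draws. The hypothetical `MiniExplainer`/`MiniPlotter` pair below sketches only that delegation; it returns text bars instead of a Plotly figure and is not Shapash's code.

```python
class MiniPlotter:
    """Hypothetical plotter that draws from the explainer it points to."""

    def __init__(self, explainer):
        self._explainer = explainer

    def features_importance(self, max_features=20):
        imp = self._explainer.features_imp
        top = sorted(imp.items(), key=lambda kv: kv[1], reverse=True)[:max_features]
        # A real plotter would build a figure; this sketch returns text bars
        return [f"{name:<8}{'#' * round(value * 20)}" for name, value in top]


class MiniExplainer:
    """Hypothetical explainer exposing its plotter as the `plot` attribute."""

    def __init__(self, features_imp):
        self.features_imp = features_imp
        self.plot = MiniPlotter(self)


xpl = MiniExplainer({"Age": 0.5, "Fare": 0.3, "Sex": 0.2})
for line in xpl.plot.features_importance(max_features=2):
    print(line)
```

Keeping the drawing code in a separate class lets plotting methods evolve without touching the explainer's data-preparation logic, which is the decoupling the description above refers to.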
- compacity_plot(selection=None, max_points=2000, force=False, approx=0.9, nb_features=5, file_name=None, auto_open=False, height=600, width=900)
The compacity_plot aims to determine whether a small subset of features can be extracted to provide a simpler explanation of the model. Indeed, having too many features may hurt the model's explainability and make it harder to understand. Two plots are proposed:
* The minimum number of features (based on the top contribution values) required to approximate the model well, and thus to provide accurate explanations. In particular, the prediction obtained with the chosen subset must be close enough (see the distance definition below) to the one obtained with all features.
* Conversely, how close the output gets to the output with all features when only a subset of them is used.
Distance definition
* For regression:
\[distance = \frac{|output_{allFeatures} - output_{currentFeatures}|}{|output_{allFeatures}|}\]
* For classification:
\[distance = |output_{allFeatures} - output_{currentFeatures}|\]
- Parameters
selection (list, optional) – List of row indices: the subset of the input DataFrame used to compute the compacity statistics.
max_points (int, optional) – Maximum number of points to plot in the compacity plot, by default 2000.
force (bool, optional) – If True, forces the values to be recomputed even if they were already computed, by default False.
approx (float, optional) – How close the approximation should be to the model output with all features, by default 0.9 (=90%).
nb_features (int, optional) – Number of features used, by default 5.
file_name (str, optional) – Path where the HTML file is saved. If not provided, no file is saved, by default None.
auto_open (bool, optional) – Whether to open the plot automatically, by default False.
height (int, optional) – Height of the plot, by default 600.
width (int, optional) – Width of the plot, by default 900.
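The two compacity quantities can be sketched for a single regression instance using the regression distance above. This assumes that the output with k features equals a base value plus the k largest contributions (by absolute value), and that approx=0.9 means the distance must not exceed 1 - 0.9; both are assumptions, and the helper names are hypothetical.

```python
def output_with_top_k(base_value, contribs, k):
    """Assumed model output when only the k largest contributions are kept."""
    top = sorted(contribs, key=abs, reverse=True)[:k]
    return base_value + sum(top)


def regression_distance(out_all, out_current):
    """Relative distance from the regression formula (undefined if out_all == 0)."""
    return abs(out_all - out_current) / abs(out_all)


def min_features_for_approx(base_value, contribs, approx=0.9):
    """Smallest k whose output is within 1 - approx of the full output."""
    out_all = base_value + sum(contribs)
    for k in range(1, len(contribs) + 1):
        if regression_distance(out_all, output_with_top_k(base_value, contribs, k)) <= 1 - approx:
            return k
    return len(contribs)


def closeness_with(base_value, contribs, nb_features=5):
    """1 - distance when only nb_features contributions are used."""
    out_all = base_value + sum(contribs)
    return 1 - regression_distance(out_all, output_with_top_k(base_value, contribs, nb_features))


contribs = [10.0, 8.0, -6.0, 1.0]
print(min_features_for_approx(0.0, contribs, approx=0.9))
print(round(closeness_with(0.0, contribs, nb_features=2), 3))
```

Note that adding a feature can move the output further away before it comes back (here two features are worse than one), which is why the search checks every k instead of stopping at the first increase in distance.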
- compare_plot(index=None, row_num=None, label=None, max_features=20, width=900, height=550, show_predict=True, file_name=None, auto_open=True)
Plotly comparison plot of several individuals' contributions. Contributions are plotted feature by feature, which shows the differences in contributions between two or more individuals; each individual is represented by its own line.
- Parameters
index (list) – 1st option to select individual rows: list of index values referencing rows.
row_num (list) – 2nd option to select individual rows: list of ints corresponding to the row numbers of individuals (starting at 0).
label (int or str (default: None)) – If the label is a string, Shapash checks whether it can be converted to an integer in order to select the right DataFrame.
max_features (int (optional, default: 20)) – Number of contributions to show. If greater than the total of features, shows all.
width (int (default: 900)) – Plotly figure - layout width.
height (int (default: 550)) – Plotly figure - layout height.
show_predict (boolean (default: True)) – Shows predict or predict_proba value.
file_name (str (optional)) – File name to use to save the Plotly figure. If None, the figure is not saved.
auto_open (boolean (optional)) – Whether to open the plot automatically.
- Returns
Comparison plot of the contributions of the different individuals.
- Return type
Plotly Figure Object
Example
>>> xpl.plot.compare_plot(row_num=[0, 1, 2])
- contribution_plot(col, selection=None, label=-1, violin_maxf=10, max_points=2000, proba=True, width=900, height=600, file_name=None, auto_open=False, zoom=False)
Display a contribution plot using Plotly for a selected feature.
This method visualizes the contribution of a given feature to model predictions, helping users understand how the feature’s value influences the prediction outcome. Depending on the feature type, use case (regression or classification), and model outputs (e.g., predicted values), the plot will be either a scatter or a violin plot.
For large datasets, the plot will automatically sample data points to maintain performance. Users can specify a subset of data via the selection parameter.
- Parameters
col (str or int) – The name, label, or column index of the feature to be plotted.
selection (list, optional) – List of row indices to plot (i.e., a subset of the input DataFrame). If None, all rows are considered (up to max_points).
label (int or str, default=-1) – Class label to select (used in classification). If a string is provided, it will be cast to an integer if possible.
violin_maxf (int, default=10) – Maximum number of unique values (modalities) allowed for a feature to use a violin plot. If the feature has more than this, a scatter plot will be used instead.
max_points (int, default=2000) – Maximum number of data points to display. If the dataset is larger, a sample will be drawn. Can be overridden using the selection parameter.
proba (bool, default=True) – Whether to use predicted probabilities (via predict_proba) to color the plot. Applies to classification problems.
width (int, default=900) – Width of the Plotly figure in pixels.
height (int, default=600) – Height of the Plotly figure in pixels.
file_name (str, optional) – If provided, the plot will be saved to this file (Plotly-supported formats).
auto_open (bool, default=False) – Whether to automatically open the saved plot in a web browser.
zoom (bool, default=False) – Indicates whether the plot should start in a zoomed-in state.
- Returns
The generated Plotly figure object.
- Return type
plotly.graph_objects.Figure
Examples
>>> xpl.plot.contribution_plot(0)
>>> xpl.plot.contribution_plot("Age", selection=[0, 1, 2], label=1)
Notes
For more usage examples, refer to the contribution plot tutorial in the documentation.
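Two of the rules described above, choosing a violin or a scatter plot from the number of distinct values and down-sampling to max_points rows, can be sketched as follows. This is a simplification: the real choice also depends on the use case and the model outputs, and the helper names and the seeded random sampling are assumptions.

```python
import random


def choose_plot_type(feature_values, violin_maxf=10):
    """Violin plot for few distinct values, scatter plot otherwise (simplified)."""
    return "violin" if len(set(feature_values)) <= violin_maxf else "scatter"


def sample_rows(index, max_points=2000, seed=0):
    """Keep at most max_points row labels, sampled without replacement."""
    index = list(index)
    if len(index) <= max_points:
        return index
    # A seeded generator makes the sampled subset reproducible
    return random.Random(seed).sample(index, max_points)


print(choose_plot_type(["male", "female", "male"]))
print(choose_plot_type(list(range(50))))  # e.g. 50 distinct ages
print(len(sample_rows(range(5000))))
```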
- features_importance(mode='global', max_features=20, page='top', selection=None, label=-1, group_name=None, display_groups=True, force=False, width=900, height=500, file_name=None, auto_open=False, zoom=False, normalize_by_nb_samples=False, degree='slider')
Display a Plotly feature importance plot.
This method generates a feature importance plot for both classification and regression models. For multiclass classification, the plot will focus on the specified label.
- Parameters
mode (str, optional, default: 'global') – Defines the type of plot to display. - ‘global’: Displays the feature importance plot from a global perspective. - ‘global-local’: Shows the global feature importance plot with local importance indicators. - ‘cumulative’: Shows the cumulative sum of feature contributions, ordered by descending importance.
max_features (int, optional, default: 20) – Limits the number of features to display in the plot. For example, max_features=20 will display the 20 most important features.
page (int or str, optional, default: 'top') – Allows the user to select which set of features to display. - ‘top’: Shows the most important features. - ‘worst’: Shows the least important features. - Page number (integer) allows navigation between different sets of features.
selection (list, optional, default: None) – Specifies a subset of individuals (rows) whose feature importance is compared to the global feature importance. This is only applicable when mode is set to ‘global’. If provided, the list must contain the index values of the rows to use.
label (int or str, optional, default: -1) – Specifies the label for which to display feature importance in multiclass classification. If a string label is provided, it will be converted to an integer if applicable.
group_name (str, optional, default: None) – Displays feature importance for a specific group of features. This is only available if the SmartExplainer object has been compiled with feature groups. The group name must correspond to a key in the features_groups dictionary.
display_groups (bool, optional, default: True) – If feature groups are declared in the SmartExplainer object, this parameter specifies whether or not to display them in the plot.
force (bool, optional, default: False) – If True, forces recomputation of feature importance, even if it has already been computed.
width (int, optional, default: 900) – The width of the Plotly figure layout.
height (int, optional, default: 500) – The height of the Plotly figure layout.
file_name (str, optional) – The name of the file to save the Plotly bar chart. If None, the chart will not be saved.
auto_open (bool, optional) – If True, automatically opens the generated plot.
zoom (bool, optional, default: False) – Indicates whether the graph is currently zoomed in.
normalize_by_nb_samples (bool, optional, default: False) – Normalizes feature importance by the number of samples. This is only applicable when mode is set to ‘cumulative’.
degree (int or str, optional, default: 'slider') – Degree of adjustment to apply to the cumulative feature contributions curve. This is only applicable when mode is set to ‘cumulative’.
- Returns
The generated Plotly figure object containing the feature importance plot.
- Return type
plotly.graph_objs._figure.Figure
Examples
>>> xpl.plot.features_importance()
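To make the ‘global’ mode, the page parameter, and the selection comparison more concrete, here is a minimal NumPy sketch of one common way to derive global importance from a contribution matrix (rows are observations, columns are features): the sum of absolute contributions per feature, normalized to sum to 1. This normalization, the 1-based page numbering, and the helper names global_importance and select_page are assumptions for illustration, not Shapash's internal implementation.

```python
import numpy as np


def global_importance(contributions, normalize=True):
    """Sum of absolute contributions per feature, optionally scaled to sum to 1."""
    imp = np.abs(np.asarray(contributions, dtype=float)).sum(axis=0)
    if normalize and imp.sum() > 0:
        imp = imp / imp.sum()
    return imp


def select_page(importance, max_features=20, page="top"):
    """Feature indices for 'top', 'worst', or a 1-based page number (assumption)."""
    order = np.argsort(-importance, kind="stable")  # descending importance
    if page == "top":
        return order[:max_features]
    if page == "worst":
        return order[::-1][:max_features]
    start = (int(page) - 1) * max_features
    return order[start:start + max_features]


# Toy contributions: 3 observations x 3 features (hypothetical values)
contribs = np.array([[0.5, -0.1, 0.0],
                     [-0.3, 0.2, 0.1],
                     [0.4, 0.0, -0.1]])

imp_all = global_importance(contribs)
imp_subset = global_importance(contribs[[0, 2]])  # rows playing the role of `selection`
print(select_page(imp_all, max_features=2))                # [0 1]
print(select_page(imp_all, max_features=2, page="worst"))  # [2 1]
# Cumulative share of importance, features ordered by descending importance
print(np.cumsum(imp_all[select_page(imp_all, max_features=3)]))
```

Comparing imp_subset with imp_all feature by feature gives the kind of global vs. subset contrast that the selection parameter is meant to show.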
- local_neighbors_plot(index, max_features=10, file_name=None, auto_open=False, height='auto', width=900)
The local_neighbors_plot aims to increase confidence in interpreting the contribution values of a selected instance. It analyzes the local neighborhood of the instance and compares its contribution values with those of its neighbors. Intuitively, similar instances should have similar contributions. The neighbors are selected as follows:
* We select the top N neighbors for each instance (using the L1 norm with variance normalization).
* We discard neighbors whose model output is too different from the instance output (see the equations below).
* We discard additional neighbors whose distance to the instance exceeds a predefined value (to remove outliers).
In this neighborhood, instances are expected to have similar SHAP values. If they do not, be cautious when interpreting the SHAP values.
The difference between outputs is measured with the following distance definition:
* For regression:
\[distance = \frac{|output_{allFeatures} - output_{currentFeatures}|}{|output_{allFeatures}|}\]
* For classification:
\[distance = |output_{allFeatures} - output_{currentFeatures}|\]
- Parameters
index (int) – Index of the row of the input DataFrame whose contribution values are displayed within its neighborhood
max_features (int, optional) – Maximum number of displayed features, by default 10
file_name (string, optional) – Specify the save path of html files. If it is not provided, no file will be saved, by default None
auto_open (bool, optional) – Automatically open the plot, by default False
height (str or int, optional) – Height of the figure. Default is ‘auto’.
width (int, optional) – Width of the figure. Default is 900.
- Returns
The figure that will be displayed
- Return type
fig
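The three neighbor-selection steps and the output distance described for local_neighbors_plot can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: variance normalization is done by dividing each column by its standard deviation, and the thresholds max_output_dist and max_input_dist (like the helper names output_distance and find_neighbors) are hypothetical, not Shapash's actual parameters or defaults.

```python
import numpy as np


def output_distance(y_instance, y_neighbor, mode):
    """Distance between model outputs, following the two formulas above."""
    if mode == "regression":
        # Relative gap; assumes the instance output is non-zero.
        return abs(y_instance - y_neighbor) / abs(y_instance)
    # Classification: absolute gap between (probability) outputs.
    return abs(y_instance - y_neighbor)


def find_neighbors(X, y_pred, idx, mode, n_neighbors=10,
                   max_output_dist=0.1, max_input_dist=None):
    """Hypothetical neighbor search; thresholds are illustrative, not Shapash defaults."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0                       # constant columns: leave unscaled
    Xn = X / std                              # variance normalization
    dist = np.abs(Xn - Xn[idx]).sum(axis=1)   # L1 distance to the instance
    order = [j for j in np.argsort(dist, kind="stable") if j != idx]
    neighbors = []
    for j in order[:n_neighbors]:             # step 1: top-N closest rows
        if output_distance(y_pred[idx], y_pred[j], mode) > max_output_dist:
            continue                          # step 2: output too different
        if max_input_dist is not None and dist[j] > max_input_dist:
            continue                          # step 3: too far away (outlier)
        neighbors.append(int(j))
    return neighbors


X = [[0, 0], [1, 0], [0, 2], [6, 6], [0.5, 0]]
y_pred = [10.0, 10.5, 9.8, 30.0, 12.0]
print(find_neighbors(X, y_pred, idx=0, mode="regression", n_neighbors=3))
# → [1, 2]  (row 4 is close in input space, but its output differs by 20 %)
```

Filtering on the model output as well as on input distance is what lets the plot compare contributions only among instances the model itself treats as similar.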
- local_plot(index=None, row_num=None, query=None, label=None, show_masked=True, show_predict=True, display_groups=None, yaxis_max_label=12, width=900, height=550, file_name=None, auto_open=False, zoom=False)
The local_plot method displays the local contributions of an individual in the dataset. The returned plot is a summary of local explainability. You can use the filter method beforehand to modify the parameters of this summary. Preprocessing is used here to make the graph more intelligible. The index, row_num, or query parameter can be used to select the local explanation to display. The local_plot tutorial offers many examples (please check the tutorial part of this documentation).
- Parameters
index (default None) – 1st option: use this parameter to select by index the row whose local contribution will be displayed.
row_num (int (default None)) – 2nd option: specify the row number of the row whose local contribution will be displayed.
query (string) – 3rd option: Boolean condition that must filter exactly one row of the prediction set before plotting.
label (integer or string (default None)) – If the label is a string, it is converted to an integer when possible, to select the right DataFrame object.
show_masked (bool (default: True)) – Show the sum of the contributions of the hidden (masked) variables.
show_predict (bool (default: True)) – Show the predict or predict_proba value.
yaxis_max_label (int (default: 12)) – Maximum number of variables whose labels are displayed on the y axis.
display_groups (bool (default: None)) – Whether or not to display groups of features. This option is only useful if groups of features are declared when compiling SmartExplainer object.
width (int (default: 900)) – Plotly figure - layout width
height (int (default: 550)) – Plotly figure - layout height
file_name (string (optional)) – File name to use to save the Plotly bar chart. If None, the bar chart will not be saved.
auto_open (bool (optional)) – Indicates whether to open the bar plot automatically.
zoom (bool (default: False)) – Indicates whether the graph is currently zoomed in.
- Returns
The Plotly figure displaying the local contributions of the selected row.
- Return type
Plotly Figure Object
Example
>>> xpl.plot.local_plot(row_num=0)
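The effect of show_masked can be illustrated with a small sketch: only the largest contributions (by absolute value) are drawn, and the remaining ones are summed into a single hidden-features value. The helper summarize_local and its max_contrib argument are hypothetical, chosen for illustration; in Shapash the displayed subset is controlled through the filter method mentioned above.

```python
import numpy as np


def summarize_local(contrib_row, names, max_contrib=3):
    """Hypothetical helper: keep the `max_contrib` largest |contributions| and
    sum all remaining ones into a single masked (hidden) value."""
    contrib_row = np.asarray(contrib_row, dtype=float)
    order = np.argsort(-np.abs(contrib_row), kind="stable")
    shown = {names[i]: float(contrib_row[i]) for i in order[:max_contrib]}
    hidden = float(contrib_row[order[max_contrib:]].sum())
    return shown, hidden


names = ["age", "sex", "fare", "class", "port"]
row = [0.30, -0.05, 0.12, -0.20, 0.02]  # toy local contributions of one row
shown, hidden = summarize_local(row, names)
print(shown)   # {'age': 0.3, 'class': -0.2, 'fare': 0.12}
print(hidden)  # about -0.03: sum of the masked contributions ('sex', 'port')
```

Because the hidden value is a sum rather than a drop, the displayed bars plus the masked bar still add up to the full local contribution of the row.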
- stability_plot(selection=None, max_points=500, force=False, max_features=10, distribution='none', file_name=None, auto_open=False, height='auto', width=900)
The stability_plot aims to increase confidence in contribution values and to help determine whether an explanation can be trusted. The idea behind local stability is the following: if instances are very similar, one would expect their explanations to be similar as well. Locally stable explanations are therefore an important factor in building trust around a particular explanation method.
The generated graphs can take multiple forms, but they all analyze the same two aspects: for each feature, Amplitude vs. Variability. In other terms, how important the feature is on average vs. how the feature impact changes in the instance neighborhood. The average importance of the feature is the average SHAP value of the feature across all considered instances.
The neighborhood is defined as follows:
* We select the top N neighbors for each instance (using the L1 norm with variance normalization).
* We discard neighbors whose model output is too different from the instance output (see the equations below).
* We discard additional neighbors whose distance to the instance exceeds a predefined value (to remove outliers).
The difference between outputs is measured with the following distance definition:
* For regression:
\[distance = \frac{|output_{allFeatures} - output_{currentFeatures}|}{|output_{allFeatures}|}\]
* For classification:
\[distance = |output_{allFeatures} - output_{currentFeatures}|\]
- Parameters
selection (list, optional) – List of index values: the subset of the input DataFrame used to compute the stability statistics, by default None
max_points (int, optional) – Maximum number of points to plot, by default 500
force (bool, optional) – If True, forces recomputation of the stability values, by default False
max_features (int, optional) – Maximum number of displayed features, by default 10
distribution (str, optional) – Adds the distribution of variability for each feature, by default ‘none’. The other accepted values, ‘boxplot’ and ‘violin’, specify the type of plot
file_name (string, optional) – Specify the save path of html files. If it is not provided, no file will be saved, by default None
auto_open (bool, optional) – Automatically open the plot, by default False
height (int or 'auto') – Plotly figure - layout height
width (int) – Plotly figure - layout width
- Returns
If single instance –
plot – Normalized contribution values of the instance and its neighbors
If multiple instances –
* if distribution == “none”: mean amplitude of each feature contribution vs. mean variability across neighbors
* if distribution == “boxplot”: distribution of the contributions of each feature in the instance neighborhoods. Graph type is box plot
* if distribution == “violin”: distribution of the contributions of each feature in the instance neighborhoods. Graph type is violin plot
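A minimal sketch of the Amplitude vs. Variability summary, under explicit assumptions: each neighborhood is given as a list of row positions (the instance plus its neighbors), contributions are normalized per row by the sum of their absolute values, amplitude is the mean absolute normalized contribution, and variability is the standard deviation of the normalized contribution within the neighborhood, both then averaged over instances. Shapash's exact normalization and variability measure may differ, and amplitude_variability is a hypothetical name.

```python
import numpy as np


def amplitude_variability(contribs, neighborhoods):
    """Per-feature mean amplitude and mean variability over neighborhoods.

    contribs: (n_samples, n_features) contribution matrix.
    neighborhoods: list of row-position lists (instance + its neighbors).
    Assumes no row has all-zero contributions (normalization divides by the row sum).
    """
    contribs = np.asarray(contribs, dtype=float)
    amplitudes, variabilities = [], []
    for rows in neighborhoods:
        local = contribs[rows]
        # Normalize each row so instances with different output scales are comparable
        local = local / np.abs(local).sum(axis=1, keepdims=True)
        amplitudes.append(np.abs(local).mean(axis=0))  # how important on average
        variabilities.append(local.std(axis=0))        # how much it changes locally
    return np.mean(amplitudes, axis=0), np.mean(variabilities, axis=0)


contribs = [[0.4, 0.1], [0.3, 0.2], [0.5, -0.5], [0.2, 0.2]]
amp, var = amplitude_variability(contribs, neighborhoods=[[0, 1], [2, 3]])
print(amp)  # approximately [0.6, 0.4]
print(var)  # approximately [0.05, 0.3]
```

Here the second feature has the lower amplitude but a much higher variability: its contribution flips sign inside the second neighborhood, which is the kind of pattern that calls for caution when interpreting it.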
- top_interactions_plot(nb_top_interactions=5, selection=None, violin_maxf=10, max_points=500, width=900, height=600, file_name=None, auto_open=False)
Displays a dynamic plot with the nb_top_interactions most important interactions between two variables. The most important interactions are determined by computing, for each existing pair of variables, the sum of all absolute SHAP interaction values. A button lets you select a pair and display the corresponding feature values and their SHAP contribution values.
- Parameters
nb_top_interactions (int (default: 5)) – Number of top interactions to display.
selection (list (optional)) – Contains a list of index values: the subset of the input DataFrame to plot.
violin_maxf (int (optional, default: 10)) – Maximum number of modalities for which a violin plot is drawn. If a feature has more modalities than violin_maxf, a scatter plot is chosen instead.
max_points (int (optional, default: 500)) – Maximum number of points to plot in the contribution plot. If the input dataset is bigger than max_points, a sample limits the number of points to plot. Note: you can also limit the number of points with the selection parameter.
width (int (default: 900)) – Plotly figure - layout width
height (int (default: 600)) – Plotly figure - layout height
file_name (string (optional)) – File name to use to save the Plotly figure. If None, the figure will not be saved.
auto_open (bool (optional)) – Indicates whether to open the plot automatically.
- Return type
go.Figure
Example
>>> xpl.plot.top_interactions_plot()
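The ranking criterion described for top_interactions_plot (sum of absolute SHAP interaction values for each pair of variables) can be sketched with NumPy. The sketch assumes the interaction values are available as an array of shape (n_samples, n_features, n_features), such as the output of SHAP's TreeExplainer.shap_interaction_values for a single-output model, and reads only the upper triangle because these matrices are symmetric. top_interactions is a hypothetical helper, not the Shapash API.

```python
from itertools import combinations

import numpy as np


def top_interactions(interactions, names, nb_top_interactions=5):
    """Rank feature pairs by the sum over samples of |interaction value|."""
    totals = np.abs(np.asarray(interactions, dtype=float)).sum(axis=0)
    # Interaction matrices are symmetric, so only pairs i < j are scored
    scores = [((names[i], names[j]), float(totals[i, j]))
              for i, j in combinations(range(len(names)), 2)]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:nb_top_interactions]


# Toy interaction values: 2 samples x 3 features x 3 features
interactions = [
    [[0.1, 0.2, 0.0], [0.2, 0.3, -0.05], [0.0, -0.05, 0.1]],
    [[0.2, -0.1, 0.4], [-0.1, 0.1, 0.0], [0.4, 0.0, 0.2]],
]
for pair, score in top_interactions(interactions, ["a", "b", "c"], nb_top_interactions=2):
    print(pair, round(score, 3))
# ('a', 'c') 0.4
# ('a', 'b') 0.3
```

Summing absolute values prevents positive and negative interactions across observations from cancelling out, so a pair that matters in opposite directions for different rows still ranks high.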
- class shapash.explainer.consistency.Consistency(palette_name='default')
Bases: object
Consistency class
- consistency_plot(selection=None, max_features=20)
The consistency_plot aims to compare explainability methods.
Because explainability methods differ from each other, they may not give the same explanation for the same instance. So which method should be selected? This question is hard to answer. This method compares the methods with each other and evaluates how close their explanations are. The idea behind it is simple: if the methods' underlying assumptions lead to similar results, we can be more confident in using those methods. If not, the explanations should be interpreted with careful consideration.
- Parameters
selection (list) – List of index values: the subset of the input DataFrame used to compute the consistency statistics, by default None
max_features (int, optional) – Maximum number of displayed features, by default 20
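One way to quantify how close the explanations of different methods are is sketched below: contribution vectors from each method are normalized per instance to unit L2 norm (so that only their direction matters), and the mean Euclidean distance between methods is computed for every pair of methods. This normalization and distance are assumptions chosen for illustration; Shapash's consistency metric may be defined differently, and pairwise_method_distance and the method name method_c are hypothetical.

```python
from itertools import combinations

import numpy as np


def pairwise_method_distance(contributions):
    """Mean distance between per-instance normalized contributions, per method pair.

    contributions: dict mapping a method name to an (n_samples, n_features) array,
    with rows (instances) and columns (features) in the same order for every method.
    Assumes no instance has an all-zero contribution vector.
    """
    normed = {}
    for name, values in contributions.items():
        values = np.asarray(values, dtype=float)
        # Unit L2 norm per row: compare the direction of explanations, not their scale
        normed[name] = values / np.linalg.norm(values, axis=1, keepdims=True)
    return {
        (a, b): float(np.linalg.norm(normed[a] - normed[b], axis=1).mean())
        for a, b in combinations(sorted(normed), 2)
    }


distances = pairwise_method_distance({
    "shap": [[1.0, 0.0], [0.0, 2.0]],
    "lime": [[2.0, 0.0], [0.0, 1.0]],      # same directions as shap, other scale
    "method_c": [[0.0, 1.0], [1.0, 0.0]],  # hypothetical third method, swapped
})
for pair, value in distances.items():
    print(pair, round(value, 3))
```

With this choice, two methods that rank and sign features identically but on different scales (shap and lime above) are considered perfectly consistent, while a method that attributes importance to the other feature is maximally far away.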