{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploring and Visualizing Data Distributions\n", "\n", "This tutorial demonstrates how to use Shapash to explore and visualize feature distributions in a dataset. By analyzing distributions, we gain a better understanding of the data, identify patterns, and spot potential issues such as outliers or imbalances.\n", "\n", "We will use the Kaggle [Titanic dataset](https://www.kaggle.com/c/titanic/data)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from sklearn.ensemble import ExtraTreesClassifier\n", "from sklearn.model_selection import train_test_split\n", "from shapash.data.data_loader import data_loading\n", "from category_encoders import OrdinalEncoder\n", "from shapash import SmartExplainer\n", "from shapash.plots.plot_correlations import plot_correlations\n", "from shapash.plots.plot_univariate import plot_distribution\n", "from shapash.plots.plot_evaluation_metrics import plot_confusion_matrix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Titanic Dataset\n", "We start by loading and preprocessing the Titanic dataset, which contains passenger information. The target variable is `Pclass`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch | Fare | Embarked | Title |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | Third class | male | 22.0 | 1 | 0 | 7.25 | Southampton | Mr |
| 2 | 1 | First class | female | 38.0 | 1 | 0 | 71.28 | Cherbourg | Mrs |
| 3 | 1 | Third class | female | 26.0 | 0 | 0 | 7.92 | Southampton | Miss |
| 4 | 1 | First class | female | 35.0 | 1 | 0 | 53.10 | Southampton | Mrs |
| 5 | 0 | Third class | male | 35.0 | 0 | 0 | 8.05 | Southampton | Mr |
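The introduction names outliers and imbalances as issues that distribution analysis helps to spot. As a library-independent sketch of those two checks (not Shapash's own implementation), the snippet below uses only the Python standard library on the `Sex` and `Fare` values of the five preview rows, as displayed above. The helpers `category_shares` and `iqr_outliers` are hypothetical names, and Tukey's 1.5 × IQR fence is an assumed convention, not a setting taken from Shapash.

```python
from collections import Counter
from statistics import quantiles

# Sex and Fare values of the five preview rows (PassengerId 1-5),
# copied as displayed in the table (Fare is rounded to 2 decimals there).
sex = ["male", "female", "female", "female", "male"]
fare = [7.25, 71.28, 7.92, 53.10, 8.05]


def category_shares(values):
    """Hypothetical helper: share of rows per category, to spot imbalances."""
    counts = Counter(values)
    total = len(values)
    return {category: n / total for category, n in counts.items()}


def iqr_outliers(values, k=1.5):
    """Hypothetical helper: flag values outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: Tukey's rule with k = 1.5; Shapash may use another criterion.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print(category_shares(sex))  # → {'male': 0.4, 'female': 0.6}
print(iqr_outliers(fare))    # → []
```

With only five rows, no fare falls outside the fences, so the sketch mainly illustrates the mechanics; the same helpers apply unchanged to a full column passed as a Python list.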
ExtraTreesClassifier(n_estimators=200, random_state=7)