{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# From model training to deployment - an introduction to the SmartPredictor object\n",
    "\n",
    "Shapash provides a SmartPredictor Object to make prediction and local explainability for operational needs in deployment context.<br />\n",
    "It gives a summary of the local explanation of your prediction. <br />\n",
    "SmartPredictor allows users to configure the summary to suit their use. <br />\n",
    "It is an object dedicated to deployment, lighter than SmartExplainer Object with additionnal consistency checks. <br />\n",
    "SmartPredictor can be used with an API or in batch mode. <br />\n",
    "\n",
    "This tutorial provides more information to help you getting started with the SmartPredictor Object of Shapash.\n",
    "\n",
    "Contents:\n",
    "- Build a SmartPredictor\n",
    "- Save and Load a Smartpredictor\n",
    "- Add input\n",
    "- Use label and wording\n",
    "- Summarize explaination\n",
    "\n",
    "We used Kaggle's [Titanic](https://www.kaggle.com/c/titanic) dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Exploration and training of the model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import Dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we need to import a dataset. Here we chose the famous dataset Titanic from Kaggle."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from category_encoders import OrdinalEncoder\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.model_selection import train_test_split\n",
    "import shap"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from shapash.explainer.smart_predictor import SmartPredictor\n",
    "from shapash.utils.load_smartpredictor import load_smartpredictor\n",
    "from shapash.data.data_loader import data_loading"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "titan_df, titan_dict = data_loading('titanic')\n",
    "del titan_df['Name']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Title</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>Third class</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7.25</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>Mr</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>First class</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>71.28</td>\n",
       "      <td>Cherbourg</td>\n",
       "      <td>Mrs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>Third class</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.92</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>Miss</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>First class</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>53.10</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>Mrs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>Third class</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.05</td>\n",
       "      <td>Southampton</td>\n",
       "      <td>Mr</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived       Pclass     Sex   Age  SibSp  Parch   Fare  \\\n",
       "PassengerId                                                             \n",
       "1                   0  Third class    male  22.0      1      0   7.25   \n",
       "2                   1  First class  female  38.0      1      0  71.28   \n",
       "3                   1  Third class  female  26.0      0      0   7.92   \n",
       "4                   1  First class  female  35.0      1      0  53.10   \n",
       "5                   0  Third class    male  35.0      0      0   8.05   \n",
       "\n",
       "                Embarked Title  \n",
       "PassengerId                     \n",
       "1            Southampton    Mr  \n",
       "2              Cherbourg   Mrs  \n",
       "3            Southampton  Miss  \n",
       "4            Southampton   Mrs  \n",
       "5            Southampton    Mr  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titan_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Classification Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section, we train a Machine Learning supervized model with our data. In our example, we are confronted to a classification problem."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = titan_df['Survived']\n",
    "X = titan_df.drop('Survived', axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "varcat=['Pclass', 'Sex', 'Embarked', 'Title']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Preprocessing Step"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Encoding Categorical Features "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "categ_encoding = OrdinalEncoder(cols=varcat, \\\n",
    "                                handle_unknown='ignore', \\\n",
    "                                return_df=True).fit(X)\n",
    "X = categ_encoding.transform(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Train Test split + Random Forest fit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RandomForestClassifier(min_samples_leaf=3)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.75, random_state=1)\n",
    "\n",
    "rf = RandomForestClassifier(n_estimators=100, min_samples_leaf=3)\n",
    "rf.fit(Xtrain, ytrain)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "ypred=pd.DataFrame(rf.predict(Xtest), columns=['pred'], index=Xtest.index)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Explore your trained model results Step with SmartExplainer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "from shapash import SmartExplainer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Use Label and Wording"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, we use labels and wording to get a more understandable explainabily.\n",
    "- features_dict : allow users to rename features of their datasets\n",
    "- label_dict : allow users in classification problems to rename label predicted\n",
    "- postprocessing : allow users to apply some wording to the features wanted"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "feature_dict = {\n",
    "                'Pclass': 'Ticket class',\n",
    "                 'Sex': 'Sex',\n",
    "                 'Age': 'Age',\n",
    "                 'SibSp': 'Relatives such as brother or wife',\n",
    "                 'Parch': 'Relatives like children or parents',\n",
    "                 'Fare': 'Passenger fare',\n",
    "                 'Embarked': 'Port of embarkation',\n",
    "                 'Title': 'Title of passenger'\n",
    "               }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "label_dict = {0: \"Not Survived\", 1: \"Survived\"}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "postprocessing = {\"Pclass\": {'type': 'transcoding', 'rule': { 'First class': '1st class', 'Second class': '2nd class', \"Third class\": \"3rd class\"}}}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Define a SmartExplainer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "xpl = SmartExplainer(\n",
    "    model=rf,\n",
    "    preprocessing=categ_encoding,\n",
    "    postprocessing=postprocessing,\n",
    "    label_dict=label_dict, \n",
    "    features_dict=feature_dict\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**compile()<br />** This method is the first step to understand model and prediction. It performs the sorting\n",
    "of contributions, the reverse preprocessing steps and  all the calculations necessary for\n",
    "a quick display of plots and efficient summary of explanation. (see SmartExplainer documentation and tutorials)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Backend: Shap TreeExplainer\n"
     ]
    }
   ],
   "source": [
    "xpl.compile(x=Xtest, y_pred=ypred)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Understand results of your trained model with SmartExplainer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can easily get a first summary of the explanation of the model results.\n",
    "- We choose to get the 3 most contributive features for each prediction.\n",
    "- We use a wording to get features names more understandable in operationnal case.\n",
    "- We rename the predicted label to show a more explicit prediction.\n",
    "- We apply a post-processing to transform some feature's values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>pred</th>\n",
       "      <th>feature_1</th>\n",
       "      <th>value_1</th>\n",
       "      <th>contribution_1</th>\n",
       "      <th>feature_2</th>\n",
       "      <th>value_2</th>\n",
       "      <th>contribution_2</th>\n",
       "      <th>feature_3</th>\n",
       "      <th>value_3</th>\n",
       "      <th>contribution_3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>863</th>\n",
       "      <td>Survived</td>\n",
       "      <td>Title of passenger</td>\n",
       "      <td>Mrs</td>\n",
       "      <td>0.170394</td>\n",
       "      <td>Sex</td>\n",
       "      <td>female</td>\n",
       "      <td>0.168492</td>\n",
       "      <td>Ticket class</td>\n",
       "      <td>1st class</td>\n",
       "      <td>0.110185</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>224</th>\n",
       "      <td>Not Survived</td>\n",
       "      <td>Title of passenger</td>\n",
       "      <td>Mr</td>\n",
       "      <td>0.0913801</td>\n",
       "      <td>Sex</td>\n",
       "      <td>male</td>\n",
       "      <td>0.0835547</td>\n",
       "      <td>Passenger fare</td>\n",
       "      <td>7.9</td>\n",
       "      <td>0.0654677</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>85</th>\n",
       "      <td>Survived</td>\n",
       "      <td>Title of passenger</td>\n",
       "      <td>Miss</td>\n",
       "      <td>0.204571</td>\n",
       "      <td>Sex</td>\n",
       "      <td>female</td>\n",
       "      <td>0.169965</td>\n",
       "      <td>Ticket class</td>\n",
       "      <td>2nd class</td>\n",
       "      <td>0.102774</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>681</th>\n",
       "      <td>Survived</td>\n",
       "      <td>Title of passenger</td>\n",
       "      <td>Miss</td>\n",
       "      <td>0.193106</td>\n",
       "      <td>Sex</td>\n",
       "      <td>female</td>\n",
       "      <td>0.153766</td>\n",
       "      <td>Port of embarkation</td>\n",
       "      <td>Queenstown</td>\n",
       "      <td>0.13015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>536</th>\n",
       "      <td>Survived</td>\n",
       "      <td>Title of passenger</td>\n",
       "      <td>Miss</td>\n",
       "      <td>0.206245</td>\n",
       "      <td>Ticket class</td>\n",
       "      <td>2nd class</td>\n",
       "      <td>0.128066</td>\n",
       "      <td>Sex</td>\n",
       "      <td>female</td>\n",
       "      <td>0.112756</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             pred           feature_1 value_1 contribution_1     feature_2  \\\n",
       "863      Survived  Title of passenger     Mrs       0.170394           Sex   \n",
       "224  Not Survived  Title of passenger      Mr      0.0913801           Sex   \n",
       "85       Survived  Title of passenger    Miss       0.204571           Sex   \n",
       "681      Survived  Title of passenger    Miss       0.193106           Sex   \n",
       "536      Survived  Title of passenger    Miss       0.206245  Ticket class   \n",
       "\n",
       "       value_2 contribution_2            feature_3     value_3 contribution_3  \n",
       "863     female       0.168492         Ticket class   1st class       0.110185  \n",
       "224       male      0.0835547       Passenger fare         7.9      0.0654677  \n",
       "85      female       0.169965         Ticket class   2nd class       0.102774  \n",
       "681     female       0.153766  Port of embarkation  Queenstown        0.13015  \n",
       "536  2nd class       0.128066                  Sex      female       0.112756  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "xpl.to_pandas(max_contrib=3).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: SmartPredictor in production"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**to_smartpredictor()**\n",
    "- It allows users to switch from a SmartExplainer used for data mining to the SmartPredictor.\n",
    "- It keeps the attributes needed for deployment to be lighter than the SmartExplainer object.\n",
    "- Smartpredictor performs additional consistency checks before deployment.\n",
    "- This object is dedicated to the deployment. \n",
    "\n",
    "In this section, we learn how to initialize a SmartPredictor.\n",
    "- It makes new predictions and summarize explainability that you configured  to make it operational to your needs.\n",
    "- SmartPredictor can be used with API or in batch mode.\n",
    "- It handles dataframes and dictionnaries input data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Switch from SmartExplainer Object to SmartPredictor Object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor = xpl.to_smartpredictor()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Save your predictor in Pickle File"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor.save('./predictor.pkl')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Load your predictor in Pickle File"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor_load = load_smartpredictor('./predictor.pkl')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Make a prediction with your SmartPredictor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Once our SmartPredictor has been initialized, we can compute new predictions and explain them.\n",
    "- First, we specify a new dataset which can be a pandas.DataFrame or a dictionnary. (usefull when you decide to use an API in your deployment process)\n",
    "- We use the add_input method of the SmartPredictor. (see the documentation of this method)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Add data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "person_x = {'Pclass': 'First class',\n",
    "             'Sex': 'female',\n",
    "             'Age': 36,\n",
    "             'SibSp': 1,\n",
    "             'Parch': 0,\n",
    "             'Fare': 7.25,\n",
    "             'Embarked': 'Cherbourg',\n",
    "             'Title': 'Miss'\n",
    "           }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor_load.add_input(x=person_x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you don't specify an ypred in the add_input method, SmartPredictor use its predict method to automatically affect the predicted value to ypred."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Make prediction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's display ypred which has been automatically computed in add_input method. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ypred</th>\n",
       "      <th>proba</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Survived</td>\n",
       "      <td>0.744009</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      ypred     proba\n",
       "0  Survived  0.744009"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "predictor_load.data[\"ypred\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The predict_proba method of Smartpredictor computes the probabilties associated to each label."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "prediction_proba = predictor_load.predict_proba()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>class_0</th>\n",
       "      <th>class_1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.255991</td>\n",
       "      <td>0.744009</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    class_0   class_1\n",
       "0  0.255991  0.744009"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prediction_proba"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get detailed explanability associated to the prediction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- You can use the method detail_contributions for detailed contributions of each of your features for each row of your new dataset.\n",
    "- For classification problems, it automatically associates contributions with the right predicted label.\n",
    "- The predicted label are computed automatically or you can specify an ypred with add_input method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "detailed_contributions = predictor_load.detail_contributions()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The ypred has already been renamed with the value that we've given in the label_dict."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ypred</th>\n",
       "      <th>proba</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Title</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Survived</td>\n",
       "      <td>0.744009</td>\n",
       "      <td>0.0950201</td>\n",
       "      <td>0.153742</td>\n",
       "      <td>-0.0111338</td>\n",
       "      <td>0.0192229</td>\n",
       "      <td>-0.00411547</td>\n",
       "      <td>-0.0879743</td>\n",
       "      <td>0.0316235</td>\n",
       "      <td>0.177685</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      ypred     proba     Pclass       Sex        Age      SibSp       Parch  \\\n",
       "0  Survived  0.744009  0.0950201  0.153742 -0.0111338  0.0192229 -0.00411547   \n",
       "\n",
       "        Fare   Embarked     Title  \n",
       "0 -0.0879743  0.0316235  0.177685  "
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "detailed_contributions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Summarize explanability of the predictions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- You can use the summarize method to summarize your local explainability.\n",
    "- This summary can be configured with the modify_mask method to suit your use case.\n",
    "- When you initialize the SmartPredictor, you can also specify  :\n",
    ">- postprocessing: to apply a wording to several values of your dataset.\n",
    ">- label_dict: to rename your label for classification problems.\n",
    ">- features_dict: to rename your features."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use modify_mask method to only get the 4 most contributives features in our local summary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor_load.modify_mask(max_contrib=4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "explanation = predictor_load.summarize()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- The dictionnary of mapping given to the SmartExplainer Object allows us to rename the 'Title' feature into 'Title of passenger'. \n",
    "- The value of this features has been worded correctly: 'First class' became '1st class'.\n",
    "- Our explanability is focused on the 4 most contributive features."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ypred</th>\n",
       "      <th>proba</th>\n",
       "      <th>feature_1</th>\n",
       "      <th>value_1</th>\n",
       "      <th>contribution_1</th>\n",
       "      <th>feature_2</th>\n",
       "      <th>value_2</th>\n",
       "      <th>contribution_2</th>\n",
       "      <th>feature_3</th>\n",
       "      <th>value_3</th>\n",
       "      <th>contribution_3</th>\n",
       "      <th>feature_4</th>\n",
       "      <th>value_4</th>\n",
       "      <th>contribution_4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Survived</td>\n",
       "      <td>0.744009</td>\n",
       "      <td>Title of passenger</td>\n",
       "      <td>Miss</td>\n",
       "      <td>0.177685</td>\n",
       "      <td>Sex</td>\n",
       "      <td>female</td>\n",
       "      <td>0.153742</td>\n",
       "      <td>Ticket class</td>\n",
       "      <td>1st class</td>\n",
       "      <td>0.0950201</td>\n",
       "      <td>Passenger fare</td>\n",
       "      <td>7.25</td>\n",
       "      <td>-0.0879743</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      ypred     proba           feature_1 value_1 contribution_1 feature_2  \\\n",
       "0  Survived  0.744009  Title of passenger    Miss       0.177685       Sex   \n",
       "\n",
       "  value_2 contribution_2     feature_3    value_3 contribution_3  \\\n",
       "0  female       0.153742  Ticket class  1st class      0.0950201   \n",
       "\n",
       "        feature_4 value_4 contribution_4  \n",
       "0  Passenger fare    7.25     -0.0879743  "
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "explanation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Classification - choose the predicted value and customize the summary\n",
    "\n",
    "#### Configure summary: define the predicted label\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can change the ypred or the x given in add_input method to make new prediction and summary of your explanability."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor_load.add_input(x=person_x, ypred=pd.DataFrame({\"ypred\": [0]}))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor_load.modify_mask(max_contrib=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "explanation = predictor_load.summarize()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The displayed contributions and summary adapt to changing the predicted value of y_pred from 1 to 0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ypred</th>\n",
       "      <th>proba</th>\n",
       "      <th>feature_1</th>\n",
       "      <th>value_1</th>\n",
       "      <th>contribution_1</th>\n",
       "      <th>feature_2</th>\n",
       "      <th>value_2</th>\n",
       "      <th>contribution_2</th>\n",
       "      <th>feature_3</th>\n",
       "      <th>value_3</th>\n",
       "      <th>contribution_3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Not Survived</td>\n",
       "      <td>0.255991</td>\n",
       "      <td>Title of passenger</td>\n",
       "      <td>Miss</td>\n",
       "      <td>-0.177685</td>\n",
       "      <td>Sex</td>\n",
       "      <td>female</td>\n",
       "      <td>-0.153742</td>\n",
       "      <td>Ticket class</td>\n",
       "      <td>1st class</td>\n",
       "      <td>-0.0950201</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          ypred     proba           feature_1 value_1 contribution_1  \\\n",
       "0  Not Survived  0.255991  Title of passenger    Miss      -0.177685   \n",
       "\n",
       "  feature_2 value_2 contribution_2     feature_3    value_3 contribution_3  \n",
       "0       Sex  female      -0.153742  Ticket class  1st class     -0.0950201  "
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "explanation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Configure summary: mask one feature, select positives contributions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- The modify_mask method allows us to configure the summary parameters of your explainability.\n",
    "- Here, we hide some features from our explanability and only get the one which has positives contributions. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor_load.modify_mask(features_to_hide=[\"Fare\"], positive=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "explanation = predictor_load.summarize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ypred</th>\n",
       "      <th>proba</th>\n",
       "      <th>feature_1</th>\n",
       "      <th>value_1</th>\n",
       "      <th>contribution_1</th>\n",
       "      <th>feature_2</th>\n",
       "      <th>value_2</th>\n",
       "      <th>contribution_2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Not Survived</td>\n",
       "      <td>0.255991</td>\n",
       "      <td>Age</td>\n",
       "      <td>36</td>\n",
       "      <td>0.0111338</td>\n",
       "      <td>Relatives like children or parents</td>\n",
       "      <td>0</td>\n",
       "      <td>0.00411547</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          ypred     proba feature_1 value_1 contribution_1  \\\n",
       "0  Not Survived  0.255991       Age      36      0.0111338   \n",
       "\n",
       "                            feature_2 value_2 contribution_2  \n",
       "0  Relatives like children or parents       0     0.00411547  "
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "explanation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Configure summary: the threshold parameter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We display features which has contributions greater than 0.01."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor_load.modify_mask(threshold=0.01)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "explanation = predictor_load.summarize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ypred</th>\n",
       "      <th>proba</th>\n",
       "      <th>feature_1</th>\n",
       "      <th>value_1</th>\n",
       "      <th>contribution_1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Not Survived</td>\n",
       "      <td>0.255991</td>\n",
       "      <td>Age</td>\n",
       "      <td>36</td>\n",
       "      <td>0.0111338</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          ypred     proba feature_1 value_1 contribution_1\n",
       "0  Not Survived  0.255991       Age      36      0.0111338"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "explanation"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Aucun(e)",
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}