{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Shapash model in production - Overview\n", "\n", "With this tutorial you:
\n", "Understand how to create a Shapash SmartPredictor to make predictions and get local explanations in production\n", "with a simple use case.
\n", "\n", "This tutorial walks through the steps from training the model to deploying the Shapash SmartPredictor.\n", "A more detailed tutorial covers the SmartPredictor object in depth.\n", "\n", "Contents:\n", "- Build a Regressor\n", "- Compile Shapash SmartExplainer\n", "- From Shapash SmartExplainer to SmartPredictor\n", "- Save the Shapash SmartPredictor object in a pickle file\n", "- Make a prediction\n", "\n", "Data from Kaggle [House Prices](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from category_encoders import OrdinalEncoder\n", "from lightgbm import LGBMRegressor\n", "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1 : Exploration and training of the model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Building a Supervised Model " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this section, we train a supervised Machine Learning model on the House Prices data."
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from shapash.data.data_loader import data_loading\n", "house_df, house_dict = data_loading('house_prices')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "y_df=house_df['SalePrice'].to_frame()\n", "X_df=house_df[house_df.columns.difference(['SalePrice'])]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Preprocessing step " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Encoding Categorical Features" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from category_encoders import OrdinalEncoder\n", "\n", "categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']\n", "\n", "encoder = OrdinalEncoder(cols=categorical_features,\n", " handle_unknown='ignore',\n", " return_df=True).fit(X_df)\n", "\n", "X_encoded=encoder.transform(X_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Train / Test Split" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "Xtrain, Xtest, ytrain, ytest = train_test_split(X_encoded, y_df, train_size=0.75, random_state=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Model Fitting" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "regressor = LGBMRegressor(n_estimators=200).fit(Xtrain, ytrain)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "y_pred = pd.DataFrame(regressor.predict(Xtest), columns=['pred'], index=Xtest.index)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Understand my model with shapash" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this section, we use the SmartExplainer Object from shapash.\n", "- It allows users to understand how the model works with the specified data. 
\n", "- This object is intended for the data mining step only. Shapash provides another object for deployment.\n", "- This tutorial does not explore the capabilities of the SmartExplainer; other tutorials cover them. (see other tutorials)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Declare and Compile SmartExplainer " ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "from shapash import SmartExplainer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Use wording on feature names to better understand results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we use a wording to rename our feature labels with more understandable terms. This makes local explainability more operational and understandable for users.\n", "- To do this, we use the house_dict dictionary, which maps a description to each feature.\n", "- We can then pass it as the features_dict parameter of the SmartExplainer." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "xpl = SmartExplainer(\n", " model=regressor,\n", " preprocessing=encoder, # Optional: compile step can use inverse_transform method\n", " features_dict=house_dict\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**compile()** This method is the first step to understand the model and its predictions.
It performs the sorting\n", "of contributions, the reverse preprocessing steps and all the calculations necessary for\n", "a quick display of plots and an efficient summary of the explanation. (see SmartExplainer documentation and tutorials)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Backend: Shap TreeExplainer\n" ] } ], "source": [ "xpl.compile(x=Xtest,\n", " y_pred=y_pred,\n", " y_target=ytest, # Optional: allows to display True Values vs Predicted Values\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Understand results of your trained model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we can easily get a first summary of the explanation of the model results.\n", "- Here, we chose to get the 3 features that contribute most to each prediction.\n", "- We used the wording to make feature names more understandable in an operational context." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>pred</th><th>feature_1</th><th>value_1</th><th>contribution_1</th><th>feature_2</th><th>value_2</th><th>contribution_2</th><th>feature_3</th><th>value_3</th><th>contribution_3</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>259</th><td>209141.256921</td><td>Ground living area square feet</td><td>1792</td><td>13710.4</td><td>Overall material and finish of the house</td><td>7</td><td>12776.3</td><td>Total square feet of basement area</td><td>963</td><td>-5103.03</td></tr>\n",
"<tr><th>268</th><td>178734.474531</td><td>Ground living area square feet</td><td>2192</td><td>29747</td><td>Overall material and finish of the house</td><td>5</td><td>-26151.3</td><td>Overall condition of the house</td><td>8</td><td>9190.84</td></tr>\n",
"<tr><th>289</th><td>113950.844570</td><td>Overall material and finish of the house</td><td>5</td><td>-24730</td><td>Ground living area square feet</td><td>900</td><td>-16342.6</td><td>Total square feet of basement area</td><td>882</td><td>-5922.64</td></tr>\n",
"<tr><th>650</th><td>74957.162142</td><td>Overall material and finish of the house</td><td>4</td><td>-33927.7</td><td>Ground living area square feet</td><td>630</td><td>-23234.4</td><td>Total square feet of basement area</td><td>630</td><td>-11687.9</td></tr>\n",
"<tr><th>1234</th><td>135305.243500</td><td>Overall material and finish of the house</td><td>5</td><td>-25445.7</td><td>Ground living area square feet</td><td>1188</td><td>-11476.6</td><td>Condition of sale</td><td>Abnormal Sale</td><td>-5071.82</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " pred feature_1 value_1 \\\n", "259 209141.256921 Ground living area square feet 1792 \n", "268 178734.474531 Ground living area square feet 2192 \n", "289 113950.844570 Overall material and finish of the house 5 \n", "650 74957.162142 Overall material and finish of the house 4 \n", "1234 135305.243500 Overall material and finish of the house 5 \n", "\n", " contribution_1 feature_2 value_2 \\\n", "259 13710.4 Overall material and finish of the house 7 \n", "268 29747 Overall material and finish of the house 5 \n", "289 -24730 Ground living area square feet 900 \n", "650 -33927.7 Ground living area square feet 630 \n", "1234 -25445.7 Ground living area square feet 1188 \n", "\n", " contribution_2 feature_3 value_3 \\\n", "259 12776.3 Total square feet of basement area 963 \n", "268 -26151.3 Overall condition of the house 8 \n", "289 -16342.6 Total square feet of basement area 882 \n", "650 -23234.4 Total square feet of basement area 630 \n", "1234 -11476.6 Condition of sale Abnormal Sale \n", "\n", " contribution_3 \n", "259 -5103.03 \n", "268 9190.84 \n", "289 -5922.64 \n", "650 -11687.9 \n", "1234 -5071.82 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xpl.to_pandas(max_contrib=3).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2 : SmartPredictor in production" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Switch from SmartExplainer to SmartPredictor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you are satisfied with your results and the explainability given by Shapash, you can use the SmartPredictor object for deployment. 
\n", "- In this section, we learn how to easily switch from a SmartExplainer to a SmartPredictor.\n", "- SmartPredictor allows you to make predictions, and to detail and summarize contributions on new data automatically.\n", "- It only keeps the attributes needed for deployment, which makes it lighter than the SmartExplainer object. \n", "- SmartPredictor performs additional consistency checks before deployment.\n", "- SmartPredictor allows you to configure how explanations are summarized to suit your use cases.\n", "- It can be used through an API or in batch mode." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "predictor = xpl.to_smartpredictor()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Save and Load your SmartPredictor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can easily save your SmartPredictor object to a pickle file and load it back." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Save your SmartPredictor in a Pickle File" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "predictor.save('./predictor.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Load your SmartPredictor from a Pickle File" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "from shapash.utils.load_smartpredictor import load_smartpredictor" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "predictor_load = load_smartpredictor('./predictor.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make a prediction with your SmartPredictor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make new predictions and summarize the local explainability of your model on new datasets, you can use the add_input method of the SmartPredictor.\n", "- The add_input method is the first step: it adds a dataset for prediction and explainability.\n", "- It checks the 
structure of the dataset, and of the predictions and contributions if specified. \n", "- It applies the preprocessing specified at initialization and reorders the features to match the order used by the model. (see the documentation of this method)\n", "- In API mode, this method can handle dictionary data, which can be received from a GET or a POST request." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Add data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The x input of the add_input method doesn't have to be encoded: add_input applies the preprocessing." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "predictor_load.add_input(x=X_df, ypred=y_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Make prediction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we can check that ypred is the one passed to the add_input method by inspecting the attribute data[\"ypred\"]. If ypred is not specified, it is computed automatically by the method. " ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>SalePrice</th></tr>\n",
"<tr><th>Id</th><th></th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>1</th><td>208500</td></tr>\n",
"<tr><th>2</th><td>181500</td></tr>\n",
"<tr><th>3</th><td>223500</td></tr>\n",
"<tr><th>4</th><td>140000</td></tr>\n",
"<tr><th>5</th><td>250000</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " SalePrice\n", "Id \n", "1 208500\n", "2 181500\n", "3 223500\n", "4 140000\n", "5 250000" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "predictor_load.data[\"ypred\"].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Get detailed explainability associated with the prediction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use the detail_contributions method to see the detailed contributions of each feature for each row of your new dataset.\n", "- For classification problems, it automatically associates contributions with the right predicted label. \n", "- The predicted label can be computed automatically by the method, or you can specify a ypred with the add_input method." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "detailed_contributions = predictor_load.detail_contributions()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>SalePrice</th><th>1stFlrSF</th><th>2ndFlrSF</th><th>3SsnPorch</th><th>BedroomAbvGr</th><th>BldgType</th><th>BsmtCond</th><th>BsmtExposure</th><th>BsmtFinSF1</th><th>BsmtFinSF2</th><th>...</th><th>SaleType</th><th>ScreenPorch</th><th>Street</th><th>TotRmsAbvGrd</th><th>TotalBsmtSF</th><th>Utilities</th><th>WoodDeckSF</th><th>YearBuilt</th><th>YearRemodAdd</th><th>YrSold</th></tr>\n",
"<tr><th>Id</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>1</th><td>208500</td><td>-1104.994176</td><td>1281.445856</td><td>0.0</td><td>375.679661</td><td>12.259902</td><td>157.224629</td><td>-233.025420</td><td>-738.445396</td><td>-59.294761</td><td>...</td><td>-104.645827</td><td>-351.621116</td><td>0.0</td><td>-498.228775</td><td>-5165.503476</td><td>0.0</td><td>-944.040092</td><td>3870.961681</td><td>2219.313761</td><td>17.478037</td></tr>\n",
"<tr><th>2</th><td>181500</td><td>2249.403962</td><td>-655.861167</td><td>0.0</td><td>123.907278</td><td>-9.270166</td><td>139.431860</td><td>2699.247506</td><td>5102.469936</td><td>-84.771341</td><td>...</td><td>-153.842142</td><td>-236.526862</td><td>0.0</td><td>-705.112993</td><td>2988.981279</td><td>0.0</td><td>2090.785074</td><td>323.902986</td><td>-3861.776078</td><td>424.382977</td></tr>\n",
"<tr><th>3</th><td>223500</td><td>-1426.795115</td><td>-616.113112</td><td>0.0</td><td>369.536957</td><td>9.210944</td><td>199.213726</td><td>1032.288162</td><td>-92.179454</td><td>-93.169310</td><td>...</td><td>-91.178667</td><td>-280.832451</td><td>0.0</td><td>-324.734175</td><td>-5338.340597</td><td>0.0</td><td>-777.746743</td><td>3837.761102</td><td>2192.921648</td><td>-98.965041</td></tr>\n",
"<tr><th>4</th><td>140000</td><td>-653.873832</td><td>121.459865</td><td>0.0</td><td>307.677892</td><td>9.720006</td><td>252.786934</td><td>-530.156452</td><td>-2987.649814</td><td>-77.039912</td><td>...</td><td>-114.608224</td><td>-338.435699</td><td>0.0</td><td>-635.065828</td><td>-6548.453864</td><td>0.0</td><td>-974.503140</td><td>-3386.361210</td><td>-5232.537839</td><td>1633.763619</td></tr>\n",
"<tr><th>5</th><td>250000</td><td>-9531.577733</td><td>-1097.620788</td><td>0.0</td><td>-1574.988323</td><td>7.453569</td><td>130.470247</td><td>623.939546</td><td>-2396.572526</td><td>-92.929525</td><td>...</td><td>-481.118248</td><td>-366.250007</td><td>0.0</td><td>-4733.603060</td><td>-4675.706762</td><td>0.0</td><td>165.653455</td><td>2334.652063</td><td>1355.358932</td><td>-395.126541</td></tr>\n",
"</tbody>\n", "</table>\n", "<p>5 rows × 73 columns</p>\n", "</div>
" ], "text/plain": [ " SalePrice 1stFlrSF 2ndFlrSF 3SsnPorch BedroomAbvGr BldgType \\\n", "Id \n", "1 208500 -1104.994176 1281.445856 0.0 375.679661 12.259902 \n", "2 181500 2249.403962 -655.861167 0.0 123.907278 -9.270166 \n", "3 223500 -1426.795115 -616.113112 0.0 369.536957 9.210944 \n", "4 140000 -653.873832 121.459865 0.0 307.677892 9.720006 \n", "5 250000 -9531.577733 -1097.620788 0.0 -1574.988323 7.453569 \n", "\n", " BsmtCond BsmtExposure BsmtFinSF1 BsmtFinSF2 ... SaleType \\\n", "Id ... \n", "1 157.224629 -233.025420 -738.445396 -59.294761 ... -104.645827 \n", "2 139.431860 2699.247506 5102.469936 -84.771341 ... -153.842142 \n", "3 199.213726 1032.288162 -92.179454 -93.169310 ... -91.178667 \n", "4 252.786934 -530.156452 -2987.649814 -77.039912 ... -114.608224 \n", "5 130.470247 623.939546 -2396.572526 -92.929525 ... -481.118248 \n", "\n", " ScreenPorch Street TotRmsAbvGrd TotalBsmtSF Utilities WoodDeckSF \\\n", "Id \n", "1 -351.621116 0.0 -498.228775 -5165.503476 0.0 -944.040092 \n", "2 -236.526862 0.0 -705.112993 2988.981279 0.0 2090.785074 \n", "3 -280.832451 0.0 -324.734175 -5338.340597 0.0 -777.746743 \n", "4 -338.435699 0.0 -635.065828 -6548.453864 0.0 -974.503140 \n", "5 -366.250007 0.0 -4733.603060 -4675.706762 0.0 165.653455 \n", "\n", " YearBuilt YearRemodAdd YrSold \n", "Id \n", "1 3870.961681 2219.313761 17.478037 \n", "2 323.902986 -3861.776078 424.382977 \n", "3 3837.761102 2192.921648 -98.965041 \n", "4 -3386.361210 -5232.537839 1633.763619 \n", "5 2334.652063 1355.358932 -395.126541 \n", "\n", "[5 rows x 73 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "detailed_contributions.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summarize explainability of the predictions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- You can use the summarize method to summarize your local explainability.\n", "- This summary can be configured with the modify_mask method 
so that you have explainability that meets your operational needs.\n", "- When you initialize the SmartPredictor, you can also specify:\n", ">- postprocessing: to apply a wording to several values of your dataset.\n", ">- label_dict: to rename your labels for classification problems.\n", ">- features_dict: to rename your features." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "predictor_load.modify_mask(max_contrib=3)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "explanation = predictor_load.summarize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, here, we chose to build a summary with the 3 features that contribute most to each prediction.\n", "- As you can see below, the wording defined in the first step of this tutorial has been kept by the SmartPredictor and used in the summarize method. " ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>SalePrice</th><th>feature_1</th><th>value_1</th><th>contribution_1</th><th>feature_2</th><th>value_2</th><th>contribution_2</th><th>feature_3</th><th>value_3</th><th>contribution_3</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>1</th><td>208500</td><td>Overall material and finish of the house</td><td>7</td><td>8248.82</td><td>Total square feet of basement area</td><td>856</td><td>-5165.5</td><td>Original construction date</td><td>2003</td><td>3870.96</td></tr>\n",
"<tr><th>2</th><td>181500</td><td>Overall material and finish of the house</td><td>6</td><td>-14419.4</td><td>Ground living area square feet</td><td>1262</td><td>-9238.07</td><td>Overall condition of the house</td><td>8</td><td>6371.61</td></tr>\n",
"<tr><th>3</th><td>223500</td><td>Ground living area square feet</td><td>1786</td><td>15880.4</td><td>Overall material and finish of the house</td><td>7</td><td>9651.28</td><td>Size of garage in square feet</td><td>608</td><td>6259.46</td></tr>\n",
"<tr><th>4</th><td>140000</td><td>Total square feet of basement area</td><td>756</td><td>-6548.45</td><td>Remodel date</td><td>1970</td><td>-5232.54</td><td>Size of garage in square feet</td><td>642</td><td>4384.29</td></tr>\n",
"<tr><th>5</th><td>250000</td><td>Overall material and finish of the house</td><td>8</td><td>55722.1</td><td>Ground living area square feet</td><td>2198</td><td>17176.5</td><td>Size of garage in square feet</td><td>836</td><td>14907.7</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " SalePrice feature_1 value_1 contribution_1 \\\n", "1 208500 Overall material and finish of the house 7 8248.82 \n", "2 181500 Overall material and finish of the house 6 -14419.4 \n", "3 223500 Ground living area square feet 1786 15880.4 \n", "4 140000 Total square feet of basement area 756 -6548.45 \n", "5 250000 Overall material and finish of the house 8 55722.1 \n", "\n", " feature_2 value_2 contribution_2 \\\n", "1 Total square feet of basement area 856 -5165.5 \n", "2 Ground living area square feet 1262 -9238.07 \n", "3 Overall material and finish of the house 7 9651.28 \n", "4 Remodel date 1970 -5232.54 \n", "5 Ground living area square feet 2198 17176.5 \n", "\n", " feature_3 value_3 contribution_3 \n", "1 Original construction date 2003 3870.96 \n", "2 Overall condition of the house 8 6371.61 \n", "3 Size of garage in square feet 608 6259.46 \n", "4 Size of garage in square feet 642 4384.29 \n", "5 Size of garage in square feet 836 14907.7 " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "explanation.head()" ] } ], "metadata": { "celltoolbar": "Aucun(e)", "hide_input": false, "kernelspec": { "display_name": "Python 3.9.13", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } }, "vscode": { "interpreter": { "hash": "6dbaec60c0b0d722a3fa908c2fd7b738d946da6332c67fea5eea602801fdaf43" } } }, "nbformat": 4, "nbformat_minor": 4 }