{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Start the Shapash Web App on a sample dataset\n", "\n", "In this tutorial you will:\n", "- Understand how Shapash works through a simple use case\n", "- Start the WebApp to explore your model and save the results\n", "\n", "Contents:\n", "- Build a Regressor\n", "- Compile Shapash SmartExplainer\n", "- Start Shapash WebApp\n", "- Export a summary with the to_pandas function\n", "- Save the Shapash object in a pickle file\n", "\n", "Data from Kaggle [House Prices](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)" ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from category_encoders import OrdinalEncoder\n", "from lightgbm import LGBMRegressor\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.ensemble import ExtraTreesRegressor" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Building a Supervised Model" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from shapash.data.data_loader import data_loading\n", "house_df, house_dict = data_loading('house_prices')" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "y_df = house_df['SalePrice'].to_frame()\n", "X_df = house_df[house_df.columns.difference(['SalePrice'])]" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1460, 73)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "house_df.shape" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "<thead>\n",
"<tr><th>Id</th><th>MSSubClass</th><th>MSZoning</th><th>LotArea</th><th>Street</th><th>LotShape</th><th>LandContour</th><th>Utilities</th><th>LotConfig</th><th>LandSlope</th><th>Neighborhood</th><th>...</th><th>EnclosedPorch</th><th>3SsnPorch</th><th>ScreenPorch</th><th>PoolArea</th><th>MiscVal</th><th>MoSold</th><th>YrSold</th><th>SaleType</th><th>SaleCondition</th><th>SalePrice</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>1</th><td>2-Story 1946 &amp; Newer</td><td>Residential Low Density</td><td>8450</td><td>Paved</td><td>Regular</td><td>Near Flat/Level</td><td>All public Utilities (E,G,W,&amp; S)</td><td>Inside lot</td><td>Gentle slope</td><td>College Creek</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>2</td><td>2008</td><td>Warranty Deed - Conventional</td><td>Normal Sale</td><td>208500</td></tr>\n",
"<tr><th>2</th><td>1-Story 1946 &amp; Newer All Styles</td><td>Residential Low Density</td><td>9600</td><td>Paved</td><td>Regular</td><td>Near Flat/Level</td><td>All public Utilities (E,G,W,&amp; S)</td><td>Frontage on 2 sides of property</td><td>Gentle slope</td><td>Veenker</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>5</td><td>2007</td><td>Warranty Deed - Conventional</td><td>Normal Sale</td><td>181500</td></tr>\n",
"<tr><th>3</th><td>2-Story 1946 &amp; Newer</td><td>Residential Low Density</td><td>11250</td><td>Paved</td><td>Slightly irregular</td><td>Near Flat/Level</td><td>All public Utilities (E,G,W,&amp; S)</td><td>Inside lot</td><td>Gentle slope</td><td>College Creek</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>9</td><td>2008</td><td>Warranty Deed - Conventional</td><td>Normal Sale</td><td>223500</td></tr>\n",
"<tr><th>4</th><td>2-Story 1945 &amp; Older</td><td>Residential Low Density</td><td>9550</td><td>Paved</td><td>Slightly irregular</td><td>Near Flat/Level</td><td>All public Utilities (E,G,W,&amp; S)</td><td>Corner lot</td><td>Gentle slope</td><td>Crawford</td><td>...</td><td>272</td><td>0</td><td>0</td><td>0</td><td>0</td><td>2</td><td>2006</td><td>Warranty Deed - Conventional</td><td>Abnormal Sale</td><td>140000</td></tr>\n",
"<tr><th>5</th><td>2-Story 1946 &amp; Newer</td><td>Residential Low Density</td><td>14260</td><td>Paved</td><td>Slightly irregular</td><td>Near Flat/Level</td><td>All public Utilities (E,G,W,&amp; S)</td><td>Frontage on 2 sides of property</td><td>Gentle slope</td><td>Northridge</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>12</td><td>2008</td><td>Warranty Deed - Conventional</td><td>Normal Sale</td><td>250000</td></tr>\n",
"</tbody>\n", "</table>\n", "<p>5 rows × 73 columns</p>\n", "</div>
" ], "text/plain": [ " MSSubClass MSZoning LotArea Street \\\n", "Id \n", "1 2-Story 1946 & Newer Residential Low Density 8450 Paved \n", "2 1-Story 1946 & Newer All Styles Residential Low Density 9600 Paved \n", "3 2-Story 1946 & Newer Residential Low Density 11250 Paved \n", "4 2-Story 1945 & Older Residential Low Density 9550 Paved \n", "5 2-Story 1946 & Newer Residential Low Density 14260 Paved \n", "\n", " LotShape LandContour Utilities \\\n", "Id \n", "1 Regular Near Flat/Level All public Utilities (E,G,W,& S) \n", "2 Regular Near Flat/Level All public Utilities (E,G,W,& S) \n", "3 Slightly irregular Near Flat/Level All public Utilities (E,G,W,& S) \n", "4 Slightly irregular Near Flat/Level All public Utilities (E,G,W,& S) \n", "5 Slightly irregular Near Flat/Level All public Utilities (E,G,W,& S) \n", "\n", " LotConfig LandSlope Neighborhood ... \\\n", "Id ... \n", "1 Inside lot Gentle slope College Creek ... \n", "2 Frontage on 2 sides of property Gentle slope Veenker ... \n", "3 Inside lot Gentle slope College Creek ... \n", "4 Corner lot Gentle slope Crawford ... \n", "5 Frontage on 2 sides of property Gentle slope Northridge ... 
\n", "\n", " EnclosedPorch 3SsnPorch ScreenPorch PoolArea MiscVal MoSold YrSold \\\n", "Id \n", "1 0 0 0 0 0 2 2008 \n", "2 0 0 0 0 0 5 2007 \n", "3 0 0 0 0 0 9 2008 \n", "4 272 0 0 0 0 2 2006 \n", "5 0 0 0 0 0 12 2008 \n", "\n", " SaleType SaleCondition SalePrice \n", "Id \n", "1 Warranty Deed - Conventional Normal Sale 208500 \n", "2 Warranty Deed - Conventional Normal Sale 181500 \n", "3 Warranty Deed - Conventional Normal Sale 223500 \n", "4 Warranty Deed - Conventional Abnormal Sale 140000 \n", "5 Warranty Deed - Conventional Normal Sale 250000 \n", "\n", "[5 rows x 73 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "house_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Encoding Categorical Features " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from category_encoders import OrdinalEncoder\n", "\n", "categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']\n", "\n", "encoder = OrdinalEncoder(\n", " cols=categorical_features,\n", " handle_unknown='ignore',\n", " return_df=True).fit(X_df)\n", "\n", "X_df=encoder.transform(X_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Train / Test Split" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75, random_state=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Model Fitting" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "regressor = LGBMRegressor(n_estimators=100).fit(Xtrain,ytrain)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Understanding my model with shapash" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Declare and Compile SmartExplainer " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, 
"outputs": [], "source": [ "from shapash import SmartExplainer" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "xpl = SmartExplainer(\n", " model=regressor,\n", " preprocessing=encoder, # Optional: compile step can use inverse_transform method\n", " features_dict=house_dict # optional parameter, specifies label for features name \n", ")" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "xpl.compile(x=Xtest,\n", " y_target=ytest # Optional: allows to display True Values vs Predicted Values\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Start WebApp" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "app = xpl.run_app(title_story='House Prices', port=8020)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Link to App:
\n", "[shapash-monitor link](https://shapash-demo.ossbymaif.fr/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Stop the WebApp after using it" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "app.kill()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Export local explanations to a DataFrame" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "summary_df = xpl.to_pandas(\n", "    max_contrib=3,  # Maximum number of features to show in the summary\n", "    threshold=5000,  # Hide contributions whose absolute value is below this threshold\n", ")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "<thead>\n",
"<tr><th></th><th>pred</th><th>feature_1</th><th>value_1</th><th>contribution_1</th><th>feature_2</th><th>value_2</th><th>contribution_2</th><th>feature_3</th><th>value_3</th><th>contribution_3</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>259</th><td>203102.618265</td><td>Ground living area square feet</td><td>1792</td><td>10170.153594</td><td>Overall material and finish of the house</td><td>7</td><td>9886.60162</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>268</th><td>165504.066858</td><td>Overall material and finish of the house</td><td>5</td><td>-21896.320133</td><td>Ground living area square feet</td><td>2192</td><td>16807.388625</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>289</th><td>141844.323422</td><td>Overall material and finish of the house</td><td>5</td><td>-20785.923401</td><td>Ground living area square feet</td><td>900</td><td>-10577.685</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>650</th><td>116849.365350</td><td>Overall material and finish of the house</td><td>4</td><td>-27677.524884</td><td>Ground living area square feet</td><td>630</td><td>-12140.106966</td><td>Total square feet of basement area</td><td>630</td><td>-7142.980699</td></tr>\n",
"<tr><th>1234</th><td>160989.488908</td><td>Overall material and finish of the house</td><td>5</td><td>-20986.378284</td><td>Ground living area square feet</td><td>1188</td><td>-8761.318312</td><td>Total square feet of basement area</td><td>1188</td><td>5591.086195</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " pred feature_1 value_1 \\\n", "259 203102.618265 Ground living area square feet 1792 \n", "268 165504.066858 Overall material and finish of the house 5 \n", "289 141844.323422 Overall material and finish of the house 5 \n", "650 116849.365350 Overall material and finish of the house 4 \n", "1234 160989.488908 Overall material and finish of the house 5 \n", "\n", " contribution_1 feature_2 value_2 \\\n", "259 10170.153594 Overall material and finish of the house 7 \n", "268 -21896.320133 Ground living area square feet 2192 \n", "289 -20785.923401 Ground living area square feet 900 \n", "650 -27677.524884 Ground living area square feet 630 \n", "1234 -20986.378284 Ground living area square feet 1188 \n", "\n", " contribution_2 feature_3 value_3 contribution_3 \n", "259 9886.60162 NaN NaN NaN \n", "268 16807.388625 NaN NaN NaN \n", "289 -10577.685 NaN NaN NaN \n", "650 -12140.106966 Total square feet of basement area 630 -7142.980699 \n", "1234 -8761.318312 Total square feet of basement area 1188 5591.086195 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "summary_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Save SmartExplainer in Pickle File\n", "\n", "You can save the SmartExplainer Object in a pickle file to make new plots later or launch the WebApp again" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "xpl.save('./xpl.pkl')" ] } ], "metadata": { "celltoolbar": "Aucun(e)", "hide_input": false, "kernelspec": { "display_name": "shapash_picking", "language": "python", "name": "shapash_picking" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, 
"skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }