{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Start the Shapash Web App on a sample dataset\n",
"\n",
"With this tutorial you:
\n",
"Understand how shapash works with a simple use case
\n",
"Start WebApp to understand your model and save these results\n",
"\n",
"Contents:\n",
"- Build a Regressor\n",
"- Compile Shapash SmartExplainer\n",
"- Start Shapash WebApp\n",
"- Export synt with to_pandas function\n",
"- Save Shapash object in pickle file\n",
"\n",
"Data from Kaggle [House Prices](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from category_encoders import OrdinalEncoder\n",
"from lightgbm import LGBMRegressor\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.ensemble import ExtraTreesRegressor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building Supervized Model "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from shapash.data.data_loader import data_loading\n",
"house_df, house_dict = data_loading('house_prices')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"y_df=house_df['SalePrice'].to_frame()\n",
"X_df=house_df[house_df.columns.difference(['SalePrice'])]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1460, 73)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"house_df.shape"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n", " | MSSubClass | \n", "MSZoning | \n", "LotArea | \n", "Street | \n", "LotShape | \n", "LandContour | \n", "Utilities | \n", "LotConfig | \n", "LandSlope | \n", "Neighborhood | \n", "... | \n", "EnclosedPorch | \n", "3SsnPorch | \n", "ScreenPorch | \n", "PoolArea | \n", "MiscVal | \n", "MoSold | \n", "YrSold | \n", "SaleType | \n", "SaleCondition | \n", "SalePrice | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Id | \n", "\n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " | \n", " |
1 | \n", "2-Story 1946 & Newer | \n", "Residential Low Density | \n", "8450 | \n", "Paved | \n", "Regular | \n", "Near Flat/Level | \n", "All public Utilities (E,G,W,& S) | \n", "Inside lot | \n", "Gentle slope | \n", "College Creek | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "2 | \n", "2008 | \n", "Warranty Deed - Conventional | \n", "Normal Sale | \n", "208500 | \n", "
2 | \n", "1-Story 1946 & Newer All Styles | \n", "Residential Low Density | \n", "9600 | \n", "Paved | \n", "Regular | \n", "Near Flat/Level | \n", "All public Utilities (E,G,W,& S) | \n", "Frontage on 2 sides of property | \n", "Gentle slope | \n", "Veenker | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "5 | \n", "2007 | \n", "Warranty Deed - Conventional | \n", "Normal Sale | \n", "181500 | \n", "
3 | \n", "2-Story 1946 & Newer | \n", "Residential Low Density | \n", "11250 | \n", "Paved | \n", "Slightly irregular | \n", "Near Flat/Level | \n", "All public Utilities (E,G,W,& S) | \n", "Inside lot | \n", "Gentle slope | \n", "College Creek | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "9 | \n", "2008 | \n", "Warranty Deed - Conventional | \n", "Normal Sale | \n", "223500 | \n", "
4 | \n", "2-Story 1945 & Older | \n", "Residential Low Density | \n", "9550 | \n", "Paved | \n", "Slightly irregular | \n", "Near Flat/Level | \n", "All public Utilities (E,G,W,& S) | \n", "Corner lot | \n", "Gentle slope | \n", "Crawford | \n", "... | \n", "272 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "2 | \n", "2006 | \n", "Warranty Deed - Conventional | \n", "Abnormal Sale | \n", "140000 | \n", "
5 | \n", "2-Story 1946 & Newer | \n", "Residential Low Density | \n", "14260 | \n", "Paved | \n", "Slightly irregular | \n", "Near Flat/Level | \n", "All public Utilities (E,G,W,& S) | \n", "Frontage on 2 sides of property | \n", "Gentle slope | \n", "Northridge | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "12 | \n", "2008 | \n", "Warranty Deed - Conventional | \n", "Normal Sale | \n", "250000 | \n", "
5 rows × 73 columns
\n", "\n", " | pred | \n", "feature_1 | \n", "value_1 | \n", "contribution_1 | \n", "feature_2 | \n", "value_2 | \n", "contribution_2 | \n", "feature_3 | \n", "value_3 | \n", "contribution_3 | \n", "
---|---|---|---|---|---|---|---|---|---|---|
259 | \n", "203102.618265 | \n", "Ground living area square feet | \n", "1792 | \n", "10170.153594 | \n", "Overall material and finish of the house | \n", "7 | \n", "9886.60162 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
268 | \n", "165504.066858 | \n", "Overall material and finish of the house | \n", "5 | \n", "-21896.320133 | \n", "Ground living area square feet | \n", "2192 | \n", "16807.388625 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
289 | \n", "141844.323422 | \n", "Overall material and finish of the house | \n", "5 | \n", "-20785.923401 | \n", "Ground living area square feet | \n", "900 | \n", "-10577.685 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
650 | \n", "116849.365350 | \n", "Overall material and finish of the house | \n", "4 | \n", "-27677.524884 | \n", "Ground living area square feet | \n", "630 | \n", "-12140.106966 | \n", "Total square feet of basement area | \n", "630 | \n", "-7142.980699 | \n", "
1234 | \n", "160989.488908 | \n", "Overall material and finish of the house | \n", "5 | \n", "-20986.378284 | \n", "Ground living area square feet | \n", "1188 | \n", "-8761.318312 | \n", "Total square feet of basement area | \n", "1188 | \n", "5591.086195 | \n", "