Start the Shapash Web App on a sample dataset

In this tutorial you will:
- Understand how Shapash works with a simple use case
- Start the WebApp to understand your model and save the results

Contents:
- Build a Regressor
- Compile Shapash SmartExplainer
- Start Shapash WebApp
- Export a summary with the to_pandas function
- Save Shapash object in pickle file

Data from Kaggle House Prices

[1]:
import pandas as pd
from category_encoders import OrdinalEncoder
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesRegressor

Building a Supervised Model

[2]:
from shapash.data.data_loader import data_loading
house_df, house_dict = data_loading('house_prices')
[3]:
y_df = house_df['SalePrice'].to_frame()  # target as a one-column DataFrame
X_df = house_df[house_df.columns.difference(['SalePrice'])]  # every column except the target
[4]:
house_df.shape
[4]:
(1460, 73)
[5]:
house_df.head()
[5]:
| Id | MSSubClass | MSZoning | LotArea | Street | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | ... | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2-Story 1946 & Newer | Residential Low Density | 8450 | Paved | Regular | Near Flat/Level | All public Utilities (E,G,W,& S) | Inside lot | Gentle slope | College Creek | ... | 0 | 0 | 0 | 0 | 0 | 2 | 2008 | Warranty Deed - Conventional | Normal Sale | 208500 |
| 2 | 1-Story 1946 & Newer All Styles | Residential Low Density | 9600 | Paved | Regular | Near Flat/Level | All public Utilities (E,G,W,& S) | Frontage on 2 sides of property | Gentle slope | Veenker | ... | 0 | 0 | 0 | 0 | 0 | 5 | 2007 | Warranty Deed - Conventional | Normal Sale | 181500 |
| 3 | 2-Story 1946 & Newer | Residential Low Density | 11250 | Paved | Slightly irregular | Near Flat/Level | All public Utilities (E,G,W,& S) | Inside lot | Gentle slope | College Creek | ... | 0 | 0 | 0 | 0 | 0 | 9 | 2008 | Warranty Deed - Conventional | Normal Sale | 223500 |
| 4 | 2-Story 1945 & Older | Residential Low Density | 9550 | Paved | Slightly irregular | Near Flat/Level | All public Utilities (E,G,W,& S) | Corner lot | Gentle slope | Crawford | ... | 272 | 0 | 0 | 0 | 0 | 2 | 2006 | Warranty Deed - Conventional | Abnormal Sale | 140000 |
| 5 | 2-Story 1946 & Newer | Residential Low Density | 14260 | Paved | Slightly irregular | Near Flat/Level | All public Utilities (E,G,W,& S) | Frontage on 2 sides of property | Gentle slope | Northridge | ... | 0 | 0 | 0 | 0 | 0 | 12 | 2008 | Warranty Deed - Conventional | Normal Sale | 250000 |

5 rows × 73 columns

Encoding Categorical Features

[ ]:
from category_encoders import OrdinalEncoder

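# Treat every column with dtype 'object' as categorical
categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']

# Fit an ordinal encoder on the categorical columns
encoder = OrdinalEncoder(
    cols=categorical_features,
    handle_unknown='ignore',
    return_df=True).fit(X_df)

# Replace the categorical values with their integer codes
X_df = encoder.transform(X_df)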
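
To make the encoding step concrete, here is a minimal pure-Python sketch of ordinal encoding: each distinct category is replaced with an integer code, and the stored mapping lets the codes be translated back to labels. It is an illustration only, not the category_encoders implementation. The helper names fit_ordinal_mapping, encode, and decode are hypothetical, and the numbering scheme (first category seen gets code 1) is an assumption of the sketch.

```python
def fit_ordinal_mapping(values):
    """Assign an integer code to each distinct category, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1  # codes start at 1 (assumption of this sketch)
    return mapping


def encode(values, mapping):
    """Replace each category with its integer code."""
    return [mapping[value] for value in values]


def decode(codes, mapping):
    """Translate integer codes back to the original category labels."""
    inverse = {code: label for label, code in mapping.items()}
    return [inverse[code] for code in codes]


# Sample values taken from the LotShape column of the first five rows shown earlier
lot_shape = ["Regular", "Regular", "Slightly irregular", "Slightly irregular", "Slightly irregular"]

mapping = fit_ordinal_mapping(lot_shape)
codes = encode(lot_shape, mapping)
restored = decode(codes, mapping)

print(mapping)
print(codes)
print(restored == lot_shape)
```

Keeping the mapping around is what makes the reverse translation possible, which is why the fitted encoder is worth holding on to after the transform.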

Train / Test Split

[7]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75, random_state=1)
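
As a quick sanity check on the split, the arithmetic below sketches the expected sizes for the 1460-row dataset. It assumes the train set receives the floor of train_size × n_samples and the test set receives the remainder; treat the result as an estimate to compare against Xtrain.shape and Xtest.shape rather than as a statement about scikit-learn internals.

```python
import math

n_samples = 1460   # number of rows reported by house_df.shape
train_size = 0.75  # fraction passed to train_test_split

# Assumption: the train share is rounded down and the rest goes to the test set
n_train = math.floor(train_size * n_samples)
n_test = n_samples - n_train

print(n_train, n_test)
```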

Model Fitting

[8]:
regressor = LGBMRegressor(n_estimators=100).fit(Xtrain,ytrain)

Understanding my model with shapash

Declare and Compile SmartExplainer

[9]:
from shapash import SmartExplainer
[10]:
xpl = SmartExplainer(
    model=regressor,
    preprocessing=encoder,    # Optional: lets the compile step use the encoder's inverse_transform method
    features_dict=house_dict  # Optional: maps feature names to readable labels
)
[11]:
xpl.compile(x=Xtest,
            y_target=ytest # Optional: enables the display of true vs. predicted values
           )
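
The features_dict argument is a plain mapping from column names to readable labels. The sketch below illustrates the idea with three entries. The labels are ones that appear in this tutorial's output, while their pairing with the column names GrLivArea, OverallQual, and TotalBsmtSF is assumed from the Kaggle data description, and label_for is a hypothetical helper rather than a Shapash function.

```python
# Small excerpt in the spirit of house_dict (column/label pairing is an assumption)
features_dict = {
    "GrLivArea": "Ground living area square feet",
    "OverallQual": "Overall material and finish of the house",
    "TotalBsmtSF": "Total square feet of basement area",
}


def label_for(column, labels):
    """Return the readable label for a column, falling back to the raw column name."""
    return labels.get(column, column)


# "SomeUnlabelledColumn" is a made-up name used to show the fallback
for column in ["OverallQual", "GrLivArea", "SomeUnlabelledColumn"]:
    print(f"{column} -> {label_for(column, features_dict)}")
```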

Start WebApp

[ ]:
app = xpl.run_app(title_story='House Prices', port=8020)

Link to App: shapash-monitor link

Stop the WebApp after using it

[13]:
app.kill()

Export local explanations to a DataFrame

[15]:
summary_df = xpl.to_pandas(
    max_contrib=3,   # Maximum number of features to show in the summary
    threshold=5000,  # Hide contributions whose absolute value is below this threshold
)
[16]:
summary_df.head()
[16]:
| | pred | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 259 | 203102.618265 | Ground living area square feet | 1792 | 10170.153594 | Overall material and finish of the house | 7 | 9886.60162 | NaN | NaN | NaN |
| 268 | 165504.066858 | Overall material and finish of the house | 5 | -21896.320133 | Ground living area square feet | 2192 | 16807.388625 | NaN | NaN | NaN |
| 289 | 141844.323422 | Overall material and finish of the house | 5 | -20785.923401 | Ground living area square feet | 900 | -10577.685 | NaN | NaN | NaN |
| 650 | 116849.365350 | Overall material and finish of the house | 4 | -27677.524884 | Ground living area square feet | 630 | -12140.106966 | Total square feet of basement area | 630 | -7142.980699 |
| 1234 | 160989.488908 | Overall material and finish of the house | 5 | -20986.378284 | Ground living area square feet | 1188 | -8761.318312 | Total square feet of basement area | 1188 | 5591.086195 |
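
The roles of max_contrib and threshold can be sketched in plain Python. The function below assumes that contributions whose absolute value falls below the threshold are hidden, that the remaining ones are ranked by absolute value, and that only the top max_contrib entries are kept. This matches the shape of the output above but is not Shapash's actual implementation. The helper summarize_row is hypothetical, and the third contribution in the sample row is invented for illustration.

```python
def summarize_row(contributions, max_contrib=3, threshold=5000):
    """Keep the largest contributions (by absolute value) that reach the threshold."""
    visible = [
        (feature, value)
        for feature, value in contributions.items()
        if abs(value) >= threshold
    ]
    # Rank by magnitude so that strong negative effects are not pushed to the end
    visible.sort(key=lambda item: abs(item[1]), reverse=True)
    return visible[:max_contrib]


# The first two values come from row 259 above; the third is a made-up small contribution
row_259 = {
    "Ground living area square feet": 10170.153594,
    "Overall material and finish of the house": 9886.60162,
    "Year built (hypothetical contribution)": 3200.0,
}

summary = summarize_row(row_259)
for feature, value in summary:
    print(f"{feature}: {value:.2f}")
```

Under these assumptions the small contribution is dropped, which would explain why some rows show NaN in the third feature slot.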

Save SmartExplainer in Pickle File

You can save the SmartExplainer object in a pickle file to make new plots later or to launch the WebApp again.

[17]:
xpl.save('./xpl.pkl')
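
For readers unfamiliar with pickle files, the round trip below sketches the general idea using only the standard library. It uses a stand-in dictionary instead of a real SmartExplainer, and the file name and contents are placeholders. To reload an actual explainer, consult the Shapash documentation for the loading call that matches the installed version.

```python
import os
import pickle
import tempfile

# Stand-in for the explainer state (placeholder data, not a real SmartExplainer)
explainer_state = {"title": "House Prices", "max_contrib": 3, "threshold": 5000}

# Write into a temporary directory so the demo leaves the working directory untouched
path = os.path.join(tempfile.mkdtemp(), "xpl_demo.pkl")

# Serialize the object to disk
with open(path, "wb") as handle:
    pickle.dump(explainer_state, handle)

# Read it back later, for example in a new session
with open(path, "rb") as handle:
    restored_state = pickle.load(handle)

print(restored_state == explainer_state)
```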