Start the Shapash Web App on a sample dataset¶
In this tutorial you will:
- Understand how Shapash works with a simple use case
- Start the WebApp to explore your model and save the results
Contents:
- Build a Regressor
- Compile Shapash SmartExplainer
- Start Shapash WebApp
- Export a summary with the to_pandas function
- Save Shapash object in pickle file
Data from Kaggle House Prices
[1]:
import pandas as pd
from category_encoders import OrdinalEncoder
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesRegressor
Building Supervised Model¶
[2]:
from shapash.data.data_loader import data_loading
house_df, house_dict = data_loading('house_prices')
[3]:
y_df = house_df['SalePrice'].to_frame()
X_df = house_df[house_df.columns.difference(['SalePrice'])]
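A detail worth knowing about the cell above: `Index.difference` returns the remaining labels in sorted order, so the feature frame ends up with alphabetically ordered columns rather than the original column order. A minimal sketch on a toy frame (the frame and its values are made up for illustration) compares this with `DataFrame.drop`, which keeps the original order:

```python
import pandas as pd

# Toy frame standing in for house_df; the values are illustrative only.
toy_df = pd.DataFrame({
    "YrSold": [2008, 2007],
    "LotArea": [8450, 9600],
    "SalePrice": [208500, 181500],
    "MoSold": [2, 5],
})

# Same pattern as the tutorial: the remaining columns come back sorted.
X_sorted = toy_df[toy_df.columns.difference(["SalePrice"])]

# Alternative that preserves the original column order.
X_ordered = toy_df.drop(columns=["SalePrice"])

print(list(X_sorted.columns))
print(list(X_ordered.columns))
```

Either form works for tree-based models; the difference only matters if later code relies on column position.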
[4]:
house_df.shape
[4]:
(1460, 73)
[5]:
house_df.head()
[5]:
Id | MSSubClass | MSZoning | LotArea | Street | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | ... | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2-Story 1946 & Newer | Residential Low Density | 8450 | Paved | Regular | Near Flat/Level | All public Utilities (E,G,W,& S) | Inside lot | Gentle slope | College Creek | ... | 0 | 0 | 0 | 0 | 0 | 2 | 2008 | Warranty Deed - Conventional | Normal Sale | 208500 |
2 | 1-Story 1946 & Newer All Styles | Residential Low Density | 9600 | Paved | Regular | Near Flat/Level | All public Utilities (E,G,W,& S) | Frontage on 2 sides of property | Gentle slope | Veenker | ... | 0 | 0 | 0 | 0 | 0 | 5 | 2007 | Warranty Deed - Conventional | Normal Sale | 181500 |
3 | 2-Story 1946 & Newer | Residential Low Density | 11250 | Paved | Slightly irregular | Near Flat/Level | All public Utilities (E,G,W,& S) | Inside lot | Gentle slope | College Creek | ... | 0 | 0 | 0 | 0 | 0 | 9 | 2008 | Warranty Deed - Conventional | Normal Sale | 223500 |
4 | 2-Story 1945 & Older | Residential Low Density | 9550 | Paved | Slightly irregular | Near Flat/Level | All public Utilities (E,G,W,& S) | Corner lot | Gentle slope | Crawford | ... | 272 | 0 | 0 | 0 | 0 | 2 | 2006 | Warranty Deed - Conventional | Abnormal Sale | 140000 |
5 | 2-Story 1946 & Newer | Residential Low Density | 14260 | Paved | Slightly irregular | Near Flat/Level | All public Utilities (E,G,W,& S) | Frontage on 2 sides of property | Gentle slope | Northridge | ... | 0 | 0 | 0 | 0 | 0 | 12 | 2008 | Warranty Deed - Conventional | Normal Sale | 250000 |
5 rows × 73 columns
Encoding Categorical Features¶
[ ]:
from category_encoders import OrdinalEncoder
categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']
encoder = OrdinalEncoder(
cols=categorical_features,
handle_unknown='ignore',
return_df=True).fit(X_df)
X_df=encoder.transform(X_df)
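To make the encoding step concrete, here is a simplified stand-in for ordinal encoding written with plain pandas. It is not the `category_encoders` implementation: the 1-based codes and the toy column are assumptions made for this sketch. It shows the two things that matter later in the tutorial: each label becomes an integer, and keeping the mapping lets the integers be turned back into readable labels.

```python
import pandas as pd

# Toy categorical column; the labels are borrowed from the LotShape values shown earlier.
toy = pd.DataFrame(
    {"LotShape": ["Regular", "Regular", "Slightly irregular", "Regular"]}
)

# pd.factorize assigns 0, 1, 2, ... in order of first appearance;
# adding 1 gives 1-based codes (an assumption of this sketch).
codes, categories = pd.factorize(toy["LotShape"])
toy["LotShape_encoded"] = codes + 1

# Keep the mapping so codes can be converted back to labels.
mapping = {label: i + 1 for i, label in enumerate(categories)}
inverse = {code: label for label, code in mapping.items()}
decoded = toy["LotShape_encoded"].map(inverse)

print(mapping)
print(decoded.tolist())
```

The reverse mapping is the reason the fitted encoder is handed to the explainer further down: explanations can then display the original labels instead of integer codes.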
Train / Test Split¶
[7]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75, random_state=1)
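As a quick sanity check on the split sizes, the sketch below applies the same `train_size` and `random_state` to placeholder arrays with 1460 rows, the row count reported by `house_df.shape`. The arrays are stand-ins, not the real features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 1460  # row count reported by house_df.shape
X = np.arange(n_rows).reshape(-1, 1)  # placeholder features
y = np.arange(n_rows)                 # placeholder target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=1
)
print(X_train.shape[0], X_test.shape[0])

# Re-running with the same random_state reproduces the same split.
X_train_again, _, _, _ = train_test_split(X, y, train_size=0.75, random_state=1)
print(np.array_equal(X_train, X_train_again))
```

With 1460 rows, 75% gives 1095 training rows and leaves 365 for the test set, which is the set the explainer is compiled on later.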
Model Fitting¶
[8]:
regressor = LGBMRegressor(n_estimators=100).fit(Xtrain,ytrain)
Understanding my model with shapash¶
Declare and Compile SmartExplainer¶
[9]:
from shapash import SmartExplainer
[10]:
xpl = SmartExplainer(
model=regressor,
preprocessing=encoder, # Optional: compile step can use inverse_transform method
features_dict=house_dict # Optional: maps feature names to readable labels
)
[11]:
xpl.compile(x=Xtest,
y_target=ytest # Optional: allows to display True Values vs Predicted Values
)
Start WebApp¶
[ ]:
app = xpl.run_app(title_story='House Prices', port=8020)
Link to App: shapash-monitor link
Stop the WebApp after using it¶
[13]:
app.kill()
Export local explanation in DataFrame¶
[15]:
summary_df = xpl.to_pandas(
max_contrib=3, # Maximum number of features to show in the summary
threshold=5000, # Hide contributions whose absolute value is below this threshold
)
[16]:
summary_df.head()
[16]:
| pred | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
---|---|---|---|---|---|---|---|---|---|---|
259 | 203102.618265 | Ground living area square feet | 1792 | 10170.153594 | Overall material and finish of the house | 7 | 9886.60162 | NaN | NaN | NaN |
268 | 165504.066858 | Overall material and finish of the house | 5 | -21896.320133 | Ground living area square feet | 2192 | 16807.388625 | NaN | NaN | NaN |
289 | 141844.323422 | Overall material and finish of the house | 5 | -20785.923401 | Ground living area square feet | 900 | -10577.685 | NaN | NaN | NaN |
650 | 116849.365350 | Overall material and finish of the house | 4 | -27677.524884 | Ground living area square feet | 630 | -12140.106966 | Total square feet of basement area | 630 | -7142.980699 |
1234 | 160989.488908 | Overall material and finish of the house | 5 | -20986.378284 | Ground living area square feet | 1188 | -8761.318312 | Total square feet of basement area | 1188 | 5591.086195 |
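The NaN cells in the table follow from combining `max_contrib` with `threshold`: a row can have fewer than three contributions that are large enough to show. The function below is a hypothetical illustration of that selection logic, not Shapash's actual code. It assumes contributions are ranked by absolute value and hidden when they fall below the threshold, which matches the ordering visible in the table.

```python
def summarize_contributions(contribs, max_contrib, threshold):
    """Return up to max_contrib (feature, contribution) pairs, largest
    magnitude first, dropping contributions with absolute value below threshold."""
    kept = [(feat, c) for feat, c in contribs.items() if abs(c) >= threshold]
    kept.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return kept[:max_contrib]


# The first two values come from row 259 of the table above;
# the third is a made-up small contribution added for illustration.
row_259 = {
    "Overall material and finish of the house": 9886.60162,
    "Ground living area square feet": 10170.153594,
    "Hypothetical minor feature": 1200.0,
}

summary = summarize_contributions(row_259, max_contrib=3, threshold=5000)
for feature, contribution in summary:
    print(feature, contribution)
```

Only two pairs survive the threshold here, so a third slot would be empty, as in the first rows of the table.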
Save SmartExplainer in Pickle File¶
You can save the SmartExplainer object in a pickle file to make new plots later or to launch the WebApp again.
[17]:
xpl.save('./xpl.pkl')
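For readers unfamiliar with pickle files, the sketch below shows the general save-and-reload round trip with Python's standard `pickle` module. The dictionary is a hypothetical stand-in for an explainer's state, and the file name is arbitrary. Shapash provides its own save and load helpers, whose exact form may vary by version, so treat this only as an illustration of the mechanism.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the state of a fitted explainer.
explainer_state = {
    "title_story": "House Prices",
    "max_contrib": 3,
    "threshold": 5000,
}

# Write the object to a pickle file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "toy_xpl.pkl")
with open(path, "wb") as f:
    pickle.dump(explainer_state, f)

# Later, possibly in another session: read it back.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == explainer_state)
```

Unpickling can execute arbitrary code, so only load pickle files that come from a source you trust.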