Lenganji Simwanda et al. / Procedia Structural Integrity 73 (2025) 138–145
models to infer such effects from the inputs. Solar angles were included to capture daily and seasonal patterns, though they are partly redundant with the radiation fluxes. Feature importance analysis led to the removal of low-impact variables such as wind direction. Most features were continuous, and no categorical handling was needed: the models could distinguish snow from rain precipitation from temperature. Despite the zero-inflated target (roof SWE is often zero), the ensemble tree models handled this naturally, without resampling or a two-stage model.

3.2. Training and validation strategy and model interpretation

To evaluate model performance, the dataset was randomly split into 80% training and 20% testing (25 and 6 years of data, respectively), ensuring a representative mix of conditions across the observed period. This random split assesses generalization under the assumption of climate stationarity; no test data was used during training. Hyperparameters for each model were tuned using Bayesian optimization with 5-fold cross-validation to maximize the validation coefficient of determination (R²) (Turner et al., 2021). Key parameters included tree count, tree depth, learning rate, and subsampling. Early stopping (100 rounds) was used for the boosting models to avoid overfitting. Final models were retrained on the full training set, and performance was evaluated on the held-out test set using R², root mean square error (RMSE), and mean absolute error (MAE).

We also assessed feature importance to interpret the models, employing SHAP (SHapley Additive exPlanations) values to rank features by their influence on the predictions. SHAP values provide a consistent way to quantify how much each feature contributes to a particular prediction, and averaging them yields a global importance ranking (Lundberg and Lee, 2017). This helps ensure that the models' behaviour aligns with the underlying physical effects.

4. Results and Discussion
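The tuning and validation workflow described above can be sketched as follows. This is a minimal illustration, not the authors' code: synthetic data stands in for the climate/roof-SWE dataset, and scikit-learn's RandomizedSearchCV stands in for the Bayesian optimizer the paper uses; the hyperparameter grid mirrors the parameters the text names (tree count, depth, learning rate, subsampling).

```python
# Sketch of the 80/20 split, 5-fold cross-validated hyperparameter search,
# and held-out evaluation described in Section 3.2. Assumptions: synthetic
# regression data replaces the climate dataset, and random search replaces
# Bayesian optimization (the workflow structure is otherwise the same).
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_regression(n_samples=600, n_features=8, noise=0.1, random_state=0)

# Random 80% / 20% split; the test set is never seen during tuning.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions={
        "n_estimators": [100, 200, 400],    # tree count
        "max_depth": [2, 3, 4],             # tree depth
        "learning_rate": [0.03, 0.1, 0.3],  # shrinkage
        "subsample": [0.7, 1.0],            # row subsampling
    },
    n_iter=8, cv=5, scoring="r2", random_state=0,
)
search.fit(X_tr, y_tr)  # refit=True retrains the best model on all training data

# Held-out evaluation with the three metrics reported in the paper.
y_hat = search.predict(X_te)
r2 = r2_score(y_te, y_hat)
rmse = mean_squared_error(y_te, y_hat) ** 0.5
mae = mean_absolute_error(y_te, y_hat)
```

Early stopping for the boosting libraries (e.g. the 100-round criterion mentioned in the text) would additionally require a validation set passed to the model's `fit` call, which is omitted here for brevity.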
4.1. Overall Prediction Performance

All four machine learning models (RF, GBM, CatBoost, and XGBoost) exhibited excellent predictive performance for estimating roof snow load under U = 1.0 W/m²K conditions (Table 2). On the independent test set, all models achieved R² values above 0.995, with XGBoost and CatBoost leading slightly (R² ≈ 0.997). RMSE values ranged from 0.045 to 0.060 kN/m², while MAE values were between 0.010 and 0.015 kN/m², indicating extremely low deviation relative to the observed roof load range (up to ~2.3 kN/m²).

The models not only reproduced magnitudes accurately but also captured snow load dynamics (see Fig. 1), including accumulation during snowfall and reduction during melt events. They maintained near-zero error in snow-free periods and correctly tracked seasonal trends. Slight underprediction of peak snow loads was seen with the RF model, likely because bagging, unlike boosting, does not fit successive trees to residual errors and therefore corrects extreme values less effectively. Overall, generalization was strong, with minimal overfitting and near-identical performance on the training and test sets. This confirms that the models successfully learned the underlying physical relationships governing snow accumulation and melt, supported by high-volume, high-quality climate data inputs.

Table 2. Performance of ML models in predicting roof snow load (U = 1.0, no sliding) on the independent test set.

Model      R²      RMSE (kN/m²)   MAE (kN/m²)
RF         0.997   0.050          0.012
GBM        0.996   0.060          0.015
CatBoost   0.997   0.048          0.011
XGBoost    0.997   0.045          0.010
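The three test-set metrics in Table 2 have straightforward definitions, sketched below with NumPy. The example arrays are illustrative values in kN/m², not data from the paper.

```python
# Definitions of the three evaluation metrics reported in Table 2.
import numpy as np

def r2_score(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y, y_hat):
    """Root mean square error; penalizes large errors more than MAE."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error; average deviation in the target's own units."""
    return float(np.mean(np.abs(y - y_hat)))

# Illustrative roof snow loads in kN/m² (hypothetical, not from the dataset).
y_true = np.array([0.0, 0.40, 1.10, 2.30, 0.80])
y_pred = np.array([0.0, 0.38, 1.15, 2.25, 0.82])
```

Because RMSE squares the errors before averaging, RMSE ≥ MAE always holds; the gap between the two in Table 2 reflects a small number of larger errors near the snow-load peaks.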