
Nicola Di Battista et al. / Procedia Structural Integrity 78 (2026) 412–417


Table 1. Scoring system for the nine structural vulnerability features used to compute the global index for masonry buildings. Each entry shows the assigned score for low / medium / high vulnerability levels.

| Vulnerability feature           | Score (low / medium / high) |
|---------------------------------|-----------------------------|
| Quality of masonry              | 0 / 5 / 10                  |
| Orthogonal-wall connections     | 0 / 1 / 3                   |
| Misaligned walls                | 0 / 0 / 2                   |
| Spacing of orthogonal walls     | 0 / 1 / 2                   |
| Roof vulnerability              | 0 / 2 / 4                   |
| Floor vulnerability             | 0 / 3 / 6                   |
| Offset floors                   | 0 / 0 / 4                   |
| Non-structural components       | 0 / 1 / 2                   |
| Plan / elevation irregularities | 0 / 2 / 3                   |
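The per-feature scores of Table 1 can be combined into a global vulnerability index. As a minimal sketch, the snippet below sums the nine scores for a surveyed building; the unweighted sum and the function names are illustrative assumptions, not the paper's exact aggregation rule.

```python
# Sketch: global vulnerability index from the nine Table 1 features.
# ASSUMPTION: a plain unweighted sum of per-feature scores; the paper
# may apply weights or a different aggregation.

# (feature -> scores for the low / medium / high vulnerability levels)
SCORES = {
    "quality_of_masonry": (0, 5, 10),
    "orthogonal_wall_connections": (0, 1, 3),
    "misaligned_walls": (0, 0, 2),
    "spacing_of_orthogonal_walls": (0, 1, 2),
    "roof_vulnerability": (0, 2, 4),
    "floor_vulnerability": (0, 3, 6),
    "offset_floors": (0, 0, 4),
    "non_structural_components": (0, 1, 2),
    "plan_elevation_irregularities": (0, 2, 3),
}
LEVELS = {"low": 0, "medium": 1, "high": 2}

def global_index(levels: dict) -> int:
    """Sum the per-feature scores for the given low/medium/high ratings."""
    return sum(SCORES[feat][LEVELS[lvl]] for feat, lvl in levels.items())

# A building rated "high" on every feature reaches the table's maximum of 36.
worst = global_index({feat: "high" for feat in SCORES})
```

With this scoring, the index ranges from 0 (all features rated low) to 36 (all rated high).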

3. Preliminary results

A hold-out strategy was adopted: model selection and hyper-parameter tuning relied on an 80/20 training–test split coupled with five-fold cross-validation. Decision Trees (DT) and Gradient-Boosted Trees (BT) were retained after preliminary screening. Performance was quantified with standard classification indices—precision, recall and F1—together with overall and balanced accuracy to compensate for class imbalance [7, 17]. Table 2 summarises the metrics for the four cost bands used in this study. Although the plain DT posts a seemingly strong accuracy (0.90), its balanced accuracy drops to 0.60, confirming a marked bias towards majority classes. The ensemble BT alleviates that bias (balanced accuracy 0.80) by aggregating multiple weak learners, yet prediction quality still deteriorates in the highest cost range (1800–2400 €/m²).
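The gap between plain and balanced accuracy reported above arises because balanced accuracy averages per-class recall instead of pooling all predictions. A minimal stdlib sketch with toy labels (illustrative only, not the paper's data):

```python
# Sketch: why balanced accuracy is reported alongside plain accuracy.
# The labels below are synthetic toy data, not the survey records.

def accuracy(y_true, y_pred):
    # Fraction of all samples predicted correctly (pooled over classes).
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recalls: each class counts equally, however rare.
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Skewed toy set: 9 records in the 600-1200 band, 1 in the 1800-2400 tail.
y_true = ["600-1200"] * 9 + ["1800-2400"]
y_pred = ["600-1200"] * 10  # classifier that always predicts the majority band

acc = accuracy(y_true, y_pred)           # 0.9: looks strong
bal = balanced_accuracy(y_true, y_pred)  # 0.5: exposes the majority-class bias
```

A degenerate majority-class predictor thus scores 0.9 in plain accuracy but only 0.5 in balanced accuracy, mirroring the DT behaviour in Table 2.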

Table 2. Hold-out test metrics for Decision Tree (DT) and Boosted Tree (BT) classifiers. Performance is reported across four cost bands in €/m².

| Model | Cost band (€/m²) | Precision | Recall | F1   |
|-------|------------------|-----------|--------|------|
| DT    | 0–600            | 0.53      | 0.67   | 0.59 |
| DT    | 600–1200         | 0.92      | 0.85   | 0.88 |
| DT    | 1200–1800        | 0.38      | 0.55   | 0.44 |
| DT    | 1800–2400        | 0.13      | 0.25   | 0.17 |
| BT    | 0–600            | 0.75      | 0.87   | 0.80 |
| BT    | 600–1200         | 0.96      | 0.91   | 0.93 |
| BT    | 1200–1800        | 0.64      | 0.82   | 0.72 |
| BT    | 1800–2400        | 0.40      | 0.60   | 0.48 |

DT overall: accuracy 0.90, balanced accuracy 0.60. BT overall: accuracy 0.89, balanced accuracy 0.80.

The main findings are:

• Minority-class weakness. The poorest scores occur in the sparsely populated upper cost band, underscoring the difficulty of learning tail behaviour from skewed data.
• Uniform binning. Equal 600 €/m² intervals concentrate most records in the second band (600–1200 €/m²), amplifying imbalance. Quantile-based or domain-informed bin edges could spread the records more evenly.
• Model family. Ensemble methods already dampen variance, yet alternative imbalance-aware algorithms (e.g. XGBoost with scale_pos_weight or LightGBM's class weights) may yield further gains.
• Feature redundancy. Dimensionality-reduction techniques such as principal component analysis (PCA) could suppress noise in the nine vulnerability attributes, exposing latent patterns.
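The quantile-binning suggestion can be illustrated with a short stdlib sketch. The cost values below are synthetic, generated only to mimic the skew described above; real edges would be computed from the survey data.

```python
# Sketch: quantile-based bin edges versus fixed 600 EUR/m2 intervals.
# ASSUMPTION: the cost sample below is synthetic, clustered in the
# 600-1200 band with a sparse upper tail, to mimic the observed skew.

import random
import statistics

random.seed(0)
costs = ([random.gauss(900, 150) for _ in range(90)]      # dense middle band
         + [random.uniform(1400, 2400) for _ in range(10)])  # sparse tail

# Three interior quartile cut points -> four equally populated bands,
# unlike the fixed 600 EUR/m2 intervals that overload the second band.
edges = statistics.quantiles(costs, n=4)

def band(cost, edges):
    # Band index 0..3: how many cut points the cost exceeds.
    return sum(cost > e for e in edges)

counts = [sum(1 for c in costs if band(c, edges) == b) for b in range(4)]
# counts -> [25, 25, 25, 25]: each quantile band holds a quarter of the records
```

Training on such balanced bands would remove much of the class imbalance that the fixed-width scheme introduces, at the cost of band edges that no longer align with round cost figures.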
