Table 1. Explanatory variables. For numerical variables, the four numbers are mean, standard deviation, maximum and minimum.

Variable | Data type | Categories / statistics
Structural typology | Categorical | { Concrete, Masonry }
Concrete compressive strength [MPa] | Numerical | { 21.06, 6.93, 42.50, 6.63 }
Total floor area [m²] | Numerical | { 298.63, 210.14, 2104.60, 30 }
Building height [m] | Numerical | { 9.40, 3.00, 21.50, 2.47 }
Volume [m³] | Numerical | { 2541.88, 1425.10, 10420, 500 }
Residential units [–] | Numerical | { 6.18, 3.01, 24, 1 }
Construction year | Numerical | { 1977, 12.09, 2006, 1938 }
Structural configuration | Categorical | { Clustered, Isolated }
Elevated floors [–] | Numerical | { 2.76, 0.83, 7, 1 }
Basement floors [–] | Numerical | { 0.21, 0.41, 1, 0 }
Pilotis floor | Categorical | { Yes, No }
Structural interventions | Categorical | { Yes, No }
Observed damage | Categorical | { None, Low, Moderate, High }
Peak ground acceleration [g] | Numerical | { 0.24, 0.02, 0.34, 0.16 }
Soil type | Categorical | { A, B, C, D }

outcomes suggest that the fifteen predictors, while easy to obtain, do not hold enough information to estimate the continuous index reliably.

3.2. Classification models

The authors next recast the task as a classification problem. Splitting the index into four, then three, ordered classes yielded modest accuracies (roughly 0.60–0.65) even with complex ANNs, which is not satisfactory given the inherent uncertainty of the underlying assessments. Better results emerged once the index was divided into just two groups by a threshold α_t. Buildings with α < α_t were tagged Severe; the remainder were Moderate. Binary labels simplify interpretation and often give stronger classifiers. With α_t = 0.10, the critical value suggested by ATER, the authors compared four algorithms: multinomial logistic regression, a classification tree, an ANN (a fully connected multilayer perceptron with 15 neurons per hidden layer) and XGBoost. Other candidates such as SVM, random forest and naïve Bayes brought no clear advantage and were omitted. Because the dataset is imbalanced (67 Severe versus 209 Moderate cases), Table 2 reports class-wise precision, recall and F1 as well as accuracy.
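A minimal sketch of this binary recasting, assuming scikit-learn and placeholder data (the library choice, network depth and dataset objects are illustrative assumptions, not the authors' implementation): the continuous index is thresholded at α_t = 0.10, a small multilayer perceptron is fitted, and class-wise precision, recall and F1 are reported.

```python
# Sketch (not the authors' code): binarise the usability index at alpha_t = 0.10
# and report class-wise precision, recall and F1 for one classifier.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

ALPHA_T = 0.10  # critical threshold suggested by ATER

# Placeholder stand-ins for the real dataset:
# X  -> (n_buildings, 15) matrix of the encoded predictors of Table 1
# alpha -> continuous usability index per building
rng = np.random.default_rng(0)
X = rng.normal(size=(276, 15))
alpha = rng.uniform(0.0, 0.5, size=276)

# Binary labels: Severe if the index falls below the threshold
y = np.where(alpha < ALPHA_T, "Severe", "Moderate")

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Fully connected MLP with 15 neurons per hidden layer (two layers assumed here,
# since the paper does not state the depth)
clf = MLPClassifier(hidden_layer_sizes=(15, 15), max_iter=2000, random_state=0)
clf.fit(X_tr, y_tr)

# Class-wise precision, recall and F1, analogous to Table 2
print(classification_report(y_te, clf.predict(X_te), digits=3))
```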

Table 2. Performance comparison for the four binary classifiers (Severe vs. Moderate).

Metric (%) | Logistic regression (Severe / Moderate) | Tree model (Severe / Moderate) | ANN (Severe / Moderate) | XGBoost (Severe / Moderate)
Precision | 29.41 / 90.38 | 20.00 / 87.10 | 60.00 / 96.80 | 73.44 / 86.24
Recall | 50.00 / 79.66 | 33.33 / 77.10 | 85.71 / 88.20 | 51.60 / 93.53
F1 | 37.33 / 84.62 | 25.00 / 81.86 | 70.59 / 92.31 | 59.10 / 89.64
Accuracy | 75.36 | 70.73 | 87.80 | 83.54

The minority (Severe) class shows much lower precision and recall for logistic regression and the classification tree. Only the ANN and XGBoost strike a better balance, with the ANN achieving both the highest accuracy (about 88%) and nearly equal recall for the two classes. Hence the ANN is preferred for this imbalanced set. Performance was then checked against different thresholds α_t. Twenty percent of the buildings were reserved for validation in a Monte-Carlo loop repeated until the coefficient of variation of recall dropped below 1%. Figure 2 plots accuracy
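The Monte-Carlo validation loop could look like the following sketch (the function name, the stratified split and the exact stopping-rule implementation are assumptions rather than the authors' code): random 80/20 hold-out splits are repeated, the recall on the Severe class is recorded, and the loop stops once the coefficient of variation of the accumulated recalls falls below 1%.

```python
# Sketch of the Monte-Carlo hold-out loop with a coefficient-of-variation
# stopping rule (implementation details assumed).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

def monte_carlo_recall(clf, X, y, test_size=0.2, cov_target=0.01,
                       min_runs=10, max_runs=2000, seed=0):
    rng = np.random.default_rng(seed)
    recalls = []
    for run in range(max_runs):
        # Fresh random 80/20 split at every iteration
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y,
            random_state=int(rng.integers(1_000_000)))
        clf.fit(X_tr, y_tr)
        # Recall on the minority (Severe) class
        recalls.append(recall_score(y_te, clf.predict(X_te),
                                    pos_label="Severe"))
        if run + 1 >= min_runs:
            # Coefficient of variation of the recall collected so far
            cov = np.std(recalls) / max(np.mean(recalls), 1e-12)
            if cov < cov_target:
                break
    return np.mean(recalls), len(recalls)
```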
