PSI - Issue 64
Christoph Brenner et al. / Procedia Structural Integrity 64 (2024) 1240–1247
Table 1. Detailed values of Gini impurities for different tree depths, types of splits, and minimum number of samples per leaf. Each entry is given as training data set / test data set.
         Axis-parallel splits               Hyperplane splits, sparsity = 2
         Min. samples per leaf              Min. samples per leaf
Depth    -                0.1 %             -                0.1 %
1        0.336 / 0.324    0.334 / 0.322     0.324 / 0.314    0.328 / 0.318
2        0.313 / 0.305    0.211 / 0.304     0.298 / 0.293    0.300 / 0.294
3        0.287 / 0.282    0.286 / 0.283     0.278 / 0.275    0.282 / 0.278
4        0.265 / 0.262    0.267 / 0.263     0.257 / 0.256    0.262 / 0.257
5        0.241 / 0.237    0.242 / 0.240     0.229 / 0.228    0.232 / 0.230
6        0.210 / 0.209    0.221 / 0.220     0.185 / 0.184    0.186 / 0.185
7        0.144 / 0.144    0.145 / 0.144     0.119 / 0.119    0.120 / 0.120
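The Gini impurities reported in Table 1 can be illustrated with a minimal sketch. It assumes the standard definition, 1 minus the sum of squared class fractions per leaf, and a sample-weighted mean over the leaves of a tree; the paper does not state how leaf values are aggregated, so the weighting and the function names `gini_impurity` and `tree_gini` are assumptions made for illustration only.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of the class labels in one leaf."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def tree_gini(leaves):
    """Sample-weighted mean Gini impurity over all leaves of a tree.

    `leaves` is a list of label lists, one per leaf (assumed aggregation).
    """
    total = sum(len(leaf) for leaf in leaves)
    return sum(len(leaf) / total * gini_impurity(leaf) for leaf in leaves)


# Toy example (not data from the paper): one pure leaf, one evenly mixed leaf.
toy_leaves = [["a", "a"], ["a", "b"]]
toy_value = tree_gini(toy_leaves)
```

A pure leaf contributes zero impurity, so deeper trees that separate the classes more cleanly produce lower values, at the cost of more leaves to interpret.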
Based on these results, a tree depth of five is chosen for further analysis, representing a balance between prediction accuracy and interpretability while mitigating overfitting risks. Since the SHM system's sensors are widely spaced, OCTs with hyperplane splits only marginally outperform OCTs with axis-parallel splits. To reach a comparable Gini impurity, both split types require the same tree depth. Moreover, considering the decrease in interpretability and the substantial increase in training complexity associated with hyperplane splits, only OCTs with axis-parallel splits are considered in the following. At a tree depth of five, the results remain consistent across both settings for the minimum number of samples per leaf. Because the gap in Gini impurity between the training and test datasets is smaller with the 0.1 % minimum, this threshold is adopted for further investigations, leading to more robust behavior of the OCTs.
3.2. OCTs for the classification of multiple parameters using measurement data
Two distinct approaches for determining the optimal model with six parameters using OCTs are explored. Firstly, individual trees are trained for each parameter without considering correlations, similar to the example outlined by Kapteyn et al. (2022). Secondly, a multi-classification tree is developed to predict all six parameters simultaneously. In both instances, the hyperparameters of the OCTs are selected as described in the preceding section. For the six separate trees, the average Gini impurity is 0.27 for the training dataset and 0.26 for the test dataset. The extreme values are 0.34/0.33 for the upper bounds and 0.19/0.19 for the lower bounds in the training/test dataset, respectively. In contrast, the multi-classification tree yields a Gini impurity of 0.13 for both the training and the test dataset. These Gini impurity outcomes align well with the preliminary tests conducted in the previous section.
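The difference between the two approaches lies mainly in how the targets are presented to the trees and how the resulting scores are summarized. The following sketch illustrates this under stated assumptions: the toy targets, the helper names, and the placeholder scores are hypothetical and are not taken from the paper, and no tree is actually trained here.

```python
from statistics import mean

# Hypothetical toy targets: each row holds the class labels of three
# parameters for one sample (the paper uses six parameters).
targets = [
    (0, 2, 1),
    (1, 2, 0),
    (0, 1, 1),
]


def per_parameter_targets(rows):
    """Approach 1: one label column per parameter, one tree per column."""
    return [list(column) for column in zip(*rows)]


def joint_targets(rows):
    """Approach 2: a single tree receives all parameter labels of a sample."""
    return [tuple(row) for row in rows]


def summarize(impurities):
    """Mean and extreme values of the per-tree Gini impurities."""
    return {
        "mean": mean(impurities),
        "min": min(impurities),
        "max": max(impurities),
    }


columns = per_parameter_targets(targets)
joint = joint_targets(targets)
# Placeholder scores, not results from the paper.
summary = summarize([0.10, 0.20, 0.30])
```

With separate trees, one score per parameter is obtained and reported as an average with lower and upper bounds; the multi-classification tree yields a single score for all parameters together.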
Fig. 4. Comparison of the predicted parameters and the resulting RMSE for the initial model and both OCT models.