America Califano et al. / Procedia Structural Integrity 41 (2022) 145–157
The ROC curve typically shows the True Positive Rate (TPR) on the y-axis and the False Positive Rate (FPR) on the x-axis. These two scores are defined as follows:

TPR = TP / (TP + FN)   (1)

FPR = FP / (FP + TN)   (2)

with FP = false positives and TN = true negatives. Based on Eq. 1 and Eq. 2, the more vertical the ROC curve, the larger the true positive rate (TPR) compared to the false positive rate (FPR), and the better the prediction. The FNR is defined as follows:

FNR = FN / (FN + TP)   (3)

with FN = false negatives and TP = true positives. In the current case, the positives are the features labelled as 1 (unsafe), while the negatives are the features labelled as 0 (safe). Accordingly, the FN are those features that are actually 1s but have been predicted as 0s by the algorithm, while the TP are those features that are actually 1s and have been predicted as 1s by the algorithm. Therefore, the FNR represents the portion of false 0s over the sum of the false 0s and true 1s. In this framework, keeping the FNR as low as possible is highly desirable, as a high FNR would lead to an underestimation of the damage risk. It can be noticed that TPR = 1 − FNR; this means that the AUC is the area under the curve obtained in the (FPR, 1 − FNR) plane.

The XGBoost classification model has been implemented through a simple Jupyter® notebook in the Python® language by means of the sklearn packages. The first training of the XGBoost algorithm has been carried out by considering configurations n. 1, 2, 3, 5, 6 and 7. The input data provided to the algorithm are the (x, y, z) coordinates of the points (filled scatters in Figure 5) and the two geometrical parameters identifying the configurations, while the target data are the labels (0 or 1) for each of the considered points. The input data have been rescaled between 0 and 1, to be coherent with the target data. In addition, the objective function of the classification model has been set to a logistic function, since the desired output is binary.
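As a minimal illustration of Eqs. (1)–(3) (not the authors' code; the labels below are hypothetical), the three rates can be computed directly from the four confusion-matrix counts:

```python
# Hypothetical labels: 1 = unsafe (positive class), 0 = safe (negative class)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # actual labels
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]  # labels predicted by the classifier

# Count the four confusion-matrix entries
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives

tpr = tp / (tp + fn)  # Eq. (1): true positive rate
fpr = fp / (fp + tn)  # Eq. (2): false positive rate
fnr = fn / (fn + tp)  # Eq. (3): false negative rate

print(tpr, fpr, fnr)  # -> 0.75 0.25 0.25; note the identity TPR = 1 - FNR
```

In practice the same counts can be obtained at once from `sklearn.metrics.confusion_matrix(y_true, y_pred).ravel()`.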
The training phase has then been carried out using the AUC as evaluation metric for the training performance. After twelve iterations, the XGBoost classification model has been fully trained, reaching a final AUC of about 86%. At this point, the model has been tested on all the configurations that had not been taken into consideration for the training phase, that is to say configurations n. 4, 8, 9, 10, 11 and 12. Later, the training and test configuration sets have been swapped in order to test the robustness of the algorithm. Finally, a quick sensitivity analysis has been carried out by varying the considered z-plane within the 3D model. The main results are shown and discussed in the following Section.

4. Results and discussion

The first training and test phases have been carried out considering the following configuration sets, respectively: i) 1, 2, 3, 5, 6 and 7; ii) 4, 8, 9, 10, 11 and 12. After the training of the XGBoost classification model, the tests have been carried out and the goodness of the classification predictor has been evaluated in terms of ROC curve, AUC and FNR. For this first case, an average FNR of about 28% has been computed (Figure 7c), while the results in terms of ROC curve and AUC for each test configuration are reported in Figure 7a. It can be seen that the minimum AUC is about 0.77, which is considered highly acceptable as its optimal value is 1. Then, the second training and test phases have been carried out by swapping the two configuration sets used previously. Also in this case, an average FNR of about 28% (Figure 7d) and a minimum AUC of about 0.77 have been obtained, while the ROC curves for the test configurations are reported in Figure 7b.
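The train-on-one-set, test-on-the-other pipeline described above can be sketched as follows. This is not the authors' notebook: the data are synthetic stand-ins for the coordinates and geometrical parameters, the train/test split is a simple placeholder for the configuration sets, and sklearn's `GradientBoostingClassifier` (which also minimises a logistic loss) is used as a stand-in for XGBoost so the sketch stays self-contained.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in for the paper's features: (x, y, z) coordinates plus
# two geometrical parameters per point; labels 1 = unsafe, 0 = safe.
X = rng.uniform(size=(400, 5))
y = (X[:, 0] + 0.5 * X[:, 3] > 0.9).astype(int)  # arbitrary synthetic rule

# Rescale the inputs to [0, 1], as done in the paper
X = MinMaxScaler().fit_transform(X)

# Placeholder for the two configuration sets: first half trains, second tests
X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

# Gradient boosting with a logistic loss, standing in for XGBoost
clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Evaluate the predictor in terms of AUC and FNR on the held-out set
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test)).ravel()
fnr = fn / (fn + tp)  # Eq. (3)
print(f"AUC = {auc:.2f}, FNR = {fnr:.2f}")
```

Swapping `(X_train, y_train)` and `(X_test, y_test)` and refitting reproduces the robustness check of the second training/test phase.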