PSI - Issue 57
Olivier Vo Van et al. / Procedia Structural Integrity 57 (2024) 104–111
scikit-learn. For the estimation of the SHAP values, the TreeExplainer method [12] of the shap library is used. For the calculation of MDI, PI and SHAP, results are averaged over the cross-validation samples for a more robust estimate. The SHAP values are given as individual explanations as well as a global explanation. As their calculation is particularly time-consuming, a sub-sample of 100,000 rail segments is selected.
Sensitivity analysis. Figure 5 illustrates the methods selected to evaluate the significance of the variables. Several similarities can be observed across these methods. Firstly, the “freight” variable clearly dominates the other features. Additionally, permuting the variable values produces a significant decline in performance, as shown by the PI criterion. The light areas in the AIR criterion represent the de-biasing phase of the variable importance estimation. In our study this phase does not result in any significant change, and in particular no reordering of the variables. The adjustment is relatively weak for all variables, regardless of their individual characteristics. This method is likely to have a more pronounced effect on datasets that have a limited number of observations but a large number of variables (for instance in genomic analysis). It is worth noting that the method was originally developed in such a context.
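The averaging of MDI and PI over cross-validation samples mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the data are synthetic (`make_classification`), a classifier is assumed although the text does not say whether the forest is a classifier or a regressor, and the number of folds, trees and permutation repeats are arbitrary choices. SHAP is left out to keep the sketch short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the rail data (assumption): with shuffle=False,
# columns 0 and 1 are informative and columns 2 to 4 are pure noise.
X, y = make_classification(
    n_samples=600, n_features=5, n_informative=2, n_redundant=0,
    shuffle=False, random_state=0,
)

mdi_per_fold, pi_per_fold = [], []
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])

    # MDI: impurity-based importance, computed on the training fold.
    mdi_per_fold.append(model.feature_importances_)

    # PI: drop in score when one column is shuffled, measured on the held-out fold.
    pi = permutation_importance(
        model, X[test_idx], y[test_idx], n_repeats=5, random_state=0
    )
    pi_per_fold.append(pi.importances_mean)

# Average each criterion over the folds for a more stable estimate.
mdi_mean = np.mean(mdi_per_fold, axis=0)
pi_mean = np.mean(pi_per_fold, axis=0)

for j in range(X.shape[1]):
    print(f"feature {j}: MDI={mdi_mean[j]:.3f}  PI={pi_mean[j]:.3f}")
```

Measuring PI on the held-out fold rather than on the training fold is a design choice of this sketch; it ties the importance to generalization performance instead of to what the forest has memorized.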
Fig. 5: Variable importance obtained from the Random Forest, assessed using several methods.
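The difference between individual and global SHAP explanations can be made concrete on a toy case. The sketch below does not use TreeExplainer: it swaps in the closed-form SHAP values of a linear model with independent features, where the contribution of feature j at point x is w_j (x_j - mean of x_j). The weights and data are invented for illustration, and the feature names are only borrowed from the text.

```python
from statistics import mean

# Hypothetical toy data: rows are rail segments, columns are two features.
FEATURES = ["freight", "speed"]   # names borrowed from the text, values invented
WEIGHTS = [0.5, 0.125]            # assumed coefficients of a linear model
X = [
    [10.0, 80.0],
    [20.0, 120.0],
    [30.0, 100.0],
    [40.0, 140.0],
]


def linear_shap(weights, rows):
    """Closed-form SHAP values of f(x) = b + sum_j w_j * x_j under feature
    independence: phi_j(x) = w_j * (x_j - mean of feature j)."""
    means = [mean(col) for col in zip(*rows)]
    return [[w * (x - m) for w, x, m in zip(weights, row, means)] for row in rows]


# Individual explanation: one signed value per feature and per segment.
# A positive value pushes the prediction above the average, a negative one below.
phi = linear_shap(WEIGHTS, X)

# Global explanation: mean absolute SHAP value of each feature over all segments.
global_importance = [mean(abs(v) for v in col) for col in zip(*phi)]

for name, score in zip(FEATURES, global_importance):
    print(f"{name}: mean |SHAP| = {score}")
```

In this toy case each row of `phi` sums to the difference between the prediction for that segment and the prediction at the feature means, which is the additivity property that lets individual values be aggregated into a global ranking.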
Regarding the current study, the temperature damage variable remains significant regardless of the measure used. By examining Figure 6, which displays the individual importance of each test point, we can ascertain the “direction” of the impact associated with each feature. In this regard, the impacts of freight, speed, and temperature damage align with intuitive expectations: as the value of the variable increases, so does the risk of squats. The PI method stands out by providing importance scores distinct from those of the other methods. It effectively emphasizes the impact of freight tonnage and speed, as expected. Additionally, it accurately captures the significance of the linear mass of the rail, which other criteria often fail to do. However, one notable limitation of this method is that if two variables are entirely redundant, the importance score will be the same for both variables in models that include either one. Consequently, the importance of both variables will be reduced to zero. Another advantage of this method is that it allows the identification of variables whose contribution is negative in the sense