PSI - Issue 57
Olivier Vo Van et al. / Procedia Structural Integrity 57 (2024) 104–111
O. VoVan / Fatigue Design 2023 00 (2023) 000–000
Several details pertaining to the problem at hand have been omitted. These include model selection, the regularization techniques employed, and the evaluation method based on the ROC curve criterion. More details can be found in [10]. The evaluation is further quantified using the ROC-AUC (Area Under the ROC Curve) as a scalar metric. This is a ranking metric: for any scoring function $s$, the AUC is defined as

$$\mathrm{AUC}(s) = P\left(s(X) \geqslant s(X') \mid Y = +1,\ Y' = -1\right) \qquad (3)$$

where $(X, Y)$ and $(X', Y')$ are i.i.d. copies. This measure assesses the capability of the function $s$ to rank observations $X$ (rail segments) with respect to the binary label $Y$ (presence of squats).

2.3. Variable importance and individual explanation

Several methods of estimating the importance of variables coexist, each with its advantages and disadvantages. We highlight here the usefulness of proposing a set of variable analysis methods, giving each one its own interpretation, sometimes complementary, sometimes additional. We choose here to calculate variable importance in four different ways:

• MDI (Gini): The Mean Decrease Impurity of the Gini index is one of the most common measures of variable importance and was first introduced in [2]. It measures the decrease in impurity as the difference between the impurity of a node and the average of the impurity of its children nodes, where the impurity of a node $\mathcal{N}$ is defined as $I^{\mathrm{gini}}_{\mathcal{N}} = \sum_{\substack{i \in \{-1,+1\} \\ X \in \mathcal{N}}} P(Y = i)\,(1 - P(Y = i))$, the probabilities being evaluated over the observations $X$ falling in $\mathcal{N}$. Although this measure is very popular, several authors have shown its limits, for example its bias; see [17, 22], to name only a few.

• PI (ROC-AUC): The Permutation Importance measures the decrease of accuracy between one dataset $x_{\mathcal{D}}$ and permuted datasets $x_{\tilde{\mathcal{D}}_i}$ in which all values of the $i$-th column are randomly permuted. The importance is then assessed as the difference in an accuracy measure between the dataset $\mathcal{D}$ and the permuted dataset $\tilde{\mathcal{D}}_i$. Here we use the ROC-AUC, so the variable importance becomes

$$\mathrm{Imp}^{\mathrm{PI}}\left(X^{(i)}\right) = \mathrm{AUC}\left(\hat{\eta}_{\mathcal{D}}\right) - \mathrm{AUC}\left(\hat{\eta}_{\tilde{\mathcal{D}}_i}\right). \qquad (4)$$

• AIR (Gini) [17]: The Actual Impurity Reduction is a simple method that aims to debias variable importance by introducing a permuted dataset $x_{\tilde{\mathcal{D}}}$ and learning on $x = [x_{\mathcal{D}}, x_{\tilde{\mathcal{D}}}]$. This method aims to eliminate biases due to the characteristics of $X$ alone, in particular the fact that the importance of the variables calculated by the Gini index tends to favor continuous variables or those containing many categories.

• SHAP [11]: The SHapley Additive exPlanation is a game-theoretic approach that offers an interpretation method by creating a model that quantifies the contribution of each feature to the prediction. In this approach, features can be seen as participants in a game where the prediction, acting as the payoff, is distributed among these players (features). As demonstrated in [21], there is a connection between a local version of MDI (referred to as local MDI importances) and SHAP values for totally randomized trees and a specific characteristic function.
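Equations (3) and (4) can be illustrated with a minimal sketch in plain Python. It is not the implementation used in this study: the function names `empirical_auc` and `permutation_importance_auc`, the toy data, the fixed scoring function, and the averaging over `n_repeats` random permutations are all assumptions made for illustration. Ties are counted as concordant to follow the non-strict inequality of Eq. (3), whereas many libraries give ties a weight of one half.

```python
import random


def empirical_auc(scores, labels):
    """Empirical counterpart of Eq. (3): fraction of (positive, negative)
    pairs for which s(x) >= s(x'). Ties count as concordant."""
    pos = [s for s, y in zip(scores, labels) if y == +1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    concordant = sum(1 for p in pos for n in neg if p >= n)
    return concordant / (len(pos) * len(neg))


def permutation_importance_auc(score_fn, X, y, column, n_repeats=20, seed=0):
    """Eq. (4): AUC on the original data minus the AUC on data whose
    `column` was randomly permuted. Averaging over `n_repeats`
    permutations is an assumption of this sketch."""
    rng = random.Random(seed)
    base = empirical_auc([score_fn(row) for row in X], y)
    permuted_aucs = []
    for _ in range(n_repeats):
        values = [row[column] for row in X]
        rng.shuffle(values)
        # Rebuild each row with the shuffled value in the chosen column.
        X_perm = [row[:column] + [v] + row[column + 1:]
                  for row, v in zip(X, values)]
        permuted_aucs.append(
            empirical_auc([score_fn(row) for row in X_perm], y))
    return base - sum(permuted_aucs) / n_repeats


# Hypothetical toy data: column 0 determines the label, column 1 is noise.
noise = random.Random(1)
X = [[i / 39, noise.random()] for i in range(40)]
y = [+1 if row[0] > 0.5 else -1 for row in X]


def score_fn(row):
    """Stand-in for a fitted scoring function; it only reads column 0."""
    return row[0]


imp_informative = permutation_importance_auc(score_fn, X, y, column=0)
imp_noise = permutation_importance_auc(score_fn, X, y, column=1)
print("importance of column 0:", imp_informative)
print("importance of column 1:", imp_noise)
```

Because the stand-in scoring function ignores column 1, permuting that column leaves every score unchanged and its importance is zero, whereas permuting column 0 destroys the ranking and yields a clearly positive importance.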
3. Presentation of data and results
3.1. Analysis of temperature damage data
Figure 3 shows the cumulative damage increasing over days for two configurations. In blue, all the available data were used, ordered as $[T_{\min}, T_{\mathrm{mean}}, T_{\max}]$ each day. In green, only $T_{\mathrm{mean}}$ was used each day. The lower and upper bounds of the blue envelope show that some regions are subject to 5 times more thermal damage than others. The temperature evolution from day to day represents the seasonal variation, as opposed to the variations within a day, which are referred to as daily variations. Comparison between the blue and green lines shows that daily variations contribute 10 times more to the damage than seasonal variations. Figure 4 shows the damage due to daily temperature variations. We note that the spatial distribution of the damage corresponds to what was expected: a lower value on the coasts (oceanic and Mediterranean climates) and a higher value inland, in the central and eastern regions of France. Mountainous regions experience