PSI - Issue 81
Oleh Yasniy et al. / Procedia Structural Integrity 81 (2026) 116–122
Nomenclature
a       the crack length
R       the stress ratio
N       the number of loading cycles
R_ol    the overloading factor
n       the size of the test dataset
( )     the experimental value of the crack length in the test dataset
( )     the predicted value of the crack length in the test dataset

2. Material and methods
In studies of fatigue crack growth, traditional experimental methods are labour-intensive, material-intensive and costly. They require high-precision equipment, long test durations and specialised experimental conditions, especially when modelling overload effects or variable temperature-force regimes. This creates a need for alternative tools that can reproduce and predict fatigue processes from existing experimental data.

Modern machine learning methods are well suited to such problems because they can identify complex nonlinear dependencies without a formal description of the underlying physical laws. Among the promising approaches are ensemble models, in particular the random forest and boosted trees methods. These algorithms have performed well in prediction tasks, are resistant to overfitting, and remain accurate even on noisy or limited data. This paper constructs both models in order to evaluate their ability to reproduce the growth patterns of fatigue cracks from an experimental data set. The principles of each approach, their configuration parameters, and the modelling results are presented below.

2.1. Boosted trees

Boosted trees are an ensemble method in which models are built sequentially, with each subsequent model correcting the errors of the previous ones (Hastie et al., 2009). The most common variant is gradient boosting, where each new tree is fitted to reduce the loss function computed from the residual errors of the current ensemble (Kaszyński et al., 2020). This step-by-step correction ensures high accuracy even on complex and noisy data. The adjustable hyperparameters include the number of trees, the learning rate, the maximum tree depth, and the minimum number of observations in a node. Cross-validation and early stopping are used to prevent overfitting.
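The sequential residual fitting and early stopping described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a squared-error loss, one-split trees (stumps) on a single input, and synthetic data, and all names (`fit_stump`, `fit_boosted`, `patience`, the default hyperparameter values) are hypothetical choices made for illustration.

```python
import random

def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals by least squares."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - l_mean) ** 2 for r in left)
               + sum((r - r_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, l_mean, r_mean)
    return best[1:]  # (threshold, left value, right value)

def stump_value(stump, x):
    thr, l_mean, r_mean = stump
    return l_mean if x <= thr else r_mean

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def fit_boosted(train_x, train_y, valid_x, valid_y,
                n_trees=200, learning_rate=0.1, patience=10):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(train_y) / len(train_y)          # initial constant prediction
    train_pred = [base] * len(train_x)
    valid_pred = [base] * len(valid_x)
    stumps, best_err, best_size = [], mse(valid_pred, valid_y), 0
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(train_y, train_pred)]
        stump = fit_stump(train_x, residuals)
        stumps.append(stump)
        train_pred = [p + learning_rate * stump_value(stump, x)
                      for p, x in zip(train_pred, train_x)]
        valid_pred = [p + learning_rate * stump_value(stump, x)
                      for p, x in zip(valid_pred, valid_x)]
        err = mse(valid_pred, valid_y)
        if err < best_err:
            best_err, best_size = err, len(stumps)
        elif len(stumps) - best_size >= patience:
            break                                # early stopping
    return base, learning_rate, stumps[:best_size]

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, x) for s in stumps)

# Synthetic data (an assumption, not the experimental data set of the paper).
rng = random.Random(0)
xs = [rng.random() for _ in range(100)]
ys = [x ** 2 + rng.gauss(0.0, 0.02) for x in xs]
train_x, train_y = xs[:70], ys[:70]
valid_x, valid_y = xs[70:], ys[70:]

model = fit_boosted(train_x, train_y, valid_x, valid_y)
baseline_mse = mse([model[0]] * len(valid_x), valid_y)
boosted_mse = mse([predict(model, x) for x in valid_x], valid_y)
print(len(model[2]), baseline_mse, boosted_mse)
```

A smaller `learning_rate` generally needs more trees to reach the same fit, which is why the two hyperparameters are usually tuned together.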
The boosted trees method can model complex nonlinear dependencies and interactions between variables, such as crack growth retardation after overload. Feature importance is estimated from the total reduction in the loss function achieved by splits on a given variable. The method yields accurate models even with limited data volumes and is particularly effective in tasks where prediction accuracy is critical.

2.2. Random forest

The random forest method is based on creating an ensemble of independent decision trees that are trained on bootstrap samples of the data. Each tree makes its splits using a randomly selected subset of features, which reduces the correlation between trees and ensures high generalisation ability. The final prediction is formed as the average (in regression tasks) or as the result of a vote (in classification tasks) over all trees (Kuhn et al., 2013). This approach makes the model resistant to noise and reduces the risk of overfitting (Li et al., 2013). The main hyperparameters are the number of trees, the maximum tree depth, the number of features considered for splitting, and the minimum number of observations in a leaf.

The random forest method allows the importance of features to be evaluated by analysing the reduction in error achieved by splits on a given variable. The method copes well with nonlinear dependencies and interactions between variables, and it is effective even in the presence of outliers and uneven data distribution. Applied to fatigue crack growth prediction, it allows stable, accurate models to be built without specifying the functional form of the relationship in advance.

2.3. Experimental dataset

In this work, the fatigue crack growth of automotive steel was predicted using two ensemble machine learning methods, namely random forest and boosted trees.
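As a rough illustration of the random forest procedure outlined in Section 2.2 (bootstrap sampling, random feature subsets at the splits, and averaging for regression), a minimal sketch follows. It is not the implementation used in this work: the data are synthetic, and the function names and default hyperparameter values (`n_trees`, `max_depth`, `min_leaf`, `max_features`) are hypothetical.

```python
import random

def build_tree(rows, targets, rng, max_depth, min_leaf, max_features):
    """Grow a regression tree; every split looks at a random subset of features."""
    mean = sum(targets) / len(targets)
    if max_depth == 0 or len(rows) < 2 * min_leaf:
        return mean                                   # leaf: mean target value
    best = None
    for f in rng.sample(range(len(rows[0])), max_features):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [i for i, r in enumerate(rows) if r[f] <= thr]
            right = [i for i, r in enumerate(rows) if r[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = 0.0
            for idx in (left, right):
                m = sum(targets[i] for i in idx) / len(idx)
                sse += sum((targets[i] - m) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, f, thr, left, right)
    if best is None:
        return mean                                   # no admissible split
    _, f, thr, left, right = best

    def grow(idx):
        return build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                          rng, max_depth - 1, min_leaf, max_features)

    return (f, thr, grow(left), grow(right))

def tree_predict(node, row):
    while isinstance(node, tuple):                    # descend until a leaf
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node

def fit_forest(rows, targets, n_trees=30, max_depth=4, min_leaf=3,
               max_features=1, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]    # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx],
                                 rng, max_depth, min_leaf, max_features))
    return forest

def forest_predict(forest, row):
    """Regression output: average of the individual tree predictions."""
    return sum(tree_predict(t, row) for t in forest) / len(forest)

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

# Synthetic two-feature data (an assumption, not the paper's measurements).
data_rng = random.Random(1)
rows = [(data_rng.random(), data_rng.random()) for _ in range(150)]
targets = [r[0] ** 2 + 0.5 * r[1] + data_rng.gauss(0.0, 0.02) for r in rows]
train_rows, train_y = rows[:105], targets[:105]       # 70% for training
test_rows, test_y = rows[105:], targets[105:]         # 30% held out

forest = fit_forest(train_rows, train_y)
mean_y = sum(train_y) / len(train_y)
baseline_mse = mse([mean_y] * len(test_rows), test_y)
forest_mse = mse([forest_predict(forest, r) for r in test_rows], test_y)
print(baseline_mse, forest_mse)
```

Because each tree sees a different bootstrap sample and different candidate features, the individual trees disagree on details, and averaging them smooths out much of that variance.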
Experimental data describing the dependence of the fatigue crack length a on the number of loading cycles N and the overloading factor R_ol at the stress ratio R = 0.1 were used to train and test the models. The input parameters were N and R_ol, while a was the output parameter. The hyperparameters were optimised to achieve the best accuracy on this data set. The sample contains 738 elements; 70% were randomly selected for training from all experimental data across the different overloading factors R_ol, and the remaining 30% were reserved to assess prediction quality. This 70/30 split provides sufficient data for training the models and allows a reliable assessment of prediction accuracy. The mean absolute percentage error (MAPE) was used to quantify the accuracy of the model predictions. This