A. Polanský et al. / Procedia Structural Integrity 77 (2026) 529–536
In the initial experiment, we compared several CNN architectures: MiniVGGNet [19], ResNet50 [20], and ConvNeXt-Tiny [21]. Each model used weights pretrained on the ImageNet dataset. To ensure direct comparability, every training run used the identical set of hyperparameters, shown in Table 2. To obtain reliable results, each model was trained five times, each time with a different random seed. The image input size was determined manually by gradually reducing the resolution as long as the defects remained visible to a human observer. After trying various combinations, 400x400 pixels was chosen as the best compromise between size reduction and information loss. The same training runs were also performed with an 800x800 pixel input to compare the models' performance across input sizes; in that case, the batch size was reduced to 4 due to GPU memory limits. The training pipeline consisted of two phases. In the first phase, the fully connected layers and the last block of the feature extractor were unfrozen. After a defined number of epochs, or once early stopping was triggered, the model with the best validation loss was saved. This model entered the second phase, in which the learning rate was lowered and the remaining blocks of the feature extractor were unfrozen for fine-tuning.
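The per-phase checkpointing and early-stopping logic described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the class name, the `patience` default (taken from Table 2), and the use of a generic `model_state` placeholder for saved weights are all assumptions.

```python
class EarlyStopping:
    """Tracks validation loss across epochs of one training phase.

    Training stops after `patience` epochs without improvement, and
    `best_state` holds the checkpoint with the lowest validation loss,
    which is the model carried into the next (fine-tuning) phase.
    """

    def __init__(self, patience=5):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_state = None   # stand-in for saved model weights
        self.bad_epochs = 0

    def step(self, val_loss, model_state):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_state = model_state  # save the best model so far
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In use, the training loop would call `step()` once per epoch and break out either when it returns True or when the maximum epoch count (50 in Table 2) is reached; `best_state` then seeds the second phase.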
Table 2. Hyperparameters used for models

Parameter          Value            Comment
Input size         400x400
Batch size         16
Learning rate 1    1e-4             Used only for the first phase
Learning rate 2    1e-5             Used only for the second phase
Optimizer          Adam
Loss function      Soft Focal Loss  alpha = weights, gamma = 1.0, ce_ratio = 0.3, reduction = mean
Number of epochs   50               Maximum number of epochs for both phases
Patience           5                Early stopping patience for both phases
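The exact formulation of the Soft Focal Loss is not spelled out in the text; one plausible reading of the `ce_ratio` parameter in Table 2 is that it blends plain class-weighted cross-entropy with the focal term. The sketch below follows that assumption; the function name and the blend formula are illustrative, not the authors' definition.

```python
import math

def soft_focal_loss(probs, targets, alpha, gamma=1.0, ce_ratio=0.3):
    """Hedged sketch: ce_ratio-weighted blend of weighted cross-entropy
    and a focal loss term, with per-class weights `alpha`.

    probs:   list of per-sample softmax probability lists, shape (N, C)
    targets: list of integer class labels, shape (N,)
    alpha:   per-class weights, shape (C,)
    """
    eps = 1e-8
    losses = []
    for p_row, t in zip(probs, targets):
        pt = p_row[t]                      # probability of the true class
        a = alpha[t]                       # per-class weight
        ce = -a * math.log(pt + eps)       # weighted cross-entropy
        focal = -a * (1.0 - pt) ** gamma * math.log(pt + eps)  # focal term
        losses.append(ce_ratio * ce + (1.0 - ce_ratio) * focal)
    return sum(losses) / len(losses)       # reduction = mean
```

With gamma = 1.0 the focal factor (1 - pt) down-weights confidently classified samples, which suits the OK/NOK class imbalance discussed in the results; setting ce_ratio = 1.0 recovers plain weighted cross-entropy.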
5. Results and Discussion

For our dataset, ConvNeXt-Tiny achieved the best results of all compared models, see Table 3. The difference is especially visible when the input size is set to 400x400 pixels. MiniVGGNet and ResNet50 showed major improvements with the 800x800 pixel input. As Table 4 shows, all models struggled to classify NOK images correctly. Therefore, an analysis of the incorrectly classified images was performed. We found that these images typically contained small defects, such as tiny scratches on the edges of the powder bed or small spots of missing powder. This can be attributed to the scarcity of these defect types in the training set, which is probably also why ConvNeXt-Tiny performed best: its modern architecture does not require as much training data as the other models. Based on these findings, we believe that enriching our training dataset is necessary, and we will address this problem in future research. Another issue with these small defects is that different annotators could classify the same image either as an OK or a NOK layer; this is a limitation of the binary classification approach. Nevertheless, binary classification could still be useful for preliminary quality control.