PSI - Issue 62
Sergio Ruggieri et al. / Procedia Structural Integrity 62 (2024) 129–136
Several YOLOv5 architectures were tested and applied to the available dataset. For each model, three components were defined: the backbone (the CNN-based architecture used by the model to extract relevant features from images), the neck (which reduces the loss incurred during feature extraction and introduces additional contextual information), and the head (the convolutional layers that produce the bounding boxes and the category information for the input image). In addition, because the available dataset is relatively small, all models were trained with transfer learning. Transfer learning allows the weights of the backbone to be "frozen", so that the general feature-detection knowledge acquired by the network on large datasets can be reused on smaller datasets. The versions of YOLOv5 differ in capacity, that is, in the overall number of parameters. For the case at hand, five versions of YOLOv5 were used, as shown in Table 2, under the same conditions (i.e., the same image size in pixels). All images were rescaled to 960 pixels, and the training, validation, and test phases were performed on a machine equipped with an Intel Core i9-13900K CPU, 64 GB of DDR4 RAM, and an NVIDIA GeForce RTX 3090 GPU with 24 GB of VRAM. The obtained results were evaluated in terms of precision (P), recall (R), and mean average precision (mAP), whose full definitions are reported in Equations (1), (2), and (3). The resulting values are reported in Table 2.
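The freezing step described above can be sketched as follows. This is only an illustration in plain Python, not the authors' training code: the `Param` class, the `freeze_backbone` helper, and the four-layer parameter list are hypothetical stand-ins for a real network, and the number of frozen layers is an assumption. The sketch follows the common convention of selecting parameters by a layer-index prefix in their names (e.g. `model.0.`).

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for a weight tensor of a detection network."""
    name: str
    requires_grad: bool = True  # True means the optimizer may update it


def freeze_backbone(params, n_backbone_layers):
    """Mark the parameters of the first n_backbone_layers layers as frozen.

    Returns the names of the parameters that remain trainable.
    """
    # The trailing dot keeps "model.1." from also matching "model.10."
    frozen_prefixes = tuple(f"model.{i}." for i in range(n_backbone_layers))
    for p in params:
        if p.name.startswith(frozen_prefixes):
            p.requires_grad = False
    return [p.name for p in params if p.requires_grad]


# Assumed toy layout: layers 0-1 act as backbone, layer 2 as neck, layer 3 as head.
params = [Param(f"model.{i}.conv.weight") for i in range(4)]
trainable = freeze_backbone(params, n_backbone_layers=2)
print(trainable)
```

Only the neck and head parameters remain trainable in this toy layout, which is the effect transfer learning relies on when the dataset is small.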
Table 2. Summary of results obtained by using different versions of YOLOv5 and transfer learning.

Version of YOLOv5    P (%)    R (%)    mAP – 0.5 (%)    mAP – 0.95 (%)
YOLOv5n              60.52    46.13    49.50            27.46
YOLOv5s              76.56    55.70    60.48            44.18
YOLOv5m              87.81    57.86    64.57            51.29
YOLOv5l              87.65    57.54    62.91            53.08
YOLOv5x              86.90    58.17    63.62            54.52
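The P and R columns of Table 2 can be read against the usual object detection definitions, P = TP / (TP + FP) and R = TP / (TP + FN). The sketch below assumes that Equations (1) and (2) take this standard form. The detection counts are hypothetical and are not taken from the paper's dataset.

```python
def precision(tp, fp):
    """Fraction of predicted boxes that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of ground-truth objects that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts: 80 correct detections, 20 false alarms, 60 missed defects.
tp, fp, fn = 80, 20, 60
p = precision(tp, fp)
r = recall(tp, fn)
print(f"P = {100 * p:.2f} %, R = {100 * r:.2f} %")
```

A model can therefore score a high P and a much lower R, as in Table 2, when most of its predictions are correct but many defects go undetected.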
In Table 2, the mAP is provided for an intersection-over-union (IoU) threshold of 0.5 and as the average over IoU thresholds in the range 0.5-0.95. The nano version (v5n) has less capacity than the small version (v5s), which in turn has less capacity than the medium version (v5m), and so on. The architecture with the highest capacity is YOLOv5x, while the one with the lowest capacity is YOLOv5n. As an example, the precision-confidence curves for the worst and the best models are provided in Fig. 2, where the differences in prediction quality can be appreciated. In addition, Fig. 3 compares the defect detection capability of the worst and the best models.
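In common object detection practice, the 0.5 and 0.5-0.95 labels attached to mAP denote thresholds on the IoU between a predicted box and a ground-truth box: a prediction counts as a true positive only if its IoU reaches the threshold. The sketch below illustrates that matching rule under stated assumptions: boxes are given as `(x1, y1, x2, y2)` corners, the ten thresholds are spaced by 0.05, and the two boxes are hypothetical. It does not compute a full mAP.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed threshold grid: 0.50, 0.55, ..., 0.95 (rounded to avoid float drift).
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Hypothetical predicted and ground-truth boxes.
predicted = (0, 0, 10, 10)
ground_truth = (1, 1, 10, 10)

overlap = iou(predicted, ground_truth)
hits = [t for t in thresholds if overlap >= t]
print(f"IoU = {overlap:.2f}, accepted at {len(hits)} of {len(thresholds)} thresholds")
```

Because the stricter thresholds reject loosely localized boxes, the score averaged over the 0.5-0.95 range is lower than the score at 0.5 alone, as seen for every model in Table 2.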
Fig. 2. From left to right, precision-confidence curves for YOLOv5n and YOLOv5x.