PSI - Issue 77

Hugo Mesquita Vasconcelos et al. / Procedia Structural Integrity 77 (2026) 601–610


Training accuracy increased rapidly, exceeding 98% by epoch 2 and stabilizing above 99% thereafter. Validation accuracy reached a maximum of 0.7769 at epoch 18, whereas validation loss attained its minimum of 2.0680 at epoch 8. The macro-averaged F1-score peaked at 0.3416 at epoch 19, and the weighted F1-score reached 0.7800 at the same epoch; this epoch was selected as the final model. The average epoch duration was 832 s, with low variance across the forty epochs, for a total of 9.5 hours of training.

The analysis confirmed strong performance for dominant classes, particularly background (precision = 0.96, recall = 0.94) and tug (precision = 0.68, recall = 0.78). Intermediate recall levels were observed for cargo (0.66), dredger (0.68), and passenger ship (0.86). Large and frequent vessel types such as cargo, dredger, tug, and passenger ship all maintained F1-scores between 0.65 and 0.75. The remaining classes showed reduced recognition rates, with precision and recall near zero: mid-frequency classes such as fishing exhibited considerably lower recall, and rare classes, including pilot vessel, sailing, and pleasure craft, showed no correct predictions.

The final evaluation confirmed consistent performance on the validation and test sets. Overall accuracy reached 0.7769, with macro-averaged / weighted precision, recall, and F1-scores of 0.3342 / 0.8109, 0.3149 / 0.7769, and 0.3416 / 0.7800, respectively. The confusion matrix for the test set, Figure 3(a), shows the class-level distribution of predictions for this configuration.

3.2. Two-Stage Approach

In the two-stage configuration, the pre-operational (α) and concrete (β) heads were trained sequentially. The α head operated during the first eight epochs to perform binary discrimination between background and vessel, after which the β head was introduced for multi-class classification. During training, binary accuracy exceeded 99% by the end of the α phase, and after activation of β, validation accuracy and F1-scores increased steadily. Validation accuracy reached its maximum at epoch 29, while the macro and weighted F1-scores peaked at epochs 26 and 22, respectively. Under the maximum macro F1 criterion for epoch selection, the model from epoch 26 was taken as the final model. The minimum validation loss was observed at epoch 9, coinciding with the α–β transition. The mean epoch duration was 869 s, with training taking 9.9 hours.

The model maintained strong results for background and major vessel types, with precision and recall above 0.93 and 0.89, respectively, for background, and above 0.75 for large vessel classes such as cargo, dredger, tug, and passenger ship. Compared with the single-stage configuration, the two-stage setup produced marginally lower overall accuracy and weighted averages but slightly higher macro-averaged precision and recall. The improvement was particularly noticeable for the dredger and passenger ship categories, where F1-scores rose by approximately 0.02–0.04, while the remaining large classes remained stable. Minor categories, including fishing and rescue, showed small numerical increases in recall but continued to display low absolute values. The confusion matrix for the final model on the test set is presented in Figure 3(b).
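The gap between macro-averaged and weighted F1-scores reported for both configurations, and the selection of the final epoch by maximum macro F1, can be illustrated with a minimal Python sketch. It assumes a confusion matrix with rows as true classes and columns as predicted classes, and it assigns an F1 of zero to a class with no true or predicted samples. The function names and the toy counts are hypothetical and chosen only for illustration; they are not the authors' code or data.

```python
from typing import List, Sequence, Tuple


def per_class_f1(confusion: Sequence[Sequence[int]]) -> List[float]:
    """Per-class F1 from a square confusion matrix (rows = true, cols = predicted)."""
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, true class differs
        fn = sum(confusion[c]) - tp                       # true c, predicted otherwise
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true and no predicted samples scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_and_weighted_f1(confusion: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Macro F1 weights every class equally; weighted F1 weights by class support."""
    f1 = per_class_f1(confusion)
    support = [sum(row) for row in confusion]  # number of true samples per class
    macro = sum(f1) / len(f1)
    weighted = sum(f * s for f, s in zip(f1, support)) / sum(support)
    return macro, weighted


def select_epoch(macro_f1_by_epoch: Sequence[float]) -> int:
    """Return the 1-based epoch with the highest macro F1 (earliest epoch on ties)."""
    best = max(range(len(macro_f1_by_epoch)), key=lambda i: macro_f1_by_epoch[i])
    return best + 1


# Toy 3-class example with made-up counts: one dominant class, one mid-sized
# class, and one rare class that is never predicted correctly.
toy_confusion = [
    [90, 10, 0],
    [5, 15, 0],
    [2, 3, 0],
]
macro_f1, weighted_f1 = macro_and_weighted_f1(toy_confusion)
print(f"macro F1 = {macro_f1:.4f}, weighted F1 = {weighted_f1:.4f}")

# Hypothetical per-epoch validation macro F1 values.
print("selected epoch:", select_epoch([0.21, 0.30, 0.34, 0.33]))
```

In the toy example the rare class contributes an F1 of zero, which pulls the macro average well below the weighted average even though the dominant class is recognized well. This is the same pattern as the reported macro and weighted F1-scores for the imbalanced vessel classes.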
