PSI - Issue 66

Andrii Kompanets et al. / Procedia Structural Integrity 66 (2024) 388–395


unnecessary maintenance or unsafe situations when existing cracks are not identified as such. A system for automatic visual inspection of bridges may support existing inspection practices and improve their reliability. Fatigue cracks are among the most frequent types of bridge damage; therefore, an algorithm for detecting fatigue cracks in images should be part of an automatic visual inspection system. Since crack length and width are relevant geometrical parameters in a structural assessment, the automated visual inspection system has to perform crack segmentation. Image segmentation is the process of partitioning a digital image into multiple segments, where each pixel is assigned to one of the segments. In crack segmentation, two types of segments are considered: the image background and one or more cracks. Deep learning algorithms are the most promising approach to crack detection and segmentation; the main goal of this work is therefore to develop such an algorithm.

Convolutional neural networks (CNNs) (LeCun, 1998) are powerful image processing tools developed and widely used for image classification and segmentation, amongst other applications. A CNN is commonly divided into a feature encoder and a classifier (or output layer). The encoder consists of multiple layers of filters and down-samplers that hierarchically generate feature maps encoding important semantic image information. The filters extract geometrical features of the image, whilst the down-samplers decrease the spatial dimension of the feature maps, which enlarges the receptive field (Luo, 2016) of filters deeper in the network and reduces the computational load. To perform pixel-to-pixel image segmentation, fully convolutional networks (FCNs) were proposed (Long, 2015) as an extension of CNNs.
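The effect of down-sampling on the receptive field can be illustrated with a short calculation. The sketch below is not taken from the paper: the function name `receptive_field` and the toy layer lists are hypothetical, and it assumes the standard recurrence in which a layer with kernel size k enlarges the receptive field by (k - 1) times the current spacing between neighbouring outputs, measured in input pixels, while a stride multiplies that spacing.

```python
def receptive_field(layers):
    """Return the receptive field (in input pixels) after each layer.

    `layers` is a list of (kernel_size, stride) pairs; this covers both
    convolutions and pooling-style down-samplers along one spatial axis.
    """
    rf = 1    # a single input pixel "sees" only itself
    jump = 1  # spacing, in input pixels, between neighbouring outputs
    history = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra tap reaches `jump` pixels further
        jump *= stride             # striding spreads the outputs further apart
        history.append(rf)
    return history


# Hypothetical toy encoder: 3x3 convolutions interleaved with 2x2, stride-2 pooling.
with_downsampling = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]
# The same three convolutions without any down-samplers.
without_downsampling = [(3, 1), (3, 1), (3, 1)]

print(receptive_field(with_downsampling))     # [3, 4, 8, 10, 18]
print(receptive_field(without_downsampling))  # [3, 5, 7]
```

In this toy setting, the same three 3×3 convolutions cover 18 input pixels per axis with two down-samplers, but only 7 without them.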
In the original FCN, the classifier of the CNN is replaced with an up-sampling layer that transforms the encoded feature maps, which have low spatial dimension and rich semantic information, into a segmentation map. The U-Net architecture (Ronneberger, 2015) extends the original FCN by replacing its relatively simple up-sampling layer with a multi-layer decoder. U-Net thus consists of an encoder and a decoder, making it an example of an encoder-decoder network. The more complex decoder of U-Net, compared to that of the original FCN, allows gradual spatial up-sampling of the feature maps and a more efficient conversion of the feature maps into an output segmentation map. The long-range skip connections between the encoder and decoder introduced by (Ronneberger, 2015) help to restore spatial information that would otherwise be lost to down-sampling in the encoder.

A recent example of adopting the encoder-decoder architecture for crack segmentation was proposed by König et al. (König, 2021). This method shows state-of-the-art performance on the CFD dataset presented in (Shi, 2016). In this paper, we present an encoder-decoder convolutional neural network for the segmentation of cracks in images of steel bridges. We use the method of König et al. (König, 2021) as our primary baseline, on which we build our architectural and training strategies, leading to significant improvements.

2. Dataset

The CSB dataset is a publicly available collection of images of steel bridges, accessible at Kompanets (2024a). It includes 755 images with cracks and 300 without cracks, all gathered during routine bridge inspections. The images vary in size, with a maximum resolution of 4608×3456 pixels, and were captured from different angles and at visually estimated distances ranging from about 0.5 m to 5 m. Pixel-wise annotations for the images were created using a semi-automatic tool developed specifically for this task (Kompanets, 2024b).
The tool leverages a geometric tracking algorithm (Duits, 2018), as outlined in (Kompanets, 2024b). It requires manual input to mark the positions of two crack endpoints on the image. The geometric tracking algorithm uses these endpoints to trace the path of the crack between them, after which the segmentation of the crack is completed automatically. Using this algorithm reduced the annotation time from 30 minutes to under 1 minute per image. However, this gain in efficiency came at the cost of a slight reduction in annotation accuracy. In (Kompanets, 2024b), it was demonstrated that the tool achieved 83% of the accuracy of manual annotation, as measured by the F1-score. To mitigate this loss in accuracy, the annotations were manually refined afterward, which still required considerably less time than fully manual pixel-wise annotation. A detailed analysis of how
