PSI - Issue 44

Gabriella Tocchi et al. / Procedia Structural Integrity 44 (2023) 1972–1979


3. Machine learning-based approach for compiling building inventory

The analysis of the Cartis database in the previous paragraph has shown some correlation between Cartis data and census data. In municipalities with fewer than 5000 inhabitants, generally only two homogeneous areas (i.e., TCs) are defined, while in municipalities with between 5000 and 50000 inhabitants there are at least three. The first TC defined (TC 01) usually corresponds to the historical center of the city and is characterized by the highest percentage of the oldest buildings, built before 1945, while the second TC (TC 02), which could correspond to the first expansion area, contains predominantly buildings built between 1946 and 1981. These observations highlight the possibility of establishing criteria for identifying TCs from census data at census tract level.

To this aim, we propose the use of a logistic regression algorithm to identify the TC to which each census tract belongs. Logistic regression is a supervised learning algorithm that uses a labelled dataset (i.e., input data for which the output label is already known) to train a model that predicts outcomes from one or more input variables. In this study, the input data are information about the buildings and population of census tracts for which the labelled output (i.e., the TC to which the tract belongs) is known, i.e., the census tracts of the municipalities included in the Cartis database. Using these data, a logistic regression model is trained to predict the census tracts included in the first TC, the Historical center, which is usually the one with a large number of ancient and highly vulnerable buildings. In the following paragraphs, the methodology adopted to train the algorithm and the dataset used are described, and the resulting predictive model is shown.
Moreover, a proposal for associating the main vulnerability factors with the buildings of each census tract, based on the TC to which they belong, is presented.

3.1 Logistic regression for Historical center identification

The database used to train the model consists of both ISTAT and Cartis data for several municipalities investigated in the Campania region. In this region, 40 municipalities with fewer than 50000 inhabitants are selected, while data for the two municipalities of Frignano and San Cipriano d’Aversa are used to perform a first test of the validity of the proposed procedure (see paragraph 4). As shown previously, a significant parameter for the identification of the Historical Center (HC) is the period of construction of the buildings: if the percentage of buildings built before 1945 in a census tract is high, the tract is likely to be in the HC. Another significant parameter is the population density. Indeed, analysis of the extent of the first TC using GIS software shows that it mainly consists of small census tracts with a large number of inhabitants. Therefore, the number of buildings built before 1945 and the population density (i.e., the census tract’s population divided by its surface area) are the input variables considered for training the model.
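As an illustration of the kind of classifier described above, the following minimal sketch trains a two-feature logistic regression by batch gradient descent. It is not the authors' implementation: the data are synthetic and invented for the example, and the function names, feature scaling, learning rate, number of epochs and 0.5 probability threshold are all assumptions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def standardize(rows):
    """Z-score each column; also return the (mean, std) used per column."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, std))
    scaled = [[(v - m) / s for v, (m, s) in zip(r, stats)] for r in rows]
    return scaled, stats


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns [bias, w1, w2]."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j in range(d):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Synthetic, made-up census tracts (NOT ISTAT or Cartis data):
# (number of buildings built before 1945, inhabitants per km^2).
random.seed(0)
hc = [(random.gauss(60, 10), random.gauss(9000, 1500)) for _ in range(50)]
non_hc = [(random.gauss(10, 5), random.gauss(2000, 800)) for _ in range(50)]
rows = hc + non_hc
labels = [1] * 50 + [0] * 50  # 1 = tract in the HC, 0 = tract outside it

X, stats = standardize(rows)
w = fit_logistic(X, labels)


def classify(tract):
    """Return True if the tract is predicted to belong to the HC."""
    x = [(v - m) / s for v, (m, s) in zip(tract, stats)]
    p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
    return p >= 0.5  # assumed threshold


accuracy = sum(classify(r) == bool(l) for r, l in zip(rows, labels)) / len(rows)
print(f"training accuracy on synthetic data: {accuracy:.2f}")
```

The features are standardized before fitting because building counts and population densities differ by orders of magnitude, which would otherwise slow gradient descent. A new tract is classified by applying the same scaling, e.g. classify((70, 10000)).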

Fig. 2. Decision boundary (yellow line) of the logistic regression model for the identification of census tracts belonging to the HC. Cross markers represent observations belonging to the HC; circles represent those that do not. Observations shown in grey are those not correctly classified by the model.
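For reference, a logistic regression model with two input variables has the standard form below. The symbols are notation assumed here rather than taken from the paper: x_1 denotes the number of buildings built before 1945, x_2 the population density, and β_0, β_1, β_2 the fitted coefficients. The 0.5 probability threshold is the conventional default, not a value stated in this section.

```latex
\[
P(\mathrm{HC} \mid x_1, x_2) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}},
\qquad
P = 0.5 \;\Longleftrightarrow\; \beta_0 + \beta_1 x_1 + \beta_2 x_2 = 0 .
\]
```

With linear terms only, the decision boundary is a straight line in the (x_1, x_2) plane. This section does not state whether additional nonlinear terms were used.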
