PSI - Issue 44
Gabriella Tocchi et al. / Procedia Structural Integrity 44 (2023) 1972–1979
The number of observations in the dataset equals the number of census tracts in the selected municipalities. Each observation (i.e., census tract) is labelled with the type of TC to which it belongs, and this association is made through a GIS procedure. If the census tract belongs to the HC, a value of 1 is assigned; otherwise the value is 0. A linear logistic regression model is then built to classify each census tract (HC or not) as a function of two variables: population density and the number of buildings built before 1945. Both variables are scaled to reduce computation time and to make the results easier to interpret: the number of buildings built before 1945 is divided by the total number of buildings in the tract, while the population density of each census tract is divided by the maximum value of the same parameter over all census tracts of the municipal area considered. Fig. 2 shows the decision boundary of the logistic regression model developed to identify the historical center. The linear decision boundary (yellow line) separates the observations belonging to the historical center (cross markers) from those not belonging to it (circles). The observations reported in the figure are the census tracts in the test set, on which the model accuracy is calculated.

To obtain the model with the best forecasting performance, a cross-validation approach is used. This technique randomly divides the entire dataset into a training set, whose observations are used to fit the model, and a test set, which is used to evaluate its performance. The procedure is repeated k = 1000 times; each time, different training and test sets are defined and a different model may be obtained. The best model is then chosen among the k obtained as the one showing the highest accuracy on the test set. In this application, the accuracy of the selected model is 0.75, meaning that 75% of the observations in the test set are correctly classified.
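The scaling, fitting and model-selection steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary keys (`pop_density`, `n_pre1945`, `n_buildings`), the gradient-descent fitting routine with its learning rate and number of epochs, and the 30% test fraction are all assumptions introduced for illustration, since the paper does not report these details.

```python
import math
import random


def scale_features(tracts):
    """Scale the two predictors as described in the text.

    `tracts` is a list of dicts with hypothetical keys 'pop_density',
    'n_pre1945' and 'n_buildings' for the tracts of one municipality.
    """
    max_density = max(t["pop_density"] for t in tracts)
    return [
        (
            t["pop_density"] / max_density,
            # Share of pre-1945 buildings; 0.0 if the tract has no buildings.
            t["n_pre1945"] / t["n_buildings"] if t["n_buildings"] else 0.0,
        )
        for t in tracts
    ]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit a two-feature logistic regression by batch gradient descent.

    The optimiser and its settings are assumptions, not taken from the paper.
    """
    w1 = w2 = b = 0.0
    n = len(X)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), yi in zip(X, y):
            err = sigmoid(w1 * x1 + w2 * x2 + b) - yi
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return (w1, w2, b)


def predict(model, x):
    # 1 = historical center (HC), 0 = not HC; linear decision boundary at z = 0.
    w1, w2, b = model
    return 1 if w1 * x[0] + w2 * x[1] + b >= 0 else 0


def accuracy(model, X, y):
    return sum(predict(model, x) == yi for x, yi in zip(X, y)) / len(y)


def select_best_model(X, y, n_repeats=1000, test_fraction=0.3, seed=0):
    """Repeat random train/test splits and keep the model with the
    highest test-set accuracy (test_fraction is an assumed value)."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_test = max(1, round(test_fraction * len(X)))
    best_model, best_acc = None, -1.0
    for _ in range(n_repeats):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train])
        acc = accuracy(model, [X[i] for i in test], [y[i] for i in test])
        if acc > best_acc:
            best_model, best_acc = model, acc
    return best_model, best_acc
```

In this sketch, `select_best_model(scale_features(tracts), labels)` would return the fitted coefficients together with the best test accuracy found over the repeated splits, which plays the role of the 0.75 value reported above. Because the scaling uses the municipal maximum of the population density, `scale_features` is meant to be called once per municipality.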
Fig. 3. Comparison between the area included in TC 01 according to the map reported in the Cartis form (green outline) and the corresponding area resulting from the application of the proposed model (red outline) for the municipality of Frignano.
It is observed that the resulting regression model assigns census tracts with a high scaled population density (> 0.8) to the HC, regardless of the number of old buildings within them (see the decision boundary in Fig. 2). Indeed, it is not uncommon for the number of buildings built before 1945 in a municipality to be very low (e.g., Casal di Principe), in which case population density is the sole parameter that affects the classification. Conversely, tracts with a very low population density usually lie in peripheral areas of the town, where a high percentage of pre-1945 buildings may indicate the presence of old buildings scattered across the area.

4. Application in two towns in the south of Italy

The proposed ML model is applied to two municipalities in the Campania region, both in the province of Caserta (CE): Frignano and San Cipriano d’Aversa, belonging to population classes 4 and 5, respectively. The selected