
2.3. Data Normalization

Data normalization was applied to ensure that the input features are on a comparable scale, which is critical for optimizing model performance, especially when using distance-based algorithms. The Z-score normalization method, also known as standardization, was employed. The standardized value (z) of a given data point (x) is computed by subtracting the mean of the data (μ) from each data point and dividing the result by the standard deviation of the data (σ), as illustrated in Equation (1). This method of normalization is particularly useful when the data has a Gaussian (normal) distribution and when the algorithm assumes normally distributed data, such as Linear Regression and Support Vector Machines.

z = (x − μ) / σ    (1)

2.4. Machine Learning Models

2.4.1. Linear Regression

Linear regression is a simple machine learning technique used to model the relationship between a dependent variable and one or more independent variables. It assumes a linear relationship between the input features and the output variable. The method adjusts the weights of the input variables to minimize the difference between predicted and actual values, typically using the least squares method. While linear regression is easy to implement and interpret, it may not always provide the best accuracy, especially when some variables have a negative correlation. In a positive correlation both variables increase together, whereas in a negative correlation one increases as the other decreases. The key is whether the model captures the true relationship between the variables, regardless of the direction of the correlation.

2.4.2. Linear Support Vector Machine

Linear Support Vector Machines (SVMs) are a key advancement in machine learning, known for classifying data by finding the optimal separating hyperplane. They perform well with high-dimensional data and often surpass other algorithms in terms of accuracy and efficiency. An SVM maximizes the margin between the hyperplane and the nearest data points, improving its generalization ability. Linear SVMs are particularly useful in classification problems due to their ability to handle high-dimensional feature spaces and resist overfitting. Additionally, with the use of kernel tricks, SVMs can manage nonlinearly separable data, making them a versatile and efficient tool for many real-world applications.

2.4.3. Random Forest

Random Forest is a powerful ensemble learning technique that builds on the strengths of decision trees. It creates multiple decision trees during training and averages their predictions for regression tasks. By introducing randomness through bootstrap sampling (selecting a subset of the training data for each tree) and considering a random subset of features at each tree split, Random Forest reduces the risk of overfitting. This method improves accuracy and provides insights into feature importance, helping to understand data patterns. Known for its simplicity and adaptability, Random Forest is widely used for classification and regression tasks. Its resistance to overfitting and ability to handle high-dimensional data make it a fundamental tool for data scientists. When applied to estimating UHPC strength, Random Forest effectively captures relationships between variables such as FA%, superplasticizer content, and curing period, enhancing precision by combining multiple decision trees.
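As a minimal sketch of the standardization in Equation (1) (Section 2.3), the snippet below applies Z-score normalization with scikit-learn's StandardScaler and compares it with the direct computation; the array values are placeholders for illustration, not data from this study.

import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix (rows = mixes, columns = input features)
X = np.array([[0.10, 1.5, 7.0],
              [0.20, 2.0, 28.0],
              [0.15, 1.8, 14.0]])

# Z-score normalization: z = (x - mean) / std, applied per column
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Equivalent manual computation of Equation (1)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)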
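The least-squares fitting described in Section 2.4.1 can be sketched as follows; the feature names (e.g. FA%, superplasticizer dosage, curing days) and values are illustrative assumptions rather than the dataset used in this work.

import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative inputs and compressive strengths (MPa, placeholder values)
X = np.array([[10, 1.5, 7], [20, 2.0, 28], [15, 1.8, 14], [25, 2.2, 56]], dtype=float)
y = np.array([95.0, 130.0, 110.0, 150.0])

# Fit the weights by ordinary least squares and predict
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)
print(model.coef_, model.intercept_)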
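Section 2.4.2 describes SVMs in terms of classification margins; for a strength-estimation task, one plausible formulation is a support vector regressor with a linear kernel, as sketched below under the same illustrative data assumptions. The pipeline also applies the standardization of Section 2.3, since SVMs are sensitive to feature scale.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X = np.array([[10, 1.5, 7], [20, 2.0, 28], [15, 1.8, 14], [25, 2.2, 56]], dtype=float)
y = np.array([95.0, 130.0, 110.0, 150.0])  # placeholder strengths

# Standardization followed by a linear-kernel support vector machine
svm_model = make_pipeline(StandardScaler(), SVR(kernel="linear", C=1.0, epsilon=0.1))
svm_model.fit(X, y)
print(svm_model.predict(X))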
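Finally, a sketch of the Random Forest regressor from Section 2.4.3, including the feature-importance output mentioned in the text; the feature names, hyperparameters, and data are assumptions for illustration only.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

feature_names = ["FA%", "superplasticizer", "curing_days"]  # assumed inputs
X = np.array([[10, 1.5, 7], [20, 2.0, 28], [15, 1.8, 14],
              [25, 2.2, 56], [30, 2.5, 90]], dtype=float)
y = np.array([95.0, 130.0, 110.0, 150.0, 160.0])  # placeholder strengths

# Ensemble of bootstrapped decision trees; predictions are averaged for regression
rf = RandomForestRegressor(n_estimators=200, random_state=42)
rf.fit(X, y)

# Relative importance of each input feature
for name, importance in zip(feature_names, rf.feature_importances_):
    print(f"{name}: {importance:.3f}")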
