and measuring the responses. Further details are provided in the literature, see Natke (1988), Dodds and Plummer (2001) or Hay and Roberts (2007). If the system behavior is linear and time-invariant, the frequency response function matrix perfectly predicts the output channel data from the corresponding system inputs. In the general case of non-linear systems with multiple input and output channels (MIMO), this method still provides a linear estimate of the true system behavior.
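As a minimal sketch of this linear prediction step, the following Python/NumPy snippet applies an estimated FRF matrix to the Fourier transforms of the input channels and transforms the result back to the time domain. The function name frf_predict, the array shapes and the use of a real FFT are assumptions made for illustration and are not taken from the cited methods.

```python
import numpy as np

def frf_predict(H, x):
    """Linear output estimate from an FRF matrix.

    H : complex array of shape (n_freq, n_out, n_in), estimated FRF matrix
    x : real array of shape (n_samples, n_in), measured input channels
    Returns a real array of shape (n_samples, n_out).
    """
    n_samples = x.shape[0]
    # Transform the input channels to the frequency domain.
    X = np.fft.rfft(x, axis=0)              # shape (n_freq, n_in)
    # Apply the FRF matrix frequency by frequency: Y(f) = H(f) @ X(f).
    Y = np.einsum('foi,fi->fo', H, X)       # shape (n_freq, n_out)
    # Back to the time domain; for a linear, time-invariant system this
    # reproduces the outputs, otherwise it is only a linear approximation.
    return np.fft.irfft(Y, n=n_samples, axis=0)
```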

2.2. Long Short-Term Memory network prediction and windowing

The Long Short-Term Memory network is the most commonly used variant of recurrent artificial neural networks. Since this paper only gives an overview of the algorithm, the reader is referred to Hochreiter and Schmidhuber (1997); Gers et al. (1999) for a thorough derivation. LSTM networks are composed of one or multiple memory blocks, which in turn consist of one or multiple memory cells. These cells have inner states, which can be written as a vector c(t_i). For each discrete point in time t_i with i = 1 ... L, the network processes the corresponding input data vector x(t_i), modifies the cell state c(t_i) accordingly and generates a block output h(t_i). L is the total number of time steps in the input sequence. The different types of data processing, namely writing information into the inner state, forgetting information and generating an output, are handled by the respective gate networks G_store, G_forget, G_out and the input network N_in. At a discrete time step t_i, the input of each network consists of the current block input vector x(t_i), concatenated with the last cell output h(t_{i-1}). From this combined input x*(t_i), each network computes a corresponding network activation a(t_i) following the scheme

a(t_i) = ξ( W x*(t_i) + b ),    (3)

where W and b are the weight matrix and bias vector of the network and ξ(·) is its activation function. All gate networks use the sigmoid activation function σ(x) = 1 / (1 + e^(−x)), while the input network applies the hyperbolic tangent. Depending on the network, the resulting activations are denoted by g_store, g_forget, g_out or a_in. They are used to update the cell state of the previous time step

c(t_i) = c(t_{i-1}) ⊙ g_forget(t_i) + a_in(t_i) ⊙ g_store(t_i)    (4)

and to provide the block output of the current time step

h(t_i) = tanh( c(t_i) ) ⊙ g_out(t_i),    (5)

using the Hadamard product ⊙ for element-wise multiplication. In networks with multiple memory blocks, the block output of one block is forwarded as the input vector of the next block. The network prediction y* is finally obtained as a linear combination of the final block output components

y*(t_i) = W_FC h(t_i) + b_FC,    (6)

which is realized by a fully connected layer with weight matrix W_FC and bias vector b_FC.

The process of determining suitable values for the parameters W and b inside the LSTM blocks is called training the network. After random initialization, examples of input data can be used to generate an LSTM prediction. Since the corresponding output data is known in the training dataset, the mean squared error can be used to measure the quality of the prediction. The network parameters are then iteratively updated using the RMSProp gradient-descent optimization algorithm proposed by Hinton (2012) with the learning rate hyper-parameter λ. The training process lasts for a chosen number of training dataset repetitions called epochs.

In order to process measurement data with the LSTM algorithm, it is first split into short subsequences of equal length L. The overlap factor o determines the number of shared samples between subsequences relative to L. After all subsequences are extracted from the measurements, the mean and standard deviation are computed separately for each channel of the training dataset. The data is then shifted and rescaled by first subtracting the mean and then dividing by the standard deviation.
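To make the cell update concrete, the following Python/NumPy sketch implements Eqs. (3)-(6) for a single memory block. The function and parameter names (lstm_block_step, W_store, b_store, etc.) are illustrative assumptions and not taken from the paper; in practice, the LSTM layers of a deep learning framework would be used instead of hand-written updates.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_block_step(x_t, h_prev, c_prev, p):
    """One time step of a single memory block, following Eqs. (3)-(5)."""
    # Combined input x*(t_i): block input concatenated with the last output h(t_{i-1}).
    x_star = np.concatenate([x_t, h_prev])

    # Gate activations, Eq. (3) with sigmoid activation.
    g_store  = sigmoid(p['W_store']  @ x_star + p['b_store'])
    g_forget = sigmoid(p['W_forget'] @ x_star + p['b_forget'])
    g_out    = sigmoid(p['W_out']    @ x_star + p['b_out'])
    # Input network activation, Eq. (3) with hyperbolic tangent.
    a_in = np.tanh(p['W_in'] @ x_star + p['b_in'])

    # Cell state update, Eq. (4); '*' is the element-wise (Hadamard) product.
    c_t = c_prev * g_forget + a_in * g_store
    # Block output, Eq. (5).
    h_t = np.tanh(c_t) * g_out
    return h_t, c_t

def lstm_predict(x_seq, p, W_FC, b_FC):
    """Run a single memory block over one subsequence of length L and apply
    the fully connected output layer of Eq. (6) at every time step."""
    n_cells = p['b_store'].shape[0]
    h = np.zeros(n_cells)
    c = np.zeros(n_cells)
    y = []
    for x_t in x_seq:                       # x_seq has shape (L, n_inputs)
        h, c = lstm_block_step(x_t, h, c, p)
        y.append(W_FC @ h + b_FC)           # Eq. (6)
    return np.array(y)
```

The windowing and normalization steps can be sketched in the same spirit. The function make_subsequences and the example values of L and o below are assumptions for illustration only; the statistics would be computed from the training dataset and reused for validation and test data.

```python
import numpy as np

def make_subsequences(data, L, o):
    """Split measurement data of shape (n_samples, n_channels) into overlapping
    subsequences of length L; the overlap factor o gives the fraction of
    samples shared between neighbouring subsequences."""
    step = max(1, int(round(L * (1.0 - o))))
    starts = range(0, data.shape[0] - L + 1, step)
    return np.stack([data[s:s + L] for s in starts])

# Illustrative usage with synthetic data (2 channels, 10 000 samples).
measurement = np.random.randn(10_000, 2)
subseqs = make_subsequences(measurement, L=256, o=0.5)   # (n_subseq, 256, 2)

# Channel-wise statistics from the training subsequences; each channel is
# shifted by its mean and rescaled by its standard deviation.
mean = subseqs.mean(axis=(0, 1))
std = subseqs.std(axis=(0, 1))
normalized = (subseqs - mean) / std
```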
