# Description

The main difference between the [[Perceptron]] and more advanced neural network models is the training method. The [[Perceptron]] is trained with a simple gradient-descent update that minimizes a loss function. This simple scheme does not carry over to more complex models: their loss surfaces are nonconvex, so gradient descent may get stuck in a local minimum and fail to reach the global optimum.

The objective of a neural network's training process is to adjust the weights in such a way that the network correctly predicts the output. This adjustment is done by minimizing a loss function, often the squared error:

$$ \epsilon = \frac{1}{2} (c - y)^2 $$

where $c$ is the class value of the training instance and $y$ is the output produced by the network.

To minimize this error, the gradient descent method is employed: the gradient of the loss with respect to each weight is computed, and the weight is moved a small step in the opposite direction. For a linear unit, $\partial \epsilon / \partial w = -(c - y)\,d$, where $d$ is the input carried by the weight's connection, so the update (the delta rule) is

$$ \Delta w = \eta\,(c - y)\,d $$

where $\eta$ is the learning rate. By iteratively applying this update rule, the network gradually reduces the error and produces more accurate predictions. However, because the loss function of more complex models is nonconvex, gradient descent may converge to a local minimum rather than the global minimum, leading to suboptimal solutions.

The design and training of a neural network are fundamentally similar to those of other machine learning models that employ gradient descent. A key distinction lies in the inherent nonlinearity of neural networks, which usually makes the loss function nonconvex. The optimization landscape is therefore more complex, and iterative, gradient-based methods are needed to minimize the cost function effectively. Unlike linear equation solvers or convex optimization algorithms, which offer guarantees of global convergence, these methods typically drive the cost function to a very low value but do not always reach a global minimum.

---

## References

- [[Deep learning - Anna Bosch Rué Jordi Casas Roma Toni Lozano Bagén]]
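
A minimal sketch of this update loop, assuming a single linear unit trained instance by instance; the toy data, variable names, and learning-rate value are illustrative, not from the source:

```python
import numpy as np

# Toy training set (illustrative): inputs d and target class values c
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
C = np.array([0.0, 0.0, 0.0, 1.0])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=2)  # weights
b = 0.0                            # bias
eta = 0.1                          # learning rate (eta)

for epoch in range(100):
    total_error = 0.0
    for d, c in zip(X, C):
        y = w @ d + b                      # linear unit output
        total_error += 0.5 * (c - y) ** 2  # epsilon = 1/2 (c - y)^2
        # Delta rule: step against the gradient d(epsilon)/dw = -(c - y) d
        w += eta * (c - y) * d
        b += eta * (c - y)

print("weights:", w, "bias:", b, "last-epoch error:", total_error)
```

For a single linear unit the squared-error loss is convex, so this loop converges reliably; the local-minima problem described above only appears once hidden layers and nonlinear activations make the loss surface nonconvex.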