Fully Connected Neural Network Architecture

Lab: Neural Network Rectified Linear Unit (ReLU) vs Sigmoid (v2)

Reading: Training a Neural Network with Momentum and Data Augmentation

Momentum

In standard gradient descent, parameter updates rely solely on the current gradient, which can make learning slow—especially when the loss surface has sharp curves or narrow valleys. Momentum addresses this by incorporating information from previous gradients, creating a 'velocity' that smooths updates and builds speed in consistent directions.

\(v_{t} = \gamma v_{t-1} + \eta \nabla J(\theta)\)
\(\theta = \theta - v_{t}\)

where \(\gamma\) is the momentum coefficient (commonly around 0.9), \(\eta\) is the learning rate, and \(v_{t}\) is the accumulated velocity at step \(t\).
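Below is a minimal NumPy sketch of this update rule. The quadratic loss, learning rate, and momentum coefficient here are illustrative assumptions, not values prescribed by the lab.

```python
import numpy as np

def sgd_momentum_step(theta, velocity, grad, lr=0.01, gamma=0.9):
    """One SGD-with-momentum update: v_t = gamma * v_{t-1} + lr * grad; theta = theta - v_t."""
    velocity = gamma * velocity + lr * grad
    theta = theta - velocity
    return theta, velocity

# Hypothetical usage: minimize J(theta) = theta^2, whose gradient is 2 * theta.
theta = np.array([5.0])
velocity = np.zeros_like(theta)
for _ in range(50):
    grad = 2 * theta
    theta, velocity = sgd_momentum_step(theta, velocity, grad)
```

In PyTorch, a built-in momentum option is available via torch.optim.SGD(model.parameters(), lr=..., momentum=...); its exact formulation differs slightly from the equations above but captures the same idea of accumulating a velocity across steps.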

Data Augmentation

Data augmentation is a powerful technique for reducing overfitting, a common challenge in neural network training where the model performs well on training data but poorly on unseen data. It works by generating new, varied versions of existing training samples through random transformations such as cropping, resizing, flipping, rotating, or adding noise.

In practice, data augmentation is often applied on the fly during training using tools such as tf.image in TensorFlow or torchvision in PyTorch, ensuring each training batch contains fresh variations for improved generalization.
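As a brief illustration, the following sketch assumes a torchvision pipeline; the specific transforms and parameter values (flip probability, rotation range, crop size, noise scale) are illustrative choices rather than settings from the lab.

```python
import torch
from torchvision import transforms

# Random transformations applied on the fly, so each epoch sees different variants of each image.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),              # random flipping
    transforms.RandomRotation(degrees=15),                # random rotation
    transforms.RandomResizedCrop(size=28, scale=(0.8, 1.0)),  # random crop and resize
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x + 0.05 * torch.randn_like(x)),  # add small Gaussian noise
])

# Pass the pipeline to a dataset, e.g.:
# dataset = torchvision.datasets.MNIST(root="./data", train=True, download=True, transform=augment)
```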

Lab: Training A Neural Network with Momentum
