Fully Connected Neural Network Architecture
- A network with more than 1 hidden layer is a deep neural network
- Deeper networks tend to be more accurate but take longer to train
- ReLU outputs 0 for inputs below 0 and is linear (identity) above 0
- Dropout helps reduce overfitting (see the sketch below)
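A minimal sketch, assuming PyTorch and purely illustrative layer sizes, of a fully connected network that combines ReLU activations with dropout:

```python
# Sketch (assumption): fully connected network with ReLU and dropout in PyTorch.
# Layer sizes and dropout probability are illustrative, not from the notes.
import torch
import torch.nn as nn

class FullyConnectedNet(nn.Module):
    def __init__(self, in_features=784, hidden=128, num_classes=10, p_drop=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),            # 0 for negative inputs, identity for positive inputs
            nn.Dropout(p_drop),   # randomly zeroes activations to reduce overfitting
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.net(x)

model = FullyConnectedNet()
logits = model(torch.randn(32, 784))   # batch of 32 flattened 28x28 inputs
print(logits.shape)                    # torch.Size([32, 10])
```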
Lab: Neural Network Rectified Linear Unit (ReLU) vs Sigmoid v2
Reading: Training a Neural Network with Momentum and Data Augmentation
- Momentum - speeds up and stabilizes the gradient descent training process
- Data Augmentation - expands and diversifies the training data
Momentum
In standard gradient descent, parameter updates rely solely on the current gradient, which can make learning slow—especially when the loss surface has sharp curves or narrow valleys. Momentum addresses this by incorporating information from previous gradients, creating a 'velocity' that smooths updates and builds speed in consistent directions.
\(v_{t} = \gamma v_{t-1} + \eta \nabla J(\theta)\)
\(\theta = \theta - v_{t}\)
- \(v_t\) represents the accumulated velocity (or smoothed gradient),
- \(\gamma\) is the momentum coefficient (usually around 0.9),
- \(\eta\) is the learning rate, and
- \(\nabla J(\theta)\) is the gradient of the loss function with respect to the parameters.
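A minimal sketch of the update rule above, written out by hand for a single parameter tensor in PyTorch; the toy loss and hyperparameter values are assumptions for illustration:

```python
# Sketch (assumption): one momentum step, following v_t = gamma*v_{t-1} + eta*grad,
# theta = theta - v_t, on a toy quadratic loss.
import torch

gamma, eta = 0.9, 0.01                  # momentum coefficient and learning rate
theta = torch.randn(10, requires_grad=True)
velocity = torch.zeros_like(theta)

loss = (theta ** 2).sum()               # toy loss J(theta)
loss.backward()                         # computes theta.grad

with torch.no_grad():
    velocity = gamma * velocity + eta * theta.grad   # v_t = gamma*v_{t-1} + eta*grad
    theta -= velocity                                # theta = theta - v_t

# In practice, torch.optim.SGD(model.parameters(), lr=eta, momentum=gamma)
# applies a closely related momentum update for you.
```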
Data Augmentation
Data augmentation is a powerful technique for reducing overfitting, a common challenge in neural network training where the model performs well on training data but poorly on unseen data. It works by generating new, varied versions of existing training samples through random transformations such as cropping, resizing, flipping, rotating, or adding noise.
In practice, data augmentation is often applied on the fly during training using tools such as tf.image in TensorFlow or torchvision in PyTorch, ensuring each training batch contains fresh variations for improved generalization.
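A minimal sketch of on-the-fly augmentation with torchvision; the dataset (CIFAR-10) and the particular transforms are illustrative assumptions:

```python
# Sketch (assumption): random augmentations applied each time a sample is loaded,
# so every training epoch sees fresh variations of the images.
import torch
import torchvision
import torchvision.transforms as transforms

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # random crop with padding
    transforms.RandomHorizontalFlip(),      # random left-right flip
    transforms.RandomRotation(10),          # small random rotation
    transforms.ToTensor(),
])

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform
)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
```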
Lab: Training A Neural Network with Momentum