Training Process
The training pipeline supports distributed training, mixed precision, and dynamic learning rate scheduling.
Training Concepts
1. Loop Structure
Standard PyTorch training loop:
- Forward Pass: Compute model predictions.
- Loss Calculation: Compute the loss with CrossEntropyLoss.
- Backward Pass: Backpropagate gradients.
- Optimization Step: Update weights (Adam optimizer).
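The sketch below illustrates this loop. It is a minimal example, assuming the model, DataLoader, and optimizer are constructed elsewhere; the function and variable names are placeholders, not the actual train.py API.

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, device):
    """One epoch of the standard loop: forward, loss, backward, step."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        logits = model(inputs)             # forward pass
        loss = criterion(logits, targets)  # loss calculation
        loss.backward()                    # backward pass
        optimizer.step()                   # Adam update
```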
2. Mixed Precision (AMP)
We use torch.cuda.amp (Automatic Mixed Precision) to speed up training and reduce memory usage.
- Scaler: GradScaler is used to prevent gradient underflow/overflow.
- BFloat16: Used for matrix multiplications on supported hardware.
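A minimal sketch of a single AMP training step, assuming the model, criterion, optimizer, and a GradScaler are already constructed; the function name and the bfloat16 autocast dtype are illustrative rather than taken from train.py.

```python
import torch
from torch.cuda.amp import GradScaler

def amp_train_step(model, inputs, targets, criterion, optimizer, scaler):
    """One training step under autocast with gradient scaling."""
    optimizer.zero_grad()
    # Run the forward pass and loss in reduced precision where safe.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        logits = model(inputs)
        loss = criterion(logits, targets)
    # Scale the loss so small gradients do not underflow, then step and update the scale.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

# scaler = GradScaler()  # created once, outside the training loop
```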
3. Distributed Data Parallel (DDP)
The train.py script is designed to run on multiple GPUs.
- DistributedSampler: Ensures each GPU sees a unique subset of the data.
- Sync: Gradients are synchronized across GPUs during the backward pass.
- Metric Reduction: Validation metrics are aggregated from all ranks.
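A sketch of the DDP pieces listed above, assuming one process per GPU launched via torchrun; the helper names and the NCCL backend are assumptions, not necessarily what train.py uses.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def setup_ddp(model, dataset, batch_size, local_rank):
    """Wrap the model in DDP and shard the data across ranks."""
    dist.init_process_group(backend="nccl")   # env:// init when launched with torchrun
    torch.cuda.set_device(local_rank)
    model = DDP(model.to(local_rank), device_ids=[local_rank])
    sampler = DistributedSampler(dataset)     # each rank sees a unique subset of the data
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
    return model, loader

def reduce_metric(value: torch.Tensor) -> torch.Tensor:
    """Average a validation metric across all ranks."""
    dist.all_reduce(value, op=dist.ReduceOp.SUM)
    return value / dist.get_world_size()
```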
4. Hyperparameters
Key hyperparameters (defaults):
- Hidden Size: 384
- Layers: 4
- Learning Rate: 1e-3
- Scheduler: ReduceLROnPlateau (factor 0.2, patience 3)
- Dropout: 0.3–0.5
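The snippet below shows how the optimizer and scheduler defaults above could be wired together; it is a sketch, and the helper function is not part of train.py.

```python
import torch

def build_optimizer_and_scheduler(model, lr=1e-3):
    """Adam with the default learning rate plus ReduceLROnPlateau."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.2, patience=3
    )
    return optimizer, scheduler

# After each validation epoch: scheduler.step(val_loss)
```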
Checkpointing
The model is saved only when validation loss improves.
- Format: .pth file containing model state, optimizer state, and scheduler state.
- Naming: checkpoint_{timestamp}-signs_{num_signs}/{epoch}.pth
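A minimal sketch of the improvement-gated save; the dictionary keys and function name are illustrative, and the path is expected to follow the naming scheme above.

```python
import torch

def maybe_save_checkpoint(model, optimizer, scheduler, val_loss, best_loss, path):
    """Save model, optimizer, and scheduler state only when validation loss improves."""
    if val_loss >= best_loss:
        return best_loss
    torch.save(
        {
            "model_state": model.state_dict(),         # key names are illustrative
            "optimizer_state": optimizer.state_dict(),
            "scheduler_state": scheduler.state_dict(),
            "val_loss": val_loss,
        },
        path,
    )
    return val_loss  # new best validation loss
```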