Underrated fact about training in the very-large-data regime: you don't have to worry about overfitting or early stopping, because single-epoch training is the default, and it turns out it's no big deal at all if you do a single-digit number of epochs on these huge AF overparameterized models!
Academic benchmark datasets on the order of tens of thousands of samples are annoying in this respect: they're small enough that you end up doing many epochs over them, so overfitting and early stopping become real concerns again.
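For concreteness, here's a minimal sketch of what the single-epoch regime looks like in PyTorch. The model, synthetic data, and hyperparameters are all made up for illustration; the point is just the shape of the loop: one pass over the data, no validation loop, no early-stopping logic anywhere.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for a real model and a huge streaming dataset.
model = nn.Linear(128, 10)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

# In the large-data regime the dataset is effectively a stream you traverse once;
# a small TensorDataset stands in for it here.
data = TensorDataset(torch.randn(10_000, 128), torch.randint(0, 10, (10_000,)))
loader = DataLoader(data, batch_size=256, shuffle=True)

# Single epoch: each example is seen exactly once, so there is nothing to
# early-stop against -- you just train until the data runs out.
for x, y in loader:
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```

Contrast that with the small-dataset setup, where the same loop sits inside a `for epoch in range(num_epochs)` wrapper plus a held-out validation pass to decide when to stop.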