Underrated fact about training in the very-large-data regime: you don't have to worry about overfitting or early stopping, because single-epoch training is the default, and it turns out it's no big deal at all if you do a single-digit number of epochs on these huge AF overparameterized models!
Academic benchmark datasets on the order of tens of thousands of samples are annoying in this respect: they're small enough that you end up doing many epochs over them, so overfitting and early stopping become real concerns again.
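For concreteness, here's a minimal sketch of what the single-epoch regime looks like in PyTorch. The model, synthetic data, and hyperparameters are all made up for illustration; the point is just the shape of the loop: one pass over the data, no validation loop, no early-stopping logic anywhere.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for a real model and a huge streaming dataset.
model = nn.Linear(128, 10)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

# In the large-data regime the dataset is effectively a stream you traverse once;
# a small TensorDataset stands in for it here.
data = TensorDataset(torch.randn(10_000, 128), torch.randint(0, 10, (10_000,)))
loader = DataLoader(data, batch_size=256, shuffle=True)

# Single epoch: each example is seen exactly once, so there is nothing to
# early-stop against -- you just train until the data runs out.
for x, y in loader:
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```

Contrast that with the small-dataset setup, where the same loop sits inside a `for epoch in range(num_epochs)` wrapper plus a held-out validation pass to decide when to stop.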