hyperparameter optimization
3 articles · 6 co-occurring · 0 contradictions · 5 briefs
"…accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc." — Art
"At just 30B parameters, it scores 87/120 on this year's Putnam" — Provides evidence that competitive mathematical reasoning capability can be achieved at a relatively modest 30B-parameter model size
adding constraints like a single file to edit, a single metric to track, a time limit, and a well-written program.md is what makes this work. that combination is what makes @karpathy autoresearch actually
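The constrained loop described in these excerpts can be sketched in a few lines. This is a hypothetical illustration, not the referenced autoresearch implementation: `validate` is a stand-in objective, and the improvement log stands in for the git commits mentioned in the quote. The single tracked metric is validation loss, and the time limit bounds the search.

```python
import math
import random
import time

def validate(config):
    # Stand-in "validation loss": favors a learning rate near 1e-3
    # and a hidden size near 256. A real run would train and evaluate.
    return abs(math.log10(config["lr"]) + 3) + abs(config["hidden"] - 256) / 512

def search(time_limit_s=1.0, seed=0):
    """Random search over hyperparameters under a wall-clock time limit,
    keeping only configurations that improve the single tracked metric."""
    rng = random.Random(seed)
    best_loss, best_cfg, history = float("inf"), None, []
    deadline = time.monotonic() + time_limit_s
    while time.monotonic() < deadline:
        cfg = {
            "lr": 10 ** rng.uniform(-5, -1),          # log-uniform learning rate
            "hidden": rng.choice([64, 128, 256, 512]),
        }
        loss = validate(cfg)
        if loss < best_loss:                           # accept only improvements
            best_loss, best_cfg = loss, cfg
            history.append((loss, cfg))                # one "commit" per improvement
    return best_cfg, best_loss, history
```

Because only improvements are recorded, `history` is a strictly decreasing sequence of losses, mirroring a commit log where every commit lowers validation loss.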