training optimization
2 articles · 3 co-occurring · 0 contradictions · 5 briefs
"Listwise loss vs Pointwise BCE: +30.7pp (the latter fails in highly homogeneous pools)": Comparing listwise loss with pointwise BCE shows a 30.7pp improvement and identifies a failure mode of BCE in homogeneous
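A minimal sketch of the two loss families named in this brief, assuming a single candidate pool with one relevant item. The function names and the toy scores are hypothetical and are not taken from the source. It only illustrates one structural difference that may bear on the homogeneous-pool failure mode: the listwise softmax loss depends on score differences within the pool, while pointwise BCE depends on each score's absolute value.

```python
import numpy as np

def pointwise_bce(scores, labels):
    """Mean binary cross-entropy on logits; each candidate is scored independently."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # log(1 + exp(s)) - y * s is the numerically stable BCE-with-logits form.
    return float(np.mean(np.logaddexp(0.0, scores) - labels * scores))

def listwise_softmax_ce(scores, labels):
    """Softmax cross-entropy over the whole pool (one common listwise loss)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Log-softmax across the pool: candidates compete with each other.
    log_probs = scores - np.logaddexp.reduce(scores)
    target = labels / labels.sum()
    return float(-np.sum(target * log_probs))

# Hypothetical pool of near-identical candidates; index 1 is the relevant one.
scores = np.array([0.10, 0.12, 0.11, 0.09])
labels = np.array([0.0, 1.0, 0.0, 0.0])
shifted = scores + 5.0  # same ranking, different absolute score level

print("listwise:", listwise_softmax_ce(scores, labels), listwise_softmax_ce(shifted, labels))
print("pointwise:", pointwise_bce(scores, labels), pointwise_bce(shifted, labels))
```

Shifting every score by the same constant leaves the listwise loss unchanged but changes the pointwise loss, so the pointwise objective spends capacity on absolute calibration rather than on the within-pool ordering. Whether this accounts for the reported gap is not stated in the brief.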
2026-W15 10
@vivek_2332: definitely agree. the concept of autoresearch isn't new, letting llms optimiz... extends
for weight matrices the Frobenius norm gradient (what Adam and SGD use) is geometrically wrong. the "correct" steepest descent direction for a weight matrix is the one that minimizes the loss subject
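The excerpt above is cut off before it names the constraint, so the following is only a sketch under a stated assumption: that the constraint is a bound on the spectral norm of the update. Under that assumption, the steepest descent direction for a gradient matrix with reduced SVD G = U S Vᵀ is -U Vᵀ, whereas under a Frobenius-norm bound it is the normalized negative gradient. The function names are hypothetical.

```python
import numpy as np

def frobenius_direction(grad):
    """Unit-Frobenius-norm descent direction: the normalized negative gradient."""
    return -grad / np.linalg.norm(grad)  # default matrix norm is Frobenius

def spectral_direction(grad):
    """Unit-spectral-norm descent direction (ASSUMED constraint): -U @ Vt."""
    # Reduced SVD; replacing every singular value with 1 orthogonalizes the gradient.
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return -(u @ vt)

rng = np.random.default_rng(0)
grad = rng.standard_normal((4, 3))  # toy gradient of a 4x3 weight matrix

d_frob = frobenius_direction(grad)
d_spec = spectral_direction(grad)
singular_values = np.linalg.svd(grad, compute_uv=False)

# First-order loss change <grad, direction> for each choice of geometry.
print("Frobenius step:", np.sum(grad * d_frob))
print("spectral step: ", np.sum(grad * d_spec))
```

The Frobenius direction keeps the gradient's singular values in proportion, so the update is dominated by the largest ones. The spectral direction gives every singular direction equal weight, and its first-order decrease equals the sum of the gradient's singular values.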
query this concept
$ db.articles("training-optimization")
$ db.cooccurrence("training-optimization")
$ db.contradictions("training-optimization")