training optimization

2 articles · 3 co-occurring · 0 contradictions · 5 briefs

"Listwise loss vs. pointwise BCE: +30.7pp (the latter fails in highly homogeneous pools)": The comparison between listwise loss and pointwise BCE shows a 30.7pp improvement and identifies a failure mode of BCE in homogeneous
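To make the contrast concrete, here is a minimal sketch of the two loss families, assuming a single positive per candidate list and a ListNet-style softmax cross-entropy as the listwise loss. The function names and toy scores are hypothetical and are not taken from the brief. It illustrates one plausible reason a pointwise loss can struggle when candidates are very similar: BCE depends on each score's absolute value, while the listwise loss depends only on score differences within the list.

```python
import math


def pointwise_bce(scores, labels):
    """Mean binary cross-entropy; every candidate is judged independently."""
    total = 0.0
    for s, y in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-s))  # sigmoid of the raw score
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(scores)


def listwise_softmax_loss(scores, labels):
    """Softmax cross-entropy over the whole candidate list (assumed listwise form)."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    positives = sum(labels)
    return -sum(y * (s - log_z) for s, y in zip(scores, labels)) / positives


# Hypothetical homogeneous pool: all candidates get nearly the same score.
scores = [2.0, 1.8, 1.9, 1.7]
labels = [1, 0, 0, 0]
shifted = [s + 3.0 for s in scores]  # same ranking, different absolute level

print(pointwise_bce(scores, labels), pointwise_bce(shifted, labels))
print(listwise_softmax_loss(scores, labels), listwise_softmax_loss(shifted, labels))
```

Shifting every score by the same constant leaves the listwise loss unchanged but changes the BCE value, so only the listwise loss is driven purely by how candidates compare with each other.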

2026-W15
10

For weight matrices, the Frobenius-norm gradient (what Adam and SGD use) is geometrically wrong. The "correct" steepest-descent direction for a weight matrix is the one that minimizes the loss subject
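The excerpt is cut off before it names the constraint, so the following is only a sketch under an assumption: that the constraint is a bound on the spectral (operator) norm of the update. Under that assumption, for a gradient with SVD G = U S V^T, the steepest-descent direction is -U V^T, whereas the Frobenius-norm constraint gives the familiar -G / ||G||_F. The helper names are hypothetical, and the cubic Newton-Schulz iteration used here is one simple way to approximate U V^T without computing an SVD.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frobenius_norm(a):
    return sum(x * x for row in a for x in row) ** 0.5


def frobenius_direction(g):
    """Steepest descent under a Frobenius-norm bound: the normalized negative gradient."""
    fro = frobenius_norm(g)
    return [[-x / fro for x in row] for row in g]


def orthogonalize(g, steps=30):
    """Approximate the polar factor U V^T of g via cubic Newton-Schulz iteration."""
    fro = frobenius_norm(g)
    x = [[v / fro for v in row] for row in g]  # scale so all singular values are <= 1
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        # Each singular value s is mapped to 1.5*s - 0.5*s**3, which converges to 1.
        x = [[1.5 * x[i][j] - 0.5 * xxtx[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x


def spectral_direction(g):
    """Steepest descent under a spectral-norm bound (assumed constraint): -U V^T."""
    return [[-x for x in row] for row in orthogonalize(g)]


grad = [[2.0, 1.0], [0.0, 1.0]]  # hypothetical full-rank gradient of a 2x2 weight matrix
print(frobenius_direction(grad))
print(spectral_direction(grad))
```

The Frobenius direction keeps the gradient's singular values in their original ratio, while the spectral direction sets them all to one, so every singular direction of the update gets equal weight. The iteration as written assumes a full-rank, reasonably well-conditioned gradient; very small singular values would need more steps.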

query this concept
$ db.articles("training-optimization")
$ db.cooccurrence("training-optimization")
$ db.contradictions("training-optimization")