training optimization
2 articles · 3 co-occurring · 0 contradictions · 47 briefs
"Listwise loss vs. pointwise BCE: +30.7pp (the latter fails in highly homogeneous pools)" - The comparison between listwise loss and pointwise BCE reveals a significant 30.7pp improvement and identifies a failure mode of BCE in homogeneous
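For readers unfamiliar with the two loss families named in this brief, here is a minimal sketch of how they differ. It is not the brief's actual setup: the function names, scores, and labels are hypothetical, and the listwise loss is assumed to be a plain softmax cross-entropy over the candidate list. The one property it illustrates is that the listwise loss depends only on relative scores within a list, while pointwise BCE judges each score in isolation.

```python
import math

def pointwise_bce(scores, labels):
    """Mean binary cross-entropy; every item is scored independently."""
    total = 0.0
    for s, y in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-s))  # sigmoid of the raw score
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(scores)

def listwise_softmax_ce(scores, labels):
    """Softmax cross-entropy taken over the whole candidate list."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    n_pos = sum(labels)
    return -sum(y * (s - log_z) for s, y in zip(scores, labels)) / n_pos

# Hypothetical pool of near-identical candidates with one positive.
scores = [2.1, 2.0, 1.9, 2.0]
labels = [1, 0, 0, 0]
shifted = [s + 3.0 for s in scores]  # same ranking, every score raised by 3

print(pointwise_bce(scores, labels), pointwise_bce(shifted, labels))
print(listwise_softmax_ce(scores, labels), listwise_softmax_ce(shifted, labels))
```

Shifting every score by the same constant leaves the listwise loss unchanged but changes the pointwise loss. Whether this is the failure mode the brief reports is not stated in the excerpt.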
2026-W22 2
2026-W21 12
2026-W20 14
2026-W19 10
2026-W18 14
2026-W17 14
2026-W16 14
2026-W15 14
@vivek_2332: definitely agree. the concept of autoresearch isn't new, letting llms optimiz... extends
for weight matrices the frobenius norm gradient (what adam and sgd use) is geometrically wrong. the "correct" steepest descent direction for a weight matrix is the one that minimizes the loss subject
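The excerpt above is cut off before it names the constraint. As an illustration only, the sketch below assumes a spectral-norm constraint and compares the resulting steepest-descent direction with the Frobenius-norm one. The function names and the random gradient matrix are hypothetical, and this is not claimed to be the quoted author's method.

```python
import numpy as np

def steepest_descent_frobenius(grad):
    """Unit-Frobenius-norm direction that minimizes <grad, D>: the negated, normalized gradient."""
    return -grad / np.linalg.norm(grad)  # default norm for a 2-D array is Frobenius

def steepest_descent_spectral(grad):
    """Unit-spectral-norm direction that minimizes <grad, D>: -U V^T from the reduced SVD.

    Assumption: the constraint is on the spectral norm (not stated in the excerpt).
    """
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return -u @ vt

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))  # stand-in gradient of a 4x3 weight matrix

d_fro = steepest_descent_frobenius(G)
d_spec = steepest_descent_spectral(G)

# First-order change in the loss for each direction: <G, D>.
change_fro = np.sum(G * d_fro)    # equals -||G||_F
change_spec = np.sum(G * d_spec)  # equals -(sum of singular values of G)
print(change_fro, change_spec)
```

Under the spectral-norm assumption the update sets every singular value of the step to 1, so it no longer follows the gradient's own singular-value scaling. The Frobenius direction is just the gradient rescaled.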
query this concept
$ db.articles("training-optimization")
$ db.cooccurrence("training-optimization")
$ db.contradictions("training-optimization")