model evaluation benchmarking
2 articles · 3 co-occurring · 0 contradictions · 48 briefs
Weekly counts (ISO week: count): 2026-W22: 4 · 2026-W21: 12 · 2026-W20: 14 · 2026-W19: 10 · 2026-W18: 14 · 2026-W17: 14 · 2026-W16: 14 · 2026-W15: 14
@NousResearch: Today we open source Nomos 1. At just 30B parameters, it scores 87/120 on thi... example_of
"scores 87/120 on this year's Putnam, one of the world's most prestigious math competitions": Nomos 1 is evaluated on the Putnam mathematical reasoning benchmark, demonstrating model performance asses...
@xeophon: okay, small thread! example_of
[INFERRED] "i gave my students a challenge which claude code, codex *and* gemini failed": Article describes comparative evaluation of three different AI code models on a single challenging task, prov...
query this concept
$ db.articles("model-evaluation-benchmarking")
$ db.cooccurrence("model-evaluation-benchmarking")
$ db.contradictions("model-evaluation-benchmarking")