Mid-training optimization
3 articles · 15 co-occurring · 1 contradiction · 5 briefs
Mid-training after long-context pretraining yields the largest gains in math, code, and science while preserving general reasoning. Mid-training at an 8k context window degrades long-context ability, but this can be mitigated.
[STRONG] "A couple of AI researchers have already contacted me about the results, interested in the idea that maybe fine-tuning isn't the only way towards improving math performance." — The article challenges the conventional assumption that fine-tuning is the primary lever for performance improvement, suggesting that representation discovery is orthogonal and at least as effective.
"New research explores alternatives to fine-tuning and improvements to reproducibility, with open datasets supporting diverse languages." — The article documents emerging research into alternatives to fine-tuning.