Competition Rules
- Size of submitted solutions: Each submitted evaluation benchmark must contain between 100 and 15,000 samples. The upper bound aligns with the size of the MMLU test split (14,042 samples), one of the largest state-of-the-art benchmarks, allowing participants to build upon or adapt MMLU if desired. We have successfully tested a baseline using MMLU-var (an adaptation of MMLU), confirming that our infrastructure can handle submissions at this scale. This cap also discourages oversized submissions that needlessly combine multiple benchmarks. The lower bound of 100 samples keeps the competition accessible to individuals or teams with limited resources who wish to explore new ideas and experiment with smaller, focused benchmarks.
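Before submitting, participants may want to verify the size constraint locally. The sketch below is illustrative only: the 100 and 15,000 bounds come from the rules above, but the function name and the in-memory representation of a benchmark (a list of samples) are assumptions, not the official submission format.

```python
# Hypothetical pre-submission size check. The bounds are taken from the
# competition rules; the data format is an assumption for illustration.
MIN_SAMPLES = 100
MAX_SAMPLES = 15_000  # upper bound aligned with the MMLU test split (14,042 samples)

def check_benchmark_size(samples: list) -> int:
    """Return the sample count, raising ValueError if it is out of range."""
    n = len(samples)
    if not MIN_SAMPLES <= n <= MAX_SAMPLES:
        raise ValueError(
            f"Benchmark has {n} samples; must be between "
            f"{MIN_SAMPLES} and {MAX_SAMPLES}."
        )
    return n

check_benchmark_size([{"question": "...", "answer": "..."}] * 500)  # within bounds
```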
- Competition phases: This challenge is open to anyone and runs in three phases. During Phases 1 and 2, participants can submit their code and view their results on the regularly updated leaderboard; during Phase 1 specifically, the organizers may adjust the global score formula.
- Additional baselines: Additional baseline results may be released by the organizers to stimulate participation.
- Submission limits: Each participant or team (using a group account) may submit up to 10 entries per day.
- Code availability: Participants are strongly encouraged to make their code publicly available with their submissions.
- Final ranking: The final ranking will be based on the global score, calculated by the organizers and shared with all participants. The scoring algorithm will be made available so that participants can evaluate their own solutions locally.
- Team accounts: Teams must use a single group account and email. Use of multiple accounts is not permitted.
- Prize eligibility: To qualify for a prize, a team must open-source its code no later than two weeks before the NeurIPS competition workshop.