Starter kit
This starter kit includes a set of Jupyter notebooks designed to help participants better understand the use case and learn how to contribute to the competition.
Prerequisites
Most of the notebooks in this repository can be run on Google Colab or in Kaggle notebooks using the free-tier NVIDIA T4 GPUs, with minimal installation requirements.
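Before running the GPU-dependent notebooks, it can help to confirm that the session actually has a GPU attached. The snippet below is a minimal sketch using only the Python standard library and the `nvidia-smi` tool that ships with the NVIDIA driver; the helper name `list_visible_gpus` is our own and is not part of the starter kit.

```python
import shutil
import subprocess


def list_visible_gpus() -> list[str]:
    """Return the GPU names reported by nvidia-smi, or [] if none are visible.

    Hypothetical helper for illustration; not part of the starter kit.
    """
    # Without nvidia-smi on PATH there is no usable NVIDIA runtime in this session.
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    # One GPU name per non-empty output line.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_visible_gpus()
print("Visible GPUs:", gpus if gpus else "none (enable a GPU accelerator for this session)")
```

If the list comes back empty on Colab or Kaggle, the session is most likely running on a CPU-only runtime and the accelerator needs to be enabled in the notebook settings.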
Notebooks description
The contents of the Jupyter notebooks are described below:
- 0-Basic_Competition_Information: This notebook contains general information about the competition's organization, phases, deadlines, and terms. The content is the same as that shared on the competition's Codabench page.
- 1-How_to_interact_with_model: This notebook familiarizes participants with the tools used to interact with the model and to perform some simple text generation tasks.
- 2-How_to_evaluate_a_model: This notebook shows participants how a checkpoint can be evaluated using the lm-evaluation-harness package.
- 3-Reproduce_baseline_results: This notebook shows how to reproduce the baseline results (MMLU-Var on a single checkpoint). It covers integrating the MMLU-Var benchmark into the lm-evaluation-harness package and running it to obtain the results.
- 4-How_to_Contribute: This notebook explains how to fully integrate a new task into the lm-evaluation-harness package.
- 5-Scoring: This notebook first explains how the score is calculated by detailing its various components. It then provides a script that participants can use locally to evaluate their contributions. We encourage participants to assess their solutions on Codabench, which uses the same scoring module described in this notebook.
- 6-Submission: This notebook presents the composition of a submission bundle for our Hugging Face Space (to be available soon).
- 7-Scientific-Alignment Check: This notebook describes how we will assess the scientific alignment of the proposed benchmarks using GPT-4 as a judge.
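To give a rough idea of what the evaluation notebooks (2 to 4) work toward, the sketch below assembles an lm-evaluation-harness command line for a Hugging Face checkpoint. It assumes the v0.4-style `lm_eval` CLI and its `--model`, `--model_args`, `--tasks`, `--device`, `--batch_size`, `--include_path`, and `--output_path` flags. The helper `build_lm_eval_command`, the checkpoint name `your-org/your-checkpoint`, and the `./my_tasks` directory are placeholders of ours, and the notebooks themselves may drive the harness differently.

```python
import shlex
from typing import Optional, Sequence


def build_lm_eval_command(
    checkpoint: str,
    tasks: Sequence[str],
    batch_size: int = 8,
    device: str = "cuda:0",
    include_path: Optional[str] = None,
    output_path: Optional[str] = None,
) -> list[str]:
    """Assemble an `lm_eval` invocation as an argument list (illustrative helper)."""
    if not tasks:
        raise ValueError("at least one task name is required")
    cmd = [
        "lm_eval",
        "--model", "hf",                               # Hugging Face transformers backend
        "--model_args", f"pretrained={checkpoint}",    # checkpoint to evaluate
        "--tasks", ",".join(tasks),                    # comma-separated task names
        "--device", device,
        "--batch_size", str(batch_size),
    ]
    if include_path is not None:
        # Directory holding YAML configs for tasks not bundled with the harness.
        cmd += ["--include_path", include_path]
    if output_path is not None:
        # Where the harness should write its results.
        cmd += ["--output_path", output_path]
    return cmd


cmd = build_lm_eval_command(
    checkpoint="your-org/your-checkpoint",  # placeholder, not a real model ID
    tasks=["mmlu"],
    include_path="./my_tasks",              # placeholder directory for custom task configs
    output_path="./results",
)
print(shlex.join(cmd))
```

In a session where lm-evaluation-harness is installed, the list can be passed to `subprocess.run(cmd, check=True)`. Keeping the command as a list rather than a single string avoids shell-quoting problems when paths or model arguments contain special characters.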
Please join us on Discord for discussions and up-to-date announcements:
Join our Discord here.