Starter kit
This starter kit includes a set of Jupyter notebooks designed to help participants better understand the use case and learn how to contribute to the competition.
Prerequisites
Most of the notebooks provided in this repository can be run on Google Colab or in a Kaggle notebook using the free-tier NVIDIA T4 GPUs, with minimal installation requirements.
Notebook descriptions
In the following, we describe the content of each Jupyter notebook:
- 1-How_to_interact_with_model: This notebook aims to familiarize participants with the tools used to interact with the model and to perform some simple text-generation tasks (see the generation sketch after this list).
- 2-How_to_evaluate_a_model: This notebook shows participants how a checkpoint can be evaluated using the `lm-evaluation-harness` package (see the evaluation sketch below).
- 3-Reproduce_baseline_results: This notebook shows how to reproduce the baseline results (MMLU-Var on a single checkpoint). It includes integrating the MMLU-Var benchmark within the `lm-evaluation-harness` package and running it to get the results.
- 4.1-How_to_Contribute: This notebook explains how to fully integrate a new task within the `lm-evaluation-harness` package (see the task-config sketch below).
- 4.2-How_to_Contribute_Advanced: This notebook offers an in-depth exploration of the architecture and functionality of the `lm-evaluation-harness` package.
- 5-Scoring: This notebook explains how the signal score is calculated.
- 6-Submission: This notebook presents the composition of a submission bundle for our Hugging Face Space.
- 7-Scientific-Alignment Check: This notebook shows how we will assess the scientific alignment of the proposed benchmarks using GPT-4 as a judge (see the judge sketch below).
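As a taste of what notebook 1 covers, here is a minimal text-generation sketch using the Hugging Face `transformers` library. The model name `gpt2` is only a placeholder; the notebook specifies which competition checkpoint to load.

```python
# Minimal text-generation sketch. "gpt2" is a placeholder model;
# the notebook specifies the actual competition checkpoint to load.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
output = generator("The Hubble constant measures", max_new_tokens=30)
print(output[0]["generated_text"])
```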
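For notebooks 2 and 3, evaluation goes through the `lm-evaluation-harness` package. Below is a minimal sketch assuming the v0.4.x Python API; the model and task names are placeholders, and the notebooks substitute the competition checkpoints and the MMLU-Var task.

```python
# Evaluation sketch, assuming the lm-evaluation-harness v0.4.x API.
# "gpt2" and "hellaswag" are placeholders; the notebooks use the
# competition checkpoints and the MMLU-Var task instead.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                    # Hugging Face transformers backend
    model_args="pretrained=gpt2",  # checkpoint to evaluate
    tasks=["hellaswag"],           # task(s) to run
    batch_size=8,
)
print(results["results"])
```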
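Notebook 4.1 walks through integrating a new task. In recent `lm-evaluation-harness` versions, a task is declared as a YAML config that the harness can pick up from a custom directory. The sketch below writes a hypothetical `my_task` config built on the public ag_news dataset (every value here is a placeholder, not the competition benchmark) and runs it via the `TaskManager` include path.

```python
# Sketch: declare and run a custom task, assuming lm-evaluation-harness
# v0.4.x YAML task conventions. "my_task" and the ag_news-based config
# are placeholders, not the competition's actual benchmark.
import os
import lm_eval
from lm_eval.tasks import TaskManager

os.makedirs("my_tasks", exist_ok=True)
with open("my_tasks/my_task.yaml", "w") as f:
    f.write(r"""
task: my_task
dataset_path: ag_news
output_type: multiple_choice
test_split: test
doc_to_text: "{{text}}\nTopic:"
doc_to_target: label
doc_to_choice: ["World", "Sports", "Business", "Sci/Tech"]
metric_list:
  - metric: acc
""")

# Point the harness at the custom task directory and evaluate on it.
task_manager = TaskManager(include_path="my_tasks")
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=gpt2",
    tasks=["my_task"],
    task_manager=task_manager,
)
print(results["results"]["my_task"])
```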
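Notebook 7 relies on the LLM-as-judge pattern. Here is a minimal sketch with the OpenAI Python client (v1.x assumed), using an entirely hypothetical rubric prompt; the actual judging prompt and scoring logic are defined in the notebook.

```python
# LLM-as-judge sketch using the OpenAI Python client (v1.x assumed).
# The system prompt is a hypothetical rubric, not the competition's
# actual alignment-check prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

benchmark_summary = "Multiple-choice questions probing knowledge of protein folding."
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "You assess whether a proposed benchmark is scientifically "
                "aligned. Answer YES or NO with a one-sentence justification."
            ),
        },
        {"role": "user", "content": benchmark_summary},
    ],
)
print(response.choices[0].message.content)
```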
Please join us on Discord for discussions and up-to-date announcements: join our Discord here.