Reproducibility Checklist

For all reported experimental results:

  • A clear description of the mathematical setting, algorithm, and/or model
  • A link to downloadable source code, with a specification of all dependencies, including external libraries (recommended for the camera-ready version)
  • A description of computing infrastructure used
  • The average runtime for each model or algorithm, or estimated energy cost
  • The number of parameters in each model (runtime and parameter counting are illustrated in the sketch after this list)
  • Corresponding validation performance for each reported test result
  • A clear definition of the specific evaluation measure or statistics used to report results
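
The runtime and parameter-count items above can be measured directly in code. The following is a minimal sketch assuming a PyTorch model; the architecture, batch size, and number of timing runs are hypothetical placeholders chosen only for illustration.

```python
import time

import torch
import torch.nn as nn

# Hypothetical model, for illustration only; substitute the model you report.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Number of parameters: report total and trainable counts.
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"parameters: {total} total, {trainable} trainable")

# Average runtime: time repeated forward passes on a fixed-size batch.
x = torch.randn(32, 784)  # batch size 32 is an arbitrary placeholder
n_runs = 100
start = time.perf_counter()
with torch.no_grad():
    for _ in range(n_runs):
        model(x)
print(f"average forward pass: {(time.perf_counter() - start) / n_runs:.6f} s")
```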

For all results involving multiple experiments, such as hyperparameter search:

  • The exact number of training and evaluation runs
  • The bounds for each hyperparameter
  • The hyperparameter configurations for best-performing models
  • The method of choosing hyperparameter values (e.g., manual tuning, uniform sampling) and the criterion used to select among them (e.g., accuracy)
  • Summary statistics of the results (e.g., mean, variance, error bars), as in the sketch after this list
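
One common pattern that covers these items is to enumerate every configuration, run each a fixed number of times, and record the mean and standard deviation. The sketch below is a minimal illustration in plain Python; the search space, the `train_and_evaluate` stub, and the five-seed run count are assumptions for illustration, not requirements of the checklist.

```python
import itertools
import random
import statistics

# Hypothetical hyperparameter bounds; adapt to your actual search space.
search_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout": [0.1, 0.3, 0.5],
}

def train_and_evaluate(config, seed):
    # Placeholder for a real training run; returns a fake validation accuracy.
    rng = random.Random(f"{config}-{seed}")
    return rng.uniform(0.7, 0.9)

results = []
for values in itertools.product(*search_space.values()):
    config = dict(zip(search_space.keys(), values))
    # Exact number of evaluation runs per configuration (here: 5 seeds).
    scores = [train_and_evaluate(config, seed) for seed in range(5)]
    results.append({
        "config": config,
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "n_runs": len(scores),
    })

# Selection criterion: highest mean validation accuracy.
best = max(results, key=lambda r: r["mean"])
print(f"best config: {best['config']}, "
      f"mean={best['mean']:.3f} ± {best['stdev']:.3f} over {best['n_runs']} runs")
```

Reporting the mean and spread over several seeds, rather than a single run, is what makes the selection criterion and the summary statistics verifiable by others.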

For all datasets used:

  • Relevant statistics such as number of examples and label distributions
  • Details of train/validation/test splits (see the split sketch after this list)
  • An explanation of any data that were excluded, and all pre-processing steps
  • For natural language data, the name of the language(s)
  • A link to a downloadable version of the dataset or simulation environment
  • For new data collected, a complete description of the data collection process, such as instructions to annotators and methods for quality control
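
Label distributions and deterministic splits can be reported straight from code. Below is a minimal sketch in plain Python; the two-label toy dataset, the fixed seed 42, and the 80/10/10 ratio are illustrative assumptions to be replaced with your own data and reported values.

```python
import collections
import random

rng = random.Random(42)  # fixed seed so the exact partition is reproducible

# Hypothetical labeled dataset; replace with your own examples.
data = [(f"example_{i}", rng.choice(["pos", "neg"])) for i in range(1000)]

# Relevant statistics: number of examples and label distribution.
label_counts = collections.Counter(label for _, label in data)
print(f"{len(data)} examples, label distribution: {dict(label_counts)}")

# Deterministic 80/10/10 train/validation/test split.
indices = list(range(len(data)))
rng.shuffle(indices)
n_train = int(0.8 * len(data))
n_val = int(0.1 * len(data))
train = [data[i] for i in indices[:n_train]]
val = [data[i] for i in indices[n_train:n_train + n_val]]
test = [data[i] for i in indices[n_train + n_val:]]
print(f"splits: {len(train)} train / {len(val)} validation / {len(test)} test")
```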