Getting Started
The AutoML Benchmark is a tool for benchmarking AutoML frameworks on tabular data. It automates the installation of AutoML frameworks, passing them data, and evaluating their predictions. Our paper describes the design and showcases results from an evaluation using the benchmark. This guide goes over the minimum steps needed to evaluate an AutoML framework on a toy dataset.
Full instructions can be found in the API Documentation.
Installation
These instructions assume that Python 3.9 (or higher) and git are installed and available under the aliases python and git, respectively. We recommend Pyenv for managing multiple Python installations, if applicable. We support Ubuntu 22.04, but many other Linux and MacOS versions likely work (for MacOS, it may be necessary to have brew installed).
First, clone the repository:
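For example, assuming the project's GitHub repository at openml/automlbenchmark, and cloning into the current working directory:

```
# clone the benchmark and enter the project directory
git clone https://github.com/openml/automlbenchmark.git
cd automlbenchmark
```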
Create a virtual environment to install the dependencies in:
Linux
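A minimal sketch using Python's built-in venv module (the environment name venv is illustrative):

```
# create and activate a virtual environment named "venv"
python -m venv venv
source venv/bin/activate
```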
MacOS
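The same venv-based sketch applies on MacOS (again, the environment name is illustrative):

```
# create and activate a virtual environment named "venv"
python -m venv venv
source venv/bin/activate
```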
Windows
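On Windows, the activation script lives in the Scripts folder (same illustrative environment name):

```
python -m venv venv
venv\Scripts\activate
```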
Then install the dependencies:
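With the virtual environment activated, a typical install from the requirements.txt referenced in the troubleshooting note below would be:

```
# install the benchmark's Python dependencies into the active environment
python -m pip install -r requirements.txt
```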
Note for Windows users
The automated installation of AutoML frameworks is done using shell scripts, which do not work on Windows. We recommend you use Docker to run the examples below. First, install and run Docker. Then, whenever there is a python runbenchmark.py ... command in the tutorial, add -m docker to it (python runbenchmark.py ... -m docker).
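For instance, taking constantpredictor as a placeholder framework name, the Docker variant of a run would look like:

```
python runbenchmark.py constantpredictor -m docker
```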
Problem with the installation?
On some platforms, we need to ensure that requirements are installed sequentially. Use xargs -L 1 python -m pip install < requirements.txt to do so. If problems persist, open an issue with the error and information about your environment (OS, Python version, pip version).
Running the Benchmark
To run a benchmark, call the runbenchmark.py script, specifying the framework to evaluate. See the API Documentation for more information on the parameters available.
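As a minimal sketch, assuming constantpredictor is one of the framework names defined in the repository, a first run could look like:

```
# evaluate the (assumed) constantpredictor baseline with default settings
python runbenchmark.py constantpredictor
```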