scikit-learn¶

OpenML is readily integrated with scikit-learn through the Python API. This page provides a brief overview of the key features and installation instructions. For more detailed API documentation, please refer to the official documentation.

Key features:¶

Query and download OpenML datasets and use them however you like
Build any sklearn estimator or pipeline and convert to OpenML flows
Run any flow on any task and save the experiment as run objects
Upload your runs for collaboration or publishing
Query, download and reuse all shared runs

Installation¶

1	`pip install openml`

Query and download data¶

import openml

# List all datasets and their properties
openml.datasets.list_datasets(output_format="dataframe")

# Get dataset by ID
dataset = openml.datasets.get_dataset(61)

# Get dataset by name
dataset = openml.datasets.get_dataset('Fashion-MNIST')

# Get the data itself as a dataframe (or otherwise)
X, y, _, _ = dataset.get_data(dataset_format="dataframe")

Download tasks, run models locally, publish results (with scikit-learn)¶

from sklearn import ensemble
from openml import tasks, runs

# Build any model you like
clf = ensemble.RandomForestClassifier()

# Download any OpenML task
task = tasks.get_task(3954)

# Run and evaluate your model on the task
run = runs.run_model_on_task(clf, task)

# Share the results on OpenML. Your API key can be found in your account.
# openml.config.apikey = 'YOUR_KEY'
run.publish()

OpenML Benchmarks¶

# List all tasks in a benchmark
benchmark = openml.study.get_suite('OpenML-CC18')
tasks.list_tasks(output_format="dataframe", task_id=benchmark.tasks)

# Return benchmark results
openml.evaluations.list_evaluations(
    function="area_under_roc_curve",
    tasks=benchmark.tasks,
    output_format="dataframe"
)