OpenML is readily integrated with scikit-learn through the Python API.
This page provides a brief overview of the key features and installation instructions. For more detailed API documentation, please refer to the official documentation.
    import openml

    # List all datasets and their properties
    openml.datasets.list_datasets(output_format="dataframe")

    # Get dataset by ID
    dataset = openml.datasets.get_dataset(61)

    # Get dataset by name
    dataset = openml.datasets.get_dataset('Fashion-MNIST')

    # Get the data itself as a dataframe (or otherwise).
    # Pass the target column explicitly; without it, y is returned as None.
    X, y, _, _ = dataset.get_data(
        target=dataset.default_target_attribute,
        dataset_format="dataframe",
    )
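The last step above can be wrapped in a small helper. This is a minimal sketch, not part of the OpenML API: the function name `load_features_and_target` is hypothetical, and it assumes that `get_data` returns the four values (features, target, categorical mask, attribute names) in that order, as it does in the `openml` releases that accept `dataset_format`.

```python
def load_features_and_target(dataset):
    """Return (X, y, categorical_columns) for an OpenML dataset object.

    Hypothetical helper: `dataset` is expected to be the object returned by
    openml.datasets.get_dataset(...).
    """
    X, y, is_categorical, names = dataset.get_data(
        target=dataset.default_target_attribute,
        dataset_format="dataframe",
    )
    # Keep only the names of the columns flagged as categorical, which is
    # handy when building a scikit-learn preprocessing pipeline.
    categorical_columns = [
        name for name, flag in zip(names, is_categorical) if flag
    ]
    return X, y, categorical_columns


# Usage (requires the openml package and network access):
#   import openml
#   dataset = openml.datasets.get_dataset(61)
#   X, y, categorical_columns = load_features_and_target(dataset)
```

Returning the categorical column names separately keeps the helper independent of any particular encoder, so the caller decides how to preprocess them.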
Download tasks, run models locally, publish results (with scikit-learn)
    import openml
    from sklearn import ensemble
    from openml import tasks, runs

    # Build any model you like
    clf = ensemble.RandomForestClassifier()

    # Download any OpenML task
    task = tasks.get_task(3954)

    # Run and evaluate your model on the task
    run = runs.run_model_on_task(clf, task)

    # Share the results on OpenML. Your API key can be found in your account.
    # openml.config.apikey = 'YOUR_KEY'
    run.publish()
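Before publishing, it can be useful to look at the scores of the local run. The sketch below assumes the run object exposes `get_metric_fn`, which in `openml-python` computes a scikit-learn metric per fold from the locally stored predictions; the helper names `summarize_fold_scores` and `local_accuracy` are hypothetical and only illustrate the idea.

```python
from statistics import mean, stdev


def summarize_fold_scores(scores):
    """Summarize per-fold scores as fold count, mean, and sample std."""
    scores = [float(s) for s in scores]
    # The sample standard deviation is undefined for a single fold.
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return {"folds": len(scores), "mean": mean(scores), "std": spread}


def local_accuracy(run):
    """Summarize the accuracy of a locally executed OpenML run.

    Assumes `run` came from openml.runs.run_model_on_task and still holds
    its predictions in memory.
    """
    from sklearn import metrics  # imported lazily; only needed here

    return summarize_fold_scores(run.get_metric_fn(metrics.accuracy_score))


# Usage (requires openml, scikit-learn, and network access):
#   run = runs.run_model_on_task(clf, task)
#   print(local_accuracy(run))
```

Checking the summary first makes it easier to avoid uploading a run that failed silently or was configured incorrectly.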
    # List all tasks in a benchmark
    benchmark = openml.study.get_suite('OpenML-CC18')
    openml.tasks.list_tasks(output_format="dataframe", task_id=benchmark.tasks)

    # Return benchmark results
    openml.evaluations.list_evaluations(
        function="area_under_roc_curve",
        tasks=benchmark.tasks,
        output_format="dataframe",
    )
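The evaluation listing contains one row per run, so a common next step is to reduce it to the best result per task. The following is a minimal sketch under stated assumptions: the helper `best_per_task` is hypothetical, and it assumes each row carries a `task_id` and a numeric `value` field, which is how the evaluations dataframe is laid out in the `openml-python` versions I am aware of. It works on plain dictionaries, which can be obtained from a dataframe with `to_dict("records")`.

```python
def best_per_task(records, score_key="value"):
    """Return the highest-scoring record for each task.

    `records` is an iterable of dict-like rows with a "task_id" field and a
    numeric score stored under `score_key` (assumed column names).
    """
    best = {}
    for row in records:
        task_id = row["task_id"]
        # Keep the first row seen for a task, then replace it only when a
        # strictly better score shows up.
        if task_id not in best or row[score_key] > best[task_id][score_key]:
            best[task_id] = row
    return best


# Usage (requires openml, pandas, and network access):
#   evals = openml.evaluations.list_evaluations(
#       function="area_under_roc_curve",
#       tasks=benchmark.tasks,
#       output_format="dataframe",
#   )
#   top = best_per_task(evals.to_dict("records"))
```

Higher is better for area under the ROC curve; for error-style measures the comparison would need to be reversed.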