
Getting Started

This page walks you through getting started with OpenML. It is only a starting point; for more detail, refer to the integrations section and the rest of the documentation.

Authentication

  • If you use the OpenML API to download datasets, upload results, or create tasks, you need to authenticate. To do so, create an account on the OpenML website and use your API key.
  • Detailed instructions on how to authenticate are in the authentication section.
!pip install openml

EEG Eye State example

Download the OpenML task for the eeg-eye-state dataset and run a k-nearest-neighbors classifier on it.

# License: BSD 3-Clause

import openml
from sklearn import neighbors

Warning

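This example uploads data, so it connects to the OpenML test server instead of the main server. This keeps runs created by examples from crowding the main server.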

openml.config.start_using_configuration_for_example()

When using the main server instead, make sure your API key is configured, for example with the following line of code (uncomment it first). Never share your API key with others.

# openml.config.apikey = 'YOURKEY'
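One way to avoid pasting the key into a notebook that might be shared is to read it from an environment variable. The following is a minimal sketch, not an official OpenML mechanism: the variable name `MY_OPENML_APIKEY` and the helper `load_api_key` are hypothetical, and only the `openml.config.apikey` assignment comes from the line above.

```python
import os


def load_api_key(env_var="MY_OPENML_APIKEY"):
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical choice for this sketch.
    Returns None when the variable is unset or contains only whitespace.
    """
    key = os.environ.get(env_var, "").strip()
    return key or None


api_key = load_api_key()
if api_key is not None:
    # Only import and configure openml when a key is actually available.
    import openml

    openml.config.apikey = api_key
```

With this approach the key lives in your shell environment rather than in the notebook itself, which makes it harder to share by accident.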

Caching

Downloaded datasets, tasks, runs, and flows are cached locally, so later requests for them do not need to contact the server. As with the API key, the cache directory can be set either in the config file or through the API:

  • Add the line cachedir = 'MYDIR' to the config file, replacing 'MYDIR' with the path to the cache directory. By default, OpenML will use ~/.openml/cache as the cache directory.
  • Run the code below, replacing 'YOURDIR' with the path to the cache directory.
# Uncomment and set your OpenML cache directory
# import os
# openml.config.cache_directory = os.path.expanduser('YOURDIR')
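If the chosen directory may not exist yet, it can help to create it before pointing OpenML at it. The sketch below is an assumption about a convenient workflow, not something OpenML requires: `prepare_cache_dir` is a hypothetical helper, and the demonstration uses a temporary directory so that nothing is left behind on disk.

```python
import os
import tempfile
from pathlib import Path


def prepare_cache_dir(path):
    """Expand '~', make the path absolute, and create the directory if needed."""
    cache_dir = Path(path).expanduser().resolve()
    cache_dir.mkdir(parents=True, exist_ok=True)
    return str(cache_dir)


# Demonstration in a throwaway location.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = prepare_cache_dir(os.path.join(tmp, "openml_cache"))
    demo_exists = os.path.isdir(demo_dir)
    # With openml imported, the result could then be assigned as in the cell above:
    # openml.config.cache_directory = demo_dir
```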
# Fetch the task and the dataset it is defined on
task = openml.tasks.get_task(403)
data = openml.datasets.get_dataset(task.dataset_id)
# Run a 5-nearest-neighbors classifier on the task
clf = neighbors.KNeighborsClassifier(n_neighbors=5)
run = openml.runs.run_model_on_task(clf, task, avoid_duplicate_runs=False)
# Publish the experiment on OpenML (optional, requires an API key).
# For this tutorial, our configuration publishes to the test server
# so as not to crowd the main server with runs created by examples.
myrun = run.publish()
print(f"kNN on {data.name}: {myrun.openml_url}")
openml.config.stop_using_configuration_for_example()
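
The example above switches to the example configuration at the start and back at the end. If the code in between raises an error, the final call is skipped and the session stays on the test server. A minimal sketch of one way to guard against that is a small context manager; `example_configuration` is a hypothetical helper written for this page, not part of the OpenML API, and it is demonstrated here with stand-in functions so it runs without a server.

```python
from contextlib import contextmanager


@contextmanager
def example_configuration(start, stop):
    """Call start() on entry and always call stop() on exit, even on errors."""
    start()
    try:
        yield
    finally:
        stop()


# Stand-in functions that record the order of calls.
calls = []
with example_configuration(
    lambda: calls.append("start"),
    lambda: calls.append("stop"),
):
    calls.append("work")
```

With OpenML, the two functions used in this tutorial could be passed instead: `openml.config.start_using_configuration_for_example` as `start` and `openml.config.stop_using_configuration_for_example` as `stop`.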