PyTorch extension for OpenML Python

PyTorch extension for the openml-python API. This library provides a simple way to run your PyTorch models on OpenML tasks.

For a more native experience, PyTorch itself provides OpenML integrations for some tasks. You can find more information here.

Installation Instructions

pip install openml-pytorch

PyPI link: https://pypi.org/project/openml-pytorch/

Usage

To use this extension, you need a task from OpenML. You can either browse the OpenML website to find a task (and get its ID), or follow the example to create a task from a custom dataset.

Set the API key for OpenML from the command line:

openml configure apikey <your API key>
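If you prefer not to store the key in the OpenML configuration file, it can also be set from Python for the current process. The sketch below is one possible approach, not part of this extension: the environment variable name `MY_OPENML_APIKEY` and both helper functions are hypothetical. It relies only on the `openml.config.apikey` attribute of openml-python.

```python
import os


def read_api_key(env_var="MY_OPENML_APIKEY"):
    """Return the API key stored in `env_var` (a hypothetical variable name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


def configure_openml(env_var="MY_OPENML_APIKEY"):
    """Apply the key to openml-python for the current process only."""
    # Imported lazily so that read_api_key can be used without openml installed.
    import openml

    openml.config.apikey = read_api_key(env_var)
```

Call `configure_openml()` once before fetching tasks or publishing runs. Unlike the command above, this does not write anything to disk.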

Then, follow one of the examples in the Examples folder to see how to use this extension for your type of data.

Import the OpenML libraries

import torch.nn
import torch.optim

import openml_pytorch.config
import openml
import logging

from openml_pytorch.trainer import OpenMLTrainerModule
from openml_pytorch.trainer import OpenMLDataModule
from torchvision.transforms import Compose, Resize, ToPILImage, ToTensor, Lambda
import torchvision
from openml_pytorch.trainer import convert_to_rgb

Create a PyTorch model and get a task from OpenML

model = torchvision.models.efficientnet_b0(num_classes=200)
# Download the OpenML task for tiniest imagenet
task = openml.tasks.get_task(362128)

Download the task from OpenML and define the data and trainer configuration

transform = Compose(
    [
        ToPILImage(),  # Convert tensor to PIL Image to ensure PIL Image operations can be applied.
        Lambda(
            convert_to_rgb
        ),  # Convert PIL Image to RGB if it's not already.
        Resize(
            (64, 64)
        ),  # Resize the image.
        ToTensor(),  # Convert the PIL Image back to a tensor.
    ]
)
data_module = OpenMLDataModule(
    type_of_data="image",
    file_dir="datasets",
    filename_col="image_path",
    target_mode="categorical",
    target_column="label",
    batch_size=64,
    transform=transform
)
trainer = OpenMLTrainerModule(
    data_module=data_module,
    verbose=True,
    epoch_count=1,
)
openml_pytorch.config.trainer = trainer

Run the model on the task

run = openml.runs.run_model_on_task(model, task, avoid_duplicate_runs=False)
run.publish()
print('URL for run: %s/run/%d' % (openml.config.server, run.run_id))

Note: The input layer of the network should be compatible with OpenML data output shape. Please check examples for more information.
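One way to catch a shape mismatch before starting a run is to pass a dummy batch through the network. The sketch below assumes 3-channel 64x64 inputs and 200 classes, as in the example above. The helper `check_output_shape` and the small `tiny_model` are hypothetical and only serve to illustrate the check; any `torch.nn.Module` could be passed instead.

```python
import torch
import torch.nn as nn


def check_output_shape(model, input_shape, num_classes, batch_size=2):
    """Return True if the model maps a dummy batch to (batch_size, num_classes)."""
    model.eval()
    dummy = torch.zeros(batch_size, *input_shape)
    try:
        with torch.no_grad():
            out = model(dummy)
    except RuntimeError:
        # Raised, for example, when the input layer expects a different channel count.
        return False
    return tuple(out.shape) == (batch_size, num_classes)


# A deliberately small network, used here only to keep the check fast.
tiny_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 200),
)

ok = check_output_shape(tiny_model, (3, 64, 64), num_classes=200)
print(ok)
```

If the check returns False, adjust either the transform (for example the `Resize` target or the RGB conversion) or the first and last layers of the model.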

Additionally, if you want to publish the run with an ONNX file, you must call openml_pytorch.add_onnx_to_run() immediately before run.publish().

run = openml_pytorch.add_onnx_to_run(run)
run.publish()