Custom Datasets
This module contains custom dataset classes for handling image and tabular data from OpenML in PyTorch. To add support for new data types, new classes can be added to this module.
OpenMLImageDataset
Bases: Dataset
Class representing an image dataset from OpenML for use in PyTorch.
Methods:
__init__(self, X, y, image_size, image_dir, transform_x=None, transform_y=None)
Initializes the dataset with given data, image size, directory, and optional transformations.
__getitem__(self, idx)
Retrieves an image and its corresponding label (if available) from the dataset at the specified index. Applies transformations if provided.
__len__(self)
Returns the total number of images in the dataset.
Source code in temp_dir/pytorch/openml_pytorch/custom_datasets.py
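The method descriptions above can be illustrated with a minimal sketch. This is not the library's source: SketchImageDataset is a hypothetical stand-in, and it assumes that X is a sequence of file names relative to image_dir, that images are loaded with Pillow, and that image_size is a single integer used for both width and height. The real class may represent X and load images differently.

```python
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class SketchImageDataset(Dataset):
    """Hypothetical stand-in for OpenMLImageDataset (not the library's code)."""

    def __init__(self, X, y, image_size, image_dir, transform_x=None, transform_y=None):
        self.X = list(X)  # assumption: file names relative to image_dir
        self.y = None if y is None else list(y)  # labels are optional
        self.image_size = image_size
        self.image_dir = image_dir
        self.transform_x = transform_x
        self.transform_y = transform_y

    def __len__(self):
        # Total number of images in the dataset.
        return len(self.X)

    def __getitem__(self, idx):
        path = os.path.join(self.image_dir, self.X[idx])
        with Image.open(path) as img:
            # Force three channels and a square target size.
            img = img.convert("RGB").resize((self.image_size, self.image_size))
            array = np.asarray(img, dtype=np.float32) / 255.0
        image = torch.from_numpy(array).permute(2, 0, 1)  # HWC -> CHW
        if self.transform_x is not None:
            image = self.transform_x(image)
        if self.y is None:
            # No labels available: return the image alone.
            return image
        label = self.y[idx]
        if self.transform_y is not None:
            label = self.transform_y(label)
        return image, label
```

An instance of such a class can be passed to torch.utils.data.DataLoader like any other map-style dataset, since it implements both __len__ and __getitem__.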
OpenMLTabularDataset
Bases: Dataset
A custom dataset class for tabular data from OpenML (or any similarly structured tabular dataset). It encodes categorical features and the target column using LabelEncoder from sklearn.
Methods:
__init__(X, y)
Initializes the dataset with the data and the target column. Encodes the categorical features and, if provided, the target.
__getitem__(idx)
Retrieves the input data and target value at the specified index, converts them to tensors, and returns them.
__len__()
Returns the length of the dataset.
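As a rough illustration of the behaviour described above, here is a minimal sketch. SketchTabularDataset is a hypothetical name, not the library's class. It assumes X is a pandas DataFrame, treats every non-numeric column as categorical, casts features to float32, and returns the target as a long tensor. The actual implementation may choose columns and dtypes differently.

```python
import pandas as pd
import torch
from sklearn.preprocessing import LabelEncoder
from torch.utils.data import Dataset


class SketchTabularDataset(Dataset):
    """Hypothetical stand-in for OpenMLTabularDataset (not the library's code)."""

    def __init__(self, X, y):
        self.X = X.copy()  # avoid mutating the caller's DataFrame
        # Assumption: any non-numeric column is categorical and gets integer codes.
        for col in self.X.columns:
            if not pd.api.types.is_numeric_dtype(self.X[col]):
                values = self.X[col].astype(str).tolist()
                self.X[col] = LabelEncoder().fit_transform(values)
        # Encode the target only if one was provided.
        self.y = None if y is None else LabelEncoder().fit_transform(list(y))

    def __len__(self):
        # Number of rows in the dataset.
        return len(self.X)

    def __getitem__(self, idx):
        features = torch.tensor(self.X.iloc[idx].to_numpy(dtype="float32"))
        if self.y is None:
            return features
        target = torch.tensor(self.y[idx], dtype=torch.long)
        return features, target
```

Note that LabelEncoder assigns codes in sorted order of the distinct values, so the codes carry no semantic ordering. A model that treats them as continuous inputs may need an embedding or one-hot step on top.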