While writing my dataset class, I find that I need to apply a sequence of transforms to the data I’m reading from a Kaggle dataset: for example, resize, align, normalize, and conversion to a PyTorch tensor.
Should I use the steps library or transforms.Compose, as described here:
In the above tutorial, each class has an __init__() and a __call__() method so that it can be used as a transformer.
The steps BaseTransformer, however, has a different API: __init__(), fit(), and transform().
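For concreteness, the two APIs differ roughly like this (a minimal sketch with toy numeric "transforms"; only the method shapes reflect the two libraries, the class names and bodies are made up):

```python
# torchvision-tutorial style: configure in __init__, apply via __call__.
class Rescale:
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        # stand-in for real image work: scale a number instead of an image
        return value * self.factor


# steps-style: separate fit() (learn state from data) and transform() (apply).
class RescaleTransformer:
    def __init__(self, factor):
        self.factor = factor

    def fit(self, values):
        return self  # stateless here; a steps transformer may learn from data

    def transform(self, values):
        return [v * self.factor for v in values]
```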
If I had a class like the one below, with transform and transform_mask parameters in the constructor, should I use transform classes with the simple __call__ API, or should I try to use the steps library?
I understand that the steps library is meant to be used outside the dataset, as part of a pipeline. But this is a special case where it doesn’t make much sense to read the individual items unprocessed from the dataset, so I need to pre-process the images before returning them to my DataLoader.
What would you recommend that I do? I don’t want to end up writing two sets of composable transformers.
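One option I’ve been considering, to avoid writing everything twice, is a small adapter that wraps a fit/transform-style object as a plain callable so it can sit inside transforms.Compose (entirely hypothetical, not part of either library):

```python
class TransformerAdapter:
    """Wrap a steps-style (fit/transform) object as a plain callable,
    so it can be sequenced alongside __call__-style transforms."""

    def __init__(self, transformer, fitted=True):
        # `transformer` is assumed to expose fit() and transform();
        # fitted=False means fit() is called on the first sample seen.
        self.transformer = transformer
        self.fitted = fitted

    def __call__(self, sample):
        if not self.fitted:
            self.transformer.fit(sample)
            self.fitted = True
        return self.transformer.transform(sample)
```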
class DSTLSIFDDataset(Dataset):
    """
    Kaggle DSTL Satellite Imagery Dataset class.

    The download parameters are stored in an experiment/*_model/params.yaml
    file. You can also specify a set of transformers for the image and the
    mask. For the image, we will use transforms.Compose to sequence image
    resize and image alignment operations. Finally, the actual image loader,
    mask generator, and reflectance index calculator functions are assigned
    in the class constructor. This gives some flexibility for re-assignment
    using different implementations without rewriting large portions of this
    code.
    """

    def __init__(self, dataset_params, transform=None, transform_mask=None,
                 download=True):
        super(DSTLSIFDDataset, self).__init__()