Description for datasets and dataloaders - Open Solution Mapping Challenge


#1

Hi @jakub_czakon,

Could you please provide a very short, one-line description of what each of the following datasets and dataloaders does?
https://github.com/minerva-ml/open-solution-mapping-challenge/blob/master/loaders.py

class MetadataImageSegmentationDataset(Dataset)
class MetadataImageSegmentationDatasetDistances(Dataset)
class ImageSegmentationDataset(Dataset)

class MetadataImageSegmentationMultitaskDataset(Dataset)
class ImageSegmentationMultitaskDataset(Dataset)

class ImageSegmentationLoaderPatchingTrain(ImageSegmentationLoaderBasic)
class ImageSegmentationLoaderPatchingInference(ImageSegmentationLoaderBasic)

class MetadataImageSegmentationLoader(ImageSegmentationLoaderBasic)
class MetadataImageSegmentationLoaderDistances(ImageSegmentationLoaderBasic)

class MetadataImageSegmentationMultitaskLoader(ImageSegmentationLoaderBasic)

class ImageSegmentationLoader(ImageSegmentationLoaderBasic)

class ImageSegmentationMultitaskLoader(ImageSegmentationLoaderBasic)
class ImageSegmentationMultitaskLoaderPatchingTrain(ImageSegmentationLoaderPatchingTrain)
class ImageSegmentationMultitaskLoaderPatchingInference(ImageSegmentationLoaderPatchingInference)

class PatchCombiner(BaseTransformer)

For example:

  1. What is the difference between MetadataImageSegmentationDataset and MetadataImageSegmentationMultitaskDataset?

  2. What is the difference between MetadataImageSegmentationLoader and MetadataImageSegmentationLoaderDistances?

  3. What does ImageSegmentationMultitaskLoader do?

  4. What does PatchCombiner do?

Regards,

Elvis Dowson


#2

Hi, sorry for the late reply. I thought I would be able to find time to write proper docstrings and point you to them.
As I don’t want to keep you waiting any longer, I figured I’d drop a line here:

  1. Multitask stands for a network with multiple outputs (a fork architecture). In the case of this challenge (more so DSB 2018), it means separate output heads for object maps and object contours.
    For example, the list of targets contains the mask and the boundary for DSB.

  2. Distances stands for the distance map to the 2 closest objects. Later, the cross-entropy loss is weighted with those distance weights, penalizing errors that are close to object edges more heavily. A very interesting concept in its own right if you think about it.

  3. MultitaskLoader is just your typical PyTorch loader that wraps MultitaskDataset.
    Remember, loader and dataset in PyTorch are 2 strongly coupled concepts.

  4. PatchCombiner was built for a problem where we slide a window over a padded image and predict on the overlapping image patches. This object takes those patches and combines them together. It also deals with some basic TTA (test-time augmentation). I don’t like the implementation of this particular object and I do not advise using it.
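
To make points 1 and 3 a bit more concrete, here is a minimal sketch of what a multitask item could look like: one image plus a list of targets, one per output head. This is not the code from loaders.py. `ToyMultitaskDataset` and `contour_from_mask` are hypothetical names, and deriving the contour as the mask's inner boundary is an assumption made only for illustration.

```python
import numpy as np


def contour_from_mask(mask):
    """Return the 1-pixel inner boundary of a binary mask (4-connectivity)."""
    mask = mask.astype(bool)
    # Pad with background so pixels on the image border count as boundary.
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    # A pixel is interior if it and its four neighbours are all foreground.
    interior = (
        padded[1:-1, 1:-1]
        & padded[:-2, 1:-1] & padded[2:, 1:-1]
        & padded[1:-1, :-2] & padded[1:-1, 2:]
    )
    return mask & ~interior


class ToyMultitaskDataset:
    """Hypothetical map-style dataset: each item is (image, [mask, contour])."""

    def __init__(self, images, masks):
        self.images = images
        self.masks = masks

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        mask = self.masks[index].astype(bool)
        # One target per output head: object map first, contour second.
        targets = [mask, contour_from_mask(mask)]
        return self.images[index], targets


image = np.zeros((6, 6), dtype=np.float32)
mask = np.zeros((6, 6), dtype=bool)
mask[1:5, 1:5] = True  # a single 4x4 square object

dataset = ToyMultitaskDataset([image], [mask])
_, (object_map, contour) = dataset[0]
print(int(object_map.sum()), int(contour.sum()))
```

The real classes subclass PyTorch's Dataset and are batched by the matching loader; plain numpy is used here only to keep the sketch short.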
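
For point 2, the idea can be sketched roughly as follows. The formula below is the weight map from the U-Net paper, w = 1 + w0 * exp(-(d1 + d2)^2 / (2 * sigma^2)), where d1 and d2 are the distances to the nearest and second-nearest object. Treat the formula, the constants `w0` and `sigma`, and all function names as assumptions; the repository may compute its weights differently.

```python
import numpy as np


def distance_to_object(object_mask):
    """Euclidean distance from every pixel to the nearest pixel of one object.

    Brute force, which is fine for a toy image; a real pipeline would use a
    proper distance transform instead.
    """
    height, width = object_mask.shape
    rows, cols = np.mgrid[0:height, 0:width]
    pixels = np.stack([rows.ravel(), cols.ravel()], axis=1)   # (H*W, 2)
    object_pixels = np.argwhere(object_mask)                  # (K, 2)
    diffs = pixels[:, None, :] - object_pixels[None, :, :]    # (H*W, K, 2)
    distances = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    return distances.reshape(height, width)


def distance_weights(object_masks, w0=10.0, sigma=5.0):
    """Weight map that is largest where two objects are close together.

    Needs at least two object masks, since it uses the two nearest objects.
    """
    per_object = np.stack([distance_to_object(m) for m in object_masks])
    per_object.sort(axis=0)  # per pixel: nearest object first
    d1, d2 = per_object[0], per_object[1]
    return 1.0 + w0 * np.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))


def weighted_binary_cross_entropy(probabilities, target, weights, eps=1e-7):
    """Per-pixel binary cross entropy, averaged with the given weights."""
    probabilities = np.clip(probabilities, eps, 1.0 - eps)
    per_pixel = -(target * np.log(probabilities)
                  + (1.0 - target) * np.log(1.0 - probabilities))
    return float((weights * per_pixel).sum() / weights.sum())


# Two objects separated by a 3-pixel-wide gap (columns 3 to 5).
left_object = np.zeros((5, 9), dtype=bool)
left_object[:, 0:3] = True
right_object = np.zeros((5, 9), dtype=bool)
right_object[:, 6:9] = True

weights = distance_weights([left_object, right_object])
target = (left_object | right_object).astype(np.float64)

# Same-sized mistake in two places: inside the gap vs. far from the gap.
gap_error = target.copy()
gap_error[2, 4] = 0.9    # background pixel in the gap predicted as object
far_error = target.copy()
far_error[2, 0] = 0.1    # object pixel far from the gap predicted as background

loss_gap = weighted_binary_cross_entropy(gap_error, target, weights)
loss_far = weighted_binary_cross_entropy(far_error, target, weights)
print(loss_gap > loss_far)
```

The two mistakes have the same unweighted cross entropy, so any difference between the two losses comes purely from the weight map.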
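
For point 4, here is a small sketch of the general sliding-window idea, not of PatchCombiner itself. It assumes overlapping predictions are combined by simple averaging, skips the padding and TTA steps, and uses hypothetical function names throughout.

```python
import numpy as np


def window_starts(length, patch_size, stride):
    """Start offsets along one axis; a final window is added to reach the edge.

    Assumes length >= patch_size (in a real pipeline, padding ensures this).
    """
    starts = list(range(0, length - patch_size + 1, stride))
    if starts[-1] != length - patch_size:
        starts.append(length - patch_size)
    return starts


def extract_patches(image, patch_size, stride):
    """Slide a square window over a 2D image; return patches and their corners."""
    corners = [(top, left)
               for top in window_starts(image.shape[0], patch_size, stride)
               for left in window_starts(image.shape[1], patch_size, stride)]
    patches = [image[top:top + patch_size, left:left + patch_size]
               for top, left in corners]
    return patches, corners


def combine_patches(patches, corners, output_shape):
    """Average overlapping patch predictions back into one full-size map."""
    total = np.zeros(output_shape, dtype=np.float64)
    counts = np.zeros(output_shape, dtype=np.float64)
    for patch, (top, left) in zip(patches, corners):
        height, width = patch.shape
        total[top:top + height, left:left + width] += patch
        counts[top:top + height, left:left + width] += 1.0
    return total / counts


image = np.arange(7 * 9, dtype=np.float64).reshape(7, 9)
patches, corners = extract_patches(image, patch_size=4, stride=2)
predictions = [patch * 2.0 for patch in patches]  # stand-in for a model
combined = combine_patches(predictions, corners, image.shape)
print(len(patches), np.allclose(combined, image * 2.0))
```

Keeping a separate `counts` array means pixels covered by several windows are averaged rather than summed, so the stitched result does not depend on how many patches happen to overlap at a given pixel.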

Best,
Jakub