ResourceExhaustedError when training large conv nets


#1

Hi,

I get a ResourceExhaustedError when training large pre-trained networks in Keras (DenseNet201).
Any idea why?
Also, a possible solution would be helpful.

Thanks!


#2

Please try running your experiment on gcp-gpu-large.
We think your problem is caused by running out of GPU memory.

neptune send --worker gcp-gpu-large

#3

Hi, I face the same issue when running on gcp-gpu-large as well. Any suggestions?


#4

What batch size are you using?
Using a smaller batch and/or downsizing the images usually helps.

What is your use case btw?
Are you using it as a feature extractor, fine-tuning it, or just running inference?
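To see why a smaller batch and smaller images help, here is some back-of-the-envelope arithmetic (the layer count and channel width below are made-up, illustrative numbers): activation memory grows linearly with both batch size and image area, so shrinking either directly shrinks what the forward pass needs to keep on the GPU.

```python
# Rough sketch (hypothetical layer sizes) of activation memory scaling.
def activation_mb(batch_size, side, channels=64, layers=20, bytes_per_val=4):
    """Rough activation-memory estimate in MB for square images."""
    values = batch_size * side * side * channels * layers
    return values * bytes_per_val / 1024 ** 2

full = activation_mb(batch_size=32, side=299)  # 299 is Xception's default input size
half = activation_mb(batch_size=16, side=224)  # smaller batch + smaller images
print(round(full), round(half))  # the second configuration needs a fraction of the memory
```

Halving the batch halves the estimate, and going from 299x299 to 224x224 cuts it roughly in half again.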


#5

Batch size = 32

I face this issue when I do aggressive data augmentation with a fully trainable Xception.

I’m using it to build a recognition model.


#6

I see the following options:

Your augmentation may be multiplying the number of images that you actually pass to the net.
Are you using Keras’ ImageDataGenerator?

I presume that you are using Xception without the top layer.
How large are your images? Can you resize them to something smaller?
For recognition that should be fine, because you don’t need exact locations or finely segmented images.
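A minimal sketch of both points above, with dummy arrays standing in for the real images (all shapes and generator settings here are illustrative assumptions):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Dummy data standing in for the real images (shapes are illustrative).
x = np.random.rand(8, 128, 128, 3).astype("float32")
y = np.random.randint(0, 2, size=(8,))

# ImageDataGenerator augments each batch on the fly as it is drawn; it does
# not hold extra transformed copies of the dataset in memory, so batch size
# and image size remain the main memory knobs.
datagen = ImageDataGenerator(rotation_range=20, horizontal_flip=True)

batch_x, batch_y = next(datagen.flow(x, y, batch_size=4))
print(batch_x.shape)  # (4, 128, 128, 3)
```

If you load images from disk with flow_from_directory, passing a smaller target_size (e.g. target_size=(150, 150)) downsizes them as they are read.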

I’ve read that there are some problems with the Xception implementation in Keras, and I haven’t really used it myself. Can you simply go with ResNet or Inception instead? If you are planning on using it in production, I would suggest trying MobileNet, which is extremely light and produces very good results.
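For reference, a headless MobileNet base can be set up like this (the input size is an assumption, and weights=None is used here only to avoid downloading the ImageNet weights; in practice you would likely pass weights="imagenet"):

```python
from tensorflow.keras.applications import MobileNet

# Illustrative sketch: MobileNet without the classification top, with global
# average pooling, so it outputs one feature vector per image.
base = MobileNet(include_top=False, weights=None,
                 input_shape=(128, 128, 3), pooling="avg")
print(base.output_shape)  # (None, 1024)
```

You can then stack your own small classifier head on top of the 1024-dim feature vector.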


#7

Oh, and one more thing.
Have you tried running it with an extremely small batch, like 1 or 2?
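One way to test this systematically is a hypothetical probe like the one below: try a short fit at each candidate batch size and catch TensorFlow's out-of-memory error (the tiny model and random data are stand-ins just to make the sketch runnable):

```python
import numpy as np
import tensorflow as tf

def largest_fitting_batch(model, x, y, candidates=(1, 2, 4, 8, 16, 32)):
    """Return the largest candidate batch size that trains without OOM."""
    best = None
    for bs in candidates:
        try:
            model.fit(x, y, batch_size=bs, epochs=1, verbose=0)
            best = bs
        except tf.errors.ResourceExhaustedError:
            break
    return best

# Tiny stand-in model and data (assumptions, not the actual Xception setup).
model = tf.keras.Sequential([tf.keras.Input(shape=(8,)),
                             tf.keras.layers.Dense(4),
                             tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")
x = np.random.rand(64, 8).astype("float32")
y = np.random.rand(64, 1).astype("float32")
print(largest_fitting_batch(model, x, y))
```

If even a batch of 1 or 2 fails with the full Xception, the model plus gradients simply don't fit on that GPU, and a smaller architecture or smaller inputs are the way out.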


#8

Hi, I got around the issue by switching to InceptionResNetV2. Yes, I was using Keras’ ImageDataGenerator. No, I haven’t tried a batch size of 1 or 2 yet, but I will give it a shot.

Thank you for your response!