GPU very slow gcp-gpu-medium


I am using gcp-gpu-medium to train my model, but it seems slower than my 8-core desktop CPU. Why is that?


Hi @xxdaggerxx,

It’s hard to say what is causing this issue right now. We will take a closer look on Monday.


Hi @xxdaggerxx,

It looks like your experiment hasn’t utilized any GPU. Could you share your code with us so we can investigate this problem further?

#%% Import packages
## pandas is for data set manipulation, NumPy is for advanced math, matplotlib is for plotting functions.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#import Utilities
#Import AI frame work
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.recurrent import LSTM, GRU
from keras.layers import Convolution1D, MaxPooling1D, AtrousConvolution1D, RepeatVector
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, CSVLogger
from keras.layers.wrappers import Bidirectional
from keras import regularizers
from keras.layers.normalization import BatchNormalization
from keras.layers.advanced_activations import *
from keras.optimizers import RMSprop, Adam, SGD, Nadam
from keras.initializers import *
from keras.utils import np_utils
from keras.preprocessing.text import Tokenizer
from pandas import DataFrame, Series

from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Embedding
from keras.layers import TimeDistributed
from keras.layers import Bidirectional

from numpy import array

from scipy.stats.mstats import zscore

from scipy.interpolate import interp1d
from deepsense import neptune

context = neptune.Context()

#%% Set up functions.
## Extract a specific column.
def extractColum(columname):
    target_price = data_original_import[columname].to_frame().reset_index(drop=True)
    return target_price

# convert series to supervised learning
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	n_vars = 1 if type(data) is list else data.shape[1]
	df = pd.DataFrame(data)
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = pd.concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg

# convert series to supervised learning
def series_to_supervised_2(data, datashape,n_in=1, n_out=1, dropnan=True):
	n_vars = 1 if type(data) is list else datashape
	df = pd.DataFrame(data)
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = pd.concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg

def split_seq(seq, size):
        newseq = []
        splitsize = 1.0/size*len(seq)
        for i in range(size):
                newseq.append(seq[int(round(i*splitsize)):int(round((i+1)*splitsize))])
        return newseq
#%split classes
def splitClasses(returns, splits):
    ##sort retruns by decending
    sortedReturns = np.sort(returns)[::-1]
    ##split sorted array into equal parts
    splitReturns = np.array_split(sortedReturns,splits)
    newclasses = np.array(returns)
    returns = np.array(returns)
    for v in range(splits):
        print(str(v)+'  '+ str(splitReturns[v][0]))
        mask = (newclasses <= splitReturns[v][0])
        np.putmask(returns, mask, v)  # function name lost in the paste; np.putmask is assumed
    return returns

def normalize_Data(DataIn, period):
    #DataIn = volume
    #period = 20
    if DataIn.ndim > 1:
        numFeatures = DataIn.shape[1]
    else:
        numFeatures = 1
    normalization_period = period
    reframed_norm = series_to_supervised_2(DataIn,numFeatures, normalization_period, 0)
    data_values = np.array(reframed_norm.values)
    data_values = zscore(data_values, axis=1)
    prossesed_Data = data_values[:,data_values.shape[1]-numFeatures:data_values.shape[1]]
    fillerData = pd.DataFrame(np.zeros((normalization_period-1,numFeatures)))
    prossesed_Data = np.vstack((fillerData,prossesed_Data))
    ##cap ends
    fillerData = pd.DataFrame(np.zeros((1,numFeatures)))
    prossesed_Data = np.vstack((prossesed_Data,fillerData))

    return prossesed_Data

def rescale(values,old_min, old_max, new_min = 0, new_max = 100):
    output = []
    #old_min, old_max = min(values), max(values)

    for v in values:
        new_v = (new_max - new_min) / (old_max - old_min) * (v - old_min) + new_min
        output.append(new_v)

    return np.array(output)

#%% Start code. Import original data and parse it.
##time frame from MT5 meta quotes GMT +1
##data_original_import = pd.read_csv('C:\\Users\\Aaron\\AppData\\Roaming\\MetaQuotes\\Terminal\\FB9A56D617EDDDFE29EE54EBEFFE96C1\\MQL5\\Files\\TrainingData\\EURUSD_RAW_PRICE_1HR.csv') 
data_original_import = pd.read_csv('/input/EURUSD_RAW_PRICE_1MIN_ZZ.csv') 
##flip the data frame and reset the index.
data_original_import = data_original_import[::-1].reset_index(drop=True)
## Add time index.
timep = pd.DataFrame(np.linspace(1, len(data_original_import), num=len(data_original_import)), columns=['TimeNumber'])
data_original_import = pd.concat([data_original_import, timep], axis=1)

##crop data
data_original_import = data_original_import.loc[0:120000,:]

##Trading hours is 9 hours to 18, we will remove all others.
#data_original_import = data_original_import[data_original_import.hours >= 7]
#data_original_import = data_original_import[data_original_import.hours <= 17]

# Create CLassfication Ouputs.
features = 5
lookBackData = 160
PredictionHorizon = 0
NumOutputs = 4
normalization_period = 20

price_data = data_original_import.drop(data_original_import.columns[[4,5,6,7,8,9,10,11,12,13,14,15,16,17]], axis=1, inplace=False)
##normalize data
price_data_n = np.array(normalize_Data(price_data, normalization_period))
volume_n = rescale(data_original_import['Volume'],0,200, new_min = 0, new_max = 1)
#ATR_n = rescale(data_original_import['ATR'],0,0.001, new_min = 0, new_max = 1)
#CCI_n = rescale(data_original_import['CCI'],0,100, new_min = 0, new_max = 1)
#Momentum_n = rescale(data_original_import['Momentum'],99.5,100, new_min = 0, new_max = 1)
#Stoch1_n = rescale(data_original_import['Stoch1'],0,100, new_min = 0, new_max = 1)
#Stoch2_n = rescale(data_original_import['Stoch2'],0,100, new_min = 0, new_max = 1)
#force_n = data_original_import['Force']*1000 
#std_n = normalize_Data(data_original_import['Sd'], normalization_period)

##combine normalized data
combined_data_n = np.hstack((price_data_n,np.array([volume_n]).T))
#combined_data_n = np.hstack((combined_data_n,np.array([ATR_n]).T))
#combined_data_n = np.hstack((combined_data_n,np.array([CCI_n]).T))
#combined_data_n = np.hstack((combined_data_n,np.array([Momentum_n]).T))
#combined_data_n = np.hstack((combined_data_n,np.array([Stoch1_n]).T))
#combined_data_n = np.hstack((combined_data_n,np.array([Stoch2_n]).T))
#combined_data_n = np.hstack((combined_data_n,np.array([force_n]).T))
#combined_data_n = np.hstack((combined_data_n,np.array([force_n]).T))

reframed_Data= series_to_supervised(combined_data_n, lookBackData, PredictionHorizon)

values = np.array(reframed_Data.values)
#n_train_hours = lookBackData * features

##seprate the inputs from the outputs.
#inputs_raw = values[:, :n_train_hours]
##normalize inputs
#inputs_raw = zscore(values, axis=1)
inputs_raw = values
#outputs_raw = values[:, n_train_hours:]

#%% Get Zig Zag Values and sort it.
zigzag = data_original_import['ZigZag']
#zigzag = zigzag[0:len(inputs_raw)]

zigzag_indx = np.where( zigzag != 0 )[0]
zigzag_v = np.tile(np.array([0,1]), int(len(zigzag_indx)/2))
#zigzag_v = np.append(zigzag_v,1)

zigzag_combined = np.flip(np.rot90(np.vstack((zigzag_indx,zigzag_v))), axis=0)
##cap ends
zigzag_combined = np.vstack((np.array([0,1]),zigzag_combined))
zigzag_combined = np.vstack((zigzag_combined,np.array([len(zigzag)-1,1])))

##Create interpolation function.
lines = interp1d(array(zigzag_combined)[:,0], array(zigzag_combined)[:,1], kind='linear'  )
###create new linespace
xnew = np.linspace(0, max(array(zigzag_combined)[:,0]), num=max(array(zigzag_combined)[:,0])+1)
###create new indcator.
indicator = lines(xnew)


#Crop data to fit inputs.
Prediction_Values =  indicator[lookBackData-1:]
Prediction_Values =  Prediction_Values[0:len(Prediction_Values)-1]

#%% Debugging

##Check Zigzag data and close prices.
start_time = 5000
time_range = 100
get_price = np.array(reframed_Data['var2(t-1)'])
#get_price = np.array(price_data['High'])

indicator_cropped = Prediction_Values[start_time:(start_time+time_range)]
price_cropped = get_price[start_time:(start_time+time_range)]

trade_index_up = np.where( indicator_cropped == 1 )[0]
price_index_up = price_cropped[trade_index_up]

trade_index_dwn = np.where( indicator_cropped == 0 )[0]
price_index_dwn = price_cropped[trade_index_dwn]

plt.grid(color='r', linestyle='--', linewidth=0.5)

plt.grid(color='r', linestyle='--', linewidth=0.5)
plt.scatter(trade_index_up, price_index_up)
plt.scatter(trade_index_dwn, price_index_dwn)

#%% Split training and test sets

testSamplesNum = int(len(Prediction_Values)*0.95)

train_input_data = inputs_raw[:testSamplesNum, :]
train_ouput_data = Prediction_Values[:testSamplesNum]

test_input_data = inputs_raw[testSamplesNum:, :]
test_output_data = Prediction_Values[testSamplesNum:]

##Prepare data for neural network.
##convert to 3-dimensional array
train_input_data_NN = train_input_data.reshape((train_input_data.shape[0], lookBackData, features))
outputs_NN = train_ouput_data

##create test data
test_input_data_NN = test_input_data.reshape((test_input_data.shape[0], lookBackData, features))
outputs_NN_test = test_output_data

#%% Create model

# create the model
model = Sequential()

model.add(Dropout(0.3, input_shape=(train_input_data_NN.shape[1], train_input_data_NN.shape[2])))

model.add(Bidirectional(LSTM(50, dropout=0.3, recurrent_dropout=0.3, return_sequences=True)))

model.add(Bidirectional(LSTM(10, dropout=0.3, recurrent_dropout=0.3, return_sequences=False)))

#model.add(LSTM(20, dropout=0.2, return_sequences=False, recurrent_dropout=0.2))

#model.add(LSTM(20, dropout=0.2, return_sequences=False, recurrent_dropout=0.2))

#model.add(Bidirectional(LSTM(20, dropout=0.4, recurrent_dropout=0.4)))

#model.add(Bidirectional(LSTM(20, dropout=0.4, recurrent_dropout=0.4)))
#model.add(Dense(1, activation='sigmoid'))

optimizer = Adam(lr=0.001)
model.compile(loss='mae', optimizer='SGD')

history = model.fit(train_input_data_NN, outputs_NN, epochs=150, batch_size=128, verbose=1, validation_data=(test_input_data_NN, outputs_NN_test))

# plot history
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')

#%% predict
predictions = model.predict(test_input_data_NN)
#scores = model.evaluate(test_input_data_NN, outputs_NN_test, verbose=0)
#print("Accuracy: %.2f%%" % (scores[1]*100))
#%% plot

currentPrices = np.array(reframed_Data['var1(t-1)'])
current_close_testSample = currentPrices[testSamplesNum:]

st = 5100
en = st+100

p_frame = predictions[st:en] 
o_frame = outputs_NN_test[st:en]
price_frame = current_close_testSample[st:en]

trade_index_up = np.where( o_frame == 1 )[0]
price_index_up = price_frame[trade_index_up]

trade_index_dwn = np.where( o_frame == 0 )[0]
price_index_dwn = price_frame[trade_index_dwn]

plt.plot(p_frame , label='Predictions')
plt.plot(o_frame , label='True Data')
plt.grid(color='r', linestyle='--', linewidth=0.5)

plt.grid(color='r', linestyle='--', linewidth=0.5)

plt.scatter(trade_index_up, price_index_up)
plt.scatter(trade_index_dwn, price_index_dwn)

# plot history
#plt.plot(history.history['loss'], label='train')
##pyplot.plot(history.history['val_loss'], label='test')
plt.title("True Data")
#%%Save model
# serialize model to JSON
model_json = model.to_json()
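with open("model.json", "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
model.save_weights("model.h5")  # weights file name assumed; the original call is missing from the paste
print("Saved model to disk")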


Sorry, it looks like I checked the wrong experiment earlier. This code utilizes the GPU properly. Could you give us the ID of the experiment you are having a problem with? How long does it take to run on your local 8-core machine?


(BP-12) Bo Exp 3.

It usually takes half the time. The GPUs don’t seem to give me much of a speed advantage.


Hi, looking at your code, it seems that you are training a shallow recurrent LSTM network in Keras.
Since the network is very small, you need to load large batches in order to use the GPU efficiently.
So I suggest that you change batch_size=128 to batch_size=2048 in the model.fit call. Remember to adjust the learning rate accordingly.
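
As a rough sketch of why the larger batch helps: each batch is one round trip to the GPU, so fewer, larger batches mean less per-step overhead. The snippet below counts steps per epoch for both batch sizes and applies the linear scaling heuristic to the learning rate. The sample count of 113,000 is a hypothetical figure for illustration, not measured from the experiment, and linear scaling is only one common rule of thumb, so treat the result as a starting point to tune.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of weight updates (and GPU round trips) in one epoch."""
    return math.ceil(num_samples / batch_size)


def scaled_learning_rate(base_lr, base_batch, new_batch):
    """Linear scaling heuristic: grow the learning rate with the batch size."""
    return base_lr * new_batch / base_batch


# Hypothetical number of training windows, for illustration only.
num_samples = 113_000

for batch_size in (128, 2048):
    print(batch_size, steps_per_epoch(num_samples, batch_size))
# prints "128 883" then "2048 56"

# 0.001 is the learning rate passed to Adam in the script above.
print(scaled_learning_rate(0.001, 128, 2048))
```

With these assumed numbers the GPU is invoked roughly 16 times less often per epoch, which is where most of the speed-up for a small network comes from.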

On a separate note, the standard LSTM implementation is notoriously slow, and a few months ago a new, faster implementation was released under the name CuDNNLSTM. It is a lot faster (~10x), but its API is slightly less developed: neither dropout nor recurrent_dropout is available. If you really need them, you should probably stay with LSTM; if not, switch to CuDNNLSTM.

If you do decide to go with CuDNNLSTM on a larger batch_size, regularization may be less of an issue. You could also try changing Dropout to SpatialDropout1D, which makes more sense in this setting, and experiment with CuDNNGRU, which is less complex and hence less prone to overfitting.
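
A minimal sketch of what that swap could look like, assuming the standalone Keras 2.x used in this thread and a CUDA-capable GPU (CuDNNLSTM does not run on CPU). The function name build_cudnn_model, the final Dense(1) layer, and the 'adam' optimizer are illustrative assumptions, not taken from the original script. In TensorFlow 2.x the separate CuDNN layers no longer exist, and the regular LSTM layer picks the cuDNN kernel on its own when recurrent_dropout is 0 and the default activations are used.

```python
def build_cudnn_model(timesteps, features):
    """Build a two-layer bidirectional CuDNNLSTM model (requires a CUDA GPU)."""
    # Imported inside the function so the definition loads without Keras.
    from keras.models import Sequential
    from keras.layers import Bidirectional, CuDNNLSTM, Dense, SpatialDropout1D

    model = Sequential()
    # SpatialDropout1D drops whole feature channels across all time steps.
    model.add(SpatialDropout1D(0.3, input_shape=(timesteps, features)))
    # CuDNNLSTM takes no dropout or recurrent_dropout arguments.
    model.add(Bidirectional(CuDNNLSTM(50, return_sequences=True)))
    model.add(Bidirectional(CuDNNLSTM(10, return_sequences=False)))
    # Assumed output layer: one value in [0, 1] per input window.
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mae', optimizer='adam')
    return model
```

With the script above this would be called as build_cudnn_model(lookBackData, features) and trained with batch_size=2048. Replacing CuDNNLSTM with CuDNNGRU in the import and in both layers gives the GRU variant.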

We have implemented a lot of this for the Kaggle toxic comment classification challenge and open-sourced it here. Feel free to use it however you like.


Hi Jakub, your suggestion works. I increased the batch size to 2048 and the epoch time went down from 20 minutes to 1 minute.

Thanks very much.