Could I use tf.session() in enviroment where keras only used? - tensorflow

Thanks for reading my question.
I was using keras to develop my reinforcement learning agent based on keres-rl. But I want to upgrade my agent so that I get some update from open ai base line code for better action exploration. But the code used tensorflow only. It is my first time to use tensorflow. I am so confused. I build keras deep learninng model using its "Model API". I have never concerned about inside of model. But the code I referenced was full of the code that kick in inside of deep learning model and give some change to its weight and get immediate layer output using tf.Session(). The framework is so flexible. Like below, using tf.Session() the tensor, which is recognized tensor and is not callable, can get result feeding feed_dict data. In keras, it is impossible as far as I know.
Once I allow using tf.Session(), my architecture will be complex and nobody wants to understand and use it except that I can adapt reference code more easily.
On the other side, if I don't allow that, I needs to break down my existing model and use tons of K.function to get middle layer's output or something that I can't get from keras model.
import numpy as np
from keras.layers import Dense, Input, BatchNormalization
from keras.models import Model
import tensorflow as tf
import keras.backend as K
import rl2.tf_util as U
def normalize(x, stats):
if stats is None:
return x
return (x - stats.mean) / (stats.std + 1e-8)
class RunningMeanStd(object):
# https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
def __init__(self, my, epsilon=1e-2, shape=()):
self._sum = K.variable(value=np.zeros(shape), dtype=tf.float32, name=my+"_runningsum")
self._sumsq = K.variable(value=np.zeros(shape) + epsilon, dtype=tf.float32, name=my+"_runningsumsq")
self._count = K.variable(value=np.zeros(()) + epsilon, dtype=tf.float32, name=my+"_count")
self.mean = self._sum / self._count
self.std = K.sqrt(K.maximum((self._sumsq / self._count) - K.square(self.mean), epsilon))
newsum = K.variable(value=np.zeros(shape), dtype=tf.float32, name=my+'_sum')
newsumsq = K.variable(value=np.zeros(shape), dtype=tf.float32, name=my+'_var')
newcount = K.variable(value=np.zeros(()), dtype=tf.float32, name=my+'_count')
self.incfiltparams = K.function([newsum, newsumsq, newcount], [],
updates=[K.update_add(self._sum, newsum),
K.update(self._sumsq, newsumsq),
K.update(self._count, newcount)])
def update(self, x):
x = x.astype('float64')
n = int(np.prod(self.shape))
totalvec = np.zeros(n*2+1, 'float64')
addvec = np.concatenate([x.sum(axis=0).ravel(), np.square(x).sum(axis=0).ravel(), np.array([len(x)],dtype='float64')])
self.incfiltparams(totalvec[0:n].reshape(self.shape),
totalvec[n:2*n].reshape(self.shape),
totalvec[2*n])
i = Input(shape=(1,))
# h = BatchNormalization()(i)
h = Dense(4, activation='relu', kernel_initializer='he_uniform')(i)
h = Dense(10, activation='relu', kernel_initializer='he_uniform')(h)
o = Dense(1, activation='linear', kernel_initializer='he_uniform')(h)
model = Model(i, o)
obs_rms = RunningMeanStd(my='obs', shape=(1,))
normalized_obs0 = K.clip(normalize(i, obs_rms), 0, 100)
tf2 = model(normalized_obs0)
# print(model.predict(np.asarray([2,2,2,2,2]).reshape(5,)))
# print(tf(np.asarray([2,2,2,2,2]).reshape(5,)))
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
print(sess.run([tf2], feed_dict={i : U.adjust_shape(i, [np.asarray([2,]).reshape(1,)])}))

Related

Loss function with derivative in TensorFlow 2

I am using TF2 (2.3.0) NN to approximate the function y which solves the ODE: y'+3y=0
I have defined cutsom loss class and function in which I am trying to differentiate the single output with respect to the single input so the equation holds, provided that y_true is zero:
from tensorflow.keras.losses import Loss
import tensorflow as tf
class CustomLossOde(Loss):
def __init__(self, x, model, name='ode_loss'):
super().__init__(name=name)
self.x = x
self.model = model
def call(self, y_true, y_pred):
with tf.GradientTape() as tape:
tape.watch(self.x)
y_p = self.model(self.x)
dy_dx = tape.gradient(y_p, self.x)
loss = tf.math.reduce_mean(tf.square(dy_dx + 3 * y_pred - y_true))
return loss
but running the following NN:
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras import Input
from custom_loss_ode import CustomLossOde
num_samples = 1024
x_train = 4 * (tf.random.uniform((num_samples, )) - 0.5)
y_train = tf.zeros((num_samples, ))
inputs = Input(shape=(1,))
x = Dense(16, 'tanh')(inputs)
x = Dense(8, 'tanh')(x)
x = Dense(4)(x)
y = Dense(1)(x)
model = Model(inputs=inputs, outputs=y)
loss = CustomLossOde(model.input, model)
model.compile(optimizer=Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.99),loss=loss)
model.run_eagerly = True
model.fit(x_train, y_train, batch_size=16, epochs=30)
for now I am getting 0 loss from the fisrt epoch, which doesn't make any sense.
I have printed both y_true and y_test from within the function and they seem OK so I suspect that the problem is in the gradien which I didn't succeed to print.
Apprecitate any help
Defining a custom loss with the high level Keras API is a bit difficult in that case. I would instead write the training loop from scracth, as it allows a finer grained control over what you can do.
I took inspiration from those two guides :
Advanced Automatic Differentiation
Writing a training loop from scratch
Basically, I used the fact that multiple tape can interact seamlessly. I use one to compute the loss function, the other to calculate the gradients to be propagated by the optimizer.
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras import Input
num_samples = 1024
x_train = 4 * (tf.random.uniform((num_samples, )) - 0.5)
y_train = tf.zeros((num_samples, ))
inputs = Input(shape=(1,))
x = Dense(16, 'tanh')(inputs)
x = Dense(8, 'tanh')(x)
x = Dense(4)(x)
y = Dense(1)(x)
model = Model(inputs=inputs, outputs=y)
# using the high level tf.data API for data handling
x_train = tf.reshape(x_train,(-1,1))
dataset = tf.data.Dataset.from_tensor_slices((x_train,y_train)).batch(1)
opt = Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.99)
for step, (x,y_true) in enumerate(dataset):
# we need to convert x to a variable if we want the tape to be
# able to compute the gradient according to x
x_variable = tf.Variable(x)
with tf.GradientTape() as model_tape:
with tf.GradientTape() as loss_tape:
loss_tape.watch(x_variable)
y_pred = model(x_variable)
dy_dx = loss_tape.gradient(y_pred, x_variable)
loss = tf.math.reduce_mean(tf.square(dy_dx + 3 * y_pred - y_true))
grad = model_tape.gradient(loss, model.trainable_variables)
opt.apply_gradients(zip(grad, model.trainable_variables))
if step%20==0:
print(f"Step {step}: loss={loss.numpy()}")

How to solve "KeyError: '/conv2d_1/kernel:0'"

I am trying to use the colab to run the gym package with pacman, since the spec in colab is more powerful than my notebook. This program is successful simulate in Jupyter in my notebook, which using tensorflow 1.14. However, the errors keep appears when I put in google colab to simulate, and I also debug and change part of the code, so that the code can be used in tensor flow 2.0. Below is my code
#First we import all the necessary libraries
import numpy as np
import gym
import tensorflow as tf
from tensorflow import keras
from keras.layers import Flatten, Conv2D, Dense
#from tensorflow.contrib.layers import Flatten, conv2d, Dense
from collections import deque, Counter
import random
from datetime import datetime
#Now we define a function called preprocess_observation for preprocessing our input game screen.
#We reduce the image size and convert the image into greyscale.
color = np.array([210, 164, 74]).mean()
def preprocess_observation(obs):
# Crop and resize the image
img = obs[1:176:2, ::2]
# Convert the image to greyscale
img = img.mean(axis=2)
# Improve image contrast
img[img==color] = 0
# Next we normalize the image from -1 to +1
img = (img - 128) / 128-1
return img.reshape(88,80,1)
#Let us initialize our gym environment
env = gym.make('MsPacman-v0')
n_outputs = env.action_space.n
print(n_outputs)
print(env.env.get_action_meanings())
observation = env.reset()
import tensorflow as tf
import matplotlib.pyplot as plt
for i in range(22):
if i > 20:
plt.imshow(observation)
plt.show()
observation, _, _, _ = env.step(1)
#Okay, Now we define a function called q_network for building our Q network. We input the game state to the Q network
#and get the Q values for all the actions in that state.
#We build Q network with three convolutional layers with same padding followed by a fully connected layer.
tf.compat.v1.reset_default_graph()
def q_network(X, name_scope):
# Initialize layers
initializer = tf.compat.v1.keras.initializers.VarianceScaling(scale=2.0)
with tf.compat.v1.variable_scope(name_scope) as scope:
# initialize the convolutional layers
#layer_1 = tf.keras.layers.Conv2D(X, 32, kernel_size=(8,8), stride=4, padding='SAME', weights_initializer=initializer)
layer_1_set = Conv2D(32, (8,8), strides=4, padding="SAME", kernel_initializer=initializer)
layer_1= layer_1_set(X)
tf.compat.v1.summary.histogram('layer_1',layer_1)
#layer_2 = tf.keras.layers.Conv2D(layer_1, 64, kernel_size=(4,4), stride=2, padding='SAME', weights_initializer=initializer)
layer_2_set = Conv2D(64, (4,4), strides=2, padding="SAME", kernel_initializer=initializer)
layer_2= layer_2_set(layer_1)
tf.compat.v1.summary.histogram('layer_2',layer_2)
#layer_3 = tf.keras.layers.Conv2D(layer_2, 64, kernel_size=(3,3), stride=1, padding='SAME', weights_initializer=initializer)
layer_3_set = Conv2D(64, (3,3), strides=1, padding="SAME", kernel_initializer=initializer)
layer_3= layer_3_set(layer_2)
tf.compat.v1.summary.histogram('layer_3',layer_3)
flatten_layer = Flatten() # instantiate the layer
flat = flatten_layer(layer_3)
fc_set = Dense(128, kernel_initializer=initializer)
fc=fc_set(flat)
tf.compat.v1.summary.histogram('fc',fc)
#Add final output layer
output_set = Dense(n_outputs, activation= None, kernel_initializer=initializer)
output= output_set(fc)
tf.compat.v1.summary.histogram('output',output)
vars = {v.name[len(scope.name):]: v for v in tf.compat.v1.get_collection(key=tf.compat.v1.GraphKeys.TRAINABLE_VARIABLES, scope=scope.name)}
#Return both variables and outputs together
return vars, output
#Next we define a function called epsilon_greedy for performing epsilon greedy policy. In epsilon greedy policy we either select the best action
#with probability 1 - epsilon or a random action with probability epsilon.
#We use decaying epsilon greedy policy where value of epsilon will be decaying over time
#as we don't want to explore forever. So over time our policy will be exploiting only good actions.
epsilon = 0.5
eps_min = 0.05
eps_max = 1.0
eps_decay_steps = 500000
def epsilon_greedy(action, step):
p = np.random.random(1).squeeze()
epsilon = max(eps_min, eps_max - (eps_max-eps_min) * step/eps_decay_steps)
if np.random.rand() < epsilon:
return np.random.randint(n_outputs)
else:
return action
#Now, we initialize our experience replay buffer of length 20000 which holds the experience.
#We store all the agent's experience i.e (state, action, rewards) in the
#experience replay buffer and we sample from this minibatch of experience for training the network.
buffer_len = 20000
exp_buffer = deque(maxlen=buffer_len)
# Now we define our network hyperparameters,
num_episodes = 800
batch_size = 48
input_shape = (None, 88, 80, 1)
learning_rate = 0.001
X_shape = (None, 88, 80, 1)
discount_factor = 0.97
global_step = 0
copy_steps = 100
steps_train = 4
start_steps = 2000
logdir = 'logs'
tf.compat.v1.reset_default_graph()
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
# Now we define the placeholder for our input i.e game state
X = tf.placeholder(tf.float32, shape=X_shape)
#X = tf.Variable(tf.float32, tf.ones(shape=X_shape))
# we define a boolean called in_training_model to toggle the training
in_training_mode = tf.placeholder(tf.bool)
# we build our Q network, which takes the input X and generates Q values for all the actions in the state
mainQ, mainQ_outputs = q_network(X, 'mainQ')
# similarly we build our target Q network, for policy evaluation
targetQ, targetQ_outputs = q_network(X, 'targetQ')
# define the placeholder for our action values
X_action = tf.placeholder(tf.int32, shape=(None,))
Q_action = tf.reduce_sum(targetQ_outputs * tf.one_hot(X_action, n_outputs), axis=-1, keepdims=True)
#Copy the primary Q network parameters to the target Q network
copy_op = [tf.compat.v1.assign(main_name, targetQ[var_name]) for var_name, main_name in mainQ.items()]
copy_target_to_main = tf.group(*copy_op)
#Compute and optimize loss using gradient descent optimizer
# define a placeholder for our output i.e action
y = tf.placeholder(tf.float32, shape=(None,1))
# now we calculate the loss which is the difference between actual value and predicted value
loss = tf.reduce_mean(tf.square(y - Q_action))
# we use adam optimizer for minimizing the loss
optimizer = tf.train.AdamOptimizer(learning_rate)
training_op = optimizer.minimize(loss)
init = tf.global_variables_initializer()
loss_summary = tf.summary.scalar('LOSS', loss)
merge_summary = tf.summary.merge_all()
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())
Ok up to here, the error come out when i run this cell in colab :
#Copy the primary Q network parameters to the target Q network
copy_op = [tf.compat.v1.assign(main_name, targetQ[var_name]) for var_name, main_name in mainQ.items()]
copy_target_to_main = tf.group(*copy_op)
The error gives:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-13-58715282cea8> in <module>()
----> 1 copy_op = [tf.compat.v1.assign(main_name, targetQ[var_name]) for var_name, main_name in mainQ.items()]
2 copy_target_to_main = tf.group(*copy_op)
<ipython-input-13-58715282cea8> in <listcomp>(.0)
----> 1 copy_op = [tf.compat.v1.assign(main_name, targetQ[var_name]) for var_name, main_name in mainQ.items()]
2 copy_target_to_main = tf.group(*copy_op)
KeyError: '/conv2d_1/kernel:0'
I have two question?
First, how to solve the question that already stated above.
Second, in tensor-flow 2.0 above,the placeholder command is replaced by tf.Variable, i rewrite the code:
X = tf.placeholder(tf.float32, shape=X_shape) to become
X = tf.Variable(tf.float32, tf.ones(shape=X_shape))
and still get error, and i have to use command below:
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
X = tf.placeholder(tf.float32, shape=X_shape)
but get warning like this:
WARNING:tensorflow:From /usr/local/lib/python3.6/dist- packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating: non-resource variables are not supported in the long term
I doing intensive searching in the Stack overflow website by using keyword, yet i can't find solution. Really looking forward to any advise. Thank you very much.

Keras Loss Function with Additional Dynamic Parameter

I'm working on implementing prioritized experience replay for a deep-q network, and part of the specification is to multiply gradients by what's know as importance sampling (IS) weights. The gradient modification is discussed in section 3.4 of the following paper: https://arxiv.org/pdf/1511.05952.pdf I'm struggling with creating a custom loss function that takes in an array of IS weights in addition to y_true and y_pred.
Here's a simplified version of my model:
import numpy as np
import tensorflow as tf
# Input is RAM, each byte in the range of [0, 255].
in_obs = tf.keras.layers.Input(shape=(4,))
# Normalize the observation to the range of [0, 1].
norm = tf.keras.layers.Lambda(lambda x: x / 255.0)(in_obs)
# Hidden layers.
dense1 = tf.keras.layers.Dense(128, activation="relu")(norm)
dense2 = tf.keras.layers.Dense(128, activation="relu")(dense1)
dense3 = tf.keras.layers.Dense(128, activation="relu")(dense2)
dense4 = tf.keras.layers.Dense(128, activation="relu")(dense3)
# Output prediction, which is an action to take.
out_pred = tf.keras.layers.Dense(2, activation="linear")(dense4)
opt = tf.keras.optimizers.Adam(lr=5e-5)
network = tf.keras.models.Model(inputs=in_obs, outputs=out_pred)
network.compile(optimizer=opt, loss=huber_loss_mean_weighted)
Here's my custom loss function, which is just an implementation of Huber Loss multiplied by the IS weights:
'''
' Huber loss: https://en.wikipedia.org/wiki/Huber_loss
'''
def huber_loss(y_true, y_pred):
error = y_true - y_pred
cond = tf.keras.backend.abs(error) < 1.0
squared_loss = 0.5 * tf.keras.backend.square(error)
linear_loss = tf.keras.backend.abs(error) - 0.5
return tf.where(cond, squared_loss, linear_loss)
'''
' Importance Sampling weighted huber loss.
'''
def huber_loss_mean_weighted(y_true, y_pred, is_weights):
error = huber_loss(y_true, y_pred)
return tf.keras.backend.mean(error * is_weights)
The important bit is that is_weights is dynamic, i.e. it's different each time fit() is called. As such, I cannot simply close over is_weights as described here: Make a custom loss function in keras
I found this code online, which appears to use a Lambda layer to compute the loss: https://github.com/keras-team/keras/blob/master/examples/image_ocr.py#L475 It looks promising, but I'm struggling to understand it/adapt it to my particular problem. Any help is appreciated.
OK. Here is an example.
from keras.layers import Input, Dense, Conv2D, MaxPool2D, Flatten
from keras.models import Model
from keras.losses import categorical_crossentropy
def sample_loss( y_true, y_pred, is_weight ) :
return is_weight * categorical_crossentropy( y_true, y_pred )
x = Input(shape=(32,32,3), name='image_in')
y_true = Input( shape=(10,), name='y_true' )
is_weight = Input(shape=(1,), name='is_weight')
f = Conv2D(16,(3,3),padding='same')(x)
f = MaxPool2D((2,2),padding='same')(f)
f = Conv2D(32,(3,3),padding='same')(f)
f = MaxPool2D((2,2),padding='same')(f)
f = Conv2D(64,(3,3),padding='same')(f)
f = MaxPool2D((2,2),padding='same')(f)
f = Flatten()(f)
y_pred = Dense(10, activation='softmax', name='y_pred' )(f)
model = Model( inputs=[x, y_true, is_weight], outputs=y_pred, name='train_only' )
model.add_loss( sample_loss( y_true, y_pred, is_weight ) )
model.compile( loss=None, optimizer='sgd' )
print model.summary()
Note, since you've add loss through add_loss(), you don't have to do it through compile( loss=xxx ).
With regards to train a model, nothing is special except you move y_true to your input end. See below
import numpy as np
a = np.random.randn(8,32,32,3)
a_true = np.random.randn(8,10)
a_is_weight = np.random.randint(0,2,size=(8,1))
model.fit( [a, a_true, a_is_weight] )
Finally, you can make a testing model (which share all weights in model) for easier use, i.e.
test_model = Model( inputs=x, outputs=y_pred, name='test_only' )
a_pred = test_model.predict( a )

Convolutional neural network outputting equal probabilities for all labels

I am currently training a CNN on MNIST, and the output probabilities (softmax) are giving [0.1,0.1,...,0.1] as training goes on. The initial values aren't uniform, so I can't figure out if I'm doing something stupid here?
I'm only training for 15 steps, just to see how training progresses; even though that's a low number, I don't think that should result in uniform predictions?
import numpy as np
import tensorflow as tf
import imageio
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')
# Getting data
from sklearn.model_selection import train_test_split
def one_hot_encode(data):
new_ = []
for i in range(len(data)):
_ = np.zeros([10],dtype=np.float32)
_[int(data[i])] = 1.0
new_.append(np.asarray(_))
return new_
data = np.asarray(mnist["data"],dtype=np.float32)
labels = np.asarray(mnist["target"],dtype=np.float32)
labels = one_hot_encode(labels)
tr_data,test_data,tr_labels,test_labels = train_test_split(data,labels,test_size = 0.1)
tr_data = np.asarray(tr_data)
tr_data = np.reshape(tr_data,[len(tr_data),28,28,1])
test_data = np.asarray(test_data)
test_data = np.reshape(test_data,[len(test_data),28,28,1])
tr_labels = np.asarray(tr_labels)
test_labels = np.asarray(test_labels)
def get_conv(x,shape):
weights = tf.Variable(tf.random_normal(shape,stddev=0.05))
biases = tf.Variable(tf.random_normal([shape[-1]],stddev=0.05))
conv = tf.nn.conv2d(x,weights,[1,1,1,1],padding="SAME")
return tf.nn.relu(tf.nn.bias_add(conv,biases))
def get_pool(x,shape):
return tf.nn.max_pool(x,ksize=shape,strides=shape,padding="SAME")
def get_fc(x,shape):
sh = x.get_shape().as_list()
dim = 1
for i in sh[1:]:
dim *= i
x = tf.reshape(x,[-1,dim])
weights = tf.Variable(tf.random_normal(shape,stddev=0.05))
return tf.nn.relu(tf.matmul(x,weights) + tf.Variable(tf.random_normal([shape[1]],stddev=0.05)))
#Creating model
x = tf.placeholder(tf.float32,shape=[None,28,28,1])
y = tf.placeholder(tf.float32,shape=[None,10])
conv1_1 = get_conv(x,[3,3,1,128])
conv1_2 = get_conv(conv1_1,[3,3,128,128])
pool1 = get_pool(conv1_2,[1,2,2,1])
conv2_1 = get_conv(pool1,[3,3,128,512])
conv2_2 = get_conv(conv2_1,[3,3,512,512])
pool2 = get_pool(conv2_2,[1,2,2,1])
conv3_1 = get_conv(pool2,[3,3,512,1024])
conv3_2 = get_conv(conv3_1,[3,3,1024,1024])
conv3_3 = get_conv(conv3_2,[3,3,1024,1024])
conv3_4 = get_conv(conv3_3,[3,3,1024,1024])
pool3 = get_pool(conv3_4,[1,3,3,1])
fc1 = get_fc(pool3,[9216,1024])
fc2 = get_fc(fc1,[1024,10])
softmax = tf.nn.softmax(fc2)
loss = tf.losses.softmax_cross_entropy(logits=fc2,onehot_labels=y)
train_step = tf.train.AdamOptimizer().minimize(loss)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(15):
print(i)
indices = np.random.randint(len(tr_data),size=[200])
batch_data = tr_data[indices]
batch_labels = tr_labels[indices]
sess.run(train_step,feed_dict={x:batch_data,y:batch_labels})
Thank you so much.
There are several issues with your code, including elementary ones. I strongly suggest you first go through the Tensorflow step-by-step tutorials for MNIST, MNIST For ML Beginners and Deep MNIST for Experts.
In short, regarding your code:
First, your final layer fc2 should not have a ReLU activation.
Second, the way you build your batches, i.e.
indices = np.random.randint(len(tr_data),size=[200])
is by just grabbing random samples in each iteration, which is far from the correct way of doing so...
Third, the data you feed into the network are not normalized in [0, 1], as they should be:
np.max(tr_data[0]) # get the max value of your first training sample
# 255.0
The third point was initially puzzling for me, too, since in the aforementioned Tensorflow tutorials they don't seem to normalize the data either. But close inspection revealed the reason: if you import the MNIST data through the Tensorflow-provided utility functions (instead of the scikit-learn ones, as you do here), they come already normalized in [0, 1], something that is nowhere hinted at:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
np.max(mnist.train.images[0])
# 0.99607849
This is an admittedly strange design decision - as far as I am aware of, in all other similar cases/tutorials normalizing the input data is an explicit part of the pipeline (see e.g. the Keras example), and with good reason (it is something you will be certainly expected to do yourself later, when using your own data).

How do I call model.predict with data stored in GPU in Keras

I have built and trained a Sequential model.
Now before each model.predict call I want to upload the data into GPU, do some operations and then call model.predict using the output stored in GPU without downloading to memory and handover to keras model for it to upload to gpu again.
Edit:
I would like to use opencv operations on the input image in gpu and use the output directly to call model.predict if possible.
You can easily achieve this by adding the operations as Lambda layers on the top of the model.
Here is a very simple example.. you can extend from here:
import numpy as np
from keras import backend as K
from keras.models import Sequential, Model
from keras.layers import Dense, Lambda, Input, merge
X = np.random.random((1000,5))
Y = np.random.random((1000,1))
inp = Input(shape = (5,))
d1 = Dense(60, input_dim=5, init='normal', activation='relu')
d2 = Dense(1, init='normal', activation='sigmoid')
out = d2(d1(inp))
model = Model(input=[inp], output=[out])
model.compile(loss='binary_crossentropy', optimizer='adam')
model.summary()
model.fit(X, Y, nb_epoch=1)
X1 = np.random.random((10,3))
X2 = np.random.random((10,2))
inp1 = Input(shape = (3,))
inp2 = Input(shape = (2,))
p1 = Lambda(lambda x: K.sqrt(x))(inp1)
p2 = Lambda(lambda x: K.tf.exp(x))(inp2)
mer = merge([p1, p2], mode='concat')
out2 = d2(d1(mer))
model2 = Model(input=[inp1, inp2], output=[out2])
model2.summary()
ypred = model2.predict([X1, X2])
print ypred.shape
Here from model.summary() you can see both the models are sharing the upper layers so in essence is using the weights already learnt during the training of the first model