Train with target data generated within the model - tensorflow

How can I get the loss function used by tf.keras.Model.fit(x, y) to compare two outputs within the graph instead of one output with externally supplied target data, y?
The manual says you can use tensors for the target values, which sounds like what I want, but then the inputs also need to be tensors. My inputs are numpy arrays, and I don't think I should have to change that.

1 - Easy, best - maybe not good for memory
Why not just generate the expected targets for the loss up front?
new_y_train = non_trainable_ops_model.predict(original_y_train)
nn_model.fit(x_train, new_y_train)
This is definitely the best way if your memory can handle it: a simpler model and faster training.
You can even save/load the new data:
np.save(name, new_y_train)
new_y_train = np.load(name)
2 - Make the model output the loss and use a dummy loss for compiling
Losses:
def dummy_loss(true, pred):
    return pred

def true_loss(x):
    true, pred = x
    return loss_function(true, pred)  # you can probably import loss_function from keras.losses
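For instance, if the comparison should be mean squared error, the import might look like this (MSE is just an assumption here; substitute whichever Keras loss fits your problem):
from tensorflow.keras.losses import mean_squared_error as loss_function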
Model:
# given:
nn_model = create_nn_model()
non_trainable_ops_model = create_nto_model()

nn_input = Input(nn_input_shape)
nto_input = Input(nto_input_shape)

nn_outputs = nn_model(nn_input)
nto_outputs = non_trainable_ops_model(nto_input)

loss = Lambda(true_loss)([nto_outputs, nn_outputs])

training_model = Model([nn_input, nto_input], loss)
training_model.compile(loss=dummy_loss, ...)
training_model.fit([nn_x_train, nto_x_train], np.zeros((len(nn_x_train),)))
3 - Use model.add_loss instead of compiling a loss
Following the same setup as the previous option, you can:
training_model = Model([nn_input, nto_input], nn_outputs)
loss = true_loss([nto_outputs, nn_outputs])
training_model.add_loss(loss)
training_model.compile(loss=None, ...)
training_model.fit([nn_x_train, nto_x_train], None)
4 - Enable eager execution and make custom training loops
https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough
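For completeness, here is a minimal sketch of such a loop, reusing the nn_model / non_trainable_ops_model setup from option 2 (the MSE loss and the tf.data pipeline are assumptions; adapt them to your case):
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.MeanSquaredError()  # assumption: pick the loss you need

# assumes a tf.data.Dataset yielding (nn_x_batch, nto_x_batch) pairs
for nn_x_batch, nto_x_batch in dataset:
    with tf.GradientTape() as tape:
        pred = nn_model(nn_x_batch, training=True)
        target = non_trainable_ops_model(nto_x_batch, training=False)
        loss_value = loss_fn(target, pred)
    grads = tape.gradient(loss_value, nn_model.trainable_variables)
    optimizer.apply_gradients(zip(grads, nn_model.trainable_variables))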

Related

how to get labels when using model.predict()

In my project, I have a number of cases where I have a Dataset instance and I need to get predictions from some model on every item in the dataset.
The model.predict() API is optimized perfectly for this, as shown in the documentation. However, there seems to be one major catch. I also happen to need the labels to compare with the predicted values, i.e. the dataset contains x,y pairs, and I'd like to end up with (y_predicted, y) pairs after the prediction is complete. This does not seem to be possible with the predict() API though, and I can't think of a clean way to 'split' the dataset so that the x's are fed into the model and the y's are retained to be joined back up with the predicted y's.
EDIT: I know it's quite simple to do by iterating over the dataset manually and calling the model directly, e.g.
result = []
for x, y in dataset:
    y_pred = model(x)
    result.append((y, y_pred))
However, this seems like it will be a fair bit slower than using the inbuilt predict() as Tensorflow won't be able to multi-thread/optimize the input pipeline.
Does anyone have a good way to accomplish this?
Given the concerns you mentioned, it may be best to override predict to suit your needs. You don't actually need to override predict itself, though; only predict_step, which predict calls. Just use this class instead of Model:
class MyModel(tf.keras.Model):
    def predict_step(self, data):
        x, y = data
        return self(x, training=False), y
If your model is currently Sequential, inherit from that instead. Basically the only change I made from the default implementation is to add , y to the model call result.
Note that this also makes some assumptions, such as that your dataset consists of (input, label) batch pairs. You may need to adapt it slightly to your needs. Here is a minimal example:
import tensorflow as tf
import numpy as np

(imgs, lbls), (te_imgs, te_lbls) = tf.keras.datasets.mnist.load_data()
imgs = imgs.astype(np.float32).reshape((-1, 784)) / 255.
te_imgs = te_imgs.astype(np.float32).reshape((-1, 784)) / 255.
lbls = lbls.astype(np.int32)
te_lbls = te_lbls.astype(np.int32)

tr_data = tf.data.Dataset.from_tensor_slices((imgs, lbls)).shuffle(60000).batch(128)
te_data = tf.data.Dataset.from_tensor_slices((te_imgs, te_lbls)).batch(128)

class MyModel(tf.keras.Model):
    def predict_step(self, data):
        x, y = data
        return self(x, training=False), y

inp = tf.keras.Input((784,))
logits = tf.keras.layers.Dense(10)(inp)
model = MyModel(inp, logits)

opt = tf.keras.optimizers.Adam()
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(loss=loss, optimizer=opt)

something = model.predict(te_data)
print(something[0].shape, something[1].shape)
This prints (10000, 10) (10000,) -- predict now returns a tuple of (outputs, labels); this can be confirmed by inspecting the returned labels and comparing them to the images in the test set.
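As a quick sanity check (a sketch, assuming the MNIST setup above), you can compare the returned predictions against the returned labels directly:
preds, labels = model.predict(te_data)
accuracy = (preds.argmax(axis=-1) == labels).mean()  # fraction of correct predictions
print(accuracy)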

Using Tensorflow Dataset from_generator() to create multi Input/Output with Custom Generator and ImageDataGenerator

I am trying to scale up my model which uses a "cluster loss" extension, the implementation works so far on MNIST, but I would like to benefit from data augmentation and multi-processing for the real dataset.
In short, the network follows work done on the "centre loss", and resembles a Siamese network a bit. The important part of the architecture is that the model has 2 inputs and 2 outputs. Therefore, I implemented a custom generator to feed the model, as follows:
def my_generator(stop):
    i = 0
    while i < stop:
        batch = train_gen.next()
        img = batch[0]
        labels = batch[1]
        labels_size = np.shape(labels)
        cluster = np.zeros(labels_size)
        x = [img, labels]
        y = [labels, cluster]
        yield x, y
        i += 1
which calls the generator ("train_gen") defined as follows:
generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255, horizontal_flip=True)
train_gen = generator.flow_from_dataframe(df, x_col='img_path', y_col='label',
                                          class_mode='categorical',
                                          target_size=(32, 32),
                                          batch_size=batch_size)
The generator works if I set only one worker in the fit function, but that is obviously painfully slow... So I tried to use the recommended tf.data API from TensorFlow (tf.data.Dataset.from_generator) to fit my model. But setting it up as follows,
ds = tf.data.Dataset.from_generator(my_generator,
                                    args=[num_iter],
                                    output_types=([tf.float32, tf.float32], [tf.float32, tf.float32]))
I got the following error:
TypeError: Cannot convert value [tf.float32, tf.float32] to a Tensorflow DType.
From there, I tried multiple things, following this post
For example, trying to return tuples instead of arrays:
x = (img, labels)
y = (labels, cluster)
But I got:
ValueError: as_list() is not defined on an unknown TensorShape
Does anyone have experience with this? I am not sure I understand the error, and I am thinking I could perhaps change the "output_types" argument, but TensorFlow has no "list" or "tuple" DType.
Here is a link to my code, which constructs a small image dataset from cifar10 to feed a toy model.
I do not think your generator works the way you expect. Each time my_generator is called it creates a fresh generator object with i = 0, and in
    yield x, y
    i += 1
the i += 1 after the yield only runs when the generator is resumed. Put a print statement after it:
    yield x, y
    i += 1
    print('the value of i is', i)
and you will see it never executes if you run
x, y = next(my_generator(2))
which pulls a single batch from a fresh generator and then discards it. However, if you execute
x, y = my_generator(2)
then the unpacking iterates the generator to exhaustion, so the i += 1 statement does execute. Normally you consume a generator by calling next() on the generator object; model.fit, I believe, gets each batch that way from the generator you pass it.
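A minimal standalone demonstration of that behaviour (nothing model-specific assumed):
def my_generator(stop):
    i = 0
    while i < stop:
        yield i  # execution pauses here until the generator is resumed
        i += 1   # runs only when next() is called again

gen = my_generator(2)
print(next(gen))  # 0 -- runs up to the first yield
print(next(gen))  # 1 -- resumes after the yield, so i += 1 did execute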

How to create a custom layer in Keras with 'stateful' variables/tensors?

I would like to ask for some help with creating my custom layer.
What I am trying to do is actually quite simple: generating an output layer with 'stateful' variables, i.e. tensors whose value is updated at each batch.
In order to make everything more clear, here is a snippet of what I would like to do:
def call(self, inputs):
    c = self.constant
    m = self.extra_constant

    update = inputs * m + c
    X_new = self.X_old + update

    outputs = X_new
    self.X_old = X_new

    return outputs
The idea here is quite simple:
X_old is initialized to 0 in __init__(self, ...)
update is computed as a function of the inputs to the layer
the output of the layer is computed (i.e. X_new)
the value of X_old is set equal to X_new so that, at the next batch, X_old is no longer equal to zero but equal to X_new from the previous batch.
I have found out that K.update does the job, as shown in the example:
X_new = K.update(self.X_old, self.X_old + update)
The problem here is that, if I then try to define the outputs of the layer as:
outputs = X_new
return outputs
I receive the following error when I try model.fit():
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have
gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
I keep getting this error even though I set layer.trainable = False and did not define any bias or weights for the layer. On the other hand, if I just do self.X_old = X_new, the value of X_old does not get updated.
Do you have a solution for implementing this? I believe it should not be that hard, since stateful RNNs work in a 'similar' way.
Thanks in advance for your help!
Defining a custom layer can get confusing at times. Some of the methods you override are going to be called only once, but they give you the impression that, just like in many other OO libraries/frameworks, they will be called many times.
Here is what I mean: when you define a layer and use it in a model, the Python code you write for the overridden call method is not directly executed in the forward or backward passes. Instead, it is executed only once, when Keras traces the model to build its computational graph, and that graph in which the tensors flow is what does the computation during training and prediction.
That's why if you want to debug your model by putting a print statement it won't work; you need to use tf.print to add a print command to the graph.
It is the same situation with the state variable you want to have. Instead of simply assigning old + update to new you need to call a Keras function that adds that operation to the graph.
And note that tensors are immutable so you need to define the state as tf.Variable in the __init__ method.
So I believe this code is more like what you're looking for:
class CustomLayer(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(CustomLayer, self).__init__(**kwargs)
        # state has shape (3,) so it broadcasts over the batch in call() and
        # matches the shape of the reduce_sum update below; this toy example
        # assumes inputs of shape (3, 3) so the constant c broadcasts as well
        self.state = tf.Variable(tf.zeros((3,), 'float32'), trainable=False)
        self.constant = tf.constant([[1, 1, 1], [1, 0, -1], [-1, 0, 1]], 'float32')
        self.extra_constant = tf.constant([[1, 1, 1], [1, 0, -1], [-1, 0, 1]], 'float32')
        self.trainable = False

    def call(self, X):
        m = self.constant
        c = self.extra_constant
        outputs = self.state + tf.matmul(X, m) + c
        tf.keras.backend.update(self.state, tf.reduce_sum(outputs, axis=0))
        return outputs
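To see the state carrying over between calls, here is a quick sketch (assuming 3x3 inputs so the toy shapes line up):
layer = CustomLayer()
x = tf.ones((3, 3))
print(layer(x))  # first call: state starts at zero
print(layer(x))  # second call: different result, because state was updated in between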

Pytorch how to get the gradient of loss function twice

Here is what I'm trying to implement:
We calculate the loss based on F(X), as usual. But we also define an "adversarial loss", which is a loss based on F(X + e). e is defined as dF(X)/dX multiplied by some constant. Both the loss and the adversarial loss are backpropagated for the total loss.
In tensorflow, this part (getting dF(X)/dX) can be coded like below:
grad, = tf.gradients( loss, X )
grad = tf.stop_gradient(grad)
e = constant * grad
Below is my pytorch code:
class DocReaderModel(object):
    def __init__(self, embedding=None, state_dict=None):
        self.train_loss = AverageMeter()
        self.embedding = embedding
        self.network = DNetwork(opt, embedding)
        self.optimizer = optim.SGD(parameters)

    def adversarial_loss(self, batch, loss, embedding, y):
        self.optimizer.zero_grad()
        loss.backward(retain_graph=True)
        grad = embedding.grad
        grad.detach_()
        perturb = F.normalize(grad, p=2) * 0.5
        self.optimizer.zero_grad()
        adv_embedding = embedding + perturb
        network_temp = DNetwork(self.opt, adv_embedding)  # This is how to get F(X)
        network_temp.training = False
        network_temp.cuda()
        start, end, _ = network_temp(batch)  # This is how to get F(X)
        del network_temp  # I even deleted this instance.
        return F.cross_entropy(start, y[0]) + F.cross_entropy(end, y[1])

    def update(self, batch):
        self.network.train()
        start, end, pred = self.network(batch)
        loss = F.cross_entropy(start, y[0]) + F.cross_entropy(end, y[1])
        loss_adv = self.adversarial_loss(batch, loss, self.network.lexicon_encoder.embedding.weight, y)
        loss_total = loss + loss_adv
        self.optimizer.zero_grad()
        loss_total.backward()
        self.optimizer.step()
I have a few questions:
1) I substituted tf.stop_gradient with grad.detach_(). Is this correct?
2) I was getting "RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.", so I added retain_graph=True to loss.backward. That specific error went away.
However, now I'm getting a memory error after a few epochs (RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCStorage.cu:58). I suspect I'm unnecessarily retaining the graph.
Can someone let me know pytorch's best practice on this? Any hint / even short comment will be highly appreciated.
I think you are trying to implement a generative adversarial network (GAN), but from the code, I can't quite follow what you are trying to achieve, as there are a few missing pieces for a GAN to work. I can see there's a discriminator network module, DNetwork, but the generator network module is missing.
If I had to guess, when you say 'loss function twice', I assume you mean you have one loss function for the discriminator net and another for the generator net. If that's the case, let me share how I would implement a basic GAN model.
As an example, let's take a look at this Wasserstein GAN Jupyter notebook
I'll skip the less important bits and zoom into the important ones here:
First, import PyTorch libraries and set up
# Set up batch size, image size, and size of noise vector:
bs, sz, nz = 64, 64, 100 # nz is the size of the latent z vector for creating some random noise later
Build a discriminator module
class DCGAN_D(nn.Module):
    def __init__(self):
        ... truncated, the usual neural net stuff: layers, etc ...

    def forward(self, input):
        ... truncated, the usual neural net stuff: layers, etc ...
Build a generator module
class DCGAN_G(nn.Module):
    def __init__(self):
        ... truncated, the usual neural net stuff: layers, etc ...

    def forward(self, input):
        ... truncated, the usual neural net stuff: layers, etc ...
Put them all together
netG = DCGAN_G().cuda()
netD = DCGAN_D().cuda()
Optimizer needs to be told what variables to optimize. A module automatically keeps track of its variables.
optimizerD = optim.RMSprop(netD.parameters(), lr = 1e-4)
optimizerG = optim.RMSprop(netG.parameters(), lr = 1e-4)
One forward step and one backward step for Discriminator
Here, the network calculates the gradient during the backward pass, depending on the input to this function. So, in my case, I have 3 types of losses: the generator loss, the discriminator real-image loss, and the discriminator fake-image loss. I can get the gradient of the loss function three times, for 3 different net passes.
def step_D(input, init_grad):
    # input can be the generator's generated image data or an input image from the dataset
    err = netD(input)
    err.backward(init_grad)  # backward pass through the net to calculate the gradient
    return err  # loss
Control trainable parameters [IMPORTANT]
Trainable parameters in the model are those that require gradients.
def make_trainable(net, val):
    for p in net.parameters():
        p.requires_grad = val  # note: this is later set to False below for the netG update in the train loop
In TensorFlow, this part can be coded like below:
grad = tf.gradients(loss, X)
grad = tf.stop_gradient(grad)
So, I think this will answer your first question, "I substituted tf.stop_gradient with grad.detach_(). Is this correct?"
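For that question specifically, here is a small self-contained sketch of the PyTorch equivalent of tf.stop_gradient (the tensor names are placeholders, not taken from the question's code):
import torch
import torch.nn.functional as F

x = torch.randn(4, 8, requires_grad=True)
loss = (x ** 2).sum()
loss.backward()

grad = x.grad.detach()  # a tensor cut out of the graph, analogous to tf.stop_gradient
perturb = 0.5 * F.normalize(grad, p=2, dim=-1)  # constant * normalized gradient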
Train loop
You can see here how the 3 different loss functions are called.
def train(niter, first=True):
    for epoch in range(niter):
        # Make iterable from PyTorch DataLoader
        data_iter = iter(dataloader)
        i = 0
        while i < n:
            ###########################
            # (1) Update D network
            ###########################
            make_trainable(netD, True)
            # train the discriminator d_iters times
            d_iters = 100
            j = 0
            while j < d_iters and i < n:
                j += 1
                i += 1
                # clamp parameters to a cube
                for p in netD.parameters():
                    p.data.clamp_(-0.01, 0.01)
                data = next(data_iter)

                ##### train with real #####
                real_cpu, _ = data
                real_cpu = real_cpu.cuda()
                real = Variable(data[0].cuda())
                netD.zero_grad()
                # Real image discriminator loss
                errD_real = step_D(real, one)

                ##### train with fake #####
                fake = netG(create_noise(real.size()[0]))
                input.data.resize_(real.size()).copy_(fake.data)
                # Fake image discriminator loss
                errD_fake = step_D(input, mone)
                # Discriminator loss
                errD = errD_real - errD_fake
                optimizerD.step()

            ###########################
            # (2) Update G network
            ###########################
            make_trainable(netD, False)
            netG.zero_grad()
            # Generator loss
            errG = step_D(netG(create_noise(bs)), one)
            optimizerG.step()

            print('[%d/%d][%d/%d] Loss_D: %f Loss_G: %f Loss_D_real: %f Loss_D_fake %f'
                  % (epoch, niter, i, n,
                     errD.data[0], errG.data[0], errD_real.data[0], errD_fake.data[0]))
"I was getting "RuntimeError: Trying to backward through the graph a second time..."
PyTorch has this behaviour: to reduce GPU memory usage, during the .backward() call, all the intermediary results (such as saved activations) are deleted when they are no longer needed. Therefore, if you try to call .backward() again, the intermediary results don't exist and the backward pass cannot be performed (and you get the error you see).
It depends on what you are trying to do. You can call .backward(retain_graph=True) to make a backward pass that will not delete intermediary results, and so you will be able to call .backward() again. All but the last call to backward should have the retain_graph=True option.
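A minimal illustration of that rule, with two losses sharing part of the same graph:
import torch

w = torch.randn(3, requires_grad=True)
shared = (w * 2).sum()  # shared part of the graph
loss1 = shared ** 2
loss2 = shared ** 3

loss1.backward(retain_graph=True)  # keep intermediates for the second pass
loss2.backward()                   # the last call is free to release the graph
print(w.grad)                      # gradients from both losses accumulate here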
"Can someone let me know pytorch's best practice on this?"
As you can see from the PyTorch code above, and from the way things are done in PyTorch (which tries to stay Pythonic), you can get a sense of PyTorch's best practices there.
If you want to work with higher-order derivatives (i.e. a derivative of a derivative) take a look at the create_graph option of backward.
For example:
loss = get_loss()
# create_graph=True builds a graph through the gradient computation itself,
# so the gradient can be differentiated again (model here stands for your network)
grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
grad_penalty = sum(g.pow(2).sum() for g in grads)
loss_grad_penalty = loss + grad_penalty
loss_grad_penalty.backward()

How to initialize a keras tensor employed in an API model

I am trying to implement a memory-augmented neural network, in which the memory and the read/write/usage weight vectors are updated according to a combination of their previous values. These weights are different from the classic weight matrices between layers, which are automatically updated with the fit() function! My problem is the following: how can I correctly initialize these weights as Keras tensors and use them in the model? Let me explain with the following simplified example.
My API model is something like:
input = Input(shape=(5, 6))
controller = LSTM(20, activation='tanh', stateful=False, return_sequences=True)(input)
write_key = Dense(4, activation='tanh')(controller)
read_key = Dense(4, activation='tanh')(controller)

w_w = Add()([w_u, w_r])                              # <---- UPDATE OF WRITE WEIGHTS
to_write = Dot()([w_w, write_key])
M = Add()([M, to_write])
cos_sim = Dot()([M, read_key])
w_r = Lambda(lambda x: softmax(x, axis=1))(cos_sim)  # <---- UPDATE OF READ WEIGHTS
w_u = Add()([w_u, w_r, w_w])                         # <---- UPDATE OF USAGE WEIGHTS

retrieved_memory = Dot()([w_r, M])
controller_output = concatenate([controller, retrieved_memory])
final_output = Dense(6, activation='sigmoid')(controller_output)
You can see that, in order to compute w_w^t, I have to have first defined w_r^{t-1} and w_u^{t-1}. So, at the beginning I have to provide a valid initialization for these vectors. What is the best way to do it? The initializations I would like to have are:
M = K.variable(numpy.zeros((10, 4)))    # MEMORY
w_r = K.variable(numpy.zeros((1, 10)))  # READ WEIGHTS
w_u = K.variable(numpy.zeros((1, 10)))  # USAGE WEIGHTS
But, analogously to what is said in #2486 (entron), these commands do not return a Keras tensor with all the needed metadata, and so I get the following error:
AttributeError: 'NoneType' object has no attribute 'inbound_nodes'
I also thought of using the old M, w_r and w_u as further inputs at each iteration and, analogously, getting the same variables as outputs to close the loop. But this means I would have to use the fit() function to train the model online with just the target as the final output (Model 1), and use the predict() function on the model with all the secondary outputs (Model 2) to get the variables to use at the next iteration. I would also have to pass the weight matrices from Model 1 to Model 2 using get_weights() and set_weights(). As you can see, it becomes a bit messy and too slow; a rough sketch follows.
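For illustration, a sketch of that two-model loop (the names x_in, M_in, w_r_in, w_u_in, M_new, w_r_new, w_u_new and batches are hypothetical placeholders, not from the model above):
# Model 1: trained online, exposing only the target output
train_model = Model([x_in, M_in, w_r_in, w_u_in], final_output)
train_model.compile(loss='mse', optimizer='adam')

# Model 2: same layers, but also exposing the updated state tensors
state_model = Model([x_in, M_in, w_r_in, w_u_in], [final_output, M_new, w_r_new, w_u_new])

M = numpy.zeros((1, 10, 4))
w_r = numpy.zeros((1, 1, 10))
w_u = numpy.zeros((1, 1, 10))
for x_batch, y_batch in batches:
    train_model.fit([x_batch, M, w_r, w_u], y_batch, epochs=1, verbose=0)
    state_model.set_weights(train_model.get_weights())
    _, M, w_r, w_u = state_model.predict([x_batch, M, w_r, w_u])
Every iteration pays for a fit() call, a weight copy, and a predict() call, which is why it feels too slow.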
Do you have any suggestions for this problem?
P.S. Please, do not focus too much on the API model above because it is a simplified (almost meaningless) version of the complete one where I skipped several key steps.