I would like to train a Keras model (say a simple FFNN) using the model.fit() method rather than 'by hand' (i.e. by using the GradientTape approach explained, for example, here). However, the loss function I need to use is quite elaborate and cannot be computed on randomly generated batches of data. As a result, I need to train the model using batches of data computed 'by hand' (i.e. the data that goes into each batch needs to have certain properties and cannot be randomly assigned).
Can I somehow pass pre-computed batches to the fit() method?
One solution consists in subclassing tf.keras.utils.Sequence. You can return your own batch for a given index from the __getitem__ method.
class MySequence(tf.keras.utils.Sequence):

    def __init__(self, x_batch, y_batch) -> None:
        super().__init__()
        self.x_batch = x_batch  # ordered list of batches
        self.y_batch = y_batch  # idem
        self.leny = len(y_batch)

    def __len__(self):
        return self.leny

    def __getitem__(self, idx):
        x = self.x_batch[idx]
        y = self.y_batch[idx]
        return x, y
You can pass an instance of this Sequence subclass to the fit method of Model.
Also set shuffle=False in the Model fit arguments.
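For example, a minimal usage sketch (x_batches and y_batches are hypothetical ordered lists of pre-computed NumPy batch arrays):
seq = MySequence(x_batches, y_batches)
model.fit(seq, epochs=10, shuffle=False)  # preserves your hand-made batch order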
In my project, I have a number of cases where I have a Dataset instance and I need to get predictions from some model on every item in the dataset.
The model.predict() API is optimized perfectly for this, as shown in the documentation. However, there seems to be one major catch. I also happen to need the labels to compare with the predicted values, i.e. the dataset contains x,y pairs, and I'd like to end up with (y_predicted, y) pairs after the prediction is complete. This does not seem to be possible with the predict() API though, and I can't think of a clean way to 'split' the dataset so that the x's are fed into the model and the y's are retained to be joined back up with the predicted y's.
EDIT: I know it's quite simple to do by iterating over the dataset manually and calling the model directly, e.g.
result = []
for x, y in dataset:
    y_pred = model(x)
    result.append((y, y_pred))
However, this seems like it will be a fair bit slower than using the built-in predict(), as TensorFlow won't be able to multi-thread/optimize the input pipeline.
Does anyone have a good way to accomplish this?
Given the concerns you mentioned, it may be best to override predict to suit your needs. You don't actually need to override that function itself though, only predict_step, which is called by it. Just use this class instead of Model:
class MyModel(tf.keras.Model):

    def predict_step(self, data):
        x, y = data
        return self(x, training=False), y
If your model is currently Sequential, inherit from that instead. Basically the only change I made from the default implementation is to add , y to the model call result.
Note that this also makes some assumptions, such as that your dataset consists of (input, label) batch pairs. You may need to adapt it slightly to your needs. Here is a minimal example:
import tensorflow as tf
import numpy as np
(imgs, lbls), (te_imgs, te_lbls) = tf.keras.datasets.mnist.load_data()
imgs = imgs.astype(np.float32).reshape((-1, 784)) / 255.
te_imgs = te_imgs.astype(np.float32).reshape((-1, 784)) / 255.
lbls = lbls.astype(np.int32)
te_lbls = te_lbls.astype(np.int32)
tr_data = tf.data.Dataset.from_tensor_slices((imgs, lbls)).shuffle(60000).batch(128)
te_data = tf.data.Dataset.from_tensor_slices((te_imgs, te_lbls)).batch(128)
class MyModel(tf.keras.Model):

    def predict_step(self, data):
        x, y = data
        return self(x, training=False), y
inp = tf.keras.Input((784,))
logits = tf.keras.layers.Dense(10)(inp)
model = MyModel(inp, logits)
opt = tf.keras.optimizers.Adam()
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(loss=loss, optimizer=opt)
something = model.predict(te_data)
print(something[0].shape, something[1].shape)
This prints (10000, 10) (10000,) -- predict now returns a tuple of (outputs, labels) (this can be confirmed by inspecting the returned labels and comparing them to the images in the test set).
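As a quick sanity check (reusing the variables from the minimal example above), the returned labels should match the original test labels exactly:
assert np.array_equal(something[1], te_lbls)  # labels pass through untouched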
I would like to ask for some help creating my custom layer.
What I am trying to do is actually quite simple: generating an output layer with 'stateful' variables, i.e. tensors whose value is updated at each batch.
In order to make everything more clear, here is a snippet of what I would like to do:
def call(self, inputs):
    c = self.constant
    m = self.extra_constant
    update = inputs * m + c
    X_new = self.X_old + update
    outputs = X_new
    self.X_old = X_new
    return outputs
The idea here is quite simple:
X_old is initialized to 0 in the def __init__(self, ...)
update is computed as a function of the inputs to the layer
the output of the layer is computed (i.e. X_new)
the value of X_old is set equal to X_new so that, at the next batch, X_old is no longer equal to zero but equal to X_new from the previous batch.
I have found out that K.update does the job, as shown in the example:
X_new = K.update(self.X_old, self.X_old + update)
The problem here is that, if I then try to define the outputs of the layer as:
outputs = X_new
return outputs
I receive the following error when I try model.fit():
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have
gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
And I keep getting this error even though I set layer.trainable = False and did not define any bias or weights for the layer. On the other hand, if I just do self.X_old = X_new, the value of X_old does not get updated.
Do you guys have a solution to implement this? I believe it should not be that hard, since stateful RNNs work in a 'similar' way.
Thanks in advance for your help!
Defining a custom layer can become confusing sometimes. Some of the methods that you override are going to be called only once, but it gives you the impression that, just like in many other OO libraries/frameworks, they are going to be called many times.
Here is what I mean: when you define a layer and use it in a model, the Python code you write when overriding the call method is not going to be directly called in the forward or backward passes. Instead, it's called only once, when you call model.compile. It compiles the Python code into a computational graph, and that graph, in which the tensors will flow, is what does the computations during training and prediction.
That's why, if you want to debug your model by putting in a print statement, it won't work; you need to use tf.print to add a print command to the graph.
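For instance, a debugging layer has to go through tf.print (an illustrative sketch of mine, assuming tensorflow is imported as tf):
class DebugLayer(tf.keras.layers.Layer):
    def call(self, X):
        tf.print("batch received, shape:", tf.shape(X))  # becomes a graph op, fires every batch
        return X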
It is the same situation with the state variable you want to have. Instead of simply assigning old + update to new you need to call a Keras function that adds that operation to the graph.
And note that tensors are immutable so you need to define the state as tf.Variable in the __init__ method.
So I believe this code is more like what you're looking for:
class CustomLayer(tf.keras.layers.Layer):

    def __init__(self, **kwargs):
        super(CustomLayer, self).__init__(**kwargs)
        # tensors are immutable, so the state must be a tf.Variable
        self.state = tf.Variable(tf.zeros((3, 3), 'float32'))
        self.constant = tf.constant([[1, 1, 1], [1, 0, -1], [-1, 0, 1]], 'float32')
        self.extra_constant = tf.constant([[1, 1, 1], [1, 0, -1], [-1, 0, 1]], 'float32')
        self.trainable = False

    def call(self, X):
        m = self.constant
        c = self.extra_constant
        outputs = self.state + tf.matmul(X, m) + c
        # register the assignment as a graph op so the state actually carries
        # over to the next batch; assigning the outputs matches the question's
        # X_old = X_new intent (shapes here assume batches of 3 rows)
        tf.keras.backend.update(self.state, outputs)
        return outputs
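A quick way to see the state carrying over (my own usage sketch; with the shapes above it assumes batches of 3 rows so the broadcasting works out):
layer = CustomLayer()
x = tf.ones((3, 3))
print(layer(x))  # first call: the state is still all zeros
print(layer(x))  # second call: the state was updated by the first batch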
For those who know about this, I would like to know how to run the Carlini-Wagner attack script on a non-Keras model.
Suppose I have a class containing a TensorFlow graph that implements a classification model:
class classifier:

    def __init__(self):
        self.x = tf.placeholder(tf.float32, shape)
        # [insert model here]
        self.logits = tf.layers.dense(inputs=..., units=num_labels, activation=None)
I would like to be able to run a predict statement within an optimization loop like this:
model = classifier()
output = model.predict(newimg)
where "newimg" is a tf.variable that I will be optimizing over.
How can I modify my "classifier" class so that this becomes possible, i.e. a predict method that takes in a TensorFlow variable or placeholder and outputs another tensor?
Essentially this would be equivalent to defining a new graph where I replace the original placeholder x by a non-trainable variable, but this seems dirty.
You need to put your computation graph into a TensorFlow Session and use the Session.run() method to feed the input image. It should be something like this:
class classifier:

    def __init__(self):
        self.x = tf.placeholder(tf.float32, shape)
        # [insert model here]
        self.logits = tf.layers.dense(inputs=..., units=num_labels, activation=None)
        self.graph = tf.get_default_graph()
        self.sess = tf.Session(graph=self.graph)

    def predict(self, image):
        pred_logits = self.sess.run(self.logits,
                                    feed_dict={self.x: image})
        return pred_logits
To evaluate, simply call:
model = classifier()
output_logits = model.predict(newimg)
Keep in mind that feeding images one by one is not the optimal solution; if you need to perform evaluation on a large-scale dataset, you will need to use the TensorFlow data pipeline to batch images in parallel and speed up inference time, which is certainly off topic here.
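That said, a simple middle ground is to feed a whole batch per Session.run call instead of a single image (a sketch; images here is a hypothetical NumPy array of stacked inputs):
model = classifier()
for start in range(0, len(images), 128):
    batch_logits = model.predict(images[start:start + 128])  # one sess.run per 128 images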
I want to add to my model a layer that, during evaluation, takes the input, applies some transformation (a quantization in this case, but it could be anything) and returns it as the output. This layer must, however, be completely transparent during training, meaning that it must return the input tensor unchanged.
I have written the following function
from keras.layers import Lambda
import keras.backend as K
def myquantize(x):
    return K.in_test_phase(K.clip(K.round(x * (2 ** 5)) / (2 ** 5), -3.9, 3.9), x)
which I then use via a Lambda layer
y = keras.layers.Conv1D(**args1)(x)
y = keras.layers.AveragePooling1D(pool_size=2)(y)
y = keras.layers.Lambda(myquantize)(y)
y = keras.layers.Conv1D(**args2)(y)
#...
Now, in principle K.in_test_phase should return x during training, and that expression during test.
However, training the network with such a layer prevents the network from learning (i.e. the training loss stops decreasing after 3 epochs), while if I remove it the network keeps training normally. I assume this layer is not actually transparent during training as expected.
in_test_phase has a training parameter which you can explicitly set to indicate whether you are training or not. If you don't set it explicitly, then the value of learning_phase is used. This value keeps changing when you reset the graph or when you call different types of fit/predict/evaluate functions of the model.
Since your full code isn't present, you can make use of the training parameter. Set it to True during training, then save the weights of the model using the save_weights function of the model. When you wish to test your model, set the training parameter to False, load the weights using the load_weights function, and then proceed accordingly.
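A sketch of that suggestion applied to the function from the question (the is_training flag is my own name, not part of the Keras API):
def myquantize(x, is_training):
    quantized = K.clip(K.round(x * (2 ** 5)) / (2 ** 5), -3.9, 3.9)
    # passes x through unchanged while is_training=True,
    # and returns the quantized value when is_training=False
    return K.in_test_phase(quantized, x, training=is_training)
Build the model with is_training=True for training and call save_weights; for testing, rebuild with is_training=False and call load_weights before evaluating.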
For those who are in a similar situation, I created a custom layer like the following, which I only use during training:
class MyLayer(keras.layers.Layer):

    def __init__(self, **kwargs):
        super(MyLayer, self).__init__(**kwargs)

    def compute_output_shape(self, input_shape):
        return input_shape

    def call(self, inputs, **kwargs):
        x = inputs
        return K.identity(x)
Note that this layer always returns the input tensor, but it serves as a 'placeholder' for the next step. On the evaluation part of the code, I wrote the following:
class MyLayer(keras.layers.Layer):

    def __init__(self, **kwargs):
        super(MyLayer, self).__init__(**kwargs)

    def compute_output_shape(self, input_shape):
        return input_shape

    def call(self, inputs, **kwargs):
        x = inputs
        return  # Your actual processing here
Here, the only difference is that you actually perform the desired processing steps on your tensor. When I load my stored model, I pass this class as a custom object:
model = keras.models.load_model(model_file,custom_objects={'MyLayer':MyLayer})
Be careful to pass as MyLayer the version in which the actual processing is performed.
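For the quantization from the original question, the evaluation-time call could look like this (a sketch reusing the clip/round expression from the question):
def call(self, inputs, **kwargs):
    return K.clip(K.round(inputs * (2 ** 5)) / (2 ** 5), -3.9, 3.9)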
This is my solution, other suggestions are welcome
I want to create a custom layer with weights that update only in the training phase.
From the official documentation, this is the way:
from keras import backend as K
from keras.layers import Layer

class MyLayer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(MyLayer, self).build(input_shape)  # Be sure to call this at the end

    def call(self, x):
        return K.dot(x, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)
In this GitHub repo, the author added
new_centers = self.centers - self.alpha * delta_centers
self.add_update((self.centers, new_centers), x)
where self.centers are the weights.
I can't understand why self.add_update is useful in that situation.
Are the weights not updated if I don't call self.add_update? If so, why must new_centers be in the updates list and not in the inputs list? And why is x a requirement?
From the source code:
self.add_update(updates, inputs)
updates: update op or list of update ops to add to the layer.
inputs: input tensor or list of input tensors to mark the updates as conditional on these inputs. If None is passed, the updates are assumed unconditional.
There are two types of weights:
Trainable = Updated automatically by the optimizer with backpropagation
Untrainable = Not updated by backpropagation
For the trainable weights, it's really not recommended to use updates; you would be mixing the optimizer's updates with your own updates, and that could cause many issues.
For the untrainable weights, you can do whatever you want. Sometimes you want constants, and you will do nothing; sometimes you want these weights to change (but not via backpropagation).
Notice how in that example the weights updated by the user are untrainable:
self.centers = self.add_weight(name='centers',
                               shape=(10, 2),
                               initializer='uniform',
                               # UNTRAINABLE
                               trainable=False)
But the user wants these weights to be updated following some rules. I don't know what they are doing there (I didn't analyse the code), but I assume they are calculating, for instance, something similar to the center point of a group of images, and each batch will have this center in a different position. They want to update this position.
A classical example is the BatchNormalization layer. Besides having trainable scale and bias weights used to rescale the outputs, they have the mean and variance weights. These are statistical properties of the data that need to be updated with every batch.
You are not training the "mean" or the "variance", but each batch of data updates these values.
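As an illustration (a toy layer of my own, not the code from the linked repo, reusing the K and Layer imports from the snippet above), an untrainable running mean updated once per batch via add_update could look like this:
class RunningMean(Layer):

    def build(self, input_shape):
        self.mean = self.add_weight(name='mean',
                                    shape=(input_shape[-1],),
                                    initializer='zeros',
                                    trainable=False)  # backpropagation never touches it
        super(RunningMean, self).build(input_shape)

    def call(self, x):
        batch_mean = K.mean(x, axis=0)
        new_mean = 0.9 * self.mean + 0.1 * batch_mean
        # register the assignment so Keras runs it on every training batch;
        # passing x marks the update as conditional on this layer's inputs
        self.add_update(K.update(self.mean, new_mean), x)
        return x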
How does it work?
This is obscure and lies deep down in Keras code.
We need the update operation to make sure self.centers will have new values for every batch; otherwise it won't.
We use self.add_update in a layer to register that this variable should be updated. (We do similar things in custom optimizers as well; the optimizers contain the updates to the weights made via backpropagation.)
Later, in the source code for training the model, Keras will collect all these registered updates and make a train function. Somewhere inside it, these updates will be applied to the variables:
# inside a training function from keras
with K.name_scope('training'):
    with K.name_scope(self.optimizer.__class__.__name__):
        training_updates = self.optimizer.get_updates(
            params=self._collected_trainable_weights,
            loss=self.total_loss)
    updates = (self.updates +          # probably the updates registered in layers
               training_updates +      # the updates registered in optimizers
               self.metrics_updates)   # don't know....

    # Gets loss and metrics. Updates weights at each call.
    self.train_function = K.function(
        inputs,
        [self.total_loss] + self.metrics_tensors,
        updates=updates,
        name='train_function',
        **self._function_kwargs)