Tensorflow loss function no gradient provided - tensorflow

Currently I try to code my own loss function, but when returning the result (a tensor that consists of a list with the loss values) I get the following error:
ValueError: No gradients provided for any variable: ['conv2d/kernel:0', 'conv2d/bias:0', 'conv2d_1/kernel:0', 'conv2d_1/bias:0', 'dense/kernel:0', 'dense/bias:0', 'dense_1/kernel:0', 'dense_1/bias:0', 'dense_2/kernel:0', 'dense_2/bias:0'].
However in tutorials and in their docs they also use tf.recude_mean and when using it like them (they showed how to code mse loss function) I dont get the error, so it seems that I am missing something
My code:
gl = tfa.losses.GIoULoss()
def loss(y_true, y_pred):
batch_size = y_true.shape[0]
# now contains 32 lists (a batch) of bbxs -> shape is (32, 7876)
bbx_true = y_true.numpy()
# now contains 32 lists (a batch) of bbxs here we have to double access [0] in order to get the entry itself
# -> shape is (32, 1, 1, 7876)
bbx_pred = y_pred.numpy()
losses = []
curr_true = []
curr_pred = []
for i in range(batch_size):
curr_true = bbx_true[i]
curr_pred = bbx_pred[i][0][0]
curr_true = [curr_true[x:x+4] for x in range(0, len(curr_true), 4)]
curr_pred = [curr_pred[x:x+4] for x in range(0, len(curr_pred), 4)]
if len(curr_true) == 0:
curr_true.append([0., 0.,0.,0.])
curr_loss = gl(curr_true, curr_pred)
losses.append(curr_loss)
return tf.math.reduce_mean(losses, axis=-1)
Basically I want to achive bounding box regression and because of that I want to use the GIoUloss loss function. Because my model outputs 7896 neurons (the max amount of bounding boxes I want to predict according to my training set times 4) and the gioloss function needs the input as an array of lists with 4 elements each, I have to perform this transformation.
How do I have to change my code in order to also build up a gradient

Numpy don't provide autograd functions so you need to have Tensorflow tensors exclusively in your loss (otherwise the gradient is lost during backpropagation). So avoid using .numpy() and use the tensorflow operators and slicing on tensoflow tensors instead.

Related

How to map an array of values for y_true to a single value in order to compare to y_pred in a Tensorflow loss function (Tensorflow/Tensorflow Quantum)

I am trying to implement the circuits listed on page 8 in the following paper: https://arxiv.org/pdf/1905.10876.pdf using Tensorflow Quantum (TFQ). I have done so previously for a subset of circuits using Qiskit, and ended up with accuracies that can be found on page 14 in the following paper: https://arxiv.org/pdf/2003.09887.pdf. In TFQ, my accuracies are way down. I think this delta originates because in TFQ, I only used 1 observable Pauli Z operator on the first qubit, and the circuits do not seem to "transfer all knowledge" to the first qubit. I place this in quotes, because I am sure there is a better way to describe this. In Qiskit on the other hand, 16 states (4^2) get mapped to 2 states.
My question: how can I get my accuracies back up?
Potential answer a): some method of "transferring all information" to a single qubit, potentially an ancilla qubit, and doing a readout on this qubit.
Potential answer b) placing a Pauli Z observable on all qubits (4 in total), mapping half of the 16 states to a label 0 and the other half to a label 1. I attempted this in the code below.
My attempt at answer b):
I have a Tensorflow Quantum (TFQ) circuit implemented in Tensorflow. The circuit has multiple observables, which I try to bring together in my loss function. I prefer to use as many standard components as possible, but need to map my quantum states to a label in order to determine the loss. I think what I am trying to achieve is not unique to TFQ. I define my model in the following way:
def circuit():
data_qubits = cirq.GridQubit.rect(4, 1)
circuit = cirq.Circuit()
...
return circuit, [cirq.Z(data_qubits[0]), cirq.Z(data_qubits[1]), cirq.Z(data_qubits[2]), cirq.Z(data_qubits[3])]
model_circuit, model_readout = circuit()
model = tf.keras.Sequential([
tf.keras.layers.Input(shape=(), dtype=tf.string),
# The PQC layer returns the expected value of the readout gate, range [-1,1].
tfq.layers.PQC(model_circuit, model_readout),
])
# compile model
model.compile(
loss = loss_mse,
optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
metrics=[])
in loss_mse (Mean Square Error), I receive a (32, 4) tensor for y_pred. One row could look like
[-0.2, 0.33, 0.6, 0.3]
This would have to be first mapped from [-1,1] to a binarized version of [0,1], so that it looks like:
[0, 1, 1, 1]
Now, a table lookup needs to happen, which tells if this combination is 0 or 1. Finally, the regular (y_true-y_pred)^2 can be performed by that row, followed by a np.sum on all rows. I tried to implement this:
def get_label(measurement):
if measurement == [0,0,0,0]: return 0
...
elif measurement == [1,1,1,1]: return 0
else: return -1
def py_call(y_true, y_pred):
# cast tensor to numpy
y_pred_np = np.asarray(y_pred)
loss = np.zeros((len(y_pred))) # could be a single variable with += within the loop
# evalaute all 32 samples
for pred in range(len(y_pred_np)):
# map, binarize and lookup
y_labelled = get_label([0 if y<0 else 1 for y in y_pred_np[pred]])
# regular loss comparison
loss[pred] = (y_labelled - y_true[pred])**2
# reduce
loss = np.sum(loss)/len(y_true)
return loss
#tf.function
def loss_mse(y_true, y_pred):
external_list = []
loss = tf.py_function(py_call, inp=[y_true, y_pred], Tout=[tf.float64])
return loss
However, the system appears to still expect a (32,4) tensor. I would have thought I could simply provide a single loss values (float). My question: how can I map multiple values for y_true to a single number in order to compare with a single y_pred value in a tensorflow loss function?
So it looks like there are a couple of things going on here. To answer your question
how can I map multiple values for y_true to a single number in order to compare with a single y_pred value in a tensorflow loss function ?
What you might want is some kind of tf.reduce_* function like tf.reduce_mean or tf.reduce_sum. This function will allow you to apply this reduction operation accross a given tensor axis allowing you to convert a tensor of shape (32, 4) to a tensor of shape (32,) or a tensor of shape (4,). Here is a quick snippet:
#tf.function
def my_loss(y_true, y_pred):
# y_true is shape (32, 4)
# y_pred is shape (32, 4)
# Scale from [-1, 1] to [0, 1]
y_true += 1
y_true /= 2
y_pred += 1
y_pred /= 2
# These are now both (32,) with the reduction of taking the mean applied along
# the second axis.
reduced_true = tf.reduce_mean(y_true, axis=1)
reduced_pred = tf.reduce_mean(y_pred, axis=1)
# Now a scalar loss.
loss = tf.reduce_mean((reduce_true - reduced_pred) ** 2)
return loss
Now the above isn't exactly what you want, since it's not super clear to me at least what exact reduction rules you have in mind for taking something like [0,1,1,1] -> 0 vs [0,0,0,0] -> 1.
Another thing I will also mention is that if you want JUST the sum of these Pauli Operators in cirq that you have term by term in the list [cirq.Z(data_qubits[0]), cirq.Z(data_qubits[1]), cirq.Z(data_qubits[2]), cirq.Z(data_qubits[3])] and all you care about is the final sum of these expectations, you could just as easily do:
my_operator = sum([cirq.Z(data_qubits[0]), cirq.Z(data_qubits[1]),
cirq.Z(data_qubits[2]), cirq.Z(data_qubits[3])])
print(my_op)
Which should give something like:
cirq.PauliSum(cirq.LinearDict({frozenset({(cirq.GridQubit(0, 0), cirq.Z)}): (1+0j), frozenset({(cirq.GridQubit(0, 1), cirq.Z)}): (1+0j), frozenset({(cirq.GridQubit(0, 2), cirq.Z)}): (1+0j), frozenset({(cirq.GridQubit(0, 3), cirq.Z)}): (1+0j)}))
Which is also compatable as a readout operation in the PQC layer. Lastly if would recommend reading through some of the snippets and examples here:
https://www.tensorflow.org/quantum/api_docs/python/tfq/layers/PQC
and here:
https://www.tensorflow.org/quantum/api_docs/python/tfq/layers/Expectation
Which give a pretty good description of how the input and output signatures of the functions look as well as the shapes you can expect from them.

What would be the output from tensorflow dense layer if we assign itself as input and output while making a neural network?

I have been going through the implementation of neural network in openAI code for any Vanilla Policy Gradient (As a matter of fact, this part is used nearly everywhere). The code looks something like this :
def mlp_categorical_policy(x, a, hidden_sizes, activation, output_activation, action_space):
act_dim = action_space.n
logits = mlp(x, list(hidden_sizes) + [act_dim], activation, None)
logp_all = tf.nn.log_softmax(logits)
pi = tf.squeeze(tf.random.categorical(logits, 1), axis=1)
logp = tf.reduce_sum(tf.one_hot(a, depth=act_dim) * logp_all, axis=1)
logp_pi = tf.reduce_sum(tf.one_hot(pi, depth=act_dim) * logp_all, axis=1)
return pi, logp, logp_pi
and this multi-layered perceptron network is defined as follows :
def mlp(x, hidden_sizes=(32,), activation=tf.tanh, output_activation=None):
for h in hidden_sizes[:-1]:
x = tf.layers.dense(inputs=x, units=h, activation=activation)
return tf.layers.dense(inputs=x, units=hidden_sizes[-1], activation=output_activation)
My question is what is the return from this mlp function? I mean the structure or shape. Is it an N-dimentional tensor? If so, how is it given as an input to tf.random_categorical? If not, and its just has the shape [hidden_layer2, output], then what happened to the other layers? As per their website description about random_categorical it only takes a 2-D input. The complete code of openAI's VPG algorithm can be found here. The mlp is implemented here. I would be highly grateful if someone would just tell me what this mlp_categorical_policy() is doing?
Note: The hidden size is [64, 64], the action dimension is 3
Thanks and cheers
Note that this is a discrete action space - there are action_space.n different possible actions at every step, and the agent chooses one.
To do this the MLP is returning the logits (which are a function of the probabilities) of the different actions. This is specified in the code by + [act_dim] which is appending count of the action_space as the final MLP layer. Note that the last layer of an MLP is the output layer. The input layer is not specified in tensorflow, it is inferred from the inputs.
tf.random.categorical takes the logits and samples a policy action pi from them, which is returned as a number.
mlp_categorical_policy also returns logp, the log probability of the action a (used to assign credit), and logp_pi, the log probability of the policy action pi.
It seems your question is more about the return from the mlp.
The mlp creates a series of fully connected layers in a loop. In each iteration of the loop, the mlp is creating a new layer using the previous layer x as an input and assigning it's output to overwrite x, with this line x = tf.layers.dense(inputs=x, units=h, activation=activation).
So the output is not the same as the input, on each iteration x is overwritten with the value of the new layer. This is the same kind of coding trick as x = x + 1, which increments x by 1. This effectively chains the layers together.
The output of tf.layers.dense is a tensor of size [:,h] where : is the batch dimension (and can usually be ignored). The creation of the last layer happens outisde the loop, it can be seen that the number of nodes in this layer is act_dim (so shape is [:,3]). You can check the shape by doing this:
import tensorflow.compat.v1 as tf
import numpy as np
def mlp(x, hidden_sizes=(32,), activation=tf.tanh, output_activation=None):
for h in hidden_sizes[:-1]:
x = tf.layers.dense(x, units=h, activation=activation)
return tf.layers.dense(x, units=hidden_sizes[-1], activation=output_activation)
obs = np.array([[1.0,2.0]])
logits = mlp(obs, [64, 64, 3], tf.nn.relu, None)
print(logits.shape)
result: TensorShape([1, 3])
Note that the observation in this case is [1.,2.], it is nested inside a batch of size 1.

Error when attempting to change tensor shape in keras model

I want to change the shape and the content of the tensor in a keras model. Tensor is the output of a layer and has
shape1=(batch_size, max_sentences_in_doc, max_tokens_in_doc, embedding_size)
and I want to convert to
shape2=(batch_size, max_documents_length, embedding_size)
suitable as input of the next layer. Here sentences are made of tokens, and are zero-padded so every sentence has length=max_tokens_in_sentence.
In detail:
I wanto to concatenate all the sentences of a batch taking only the nonzero part of the sentences;
then I zero-pad this concatenation to a length=max_document_length.
So passing from shape1 to shape2 is not only a reshape as mathematical operations are involved.
I created the function embedding_to_docs(x) that iterates on the tensor of shape1 to transform it into shape2. I call the function using a Lambda layer in the model, it works in debug with fictious data, but when I try to call it during the build of the model an error is raised:
Tensor objects are only iterable when eager execution is enabled. To iterate over this tensor use tf.map_fn.
def embedding_to_docs(x):
new_output = []
for doc in x:
document = []
for sentence in doc:
non_zero_indexes = np.nonzero(sentence[:, 0])
max_index = max(non_zero_indexes[0])
if max_index > 0:
document.extend(sentence[0:max_index])
if MAX_DOCUMENT_LENGTH-len(document) > 0:
a = np.zeros((MAX_DOCUMENT_LENGTH-len(document), 1024))
document.extend(a)
else:
document = document[0:MAX_DOCUMENT_LENGTH]
new_output.append(document)
return np.asarray(new_output)
...
# in the model:
tensor_of_shape2 = Lambda(embedding_to_docs)(tensor_of_shape1)
How to fix this?
You can use py_function, which allows you to switch from the graph mode (used by Keras) to the eager mode (where it is possible to iterate over tensors like in your function).
def to_docs(x):
return tf.py_function(embedding_to_docs, [x], tf.float32)
tensor_of_shape2 = Lambda(to_docs)(tensor_of_shape1)
Note that the code run within your embedding_to_docs must be written in tensorflow eager instead of numpy. This means that you'd need to replace some of the numpy calls with tensorflow. You'd surely need to replace the return line with:
return tf.convert_to_tensor(new_output)
Using numpy arrays will stop the gradient computation, but you are likely not interested in gradient flowing through the input data anyway.

autocorrelation of the input in tensorflow/keras

I have a 1D input signal. I want to compute autocorrelation as the part of the neural net for further use inside the network.
I need to perform convolution of input with input itself.
To perform convolution in keras custom layer/ tensorflow. We need the following parameters
data shape is "[batch, in_height, in_width, in_channels]",
filter shape is "[filter_height, filter_width, in_channels, out_channels]
There is no batch present in filter shape, which needs to be input in my case
TensorFlow now has an auto_correlation function. It should be in release 1.6. If you build from source you can use it right now (see e.g. the github code).
Here is a possible solution.
By self convolution, I understood a regular convolution where the filter is exactly the same as the input (if it's not that, sorry for my misunderstanding).
We need a custom function for that, and a Lambda layer.
At first I used padding = 'same' which brings outputs with the same length as the inputs. I'm not sure about what output length you want exactly, but if you want more, you should add padding yourself before doing the convolution. (In the example with length 7, for a complete convolution from one end to another, this manual padding would include 6 zeros before and 6 zeros after the input length, and use padding = 'valid'. Find the backend functions here)
Working example - Input (5,7,2)
from keras.models import Model
from keras.layers import *
import keras.backend as K
batch_size = 5
length = 7
channels = 2
channels_batch = batch_size*channels
def selfConv1D(x):
#this function unfortunately needs to know previously the shapes
#mainly because of the for loop, for other lines, there are workarounds
#but these workarounds are not necessary since we'll have this limitation anyway
#original x: (batch_size, length, channels)
#bring channels to the batch position:
x = K.permute_dimensions(x,[2,0,1]) #(channels, batch_size, length)
#suppose channels are just individual samples (since we don't mix channels)
x = K.reshape(x,(channels_batch,length,1))
#here, we get a copy of x reshaped to match filter shapes:
filters = K.permute_dimensions(x,[1,2,0]) #(length, 1, channels_batch)
#now, in the lack of a suitable available conv function, we make a loop
allChannels = []
for i in range (channels_batch):
f = filters[:,:,i:i+1]
allChannels.append(
K.conv1d(
x[i:i+1],
f,
padding='same',
data_format='channels_last'))
#although channels_last is my default config, I found this bug:
#https://github.com/fchollet/keras/issues/8183
#convolution output: (1, length, 1)
#concatenate all results as samples
x = K.concatenate(allChannels, axis=0) #(channels_batch,length,1)
#restore the original form (passing channels to the end)
x = K.reshape(x,(channels,batch_size,length))
return K.permute_dimensions(x,[1,2,0]) #(batch_size, length, channels)
#input data for the test:
x = np.array(range(70)).reshape((5,7,2))
#little model that just performs the convolution
inp= Input((7,2))
out = Lambda(selfConv1D)(inp)
model = Model(inp,out)
#checking results
p = model.predict(x)
for i in range(5):
print("x",x[i])
print("p",p[i])
You can just use tf.nn.conv3d by treating the "batch size" as "depth":
# treat the batch size as depth.
data = tf.reshape(input_data, [1, batch, in_height, in_width, in_channels])
kernel = [filter_depth, filter_height, filter_width, in_channels, out_channels]
out = tf.nn.conv3d(data, kernel, [1,1,1,1,1], padding='SAME')

Organizing tensor into batches of dynamically shaped tensors

I have the following situation:
I want to deploy a face detector model using Tensorflow Serving: https://www.tensorflow.org/serving/.
In Tensorflow Serving, there is a command line option called --enable_batching. This causes the model server to automatically batch the requests to maximize throughput. I want this to be enabled.
My model takes in a set of images (called images), which is a tensor of shape (batch_size, 640, 480, 3).
The model has two outputs: (number_of_faces, 4) and (number_of_faces,). The first output will be called faces. The last output, which we can call partitions is the index in the original batch for the corresponding face. For example, if I pass in a batch of 4 images and get 7 faces, then I might have this tensor as [0, 0, 1, 2, 2, 2, 3]. The first two faces correspond to the first image, the third face for the second image, the 3rd image has 3 faces, etc.
My issue is this:
In order for the --enable_batching flag to work, the output from my model needs to have the 0th dimension the same as the input. That is, I need a tensor with the following shape: (batch_size, ...). I suppose this is so that the model server can know which grpc connection to send each output in the batch towards.
What I want to do is to convert my output tensor from the face detector from this shape (number_of_faces, 4) to this shape (batch_size, None, 4). That is, an array of batches, where each batch can have a variable number of faces (e.g. one image in the batch may have no faces, and another might have 3).
What I tried:
tf.dynamic_partition. On the surface, this function looks perfect. However, I ran into difficulties after realizing that the num_partitions parameter cannot be a tensor, only an integer:
tensorflow_serving_output = tf.dynamic_partition(faces, partitions, batch_size)
If the tf.dynamic_partition function were to accept tensor values for num_partition, then it seems that my problem would be solved. However, I am back to square one since this is not the case.
Thank you all for your help! Let me know if anything is unclear
P.S. Here is a visual representation of the intended process:
I ended up finding a solution to this using TensorArray and tf.while_loop:
def batch_reconstructor(tensor, partitions, batch_size):
"""
Take a tensor of shape (batch_size, 4) and a 1-D partitions tensor as well as the scalar batch_size
And reconstruct a TensorArray that preserves the original batching
From the partitions, we can get the maximum amount of tensors within a batch. This will inform the padding we need to use.
Params:
- tensor: The tensor to convert to a batch
- partitions: A list of batch indices. The tensor at position i corresponds to batch # partitions[i]
"""
tfarr = tf.TensorArray(tf.int32, size=batch_size, infer_shape=False)
_, _, count = tf.unique_with_counts(partitions)
maximum_tensor_size = tf.cast(tf.reduce_max(count), tf.int32)
padding_tensor_index = tf.cast(tf.gather(tf.shape(tensor), 0), tf.int32)
padding_tensor = tf.expand_dims(tf.cast(tf.fill([4], -1), tf.float32), axis=0) # fill with [-1, -1, -1, -1]
tensor = tf.concat([tensor, padding_tensor], axis=0)
def cond(i, acc):
return tf.less(i, batch_size)
def body(i, acc):
partition_indices = tf.reshape(tf.cast(tf.where(tf.equal(partitions, i)), tf.int32), [-1])
partition_size = tf.gather(tf.shape(partition_indices), 0)
# concat the partition_indices with padding_size * padding_tensor_index
padding_size = tf.subtract(maximum_tensor_size, partition_size)
padding_indices = tf.reshape(tf.fill([padding_size], padding_tensor_index), [-1])
partition_indices = tf.concat([partition_indices, padding_indices], axis=0)
return (tf.add(i, 1), acc.write(i, tf.gather(tensor, partition_indices)))
_, reconstructed = tf.while_loop(
cond,
body,
(tf.constant(0), tfarr),
name='batch_reconstructor'
)
reconstructed = reconstructed.stack()
return reconstructed