Keras custom loss with dynamic variable for slicing - tensorflow

First, I would like to say that I only have little experience in Keras/Tensorflow and probably lack some understanding of tensor manipulations.
I am using a model whose input is an "oversized" matrix (NxN). That is, I feed it data that can be smaller (i.e. KxK, with K <= N), where the "missing" entries (needed to fill the NxN shape) are set to zero. The output is an encoded version (Nx2) of the input.
I'm using a custom loss function that I would like to be computed only on the first (Kx2) values of the model's output. To do so, I think the solution is to slice the y_pred tensor in my loss function, since I don't want to simply mask it with a boolean tensor. However, I can't figure out how to pass K as a dynamic argument to my custom loss.
Wrapping the function within another function that takes an argument does not fit my needs, since the K value changes with each data sample.
Passing K in the model's input and getting it back through a function wrapper (e.g. https://stackoverflow.com/a/55445837/6315123), as mentioned in the first point, does not work either, since slice indices cannot be taken from a Tensor (as far as I understand), and evaluating the tensor within the loss function doesn't seem possible.
How can I pass such an argument to my loss function?
Thanks!
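One workaround is to mask rather than hard-slice: zero out everything at or beyond row K and normalize by K, which gives the same loss value as slicing the first K rows. Below is a NumPy sketch of the idea for a single (N, 2) sample (masked_mse and its arguments are made-up names, not Keras API; in an actual Keras loss you would build the same row mask with tf ops such as tf.sequence_mask so it stays differentiable):

```python
import numpy as np

def masked_mse(y_true, y_pred, k):
    # Hypothetical loss for one (N, 2) sample: only the first k rows count.
    n = y_pred.shape[0]
    mask = (np.arange(n) < k)[:, None]           # (N, 1): True for rows 0..k-1
    sq_err = (y_true - y_pred) ** 2 * mask       # zero out the padded rows
    return sq_err.sum() / (k * y_pred.shape[1])  # mean over the k valid rows only
```

Because K varies per sample, it has to travel with the data; a common trick (see the next question) is to append it to y_true and read it back inside the loss.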

Related

Is it possible to get the sample indexes in Keras custom loss function?

In my Keras custom loss function I would like to know the sample indexes (as in the original input array) for the current y_true, y_pred tensors.
I know it sounds weird, but for calculating the loss I need some additional information, which I prepare in an external array that is part of neither the input array nor the expected output array.
The only solution I currently see is to include it in the expected output array as additional columns, so that I get it in y_true, but I am not sure how disturbing it would be for the NN and the optimizer to have an extra node in the output layer whose actual prediction is not correlated with the calculated loss...
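Widening y_true does not require an extra output node: the labels you feed in and the model's output do not have to have the same shape, as long as the loss slices the extra columns off before comparing. A NumPy sketch of the idea (loss_with_side_info is a hypothetical name; in a real Keras loss the slicing would be done with tensor indexing on y_true):

```python
import numpy as np

def loss_with_side_info(y_true_packed, y_pred):
    # y_true_packed: (batch, D + 1) -- the real targets plus one extra column
    # carrying per-sample side information (e.g. an index or a precomputed
    # weight). y_pred stays (batch, D): no extra node in the output layer.
    y_true = y_true_packed[:, :-1]   # slice the real targets back out
    side = y_true_packed[:, -1]      # the smuggled-in extra values
    per_sample = ((y_true - y_pred) ** 2).mean(axis=1)
    return float((per_sample * side).mean())  # e.g. use side info as weights
```

Here the extra column is used as a per-sample weight purely for illustration; the point is only that the loss, not the network, consumes it.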

Why is "step" argument necessary when predicting using data tensors? what does this error mean?

I am trying to predict() the output for a single data point d, using my trained Keras model loaded from a file. But I get a ValueError: If predicting from data tensors, you should specify the `steps` argument. What does that mean?
I tried setting steps=1, but then I get a different error: ValueError: Cannot feed value of shape () for Tensor u'input_1:0', which has shape '(?, 600)'.
Here is my code:
d = np.concatenate((hidden[p[i]], hidden[x[i]])).resize((1,600))
hidden[p[i]] = autoencoder.predict(d, steps=1)
The model is expecting (?,600) as input. I have concatenated two numpy arrays of shape (300,) each to get (600,), which is resized to (1,600). This (1,600) is my input to predict().
In my case, the input to predict was None (because I had a bug in another part of the code).
In the official doc, steps refers to the total number of steps (batches) before declaring the prediction round finished. So steps=1 means making predictions on one batch, rather than on one record (a single data point).
https://keras.io/models/sequential/
Define a value for the steps argument:
d = np.concatenate((hidden[p[i]], hidden[x[i]])).reshape((1, 600))
hidden[p[i]] = autoencoder.predict(d, steps=1)
Note the use of reshape() instead of resize(): reshape() returns the (1, 600) array, whereas resize() works in place and returns None.
If you are using a test data generator, it is good practice to define the steps, as mentioned in the documentation.
If you are predicting a single instance, no need to define the steps. Just make sure the argument (i.e. instance 'd') is not None, otherwise that error will show up. Some reshaping may also be necessary.
In my case I got the same error, and I just reshaped the data to predict with NumPy's reshape() to the shape of the data originally used to train the model.
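One detail worth spelling out, since it explains how d ends up as None in the question's code: NumPy's ndarray.resize() modifies the array in place and returns None, while reshape() returns a new view. A minimal demonstration (the names a, b, d, r are made up):

```python
import numpy as np

a = np.concatenate((np.zeros(300), np.zeros(300)))  # shape (600,)
b = a.copy()

d = a.reshape((1, 600))  # returns a new (1, 600) view: safe to pass to predict()
r = b.resize((1, 600))   # resizes b in place and returns None
```

So `d = np.concatenate(...).resize((1, 600))` assigns None to d, and passing that to predict() triggers the shape-() error from the question.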

BroadcastGradientArgs no documentation provided

I am creating my own custom ops. While inspecting the ops in backprop, I came across BroadcastGradientArgs.
Does anyone have any idea what this op does?
It is an internal op that returns the axes of reduction given two tensor shapes. Notice that its return values are always fed to reduce_sum. Ops that support broadcasting (i.e. an op involving a tensor of lesser rank or shape) need a reduction step so that the resulting gradient has the same shape as the original input; it has the effect of summing the individual gradients into one value.
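A rough pure-Python sketch of what the op computes (this is not the actual TensorFlow kernel, just the broadcasting rule written out): given two broadcast-compatible shapes, it returns, for each input, the axes along which that input was broadcast and over which its gradient must therefore be summed.

```python
def broadcast_gradient_args(shape_x, shape_y):
    # Pad the shorter shape with leading 1s, as broadcasting does.
    n = max(len(shape_x), len(shape_y))
    px = [1] * (n - len(shape_x)) + list(shape_x)
    py = [1] * (n - len(shape_y)) + list(shape_y)
    rx, ry = [], []
    for axis, (dx, dy) in enumerate(zip(px, py)):
        if dx == 1 and dy != 1:
            rx.append(axis)  # x was broadcast along this axis
        if dy == 1 and dx != 1:
            ry.append(axis)  # y was broadcast along this axis
    return rx, ry
```

Applying reduce_sum over these axes shrinks the full-size gradient back to each input's original shape, which is the "summing individual gradients into one value" described above.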

Tensorflow: difference get_tensor_by_name vs get_operation_by_name?

The answer here says that one returns an operation while the other returns a tensor. That is pretty obvious from the name and from the documentation. However, suppose I do the following:
logits = tf.add(tf.matmul(inputs, weights), biases, name='logits')
I am following the pattern described in Tensorflow Mechanics 101. Should I restore it as an operation or as a tensor? I am afraid that if I restore it as a tensor I will only get the last computed values for the logits; nonetheless, the post here seems to suggest that there is no difference, or that I should just use get_tensor_by_name. The idea is to compute the logits for a new set of inputs and then make predictions accordingly.
Short answer: you can use both get_operation_by_name() and get_tensor_by_name(). Long answer:
tf.Operation
When you call
op = graph.get_operation_by_name('logits')
... it returns an instance of type tf.Operation, which is a node in the computational graph that performs some computation on its inputs and produces one or more outputs. In this case, it's an Add op.
One can always evaluate an op in a session, and if this op needs some placeholder values to be fed in, the engine will force you to provide them. Some ops, e.g. reading a variable, don't have any dependencies and can be executed without placeholders.
In your case, (I assume) logits are computed from the input placeholder x, so logits doesn't have any value without a particular x.
tf.Tensor
On the other hand, calling
tensor = graph.get_tensor_by_name('logits:0')
... returns an object tensor, which has the type tf.Tensor:
Represents one of the outputs of an Operation.
A Tensor is a symbolic handle to one of the outputs of an Operation.
It does not hold the values of that operation's output, but instead
provides a means of computing those values in a TensorFlow tf.Session.
So, in other words, tensor evaluation is the same as operation execution, and all the restrictions described above apply as well.
Why is Tensor useful? A Tensor can be passed as an input to another Operation, thus forming the graph. But in your case, you can assume that both entities mean the same.
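To make the relationship between the two concrete, a small sketch (written against the TF 1.x graph API through tf.compat.v1 so it runs on modern installs; the names 'x', 'w' and 'b' are made up): the tensor 'logits:0' is literally output 0 of the operation named 'logits'.

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()  # graph mode, as in the question

graph = tf.Graph()
with graph.as_default():
    inputs = tf.placeholder(tf.float32, shape=(None, 3), name='x')
    weights = tf.Variable(tf.ones((3, 2)), name='w')
    biases = tf.Variable(tf.zeros(2), name='b')
    logits = tf.add(tf.matmul(inputs, weights), biases, name='logits')

op = graph.get_operation_by_name('logits')  # the Add node itself
t = graph.get_tensor_by_name('logits:0')    # its first (and only) output
```

So restoring "as a tensor" does not freeze any previously computed values: evaluating t in a session with a new x fed in recomputes the logits, exactly as running the op would.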

Optimizing a subset of a tensor in Tensor Flow

I have a free variable (tf.Variable) x, and I wish to minimize an error term with respect to a subset of the tensor x (for example, minimizing the error only with respect to the first row of a 2D tensor).
One way is to compute the gradients, zero out the gradient for the irrelevant parts of the tensor, and then apply the gradients. Is there another way?
You can use a mask together with tf.stop_gradient to selectively make part of the variable non-trainable: replace x in your error term with mask * x + tf.stop_gradient((1 - mask) * x). The values in mask should be 1 for the parts where the gradient should be applied and 0 otherwise. The stop_gradient term keeps the forward value identical to x while blocking gradients through the masked-out part.
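A sketch of the mask-and-stop_gradient construction, using TF 2's GradientTape only so the gradients are easy to inspect (the variable names are made up): the forward value of x_eff equals x, but the gradient with respect to x is zero wherever the mask is zero.

```python
import tensorflow as tf

x = tf.Variable([[1.0, 2.0], [3.0, 4.0]])
mask = tf.constant([[1.0, 1.0], [0.0, 0.0]])  # optimize only the first row

with tf.GradientTape() as tape:
    # Forward value is exactly x; gradients flow only through the masked part.
    x_eff = mask * x + tf.stop_gradient((1.0 - mask) * x)
    loss = tf.reduce_sum(x_eff ** 2)

grads = tape.gradient(loss, x)  # zero wherever mask is zero
```

An optimizer applying these gradients will leave the second row of x untouched, which is the same effect as zeroing gradients by hand.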