InvalidArgumentError: In[0] is not a matrix. Instead it has shape [] - tensorflow

I'm not able to train my network with Keras. I get the following error at epoch 1, on the first batch:
InvalidArgumentError: In[0] is not a matrix. Instead it has shape []
[[{{node training/SGD/gradients/dense_1/MatMul_grad/MatMul}}]]
I'm trying to solve a regression problem using Keras and a custom function provided by https://github.com/farrell236/DeepPose
The network is a fairly simple VGG-like CNN.
I think the problem is the loss function. In particular, I suppose that the weight initialization is the issue (take a look at the Tensorflow example: https://github.com/farrell236/DeepPose/blob/master/tensorflow/example)
That's my loss function:
def custom_loss(y_true, y_pred):
    loss = SE3GeodesicLoss(np.ones((1, 6)))
    tf.initializers.constant([loss])
    y_pred = tf.cast(y_pred, dtype=tf.float32)
    y_true = tf.cast(y_true, dtype=tf.float32)
    loss = SE3GeodesicLoss(np.ones(6))
    geodesic_loss = loss.geodesic_loss(y_pred, y_true)
    geodesic_loss = tf.cast(geodesic_loss, dtype=tf.float32)
    return geodesic_loss
What's strange is that I'm able to use this function as a metric for the training.
Further information:
What I'm trying to do is estimate the pose of an object, with images as input and the relative Euler angles and distance of the target as labels (that is, 6 parameters [r_x, r_y, r_z, t_x, t_y, t_z]). I'm trying to use this loss function to solve the attitude estimation problem. Other losses (namely MSE and MAE) are not effective enough for attitude regression.
Do you have any suggestions?

Related

ValueError: No gradients provided for any variable in semi/self supervised loss function

I am training a neural network for clustering applications in a semi/self-supervised way:
Instead of having the ground truth, I define the loss function by calculating the similarity among the data points assigned to the same clusters, like:
def loss_function(self, y_true, y_pred):
    def get_loss(x_input, y_input):
        similarity = 0
        for i in range(len(np.unique(y_input))):
            similarity += sum(pdist(x_input))
        return similarity
    score = tf.numpy_function(get_loss, [self.x_input, y_pred], tf.float32)
    return score
In calculating the loss, I don't use y_true, and instead, I use self.x_input, which is the original data point.
I'm getting the following error while running my code:
raise ValueError(f"No gradients provided for any variable: {variable}. "
ValueError: No gradients provided for any variable
So my question is: is it possible to train a neural network model in this way (without ground truth)? If so, what is causing the above problem?
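One likely cause is that tf.numpy_function is not differentiable, so TensorFlow cannot propagate gradients through get_loss. A related issue is that hard cluster labels (as used by np.unique) are piecewise constant in the network output. The sketch below illustrates that second point in plain Python rather than TensorFlow. The functions hard_loss, soft_loss and finite_diff, and the sample data, are hypothetical names introduced only for this illustration.

```python
import math

def softmax(row):
    shift = max(row)
    exps = [math.exp(v - shift) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def hard_loss(scores, points):
    """Pairwise distances between points whose argmax cluster label matches."""
    labels = [row.index(max(row)) for row in scores]
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] == labels[j]:
                total += math.dist(points[i], points[j])
    return total

def soft_loss(scores, points):
    """Same distances, weighted by the probability that both points share a cluster."""
    probs = [softmax(row) for row in scores]
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            same_cluster = sum(p * q for p, q in zip(probs[i], probs[j]))
            total += same_cluster * math.dist(points[i], points[j])
    return total

def finite_diff(loss_fn, scores, points, eps=1e-4):
    """Numerical slope of the loss with respect to scores[0][0]."""
    bumped = [list(row) for row in scores]
    bumped[0][0] += eps
    return (loss_fn(bumped, points) - loss_fn(scores, points)) / eps

# Hypothetical data: three 2-D points and per-cluster scores from a network.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
scores = [[2.0, 0.5], [1.5, 0.2], [0.1, 3.0]]
print(finite_diff(hard_loss, scores, points))  # labels do not move, so no slope
print(finite_diff(soft_loss, scores, points))  # soft assignment gives a slope
```

Training without ground truth is possible in principle, but under these assumptions the loss has to be built from differentiable TensorFlow ops on y_pred (for example soft assignments) rather than from NumPy calls on hard labels.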

How to build a Neural Network in Keras using a custom loss function with datapoint-specific weight?

I want to train a Neural Network for a classification task in Keras using a TensorFlow backend with a custom loss function. In my loss, I want to give different weights to different training examples. I have some datapoints I consider important and some I do not consider as important. I want my loss function to take this into account and punish errors in important examples more than in less important ones.
I have already built my model:
input = tf.keras.Input(shape=(16,))
hidden_layer_1 = tf.keras.layers.Dense(5, kernel_initializer='glorot_uniform', activation='relu')(input)
output = tf.keras.layers.Dense(1, kernel_initializer='normal', activation='softmax')(hidden_layer_1)
model = tf.keras.Model(input, output)
model.compile(loss=custom_loss(input), optimizer='adam', run_eagerly=True, metrics = [tf.keras.metrics.Accuracy(), 'acc'])
and the current state of my loss function is:
def custom_loss(input):
    def loss(y_true, y_pred):
        return ...
    return loss
I'm struggling with implementing the loss function in the way I explained above, mainly because I don't know exactly what input, y_pred and y_true are (KerasTensors, I know, but what is the content? And is it for one training example only or for the whole batch?). I'd appreciate help with:
printing out the values of input, y_true and y_pred
converting the input value to a numpy ndarray ([1,3,7] for example) so I can use the array to look up my weight for this specific training data point
once I have my weight as a number (0.5 for example), how do I implement the computation of the loss function in Keras? My loss for one training example should be 0 if the classification was correct and weight if it was incorrect.
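The arithmetic described in the last point can be sketched in plain Python, treating y_true and y_pred as whole batches (one entry per training example), which is how Keras passes them to a loss. The function name weighted_zero_one_loss, the 0.5 threshold, and the sample values are assumptions made for this illustration only.

```python
def weighted_zero_one_loss(y_true, y_pred, weights, threshold=0.5):
    """Mean over the batch of weight_i if example i is misclassified, else 0."""
    total = 0.0
    for label, prob, weight in zip(y_true, y_pred, weights):
        predicted_label = 1 if prob >= threshold else 0
        if predicted_label != label:
            total += weight  # only wrong predictions are penalized
    return total / len(y_true)

# Hypothetical batch of four examples with per-example importance weights.
y_true = [1, 0, 1, 0]
y_pred = [0.9, 0.8, 0.3, 0.1]
weights = [1.0, 0.5, 2.0, 1.0]
batch_loss = weighted_zero_one_loss(y_true, y_pred, weights)
print(batch_loss)  # examples 2 and 3 are wrong: (0.5 + 2.0) / 4
```

A 0/1 loss like this is not differentiable, so for training the weight would normally multiply a differentiable per-example loss such as cross-entropy. Keras also accepts per-example weights through the sample_weight argument of model.fit, which avoids looking weights up from the input inside the loss. Separately, a softmax over a single output unit always returns 1, so a sigmoid activation is probably what the Dense(1) output layer needs.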

Custom gradient in tensorflow attempts to convert model to tensor

I am trying to use the output of one neural network to compute the loss value for another network. As the first network is approximating another function (L2 distance) I would like to provide the gradients myself, as if it had come from an L2 function.
An example of my loss function in simplified code is:
@tf.custom_gradient
def loss_function(model_1_output):
    def grad(dy, variables=None):
        gradients = 2 * pred
        return gradients
    pred = model_2(model_1_output)
    loss = pred ** 2
    return loss, grad
This is called in a standard tensorflow 2.0 custom training loop such as:
with tf.GradientTape() as tape:
    model_1_output = model_1(training_data)
    loss = loss_function(model_1_output)
gradients = tape.gradient(loss, model_1.trainable_variables)
optimizer.apply_gradients(zip(gradients, model_1.trainable_variables))
However, whenever I try to run this I keep getting the error:
ValueError: Attempt to convert a value (<model.model_2 object at 0x7f41982e3240>) with an unsupported type (<class 'model.model_2'>) to a Tensor.
The whole point of using the custom_gradients decorator is that I don't want the model_2 in the loss function to be included in the back propagation as I give it the gradients manually.
How can I make tensorflow completely ignore anything inside the loss function, so that, for example, I could do non-differentiable operations? I have tried using with tape.stop_recording(), but I always end up with a "no gradients found" error.
Using:
OS: Ubuntu 18.04
tensorflow: 2.0.0
python: 3.7

Can you process a tensor in chunks in a custom Keras loss function?

I am trying to write a custom Keras loss function in which I process the tensors in sub-vector chunks. For example, if an output tensor represented a concatenation of quaternion coefficients (i.e. w,x,y,z,w,x,y,z...) I might wish to normalize each quaternion before calculating the mean squared error in a loss function like:
def norm_quat_mse(y_true, y_pred):
    diff = y_pred - y_true
    dist = 0
    for i in range(0,16,4):
        dist += K.sum( K.square(diff[i:i+4] / K.sqrt(K.sum(K.square(diff[i:i+4])))))
    return dist/4
While Keras will accept this function without error and use it in training, it outputs a different loss value than when it is applied as an independent function to the output of model.predict(), so I suspect it is not working properly. None of the built-in Keras loss functions use this per-chunk processing approach. Is it possible to do this within Keras' auto-differentiation framework?
Try:
def norm_quat_mse(y_true, y_pred):
    diff = y_pred - y_true
    dist = 0
    for i in range(0,16,4):
        dist += K.sum( K.square(diff[:,i:i+4] / K.sqrt(K.sum(K.square(diff[:,i:i+4])))))
    return dist/4
Note that y_true and y_pred have shape (batch_size, output_size), so you need to slice along the second dimension and leave the first (batch) dimension alone.
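The per-chunk idea can be checked independently of Keras with a small plain-Python sketch that works on a batch of rows. Dividing the difference by its own norm always gives a unit vector, whose squared sum is 1. This sketch therefore normalizes each predicted quaternion before taking the squared error, which is an assumption about the intent stated in the question. The helper name normalize_chunks is hypothetical.

```python
import math

def normalize_chunks(row, chunk=4):
    """Scale every `chunk`-sized slice of `row` to unit length."""
    out = []
    for i in range(0, len(row), chunk):
        part = row[i:i + chunk]
        norm = math.sqrt(sum(v * v for v in part))
        # Leave an all-zero chunk untouched to avoid dividing by zero.
        out.extend(v / norm if norm > 0 else v for v in part)
    return out

def norm_quat_mse(y_true, y_pred, chunk=4):
    """One loss value per batch row: MSE after normalizing predicted chunks."""
    losses = []
    for true_row, pred_row in zip(y_true, y_pred):
        unit_pred = normalize_chunks(pred_row, chunk)
        squared = [(p - t) ** 2 for p, t in zip(unit_pred, true_row)]
        losses.append(sum(squared) / len(squared))
    return losses

# Hypothetical batch with one row holding two quaternions.
y_true = [[1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]]
y_pred = [[2.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0]]
losses = norm_quat_mse(y_true, y_pred)
print(losses)  # the predictions match the targets once normalized
```

In TensorFlow the same computation is usually written without a Python loop, by reshaping to (batch_size, 4, 4) with tf.reshape and reducing over the last axis, so that each quaternion is normalized separately for every sample.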

tensorflow tutorial of convolution, scale of logit

I am trying to edit my own model by adding some code to cifar10.py and here is the question.
In cifar10.py, the tutorial says:
EXERCISE: The output of inference are un-normalized logits. Try editing the network architecture to return normalized predictions using tf.nn.softmax().
So I fed the output of "local4" directly into tf.nn.softmax(). This gives me scaled logits, meaning that they sum to 1.
But in the loss function, the cifar10.py code uses:
tf.nn.sparse_softmax_cross_entropy_with_logits()
and the description of this function says:
WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
Also, according to the description, the logits passed to the above function must have the shape [batch_size, num_classes], which suggests they should be the unscaled input to the softmax, as in the sample code, which computes the unnormalized logits as follows:
# softmax, i.e. softmax(WX + b)
with tf.variable_scope('softmax_linear') as scope:
    weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],
                                          stddev=1/192.0, wd=0.0)
    biases = _variable_on_cpu('biases', [NUM_CLASSES],
                              tf.constant_initializer(0.0))
    softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
    _activation_summary(softmax_linear)
Does this mean I don't have to use tf.nn.softmax in the code?
You can use tf.nn.softmax in the code if you want, but then you will have to compute the loss yourself, for example (assuming one-hot encoded labels):
softmax_logits = tf.nn.softmax(logits)
loss = tf.reduce_mean(- labels * tf.log(softmax_logits) - (1. - labels) * tf.log(1. - softmax_logits))
In practice, you don't use tf.nn.softmax for computing the loss. However, you do need tf.nn.softmax if, for instance, you want to compute the predicted probabilities of your model and compare them to the true labels (to compute accuracy).
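The warning quoted in the question can be illustrated with a small plain-Python sketch: a loss that applies softmax internally gives a different (and distorted) value when it is fed probabilities instead of raw logits. The function cross_entropy_from_logits is a hypothetical stand-in for what an op such as tf.nn.sparse_softmax_cross_entropy_with_logits computes for a single example, not the library's implementation.

```python
import math

def softmax(logits):
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(logits, label):
    """Cross-entropy for one example; `label` is the integer class index."""
    return -math.log(softmax(logits)[label])

logits = [2.0, 1.0, 0.1]  # hypothetical unscaled network outputs
label = 0

correct = cross_entropy_from_logits(logits, label)
# Passing probabilities applies softmax twice and flattens the distribution.
double_softmax = cross_entropy_from_logits(softmax(logits), label)

print(correct)
print(double_softmax)  # larger than `correct` for this example
```

Because the second softmax squeezes the probabilities closer together, the loss no longer reflects how confident the network actually was, which is why the unscaled logits should be passed to the loss and softmax applied only when probabilities are needed.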