Custom loss function works even though dimensions mismatch - tensorflow

I'm using Keras/TF with the following model:
inputs = Input(shape=(302, 322, 1))  # shape assumed for illustration; a 302x322 input yields the 300x320 output in the error below
conv = Conv2D(4, 3, activation=None, use_bias=True)(inputs)
conv = Conv2D(2, 1, activation=None, use_bias=True)(conv)
model = Model(inputs=inputs, outputs=conv)
model.compile(optimizer=Adam(lr=1e-4), loss=keras.losses.mean_absolute_error)
In model.fit, I get an error saying:
ValueError: Error when checking target: expected conv2d_2 to have
shape (300, 320, 2) but got array with shape (300, 320, 1)
This is as expected because the targets are single channel images whereas the last layer in the model has 2 channels.
What I don't understand is why when I use a custom loss function:
def my_loss2(y_true, y_pred):
    return keras.losses.mean_absolute_error(y_true, y_pred)
and compile the model:
model.compile(optimizer = Adam(lr=1e-4), loss=my_loss2)
it does work (or at least does not raise the error). Is there any kind of automatic conversion/truncation going on?
I'm using TF (CPU) 1.12.0, and Keras 2.2.2

Why is the behavior different for built-in and custom losses?
It turns out that Keras performs an upfront shape check, but only for built-in loss functions that are defined in the losses module.
In the source code of Model._standardize_user_data, which is called by fit, I found this comment:
# If `loss_fn` is not a function (e.g. callable class)
# or if it not in the `losses` module, then
# it is a user-defined loss and we make no assumptions
# about it.
In the code around that comment you can see that, depending on the type of loss function (built-in or custom), the output shape either is or is not passed to an inner call of standardize_input_data. If the output shape is passed, standardize_input_data raises the error message you are getting.
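Paraphrasing the code around that comment (a sketch, not the verbatim Keras 2.2 source): a loss is treated as built-in only if a function with the same name exists in the losses module, and only then is the output shape recorded for the target check.

# paraphrased sketch of Model._standardize_user_data, not verbatim source
if not hasattr(loss_fn, '__name__') or \
        getattr(losses, loss_fn.__name__, None) is None:
    feed_output_shapes.append(None)          # custom loss: skip shape check
else:
    feed_output_shapes.append(output_shape)  # built-in loss: enforce shape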
And I think this behavior makes some sense: Without knowing the implementation of a loss function, you cannot know its shape requirements. Someone may invent some loss function that needs different shapes. On the other hand, the docs clearly say that the loss function's parameters must have the same shape:
y_true: True labels. TensorFlow/Theano tensor.
y_pred: Predictions. TensorFlow/Theano tensor of the same shape as y_true.
So I find this a little inconsistent...
Why does your custom loss function work with incompatible shapes?
If you provide a custom loss, it may still work, even if the shapes do not perfectly match. In your case, where only the last dimension is different, I'm quite sure that broadcasting is what is happening. The last dimension of your targets will just be duplicated.
In many cases broadcasting is quite useful. Here, however, it is likely not since it hides a logical error.
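A minimal sketch of that broadcasting with the shapes from your error message (keras.backend used directly for illustration):

import numpy as np
from keras import backend as K

y_true = K.constant(np.ones((1, 300, 320, 1)))   # single-channel target
y_pred = K.constant(np.zeros((1, 300, 320, 2)))  # two-channel prediction
# what mean_absolute_error computes; the singleton last axis of y_true
# is silently broadcast (duplicated) against y_pred's two channels
loss = K.mean(K.abs(y_pred - y_true), axis=-1)
print(K.int_shape(loss))  # (1, 300, 320)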

Related

My tensorflow 2.0 custom model is not receiving the shape or values I expect

I'm in the process of converting my PyTorch models to TensorFlow 2.0, so I'm still getting used to it. I have mostly gone off the API docs. I made a custom model and defined its call method with the argument inputs:
class CustomModel(tf.keras.Model):
    <... init ...>
    def call(self, inputs):
        print("inputs: ", inputs)
        self.sequential_convolution(inputs)
The sequential_convolution is a keras.Sequential of multiple convolution-related layers. I can create the model object and compile it. It is variable-length on both the input and the output:
model = CustomModel(inputs=tf.keras.Input(shape=(None, vdim)))
model.compile(optimizer=optimizer, loss=loss_func, metrics=[calc_accuracy])
for x, y in dataset:
    print("x.shape: ", x.shape)
    print("y.shape: ", y.shape)
    model.fit(x, y, batch_size=1)
Where the shapes are x.shape: (244, 161) and y.shape: (40,). Both are Tensorflow tensors created from numpy arrays with tf.convert_to_tensor().
But when the model's call method prints the inputs, I get the following:
Tensor("input_1_1:0", shape=(None, 161), dtype=float32)
I should point out that this is not the Input defined on the model; this input is derived from the actual input provided to model.fit() (I manually changed the numbers to see what the causes were)...
Which then ultimately leads to the stack trace:
x = self.sequential_conv(inputs)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py:396 converted_call
return py_builtins.overload_of(f)(*args)
TypeError: 'NoneType' object is not callable
This error occurs in a function deemed internal use only, but I am not able to ascertain the cause of my problem.
As I can't find much information on the matter, I feel that it's most likely something simple I haven't done, but I'm not sure. Any help would be great...

Keras: BiLSTM only works when return_sequences=True

I've been trying to implement this BiLSTM in Keras: https://github.com/ffancellu/NegNN
Here is where I'm at, and it kind of works:
inputs_w = Input(shape=(sequence_length,), dtype='int32')
inputs_pos = Input(shape=(sequence_length,), dtype='int32')
inputs_cue = Input(shape=(sequence_length,), dtype='int32')
w_emb = Embedding(vocabulary_size+1, embedding_dim, input_length=sequence_length, trainable=False)(inputs_w)
p_emb = Embedding(tag_voc_size+1, embedding_dim, input_length=sequence_length, trainable=False)(inputs_pos)
c_emb = Embedding(2, embedding_dim, input_length=sequence_length, trainable=False)(inputs_cue)
summed = keras.layers.add([w_emb, p_emb, c_emb])
BiLSTM = Bidirectional(CuDNNLSTM(hidden_dims, return_sequences=True))(summed)
DPT = Dropout(0.2)(BiLSTM)
outputs = Dense(2, activation='softmax')(DPT)
checkpoint = ModelCheckpoint('bilstm_one_hot.hdf5', monitor='val_loss', verbose=1, save_best_only=True, mode='auto')
early = EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=5, verbose=1, mode='auto')
model = Model(inputs=[inputs_w, inputs_pos, inputs_cue], outputs=outputs)
model.compile('adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit([X_train, X_pos_train, X_cues_train], Y_train, batch_size=batch_size, epochs=num_epochs, verbose=1, validation_split=0.2, callbacks=[early, checkpoint])
In the original code, in Tensorflow, the author uses masking and softmax cross entropy with logits. I don't get how to implement this in Keras yet. If you have any advice don't hesitate.
My main issue here is with return_sequences=True. The author doesn't appear to be using it in his tensorflow implementation and when I turn it to False, I get this error:
ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (820, 109, 2)
I also tried using:
outputs = TimeDistributed(Dense(2, activation='softmax'))(BiLSTM)
which raises an AssertionError without any information.
Any ideas?
Thanks
the author uses masking and softmax cross entropy with logits. I don't get how to implement this in Keras yet.
Regarding softmax cross entropy with logits, you are doing it correctly. softmax_cross_entropy_with_logits as the loss function + no activation function on the last layer is the same as your approach with categorical_crossentropy as loss + softmax activation on the last layer. The only difference is that the latter is numerically less stable. If this turns out to be an issue for you, you can (if your Keras backend is TensorFlow) use tf.nn.softmax_cross_entropy_with_logits as your loss, wrapped so that it receives labels and logits as keyword arguments. If you have another backend, you will have to look for an equivalent there.
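A sketch of the two equivalent setups, reusing the names from your code (the wrapper logits_loss is mine, not a Keras built-in):

import tensorflow as tf

# (a) your current setup: softmax activation + categorical_crossentropy
outputs = Dense(2, activation='softmax')(DPT)
model.compile('adam', loss='categorical_crossentropy', metrics=['accuracy'])

# (b) equivalent, numerically more stable: raw logits + fused loss
outputs = Dense(2, activation=None)(DPT)

def logits_loss(y_true, y_pred):
    return tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_pred)

model.compile('adam', loss=logits_loss, metrics=['accuracy'])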
Regarding masking, I'm not sure if I fully understand what the author is doing. However, in Keras the Embedding layer has a mask_zero parameter that you can set to True. In that case all timesteps that have a 0 will be ignored in all further calculations. In your source, it is not 0 that is being masked, though, so you would have to adjust the indices accordingly. If that doesn't work, there is the Masking layer in Keras that you can put before your recurrent layer, but I have little experience with that.
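For illustration, a sketch of the mask_zero variant (note: CuDNNLSTM does not support masking, so this assumes the plain LSTM layer and that index 0 is reserved for padding):

w_emb = Embedding(vocabulary_size + 1, embedding_dim,
                  input_length=sequence_length,
                  mask_zero=True,   # timesteps with index 0 are ignored
                  trainable=False)(inputs_w)
BiLSTM = Bidirectional(LSTM(hidden_dims, return_sequences=True))(summed)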
My main issue here is with return_sequences=True. The author doesn't appear to be using it
What makes you think that he doesn't use it? Just because the keyword does not appear in the code doesn't mean anything. But I'm also not sure; the code is pretty old, and I can no longer find documentation for that version that would tell what the defaults were.
Anyway, if you want to use return_sequences=False (for whatever reason), be aware that this changes the output shape of the layer:
with return_sequences=True the output shape is (batch_size, timesteps, features)
with return_sequences=False the output shape is (batch_size, features)
The error you are getting is basically telling you that your network's output has one dimension less than the target y values you are feeding it.
So, to me it looks like return_sequences=True is just what you need, but without further information it is hard to tell.
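If you want to verify the shapes yourself, here is a minimal sketch (the hidden size is made up, and plain LSTM is used in place of CuDNNLSTM):

from keras import backend as K
from keras.layers import Input, LSTM

x = Input(shape=(109, 8))
print(K.int_shape(LSTM(16, return_sequences=True)(x)))   # (None, 109, 16)
print(K.int_shape(LSTM(16, return_sequences=False)(x)))  # (None, 16)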
Then, regarding TimeDistributed. I'm not quite sure what you are trying to achieve with it, but quoting from the docs:
This wrapper applies a layer to every temporal slice of an input.
The input should be at least 3D, and the dimension of index one will be considered to be the temporal dimension.
I'm not sure from your question in which scenario the empty assertion occurs.
If you have a recurrent layer with return_sequences=False before, you are again missing a dimension (I can't tell you why the assertion is empty, though).
If you have a recurrent layer with return_sequences=True before, it should work, but it would be completely useless, as Dense is applied in a time distributed way anyways. If I'm not mistaken, this behavior of the Dense layer was changed in some older Keras version (they should really update the example there and stop using Dense!). As the code you are referring to is quite old, it's well possible that TimeDistributed was needed back then, but is not needed anymore.
If your plan was to restore the missing dimension, TimeDistributed won't help you, but RepeatVector would. But, as already said, in that case it's better to use return_sequences=True in the first place.
The problem is that your target values seem to be time distributed. So you have 109 timesteps with a onehot target vector of size two. This is why you need the return_sequences=True. Otherwise you will just feed the last timestep to the Dense layer and you would just have one output.
So depending on what you need, either keep it as it is now, or, if just the last timestep is enough for you, get rid of it; but then you would need to adjust the y values accordingly.

Tensorflow weighted vs sigmoid cross-entropy loss

I am trying to implement multi-label classification using TensorFlow (i.e., each output pattern can have many active units). The problem has imbalanced classes (i.e., much more zeros than ones in the labels distribution, which makes label patterns very sparse).
The best way to tackle the problem should be to use the tf.nn.weighted_cross_entropy_with_logits function. However, I get this runtime error:
ValueError: Tensor conversion requested dtype uint8 for Tensor with dtype float32
I can't understand what is wrong here. As input to the loss function, I pass the labels tensor, the logits tensor, and the positive class weight, which is a constant:
positive_class_weight = 10
loss = tf.nn.weighted_cross_entropy_with_logits(targets=labels, logits=logits, pos_weight=positive_class_weight)
Any hints about how to solve this? If I just pass the same labels and logits tensors to the tf.losses.sigmoid_cross_entropy loss function, everything works well (in the sense that TensorFlow runs properly, though of course the predictions after training are always zero).
The error is likely to be thrown somewhere after the loss function is computed, because the only significant difference between tf.losses.sigmoid_cross_entropy and tf.nn.weighted_cross_entropy_with_logits is the shape of the returned tensor.
Take a look at this example:
import tensorflow as tf

logits = tf.linspace(-3., 5., 10)
labels = tf.fill([10,], 1.)
positive_class_weight = 10
weighted_loss = tf.nn.weighted_cross_entropy_with_logits(
    targets=labels, logits=logits, pos_weight=positive_class_weight)
print(weighted_loss.shape)
sigmoid_loss = tf.losses.sigmoid_cross_entropy(
    multi_class_labels=labels, logits=logits)
print(sigmoid_loss.shape)
Tensors logits and labels are kind of artificial and both have shape (10,). But it's important that weighted_loss and sigmoid_loss are different. Here's the output:
(10,)
()
This is because tf.losses.sigmoid_cross_entropy performs a reduction to a scalar (by default, a sum normalized by the weights). So in order to replicate it, you have to wrap the weighted loss with tf.reduce_sum(...).
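For example, continuing the snippet above:

weighted_loss_sum = tf.reduce_sum(weighted_loss)  # scalar, like sigmoid_loss
print(weighted_loss_sum.shape)  # ()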
If this doesn't help, make sure that labels tensor has type float32. This bug is very easy to make, e.g., the following declaration won't work:
labels = tf.fill([10,], 1) # the type is not float!

Return a tf.Variable from an Estimator

I have a TensorFlow Estimator defined by a model function in the usual way.
I want to determine which of my (zscore normalised) inputs are significant to the result, and which can be eliminated. I have altered the model to introduce two changes:
(1) A new layer weight_layer, which is randomly initialized and elementwise multiplied with input_layer:
weight_layer = tf.Variable(tf.random_normal([1, inputs_n], 0.5, 1))
weighted_input = tf.multiply(weight_layer, input_layer)
first_hidden_layer = tf.layers.dense(weighted_input,
                                     int(inputs_n),
                                     activation=tf.nn.relu,
                                     name='dense1')
(2) A penalty sparsity, which is added to the loss function to penalize the loss by the sum of the weights in weight_layer:
sparsity = tf.reduce_sum(weight_layer)
loss = tf.losses.mean_squared_error(labels, predictions) + (1000*sparsity)
The trouble comes at prediction time, when I try to return the values of weight_layer, as follows:
if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions={
            "predictions": predictions,
            "sparsity": weight_layer})
I get the following error:
TypeError: predictions[sparsity] must be Tensor,
given: <tf.Variable 'Model/Variable:0' shape=(1, 275) dtype=float32_ref>
This seems odd, since although predictions["sparsity"] is not a Tensor, it is a tf.Variable, and the tf.Variable documentation suggests I can treat a tf.Variable "like a normal tf.Tensor".
How can I fix the above to return the weight_layer? Or, if there is a more fundamental mistake, please recommend a way for me to determine which of my input variables are significant.
Although I don't know the dynamics of EstimatorSpec very well, it seems you are trying to feed weight_layer into the sparsity entry, but since they have different shapes due to tf.reduce_sum, it raises the error.
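For what it's worth, the error message itself suggests the predictions dict must contain plain Tensors rather than Variables; a hedged sketch of a possible fix (not from the answer above) is to convert the variable explicitly:

if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions={
            "predictions": predictions,
            # convert the tf.Variable to a plain Tensor first
            "sparsity": tf.convert_to_tensor(weight_layer)})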

Regarding setting up the target tensor shape for sparse_categorical_crossentropy

I am trying to experiment with a multi-layer encoder-decoder type of network. (A screenshot of the last several layers of the architecture was included in the original post.) This is how I set up the model compiling and training process:
optimizer = SGD(lr=0.001, momentum=0.9, decay=0.0005, nesterov=False)
autoencoder.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])
autoencoder.fit(imgs_train, imgs_mask_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1, callbacks=[model_checkpoint])
imgs_train and imgs_mask_train are of shape (2000, 1, 128, 128). imgs_train represents the raw images and imgs_mask_train represents the mask images. I am trying to solve a semantic segmentation problem. However, running the program generates the following error message (I only keep the main relevant part):
tensorflow.python.pywrap_tensorflow.StatusNotOK: Invalid argument: logits first dimension must match labels size. logits shape=[4096,128] labels shape=[524288]
[[Node: SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_364, Cast_158)]]
It seems to me that the loss function sparse_categorical_crossentropy causes the problem for the current (imgs_train, imgs_mask_train) shape setting. The Keras API does not include details about how to set up the target tensor. Any suggestions are highly appreciated!
I am currently trying to figure out the same problem, and as far as I can tell, sparse_categorical_crossentropy takes a sparse representation of the target category. That means integer class labels as targets instead of a one-hot encoded binary class matrix.
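A tiny NumPy illustration of sparse versus one-hot targets (made-up data):

import numpy as np

y_onehot = np.array([[0, 0, 1],
                     [1, 0, 0]])     # one-hot targets for 3 classes
y_sparse = y_onehot.argmax(axis=-1)  # the equivalent integer class ids
print(y_sparse)  # [2 0]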
Concerning your problem: do you have categories in your masking, or do you just have information about the outline of an object? With outline information it becomes a pixel-wise binary loss instead of a categorical one. If you have categories, the output of your decoder should have dimensionality (None, number_of_classes, 128, 128). On that you should be able to use a sparse target mask, but I haven't tried this myself...
Hope that helps