Regarding setting up the target tensor shape for sparse_categorical_crossentropy - tensorflow

I am trying to experiment with a multi-layer encoder-decoder type of network. A screenshot of the last several layers of the network architecture is as follows. This is how I set up the model compilation and training process.
optimizer = SGD(lr=0.001, momentum=0.9, decay=0.0005, nesterov=False)
autoencoder.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])
autoencoder.fit(imgs_train, imgs_mask_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1, callbacks=[model_checkpoint])
imgs_train and imgs_mask_train are of shape (2000, 1, 128, 128). imgs_train holds the raw images and imgs_mask_train holds the mask images. I am trying to solve a semantic segmentation problem. However, running the program generates the following error message (I only keep the main related part):
tensorflow.python.pywrap_tensorflow.StatusNotOK: Invalid argument: logits first dimension must match labels size. logits shape=[4096,128] labels shape=[524288]
[[Node: SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_364, Cast_158)]]
It seems to me that the sparse_categorical_crossentropy loss function causes the problem for the current (imgs_train, imgs_mask_train) shape setting. The Keras API does not detail how to set up the target tensor. Any suggestions are highly appreciated!

I am currently trying to figure out the same problem, and as far as I can tell the loss takes a sparse representation of the target category, i.e. integer class labels instead of a one-hot encoded binary class matrix.
Concerning your problem: do you have categories in your masking, or do you just have information about the outline of an object? With outline information it becomes a pixel-wise binary loss instead of a categorical one. If you have categories, the output of your decoder should have dimensionality (None, number_of_classes, 128, 128). On that you should be able to use a sparse target mask, but I haven't tried this myself... A minimal shape sketch follows.
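For reference, here is a minimal sketch of the shape contract for a sparse target mask. It is written in channels_last with current tf.keras names purely for illustration (your data is channels_first, so you would transpose accordingly), and all the values are random dummies:

import numpy as np
import tensorflow as tf

n_classes = 128
# Per-pixel class probabilities: (batch, H, W, n_classes)
preds = tf.nn.softmax(tf.random.normal([4, 128, 128, n_classes]))
# Integer class label per pixel, NOT one-hot: (batch, H, W)
labels = np.random.randint(0, n_classes, size=(4, 128, 128))
loss = tf.keras.losses.sparse_categorical_crossentropy(labels, preds)
print(loss.shape)  # (4, 128, 128): one loss value per pixel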
Hope that helps

Related

Keras/TensorFlow: What is the order of the weight tensor dimensions of a convolutional layer?

In channels_last format, the shape of the data tensor is (batch_size, height, width, channels) and the shape of the weight tensor is apparently (see reference 2) (rows, cols, input_depth, output_depth).
In channels_first format, the shape of the data tensor is (batch_size, channels, height, width) and the shape of the weight tensor is what?
I've looked high and low for the answer to that question. When I run my code and use model.get_weights() to get the weight and bias tensors, it appears that the format of the weight tensors is the same in channels_first as in channels_last. Yet, when I output the weight tensors to a file and read them back into my C/C++ code which is hand-crafted and doesn't use TensorFlow, it doesn't appear to be working. The results are numerically nonsensical. Maybe there is some other problem, but I would like to obtain a definitive answer to this question.
BTW, the reason I'm switching between channels_last and channels_first is that I need to be able to develop my code on a CPU machine and then run large training sessions on a GPU machine.
Any help is appreciated.
References:
Data tensor shape is explained here.
Weight tensor shape is partially explained here.
You can find the answer in the source code of TF/Keras, in keras/keras/layers/convolutional/base_conv.py: data_format=channels_first vs. data_format=channels_last only affects the forward computation, while in the weight definition the kernel shape is always kept as:
kernel_shape = self.kernel_size + (input_channel // self.groups, self.filters)
So that is why model.get_weights() shows the same weight format for channels_first and channels_last.
In detail, the convolution op is ultimately performed by conv1d, conv2d, conv3d, etc. in gen_nn_ops, which are defined and executed in C/C++. Each of these operations receives data_format to adjust the inputs, but not the kernels (weights/filters).
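A quick standalone check of that claim (written with tf.keras; the exact source path may differ across versions, but the behavior is the same):

import tensorflow as tf

# Build the same Conv2D in both data formats and compare kernel shapes.
for fmt, shape in [("channels_last", (1, 32, 32, 3)),
                   ("channels_first", (1, 3, 32, 32))]:
    layer = tf.keras.layers.Conv2D(filters=8, kernel_size=3, data_format=fmt)
    layer.build(shape)
    print(fmt, layer.kernel.shape)  # both print (3, 3, 3, 8): (rows, cols, in, out)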

How to use embedding models in tensorflow hub with LSTM layer?

I'm learning TensorFlow 2 by working through the text classification with TF Hub tutorial. It uses an embedding module from TF Hub. I was wondering if I could modify the model to include an LSTM layer. Here's what I've tried:
train_data, validation_data, test_data = tfds.load(
    name="imdb_reviews",
    split=('train[:60%]', 'train[60%:]', 'test'),
    as_supervised=True)

embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(embedding, input_shape=[],
                           dtype=tf.string, trainable=True)

model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Embedding(10000, 50))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(1))
model.summary()

model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(train_data.shuffle(10000).batch(512),
                    epochs=10,
                    validation_data=validation_data.batch(512),
                    verbose=1)

results = model.evaluate(test_data.batch(512), verbose=2)

for name, value in zip(model.metrics_names, results):
    print("%s: %.3f" % (name, value))
I don't know how to get the vocabulary size from the hub_layer, so I just put 10000 there. When I run it, it throws this exception:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[480,1] = -6 is not in [0, 10000)
[[node sequential/embedding/embedding_lookup (defined at .../learning/tensorflow/text_classify.py:36) ]] [Op:__inference_train_function_36284]
Errors may have originated from an input operation.
Input Source operations connected to node sequential/embedding/embedding_lookup:
sequential/embedding/embedding_lookup/34017 (defined at Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py:112)
Function call stack:
train_function
I'm stuck here. My questions are:
How should I use the embedding module from TF Hub to feed an LSTM layer? It looks like the embedding lookup has some issues with this setup.
How do I get the vocabulary size from the hub layer?
Thanks
Finally figured out the way to link pre-trained embeddings to an LSTM or other layers. Just posting the steps here in case anyone finds them helpful.
The Embedding layer has to be the first layer in the model (hub_layer plays the same role as an Embedding layer). The not very intuitive part is that any text input to the hub layer will be converted to only one vector of shape [embedding_dim]. You need to do sentence splitting and tokenization to make sure whatever goes into the model is a sequence in the form of an array of arrays, e.g. "Let us prepare the data." should be converted to [["let"],["us"],["prepare"],["the"],["data"]]. You will also need to pad the sequences if you are using batch mode.
In addition, you will need to convert your target tokens to ints if your training labels are strings. The input to the model is an array of strings with shape [batch, seq_length]; the hub embedding layer converts it to [batch, seq_length, embed_dim]. (If you add an LSTM or other RNN layer, the output from that layer is [batch, seq_length, rnn_units].) The output dense layer will output the index of a token instead of the actual text. The token-to-index mapping is stored in the downloaded TF Hub directory as "tokens.txt"; you can load that file to convert between text and the corresponding index. Otherwise you cannot compute the loss. A hedged sketch of this wiring follows.
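A rough sketch of the token-per-timestep wiring described above, under some assumptions I have not verified against every TF2 version: seq_length is fixed, and tf.reshape applied to Keras tensors gets wrapped in an op layer automatically.

import tensorflow as tf
import tensorflow_hub as hub

seq_length = 16
embed_dim = 20  # gnews-swivel-20dim produces 20-dim vectors
hub_layer = hub.KerasLayer(
    "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1",
    output_shape=[embed_dim], input_shape=[], dtype=tf.string, trainable=True)

tokens = tf.keras.Input(shape=(seq_length,), dtype=tf.string)  # [batch, seq_length] of tokens
flat = tf.reshape(tokens, [-1])                                # [batch * seq_length] strings
embedded = hub_layer(flat)                                     # [batch * seq_length, embed_dim]
embedded = tf.reshape(embedded, [-1, seq_length, embed_dim])   # back to a sequence
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))(embedded)
x = tf.keras.layers.Dense(64, activation='relu')(x)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(tokens, outputs)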

Keras: BiLSTM only works when return_sequences=True

I've been trying to implement this BiLSTM in Keras: https://github.com/ffancellu/NegNN
Here is where I'm at, and it kind of works:
inputs_w = Input(shape=(sequence_length,), dtype='int32')
inputs_pos = Input(shape=(sequence_length,), dtype='int32')
inputs_cue = Input(shape=(sequence_length,), dtype='int32')
w_emb = Embedding(vocabulary_size+1, embedding_dim, input_length=sequence_length, trainable=False)(inputs_w)
p_emb = Embedding(tag_voc_size+1, embedding_dim, input_length=sequence_length, trainable=False)(inputs_pos)
c_emb = Embedding(2, embedding_dim, input_length=sequence_length, trainable=False)(inputs_cue)
summed = keras.layers.add([w_emb, p_emb, c_emb])
BiLSTM = Bidirectional(CuDNNLSTM(hidden_dims, return_sequences=True))(summed)
DPT = Dropout(0.2)(BiLSTM)
outputs = Dense(2, activation='softmax')(DPT)
checkpoint = ModelCheckpoint('bilstm_one_hot.hdf5', monitor='val_loss', verbose=1, save_best_only=True, mode='auto')
early = EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=5, verbose=1, mode='auto')
model = Model(inputs=[inputs_w, inputs_pos, inputs_cue], outputs=outputs)
model.compile('adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit([X_train, X_pos_train, X_cues_train], Y_train, batch_size=batch_size, epochs=num_epochs, verbose=1, validation_split=0.2, callbacks=[early, checkpoint])
In the original code, in TensorFlow, the author uses masking and softmax cross entropy with logits. I don't get how to implement this in Keras yet. If you have any advice, don't hesitate.
My main issue here is with return_sequences=True. The author doesn't appear to be using it in his TensorFlow implementation, and when I set it to False, I get this error:
ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (820, 109, 2)
I also tried using:
outputs = TimeDistributed(Dense(2, activation='softmax'))(BiLSTM)
which returns an AssertionError without any message.
Any ideas ?
Thanks
the author uses masking and softmax cross entropy with logits. I don't get how to implement this in Keras yet.
Regarding softmax cross entropy with logits, you are doing it correctly. softmax_cross_entropy_with_logits as the loss function + no activation function on the last layer is the same as your approach with categorical_crossentropy as loss + softmax activation on the last layer. The only difference is that the latter is numerically less stable. If this turns out to be an issue for you, you can (if your Keras backend is TensorFlow) just pass tf.nn.softmax_cross_entropy_with_logits as your loss. If you have another backend, you will have to look for an equivalent there.
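For concreteness, a hedged sketch of that first option applied to the code from the question (reusing the thread's variable names; DPT is the dropout output defined above, and this is not the original author's code):

import tensorflow as tf
from keras.layers import Dense

# Linear output layer + cross entropy computed on logits: equivalent to
# softmax + categorical_crossentropy, but more numerically stable.
def loss_from_logits(y_true, y_pred):
    return tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_pred)

outputs = Dense(2)(DPT)  # note: no softmax activation here
model.compile('adam', loss=loss_from_logits, metrics=['accuracy'])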
Regarding masking, I'm not sure if I fully understand what the author is doing. However, in Keras the Embedding layer has a mask_zero parameter that you can set to True. In that case all timesteps that have a 0 will be ignored in all further calculations. In your source, it is not 0 that is being masked, though, so you would have to adjust the indices accordingly. If that doesn't work, there is the Masking layer in Keras that you can put before your recurrent layer, but I have little experience with that.
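For illustration, and assuming index 0 is (or can be made) your padding index, this is the only change needed to the question's first embedding; a hypothetical tweak, not tested against the original repo:

from keras.layers import Embedding

# mask_zero=True makes all timesteps whose input index is 0 be ignored
# by the recurrent layers downstream; real vocabulary indices start at 1.
w_emb = Embedding(vocabulary_size + 1, embedding_dim,
                  input_length=sequence_length, mask_zero=True,
                  trainable=False)(inputs_w)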
My main issue here is with return_sequences=True. The author doesn't appear to be using it
What makes you think that he doesn't use it? Just because that keyword does not appear in the code doesn't mean anything. But I'm also not sure; the code is pretty old, and I can't find anything in the docs anymore that would tell what the defaults were back then.
Anyway, if you want to use return_sequences=False (for whatever reason) be aware that this changes the output shape of the layer:
with return_sequences=True the output shape is (batch_size, timesteps, features)
with return_sequences=False the output shape is (batch_size, features)
The error you are getting is basically telling you that your network's output has one dimension less than the target y values you are feeding it.
So, to me it looks like return_sequences=True is just what you need, but without further information it is hard to tell.
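To see the shape difference concretely, here is a tiny standalone check (written with tf.keras for convenience; the thread's code uses standalone Keras with CuDNNLSTM, but the shapes behave the same):

import tensorflow as tf

# Compare the two settings on a dummy batch of shape (batch, timesteps, features).
x = tf.random.normal([8, 109, 32])
seq = tf.keras.layers.LSTM(16, return_sequences=True)(x)
last = tf.keras.layers.LSTM(16, return_sequences=False)(x)
print(seq.shape)   # (8, 109, 16) - one output per timestep
print(last.shape)  # (8, 16)      - only the last timestep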
Then, regarding TimeDistributed. I'm not quite sure what you are trying to achieve with it, but quoting from the docs:
This wrapper applies a layer to every temporal slice of an input.
The input should be at least 3D, and the dimension of index one will be considered to be the temporal dimension.
(emphasis is mine)
I'm not sure from your question, in which scenario the empty assertion occurs.
If you have a recurrent layer with return_sequences=False before, you are again missing a dimension (I can't tell you why the assertion is empty, though).
If you have a recurrent layer with return_sequences=True before, it should work, but it would be completely useless, as Dense is applied in a time-distributed way anyway. If I'm not mistaken, this behavior of the Dense layer was changed in some older Keras version (they should really update the example there and stop using Dense!). As the code you are referring to is quite old, it's well possible that TimeDistributed was needed back then but is not needed anymore.
If your plan was to restore the missing dimension, TimeDistributed won't help you, but RepeatVector would. But, as already said, in that case better use return_sequences=True in the first place.
The problem is that your target values seem to be time distributed: you have 109 timesteps, each with a one-hot target vector of size two. This is why you need return_sequences=True; otherwise only the last timestep would be fed to the Dense layer and you would get just one output.
So depending on what you need, either keep it as it is now, or, if just the last timestep is enough for you, get rid of it, but then you would need to adjust the y values accordingly.

Custom loss function works even though dimensions mismatch

I'm using Keras/TF with the following model:
conv = Conv2D(4, 3, activation = None, use_bias=True)(inputs)
conv = Conv2D(2, 1, activation = None, use_bias=True)(conv)
model = Model(input = inputs, output = conv)
model.compile(optimizer=Adam(lr=1e-4), loss=keras.losses.mean_absolute_error)
In model.fit, I get an error saying:
ValueError: Error when checking target: expected conv2d_2 to have
shape (300, 320, 2) but got array with shape (300, 320, 1)
This is as expected because the targets are single channel images whereas the last layer in the model has 2 channels.
What I don't understand is why when I use a custom loss function:
def my_loss2(y_true, y_pred):
    return keras.losses.mean_absolute_error(y_true, y_pred)
and compile the model:
model.compile(optimizer = Adam(lr=1e-4), loss=my_loss2)
it does work (or at least it does not give the error). Is there some kind of automatic conversion/truncation going on?
I'm using TF (CPU) 1.12.0, and Keras 2.2.2
Sincerely,
Elad
Why is the behavior different for built-in and custom losses?
It turns out that Keras performs an upfront shape check only for built-in loss functions defined in the losses module.
In the source code of Model._standardize_user_data, which is called by fit, I found this comment:
# If `loss_fn` is not a function (e.g. callable class)
# or if it not in the `losses` module, then
# it is a user-defined loss and we make no assumptions
# about it.
In the code around that comment you can see that, depending on the type of loss function (built-in or custom), the output shape is either passed to an inner call of standardize_input_data or not. If the output shape is passed, standardize_input_data raises the error message you are getting.
And I think this behavior makes some sense: Without knowing the implementation of a loss function, you cannot know its shape requirements. Someone may invent some loss function that needs different shapes. On the other hand, the docs clearly say that the loss function's parameters must have the same shape:
y_true: True labels. TensorFlow/Theano tensor.
y_pred: Predictions. TensorFlow/Theano tensor of the same shape as y_true.
So I find this a little inconsistent...
Why does your custom loss function work with incompatible shapes?
If you provide a custom loss, it may still work even if the shapes do not perfectly match. In your case, where only the last dimension is different, I'm quite sure that broadcasting is what is happening: the last dimension of your targets will just be duplicated.
In many cases broadcasting is quite useful. Here, however, it likely is not, since it hides a logical error. A small demonstration follows.
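A standalone check of that broadcasting claim, with shapes modeled on the question (tf.keras used for convenience; the thread runs Keras 2.2.2 on TF 1.12, but the broadcasting behavior is the same):

import tensorflow as tf

# y_true has 1 channel, y_pred has 2: the subtraction broadcasts y_true
# across the last axis, so MAE runs without complaint.
y_true = tf.zeros([4, 300, 320, 1])
y_pred = tf.ones([4, 300, 320, 2])
loss = tf.keras.losses.mean_absolute_error(y_true, y_pred)
print(loss.shape)  # (4, 300, 320): per-pixel loss, no shape error raised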

Tensorflow weighted vs sigmoid cross-entropy loss

I am trying to implement multi-label classification using TensorFlow (i.e., each output pattern can have many active units). The problem has imbalanced classes (i.e., much more zeros than ones in the labels distribution, which makes label patterns very sparse).
The best way to tackle the problem should be to use the tf.nn.weighted_cross_entropy_with_logits function. However, I get this runtime error:
ValueError: Tensor conversion requested dtype uint8 for Tensor with dtype float32
I can't understand what is wrong here. As input to the loss function, I pass the labels tensor, the logits tensor, and the positive class weight, which is a constant:
positive_class_weight = 10
loss = tf.nn.weighted_cross_entropy_with_logits(targets=labels, logits=logits, pos_weight=positive_class_weight)
Any hints about how to solve this? If I just pass the same labels and logits tensors to the tf.losses.sigmoid_cross_entropy loss function, everything works well (in the sense that TensorFlow runs properly; of course, the training predictions that follow are always zero).
See related problem here.
The error is likely to be thrown after the loss function, because the only significant difference between tf.losses.sigmoid_cross_entropy and tf.nn.weighted_cross_entropy_with_logits is the shape of the returned tensor.
Take a look at this example:
logits = tf.linspace(-3., 5., 10)
labels = tf.fill([10,], 1.)
positive_class_weight = 10
weighted_loss = tf.nn.weighted_cross_entropy_with_logits(targets=labels, logits=logits, pos_weight=positive_class_weight)
print(weighted_loss.shape)
sigmoid_loss = tf.losses.sigmoid_cross_entropy(multi_class_labels=labels, logits=logits)
print(sigmoid_loss.shape)
Tensors logits and labels are kind of artificial and both have shape (10,). But it's important that weighted_loss and sigmoid_loss are different. Here's the output:
(10,)
()
This is because tf.losses.sigmoid_cross_entropy performs reduction (by default SUM_BY_NONZERO_WEIGHTS, which amounts to a mean when all weights are 1). So in order to replicate it, you have to reduce the weighted loss yourself, e.g. with tf.reduce_mean(...).
If this doesn't help, make sure that labels tensor has type float32. This bug is very easy to make, e.g., the following declaration won't work:
labels = tf.fill([10,], 1) # the type is not float!
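Putting both fixes together, a hedged sketch that continues the example above (logits and positive_class_weight as defined earlier):

# Float labels matching the logits' dtype, plus an explicit reduction so the
# result is a scalar like the output of tf.losses.sigmoid_cross_entropy.
labels = tf.fill([10,], 1.0)
weighted_loss = tf.nn.weighted_cross_entropy_with_logits(
    targets=labels, logits=logits, pos_weight=positive_class_weight)
loss = tf.reduce_mean(weighted_loss)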
You might also be interested in reading this question.