Quantize Embeddings - tensorflow

I would like to quantize embeddings to a single signed byte in each dimension. If I try to do this by scaling the values to [-128, 127], casting to tf.int8, recasting to tf.float32, and rescaling to the original [-1, 1] range, I get the following error:
ValueError: No gradients provided for any variable
The same training script works fine without the quantization step.
According to this thread, quantized ops will be a future feature of TensorFlow. In the meantime, is there a good workaround for this simple quantization scenario?
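For reference, here is a minimal sketch (TF 2.x style, illustrative values only) of the round trip described above; the int8 cast is non-differentiable, which is what produces the "No gradients provided" error. One commonly used workaround, not taken from the linked thread, is the straight-through trick: apply the quantization in the forward pass but let gradients flow through the identity.
import tensorflow as tf

# Embeddings assumed to lie in [-1, 1] (illustrative values only).
emb = tf.random.uniform([4, 8], minval=-1.0, maxval=1.0)

scaled = emb * 127.0                                   # [-1, 1] -> [-127, 127]
quantized = tf.cast(scaled, tf.int8)                   # non-differentiable: gradients stop here
dequantized = tf.cast(quantized, tf.float32) / 127.0   # back to roughly [-1, 1]

# Straight-through trick: the forward pass uses the quantized values,
# the backward pass sees the identity, so gradients reach the embeddings.
straight_through = emb + tf.stop_gradient(dequantized - emb)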

Related

Keras/TensorFlow: What is the order of the weight tensor dimensions of a convolutional layer?

In channels_last format, the shape of the data tensor is (batch_size, height, width, channels) and the shape of the weight tensor is apparently (rows, cols, input_depth, output_depth) (see reference 2).
In channels_first format, the shape of the data tensor is (batch_size, channels, height, width) and the shape of the weight tensor is what?
I've looked high and low for the answer to that question. When I run my code and use model.get_weights() to get the weight and bias tensors, it appears that the format of the weight tensors is the same in channels_first as in channels_last. Yet when I write the weight tensors to a file and read them back into my hand-crafted C/C++ code (which doesn't use TensorFlow), it doesn't appear to work; the results are numerically nonsensical. Maybe there is some other problem, but I would like to obtain a definitive answer to this question.
BTW, the reason I'm switching between channels_last and channels_first is that I need to be able to develop my code on a CPU machine and then run large training sessions on a GPU machine.
Any help is appreciated.
References:
Data tensor shape is explained here.
Weight tensor shape is partially explained here.
You can find the answer in the Keras source code, keras/keras/layers/convolutional/base_conv.py: data_format=channels_first vs. data_format=channels_last only matters in the forward computation, whereas in the weight definition the kernel shape is always:
kernel_shape = self.kernel_size + (input_channel // self.groups, self.filters)
That is why model.get_weights() shows the same weight format whether you use channels_first or channels_last.
In detail, the convolution op is ultimately performed by conv1d, conv2d, conv3d, etc. in gen_nn_ops, which are defined and executed in C/C++. Each of these operations receives data_format to adjust the inputs, but not the kernels (weights/filters).
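For illustration, here is a small sketch (hypothetical 3x3 kernel, 3 input channels, 8 filters) showing that only the expected input layout changes with data_format, not the kernel layout:
import tensorflow as tf

conv_last = tf.keras.layers.Conv2D(8, 3, data_format="channels_last")
conv_last.build((None, 32, 32, 3))    # input shape: (batch, height, width, channels)

conv_first = tf.keras.layers.Conv2D(8, 3, data_format="channels_first")
conv_first.build((None, 3, 32, 32))   # input shape: (batch, channels, height, width)

print(conv_last.kernel.shape)   # (3, 3, 3, 8) -> (rows, cols, input_depth, output_depth)
print(conv_first.kernel.shape)  # (3, 3, 3, 8) -> same layout; data_format only affects the input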

Obtain a one hot encoded representation of an image and computing loss

Currently I am trying to obtain a one-hot-encoded representation of an 84x84 grayscale image in TensorFlow. The result should be an 84x84x64 tensor, where 64 is the number of bins for the pixel intensities (0-3, 4-7, ..., 252-255).
So far I tried the following:
one_hot_image = tf.div(self.frame, 4)
one_hot_image = tf.one_hot(tf.cast(tf.round(tf.reshape(one_hot_image, [-1, 84, 84])), tf.int32), 64, on_value=1.0, off_value=0.0, axis=-1)
However due to the cast I end up with the following error when computing the sigmoid cross entropy and performing an Adam optimisation step on the loss.
ValueError: No gradients provided for any variable, check your graph
for ops that do not support gradients, between variables.
The loss is computed on the 84x84x64 tensors to determine how similar the original image and the reconstructed image are (I am implementing an auto-encoder). Any ideas or help are appreciated!
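For reference, a minimal standalone sketch of the binning described above, written against the TF 2.x API with a dummy image (shapes and values are illustrative only):
import tensorflow as tf

# Dummy 84x84 grayscale image with intensities in [0, 255].
frame = tf.random.uniform([84, 84], maxval=256, dtype=tf.int32)

# Map each pixel to one of 64 bins (0-3 -> 0, 4-7 -> 1, ..., 252-255 -> 63),
# then one-hot encode along the last axis.
bins = frame // 4
one_hot_image = tf.one_hot(bins, depth=64, on_value=1.0, off_value=0.0, axis=-1)
print(one_hot_image.shape)  # (84, 84, 64)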

Working with multiple losses and their weights in keras

I am training a GAN model using train_on_batch with multiple losses. Can I use arbitrary loss_weights when compiling the model, or is there a specific strategy for choosing these loss weights, as mentioned here? In my problem, mean_squared_error is the loss function for generated_image vs. original_image, and binary_crossentropy is the classification loss function for the 0 and 1 classes.
model.compile(optimizer=optimizer, loss=['mean_squared_error', 'binary_crossentropy'], loss_weights=[100,1])
The weights are hyperparameters that you need to optimize. Notice that optimizing these hyperparameters is not simple, because lowering the weights automatically decreases the loss (which we usually aim to minimize) but does not necessarily produce a better model. MSE can range over [0, infinity) if not normalized, or, e.g., [0, 1] if the features are normalized to [0, 1] (and a sigmoid is used). Binary cross-entropy values can range over [0, infinity), which makes the process not as simple as we might think. Without any knowledge of your specific problem, I would first try the default weights (1 each).
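As a sketch of how the two losses and their loss_weights combine at compile time (a hypothetical two-headed model; the layer names and shapes are made up for illustration):
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical model with an image-reconstruction head (MSE) and a real/fake head (BCE).
inp = keras.Input(shape=(64,))
x = layers.Dense(128, activation="relu")(inp)
image_out = layers.Dense(784, name="generated_image")(x)
class_out = layers.Dense(1, activation="sigmoid", name="real_fake")(x)

model = keras.Model(inp, [image_out, class_out])

# total_loss = 100 * mean_squared_error(generated_image) + 1 * binary_crossentropy(real_fake)
model.compile(
    optimizer="adam",
    loss=["mean_squared_error", "binary_crossentropy"],
    loss_weights=[100, 1],
)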

Tensorflow weighted vs sigmoid cross-entropy loss

I am trying to implement multi-label classification using TensorFlow (i.e., each output pattern can have many active units). The problem has imbalanced classes (i.e., many more zeros than ones in the label distribution, which makes the label patterns very sparse).
The best way to tackle the problem should be to use the tf.nn.weighted_cross_entropy_with_logits function. However, I get this runtime error:
ValueError: Tensor conversion requested dtype uint8 for Tensor with dtype float32
I can't understand what is wrong here. As input to the loss function, I pass the labels tensor, the logits tensor, and the positive class weight, which is a constant:
positive_class_weight = 10
loss = tf.nn.weighted_cross_entropy_with_logits(targets=labels, logits=logits, pos_weight=positive_class_weight)
Any hints about how to solve this? If I just pass the same labels and logits tensors to the tf.losses.sigmoid_cross_entropy loss function, everything works well (in the sense that TensorFlow runs properly, but of course the predictions after training are always zero).
See related problem here.
The error is likely to be thrown after the loss function, because the only significant difference between tf.losses.sigmoid_cross_entropy and tf.nn.weighted_cross_entropy_with_logits is the shape of the returned tensor.
Take a look at this example:
logits = tf.linspace(-3., 5., 10)
labels = tf.fill([10,], 1.)
positive_class_weight = 10
weighted_loss = tf.nn.weighted_cross_entropy_with_logits(targets=labels, logits=logits, pos_weight=positive_class_weight)
print(weighted_loss.shape)
sigmoid_loss = tf.losses.sigmoid_cross_entropy(multi_class_labels=labels, logits=logits)
print(sigmoid_loss.shape)
The logits and labels tensors are somewhat artificial and both have shape (10,). What matters is that the shapes of weighted_loss and sigmoid_loss are different. Here's the output:
(10,)
()
This is because tf.losses.sigmoid_cross_entropy performs reduction (the sum by default). So in order to replicate it, you have to wrap the weighted loss with tf.reduce_sum(...).
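For example, continuing the snippet above (weighted_loss_scalar is just a new name for the reduced value):
weighted_loss_scalar = tf.reduce_sum(weighted_loss)  # reduce the element-wise loss to a scalar
print(weighted_loss_scalar.shape)  # () -- now matches sigmoid_loss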
If this doesn't help, make sure that labels tensor has type float32. This bug is very easy to make, e.g., the following declaration won't work:
labels = tf.fill([10,], 1) # the type is not float!
You might also be interested in reading this question.

Regarding setting up the target tensor shape for sparse_categorical_crossentropy

I am trying to experiment with a multi-layer encoder-decoder type of network; a screenshot of the last several layers of the architecture accompanies the original question. This is how I set up the model compilation and training process:
optimizer = SGD(lr=0.001, momentum=0.9, decay=0.0005, nesterov=False)
autoencoder.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])
model.fit(imgs_train, imgs_mask_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1,callbacks=[model_checkpoint])
imgs_train and imgs_mask_train are of shape (2000, 1, 128, 128). imgs_train represents the raw images and imgs_mask_train represents the mask images. I am trying to solve a semantic segmentation problem. However, running the program generates the following error message (I only keep the main relevant part):
tensorflow.python.pywrap_tensorflow.StatusNotOK: Invalid argument: logits first dimension must match labels size. logits shape=[4096,128] labels shape=[524288]
[[Node: SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_364, Cast_158)]]
It seems to me that the sparse_categorical_crossentropy loss function causes the problem for the current (imgs_train, imgs_mask_train) shape setup. The Keras API does not detail how to set up the target tensor. Any suggestions are highly appreciated!
I am currently trying to figure out the same problem, and as far as I can tell the loss takes a sparse representation of the target category. That means integer class indices as the target labels instead of a one-hot encoded binary class matrix.
Concerning your problem: do you have categories in your mask, or just information about the outline of an object? With outline information it becomes a pixel-wise binary loss instead of a categorical one. If you have categories, the output of your decoder should have dimensionality (None, number_of_classes, 128, 128). On that you should be able to use a sparse target mask, but I haven't tried this myself...
Hope that helps
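For illustration, a minimal shape sketch (TF 2.x Keras, channels_last layout, a hypothetical 4 classes) showing that sparse_categorical_crossentropy expects one integer class index per pixel rather than a one-hot mask:
import numpy as np
from tensorflow import keras

num_classes = 4  # hypothetical number of segmentation classes

# Integer target mask: one class index per pixel -> shape (batch, 128, 128).
y_true = np.random.randint(0, num_classes, size=(2, 128, 128))

# Decoder output: per-pixel class probabilities -> shape (batch, 128, 128, num_classes).
y_pred = np.random.rand(2, 128, 128, num_classes).astype("float32")
y_pred /= y_pred.sum(axis=-1, keepdims=True)  # fake softmax so each pixel's scores sum to 1

loss = keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
print(loss.shape)  # (2, 128, 128): one loss value per pixel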