Can i have more than one 1s in the target one hot encoded vector in categorical cross entropy? - tensorflow

If I prepared target labels as eg:[0,0,1,0,1] which contains number 1 more than once. will the categorical cross entropy work fine or is there a good way to do this? please help.

Yes you can have, it would be a multi-label classification
The cross-entropy would calculate something like this for [0,0,1,0,1]
loss = -[0*log(p0)+0*log(p1)+1*log(p2)+0*log(p3)+1*log(p4)]

Related

Which Loss function & Metrics is more suitable for multi-label classification? Binary or Categorical cross-entropy and Why?

According to my knowledge(please correct me if I'm wrong),
Multi-label classification(mutually inclusive) i.e., samples might have more than 1 correct values (for example movie genre, disease detection, etc).
Multi-Class classification(mutually exclusive) i.e., samples will always have 1 correct value (for example Cat or Dog, object detection, etc) this includes Binary Classification.
Assuming output is one-hot encoding.
What are the Loss function and metrics on has to use for these 2 types?
loss func. metrics
1. multi-label: (binary, categorical) (binary_accuracy, TopKCategorical accuracy, categorical_accuracy, AUC)
2. multi-class: (binary) (binary_accuracy,f1, recall, precision)
Please tell me from the above table which of them is/are more suitable, which of them is/are wrong & Why?
If you are trying to use multi-class classification provided that the labels (y) is one hot encoded, use the loss function as categorical crossentropy and use adam optimizer (It is suitable for most cases). Also, while using multi-class classification, the number of output nodes should be the same as the number of classes (or) labels. Say if your model is going to classify the input into 4 classes, You can configure the output layer as follows..
model.add(4, activation = "softmax")
Also, forgot to mention that softmax activation should be used in the output layer for multiclass classification problems.
Incase if your y is not one hot encoded, I would advise you to choose the loss function as sparse categorical crossentropy. No other changes will be necessary.
Also, I usually split the data into test data and train data and feed them to the model like this to get the accuracy in each epoch..
history = model.fit(train_data, validation_data = test_data, epochs = 10)
Hope it solved your problem.

Output Format using Sparse Categorical Cross Entropy in Keras for Multi-Class Classification

I've built a u-net architecture using Keras Functional API but I'm having trouble using the sparse categorical cross entropy loss function. My learning task is multi-class, pixel-wise classification for many 256x256 images. The intended output is a 256x256 mask images with integer values from 0-31 (not every mask will contain each class). I have 32 classes so one-hot encoding gives me an OOM error which is why I don't use categorical cross entropy. The majority of the mask pixels are 0s (which may be part of the problem).
I keep getting loss = nan. I've normalized my input data to have mean = 0, std = 1. If I leave the masks as they are, I get an accuracy around 0.97 and the output masks are all 1s (which is obviously incorrect). If I add 1 to all my masks before performing training, the accuracy is 0. I'm using relu activations with a SoftMax in the last convolutional layer.
It seems the problem likely has to do with the format of my output data, so my main question is, what format should it be in for sparse categorical cross entropy? Should I normalize the mask values to be 0-1? Alternatively, are there any other loss functions or accuracy metrics I can use for training? As far as multi-class classification goes the only function I know of is categorical cross entropy. I can provide additional information about my data, network, etc. if needed.

Loss function for GCN(semi-supervised classification)

When i read ICLR 2017,"Semi-supervised classification with GCN" .
I'm confused about the loss function.
Intuitively, how can i understand this.
Thank you.
I am also a beginner in this.
For intuition, Z_l is the vector of representation you learnt with number of F features/multi-class. and Y_l is the same size vector of multi-class label you have already knew. What you want is to minimize the loss between Z and F.
This format of loss function is called entropy, which is a classic one.
Compared with the 1-norm or 2-norm, this format can converge faster.

what is the difference between sampled_softmax_loss and nce_loss in tensorflow?

i notice there are two functions about negative Sampling in tensorflow to compute the loss (sampled_softmax_loss and nce_loss). the paramaters of these two function are similar, but i really want to know what is the difference between the two?
Sample softmax is all about selecting a sample of the given number and try to get the softmax loss. Here the main objective is to make the result of the sampled softmax equal to our true softmax. So algorithm basically concentrate lot on selecting the those samples from the given distribution.
On other hand NCE loss is more of selecting noise samples and try to mimic the true softmax. It will take only one true class and a K noise classes.
Sampled softmax tries to normalise over all samples in your output. Having a non-normal distribution (logarithmic over your labels) this is not an optimal loss function. Note that although they have the same parameters, they way you use the function is different. Take a look at the documentation here: https://github.com/calebchoo/Tensorflow/blob/master/tensorflow/g3doc/api_docs/python/functions_and_classes/shard4/tf.nn.nce_loss.md and read this line:
By default this uses a log-uniform (Zipfian) distribution for sampling, so your labels must be sorted in order of decreasing frequency to achieve good results. For more details, see log_uniform_candidate_sampler.
Take a look at this paper where they explain why they use it for word embeddings: http://papers.nips.cc/paper/5165-learning-word-embeddings-efficiently-with-noise-contrastive-estimation.pdf
Hope this helps!
Check out this documentation from TensorFlow https://www.tensorflow.org/extras/candidate_sampling.pdf
They seem pretty similar, but sampled softmax is only applicable for a single label while NCE extends to the case where your labels are a multiset. NCE can then model the expected counts rather than presence/absence of a label. I'm not clear on an exact example of when to use the sampled_softmax.

Multilabel image classification with sparse labels in TensorFlow?

I want to perform a multilabel image classification task for n classes.
I've got sparse label vectors for each image and each dimension of each label vector is currently encoded in this way:
1.0 ->Label true / Image belongs to this class
-1.0 ->Label false / Image does not contain to this class.
0.0 ->missing value/label
E.g.: V= {1.0,-1.0,1.0, 0.0}
For this example V the model should learn, that the corresponding image should be classified in the first and third class.
My problem is currently how to handle the missing values/labels. I've searched through the issues and found this issue:
tensorflow/skflow#113 found here
So could do multilable image classification with:
tf.nn.sigmoid_cross_entropy_with_logits(logits, targets, name=None)
but TensorFlow has this error function for sparse softmax, which is used for exclusive classification:
tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels, name=None)
So is there something like sparse sigmoid cross entropy? (Couldn't find something) or any suggestions how can I handle my multilabel classification problem with sparse labels.
I used weighted_cross_entropy_with_logits as the loss function with positive weights for 1s.
In my case, all the labels are equally important. But 0 was ten times more likely to be appeared as the value of any label than 1.
So I weighed all the 1s by calling the pos_weight parameter of the aforementioned loss function. I used a pos_weight (= weight on positive values) of 10. By the way, I do not recommend any strategy to calculate the pos_weight. I think it will depend explicitly on the data in hand.
if real label = 1,
weighted_cross_entropy = pos_weight * sigmoid_cross_entropy
Weighted cross entropy with logits is same as the Sigmoid cross entropy with logits, except for the extra weight value multiplied to all the targets with a positive real value i.e.; 1.
Theoretically, it should do the job. I am still tuning other parameters to optimize the performance. Will update with performance statistics later.
First I would like to know what you mean by missing data? What is the difference between miss and false in your case?
Next, I think it is wrong that you represent your data like this. You have unrelated information that you try to represent on the same dimension. (If it was false or true it would work)
What seems to me better is to represent for each of your class a probability if it is good, or is missing or is false.
In your case V = [(1,0,0),(0,0,1),(1,0,0),(0,1,0)]
Ok!
So your problem is more about how to handle the missing data I think.
So I think you should definitely use tf.sigmoid_cross_entropy_with_logits()
Just change the target for the missing data to 0.5. (0 for false and 1 for true).
I never tried this approach but it should let your network learn without biasing it too much.