What is the difference between BinaryCrossentropy and SparseCategoricalCrossentropy? - tensorflow

I'm training a CNN for binary image classification and I'm at the point where I have to choose the loss function, so I've been searching for answers. I'm getting confused because one half says you should use BinaryCrossentropy with a final Dense layer of shape (None, 1), while others say to use SparseCategoricalCrossentropy with a Dense layer of shape (None, 2). I also know that SparseCategoricalCrossentropy is meant for classification tasks with more than 2 categories, yet the second solution is the one recommended most of the time, even though there is a specific BinaryCrossentropy class that provides a loss function for binary classification. Why are there these two solutions, and what is the difference? Or does it not matter in the end anyway?
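For concreteness, here is a minimal sketch of the two recommended setups (the tiny backbone and layer sizes are illustrative, not from the question):

import tensorflow as tf
from tensorflow.keras import layers

def make_backbone():
    # Tiny illustrative CNN body; substitute your own architecture.
    return tf.keras.Sequential([
        layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
        layers.GlobalAveragePooling2D(),
    ])

# Option 1: one sigmoid unit, output shape (None, 1), labels are single 0/1 values.
model_bce = tf.keras.Sequential([make_backbone(), layers.Dense(1, activation="sigmoid")])
model_bce.compile(optimizer="adam",
                  loss=tf.keras.losses.BinaryCrossentropy(),
                  metrics=["accuracy"])

# Option 2: two softmax units, output shape (None, 2), labels are integer class ids 0 or 1.
model_scce = tf.keras.Sequential([make_backbone(), layers.Dense(2, activation="softmax")])
model_scce.compile(optimizer="adam",
                   loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                   metrics=["accuracy"])

For two classes the two losses optimize the same quantity, so in practice it rarely matters: the single-sigmoid head has one output unit and a 0.5 decision threshold, while the two-unit softmax head encodes the same information redundantly.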

Related

Solving Imbalance Classification on Video Transcript dataset

I am currently working on a problem that requires segmenting a video lecture transcript based on the topics present in the video. My dataset consists of sentence-wise labels, where 1 indicates the beginning of a new segment (i.e. topic) and 0 indicates the same segment. The problem can thus be framed as binary classification, where the model takes a sentence as input and makes a binary prediction on it. However, due to the very nature of the problem, the dataset is highly imbalanced (90% 0s and 10% 1s). As a consequence, while training the model, I have noticed that it becomes biased and starts predicting all 0s.
I have tried resolving this issue by using class_weights in model.fit(). However, this hasn't been of much help: if I increase the penalty on the 1s class, my model starts predicting all 1s, and if I lower it, the model again starts predicting all 0s. Does someone have any ideas on how I should resolve this issue?
There are other oversampling and undersampling techniques (e.g. SMOTE), but I don't think they are suitable for my use case, since they would disrupt the continuity of the video transcript.
PS: I am sharing a screenshot of my model's architecture for reference.
Basically, the model takes BERT tokenized input sentences and encodes them using the Universal Sentence Encoder. This encoding is then passed to a classification layer which finally returns a tensor of shape [BATCH_SIZE, 1]. I am using BinaryCrossentropy as a loss function.
[Screenshot: model architecture]
[Screenshot: fitting the model using model.fit()]
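One idea beyond class_weights (my own suggestion, not something from this thread) is focal loss, which down-weights the easy, already-correct majority examples instead of rescaling whole classes; tf.keras ships it as BinaryFocalCrossentropy from TF 2.9 onward. A minimal sketch, with a hypothetical stand-in for the USE-based classifier described above:

import tensorflow as tf

# Hypothetical stand-in for the real model: a 512-dim sentence
# embedding feeding a single sigmoid unit.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(512,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Requires TF >= 2.9. gamma controls how strongly easy, confidently
# classified examples are down-weighted; gamma=0 recovers plain BCE.
model.compile(optimizer="adam",
              loss=tf.keras.losses.BinaryFocalCrossentropy(gamma=2.0),
              metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

With this kind of imbalance it also helps to track precision and recall rather than accuracy, since an all-0s model already scores 90% accuracy.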

Functional CNN model written in Keras 2.2.4 not learning in TensorFlow or Keras 2.4

I am dealing with an object detection problem and using a model which is known to work (its results have been published in a paper and I have the original code). Originally, the code was written with Keras 2.2.4 without importing TensorFlow, and it was trained and tested on the same dataset that I am using at the moment. However, when I try to run the same model with TensorFlow 2.x, it just won't learn a thing.
I have tried importing everything from TensorFlow 2.4, but I have the same problem if I import everything (layers, models, optimizers...) from Keras 2.4, and I have tried this on two different devices, both using a GPU. What happens is that the loss decreases ridiculously fast, but the accuracy won't increase at all (or, if it does, it gets stuck around 10% or so). Also, every now and then this happens from one epoch to the next:
[Plot: the loss undergoes huge jumps between consecutive epochs, with no change in accuracy]
I have tried training the network on another dataset (I had to change the last layers to match the required dimensions) and there the model seemed to learn normally, i.e. the accuracy actually increased and the loss didn't drop to 0.0x within one epoch.
I can't post the script, but the model is an Encoder-Decoder network: consecutive convolutions with an increasing number of filters reduce the dimensions of the image, and a mirrored path of transposed convolutions restores the original dimensions. So basically the network only contains:
1. Conv2D
2. Conv2DTranspose
3. BatchNormalization
4. Activation("relu")
5. Activation("sigmoid")
6. concatenate
6 is used to put together outputs from parallel paths or distant layers; 3 and 4 are used after every Conv or ConvTranspose; 5 is only used as the final activation function, i.e. as the output layer.
I think the problem is pretty generic, and I am honestly surprised that I couldn't find a single question about it. What could be happening here? The problem must have something to do with the TF/Keras versions, but I can't find any documentation about it, and I have tried changing so many things without anything helping. It's crazy, because if I didn't know that the model works I would try to rewrite it from scratch; so I am afraid that this problem may occur with a new network too and I won't be able to tell whether it's the libraries or the model itself.
Thank you in advance! :)
EDIT
Code snippets:
Convolutional block:
# Encoder block: a strided 2x2 convolution halves the spatial dimensions
encoder1 = Conv2D(filters=first_layer_channels, kernel_size=2, strides=2)(input)
encoder1 = BatchNormalization()(encoder1)
encoder1 = Activation('relu')(encoder1)
Decoder block:
# Decoder block: a transposed 2x2 convolution doubles the spatial dimensions
decoder1 = Conv2DTranspose(filters=first_layer_channels, kernel_size=2, strides=2)(encoder4)
decoder1 = BatchNormalization()(decoder1)
decoder1 = Activation('relu')(decoder1)
Final layers:
# Output head: a 1x1 convolution down to `total` channels, then a per-pixel sigmoid
final = Conv2D(filters=total, kernel_size=1)(decoder4)
final = BatchNormalization()(final)
Last_Conv = Activation('sigmoid')(final)
The task is human pose estimation: the network (which, as said, works on this specific task with Keras 2.2.4) has to predict twenty binary maps containing the positions of specific keypoints.
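For reference, here is a minimal self-contained sketch of that block pattern assembled into a toy encoder-decoder (the depth, channel counts and input size are illustrative, not the published architecture):

import tensorflow as tf
from tensorflow.keras.layers import (Input, Conv2D, Conv2DTranspose,
                                     BatchNormalization, Activation, concatenate)

def conv_block(x, channels):
    # Strided conv halves the spatial size, then BN + ReLU.
    x = Conv2D(filters=channels, kernel_size=2, strides=2)(x)
    x = BatchNormalization()(x)
    return Activation('relu')(x)

def deconv_block(x, channels):
    # Transposed conv doubles the spatial size, then BN + ReLU.
    x = Conv2DTranspose(filters=channels, kernel_size=2, strides=2)(x)
    x = BatchNormalization()(x)
    return Activation('relu')(x)

inputs = Input(shape=(128, 128, 3))
e1 = conv_block(inputs, 32)        # -> 64x64
e2 = conv_block(e1, 64)            # -> 32x32
d1 = deconv_block(e2, 32)          # -> 64x64
d1 = concatenate([d1, e1])         # skip connection between distant layers
d2 = deconv_block(d1, 32)          # -> 128x128
out = Conv2D(filters=20, kernel_size=1)(d2)  # twenty keypoint maps
out = BatchNormalization()(out)
out = Activation('sigmoid')(out)
model = tf.keras.Model(inputs, out)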

Binary classification of every time series step based on past and future values

I'm currently facing a Machine Learning problem and I've reached a point where I need some help to proceed.
I have various time series of positional (x, y, z) data tracked by sensors. I have also engineered some additional features: for example, I rasterized the whole 3D space and computed a cell_x, cell_y and cell_z for every time step. The time series themselves have variable lengths.
My goal is to build a model which classifies every time step with the label 0 or 1 (binary classification based on past and future values). For this I have a lot of training time series where the labels are already set.
One thing which could be very problematic is that there are very few 1 labels in the data (for example, only 3 of 800 samples are labeled with 1).
It would be great if someone could point me in the right direction, because there are too many possible problems:
Wrong hyperparameters
Incorrect model
Too few 1 labels; but I think that's not a big problem, because I only need the model to suggest the right time steps, so I would only use the peaks of the output.
Bad or too little training data
Bad features
I appreciate any help and tips.
Your model seems very strange. Why use only 2 units in the LSTM layer? Also, your problem is binary classification, in which case you should use only one neuron in your output layer (try inserting an additional dense layer between the LSTM layer and the output, and try dropout layers between them).
Binary crossentropy does not make much sense with 2 output neurons unless you have a multi-label problem. But if you switch to one output neuron, it is the right loss; you then also need sigmoid as the activation function.
As a last piece of advice: try class weights.
http://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_class_weight.html
This can make a huge difference if your labels are imbalanced.
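Putting those suggestions together, a minimal sketch (the layer sizes, window length and dummy data are illustrative):

import numpy as np
import tensorflow as tf
from sklearn.utils.class_weight import compute_class_weight

# Dummy data standing in for the real windows: (samples, time steps, features).
y_train = np.array([0] * 797 + [1] * 3)
x_train = np.random.rand(len(y_train), 30, 6).astype("float32")

# One sigmoid output neuron + binary crossentropy, as suggested above.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(30, 6)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# "balanced" weights: n_samples / (n_classes * count_of_class)
weights = compute_class_weight(class_weight="balanced",
                               classes=np.unique(y_train), y=y_train)

model.fit(x_train, y_train, epochs=5, class_weight=dict(enumerate(weights)))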
You can create the model using TensorFlow's BasicLSTMCell; the shape of your data fits BasicLSTMCell in TensorFlow. The TensorFlow documentation for BasicLSTMCell contains code that will help you build a BasicLSTMCell model. Hope this helps, cheers.
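For what it's worth, a minimal TF 1.x sketch of that suggestion (BasicLSTMCell was removed in TF 2.x, so this assumes the 1.x API; all shapes are illustrative):

import tensorflow as tf  # TF 1.x

# Inputs: (batch, time steps, features).
inputs = tf.placeholder(tf.float32, shape=[None, 30, 6])

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=32)
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

# One sigmoid output per time step, for per-step binary classification.
logits = tf.layers.dense(outputs, units=1)
probs = tf.sigmoid(logits)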

CNN Loss stuck at 2.302 (ln(10))

I am trying to build a neural net for the CIFAR-10 dataset, but I am facing a very odd problem: I have tried over 6 different CNN architectures with many different hyperparameters and numbers of fully connected neurons, but all of them fail with a loss of 2.302 and a corresponding accuracy of 0.0625. Why does this happen? What property of a CNN or neural net causes this? I also tried dropout, l2 norm, different kernel sizes, and different padding in the CNN and max pool layers. I don't understand why the loss gets stuck at such a specific number.
I am implementing this using TensorFlow, and I have tried a softmax layer + cross-entropy loss, as well as no softmax layer + sparse cross-entropy loss. Is this a plateau that the loss function is stuck on?
It seems like you accidentally applied a non-linearity/activation function to the last layer of your network. Keep in mind that cross entropy operates on values within a range between 0 and 1. Since your output is already "forced" into this range by the softmax applied just before the cross entropy is computed, the last layer itself should have a linear activation (i.e. just don't add one).
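In tf.keras terms (a sketch; the asker's raw-TensorFlow code will look different), the safe pattern is to leave the last layer linear and let the loss apply softmax internally:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),  # linear output: raw logits, no softmax here
])

# from_logits=True makes the loss apply softmax itself, which is also
# numerically more stable than a separate softmax layer.
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])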
By the way, the value 2.302 is no accident: it is the softmax loss -ln(0.1) = ln(10) ≈ 2.3026 that you get when all 10 classes (CIFAR-10) initially receive the same diffuse probability of 0.1. Check out the explanation by Andrej Karpathy:
http://cs231n.github.io/neural-networks-3/
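A quick sanity check of that number:

import math

# With 10 equally likely classes, cross entropy = -ln(1/10) = ln(10).
print(-math.log(0.1))  # 2.302585092994046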

Output layer for multi-label, multi-class classification?

I'm looking for a way to achieve multiple classifications for an input. The number of outputs is specified, and the class sets may or may not be the same across outputs. Each sample belongs to exactly one class of each class set.
My question is, what should the target data and the output layer look like? What activation, loss and training functions could be used, and how should the layer be connected to the hidden layer? I'm not necessarily looking for an optimal solution, just a working one.
My current guess at what could work is to make the target data multiple concatenated one-hot vectors, and to have the output layer contain as many softmax units as there are vectors. I don't know how the layers would be connected in that solution, or how the net would figure out the sizes of the class sets. I think a label powerset would not work for my needs.
I think MATLAB's patternnet function can create a net that does this, but I don't know how the resulting net works. Code for TensorFlow or Keras would be very welcome.
Maybe it's a bit late to respond to the question, but I have been working on multi-label classification and just found a solution.
For Keras, here's an example:
target label: [1, 0, 0, 1, 0]
output layer: Dense(5, activation='sigmoid')
loss: 'binary_crossentropy'
That will work well if the dataset is big enough.
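A self-contained sketch of that recipe (input width, hidden layer and dummy data are illustrative):

import numpy as np
import tensorflow as tf

# Multi-label targets: each of the 5 outputs is an independent 0/1 label,
# e.g. [1, 0, 0, 1, 0] as in the example above.
x_train = np.random.rand(100, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(100, 5)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(5, activation="sigmoid"),  # one sigmoid per label
])

# binary_crossentropy is applied to each of the 5 outputs independently.
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x_train, y_train, epochs=3)

Note that independent sigmoids treat the labels as unrelated; if each sample must take exactly one class from each class set, as the question describes, one softmax group per class set (e.g. several output heads in the functional API, each with its own categorical loss) matches the problem more directly.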