Does keras automatically weight classes? - tensorflow

The Keras documentation states:
Using sample weighting and class weighting
With the default settings the weight of a sample is decided by its frequency in the dataset. There are two methods to weight the data, independent of sample frequency:
- Class weights
- Sample weights
I have a significantly imbalanced dataset. I was looking at how to adjust for this, and came across a few answers here dealing with it. My plan was thus to create a dictionary object of the relative frequencies and pass it to model.fit()'s class_weight parameter.
However, now that I'm looking up the documentation, it seems as though class imbalance is already dealt with? So I don't necessarily have to manage for the different class counts after all?
For the record, here are the class counts:
0: 25,811, 1: 2,444, 2: 5,293, 3: 874, 4: 709.
And here is the dictionary I was going to pass (the original pseudocode, made concrete with the counts above):
counts = {0: 25811, 1: 2444, 2: 5293, 3: 874, 4: 709}
class_weight = {0: 1.0,
                1: counts[0] / counts[1],
                2: counts[0] / counts[2],
                3: counts[0] / counts[3],
                4: counts[0] / counts[4]}
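For completeness, a minimal sketch of where this dictionary would go; the model architecture and data below are placeholders, not from the question:
import numpy as np
from tensorflow import keras

# Toy model and random data, just to show where class_weight is passed.
model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

X_train = np.random.rand(100, 8)             # placeholder features
y_train = np.random.randint(0, 5, size=100)  # placeholder labels 0-4
model.fit(X_train, y_train, epochs=1, class_weight=class_weight)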

In the Model() class documentation, the model.fit() method has the following signature:
fit(
    x=None,
    y=None,
    batch_size=None,
    epochs=1,
    verbose='auto',
    callbacks=None,
    validation_split=0.0,
    validation_data=None,
    shuffle=True,
    class_weight=None,
    sample_weight=None,
    initial_epoch=0,
    steps_per_epoch=None,
    validation_steps=None,
    validation_batch_size=None,
    validation_freq=1,
    max_queue_size=10,
    workers=1,
    use_multiprocessing=False
)
You can clearly see that the class_weight parameter is None by default, i.e. no class weighting is applied automatically.
The documentation also mentions that this is an optional parameter:
Optional dictionary mapping class indices (integers) to a weight
(float) value, used for weighting the loss function (during training
only). This can be useful to tell the model to "pay more attention" to
samples from an under-represented class.
Example (a mistake for class_1 is penalized 10 times more than a mistake for class_0):
class_weight = {0: 1,
                1: 10}
PS: The statements are indeed a little misleading, in the sense that one might infer that if you have 2 classes with 90 and 10 samples respectively, the "weight" of a class is its sample count. What the explanation actually intends to convey is that, by default, a sample's influence on the loss is proportional to how often its class appears: the model will not prioritize the class with 10 points over the class with 90; it is the overrepresentation of the latter (sheer frequency) that counts more.
In other words, the plain cross-entropy loss function (be it binary or multiclass) will favor the overrepresented class in the absence of explicit constraints/parameters from the developer.
This is indeed correct, hence the necessity of tackling the imbalance via this class_weight scheme in such situations.
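To see what the weighting does to the loss, a toy illustration in plain NumPy (not Keras internals): each sample's cross-entropy term is scaled by the weight of its true class, so mistakes on the rare class contribute more to the average.
import numpy as np

y_true = np.array([0, 0, 0, 1])          # class 0 is overrepresented
probs = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.7, 0.3],
                  [0.4, 0.6]])
class_weight = {0: 1.0, 1: 3.0}          # made-up weights for illustration

ce = -np.log(probs[np.arange(len(y_true)), y_true])
w = np.array([class_weight[c] for c in y_true])
print(ce.mean())        # plain loss, dominated by class 0 terms
print((w * ce).mean())  # weighted loss, class 1 mistakes count 3x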

Related

Is there linter for model(inputs) of PyTorch like model.predict(inputs) of TensorFlow?

My goal is to do object detection. However, the YOLOv7 tutorial (a hack to create bounding boxes from a feature map) uses PyTorch.
The problem is that model(inputs) has no typings.
The code at L148-L150:
out = model(inputs)
probs, class_preds = torch.max(out[0], dim=-1)
feature_maps = out[1].to("cpu")
This forced me to debug the helper.py file to understand what out[0] and out[1] are. Currently, I assume that out[0] is the softmax probabilities and out[1] is the feature maps.
I think the answer is no, in general it is non-trivial to automatically infer the semantic meaning of the outputs of a neural network; this is a product of the semantic meaning of the inputs and the model structure itself. You could reference the Yolo model architecture provided in model.py (though as an aside you should not link to external code but rather provide relevant code in your question itself) and investigate the structure of the outputs, then reference the structure of the labeled inputs (as the model by definition is learning to replicate the structure of the labels.)
That being said, in your case the output is quite obviously per-class probabilities and class indices, as shown in line 149:
probs, class_preds = torch.max(out[0], dim=-1)
since the outputs from torch.max, per the PyTorch documentation, are (maximum value, maximum index).
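A quick way to see this, with a toy tensor standing in for out[0]:
import torch

# Toy scores for 3 samples over 4 classes; torch.max over the last
# dimension returns (max value, index of max) per row.
out0 = torch.tensor([[0.1, 0.7, 0.1, 0.1],
                     [0.6, 0.2, 0.1, 0.1],
                     [0.2, 0.2, 0.5, 0.1]])
probs, class_preds = torch.max(out0, dim=-1)
print(probs)        # tensor([0.7000, 0.6000, 0.5000])
print(class_preds)  # tensor([1, 0, 2])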

How to set `class_weights` in the right way around?

While setting the class_weight parameter in Keras' model.fit function as such:
model.fit(X_train, y_train, class_weight={0: 2.217857142857143, 1: 0.6455301455301455})
I'm not sure which way round to assign the class_weight weights to the right class:
Is it class_weight={0: 2.217857142857143, 1: 0.6455301455301455} or class_weight={0: 0.6455301455301455, 1: 2.217857142857143 } ?
If you have class_weight={0: 2.217857142857143, 1: 0.6455301455301455}, this means:
"Treat every instance of class 1 as 0.6455301 instances of class 0", i.e. in your loss function you assign a lower weight to these instances. The loss then becomes a weighted average, where the weight of each sample is specified by class_weight and its corresponding class.
Now it's your choice (based on your problem) in what order you want to specify the weights.
Have a look at my github project to get more information about class imbalance.
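For what it's worth, the particular numbers in the question look like the output of scikit-learn's "balanced" heuristic for roughly 700 vs. 2405 samples (an inference on my part, though the arithmetic matches); a sketch of computing such a dictionary:
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# 'balanced' assigns n_samples / (n_classes * count_c) to each class c,
# so the rarer class gets the larger weight.
y_train = np.array([0] * 700 + [1] * 2405)  # assumed counts
weights = compute_class_weight(class_weight="balanced",
                               classes=np.unique(y_train),
                               y=y_train)
class_weight = dict(enumerate(weights))
print(class_weight)  # {0: 2.2178..., 1: 0.6455...}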

Tensorflow & Keras prediction threshold

What is the threshold value that is used by TF by default to classify an input image as being a certain class?
For example, say I have 3 classes 0, 1, 2, and the labels for images are one-hot encoded like so: [1, 0, 0], meaning this image has label of class 0.
Now when a model outputs a prediction after softmax like this one: [0.39, 0.56, 0.05] does TF use 0.5 as the threshold so the class it predicts is class 1?
What if all the predictions were below 0.5 like [0.33, 0.33, 0.33] what would TF say the result is?
And is there any way to specify a new threshold for example 0.7 and ensure TF says that a prediction is wrong if no class prediction is above that threshold?
Also would this logic carry over to the inference stage too where if the network is uncertain of the class then it will refuse to give a classification for the image?
when a model outputs a prediction after softmax like this one: [0.39, 0.56, 0.05] does TF use 0.5 as the threshold so the class it predicts is class 1?
No. There is not any threshold involved here. Tensorflow (and any other framework, for that matter) will just pick up the maximum one (argmax); the result here (class 1) would be the same even if the probabilistic output was [0.33, 0.34, 0.33].
You seem to erroneously believe that a probability value of 0.5 has some special significance in a 3-class classification problem; it has not: a probability value of 0.5 is "special" only in a binary classification setting (and a balanced one, for that matter). In an n-class setting, the respective "special" value is 1/n (here 0.33), and by definition, there will always be some entry in the probability vector greater than or equal to this value.
What if all the predictions were below 0.5 like [0.33, 0.33, 0.33] what would TF say the result is?
As already implied, there is nothing strange or unexpected with all probabilities being below 0.5 in an n-class problem with n>2.
Now, if all the probabilities happen to be equal, as in the example you show (although highly improbable in practice, the question is valid, at least in theory), ideally, such ties should be resolved randomly (i.e. pick a class in random); in practice, since usually this stage is handled by the argmax method of Numpy, the prediction will be the first class (i.e. class 0), which is not difficult to demonstrate:
import numpy as np
x = np.array([0.33, 0.33, 0.33])
np.argmax(x)
# 0
due to how such cases are handled by Numpy - from the argmax docs:
In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.
To your next question:
is there any way to specify a new threshold for example 0.7 and ensure TF says that a prediction is wrong if no class prediction is above that threshold?
Not in Tensorflow (or any other framework) itself, but this is always something that can be done in a post-processing stage during inference: irrespectively of what is actually returned by your classifier, it is always possible to add some extra logic such that whenever the max probability value is less that a threshold, your system (i.e. your model plus the post-processing logic) returns something like "I don't know / I am not sure / I can't answer". But again, this is external to Tensorflow (or any other framework used) and the model itself, and it can be used only during inference and not during training (in any case, it doesn't make sense during training, because during training only predicted class probabilities are used, and not hard classes).
In fact, we had implemented such a post-processing module in a toy project some years ago, which was an online service to classify dog races from images: when the max probability returned by the model was less than a threshold (which was the case, say, when the model was presented with an image of a cat instead of a dog), the system was programmed to respond with the question "Are you sure this is a dog"?, instead of being forced to make a prediction among the predefined dog races...
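A minimal sketch of such post-processing logic (the 0.7 threshold is just the example value from the question):
import numpy as np

def predict_with_threshold(probs, threshold=0.7):
    """Return the argmax class, or None ("I don't know") when the
    maximum probability is below the threshold. This is pure
    post-processing; the model itself is untouched."""
    idx = int(np.argmax(probs))
    return idx if probs[idx] >= threshold else None

print(predict_with_threshold(np.array([0.39, 0.56, 0.05])))  # None
print(predict_with_threshold(np.array([0.05, 0.90, 0.05])))  # 1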
The threshold is used in the case of binary or multi-label classification. In multi-class classification you use argmax: the class with the highest activation is your output class. All classes rarely have equal scores; if the model is trained well, there should be one dominant class.

There is no "name" variable in the constructor of BasicLSTMCell

In order to differentiate LSTMs, I wish to give a name to the BasicLSTMCell variable in my code. But it reported the following error:
num_units=self.config.num_lstm_units, state_is_tuple=True, name="some_basic_lstm")
TypeError: __init__() got an unexpected keyword argument 'name'
And I found the following in the library of my TensorFlow installation, in the file rnn_cell_impl.py:
class BasicLSTMCell(RNNCell):
  """Basic LSTM recurrent network cell.

  The implementation is based on: http://arxiv.org/abs/1409.2329.

  We add forget_bias (default: 1) to the biases of the forget gate in order to
  reduce the scale of forgetting in the beginning of the training.

  It does not allow cell clipping, a projection layer, and does not
  use peep-hole connections: it is the basic baseline.

  For advanced models, please use the full @{tf.nn.rnn_cell.LSTMCell}
  that follows.
  """

  def __init__(self, num_units, forget_bias=1.0,
               state_is_tuple=True, activation=None, reuse=None):
    """Initialize the basic LSTM cell.

    Args:
      num_units: int, The number of units in the LSTM cell.
      forget_bias: float, The bias added to forget gates (see above).
        Must set to `0.0` manually when restoring from CudnnLSTM-trained
        checkpoints.
      state_is_tuple: If True, accepted and returned states are 2-tuples of
        the `c_state` and `m_state`. If False, they are concatenated
        along the column axis. The latter behavior will soon be deprecated.
      activation: Activation function of the inner states. Default: `tanh`.
      reuse: (optional) Python boolean describing whether to reuse variables
        in an existing scope. If not `True`, and the existing scope already has
        the given variables, an error is raised.
    """
Is it a bug in my version of tensorflow? How can I give it a "name"?
I think @aswinids provided the best answer here in the comments, but let me explain why it should not be considered a bug. An LSTM cell is comprised of at least 4 variables (there are a few others used for control flow and such), corresponding to the 4 sub-network operations that occur in an LSTM. The diagram in Colah's blog post (http://colah.github.io/posts/2015-08-Understanding-LSTMs/) illustrates the internals of an LSTM cell.
Each of the yellow boxes has a set of weights assigned to it and is effectively a single layer neural network operation (piped together in an interesting way, defined by the LSTM architecture).
A good approach to naming these would then be to wrap the cell in tf.variable_scope('some_name'), such that all 4 of the variables defined in the LSTM have a common base naming structure such as:
lstm_cell/f_t
lstm_cell/i_t
lstm_cell/C_t
lstm_cell/o_t
I suspect that previously they just did this, hard-coding lstm_cell (or whatever name they used) as the prefix for all the variables under the LSTM cell. In later versions, as @aswinids points out, there is a name argument, and I suspect it simply replaces the lstm_cell prefix I used in the example here.
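Until then, a workaround consistent with the above is to create and use the cell inside a variable scope; a sketch under TF 1.x assumptions (the unit count and input shape here are made up):
import tensorflow as tf  # TF 1.x API, matching the question

# Wrapping construction and usage in a variable scope prefixes all of
# the cell's internal variables with "some_basic_lstm/...", even on
# versions whose BasicLSTMCell lacks a `name` argument.
with tf.variable_scope("some_basic_lstm"):
    cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=128, state_is_tuple=True)
    inputs = tf.placeholder(tf.float32, [None, 10, 32])  # hypothetical shape
    outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

print([v.name for v in tf.global_variables()])  # all under some_basic_lstm/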

Keras - class_weight vs sample_weights in the fit_generator

In Keras (using TensorFlow as a backend) I am building a model that works with a huge dataset having highly imbalanced classes (labels). To be able to run the training process, I created a generator which feeds chunks of data to fit_generator.
According to the documentation for the fit_generator, the output of the generator can either be the tuple (inputs, targets) or the tuple (inputs, targets, sample_weights). Having that in mind, here are a few questions:
My understanding is that the class_weight regards the weights of all classes for the entire dataset, whereas the sample_weights regards the weights of all classes for each individual chunk created by the generator. Is that correct? If not, can someone elaborate on the matter?
Is it necessary to give both the class_weight to the fit_generator and then the sample_weights as an output for each chunk? If yes, then why? If not then which one is better to give?
If I should give the sample_weights for each chunk, how do I map the weights if some of the classes are missing from a specific chunk? Let me give an example. In my overall dataset, I have 7 possible classes (labels). Because these classes are highly imbalanced, when I create smaller chunks of data as an output from the fit_generator, some of the classes are missing from the specific chunk. How should I create the sample_weights for these chunks?
My understanding is that the class_weight regards the weights of all classes for the entire dataset whereas the sample_weights regards the weights of all classes for each individual chunk created by the generator. Is that correct? If not, can someone elaborate on the matter?
class_weight affects the relative weight of each class in the calculation of the objective function. sample_weights, as the name suggests, allows further control of the relative weight of samples that belong to the same class.
Is it necessary to give both the class_weight to the fit_generator and then the sample_weights as an output for each chunk? If yes, then why? If not, then which one is better to give?
It depends on your application. Class weights are useful when training on highly skewed data sets; for example, a classifier to detect fraudulent transactions. Sample weights are useful when you don't have equal confidence in the samples in your batch. A common example is performing regression on measurements with variable uncertainty.
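For the regression case, a hedged sketch (the data and noise model below are made up for illustration) of passing per-sample confidence via sample_weight:
import numpy as np
from tensorflow import keras

# Weight each measurement by the inverse of its (assumed) noise
# variance, so noisier samples pull less on the fit.
X = np.random.rand(200, 4)
y = np.random.rand(200)
variance = np.random.uniform(0.1, 2.0, size=200)  # assumed per-sample noise

model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, sample_weight=1.0 / variance, epochs=1)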
If I should give the sample_weights for each chunk, how do I map the weights if some of the classes are missing from a specific chunk? Let me give an example. In my overall dataset, I have 7 possible classes (labels). Because these classes are highly imbalanced, when I create smaller chunks of data as an output from the fit_generator, some of the classes are missing from the specific chunk. How should I create the sample_weights for these chunks?
This is not an issue. sample_weights is defined on a per-sample basis and is independent of the class. For this reason, the documentation states that the elements of (inputs, targets, sample_weights) should all be the same length.
The function _weighted_masked_objective in engine/training.py has an example of how sample_weights are applied.
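To make the third point concrete, here is a minimal sketch (all names and weights are assumptions) of a generator that yields (inputs, targets, sample_weights) by looking each sample's weight up from a per-class table; classes absent from a chunk are simply never looked up, so they pose no problem:
import numpy as np

# Hypothetical per-class weights for the 7 classes in the question.
class_weights = {0: 1.0, 1: 5.0, 2: 2.0, 3: 8.0, 4: 8.0, 5: 1.5, 6: 3.0}

def chunk_generator(X, y, batch_size=32):
    while True:
        idx = np.random.randint(0, len(X), batch_size)
        x_batch, y_batch = X[idx], y[idx]
        # One weight per sample, derived from that sample's own label.
        w_batch = np.array([class_weights[int(label)] for label in y_batch])
        yield x_batch, y_batch, w_batch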