When setting the class_weight parameter in Keras' model.fit function like this:
model.fit(X_train, y_train, class_weight={0: 2.217857142857143, 1: 0.6455301455301455})
I'm not sure which weight ends up assigned to which class:
Is it class_weight={0: 2.217857142857143, 1: 0.6455301455301455} or class_weight={0: 0.6455301455301455, 1: 2.217857142857143 } ?
If you have class_weight={0: 2.217857142857143, 1: 0.6455301455301455}, the keys are the class indices and the values are the weights applied to samples of that class. This means:
Every instance of class 1 counts as 0.6455301 of a sample in your loss function, while every instance of class 0 counts as 2.2178571 samples, so class 1 instances are assigned a lower value. Hence, the loss becomes a weighted average, where the weight of each sample is specified by class_weight and its corresponding class.
Now it's your choice (based on your problem) in what order you want to specify the weights.
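To make the key-to-class mapping concrete, here is a minimal sketch in plain Python of one common heuristic, weight = n_samples / (n_classes * class_count). The helper name balanced_class_weight and the label counts (280 and 962) are assumptions for illustration; the counts were chosen so that the result reproduces the weights in the question.

```python
from collections import Counter


def balanced_class_weight(labels):
    """Return {class_index: weight} using n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in sorted(counts.items())}


# Hypothetical labels: class 0 is the minority (280), class 1 the majority (962).
labels = [0] * 280 + [1] * 962
class_weight = balanced_class_weight(labels)

# The rarer class (key 0) receives the larger weight.
print(class_weight)
```

Under this heuristic the larger value belongs to the key of the rarer class, which is the ordering to pass to model.fit if the goal is to compensate for imbalance.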
Have a look at my github project to get more information about class imbalance.
The Keras documentation states:
Using sample weighting and class weighting
With the default settings the weight of a sample is decided by its frequency in the dataset. There are two methods to weight the data, independent of sample frequency:
-Class weights
-Sample weights
I have a significantly imbalanced dataset. I was looking at how to adjust for this, and came across a few answers here dealing with that, such as here and here. My plan was thus to create a dictionary object of the relative frequencies and pass them onto model.fit()'s class_weight parameter.
However, now that I'm looking up the documentation, it seems as though class imbalance is already dealt with? So I don't necessarily have to manage for the different class counts after all?
For the record, here are the class counts:
0: 25,811, 1: 2,444, 2: 5,293, 3: 874, 4: 709.
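And here is the dictionary I was going to pass on:
# Samples per class, taken from the counts above.
counts = {0: 25811, 1: 2444, 2: 5293, 3: 874, 4: 709}
# Weight each class by the majority class count divided by its own count,
# so class 0 gets 1.0 and rarer classes get proportionally larger weights.
class_weight = {cls: counts[0] / n for cls, n in counts.items()}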
In the Model() class documentation, the model.fit() has the following signature:
fit(
x=None,
y=None,
batch_size=None,
epochs=1,
verbose='auto',
callbacks=None,
validation_split=0.0,
validation_data=None,
shuffle=True,
class_weight=None,
sample_weight=None,
initial_epoch=0,
steps_per_epoch=None,
validation_steps=None,
validation_batch_size=None,
validation_freq=1,
max_queue_size=10,
workers=1,
use_multiprocessing=False
)
You can clearly notice that the class_weight parameter is None, by default.
The documentation also mentions that this is an optional parameter:
Optional dictionary mapping class indices (integers) to a weight
(float) value, used for weighting the loss function (during training
only). This can be useful to tell the model to "pay more attention" to
samples from an under-represented class.
Example (a mistake for class_1 is penalized 10 times more than a mistake for class_0):
class_weight = {0:1,
1:10}
PS: The documentation's wording is indeed a little misleading. One could infer that, with 2 classes of 90 and 10 samples respectively, the "weight" of each class is set to its sample count. What the explanation actually intends to convey is that every sample counts equally by default, so the model does not prioritize the class with 10 points over the class with 90 points; the overrepresentation of the latter (its basic frequency) is what makes it count more.
In other words, the basic cross-entropy loss function (be it binary or multiclass) will favor the overrepresented class in the absence of specific constraints or parameters from the developer.
That statement is correct, hence the need to tackle the imbalance explicitly via the class_weight scheme in this situation.
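As a rough illustration of how class weights change the loss, the following plain-Python sketch scales each sample's cross-entropy by the weight of its true class and divides by the number of samples. The function name weighted_cross_entropy and the toy probabilities are assumptions, and this is not Keras's actual implementation.

```python
import math


def weighted_cross_entropy(y_true, y_prob, class_weight=None):
    """Mean per-sample cross-entropy, each term scaled by its class weight."""
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        weight = 1.0 if class_weight is None else class_weight[label]
        total += -weight * math.log(probs[label])
    return total / len(y_true)


# Toy data: 9 well-predicted majority samples, 1 poorly predicted minority sample.
y_true = [0] * 9 + [1]
y_prob = [[0.9, 0.1]] * 9 + [[0.8, 0.2]]

plain = weighted_cross_entropy(y_true, y_prob)
weighted = weighted_cross_entropy(y_true, y_prob, class_weight={0: 1.0, 1: 9.0})
print(plain, weighted)
```

Without weights the single minority mistake is diluted by the nine majority samples; with the weight it dominates the loss, which is what pushes the optimizer to "pay more attention" to that class.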
I am solving a regression problem, and I've set aside a cv data set on which I evaluate my models.
I can easily evaluate my neural network, as TensorFlow's evaluate() method gives me the sum of all squared errors.
However, xgb provides me with a score() function that returns a single number: 0.7
Firstly, how should I interpret this number?
Secondly, how can I make xgb return a measure of the model that I can interpret?
Firstly, how should I interpret this number?
From the official doc, this number represents the coefficient of determination (R²). It is the proportion of the variance of your dependent variable (y) that is explained by the independent variables (x). Thus, the closer it is to 1, the better your model fits the data. A model that always predicts the mean of y scores 0, and a model that does worse than that scores below 0.
Secondly, how can I make xgb return a measure of the model that I can interpret?
You can use the model's predict method and then calculate any measure you want. For example, if you want the sum of squared errors, as TensorFlow gives you:
import xgboost as xgb
model = xgb.XGBRegressor()
model.fit(x_train, y_train)
predictions = model.predict(x_test)
ssr = ((predictions - y_test)**2).sum()
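To see what the number returned by score() measures, here is a minimal plain-Python sketch of the coefficient of determination, R² = 1 - SS_res / SS_tot. The helper name r2 and the toy values are assumptions for illustration only.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

score = r2(y_true, y_pred)
mean_baseline = r2(y_true, [sum(y_true) / len(y_true)] * len(y_true))
print(score, mean_baseline)
```

The residual sum of squares in the numerator is the same quantity as ssr above, so the two measures are directly related: R² rescales the squared error by the variance of the targets.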
I've noticed that the KL part of the loss is added to the list self._losses of the Layer class when self.add_loss is called from the call method of the DenseVariational (i.e. during the forward pass).
But how is this list self._losses (or the losses property of the same Layer class) treated during training? Where is it called from during training? For example, are the losses summed or averaged before being added to the final loss? I would like to SEE the ACTUAL CODE.
I would like to know how exactly these losses are combined with the loss that you specify in the fit method. Can you provide me with the code that combines them? Note that I am interested in the Keras that is shipped with TensorFlow (because that's the one I am using).
Actually, the part where the total loss is computed is in the compile method of the Model class, specifically in these lines:
# Compute total loss.
# Used to keep track of the total loss value (stateless).
# eg., total_loss = loss_weight_1 * output_1_loss_fn(...) +
# loss_weight_2 * output_2_loss_fn(...) +
# layer losses.
self.total_loss = self._prepare_total_loss(masks)
The _prepare_total_loss method adds the regularization and layer losses to the total loss (i.e. so all the losses are summed together) and then averages them over the batch axis in these lines:
# Add regularization penalties and other layer-specific losses.
for loss_tensor in self.losses:
    total_loss += loss_tensor
return K.mean(total_loss)
Actually, self.losses is not defined on the Model class itself; rather, it is an attribute of the parent class, i.e. Network, and it returns all the layer-specific losses as a list. Further, to resolve any confusion, total_loss in the code above is a single tensor which is equal to the sum of all the losses in the model (i.e. loss function values and layer-specific losses). Note that loss functions by definition must return a single loss value per input sample (not one for the whole batch). Therefore, K.mean(total_loss) averages all these values over the batch axis into one final loss value, which the optimizer minimizes.
As for tf.keras, this is more or less the same as native Keras; however, the structure and flow of things are a bit different, as explained below.
First, in the compile method of the Model class, a loss container is created which holds the loss functions and computes their values:
self.compiled_loss = compile_utils.LossesContainer(
    loss, loss_weights, output_names=self.output_names)
Next, in the train_step method of the Model class, this container is called to compute the loss value of a batch:
loss = self.compiled_loss(
    y, y_pred, sample_weight, regularization_losses=self.losses)
As you can see above, self.losses is passed to this container. As in the native Keras implementation, self.losses contains all the layer-specific loss values; the only difference is that in tf.keras it is implemented in the Layer class (instead of the Network class as in native Keras). Note that Model is a subclass of Network, which itself is a subclass of Layer. Now, let's see how regularization_losses is treated in the __call__ method of LossesContainer:
  if (loss_obj.reduction == losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE or
      loss_obj.reduction == losses_utils.ReductionV2.AUTO):
    loss_value = losses_utils.scale_loss_for_distribution(loss_value)
  loss_values.append(loss_value)
  loss_metric_values.append(loss_metric_value)
if regularization_losses:
  regularization_losses = losses_utils.cast_losses_to_common_dtype(
      regularization_losses)
  reg_loss = math_ops.add_n(regularization_losses)
  loss_metric_values.append(reg_loss)
  loss_values.append(losses_utils.scale_loss_for_distribution(reg_loss))
if loss_values:
  loss_metric_values = losses_utils.cast_losses_to_common_dtype(
      loss_metric_values)
  total_loss_metric_value = math_ops.add_n(loss_metric_values)
  self._loss_metric.update_state(
      total_loss_metric_value, sample_weight=batch_dim)
  loss_values = losses_utils.cast_losses_to_common_dtype(loss_values)
  total_loss = math_ops.add_n(loss_values)
  return total_loss
As you can see, regularization_losses is added to total_loss, which therefore holds the sum of the layer-specific losses plus the sum of the batch-averaged values of all the loss functions (so it is a single value).
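The arithmetic described here can be sketched in plain Python as follows. This is a simplified stand-in and not the TensorFlow code: it assumes a single output, the default batch-averaging reduction, and a single replica (so the distribution scaling is a no-op). The function name combine_losses and the numbers are hypothetical.

```python
def combine_losses(per_sample_losses, layer_losses):
    """Average the data loss over the batch, then add the summed layer losses."""
    # Loss function values are averaged over the batch axis.
    data_loss = sum(per_sample_losses) / len(per_sample_losses)
    # Layer-specific losses (e.g. KL terms from add_loss) are summed, not averaged.
    regularization_loss = sum(layer_losses)
    return data_loss + regularization_loss


per_sample_losses = [0.2, 0.4, 0.6]   # hypothetical loss per sample in a batch
layer_losses = [0.05, 0.01]           # hypothetical values added via add_loss

total = combine_losses(per_sample_losses, layer_losses)
print(total)
```

One consequence is that a layer loss added via add_loss is not divided by the batch size, so any scaling of a KL term has to be applied where the term is created.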
Let's say I have a model with one input and two outputs. And I want the output of the third layer of my model to be the y_true in my cost function for my second output.
I've tried this:
model.fit(x, [y, model.layers[3].output], ...)
But got the error:
'Tensor' object has no attribute 'ndim'
Which I believe is referring to the second y_true I gave the fit method.
Is it possible to do something like this in Keras? If so, how?
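I managed to do this by changing only the cost function, like so:
def custom_euclidean_distance_loss(layer_output):
    from keras import backend as K
    # layer_output is captured by the closure and used as the target.
    def wrap(y_true, y_pred):
        # y_true is ignored; the loss compares y_pred with the layer output.
        return K.mean(K.square(y_pred - layer_output))
    return wrap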
And since I do not use any previously known y_true, I just fed a dummy one to fit. Note that the metrics printed by Keras won't be correct this way, but the model will train with no problem.
If you do know of a better way (like actually feeding the layer output to fit), please let me know.
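The closure pattern can be illustrated without Keras. The sketch below is a plain-Python stand-in, using lists instead of tensors and a hypothetical name make_layer_target_loss, that shows why the dummy y_true has no effect on the result.

```python
def make_layer_target_loss(layer_output):
    """Build a loss that compares predictions with a captured target."""
    def loss(y_true, y_pred):
        # y_true is deliberately unused: the target is the captured layer_output.
        n = len(y_pred)
        return sum((p - t) ** 2 for p, t in zip(y_pred, layer_output)) / n
    return loss


layer_output = [1.0, 2.0, 3.0]        # stand-in for the third layer's output
loss_fn = make_layer_target_loss(layer_output)

y_pred = [1.0, 2.0, 5.0]
value_with_zeros = loss_fn([0.0, 0.0, 0.0], y_pred)
value_with_nines = loss_fn([9.0, 9.0, 9.0], y_pred)
print(value_with_zeros, value_with_nines)
```

Both calls return the same value, which is also why any metric that Keras computes from the dummy y_true is meaningless in this setup.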
In order to differentiate LSTMs, I wish to give a name to the BasicLSTMCell variable in my code. But it reported the following error:
num_units=self.config.num_lstm_units, state_is_tuple=True, name="some_basic_lstm")
TypeError: __init__() got an unexpected keyword argument 'name'
I then looked in the library of my TensorFlow installation and found this in the file rnn_cell_impl.py:
class BasicLSTMCell(RNNCell):
  """Basic LSTM recurrent network cell.
  The implementation is based on: http://arxiv.org/abs/1409.2329.
  We add forget_bias (default: 1) to the biases of the forget gate in order to
  reduce the scale of forgetting in the beginning of the training.
  It does not allow cell clipping, a projection layer, and does not
  use peep-hole connections: it is the basic baseline.
  For advanced models, please use the full @{tf.nn.rnn_cell.LSTMCell}
  that follows.
  """
  def __init__(self, num_units, forget_bias=1.0,
               state_is_tuple=True, activation=None, reuse=None):
    """Initialize the basic LSTM cell.
    Args:
      num_units: int, The number of units in the LSTM cell.
      forget_bias: float, The bias added to forget gates (see above).
        Must set to `0.0` manually when restoring from CudnnLSTM-trained
        checkpoints.
      state_is_tuple: If True, accepted and returned states are 2-tuples of
        the `c_state` and `m_state`. If False, they are concatenated
        along the column axis. The latter behavior will soon be deprecated.
      activation: Activation function of the inner states. Default: `tanh`.
      reuse: (optional) Python boolean describing whether to reuse variables
        in an existing scope. If not `True`, and the existing scope already has
        the given variables, an error is raised.
Is it a bug in my version of tensorflow? How can I give it a "name"?
I think @aswinids provided the best answer here in the comments, but let me explain why it should not be considered a bug. An LSTM cell is composed of at least 4 variables (there are a few others used for control flow and such). There are 4 sub-network operations that occur in an LSTM. The diagram from Colah's blog illustrates the internals of an LSTM cell (http://colah.github.io/posts/2015-08-Understanding-LSTMs/).
Each of the yellow boxes in that diagram has a set of weights assigned to it and is effectively a single-layer neural network operation (piped together in an interesting way, defined by the LSTM architecture).
A good approach to naming these would then be to use tf.variable_scope('some_name'), so that all 4 of the variables defined in the LSTM have a common base naming structure such as:
lstm_cell/f_t
lstm_cell/i_t
lstm_cell/C_t
lstm_cell/o_t
I suspect that previously they just did this and hard-coded lstm_cell, or whatever name they used, as the prefix for all the variables under the LSTM cell. In later versions, as @ashwinids points out, there is a name argument, and I suspect that it simply replaces the lstm_cell prefix I used in the example here.
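The prefixing idea can be sketched in plain Python. This is only an analogy for how a scope name becomes the common prefix of the variables created inside it; it is not TensorFlow code, and the helpers variable_scope and variable_name below are hypothetical stand-ins.

```python
from contextlib import contextmanager

_scope_stack = []  # hypothetical stand-in for the framework's scope state


@contextmanager
def variable_scope(name):
    """Push a scope name for the duration of the with-block."""
    _scope_stack.append(name)
    try:
        yield
    finally:
        _scope_stack.pop()


def variable_name(name):
    """Return the name prefixed by every scope that is currently active."""
    return "/".join(_scope_stack + [name])


with variable_scope("some_basic_lstm"):
    names = [variable_name(gate) for gate in ("f_t", "i_t", "C_t", "o_t")]

print(names)
```

Two cells created under different scope names then end up with distinct variable names, which is what the question was trying to achieve with name=.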