Regularization losses in TensorFlow - TRAINABLE_VARIABLES to Tensor Array

I would like to add both L1 and L2 regularization to my loss function. When I define the weight variable I choose the regularizer to use, but it seems I can only choose one.
regLosses=tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
loss=tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y_,logits=y_conv))+regLosses
When I try to get the losses manually with
weights=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
l1Loss=tf.reduce_sum(tf.abs(weights))
l2Loss=tf.nn.l2_loss(weights)
loss=tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y_,logits=y_conv))+.1*l1Loss+.001*l2Loss
it doesn't work - I think because TRAINABLE_VARIABLES returns the variables, not the parameters. How do I fix this? Is my manual calculation of the L1 loss correct?
Thanks in advance

So I think I discovered the answer. Comments and review welcome.
When I create the weights I use:
W=tf.get_variable(name=name,shape=shape,regularizer=tf.contrib.layers.l1_regularizer(1.0))
Noting that L1 regularization is simply the sum of the absolute values of the weights and that L2 is the sum of their squares, I can do the following.
regLosses=tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
l1=tf.reduce_sum(tf.abs(regLosses))
l2=tf.reduce_sum(tf.square(regLosses))
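One caveat on the squaring trick: each entry in regLosses is already a per-variable L1 sum, so squaring it gives the square of that variable's L1 norm, not the sum of squared weights that L2 uses. Here is a minimal sketch that computes both penalties directly from the trainable variables instead; it also shows why the manual attempt above failed: tf.abs cannot be applied to a Python list of differently-shaped variables, so each penalty has to be computed per variable and the scalars summed with tf.add_n.
weights = tf.trainable_variables()
l1Loss = tf.add_n([tf.reduce_sum(tf.abs(w)) for w in weights])  # sum |w| per variable, then add the scalars
l2Loss = tf.add_n([tf.nn.l2_loss(w) for w in weights])          # note: tf.nn.l2_loss is sum(w**2)/2
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y_, logits=y_conv)
) + .1 * l1Loss + .001 * l2Loss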

Related

BigQueryML: L1 regularization amount

I am creating a logistic regression model using BigQuery and I would like to use L1 regularization (Lasso). When I built the model using sklearn, I just specified that I wanted to use L1 regularization. On BQML, however, I need to specify a float number, according to this documentation, and I am totally confused about what this amount should be. Can anyone explain it?
Use L1 regularization to encourage many of the uninformative coefficients in your model to be exactly 0.
L1 regularization—penalizing the absolute value of all the weights—turns out to be quite efficient for wide models.
Try a regularization rate (lambda) between 0.1 and 0.3 and see if your model improves.
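A minimal sketch of where that float goes: BQML takes it through the l1_reg option of CREATE MODEL. Here it's issued from Python via the google-cloud-bigquery client; the dataset, model, and column names are hypothetical placeholders:
from google.cloud import bigquery

client = bigquery.Client()
sql = """
CREATE OR REPLACE MODEL `mydataset.my_logreg`
OPTIONS(model_type='logistic_reg', l1_reg=0.1)  -- the regularization amount goes here
AS SELECT label, feature1, feature2 FROM `mydataset.training_data`
"""
client.query(sql).result()  # blocks until the training job completes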

Which kind of regularization use L2 regularization or dropout in multiRNNCell?

I have been working on a project involving a sequence-to-sequence autoencoder for time series forecasting, so I have used tf.contrib.rnn.MultiRNNCell in the encoder and decoder. I am confused about which strategy to use to regularize my seq2seq model. Should I use L2 regularization in the loss, or use DropoutWrapper (tf.contrib.rnn.DropoutWrapper) in the MultiRNNCell? Or can I use both strategies ... L2 for the weights and biases (projection layer) and DropoutWrapper between the cells in the MultiRNNCell?
Thanks in advance :)
You can use both dropout and L2 regularization at the same time, as is commonly done. They are quite different types of regularization. However, I would note that recent literature has suggested that batch normalization has replaced the need for dropout, as noted in the original paper on batch normalization:
https://arxiv.org/abs/1502.03167
From the abstract: "It also acts as a regularizer, in some cases eliminating the need for Dropout."
L2 regularization is typically applied when batchnorm is in use. There's nothing stopping you from applying all 3 forms of regularization, the statement above only indicates that you might not see an improvement by applying dropout when batchnorm is already in use.
There are generally optimal values for the amount of L2 regularization to apply and the dropout keep probability. These are hyperparameters you tune by trial and error or a hyperparameter search algorithm.
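A minimal sketch of combining the two, with DropoutWrapper between the stacked cells and an L2 term added to the loss; num_units, num_layers, and base_loss are hypothetical placeholders for your own graph:
keep_prob = tf.placeholder_with_default(1.0, shape=())  # feed a value < 1.0 only during training
cells = [tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.LSTMCell(num_units),
                                       output_keep_prob=keep_prob)
         for _ in range(num_layers)]
multi_cell = tf.contrib.rnn.MultiRNNCell(cells)

# L2 penalty over the weights only; biases are conventionally left unregularized.
l2 = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()
               if 'bias' not in v.name.lower()])
loss = base_loss + 0.001 * l2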

Convolutional Neural Network Loss

While calculating the loss function, can I manually calculate the loss like
Loss = tf.reduce_mean(tf.square(np.array(Prediction) - np.array(Y)))
and then optimize this loss using the Adam optimizer?
No.
TensorFlow loss functions typically accept tensors as input and also output a tensor, so np.array() wouldn't work.
In the case of CNNs, you'd generally come across loss functions like cross-entropy, softmax cross-entropy, sigmoid cross-entropy, etc. These are already built into the tf.losses module, so you can use them directly.
The loss function you're trying to apply looks like a mean-squared loss. This is built into tf.losses as well: tf.losses.mean_squared_error.
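A minimal sketch, assuming Prediction and Y are tensors in the graph (the network output and a label placeholder) rather than numpy arrays:
loss = tf.losses.mean_squared_error(labels=Y, predictions=Prediction)  # both arguments must be graph tensors
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)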
Having said that, I've also implemented a few loss functions like cross-entropy using a hand-coded formula such as: -tf.reduce_mean(tf.reduce_sum(targets * logProb, axis=1)). This works equally well, as long as the inputs targets and logProb are computed as tensors and not as numpy arrays.
No. You actually need to use tensors for the loss, not numpy arrays like np.array(Prediction), since TensorFlow evaluates these tensors inside its own engine.

TensorFlow softmax_cross_entropy_with_logits: are "labels" also trained (if differentiable)?

The softmax cross-entropy with logits loss function is used to reduce the difference between the logits and labels provided to the function. Typically, the labels are fixed for supervised learning and the logits are adapted. But what happens when the labels come from a differentiable source, e.g., another network? Do both networks, i.e., the "logits network" and the "labels network" get trained by the subsequent optimizer, or does this loss function always treat the labels as fixed?
TLDR: Does tf.nn.softmax_cross_entropy_with_logits() also provide gradients for the labels (if they are differentiable), or are they always considered fixed?
Thanks!
You need to use tf.nn.softmax_cross_entropy_with_logits_v2 to get gradients with respect to the labels.
The gradient is calculated from the loss provided to the optimizer; if the "labels" are coming from another trainable network, then yes, those will be modified, since they influence the loss. The correct way to use another network's outputs for your own is to define it as untrainable, or to make a list of all the variables you want to train and pass them to the optimizer explicitly.
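A minimal sketch of both behaviours; logits_net and labels_net are hypothetical output tensors of the two networks:
# _v2 backpropagates into both arguments when both are differentiable.
loss = tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=tf.nn.softmax(labels_net), logits=logits_net)

# To treat the labels as fixed, cut the gradient path explicitly...
loss_frozen = tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=tf.stop_gradient(tf.nn.softmax(labels_net)), logits=logits_net)

# ...or restrict the optimizer to the logits network's variables.
logits_vars = tf.trainable_variables(scope='logits_net')  # hypothetical scope name
train_op = tf.train.AdamOptimizer().minimize(tf.reduce_mean(loss), var_list=logits_vars)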

Tensorflow LSTM Regularization

I was wondering how one can implement l1 or l2 regularization within an LSTM in TensorFlow? TF doesn't give you access to the internal weights of the LSTM, so I'm not certain how one can calculate the norms and add it to the loss. My loss function is just RMS for now.
The answers here don't seem to suffice.
The answers in the link you mentioned are the correct way to do it. Iterate through tf.trainable_variables and find the variables associated with your LSTM.
An alternative, more complicated and possibly more brittle approach is to re-enter the LSTM's variable_scope with reuse=True and call get_variable(). But really, the original solution is faster and less brittle; a sketch of it follows.
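A minimal sketch of that first approach, assuming the LSTM's variables carry 'lstm' somewhere in their names (the substring to match depends on your variable scopes):
lstm_vars = [v for v in tf.trainable_variables() if 'lstm' in v.name.lower()]
l2_penalty = tf.add_n([tf.nn.l2_loss(v) for v in lstm_vars])
loss = rms_loss + 0.001 * l2_penalty  # rms_loss: your existing RMS objective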
TL;DR: save all the parameters in a list and add their L^n norm to the objective function before computing the gradients for optimisation.
1) In the function where you define the inference
net = [v for v in tf.trainable_variables()]  # collect every trainable parameter
return output, net  # 'output' stands in for whatever your inference function returns
2) Add the L^n norm to the cost and compute the gradients from the cost
weight_reg = tf.add_n([0.001 * tf.nn.l2_loss(var) for var in net])  # scaled L2 penalty over all parameters
cost = original_cost + weight_reg  # original_cost: your objective without the regulariser
param_gradients = tf.gradients(cost, net)
optimiser = tf.train.AdamOptimizer(0.001).apply_gradients(list(zip(param_gradients, net)))
3) Run the optimiser when you want via
_ = sess.run(optimiser, feed_dict={input_var: data})