L1 regularization doesn't force the weight parameters to become zero

L1 regularization doesn't force the weight parameters to become zero - tensorflow

My objective is training an autoencoder through SGD. By using tensorflow 1.x, I added L1 regularization for my loss function like this:
........
........
beta = 10e-3
n_inputs = X_Train.shape[1]
n_outputs = n_inputs
X = tf.placeholder(tf.float32, shape=[None, n_inputs])
weights1 = tf.get_variable("weights1", shape=[n_inputs, n_hidden], dtype=tf.float32, initializer = tf.contrib.layers.variance_scaling_initializer())
weights2 = tf.get_variable("weights2", shape=[n_hidden, n_outputs], dtype=tf.float32, initializer = tf.contrib.layers.variance_scaling_initializer())
biases1 = tf.get_variable("biases1", shape=[n_hidden], initializer = tf.zeros_initializer())
biases2 = tf.get_variable("biases2", shape=[n_outputs], initializer = tf.zeros_initializer())
hidden = activation(tf.matmul(X, weights1) + biases1)
outputs = tf.matmul(hidden, weights2) + biases2
reconstruction_loss = tf.reduce_mean(tf.square(outputs - X))
reg_loss = beta * (tf.reduce_sum(tf.abs(weights1)) + tf.reduce_sum(tf.abs(weights2)))
loss = reconstruction_loss + reg_loss
training_op = tf.train.AdamOptimizer(learning_rate).optimizer.minimize(loss)
init = tf.global_variables_initializer()
.......
.......
After training, I counted the number of zero in weights1 matrix. I found that all weights1[i][j] ≠ 0. what's the problem?

L1 regularization make the network push useless weight toward 0, but it doesn't asssign 0 To them by brute force. They Will near 0 but not at 0.
see: Is the L1 regularization in Keras/Tensorflow *really* L1-regularization?
for detailed answers

Related

How can I use version 2.0 of Tensorflow in python 3.7.9 and use the displacement function?

I was following a tutorial by this guy: https://www.youtube.com/watch?v=PwAGxqrXSCs&list=PLQVvvaa0QuDfKTOs3Keq_kaG2P55YRn5v&index=47 , and now I have run into a Error.
The programm tells me : "AttributeError: module 'tensorflow' has no attribute 'placeholder'". I know that in version 2.0 and higher (of tensorflow) the placeholder function got removed, however I am unable to figure out another way of making my script work as I am just starting out with Deep Learning! Help would be greatly appreciated!
Here is my code (I am using PyCharm as an IDE, dont know if thats a source for the Error):
import tensorflow as tf
mnist = tf.keras.datasets.mnist
#Hidden Layers
n_nodes_hl1 = 500 # Can be different from others (they do no HAVE to be the same number)
n_nodes_hl2 = 500
n_nodes_hl3 = 500
n_classes = 10
batch_size = 100
#Input Data
#Height x width
x = tf.placeholder('float',[None, 784])
y = tf.placeholder('float')
def neural_network_model(data):
#(Input_data * weights) + biases
#Why need biases?
#If Input_Data = 0 and weights = (ANYTHING) and there was no bias, no neuron would ever fire!
#Because ofcourse (0 * (ANYTHING)) = 0
hidden_1_layer = {"weights":tf.Variable(tf.random.normal([784, n_nodes_hl1])),
"biases":tf.Variable(tf.random.normal(n_nodes_hl1))}
hidden_2_layer = {"weights":tf.Variable(tf.random.normal([n_nodes_hl1, n_nodes_hl2])),
"biases":tf.Variable(tf.random.normal(n_nodes_hl2))}
hidden_3_layer = {"weights":tf.Variable(tf.random.normal([n_nodes_hl2, n_nodes_hl3])),
"biases":tf.Variable(tf.random.normal(n_nodes_hl3))}
output_layer = {"weights":tf.Variable(tf.random.normal([n_nodes_hl3, n_classes])),
"biases":tf.Variable(tf.random.normal([n_classes]))}
# (Input_data * weights) + biases
#This is the Sum thingy l1(layer 1)
l1 = tf.add(tf.matmul(data * hidden_1_layer["weights"]), hidden_1_layer["biases"])
#Threshold function (will neuron fire?) [relu] = rectified linear
l1 = tf.nn.relu(l1)
#This is the Sum thingy for l2
l2 = tf.add(tf.matmul(l1 * hidden_2_layer["weights"]), hidden_2_layer["biases"])
l2 = tf.nn.relu(l2)
#This is the Sum thingy for l3
l3 = tf.add(tf.matmul(l2 * hidden_3_layer["weights"]), hidden_3_layer["biases"])
l3 = tf.nn.relu(l3)
#This is the Sum thingy
output = tf.matmul(l3 * output_layer["weights"]) + output_layer["biases"]
return output
def train_neural_network(x):
prediction = neural_network_model(x)
#Calculates the difference between prediction and known label
cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
#No we know what the cost is, LETS MINIMZE IT!
# learning_rate = 0.001
optimizer = tf.train.AdamOptimizer().minimize(cost)
#How many epochs do we want? (How often does it run) (Cycles feed forward + backprop)
hm_epochs = 100
with tf.compat.v1.Session() as sess:
sess.run(tf.global_variables_initializer())
#This trains:
for epoch in hm_epochs:
epoch_loss = 0
for _ in range(int(mnist.train.num_examples/batch_size)):
x, y = mnist.train.next_batch(batch_size)
_, c = sess.run([optimizer, cost], feed_dict = {x: x, y: y})
epoch_loss += c
print("Epoch", epoch, "Completed out of", hm_epochs, "loss:",epoch_loss)
#This checks if the machine does stuff right
#Checks if the maximum number is identical to the machines prediction
correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct, "float"))
print("Accuracy:", accuracy.eval({x:mnist.test.images, y:mnist.test.labels}))
train_neural_network(x)

In Tenosrflow 2 you can use tf.compat.v1.placeholder(), this is not compatible with eager execution and tf.function.
If you want to explicitly set up your inputs, also see Keras functional API on how to use tf.keras.Input to replace tf.compat.v1.placeholder.
Example,
x = tf.keras.Input(shape=(32,))

Creating several weight tensors for each object in Multi-Object Tracking (MOT) using TensorFlow

I am using TensorFlow V1.10.0 and developing a Multi-Object Tracker based on MDNet. I need to assign a separate weight matrix for each detected object for the fully connected layers in order to get different embedding for each object during online training. I am using this tf.map_fn in order to generate a higher-order weight tensor (n_objects, flattened layer, hidden_units),
'''
def dense_fc4(n_objects):
initializer = lambda: tf.contrib.layers.xavier_initializer()(shape=(1024, 512))
return tf.Variable(initial_value=initializer, name='fc4/kernel',
shape=(n_objects.shape[0], 1024, 512))
W4 = tf.map_fn(dense_fc4, samples_flat)
b4 = tf.get_variable('fc4/bias', shape=512, initializer=tf.zeros_initializer())
fc4 = tf.add(tf.matmul(samples_flat, W4), b4)
fc4 = tf.nn.relu(fc4)
'''
However during execution when I run the session for W4 I get a weight matrix but all having the same values. Any help?
TIA

Here is a workaround, I was able to generate the multiple kernels outside the graph in a for loop and then giving it to the graph:
w6 = []
for n_obj in range(pos_data.shape[0]):
w6.append(tf.get_variable("fc6/kernel-" + str(n_obj), shape=(512, 2),
initializer=tf.contrib.layers.xavier_initializer()))
print("modeling fc6 branches...")
prob, train_op, accuracy, loss, pred, initialize_vars, y, fc6 = build_branches(fc5, w6)
def build_branches(fc5, w6):
y = tf.placeholder(tf.int64, [None, None])
b6 = tf.get_variable('fc6/bias', shape=2, initializer=tf.zeros_initializer())
fc6 = tf.add(tf.matmul(fc5, w6), b6)
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
logits=fc6))
train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="fc6")
with tf.variable_scope("", reuse=tf.AUTO_REUSE):
optimizer = tf.train.AdamOptimizer(learning_rate=0.001, name='adam')
train_op = optimizer.minimize(loss, var_list=train_vars)
initialize_vars = train_vars
initialize_vars += [optimizer.get_slot(var, name)
for name in optimizer.get_slot_names()
for var in train_vars]
if isinstance(optimizer, tf.train.AdamOptimizer):
initialize_vars += optimizer._get_beta_accumulators()
prob = tf.nn.softmax(fc6)
pred = tf.argmax(prob, 2)
correct_pred = tf.equal(pred, y)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
return prob, train_op, accuracy, loss, pred, initialize_vars, y, fc6

Tensorflow: Can't overfit training data with batch size > 1

I coded a small RNN network with Tensorflow to return the total energy consumption given some parameters. There seem to be a problem in my code. It can't overfit the training data when I use a batch size > 1 (even with only 4 samples!). In the code below, the loss value reaches 0 when I set BatchSize to 1. However, by setting BatchSize to 2, the network fails to overfit and the loss value goes toward 12.500000 and gets stuck there forever.
I suspect this has something to do with LSTM states. I get the same problem if I don't update the state with each iteration. Or maybe the cost function? A help is appreciated. Thanks.
import tensorflow as tf
import numpy as np
import os
from utils import loadData
Epochs = 10000
LearningRate = 0.0001
MaxGradNorm = 5
SeqLen = 1
NChannels = 28
NClasses = 1
NLayers = 2
NUnits = 256
BatchSize = 1
NumSamples = 4
#################################################################
trainingFile = "./training.dat"
X_values, Y_values = loadData(trainingFile, SeqLen, NumSamples)
X = tf.placeholder(tf.float32, [BatchSize, SeqLen, NChannels], name='inputs')
Y = tf.placeholder(tf.float32, [BatchSize, SeqLen, NClasses], name='labels')
keep_prob = tf.placeholder(tf.float32, name='keep')
initializer = tf.contrib.layers.xavier_initializer()
Xin = tf.unstack(tf.transpose(X, perm=[1, 0, 2]))
lstm_layers = []
for i in range(NLayers):
lstm_layer = tf.nn.rnn_cell.LSTMCell(num_units=NUnits, initializer=initializer, use_peepholes=True, state_is_tuple=True)
dropout_layer = tf.contrib.rnn.DropoutWrapper(lstm_layer, output_keep_prob=keep_prob)
#[LSTM ---> DROPOUT] ---> [LSTM ---> DROPOUT] ---> etc...
lstm_layers.append(dropout_layer)
rnn = tf.nn.rnn_cell.MultiRNNCell(lstm_layers, state_is_tuple=True)
initial_state = rnn.zero_state(BatchSize, tf.float32)
outputs, final_state = tf.nn.static_rnn(rnn, Xin, dtype=tf.float32, initial_state=initial_state)
outputs = tf.transpose(outputs, [1,0,2])
outputs = tf.reshape(outputs, [-1, NUnits])
weight = tf.Variable(tf.truncated_normal([NUnits, NClasses]))
bias = tf.Variable(tf.constant(0.1, shape=[NClasses]))
prediction = tf.matmul(outputs, weight) + bias
prediction = tf.reshape(prediction, [BatchSize, SeqLen, NClasses])
cost = tf.reduce_sum(tf.pow(tf.subtract(prediction, Y), 2)) / (2 * BatchSize)
tvars = tf.trainable_variables()
grad, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), MaxGradNorm)
optimizer = tf.train.AdamOptimizer(learning_rate = LearningRate)
train_step = optimizer.apply_gradients(zip(grad, tvars))
sess = tf.Session()
sess.run(tf.global_variables_initializer())
iteration = 1
for e in range(0, Epochs):
train_loss = []
state = sess.run(initial_state)
for i in xrange(0, len(X_values), BatchSize):
x = X_values[i:i + BatchSize]
y = Y_values[i:i + BatchSize]
y = np.expand_dims(y, 2)
feed = {X : x, Y : y, keep_prob : 1.0, initial_state : state}
_ , loss, state, pred = sess.run([train_step, cost, final_state, prediction], feed_dict = feed)
train_loss.append(loss)
iteration += 1
print("Epoch: {}/{}".format(e, Epochs), "Iteration: {:d}".format(iteration), "Train average rmse: {:6f}".format(np.mean(train_loss)))

Normalizing the input data solved the problem.

The loss function decreases, but accuracy on train set does not change in tensorflow

I am trying to implement a simple gender classifier using deep convolutional neural networks using tensorflow. I have found this model and implemented it.
def create_model_v2(data):
cl1_desc = {'weights':weight_variable([7,7,3,96]), 'biases':bias_variable([96])}
cl2_desc = {'weights':weight_variable([5,5,96,256]), 'biases':bias_variable([256])}
cl3_desc = {'weights':weight_variable([3,3,256,384]), 'biases':bias_variable([384])}
fc1_desc = {'weights':weight_variable([240000, 128]), 'biases':bias_variable([128])}
fc2_desc = {'weights':weight_variable([128,128]), 'biases':bias_variable([128])}
fc3_desc = {'weights':weight_variable([128,2]), 'biases':bias_variable([2])}
cl1 = conv2d(data,cl1_desc['weights'] + cl1_desc['biases'])
cl1 = tf.nn.relu(cl1)
pl1 = max_pool_nxn(cl1,3,[1,2,2,1])
lrm1 = tf.nn.local_response_normalization(pl1)
cl2 = conv2d(lrm1, cl2_desc['weights'] + cl2_desc['biases'])
cl2 = tf.nn.relu(cl2)
pl2 = max_pool_nxn(cl2,3,[1,2,2,1])
lrm2 = tf.nn.local_response_normalization(pl2)
cl3 = conv2d(lrm2, cl3_desc['weights'] + cl3_desc['biases'])
cl3 = tf.nn.relu(cl3)
pl3 = max_pool_nxn(cl3,3,[1,2,2,1])
fl = tf.contrib.layers.flatten(cl3)
fc1 = tf.add(tf.matmul(fl, fc1_desc['weights']), fc1_desc['biases'])
drp1 = tf.nn.dropout(fc1,0.5)
fc2 = tf.add(tf.matmul(drp1, fc2_desc['weights']), fc2_desc['biases'])
drp2 = tf.nn.dropout(fc2,0.5)
fc3 = tf.add(tf.matmul(drp2, fc3_desc['weights']), fc3_desc['biases'])
return fc3
What I need to note at this point is that I have also done all the pre-processing steps described in the paper, however my images are resized to 100x100x3 instead of the 277x277x3.
I have defined the the logits to be [0,1] for females and [1,0] for males
x = tf.placeholder('float',[None,100,100,3])
y = tf.placeholder('float',[None,2])
And have defined the training procedure as follows:
def train(x, hm_epochs, LR):
#prediction = create_model_v2(x)
prediction = create_model_v2(x)
cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits = prediction, labels = y) )
optimizer = tf.train.AdamOptimizer(learning_rate=LR).minimize(cost)
batch_size = 50
correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
print("hello")
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for epoch in range(hm_epochs):
epoch_loss = 0
i = 0
while i < (len(x_train)):
start = i
end = i + batch_size
batch_x = x_train[start:end]
batch_y = y_train[start:end]
whatever, vigen = sess.run([optimizer, cost], feed_dict = {x:batch_x, y:batch_y})
epoch_loss += vigen
i+=batch_size
print('Epoch', epoch ,'loss:',epoch_loss/len(x_train))
if (epoch+1) % 2 == 0:
j = 0
acc = []
while j < len(x_test):
acc += [accuracy.eval(feed_dict = {x:x_test[j:j + 10], y:y_test[j:j+10]})]
j+= 10
print ('accuracy after', epoch + 1, 'epochs on test set: ', sum(acc)/len(acc))
j = 0
acc = []
while j < len(x_train):
acc += [accuracy.eval(feed_dict = {x:x_train[j:j + 10], y:y_train[j:j+10]})]
j+= 10
print ('accuracy after', epoch, ' epochs on train set:', sum(acc)/len(acc))
Half of the code above is just for outputting test and train accuracies every 2 epochs.
Anyhow the loss starts high at first epoch
('Epoch', 0, 'loss:', 148.87030902462453)
('Epoch', 1, 'loss:', 0.01549744715988636)
('accuracy after', 2, 'epochs on test set: ', 0.33052011888510396)
('accuracy after', 1, ' epochs on train set:', 0.49607501227222384)
('Epoch', 2, 'loss:', 0.015493246909976005)
What am I missing?
and continues like this keeping the accuracy at 0.5 for train set.
EDIT: the functions weights variable, conv2d and max_pool_nn are
def bias_variable(shape):
initial = tf.constant(0.1, shape=shape)
return tf.Variable(initial)
def weight_variable(shape):
initial = tf.truncated_normal(shape, stddev=0.1)
return tf.Variable(initial)
def avg_pool_nxn(x, n, strides):
return tf.nn.avg_pool(x, ksize=[1,n,n,1], strides = strides,padding = 'SAME')
def max_pool_nxn(x, n, strides):
return tf.nn.max_pool(x, ksize=[1,n,n,1], strides = strides, padding = 'SAME')
def conv2d(x, W,stride = [1,1,1,1]):
return tf.nn.conv2d(x, W, strides = stride, padding = 'SAME')
EDIT 2 - Problem solved
The Problem was fascinatingly related to parameter initialization. Changing the weight initialization from Normal Distribution to Xavier initialization worked wonders and accuracy ended up at about 86%. If anyone is interested here is the original paper http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf, if anyone knows and cares to explain exactly why Xavier works well with convnets and images feel free to post an answer.

Proper initialisation of weights is often crucial to getting deeper neural nets to train.
Xavier initialisation is derived with the goal of ensuring that the variance of the output at each neuron is expected to be 1.0 (see here). This generally relies on the additional assumption that your inputs are standardised to have mean 0 and variance of 1, so it is important to also ensure this.
For ReLU units, I believe He initialisation is actually considered best practice. This requires initialising from a zero-mean Gaussian distribution with standard deviation:
Where n is the number of input units. See the Lasagne docs for best practices for some other activation functions.
On a side note, batch normalisation can often reduce the dependence of model performance on weights initialisation.

LSTM model error is percent of one output class

I'm having a rough time trying to figure out what's wrong with my LSTM model. I have 11 inputs, and 2 output classes (one-hot encoded) and very quickly, like within 1 batch or so, the error just goes to the % of one of the output classes and stays there.
I tried printing weights and biases, but they seem to all be full of NaN.
If i decrease the learning rate, or mess around with layers/units, I can get it to arrive at the % of one class error slowly, but it seems to always get to that point.
Here's the code:
num_units = 30
num_layers = 50
dropout_rate = 0.80
learning_rate=0.0001
batch_size = 180
epoch = 1
input_classes = len(train_input[0])
output_classes = len(train_output[0])
data = tf.placeholder(tf.float32, [None, input_classes, 1]) #Number of examples, number of input, dimension of each input
target = tf.placeholder(tf.float32, [None, output_classes]) #one-hot encoded: [1,0] = bad, [0,1] = good
dropout = tf.placeholder(tf.float32)
cell = tf.contrib.rnn.LSTMCell(num_units, state_is_tuple=True)
cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=dropout)
cell = tf.contrib.rnn.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
#Input shape [batch_size, max_time, depth], output shape: [batch_size, max_time, cell.output_size]
val, _ = tf.nn.dynamic_rnn(cell, data, dtype=tf.float32)
val = tf.transpose(val, [1, 0, 2]) #reshapes it to [sequence_size, batch_size, depth]
#get last entry as it includes previous results
last = tf.gather(val, int(val.get_shape()[0]) - 1)
weight = tf.get_variable("W", shape=[num_units, output_classes], initializer=tf.contrib.layers.xavier_initializer())
bias = tf.get_variable("B", shape=[output_classes], initializer=tf.contrib.layers.xavier_initializer())
logits = tf.matmul(last, weight) + bias
prediction = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=target)
prediction = tf.clip_by_value(prediction, 1e-10,100.0)
cost = tf.reduce_mean(prediction)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
minimize = optimizer.minimize(cost)
mistakes = tf.not_equal(tf.argmax(target, 1), tf.argmax(logits, 1))
error = tf.reduce_mean(tf.cast(mistakes, tf.float32))
init_op = tf.global_variables_initializer()
saver = tf.train.Saver()
sess = tf.Session()
sess.run(init_op)
no_of_batches = int((len(train_input)) / batch_size)
for i in range(epoch):
ptr = 0
for j in range(no_of_batches):
inp, out = train_input[ptr:ptr+batch_size], train_output[ptr:ptr+batch_size]
ptr+=batch_size
sess.run(minimize,{data: inp, target: out, dropout: dropout_rate })
sess.close()

Since you have one hot encoding use sparse_softmax_cross_entropy_with_logits instead of tf.nn.softmax_cross_entropy_with_logits.
Refer to this stackoverflow answer to understand the difference of two functions.
1

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

L1 regularization doesn't force the weight parameters to become zero - tensorflow

L1 regularization make the network push useless weight toward 0, but it doesn't asssign 0 To them by brute force. They Will near 0 but not at 0. see: Is the L1 regularization in Keras/Tensorflow really L1-regularization? for detailed answers

Related

How can I use version 2.0 of Tensorflow in python 3.7.9 and use the displacement function?

Creating several weight tensors for each object in Multi-Object Tracking (MOT) using TensorFlow

Tensorflow: Can't overfit training data with batch size > 1

The loss function decreases, but accuracy on train set does not change in tensorflow

LSTM model error is percent of one output class

Categories

Resources

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

L1 regularization doesn't force the weight parameters to become zero - tensorflow

L1 regularization make the network push useless weight toward 0, but it doesn't asssign 0 To them by brute force. They Will near 0 but not at 0. see: Is the L1 regularization in Keras/Tensorflow *really* L1-regularization? for detailed answers

Related

How can I use version 2.0 of Tensorflow in python 3.7.9 and use the displacement function?

Creating several weight tensors for each object in Multi-Object Tracking (MOT) using TensorFlow

Tensorflow: Can't overfit training data with batch size > 1

The loss function decreases, but accuracy on train set does not change in tensorflow

LSTM model error is percent of one output class

Categories

Resources

L1 regularization make the network push useless weight toward 0, but it doesn't asssign 0 To them by brute force. They Will near 0 but not at 0. see: Is the L1 regularization in Keras/Tensorflow really L1-regularization? for detailed answers