Why does batch_normalization in TensorFlow not give the expected results?

I would like to see the output of a batch_normalization layer in a small example, but apparently I am doing something wrong because I get the same output as the input.
import tensorflow as tf
import numpy as np
import keras.backend as K

K.set_image_data_format('channels_last')

X = tf.placeholder(tf.float32, shape=(None, 2, 2, 3))  # samples are 2X2 images with 3 channels
outp = tf.layers.batch_normalization(inputs=X, axis=3)

x = np.random.rand(4, 2, 2, 3)  # sample set: 4 images

init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    K.set_session(sess)
    a = sess.run(outp, feed_dict={X: x, K.learning_phase(): 0})
    print(a - x)  # print the difference between input and normalized output
The input and output of the above code are almost identical. Can anyone point out the problem to me?

Remember that batch_normalization behaves differently at train and test time. Here, you have never "trained" your batch normalization, so the moving mean it uses is still at its initial value of 0 and the moving variance close to 1, so the output is almost the same as the input. If you use K.learning_phase(): 1 you'll already see some differences (because it will normalize using the batch's mean and standard deviation); if you first train on a lot of examples and then test on some other ones, you'll also see the normalization occurring, because the learned mean and standard deviation will not be 0 and 1.
To see the effects of batch norm more clearly, I'd also suggest multiplying your input by a large number (say 100), so that there is a clear difference between unnormalized and normalized vectors; that will help you see what's going on.
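For intuition, here is a small back-of-the-envelope sketch (not from the original answer) of what inference-mode batch norm computes with the layer's default initializations; it shows why a - x looks like zeros in your test:
import numpy as np
# tf.layers.batch_normalization defaults: moving_mean = 0, moving_variance = 1,
# beta = 0, gamma = 1, epsilon = 1e-3
x = np.random.rand(4, 2, 2, 3)
out = (x - 0.0) / np.sqrt(1.0 + 1e-3)  # gamma * (x - mean) / sqrt(var + eps) + beta
print(np.max(np.abs(out - x)))  # tiny, which is why the printed difference is almost zero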
EDIT: In your code as it is, the moving mean and moving variance are never updated. You need to make sure the update ops are run, as indicated in batch_normalization's documentation. The following lines should make it work:
outp = tf.layers.batch_normalization(inputs=X, axis=3, training=is_training, center=False, scale=False)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    outp = tf.identity(outp)
Below is my full working code (I got rid of Keras because I don't know it well, but you should be able to re-add it).
import tensorflow as tf
import numpy as np

X = tf.placeholder(tf.float32, shape=(None, 2, 2, 3))  # samples are 2X2 images with 3 channels
is_training = tf.placeholder(tf.bool, shape=())  # flag to switch between training and inference behaviour
outp = tf.layers.batch_normalization(inputs=X, axis=3, training=is_training, center=False, scale=False)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    outp = tf.identity(outp)

x = np.random.rand(4, 2, 2, 3) * 100  # sample set: 4 images
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    initial = sess.run(outp, feed_dict={X: x, is_training: False})
    for i in range(10000):
        a = sess.run(outp, feed_dict={X: x, is_training: True})
        if i % 1000 == 0:
            print("Step %i: " % i, a - x)  # print the difference between input and normalized output
    final = sess.run(outp, feed_dict={X: x, is_training: False})
    print("initial: ", initial)
    print("final: ", final)
    assert not np.array_equal(initial, final)

Related

Tensorflow: Replace one op with another (maybe even 2 ops)

My goal is to build a script to change an operation into another one using TF's graph editor. So far I tried making a script that just changes the input kernel weights of a Conv2D, but to no avail, as the interface is pretty confusing.
import sys
import tensorflow as tf
import tensorflow.contrib.graph_editor as ge
from google.protobuf import text_format
from tensorflow.core.framework import graph_pb2
from tensorflow.python.framework import importer
from tensorflow.python.platform import gfile

with tf.Session() as sess:
    model_filename = sys.argv[1]
    with gfile.FastGFile(model_filename, 'r') as f:
        graph_def = graph_pb2.GraphDef()
        text_format.Merge(f.read(), graph_def)
        importer.import_graph_def(graph_def)
    #my_sgv = ge.sgv("Conv2D", graph=tf.get_default_graph())
    #print my_sgv
    convs = find_conv2d_ops(tf.get_default_graph())  # helper defined elsewhere in my script
    print convs
    my_sgv = ge.sgv(convs)
    print my_sgv
    conv_tensor = tf.get_default_graph().get_tensor_by_name(convs[0].name + ':0')
    conv_weights_input = tf.get_default_graph().get_tensor_by_name(convs[0].inputs[1].name)
    weights_new = tf.Variable(tf.truncated_normal([1, 1, 1, 8], stddev=0.03),
                              name='Wnew')
    ge.graph_replace(conv_tensor, {conv_weights_input: weights_new})
The error is "input needs to be a Tensor: ". Can someone please provide some insights?
Since you are dealing with a tf.Variable, you don't need to use the graph editor; tf.assign will be sufficient.
You can use it as follows:
assign_op = tf.assign(conv_weights_input, weights_new)
with tf.Session() as sess:
    sess.run(assign_op)
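For completeness, here is a small self-contained sketch of the tf.assign pattern; the variable names are made up for illustration, not taken from the asker's graph:
import tensorflow as tf

kernel = tf.Variable(tf.truncated_normal([1, 1, 1, 8], stddev=0.03), name='kernel')
new_kernel = tf.Variable(tf.truncated_normal([1, 1, 1, 8], stddev=0.03), name='new_kernel')
assign_op = tf.assign(kernel, new_kernel)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(assign_op)       # kernel now holds new_kernel's values
    print(sess.run(kernel))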
If you are looking to sub out operations and not weights, consider the following example (modified from this example):
import tensorflow as tf
import tensorflow.contrib.graph_editor as ge

def build():
    a_pl = tf.placeholder(dtype=tf.float32, name="a")
    b_pl = tf.placeholder(dtype=tf.float32, name="b")
    c = tf.add(a_pl, b_pl, name="c")

build()  # or load graph from disc

a = tf.constant(1.0, shape=[2, 3], name="a_const")
b = tf.constant(2.0, shape=[2, 3], name="b_const")
a_pl = tf.get_default_graph().get_tensor_by_name("a:0")
b_pl = tf.get_default_graph().get_tensor_by_name("b:0")
c = tf.get_default_graph().get_tensor_by_name("c:0")
c_ = ge.graph_replace(c, {a_pl: a, b_pl: b})

with tf.Session() as sess:
    # no need for placeholders
    print(sess.run(c_))
    # will give an error since a_pl and b_pl have no value
    print(sess.run(c))
The issue with your code is that you're dealing with weights (variables), not tensors. The crux of the above example is that the first argument is the target tensor (the output tensor) that has the tensors to be replaced among its dependencies. The second argument is a mapping to the actual tensors you want to replace.
It's also worth noting that conv_weights_input is actually a tensor, whereas weights_new is a tf.Variable. I believe what you want is to replace weights_new with a new conv operation with random weight initialisation.
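If you do still want to go through graph_replace, one guess (not verified against the asker's graph) is that the "input needs to be a Tensor" error comes from passing the tf.Variable object itself as the replacement; graph_replace expects tensors on both sides of the mapping, so passing the variable's value tensor may get you further:
# hypothetical tweak to the last line of the asker's snippet:
new_conv_tensor = ge.graph_replace(conv_tensor, {conv_weights_input: weights_new.value()})  # pass a Tensor, not the Variable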

Sampling from tensor that depends on a random variable in tensorflow

Is it possible to get samples from a tensor that depends on a random variable in tensorflow? I need to get an approximate sample distribution to use in a loss function to be optimized. Specifically, in the example below, I want to be able to obtain samples of Y_output in order to be able to calculate the mean and variance of the output distribution and use these parameters in a loss function.
def sample_weight(mean, phi, seed=1):
    P_epsilon = tf.distributions.Normal(loc=0., scale=1.0)
    epsilon_s = P_epsilon.sample([1])
    s = tf.multiply(epsilon_s, tf.log(1.0 + tf.exp(phi)))
    weight_sample = mean + s
    return weight_sample

X = tf.placeholder(tf.float32, shape=[None, 1], name="X")
Y_labels = tf.placeholder(tf.float32, shape=[None, 1], name="Y_labels")
sw0 = sample_weight(u0, p0)  # u0, p0, u1, p1 and build_toy_data are defined elsewhere in my script
sw1 = sample_weight(u1, p1)
Y_output = sw0 + tf.multiply(sw1, X)
loss = tf.losses.mean_squared_error(labels=Y_labels, predictions=Y_output)
train_op = tf.train.AdamOptimizer(0.5e-1).minimize(loss)
init_op = tf.global_variables_initializer()
losses = []
predictions = []
Fx = lambda x: 0.5*x + 5.0
xrnge = 50
xs, ys = build_toy_data(funcx=Fx, stdev=2.0, num=xrnge)
with tf.Session() as sess:
    sess.run(init_op)
    iterations = 1000
    for i in range(iterations):
        stat = sess.run(loss, feed_dict={X: xs, Y_labels: ys})
Not sure if this answers your question, but: when you have a Tensor downstream from a sampling Op (e.g., the Op created by your call to P_epsilon.sample([1])), any time you call sess.run on the downstream Tensor, the sample op will be re-run and produce a new random value. Example:
import tensorflow as tf
from tensorflow_probability import distributions as tfd
n = tfd.Normal(0., 1.)
s = n.sample()
y = s**2
sess = tf.Session() # Don't actually do this -- use context manager
print(sess.run(y))
# ==> 0.13539088
print(sess.run(y))
# ==> 0.15465781
print(sess.run(y))
# ==> 4.7929106
If you want a bunch of samples of y, you could do
import tensorflow as tf
from tensorflow_probability import distributions as tfd
n = tfd.Normal(0., 1.)
s = n.sample(100)
y = s**2
sess = tf.Session() # Don't actually do this -- use context manager
print(sess.run(y))
# ==> vector of 100 squared random normal values
We also have some cool tools in tensorflow_probability to do the kind of thing you're driving at here, namely the Bijector API and, somewhat simpler, the trainable_distributions API.
(Another minor point: I'd suggest using tf.nn.softplus, or at a minimum tf.log1p(tf.exp(x)), instead of tf.log(1.0 + tf.exp(x)). The latter has poor numerical properties due to floating-point imprecision, which the former are designed to handle.)
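As an illustration of that last point, here is how sample_weight could be rewritten with tf.nn.softplus, with the equivalent reparameterized tensorflow_probability formulation in a comment (a sketch, assuming mean and phi are scalar tensors or variables):
import tensorflow as tf
from tensorflow_probability import distributions as tfd

def sample_weight(mean, phi, seed=1):
    eps = tfd.Normal(loc=0., scale=1.).sample([1], seed=seed)
    return mean + eps * tf.nn.softplus(phi)   # softplus(x) = log(1 + exp(x)), computed stably

# Equivalently, since Normal sampling in tensorflow_probability is reparameterized,
# the weight can be drawn directly as:
# w = tfd.Normal(loc=mean, scale=tf.nn.softplus(phi)).sample([1], seed=seed)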
Hope this is some help!

TensorFlow: How to embed float sequences to fixed size vectors?

I am looking for methods to embed variable-length sequences of float values into fixed-size vectors. The input is formatted as follows:
[f1,f2,f3,f4]->[f1,f2,f3,f4]->[f1,f2,f3,f4]-> ... -> [f1,f2,f3,f4]
[f1,f2,f3,f4]->[f1,f2,f3,f4]->[f1,f2,f3,f4]->[f1,f2,f3,f4]-> ... -> [f1,f2,f3,f4]
...
[f1,f2,f3,f4]-> ... -> ->[f1,f2,f3,f4]
Each line is a variable-length sequence, with a maximum length of 60. Each unit in a sequence is a tuple of 4 float values. I have already padded all sequences with zeros to the same length.
The following architecture seems to solve my problem if I use the same data as both input and output; I need the thought vector in the center as the embedding for the sequences.
In tensorflow, I have found two candidate methods, tf.contrib.legacy_seq2seq.basic_rnn_seq2seq and tf.contrib.legacy_seq2seq.embedding_rnn_seq2seq.
However, these two methods seem to be designed for NLP problems, where the inputs must be discrete word values.
So, is there another function that solves my problem?
All you need is an RNN, not the seq2seq model, since seq2seq comes with an additional decoder which is unnecessary in your case.
Example code:
import numpy as np
import tensorflow as tf
from tensorflow.contrib import rnn

input_size = 4
max_length = 60
hidden_size = 64
output_size = 4

x = tf.placeholder(tf.float32, shape=[None, max_length, input_size], name='x')
seqlen = tf.placeholder(tf.int64, shape=[None], name='seqlen')

lstm_cell = rnn.BasicLSTMCell(hidden_size, forget_bias=1.0)
outputs, states = tf.nn.dynamic_rnn(cell=lstm_cell, inputs=x, sequence_length=seqlen, dtype=tf.float32)
encoded_states = states[-1]

W = tf.get_variable(
    name='W',
    shape=[hidden_size, output_size],
    dtype=tf.float32,
    initializer=tf.random_normal_initializer())
b = tf.get_variable(
    name='b',
    shape=[output_size],
    dtype=tf.float32,
    initializer=tf.random_normal_initializer())

z = tf.matmul(encoded_states, W) + b
results = tf.sigmoid(z)

###########################
## cost computing and training components go here
# e.g.
# targets = tf.placeholder(tf.float32, shape=[None, input_size], name='targets')
# cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=targets, logits=z))
# optimizer = tf.train.AdamOptimizer(learning_rate=0.1).minimize(cost)
###############################

init = tf.global_variables_initializer()

batch_size = 4
data_in = np.zeros((batch_size, max_length, input_size), dtype='float32')
data_in[0, :4, :] = np.random.rand(4, input_size)
data_in[1, :6, :] = np.random.rand(6, input_size)
data_in[2, :20, :] = np.random.rand(20, input_size)
data_in[3, :, :] = np.random.rand(60, input_size)
data_len = np.asarray([4, 6, 20, 60], dtype='int64')

with tf.Session() as sess:
    sess.run(init)
    #########################
    # training process goes here
    #########################
    res = sess.run(results,
                   feed_dict={
                       x: data_in,
                       seqlen: data_len})
    print(res)
To encode a sequence into a fixed-length vector you typically use recurrent neural networks (RNNs) or convolutional neural networks (CNNs).
If you use a recurrent neural network, you can use the output at the last time step (the last element in your sequence). This corresponds to the thought vector in your question. Have a look at tf.nn.dynamic_rnn. dynamic_rnn requires you to specify the type of RNN cell you want to use. tf.contrib.rnn.LSTMCell and tf.contrib.rnn.GRUCell are the most common.
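A minimal sketch of that idea (assuming padded inputs of shape [batch, 60, 4] and a seqlen placeholder holding the true lengths, mirroring the question; the hidden size of 64 is an arbitrary choice):
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 60, 4], name='x')    # padded sequences
seqlen = tf.placeholder(tf.int32, [None], name='seqlen')   # true length of each sequence

cell = tf.contrib.rnn.GRUCell(64)
outputs, state = tf.nn.dynamic_rnn(cell, x, sequence_length=seqlen, dtype=tf.float32)

# outputs has shape [batch, max_time, hidden]; pick the output at the last
# valid time step of each sequence, i.e. outputs[i, seqlen[i] - 1, :]
batch_range = tf.range(tf.shape(outputs)[0])
indices = tf.stack([batch_range, seqlen - 1], axis=1)
embedding = tf.gather_nd(outputs, indices)                  # [batch, hidden] fixed-size vector
# (for a GRUCell, the final `state` returned by dynamic_rnn is the same vector)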
If you want to use CNNs, you need to use 1-dimensional convolutions; to build them you need tf.layers.conv1d and tf.layers.max_pooling1d.
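And a hedged sketch of the CNN route under the same assumptions (the filter counts, kernel sizes and pooling choices are arbitrary, not taken from the question):
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 60, 4], name='x')    # [batch, time, features]

conv = tf.layers.conv1d(x, filters=32, kernel_size=5, padding='same', activation=tf.nn.relu)
pool = tf.layers.max_pooling1d(conv, pool_size=2, strides=2)
conv = tf.layers.conv1d(pool, filters=64, kernel_size=5, padding='same', activation=tf.nn.relu)
# collapse the time axis to get one fixed-size vector per sequence
embedding = tf.reduce_max(conv, axis=1)                     # [batch, 64]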
I have found a solution to my problem, using the following architecture:
[architecture diagram: an encoding LSTM stack whose last output (the thought vector) is duplicated and fed as the input sequence of a decoding LSTM stack]
The LSTM layers below encode the series x1, x2, ..., xn. The last output, the green one in the diagram, is duplicated as many times as there are input steps and fed to the decoding LSTM layers above. The TensorFlow code is as follows:
series_input = tf.placeholder(tf.float32, [None, conf.max_series, conf.series_feature_num])
print("Encode input Shape", series_input.get_shape())

# encoding layer
encode_cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.BasicLSTMCell(conf.rnn_hidden_num, reuse=False) for _ in range(conf.rnn_layer_num)]
)
encode_output, _ = tf.nn.dynamic_rnn(encode_cell, series_input, dtype=tf.float32, scope='encode')
print("Encode output Shape", encode_output.get_shape())

# last output
encode_output = tf.transpose(encode_output, [1, 0, 2])
last = tf.gather(encode_output, int(encode_output.get_shape()[0]) - 1)

# duplicate the last output of the encoding layer
decoder_input = tf.stack([last for _ in range(conf.max_series)], axis=1)
print("Decoder input shape", decoder_input.get_shape())

# decoding layer
decode_cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.BasicLSTMCell(conf.series_feature_num, reuse=False) for _ in range(conf.rnn_layer_num)]
)
decode_output, _ = tf.nn.dynamic_rnn(decode_cell, decoder_input, dtype=tf.float32, scope='decode')
print("Decode output", decode_output.get_shape())

# Loss Function
loss = tf.losses.mean_squared_error(labels=series_input, predictions=decode_output)
print("Loss", loss)

Loss not converging in Polynomial regression in Tensorflow

import numpy as np
import tensorflow as tf

# input data:
x_input = np.linspace(0, 10, 1000)
y_input = x_input + np.power(x_input, 2)

# model parameters
W = tf.Variable(tf.random_normal([2, 1]), name='weight')
# bias
b = tf.Variable(tf.random_normal([1]), name='bias')

# placeholders
#X=tf.placeholder(tf.float32,shape=(None,2))
X = tf.placeholder(tf.float32, shape=[None, 2])
Y = tf.placeholder(tf.float32)
x_modified = np.zeros([1000, 2])
x_modified[:, 0] = x_input
x_modified[:, 1] = np.power(x_input, 2)

# model
#x_new=tf.constant([x_input,np.power(x_input,2)])
Y_pred = tf.add(tf.matmul(X, W), b)

# algorithm
loss = tf.reduce_mean(tf.square(Y_pred - Y))
# training algorithm
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
# initializing the variables
init = tf.initialize_all_variables()

# starting the session
sess = tf.Session()
sess.run(init)

epoch = 100
for step in xrange(epoch):
    # temp=x_input.reshape((1000,1))
    #y_input=temp
    _, c = sess.run([optimizer, loss], feed_dict={X: x_modified, Y: y_input})
    if step % 50 == 0:
        print c

print "Model parameters:"
print sess.run(W)
print "bias:%f" % sess.run(b)
I'm trying to implement polynomial regression (quadratic) in TensorFlow. The loss isn't converging. Could anyone please help me out with this? Similar logic works for linear regression!
First, there is a problem with the shapes of Y_pred and Y:
Y has an unknown shape and is fed with an array of shape (1000,)
Y_pred has shape (1000, 1)
Y - Y_pred will then have shape (1000, 1000) because of broadcasting
This small code will prove my point:
a = tf.zeros([1000]) # shape (1000,)
b = tf.zeros([1000, 1]) # shape (1000, 1)
print (a-b).get_shape() # prints (1000, 1000)
You should use consistent shapes:
y_input = y_input.reshape((1000, 1))
Y = tf.placeholder(tf.float32, shape=[None, 1])
Anyway, the loss is exploding because you have very high values (input between 0 and 100, you should normalize it) and thus very high loss (around 2000 at the beginning of training).
The gradients are very large, the parameters explode, and the loss goes to infinity.
The quickest fix is to lower your learning rate (1e-5 converges for me, albeit very slowly at the end). You can make it higher after the loss converges to around 1.
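Putting those fixes together, here is a hedged sketch of the adjusted script (Python 3 style; the reshaped labels, the consistent placeholder shape and the 1e-5 learning rate follow the points above, the rest mirrors the question):
import numpy as np
import tensorflow as tf

x_input = np.linspace(0, 10, 1000)
y_input = (x_input + np.power(x_input, 2)).reshape((1000, 1))    # labels reshaped to (1000, 1)
x_modified = np.stack([x_input, np.power(x_input, 2)], axis=1)   # features: x and x^2

X = tf.placeholder(tf.float32, shape=[None, 2])
Y = tf.placeholder(tf.float32, shape=[None, 1])                  # now consistent with Y_pred
W = tf.Variable(tf.random_normal([2, 1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')
Y_pred = tf.matmul(X, W) + b

loss = tf.reduce_mean(tf.square(Y_pred - Y))
train_op = tf.train.GradientDescentOptimizer(1e-5).minimize(loss)  # lower learning rate, as suggested

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(5000):
        _, c = sess.run([train_op, loss], feed_dict={X: x_modified, Y: y_input})
        if step % 1000 == 0:
            print(c)       # should decrease steadily instead of blowing up
    print(sess.run(W), sess.run(b))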

Minimal RNN example in tensorflow

Trying to implement a minimal toy RNN example in tensorflow.
The goal is to learn a mapping from the input data to the target data, similar to this wonderful concise example in theanets.
Update: We're getting there. The only part remaining is to make it converge (and less convoluted). Could someone help to turn the following into running code or provide a simple example?
import tensorflow as tf
from tensorflow.python.ops import rnn_cell

init_scale = 0.1
num_steps = 7
num_units = 7
input_data = [1, 2, 3, 4, 5, 6, 7]
target = [2, 3, 4, 5, 6, 7, 7]
#target = [1,1,1,1,1,1,1] #converges, but not what we want
batch_size = 1

with tf.Graph().as_default(), tf.Session() as session:
    # Placeholder for the inputs and target of the net
    # inputs = tf.placeholder(tf.int32, [batch_size, num_steps])
    input1 = tf.placeholder(tf.float32, [batch_size, 1])
    inputs = [input1 for _ in range(num_steps)]
    outputs = tf.placeholder(tf.float32, [batch_size, num_steps])

    gru = rnn_cell.GRUCell(num_units)
    initial_state = state = tf.zeros([batch_size, num_units])
    loss = tf.constant(0.0)

    # setup model: unroll
    for time_step in range(num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        step_ = inputs[time_step]
        output, state = gru(step_, state)
        loss += tf.reduce_sum(abs(output - target))  # all norms work equally well? NO!
    final_state = state

    optimizer = tf.train.AdamOptimizer(0.1)  # CONVERGEs sooo much better
    train = optimizer.minimize(loss)  # let the optimizer train

    numpy_state = initial_state.eval()
    session.run(tf.initialize_all_variables())
    for epoch in range(10):  # now
        for i in range(7):  # feed fake 2D matrix of 1 byte at a time ;)
            feed_dict = {initial_state: numpy_state, input1: [[input_data[i]]]}  # no
            numpy_state, current_loss, _ = session.run([final_state, loss, train], feed_dict=feed_dict)
            print(current_loss)  # hopefully going down, always stuck at 189, why!?
I think there are a few problems with your code, but the idea is right.
The main issue is that you're using a single tensor for inputs and outputs, as in:
inputs = tf.placeholder(tf.int32, [batch_size, num_steps]).
In TensorFlow the RNN functions take a list of tensors (because num_steps can vary in some models). So you should construct inputs like this:
inputs = [tf.placeholder(tf.int32, [batch_size, 1]) for _ in xrange(num_steps)]
Then you need to take care of the fact that your inputs are int32s, but an RNN cell works on float vectors - that's what embedding_lookup is for.
And finally you'll need to adapt your feed to pass in one value per placeholder in the input list (see the sketch after the links below).
I think the ptb tutorial is a reasonable place to look, but if you want an even more minimal example of an out-of-the-box RNN you can take a look at some of the rnn unit tests, e.g., here.
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/kernel_tests/rnn_test.py#L164
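For completeness, here is a hedged sketch of those suggestions applied to the question's toy data (the inputs are already floats here, so no embedding_lookup is needed; the output projection and the Adam settings are my own choices, not from the original answer):
import tensorflow as tf

num_steps, num_units, batch_size = 7, 7, 1
input_data = [1., 2., 3., 4., 5., 6., 7.]
target_data = [2., 3., 4., 5., 6., 7., 7.]

# one placeholder per time step, as suggested above
inputs = [tf.placeholder(tf.float32, [batch_size, 1], name="input_%d" % t) for t in range(num_steps)]
targets = [tf.placeholder(tf.float32, [batch_size, 1], name="target_%d" % t) for t in range(num_steps)]

cell = tf.nn.rnn_cell.GRUCell(num_units)
state = cell.zero_state(batch_size, tf.float32)
W = tf.get_variable("W_out", [num_units, 1])
b = tf.get_variable("b_out", [1], initializer=tf.zeros_initializer())

loss = tf.constant(0.0)
for t in range(num_steps):
    with tf.variable_scope("rnn", reuse=(t > 0)):   # reuse the GRU weights after the first step
        output, state = cell(inputs[t], state)
    pred = tf.matmul(output, W) + b                  # project the hidden state to a scalar
    loss += tf.reduce_sum(tf.abs(pred - targets[t]))

train = tf.train.AdamOptimizer(0.1).minimize(loss)

# feed one value per placeholder in the lists
feed = {ph: [[v]] for ph, v in zip(inputs, input_data)}
feed.update({ph: [[v]] for ph, v in zip(targets, target_data)})

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(200):
        _, current_loss = sess.run([train, loss], feed_dict=feed)
    print(current_loss)  # should be much lower than the initial loss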