Tensorflow: About the Variable value after Session run?

In TensorFlow, I don't understand the value of a Variable. Below is my code. I think that after I call
sess.run()
the value of W should have been updated, but when I print it, it looks unchanged.
The code is the MNIST example from the TensorFlow website. Can anyone explain why W doesn't change?
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])   # need input x
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])   # need input y
cross_entropy = tf.reduce_mean(-tf.reduce_mean(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

ww = W.eval(session=sess)
print(ww)

OK, when I run your code, the output I get looks like this
[[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
...,
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]]
But you have to realize that W is 784 by 10 (7,840 values), and this display only shows a few values at the start and end of it. And for most images in MNIST, the first and the last few pixels are not significant (the actual important data is somewhere in the middle, where the digit is drawn, right?). But if I change the print statement to
print(ww.min(), ww.max())
I get this output
-0.330358 0.429428
Which means that some weights are being trained, as expected.
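A small addition of my own (not part of the original answer), assuming ww is the array obtained above: a couple of standard NumPy calls make the trained weights visible without relying on the summarized print.
import sys
import numpy as np
print(np.count_nonzero(ww))                  # how many of the 784*10 weights moved away from zero
np.set_printoptions(threshold=sys.maxsize)   # print the full array instead of the corner summary
print(ww)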

Related

What does Keras do with the initial values of cell & hidden states (RNN, LSTM) for inference?

Assuming training is finished: what values does Keras use for the 0th cell state and hidden states at inference (in LSTM and RNN layers)? I could think of at least three scenarios, and could not find any conclusive answer in the documentation:
(a) The initial states are learned and then used for all predictions
(b) or the initial states are always set at zero
(c) the initial states are always random (let's hope not...?)
If using LSTM(stateful=True), the hidden states are initialized to zero, change with each fit or predict, and are then kept at whatever they are until .reset_states() is called. With LSTM(stateful=False), the states are reset to zero after each batch during fit/predict/etc.
This can be verified from the .reset_states() source code and by direct inspection; both are shown below for stateful=True. For more info on how states are passed, see this answer.
Direct inspection:
batch_shape = (2, 10, 4)
model = make_model(batch_shape)
X = np.random.randn(*batch_shape)
y = np.random.randint(0, 2, (batch_shape[0], 1))
show_lstm_states("STATES INITIALIZED")
model.train_on_batch(X, y)
show_lstm_states("STATES AFTER TRAIN")
model.reset_states()
show_lstm_states("STATES AFTER RESET")
model.predict(X)
show_lstm_states("STATES AFTER PREDICT")
Output:
STATES INITIALIZED
[[0. 0. 0. 0.]
[0. 0. 0. 0.]]
[[0. 0. 0. 0.]
[0. 0. 0. 0.]]
STATES AFTER TRAIN
[[0.12061571 0.03639204 0.20810013 0.05309075]
[0.01832913 0.00062357 0.10566339 0.60108346]]
[[0.21241754 0.0773523 0.37392718 0.15590034]
[0.08496398 0.00112716 0.23814857 0.95995367]]
STATES AFTER RESET
[[0. 0. 0. 0.]
[0. 0. 0. 0.]]
[[0. 0. 0. 0.]
[0. 0. 0. 0.]]
STATES AFTER PREDICT
[[0.12162527 0.03720453 0.20628096 0.05421837]
[0.01849432 0.00064993 0.1045063 0.6097021 ]]
[[0.21398112 0.07894284 0.3709934 0.15928769]
[0.08605779 0.00117485 0.23606434 0.97212094]]
Functions / imports used:
import tensorflow as tf
import tensorflow.keras.backend as K
from tensorflow.keras.layers import Input, Dense, LSTM
from tensorflow.keras.models import Model
import numpy as np
def make_model(batch_shape):
    ipt = Input(batch_shape=batch_shape)
    x = LSTM(4, stateful=True, activation='relu')(ipt)
    out = Dense(1, activation='sigmoid')(x)
    model = Model(ipt, out)
    model.compile('adam', 'binary_crossentropy')
    return model

def show_lstm_states(txt=''):
    print('\n' + txt)
    states = model.layers[1].states
    for state in states:
        if tf.__version__[0] == '2':
            print(state.numpy())
        else:
            print(K.get_value(state))
Inspect source code:
from inspect import getsource
print(getsource(model.layers[1].reset_states))
My understanding from this is that they are initialized to zero in most cases.
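As a side note of my own (not from the original answer): if you want to control the 0th states yourself at inference instead of relying on the zero default, Keras RNN layers accept an explicit initial_state when called. A minimal sketch, with made-up shapes:
import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

ipt = Input(shape=(10, 4))
h0 = Input(shape=(4,))    # explicit initial hidden state
c0 = Input(shape=(4,))    # explicit initial cell state
out = LSTM(4)(ipt, initial_state=[h0, c0])
model = Model([ipt, h0, c0], out)

X = np.random.randn(2, 10, 4)
zeros = np.zeros((2, 4))
print(model.predict([X, zeros, zeros]))   # feeding zeros reproduces the default initialization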

Feeding supplementary sequence to LSTM/dynamic_rnn in Tensorflow

I have a network that takes in a sequence of one-hot-encoded observations and feeds it into an LSTM in batches and sequences of pre-determined length.
I am trying to introduce a modification, whereby along with feeding the sequences of actual observations, I could also feed a 'supplementary' sequence, which would reflect something about each corresponding observation in the sequence. For example, if I feed a sequence of [1,2,3,4,5], I'd also like to feed another sequence of say [0,0,1,0,0] to indicate that '3' in the original sequence has a certain property, while others don't.
Following someone's suggestion, I am trying to concatenate the one-hot-encoded original sequence with this supplementary sequence (which I call the 'mode'). While this runs OK, all I seem to be achieving is appending an extra element to the one-hot vectors, which are then not so one-hot anymore.
I have reduced the experimental code to the following minimum:
# Imports implied by the snippet (TF 1.x).
import numpy as np
import tensorflow as tf
from tensorflow.contrib import rnn

BATCH = 2
SEQ = 3
VOCAB = 5
CELL_SIZE = 4

X = tf.placeholder(tf.int32, [BATCH, SEQ])
X_hot = tf.one_hot(X, VOCAB, 1.0, 0.0)
X_mode = tf.placeholder(tf.float32, [BATCH, SEQ])
X_mode_exp = tf.expand_dims(X_mode, axis=2)        # [BATCH, SEQ, 1]
X_tuple = tf.concat([X_hot, X_mode_exp], axis=2)   # [BATCH, SEQ, VOCAB + 1]

Hin = tf.placeholder(tf.float32, [BATCH, CELL_SIZE])
cell = rnn.GRUCell(CELL_SIZE)
Y, H = tf.nn.dynamic_rnn(cell, X_tuple, dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    h = np.zeros([BATCH, CELL_SIZE])
    x = np.array([[1, 2, 3], [3, 2, 1]])
    x_mode = np.array([[0, 1, 0], [1, 0, 1]])
    x_h_out, x_tuple_out, hout = sess.run([X_hot, X_tuple, H],
                                          feed_dict={X: x, X_mode: x_mode, Hin: h})
    print('x_hot:\n{}\n'.format(x_h_out))
    print('x_tuple:\n{}'.format(x_tuple_out))
Which produces the following output:
x_hot:
[[[ 0. 1. 0. 0. 0.]
[ 0. 0. 1. 0. 0.]
[ 0. 0. 0. 1. 0.]]
[[ 0. 0. 0. 1. 0.]
[ 0. 0. 1. 0. 0.]
[ 0. 1. 0. 0. 0.]]]
x_tuple:
[[[ 0. 1. 0. 0. 0. 0.]
[ 0. 0. 1. 0. 0. 1.]
[ 0. 0. 0. 1. 0. 0.]]
[[ 0. 0. 0. 1. 0. 1.]
[ 0. 0. 1. 0. 0. 0.]
[ 0. 1. 0. 0. 0. 1.]]]
What could be a better alternative approach to achieving this? Once again, what I'm after is feeding a second stream of 'additional information' into the LSTM that reflects something about each element in the sequence.
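For what it's worth (my own note, not part of the original question): the concatenation above simply gives the GRU one extra input feature per timestep, which is a common way to attach per-element side information; the shapes can be checked directly.
# Assuming the placeholders defined in the snippet above.
print(X_hot.shape)     # (2, 3, 5)  -> [BATCH, SEQ, VOCAB]
print(X_tuple.shape)   # (2, 3, 6)  -> [BATCH, SEQ, VOCAB + 1]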

Tensorflow initialize variable with numpy doesn't work

So I am trying to initialize my variable with some specific weights:
W = tf.Variable(np.eye(19) , name = 'Diag')
But if I now run this code:
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    print(W.eval())
I end up with a zeros matrix:
[[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]]
I do not understand what is going on since e.g.
W = tf.Variable([1,2,3] , name = 'Diag')
preserves the values. What should I do?
I guess tf.initialize_all_variables() overwrites my values, but without it I get a FailedPreconditionError complaining about uninitialized variables.
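A small diagnostic sketch of my own (not an accepted answer): evaluate the variable and inspect the actual array instead of relying on the printed summary, which is the same lesson as in the first question above.
import numpy as np
import tensorflow as tf

W = tf.Variable(np.eye(19), name='Diag')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    w_val = sess.run(W)
    print(w_val.shape, w_val.min(), w_val.max())   # expect (19, 19) 0.0 1.0
    print(np.array_equal(w_val, np.eye(19)))       # expect True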

How can I encode labels in TensorFlow?

I need to convert my string labels into vectors like [0, 0, ... , 1, ... 0].
As far as I could understand this is something that called one hot vector.
I have 10 classes, so 10 different string labels.
Could anyone please help with direct and inverse transformation?
I'm a newbie in TensorFlow, so please be kind.
The forward direction is easy, since there's the tf.one_hot op:
import tensorflow as tf
original_indices = tf.constant([1, 5, 3])
depth = tf.constant(10)
one_hot_encoded = tf.one_hot(indices=original_indices, depth=depth)
with tf.Session():
    print(one_hot_encoded.eval())
Outputs:
[[ 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
[ 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
The inverse of this isn't too bad either, with tf.where to find the non-zero indices:
def decode_one_hot(batch_of_vectors):
    """Computes indices for the non-zero entries in batched one-hot vectors.

    Args:
      batch_of_vectors: A Tensor with length-N vectors, having shape [..., N].
    Returns:
      An integer Tensor with shape [...] indicating the index of the non-zero
      value in each vector.
    """
    nonzero_indices = tf.where(tf.not_equal(
        batch_of_vectors, tf.zeros_like(batch_of_vectors)))
    reshaped_nonzero_indices = tf.reshape(
        nonzero_indices[:, -1], tf.shape(batch_of_vectors)[:-1])
    return reshaped_nonzero_indices

with tf.Session():
    print(decode_one_hot(one_hot_encoded).eval())
Prints:
[1 5 3]
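The question mentions string labels, while the snippet above starts from integer indices. A minimal, hypothetical sketch of the string-to-index step (the label names are made up); the resulting indices can then be fed to tf.one_hot, and decoded indices mapped back:
# Hypothetical example: build a string <-> index mapping in plain Python,
# then reuse tf.one_hot / decode_one_hot from the snippets above.
string_labels = ["cat", "dog", "bird", "dog"]            # made-up labels
classes = sorted(set(string_labels))                     # stable ordering of the classes
label_to_index = {name: i for i, name in enumerate(classes)}
index_to_label = {i: name for name, i in label_to_index.items()}

indices = [label_to_index[name] for name in string_labels]
print(indices)                                           # [1, 2, 0, 2]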

tf.gradients applied to pooling wrong result?

I have a problem in tensorflow with tf.gradients applied to pooling:
[edit]: I was able to reproduce the result I expected by changing the expression to:
gradpooltest, = tf.gradients((pooltest * pooltest) / 2, [x1])
Anyway, I am not sure why I have to do it this way, and the people who answered below do not seem to understand my problem.
input x1:
[[ 0. 0. 0. 0. 0. 0.]
[ 0. 2. 2. 2. 0. 0.]
[ 0. -2. 0. 0. 2. 1.]
[ 0. 1. 0. 1. 2. 2.]
[ 0. 1. 1. 2. 0. 1.]
[ 0. -2. 2. 1. -1. 1.]]
pooling test forward:
[[ 2. 2. 0.]
[ 1. 1. 2.]
[ 1. 2. 1.]]
tf.gradients pool test backward:
[[ 0. 0. 0. 0. 1. 0.]
[ 0. 1. 1. 0. 0. 0.]
[ 0. 0. 0. 0. 1. 0.]
[ 0. 1. 0. 1. 0. 0.]
[ 0. 1. 0. 1. 0. 1.]
[ 0. 0. 0. 0. 0. 0.]]
but I actually expected this result from the tf.gradients backward pass of the pooling:
0 0 0 0 0 0
0 2 2 0 0 0
0 0 0 0 2 0
0 1 0 1 0 0
0 1 0 0 0 1
0 0 2 0 0 0
I don't understand the tf.gradients result for the pooling backward pass. (It looks like TensorFlow only returns a mask marking the max locations?) Any idea why tf does not return the actual upsampled values?
Here is my code:
import numpy as np
import tensorflow as tf
sess = tf.Session()
#init input-----------------------------------------------------------
init1=np.array([ [0,0,0,0,0,0],
[0,2,2,2,0,0],
[0,-2,0,0,2,1],
[0,1,0,1,2,2],
[0,1,1,2,0,1],
[0,-2,2,1,-1,1] ],dtype="float32")
init2 = init1.reshape(1,6,6,1)
x1 = tf.Variable(init2)
#init weight-----------------------------------------------------------
init3 = np.array( [[[[3, 5], [2, -1]]]], dtype="float32")
init4 = init3.reshape(2,2,1,1)
w1 = tf.Variable(init4)
#init model-----------------------------------------------------------
model = tf.initialize_all_variables()
sess.run(model)
#print values-----------------------------------------------------------
print('x1:')
#print sess.run(x6)
x1y = tf.reshape(x1, [6, 6])
print(sess.run(x1y))
###################################
#ff: pooling
###################################
#needs 4D volumes as inputs:
pooltest = tf.nn.max_pool(x1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
print('pooltest:')
#print sess.run(pooltest)
pooltesty = tf.reshape(pooltest, [3, 3])
print(sess.run(pooltesty))
###################################
#bw: pooling
###################################
#needs 4D volumes as inputs:
gradpooltest, = tf.gradients(pooltest , [x1])
print('gradpooltest:')
#print sess.run(gradpooltest)
gradpooltesty = tf.reshape(gradpooltest, [6, 6])
print(sess.run(gradpooltesty))
sess.close()
You are computing the gradients of the max-pool operation, and the result is correct: they are 1 at the maximums and 0 at the other locations.
Please refer to the following page: http://cs231n.github.io/optimization-2/#patterns-in-backward-flow
Imagine your max-pooling operation with a kernel size of 2x2 implemented like:
max(x1, x2, x3, x4)
where x1, ..., x4 are the locations in the input image under the kernel.
In the forward pass, you extract the maximum value, for example:
max(x1, x2, x3, x4) = x2
This means that for these 4 variables, only x2 is passed through the network in the forward pass.
In the backward pass, therefore, there is only one variable to compute a derivative for, and its derivative is 1.
Therefore the output you got is correct, and the one you expected is not.
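To connect this with the [edit] at the top of the question: tf.gradients(pooltest, [x1]) implicitly uses an upstream gradient of ones, which is why only the 0/1 mask of max locations comes back. A short sketch of my own (run it before sess.close(), using the same graph as above): feeding the pooled values themselves as the upstream gradient routes them back to their locations, which is exactly what the (pooltest * pooltest) / 2 trick achieves via the chain rule.
# Default upstream gradient of ones -> 1 at max locations, 0 elsewhere.
grad_mask, = tf.gradients(pooltest, [x1])
# Upstream gradient set to the pooled values -> max values routed to their locations.
grad_vals, = tf.gradients(pooltest, [x1], grad_ys=pooltest)
print(sess.run(tf.reshape(grad_vals, [6, 6])))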