use variational_recurrent in tf.contrib.rnn.DropoutWrapper - tensorflow

In the api of tf.contrib.rnn.DropoutWrapper, I am trying to set variational_recurrent=True, in which case, input_size is mandatory. As explained, input_size is TensorShape objects containing the depth(s) of the input tensors.
depth(s) is confusing, what is it please? Is it just the shape of the tensor as we can get by tf.shape()? Or the number of channels for the special case of images? But my input tensor is not an image.
And I don't understand why dtype is demanded when variational_recurrent=True.
Thanks!

Inpput_size for tf.TensorShape([200, None, 300]) is just 300
Play with this example.
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # see TF issue #152
os.environ["CUDA_VISIBLE_DEVICES"]="1"
import tensorflow as tf
import numpy as np
n_steps = 2
n_inputs = 3
n_neurons = 5
keep_prob = 0.5
learning_rate = 0.001
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2]))
basic_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_neurons)
basic_cell_drop = tf.contrib.rnn.DropoutWrapper(
basic_cell,
input_keep_prob=keep_prob,
variational_recurrent=True,
dtype=tf.float32,
input_size=n_inputs)
output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell_drop,
X_seqs,
dtype=tf.float32)
outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2])
init = tf.global_variables_initializer()
X_batch = np.array([
# t = 0 t = 1
[[0, 1, 2], [9, 8, 7]], # instance 1
[[3, 4, 5], [0, 0, 0]], # instance 2
[[6, 7, 8], [6, 5, 4]], # instance 3
[[9, 0, 1], [3, 2, 1]], # instance 4
])
with tf.Session() as sess:
init.run()
outputs_val = outputs.eval(feed_dict={X: X_batch})
print(outputs_val)
See this for more details: https://github.com/tensorflow/tensorflow/issues/7927

Related

How do I structure 3D Input properly for Keras LSTM

I have a dataset which contains many snapshot observations in time and a 1 or 0 as a label for each observation. Lets say each observation contains 3 features. I am wanting to train an LSTM which will take a sequence of n observations and attempt to classify nth observation as a 1 or 0.
So if we have a dataset that looks like this:
# X = [[0, 1, 1], [1, 0, 0], [1, 1, 1], [1, 1, 0]]
# y = [1, 0, 1, 0]
# so X[0] = y[0], X[1] = y[1]
# . and I would like to input X[0] + X[1] to classify X[1] as y[1]
# . How would I need to structure this below?
X = [[0, 1, 1], [1, 0, 0], [1, 1, 1], [1, 1, 0]]
y = [1, 0, 1, 0]
def create_model():
model = Sequential()
# input_shape[0] is equal to 2 timesteps?
# input_shape[1] is equal to the 3 features per row?
model.add(LSTM(20, input_shape=(2, 3)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
m = create_model()
m.fit(X, y)
So I want X[0] and X[1] to be the input for one iteration of training and should be classified as y[1].
My question is this. How do I structure the model in order to take this input properly? I am very confused by input_shape, features, input_length, batches etc ...
The below code snippet might help clarify:
from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np
# Number of samples = 4, sequence length = 3, features = 2
X = np.array( [ [ [0, 1], [1, 0,], [1, 1] ],
[ [1, 1], [1, 1,], [1, 0] ],
[ [0, 1], [1, 0,], [0, 0] ],
[ [1, 1], [1, 1,], [1, 1] ]] )
y = np.array([[1], [0], [1], [0]])
print(X)
print(X.shape)
print(y.shape)
model = Sequential()
model.add(LSTM(20, input_shape=(3, 2)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
model.fit(X, y)
Also, on the Keras documentation page: https://keras.io/getting-started/sequential-model-guide/ look at the example for "Stacked LSTM for sequence classification" near the bottom. It might help.
In general using Keras, the batch dimension/sample dimension is not specified in layers - it is automatically inferred from the input data.
I hope this helps.
You have the input shape correct.
I would reshape the input data to be (batch_size, timesteps, features)
m = create_model()
X.reshape((batch_size, 2, 3))
m.fit(X, y)
Common batch sizes are 4, 8 , 16, 32 but for small dataset the impact of the batch size is less important.
And when you want to predict use batch_size = 1

use tf.shape() on tensorflow placeholder

Let's looke at this simple made up tf operation:
data = np.random.rand(1,2,3)
x = tf.placeholder(tf.float32, shape=[None, None, None], name='x_pl')
out = x
print ('shape:', tf.shape(out))
sess = tf.Session()
sess.run(out, feed_dict={x: data})
and the print is:
shape: Tensor("Shape_13:0", shape=(3,), dtype=int32)
I read that you should use tf.shape() to get the 'dynamic' shape of the tensor, which seems to be what I need, but why the shape is shape=(3,)?
why it is not (1,2,3)? as it should be determined when the session is run?
suppose this is part of a neural network where I need to know the last dimension of x, for example, to pass x into a Dense layer, for which the last dimension of x needed to be known.
how do it do it then?
It is because tf.shape() is an op and you have to run it within a session.
data = np.random.rand(1,2,3)
x = tf.placeholder(tf.float32, shape=[None, None, None], name='x_pl')
out = x
print ('shape:', tf.shape(out))
z = tf.shape(out)
sess = tf.Session()
out_, z_ =sess.run([out,z], feed_dict={x: data})
print(f"shape of out: {z_}")
will return
shape: Tensor("Shape:0", shape=(3,), dtype=int32)
shape of out: [1 2 3]
Even if you look at the example from the docs (https://www.tensorflow.org/api_docs/python/tf/shape):
t = tf.constant([[[1, 1, 1], [2, 2, 2]], [[3, 3, 3], [4, 4, 4]]])
tf.shape(t)
If you run it just like that it will return something like
<tf.Tensor 'Shape_4:0' shape=(3,) dtype=int32>
but if you run it within a session then you will get the expected result
t = tf.constant([[[1, 1, 1], [2, 2, 2]], [[3, 3, 3], [4, 4, 4]]])
print(sess.run(tf.shape(t)))
[2 2 3]

Element-wise assignment in tensorflow

In numpy, it could be easily done as
>>> img
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]], dtype=int32)
>>> img[img>5] = [1,2,3,4]
>>> img
array([[1, 2, 3],
[4, 5, 1],
[2, 3, 4]], dtype=int32)
However, there seems not exist similar operation in tensorflow.
You can never assign a value to a tensor in tensorflow as the change in tensor value is not traceable by backpropagation, but you can still get another tensor from origin tensor, here is a solution
import tensorflow as tf
tf.enable_eager_execution()
img = tf.constant(list(range(1, 10)), shape=[3, 3])
replace_mask = img > 5
keep_mask = tf.logical_not(replace_mask)
keep = tf.boolean_mask(img, keep_mask)
keep_index = tf.where(keep_mask)
replace_index = tf.where(replace_mask)
replace = tf.random_uniform((tf.shape(replace_index)[0],), 0, 10, tf.int32)
updates = tf.concat([keep, replace], axis=0)
indices = tf.concat([keep_index, replace_index], axis=0)
result = tf.scatter_nd(tf.cast(indices, tf.int32), updates, shape=tf.shape(img))
Actually there is a way to achieve this. Very similar to #Jie.Zhou's answer, you can replace tf.constant with tf.Variable, then replace tf.scatter_nd with tf.scatter_nd_update

Manipulate indices in a SparseTensor

I would like to shift the indices of a SparseTensor in Tensorflow. Is there a way to alter a SparseTensor's indices?
import numpy as np
import tensorflow as tf
# build graph
x = tf.sparse_placeholder(tf.float32, [None, 10], name='x')
sparse_indices = x.indices
# This line does not work:
sparse_indices[:, 1] = sparse_indices[:, 1] + 1
shifted_x = tf.SparseTensor(indices=sparse_indices,
values=x.values,
dense_shape=[2,20])
# start session
session = tf.InteractiveSession()
indices = np.array([[1, 0], [2, 1]], dtype=np.int64)
values = np.array([1, 1], dtype=np.float32)
shape = np.array([2, 10], dtype=np.int64)
shifted = session.run(shifted_x,
{x:tf.SparseTensorValue(indices, values, shape)})

alexnet_v2/conv1/biases not found in checkpoint

I try to save and restore alxenet slim model. But I always get this error when I run the codesaver.restore(sess, tf.train.latest_checkpoint('I:/model/mnist/')).
And which throw the error:
NotFoundError (see above for traceback): Key alexnet_v2/conv1/biases not found in checkpoint.
When I run the tf.global_variables(), I can only get the weights of conv2d, and there is no biases in the result.
I don't understand what's the problem is. Here is my code:
here is my alex model
def alexnet_v2_arg_scope(weight_decay=0.0005,
stddev=0.1,
batch_norm_var_collection='moving_vars',
use_fused_batchnorm=True):
batch_norm_params = {
# Decay for the moving averages.
'decay': 0.9997,
# epsilon to prevent 0s in variance.
'epsilon': 0.001,
# collection containing update_ops.
'updates_collections': ops.GraphKeys.UPDATE_OPS,
# Use fused batch norm if possible.
'fused': use_fused_batchnorm,
# collection containing the moving mean and moving variance.
'variables_collections': {
'beta': None,
'gamma': None,
'moving_mean': [batch_norm_var_collection],
'moving_variance': [batch_norm_var_collection],
}
}
with slim.arg_scope([slim.conv2d, slim.fully_connected],
activation_fn=tf.nn.relu,
biases_initializer=tf.constant_initializer,
normalizer_fn=slim.batch_norm,
normalizer_params=batch_norm_params,
weights_regularizer=slim.l2_regularizer(weight_decay)):
with slim.arg_scope([slim.conv2d], padding='SAME'):
with slim.arg_scope([slim.max_pool2d], padding='VALID') as arg_sc:
return arg_sc
def alex_net(inputs,
num_classes=10,
is_training=True,
droupout_keep_prob=0.5,
spatial_squeze=True,
scope='alexnet_v2'):
with tf.variable_scope(scope, 'alexnet_v2',[inputs]) as sc:
end_points_collection = sc.name + '_end_points'
with slim.arg_scope([slim.conv2d],
weights_initializer=trunc_norm(0.1),
biases_initializer=tf.constant_initializer(0.1),
outputs_collections=end_points_collection):
inputs = tf.reshape(inputs,[-1,28,28,1])
net = slim.conv2d(inputs, 64, [3, 3], 1, padding='VALID', scope='conv1')
net = slim.max_pool2d(net, [2, 2], 2, scope='pool1')
net = slim.conv2d(net, 128, [3, 3], scope='conv2')
net = slim.max_pool2d(net, [3, 3], 2, scope='pool2')
# net = slim.conv2d(net, 384, [3, 3], scope='conv3')
# net = slim.conv2d(net, 384, [3, 3], scope='conv4')
net = slim.conv2d(net, 256, [3, 3], scope='conv3')
# net = slim.max_pool2d(net, [3, 3], 2, scope='pool5')
net = slim.conv2d(net, 512, [3, 3], scope='conv4')
with slim.arg_scope([slim.conv2d],
weights_initializer=trunc_norm(0.1),
biases_initializer=tf.constant_initializer(0.1)):
# net = slim.conv2d(net, 1028, [6, 6], padding='VALID', scope='fc6')
net = slim.avg_pool2d(net, [6,6], stride=1,padding='VALID',scope='avg_pool5' )
net = slim.dropout(net, droupout_keep_prob, is_training=is_training, scope='dropout6')
# net = slim.conv2d(net, 512, [1, 1], scope='fc7')
# net = slim.dropout(net, droupout_keep_prob, is_training=is_training, scope='dropout7')
net = slim.conv2d(net, num_classes, [1, 1],
activation_fn=None,
normalizer_fn=None,
biases_initializer=tf.zeros_initializer(),
scope='fc7x')
end_points = slim.utils.convert_collection_to_dict(end_points_collection)
if spatial_squeze:
net = tf.squeeze(net, [1,2], name='fc8/squeezed')
end_points[sc.name + '/fc8'] = net
return net, end_point
save model
varialbes_to_restore = slim.get_variables_to_restore()
saver = tf.train.Saver(varialbes_to_restore)
saver.save(sess,'I:/model/mnist/')
restore model
with tf.Session() as sess:
logits, _ = _alex_slim.alex_net(teX[:200])
saver = tf.train.Saver()
saver.restore(sess, tf.train.latest_checkpoint('I:/model/mnist/'))
logits_var = sess.run(logits)
print(logits_var)
Perhapse tensorflow expects you to save it in a checkpoint because you put it in a scope.
You will get the same error if you give a name to a tensor which isn't being saved.
If you don't want to save it, don't give it a name or scope.