I'm building a statefull LSTM used for language recognition.
Being statefull I can train the network with smaller files and a new batch will be like a next sentence in a discussion.
However for the network to be properly trained I need to reset the hidden state of the LSTM between some batches.
I'm using a variable to store the hidden_state of the LSTM for performance :
with tf.variable_scope('Hidden_state'):
hidden_state = tf.get_variable("hidden_state", [self.num_layers, 2, self.batch_size, self.hidden_size],
tf.float32, initializer=tf.constant_initializer(0.0), trainable=False)
# Arrange it to a tuple of LSTMStateTuple as needed
l = tf.unstack(hidden_state, axis=0)
rnn_tuple_state = tuple([tf.contrib.rnn.LSTMStateTuple(l[idx][0], l[idx][1])
for idx in range(self.num_layers)])
# Build the RNN
with tf.name_scope('LSTM'):
rnn_output, _ = tf.nn.dynamic_rnn(cell, rnn_inputs, sequence_length=input_seq_lengths,
initial_state=rnn_tuple_state, time_major=True)
Now I'm confused on how to reset the hidden state. I've tried two solutions but it's not working :
First solution
Reset the "hidden_state" variable with :
rnn_state_zero_op = hidden_state.assign(tf.zeros_like(hidden_state))
It does work and I think it's because the unstack and tuple construction are not "re-played" into the graph after running the rnn_state_zero_op operation.
Second solution
Following LSTMStateTuple vs cell.zero_state() for RNN in Tensorflow I tried to reset the cell state with :
rnn_state_zero_op = cell.zero_state(self.batch_size, tf.float32)
It doesn't seem to work either.
Question
I've another solution in mind but it's guessing at best : I'm not keeping the state returned by tf.nn.dynamic_rnn, I've thought of it but I get a tuple and I can't find a way to build an op to reset the tuple.
At this point I've to admit that I don't quite understand the internal working of tensorflow and if it's even possible to do what I'm trying to do.
Is there a proper way to do it ?
Thanks !
Thanks to this answer to another question I was able to find a way to have complete control on whether or not (and when) the internal state of the RNN should be reset to 0.
First you need to define some variables to store the state of the RNN, this way you will have control over it :
with tf.variable_scope('Hidden_state'):
state_variables = []
for state_c, state_h in cell.zero_state(self.batch_size, tf.float32):
state_variables.append(tf.nn.rnn_cell.LSTMStateTuple(
tf.Variable(state_c, trainable=False),
tf.Variable(state_h, trainable=False)))
# Return as a tuple, so that it can be fed to dynamic_rnn as an initial state
rnn_tuple_state = tuple(state_variables)
Note that this version define directly the variables used by the LSTM, this is much better than the version in my question because you don't have to unstack and build the tuple, which add some ops to the graph that you cannot run explicitly.
Secondly build the RNN and retrieve the final state :
# Build the RNN
with tf.name_scope('LSTM'):
rnn_output, new_states = tf.nn.dynamic_rnn(cell, rnn_inputs,
sequence_length=input_seq_lengths,
initial_state=rnn_tuple_state,
time_major=True)
So now you have the new internal state of the RNN. You can define two ops to manage it.
The first one will update the variables for the next batch. So in the next batch the "initial_state" of the RNN will be fed with the final state of the previous batch :
# Define an op to keep the hidden state between batches
update_ops = []
for state_variable, new_state in zip(rnn_tuple_state, new_states):
# Assign the new state to the state variables on this layer
update_ops.extend([state_variable[0].assign(new_state[0]),
state_variable[1].assign(new_state[1])])
# Return a tuple in order to combine all update_ops into a single operation.
# The tuple's actual value should not be used.
rnn_keep_state_op = tf.tuple(update_ops)
You should add this op to your session anytime you want to run a batch and keep the internal state.
Beware : if you run batch 1 with this op called then batch 2 will start with the batch 1 final state, but if you don't call it again when running batch 2 then batch 3 will start with batch 1 final state also. My advice is to add this op every time you run the RNN.
The second op will be used to reset the internal state of the RNN to zeros:
# Define an op to reset the hidden state to zeros
update_ops = []
for state_variable in rnn_tuple_state:
# Assign the new state to the state variables on this layer
update_ops.extend([state_variable[0].assign(tf.zeros_like(state_variable[0])),
state_variable[1].assign(tf.zeros_like(state_variable[1]))])
# Return a tuple in order to combine all update_ops into a single operation.
# The tuple's actual value should not be used.
rnn_state_zero_op = tf.tuple(update_ops)
You can call this op whenever you want to reset the internal state.
Simplified version of AMairesse post for one LSTM layer:
zero_state = tf.zeros(shape=[1, units[-1]])
self.c_state = tf.Variable(zero_state, trainable=False)
self.h_state = tf.Variable(zero_state, trainable=False)
self.init_encoder = tf.nn.rnn_cell.LSTMStateTuple(self.c_state, self.h_state)
self.output_encoder, self.state_encoder = tf.nn.dynamic_rnn(cell_encoder, layer, initial_state=self.init_encoder)
# save or reset states
self.update_ops += [self.c_state.assign(self.state_encoder.c, use_locking=True)]
self.update_ops += [self.h_state.assign(self.state_encoder.h, use_locking=True)]
or you can use replacement for init_encoder to reset states at step == 0 (you need to pass self.step_tf into session.run() as placeholder):
self.step_tf = tf.placeholder_with_default(tf.constant(-1, dtype=tf.int64), shape=[], name="step")
self.init_encoder = tf.cond(tf.equal(self.step_tf, 0),
true_fn=lambda: tf.nn.rnn_cell.LSTMStateTuple(zero_state, zero_state),
false_fn=lambda: tf.nn.rnn_cell.LSTMStateTuple(self.c_state, self.h_state))
I'm using bidirectional_rnn with GRUCell but this is a general question regarding the RNN in Tensorflow.
I couldn't find how to initialize the weight matrices (input to hidden, hidden to hidden). Are they initialized randomly? to zeros? are they initialized differently for each LSTM I create?
EDIT: Another motivation for this question is in pre-training some LSTMs and using their weights in a subsequent model. I don't currently know how to do that currently without saving all the states and restoring the entire model.
Thanks.
How to initialize weight matrices for RNN?
I believe people are using random normal initialization for weight matrices for RNN. Check out the example in TensorFlow GitHub Repo. As the notebook is a bit long, they have a simple LSTM model where they use tf.truncated_normal to initialize weights and tf.zeros to initialize biases (although I have tried using tf.ones to initialize biases before, seem to also work). I believe that the standard deviation is a hyperparameter you could tune yourself. Sometimes weights initialization is important to the gradient flow. Although as far as I know, LSTM itself is designed to handle gradient vanishing problem (and gradient clipping is for helping gradient exploding problem), so perhaps you don't need to be super careful with the setup of std_dev in LSTM? I've read papers recommending Xavier initialization (TF API doc for Xavier initializer) in Convolution Neural Network context. I don't know if people use that in RNN, but I imagine you can even try those in RNN if you want to see if it helps.
Now to follow up with #Allen's answer and your follow up question left in the comments.
How to control initialization with variable scope?
Using the simple LSTM model in the TensorFlow GitHub python notebook that I linked to as an example.
Specifically, if I want to re-factorize the LSTM part of the code in above picture using variable scope control, I may code something as following...
import tensorflow as tf
def initialize_LSTMcell(vocabulary_size, num_nodes, initializer):
'''initialize LSTMcell weights and biases, set variables to reuse mode'''
gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
with tf.variable_scope('LSTMcell') as scope:
for gate in gates:
with tf.variable_scope(gate) as gate_scope:
wx = tf.get_variable("wx", [vocabulary_size, num_nodes], initializer)
wt = tf.get_variable("wt", [num_nodes, num_nodes], initializer)
bi = tf.get_variable("bi", [1, num_nodes, tf.constant_initializer(0.0)])
gate_scope.reuse_variables() #this line can probably be omitted, b.z. by setting 'LSTMcell' scope variables to 'reuse' as the next line, it'll turn on the reuse mode for all its child scope variables
scope.reuse_variables()
def get_scope_variables(scope_name, variable_names):
'''a helper function to fetch variable based on scope_name and variable_name'''
vars = {}
with tf.variable_scope(scope_name, reuse=True):
for var_name in variable_names
var = tf.get_variable(var_name)
vars[var_name] = var
return vars
def LSTMcell(i, o, state):
'''a function for performing LSTMcell computation'''
gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
var_names = ['wx', 'wt', 'bi']
gate_comp = {}
with tf.variable_scope('LSTMcell', reuse=True):
for gate in gates:
vars = get_scope_variables(gate, var_names)
gate_comp[gate] = tf.matmul(i, vars['wx']) + tf.matmul(o, vars['wt']) + vars['bi']
state = tf.sigmoid(gate_comp['forget_gate']) * state + tf.sigmoid(gate_comp['input_gate']) * tf.tanh(gate_comp['memory_cell'])
output = tf.sigmoid(gate_comp['output_gate']) * tf.tanh(state)
return output, state
The usage of the re-factorized code would be something like following...
initialize_LSTMcell(volcabulary_size, num_nodes, tf.truncated_normal_initializer(mean=-0.1, stddev=.01))
#...Doing some computation...
LSTMcell(input_tensor, output_tensor, state)
Even though the refactorized code may look less straightforward, but using scope variable control ensures scope encapsulation and allows flexible variable controls (in my opinion at least).
In pre-training some LSTMs and using their weights in a subsequent model. How to do that without saving all the states and restoring the entire model.
Assuming you have a pre-trained model froze and loaded in, if you wanna use their frozen 'wx', 'wt' and 'bi', you can simply find their parent scope names and variable names, then fetch the variables using similar structure in get_scope_variables func.
with tf.variable_scope(scope_name, reuse=True):
var = tf.get_variable(var_name)
Here is a link to understanding variable scope and sharing variables. I hope this is helpful.
The RNN models will create their variables with get_variable, and you can control the initialization by wrapping the code which creates those variables with a variable_scope and passing a default initializer to it. Unless the RNN specifies one explicitly (looking at the code, it doesn't), uniform_unit_scaling_initializer is used.
You should also be able to share model weights by declaring the second model and passing reuse=True to its variable_scope. As long as the namespaces match up, the new model will get the same variables as the first model.
A simple way to initialize all kernel weights with certain initializer is to leave the initializer in tf.variable_scope(). For example:
with tf.variable_scope('rnn', initializer=tf.variance_scaling_initializer()):
basic_cell= tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
outputs, state= tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)
I'm making my first steps learning TF and have some trouble training RNNs.
My toy problem goes like this: a two layers LSTM + dense layer network is fed with raw audio data and should test whether a certain frequency is present in the sound.
so the network should 1 to 1 map float(audio data sequence) to float(pre-chosen frequency volume)
I've got this to work on Keras and seen a similar TFLearn solution but would like to implement this on bare Tensorflow in a relatively efficient way.
what i've done:
lstm = rnn_cell.BasicLSTMCell(LSTM_SIZE,state_is_tuple=True,forget_bias=1.0)
lstm = rnn_cell.DropoutWrapper(lstm)
stacked_lstm = rnn_cell.MultiRNNCell([lstm] * 2,state_is_tuple=True)
outputs, states = rnn.dynamic_rnn(stacked_lstm, in, dtype=tf.float32)
outputs = tf.transpose(outputs, [1, 0, 2])
last = tf.gather(outputs, int(outputs.get_shape()[0]) - 1)
network= tf.matmul(last, W) + b
# cost function, optimizer etc...
during training I fed this with (BATCH_SIZE, SEQUENCE_LEN,1) batches and it seems like the loss converged correctly but I can't figure out how to predict with the trained network.
My (awful lot of) questions:
how do i make this network return a sequence right from Tensorflow without going back to python for each sample(feed a sequence and predict a sequence of the same size)?
If I do want to predict one sample at a time and iterate in python what is the correct way to do it?
During testing is dynamic_rnn needed or it's just used for unrolling for BPTT during training? why is dynamic_rnn returning all the back propagation steps Tensors? these are the outputs of each layer of the unrolled network right?
after some research:
how do i make this network return a sequence right from Tensorflow
without going back to python for each sample(feed a sequence and
predict a sequence of the same size)?
you can use state_saving_rnn
class Saver():
def __init__(self):
self.d = {}
def state(self, name):
if not name in self.d:
return tf.zeros([1,LSTM_SIZE],tf.float32)
return self.d[name]
def save_state(self, name, val):
self.d[name] = val
return tf.identity('save_state_name') #<-important for control_dependencies
outputs, states = rnn.state_saving_rnn(stacked_lstm, inx, Saver(),
('lstmstate', 'lstmstate2', 'lstmstate3', 'lstmstate4'),sequence_length=[EVAL_SEQ_LEN])
#4 states are for two layers of lstm each has hidden and CEC variables to restore
network = [tf.matmul(outputs[-1], W) for i in xrange(EVAL_SEQ_LEN)]
one problem is that state_saving_rnn is using rnn() and not dynamic_rnn() therefore unroll at compile time EVAL_SEQ_LEN steps you might want to re-implement state_saving_rnn with dynamic_rnn if you want to input long sequences
If I do want to predict one sample at a time and iterate in python what is the correct way to do it?
you can use dynamic_rnn and supply initial_state. this is probably just as efficient as state_saving_rnn. look at state_saving_rnn implementations for reference
During testing is dynamic_rnn needed or it's just used for unrolling for BPTT during training? why is dynamic_rnn returning all the back propagation steps Tensors? these are the outputs of each layer of the unrolled network right?
dynamic_rnn does do unrolling at runtime similarly to compile time rnn(). I guess it returns all the steps for you to branch the graph in some other places - after less time steps. in a network that use [one time step input * current state -> one output, new state] like the one described above it's not needed in testing but could be used for training truncated time back propagation
I have seen two different ways of calling lstm on tensorflow and I am confused on what is the difference of one method with the other. And in which situation to use one or the other
The first one is to create an lstm and then call it immediatly like the code below
lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
initial_state = tf.zeros([batch_size, lstm.state_size])
for i in range(num_steps):
# The value of state is updated after processing each batch of words.
output, state = lstm(words[:, i], state)
And the second one is call lstm cell through rnn.rnn() like below.
# Define a lstm cell with tensorflow
lstm = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
# Split data because rnn cell needs a list of inputs for the RNN inner loop
inputToLstmSplited = tf.split(0, n_steps, inputToLstm) # n_steps * (batch_size, n_hidden)
inputToLstmSplitedFiltered = tf.matmul(inputToLstmSplited, weights['hidden']) + biases['hidden']
# Get lstm cell out
outputs, states = rnn.rnn(lstm, inputToLstmSplited, initial_state=istate)
The second effectively does the same as the loop in the first, returning a list of all the outputs collected in the loop and the final state. It does it a bit more efficiently though and with a number of safety checks. It also supports useful features like variable sequence lengths. The first option is presented in Tensorflow tutorials to give you an idea of how an RNN is unravelled, but the second option is preferred for "production" code.
I'm a newbie to TensorFlow. I'm confused about the difference between tf.placeholder and tf.Variable. In my view, tf.placeholder is used for input data, and tf.Variable is used to store the state of data. This is all what I know.
Could someone explain to me more in detail about their differences? In particular, when to use tf.Variable and when to use tf.placeholder?
In short, you use tf.Variable for trainable variables such as weights (W) and biases (B) for your model.
weights = tf.Variable(
tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))), name='weights')
biases = tf.Variable(tf.zeros([hidden1_units]), name='biases')
tf.placeholder is used to feed actual training examples.
images_placeholder = tf.placeholder(tf.float32, shape=(batch_size, IMAGE_PIXELS))
labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))
This is how you feed the training examples during the training:
for step in xrange(FLAGS.max_steps):
feed_dict = {
images_placeholder: images_feed,
labels_placeholder: labels_feed,
}
_, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
Your tf.variables will be trained (modified) as the result of this training.
See more at https://www.tensorflow.org/versions/r0.7/tutorials/mnist/tf/index.html. (Examples are taken from the web page.)
The difference is that with tf.Variable you have to provide an initial value when you declare it. With tf.placeholder you don't have to provide an initial value and you can specify it at run time with the feed_dict argument inside Session.run
Since Tensor computations compose of graphs then it's better to interpret the two in terms of graphs.
Take for example the simple linear regression
WX+B=Y
where W and B stand for the weights and bias and X for the observations' inputs and Y for the observations' outputs.
Obviously X and Y are of the same nature (manifest variables) which differ from that of W and B (latent variables). X and Y are values of the samples (observations) and hence need a place to be filled, while W and B are the weights and bias, Variables (the previous values affect the latter) in the graph which should be trained using different X and Y pairs. We place different samples to the Placeholders to train the Variables.
We only need to save or restore the Variables (at checkpoints) to save or rebuild the graph with the code.
Placeholders are mostly holders for the different datasets (for example training data or test data). However, Variables are trained in the training process for the specific tasks, i.e., to predict the outcome of the input or map the inputs to the desired labels. They remain the same until you retrain or fine-tune the model using different or the same samples to fill into the Placeholders often through the dict. For instance:
session.run(a_graph, dict = {a_placeholder_name : sample_values})
Placeholders are also passed as parameters to set models.
If you change placeholders (add, delete, change the shape etc) of a model in the middle of training, you can still reload the checkpoint without any other modifications. But if the variables of a saved model are changed, you should adjust the checkpoint accordingly to reload it and continue the training (all variables defined in the graph should be available in the checkpoint).
To sum up, if the values are from the samples (observations you already have) you safely make a placeholder to hold them, while if you need a parameter to be trained harness a Variable (simply put, set the Variables for the values you want to get using TF automatically).
In some interesting models, like a style transfer model, the input pixes are going to be optimized and the normally-called model variables are fixed, then we should make the input (usually initialized randomly) as a variable as implemented in that link.
For more information please infer to this simple and illustrating doc.
TL;DR
Variables
For parameters to learn
Values can be derived from training
Initial values are required (often random)
Placeholders
Allocated storage for data (such as for image pixel data during a feed)
Initial values are not required (but can be set, see tf.placeholder_with_default)
The most obvious difference between the tf.Variable and the tf.placeholder is that
you use variables to hold and update parameters. Variables are
in-memory buffers containing tensors. They must be explicitly
initialized and can be saved to disk during and after training. You
can later restore saved values to exercise or analyze the model.
Initialization of the variables is done with sess.run(tf.global_variables_initializer()). Also while creating a variable, you need to pass a Tensor as its initial value to the Variable() constructor and when you create a variable you always know its shape.
On the other hand, you can't update the placeholder. They also should not be initialized, but because they are a promise to have a tensor, you need to feed the value into them sess.run(<op>, {a: <some_val>}). And at last, in comparison to a variable, placeholder might not know the shape. You can either provide parts of the dimensions or provide nothing at all.
There other differences:
the values inside the variable can be updated during optimizations
variables can be shared, and can be non-trainable
the values inside the variable can be stored after training
when the variable is created, 3 ops are added to a graph (variable op, initializer op, ops for the initial value)
placeholder is a function, Variable is a class (hence an uppercase)
when you use TF in a distributed environment, variables are stored in a special place (parameter server) and are shared between the workers.
Interesting part is that not only placeholders can be fed. You can feed the value to a Variable and even to a constant.
Adding to other's answers, they also explain it very well in this MNIST tutorial on Tensoflow website:
We describe these interacting operations by manipulating symbolic
variables. Let's create one:
x = tf.placeholder(tf.float32, [None, 784]),
x isn't a specific value. It's a placeholder, a value that we'll input when we ask TensorFlow to
run a computation. We want to be able to input any number of MNIST
images, each flattened into a 784-dimensional vector. We represent
this as a 2-D tensor of floating-point numbers, with a shape [None,
784]. (Here None means that a dimension can be of any length.)
We also need the weights and biases for our model. We could imagine
treating these like additional inputs, but TensorFlow has an even
better way to handle it: Variable. A Variable is a modifiable tensor
that lives in TensorFlow's graph of interacting operations. It can be
used and even modified by the computation. For machine learning
applications, one generally has the model parameters be Variables.
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
We create these Variables by giving tf.Variable the initial value of
the Variable: in this case, we initialize both W and b as tensors full
of zeros. Since we are going to learn W and b, it doesn't matter very
much what they initially are.
Tensorflow uses three types of containers to store/execute the process
Constants :Constants holds the typical data.
variables: Data values will be changed, with respective the functions such as cost_function..
placeholders: Training/Testing data will be passed in to the graph.
Example snippet:
import numpy as np
import tensorflow as tf
### Model parameters ###
W = tf.Variable([.3], tf.float32)
b = tf.Variable([-.3], tf.float32)
### Model input and output ###
x = tf.placeholder(tf.float32)
linear_model = W * x + b
y = tf.placeholder(tf.float32)
### loss ###
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares
### optimizer ###
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
### training data ###
x_train = [1,2,3,4]
y_train = [0,-1,-2,-3]
### training loop ###
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init) # reset values to wrong
for i in range(1000):
sess.run(train, {x:x_train, y:y_train})
As the name say placeholder is a promise to provide a value later i.e.
Variable are simply the training parameters (W(matrix), b(bias) same as the normal variables you use in your day to day programming, which the trainer updates/modify on each run/step.
While placeholder doesn't require any initial value, that when you created x and y TF doesn't allocated any memory, instead later when you feed the placeholders in the sess.run() using feed_dict, TensorFlow will allocate the appropriately sized memory for them (x and y) - this unconstrained-ness allows us to feed any size and shape of data.
In nutshell:
Variable - is a parameter you want trainer (i.e. GradientDescentOptimizer) to update after each step.
Placeholder demo -
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
adder_node = a + b # + provides a shortcut for tf.add(a, b)
Execution:
print(sess.run(adder_node, {a: 3, b:4.5}))
print(sess.run(adder_node, {a: [1,3], b: [2, 4]}))
resulting in the output
7.5
[ 3. 7.]
In the first case 3 and 4.5 will be passed to a and b respectively, and then to adder_node ouputting 7. In second case there's a feed list, first step 1 and 2 will be added, next 3 and 4 (a and b).
Relevant reads:
tf.placeholder doc.
tf.Variable doc.
Variable VS placeholder.
Variables
A TensorFlow variable is the best way to represent shared, persistent state manipulated by your program. Variables are manipulated via the tf.Variable class. Internally, a tf.Variable stores a persistent tensor. Specific operations allow you to read and modify the values of this tensor. These modifications are visible across multiple tf.Sessions, so multiple workers can see the same values for a tf.Variable. Variables must be initialized before using.
Example:
x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = x*x*y + y + 2
This creates a computation graph. The variables (x and y) can be initialized and the function (f) evaluated in a tensorflow session as follows:
with tf.Session() as sess:
x.initializer.run()
y.initializer.run()
result = f.eval()
print(result)
42
Placeholders
A placeholder is a node (same as a variable) whose value can be initialized in the future. These nodes basically output the value assigned to them during runtime. A placeholder node can be assigned using the tf.placeholder() class to which you can provide arguments such as type of the variable and/or its shape. Placeholders are extensively used for representing the training dataset in a machine learning model as the training dataset keeps changing.
Example:
A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5
Note: 'None' for a dimension means 'any size'.
with tf.Session as sess:
B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})
print(B_val_1)
[[6. 7. 8.]]
print(B_val_2)
[[9. 10. 11.]
[12. 13. 14.]]
References:
https://www.tensorflow.org/guide/variables
https://www.tensorflow.org/api_docs/python/tf/placeholder
O'Reilly: Hands-On Machine Learning with Scikit-Learn & Tensorflow
Think of Variable in tensorflow as a normal variables which we use in programming languages. We initialize variables, we can modify it later as well. Whereas placeholder doesn’t require initial value. Placeholder simply allocates block of memory for future use. Later, we can use feed_dict to feed the data into placeholder. By default, placeholder has an unconstrained shape, which allows you to feed tensors of different shapes in a session. You can make constrained shape by passing optional argument -shape, as I have done below.
x = tf.placeholder(tf.float32,(3,4))
y = x + 2
sess = tf.Session()
print(sess.run(y)) # will cause an error
s = np.random.rand(3,4)
print(sess.run(y, feed_dict={x:s}))
While doing Machine Learning task, most of the time we are unaware of number of rows but (let’s assume) we do know the number of features or columns. In that case, we can use None.
x = tf.placeholder(tf.float32, shape=(None,4))
Now, at run time we can feed any matrix with 4 columns and any number of rows.
Also, Placeholders are used for input data ( they are kind of variables which we use to feed our model), where as Variables are parameters such as weights that we train over time.
Placeholder :
A placeholder is simply a variable that we will assign data to at a later date. It allows us to create our operations and build our computation graph, without needing the data. In TensorFlow terminology, we then feed data into the graph through these placeholders.
Initial values are not required but can have default values with tf.placeholder_with_default)
We have to provide value at runtime like :
a = tf.placeholder(tf.int16) // initialize placeholder value
b = tf.placeholder(tf.int16) // initialize placeholder value
use it using session like :
sess.run(add, feed_dict={a: 2, b: 3}) // this value we have to assign at runtime
Variable :
A TensorFlow variable is the best way to represent shared,
persistent state manipulated by your program.
Variables are manipulated via the tf.Variable class. A tf.Variable
represents a tensor whose value can be changed by running ops on it.
Example : tf.Variable("Welcome to tensorflow!!!")
Tensorflow 2.0 Compatible Answer: The concept of Placeholders, tf.placeholder will not be available in Tensorflow 2.x (>= 2.0) by default, as the Default Execution Mode is Eager Execution.
However, we can use them if used in Graph Mode (Disable Eager Execution).
Equivalent command for TF Placeholder in version 2.x is tf.compat.v1.placeholder.
Equivalent Command for TF Variable in version 2.x is tf.Variable and if you want to migrate the code from 1.x to 2.x, the equivalent command is
tf.compat.v2.Variable.
Please refer this Tensorflow Page for more information about Tensorflow Version 2.0.
Please refer the Migration Guide for more information about migration from versions 1.x to 2.x.
Think of a computation graph. In such graph, we need an input node to pass our data to the graph, those nodes should be defined as Placeholder in tensorflow.
Do not think as a general program in Python. You can write a Python program and do all those stuff that guys explained in other answers just by Variables, but for computation graphs in tensorflow, to feed your data to the graph, you need to define those nods as Placeholders.
For TF V1:
Constant is with initial value and it won't change in the computation;
Variable is with initial value and it can change in the computation; (so good for parameters)
Placeholder is without initial value and it won't change in the computation. (so good for inputs like prediction instances)
For TF V2, same but they try to hide Placeholder (graph mode is not preferred).
In TensorFlow, a variable is just another tensor (like tf.constant or tf.placeholder). It just so happens that variables can be modified by the computation. tf.placeholder is used for inputs that will be provided externally to the computation at run-time (e.g. training data). tf.Variable is used for inputs that are part of the computation and are going to be modified by the computation (e.g. weights of a neural network).