Tensorflow: Save and restore the output of a tensor - tensorflow

I would like to achieve something like this using tensorflow.
I can only find documentation on saving and restoring variables (weights). However, like #2-2, I want to utilize the output of a hidden layer (tensor) as input of another model. Can this be done?

As far as I'm aware, it is not possible to chain different computation graphs after they have been created however, you have a few options.
Option 2: Create one large graph and use a control flow op
output_layer, placeholder = build_my_model()
something = tf.where(output_layer < 0, do_something_1(), do_something_2())
where all function calls above should return tensorflow operations.
Option2: Create two graphs and perform the conditional statement inside python
# Build the first graph
with tf.Graph().as_default() as graph:
output_layer, placeholder = build_my_model()
# Build the second two graphs
with tf.Graph().as_default() as graph_1:
something_1 = do_something_1()
with tf.Graph().as_default() as graph_2:
something_2 = do_something_2()
As a result, you will also end up with three different sessions and you need to feed the output from the first session to one of the other two
# Get the output
_output_layer = sess.run(output_layer, {placeholder: ...})
if _output_layer < 0:
something = sess1.run(something_1, {...})
else:
something = sess2.run(something_2, {...})
As you can see, if you can get away with the control flow op, your code will be significantly simpler. Another advantage of having everything in one graph is that the entire graph is differentiable and you can train parameters of the first stage of your model conditional on the loss at a later stage.

Related

Creating an image summary only for a subset of validation set images using Tensorflow Estimator API

I'm trying to add image summary operations to visualize how well my network manages to reconstruct inputs from the validation set. However, since there are too many images in the validation set I would only like to plot a small subset of them.
I managed to achieve this with manual training loop, but I struggle to achieve the same with the new Tensorflow Estimator/Experiment/Datasets API. Has anyone done something like this?
The Experiment and Estimator are high level TensorFlow APIs. Although you could probably solve your issue with a hook, if you want more control on what's happening during the training process, it may be easier not to use these APIs.
That said, you can still use the Dataset API which will bring you a lot of useful features.
To solve your problem with the Dataset API, you will need to switch between train and validation datasets in your training loop.
One way to do that is to use a feedable iterator. See here for more details:
https://www.tensorflow.org/programmers_guide/datasets
You can also see a full example switching between training and validation with the Dataset API in this notebook.
In brief, after having created your train_dataset and your val_dataset, your training loop could be something like this:
# create TensorFlow Iterator objects
training_iterator = val_dataset.make_initializable_iterator()
val_iterator = val_dataset.make_initializable_iterator()
with tf.Session() as sess:
# Initialize variables
init = tf.global_variables_initializer()
sess.run(init)
# Create training data and validation data handles
training_handle = sess.run(training_iterator.string_handle())
validation_handle = sess.run(val_iterator.string_handle())
for epoch in range(number_of_epochs):
# Tell iterator to go to beginning of dataset
sess.run(training_iterator.initializer)
print ("Starting epoch: ", epoch)
# iterate over the training dataset and train
while True:
try:
sess.run(train_op, feed_dict={handle: training_handle})
except tf.errors.OutOfRangeError:
# End of epoch
break
# Tell validation iterator to go to beginning of dataset
sess.run(val_iterator.initializer)
# run validation on only 10 examples
for i in range(10):
my_value = sess.run(my_validation_op, feed_dict={handle: validation_handle}))
# Do whatever you want with my_value
...
I figured out a solution that uses Estimator/Experiment API.
First you need to modify your Dataset input to not only provide labels and features, but also some form of an identifier for each sample (in my case it was a filename). Then in the hyperparameters dictionary (params argument) you need to specify which of the validation samples you want to plot. You also will have to pass the model_dir in those parameters. For example:
params = tf.contrib.training.HParams(
model_dir=model_dir,
images_to_plot=["100307_EMOTION.nii.gz", "100307_FACE-SHAPE.nii.gz",
"100307_GAMBLING.nii.gz", "100307_RELATIONAL.nii.gz",
"100307_SOCIAL.nii.gz"]
)
learn_runner.run(
experiment_fn=experiment_fn,
run_config=run_config,
schedule="train_and_evaluate",
hparams=params
)
Having this set up you can create conditional Summary operations in your model_fn and an evaluation hook to include them in your outputs.
if mode == tf.contrib.learn.ModeKeys.EVAL:
summaries = []
for image_to_plot in params.images_to_plot:
is_to_plot = tf.equal(tf.squeeze(filenames), image_to_plot)
summary = tf.cond(is_to_plot,
lambda: tf.summary.image('predicted', predictions),
lambda: tf.summary.histogram("ignore_me", [0]),
name="%s_predicted" % image_to_plot)
summaries.append(summary)
evaluation_hooks = [tf.train.SummarySaverHook(
save_steps=1,
output_dir=os.path.join(params.model_dir, "eval"),
summary_op=tf.summary.merge(summaries))]
else:
evaluation_hooks = None
Note that the summaries have to be conditional - we are either plotting an image (computationally expensive) or saving a constant (computationally cheap). I opted for using histogram versus scalar in for the dummy summaries to avoid cluttering my tensorboard dashboard.
Finally you need to pass the hook in the return object of your `model_fn'
return tf.estimator.EstimatorSpec(
mode=mode,
predictions=predictions,
loss=loss,
train_op=train_op,
evaluation_hooks=evaluation_hooks
)
Please note that this only works when your batch size is 1 when evaluating the model (which should not be a problem).

Using tf.train.Saver() on convolutional layers in tensorflow

I'm attempting to use tf.train.Saver() to apply transfer learning between two convolutional neural network graphs in tensorflow and I'd like to validate that my methods are working as expected. Is there a way to inspect the trainable features in a tf.layers.conv2d() layer?
My methods
1. initialize layer
conv1 = tf.layers.conv2d(inputs=X_reshaped, filters=conv1_fmaps, kernel_size=conv1_ksize,
strides=conv1_stride, padding=conv1_pad,
activation=tf.nn.relu,
kernel_initializer=tf.contrib.layers.xavier_initializer(),
bias_initializer=tf.zeros_initializer(), trainable=True,
name="conv1")
2. {Train the network}
3. Save current graph
tf.train.Saver().save(sess, "./my_model_final.ckpt")
4. Build new graph that includes the same layer, load specified weights with Saver()
reuse_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
scope="conv[1]")
reuse_vars_dict = dict([(var.op.name, var) for var in reuse_vars])
restore_saver = tf.train.Saver(reuse_vars_dict)
...
restore_saver.restore(sess, "./my_model_final.ckpt")
5. {Train and evaluate the new graph}
My Question:
1) My code works 'as expected' and without error, but I'm not 100% confident it's working like I think it is. Is there a way to print the trainable features from a layer to ensure that I'm loading and saving weights correctly? Is there a "better" way to save/load parameters with the tf.layers API? I noticed a request on GitHub related to this. Ideally, I'd like to check these values on the first graph a) after initialization b) after training and on the new graph i) after loading the weights ii) after training/evaluation.
Is there a way to print the trainable features from a layer to ensure that I'm loading and saving weights correctly?
Yes, you first need to get a handle on the layer's variables. There are several ways to do that, but arguably the simplest is using the get_collection() function:
conv1_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
scope="conv1")
Note that the scope here is treated as a regular expression, so you can write things like conv[123] if you want all variables from scopes conv1, conv2 and conv3.
If you just want trainable variables, you can replace GLOBAL_VARIABLES with TRAINABLE_VARIABLES.
If you just want to check a single variable, such as the layer's kernel, then you can use get_tensor_by_name() like this:
graph = tf.get_default_graph()
kernel_var = graph.get_tensor_by_name("conv1/kernel:0")
Yet another option is to just iterate on all variables and filter based on their names:
conv1_vars = [var for var in tf.global_variables()
if var.op.name.startswith("conv1/")]
Once you have a handle on these variables, you can just evaluate them at different points, e.g. just after initialization, just after restoring the graph, just after training, and so on, and compare the values. For example, this is how you would get the values just after initialization:
with tf.Session() as sess:
init.run()
conv1_var_values_after_init = sess.run(conv1_vars)
Then once you have captured the variable values at the various points that you are interested in, you can check whether or not they are equal (or close enough, taking into account tiny floating point imprecisions) like so:
same = np.allclose(conv1_var_values_after_training,
conv1_var_values_after_restore)
Is there a "better" way to save/load parameters with the tf.layers API?
Not that I'm aware of. The feature request you point to is not really about saving/loading the parameters to disk, but rather to be able to easily get a handle on a layer's variables, and to easily create an assignment node to set their values.
For example, it will be possible (in TF 1.4) to get a handle on a layer's kernel and get its value very simply, like this:
conv1_kernel_value = conv1.kernel.eval()
Of course, you can use this to get/set a variable's value and load/save it to disk, like this:
conv1 = tf.layers.conv2d(...)
new_kernel = tf.placeholder(...)
assign_kernel = conv1.kernel.assign(new_kernel)
init = tf.global_variables_initializer()
with tf.Session() as sess:
init.run()
loaded_kernel = my_function_to_load_kernel_value_from_disk(...)
assign_kernel.run(feed_dict={new_kernel: loaded_kernel})
...
It's not pretty. It might be useful if you want to load/save to a database (instead of a flat file), but in general I would recommend using a Saver.
I hope this helps.

Using tf.cond() to feed my graph for training and validation

In my TensorFlow code I want my network to take inputs from one of the two StagingArea objects depending upon whether I want to do training or testing.
A part of the graph construction code I wrote is as follows :
with tf.device("/gpu:0"):
for i in range(numgpus):
with tf.variable_scope(tf.get_variable_scope(), reuse=i>0) as vscope:
with tf.device('/gpu:{}'.format(i)):
with tf.name_scope('GPU-Tower-{}'.format(i)) as scope:
phase = tf.get_variable("phase", [], initializer=tf.zeros_initializer(),dtype=tf.uint8, trainable=False)
phaseassigntest = phase.assign(1)
phaseassigntrain = phase.assign(0)
phasetest = tf.equal(phase, 0)
is_training = tf.cond(phasetest, lambda: tf.constant(True), lambda: tf.constant(False))
trainstagingarea = tf.contrib.staging.StagingArea([tf.float32, tf.int32], shapes=[[trainbatchsize, 3, 221, 221], [trainbatchsize]], capacity=20)
putoptrain = trainstagingarea.put(train_iterator.get_next())
trainputop.append(putoptrain)
getoptrain = trainstagingarea.get()
traingetop.append(getoptrain)
trainclearop = trainstagingarea.clear()
trainstageclear.append(trainclearop)
trainsizeop = trainstagingarea.size()
trainstagesize.append(trainsizeop)
valstagingarea = tf.contrib.staging.StagingArea([tf.float32, tf.int32], shapes=[[valbatchsize, 3, 221, 221], [valbatchsize]], capacity=20)
putopval = valstagingarea.put(val_iterator.get_next())
valputop.append(putopval)
getopval = valstagingarea.get()
valgetop.append(getopval)
valclearop = valstagingarea.clear()
valstageclear.append(valclearop)
valsizeop = valstagingarea.size()
valstagesize.append(valsizeop)
#elem = valgetop[i]
elem = tf.cond(is_training,lambda: traingetop[i],lambda: valgetop[i])
img = elem[0]
label = elem[1]
labelonehot = tf.one_hot(label, depth=numclasses)
net, networksummaries = overfeataccurate(img,numclasses=numclasses, phase=is_training)
I have used tf.cond to make sure that the network is fed by one of the two StagingArea objects. One is meant for training and the other one is meant for validation.
Now, when I try to execute the graph as follows, I do not get any result and infact the code just hangs and I have to kill the process.
with tf.Session(graph=g,config=config) as sess:
sess.run(init_op)
sess.run(tf.local_variables_initializer())
sess.run(val_initialize)
for i in range(20):
sess.run(valputop)
print(sess.run(valstagesize))
writer = tf.summary.FileWriter('.', graph=tf.get_default_graph())
epoch = 0
iter = 0
print("Performing Validation")
sess.run(phaseassigntest)
saver = tf.train.Saver()
while(epoch<10):
time_init = time.time()
while True:
try:
[val_accu, _, summaries] = sess.run([towervalidation, towervalidationupdateop,validation_summary_op])
print(val_accu)
when instead of tf.cond() I directly assign elem = valgetop[i], the code works just fine.
Am I missing something over here ?
What is the right way to feed my network based on whether I want to do training or testing ?
NOTE The error does not go away even if I set numgpus to 1.
Your problem
What you think tf.cond does
Based on the flag, execute what is required to put either traingetop[i] or valgetop[i] into your elem tensor.
What tf.cond actually does
Executes what is required to get both traingetop[i] and valgetop[i], then passes one of them into your elem tensor.
So
The reason it is hanging forever is because it's waiting for an element to be added to your training staging area (so that it can get that element and discard it). You're forgiven for not realising this is what it's doing; it's actually very counter-intuitive. The documentation is awfully unclear on how to deal with this.
Recommended Solution (by Tensorflow documentation)
If you really need the queues to be in the same graph, then you need to make two copies of your ENTIRE graph, one that is fed by your training staging area, and one that is fed by your validation staging area. Then you just use the relevant tensor in your sess.run call. I recommend creating a function that takes a queue output tensor, and returns a model_output tensor. Now you have a train_time_output tensor and a validation_time_output tensor, and you can choose which one you want to execute in your sess.run.
Warning
You need to make sure that you aren't actually creating new variables to go along with these new ops. To do that take a look at the latest documentation on variables. It looks like they've simplified it from v0.12, and it essentially boils down to using tf.get_variable instead of tf.Variable to create your variables.
My preferred work around
Although that is the recommended solution (AFAIK), it is extremely unsatisfying to me; you're creating a whole other set of operations on the graph that just happen to use the same weights. It seems like there's a lot of potential for programmer error by abusing the separation between train time and test/validation time (resulting in the model acting unexpectedly different at these times). Worse; it doesn't solve the problem of tf.cond demanding the values for inputs to both branches, it just forces you to copy your whole graph, which is not always possible.
I prefer to just not have my queues in the graph like that, and treat the model as a function which can be fed an example without caring where it's from. That is, I would instantiate the model with a tf.placeholder as the input, and at execution time I would use feed_dict to actually provide the value. It would function something like this
#inside main training loop
if time_to_train:
example = sess.run(traingettop)
else:
example = sess.run(valgettop)
result = sess.run(model_output, {input_placeholder: example})
It's very useful to note that you can use the feed_dict to feed any value for any tensor anywhere in your model. So, you can change any model definition that, due to tf.cond would always require an input, like:
a = tf.constant(some_value)
b = tf.placeholder(tf.float32)
flag = tf.placeholder(tf.bool, [])
one_of_them = tf.cond(flag, a, b)
model_output = build_graph(one_of_them)
Into a definition that doesn't, like:
a = tf.constant(some_value)
model_output = build_graph(a)
Remembering that you can always overwrite what a is at execution time:
# In main training loop,
sess.run(train_op, {a: some_other_value})
This essentially pushes the conditional into native python land. In your code you might end up with something like:
if condition_satisfied:
sess.run(train_op, {a:some_other_value})
else:
sess.run(train_op)
Performance concerns
If you are using tensorflow on a single machine, then there is practically no performance cost to this solution, as the numpy array/s put into the example python variable are actually still stored on the GPU.
If you are using tensorflow in a distributed fashion, then this solution would kill your performance; it would require sending the example from whatever machine it's on to the master so that it can send it back.

TensorFlow - Removing name scope when plotting summary?

I'm currently building my operations twice, once for training and once for validation with the variable_scope set to have reuse=True to ensure I've only got one set of weights to train.
To organize the operations though, I wrap the operation building call for training in a
with tf.name_scope='train':
and similarly do the same for validation. This allows me to create a few summary hooks easily, by simply calling
tf.summary.merge(tf.get_collection(tf.GraphKeys.SUMMARIES, scope='train'))
at the end to get summaries for either the training graph or the validation graph and save these summaries with the appropriate summary saver.
Unfortunately, this also means that a scalar in the training summaries is not displayed on the same plot as the equivalent scalar in the validation (because they are in different name scopes).
Is there either a way to remove the name scope before saving the summary, or a different method of wrapping the summaries for a specific case together without applying the name scope to begin with? Or do I need to manually keep track of the summaries for each case?
EDIT:
Just clarify, my code looks something like:
with tf.name_scope('train'):
create_network() # Summaries create in here.
with tf.name_scope('validation'):
create_network(reuse=True) # More summaries in here.
train_summaries = tf.summary.merge(tf.get_collection(tf.GraphKeys.SUMMARIES, scope='train'))
validation_summaries = tf.summary.merge(tf.get_collection(tf.GraphKeys.SUMMARIES, scope='validation'))
# Down here, create the summary saver hooks, etc.
Something like this is done in the multi-GPU CIFRA-10 example code to get rid of unnecessary prefixes:
loss_name = re.sub('%s_[0-9]*/' % cifar10.TOWER_NAME, '', l.op.name)
tf.summary.scalar(loss_name, l)
Perhaps you can report the scalar with the same name from both validation as well as the training part of your code.

How do you create a dynamic_rnn with dynamic "zero_state" (Fails with Inference)

I have been working with the "dynamic_rnn" to create a model.
The model is based upon a 80 time period signal, and I want to zero the "initial_state" before each run so I have setup the following code fragment to accomplish this:
state = cell_L1.zero_state(self.BatchSize,Xinputs.dtype)
outputs, outState = rnn.dynamic_rnn(cell_L1,Xinputs,initial_state=state, dtype=tf.float32)
This works great for the training process. The problem is once I go to the inference, where my BatchSize = 1, I get an error back as the rnn "state" doesn't match the new Xinputs shape. So what I figured is I need to make "self.BatchSize" based upon the input batch size rather than hard code it. I tried many different approaches, and none of them have worked. I would rather not pass a bunch of zeros through the feed_dict as it is a constant based upon the batch size.
Here are some of my attempts. They all generally fail since the input size is unknown upon building the graph:
state = cell_L1.zero_state(Xinputs.get_shape()[0],Xinputs.dtype)
.....
state = tf.zeros([Xinputs.get_shape()[0], self.state_size], Xinputs.dtype, name="RnnInitializer")
Another approach, thinking the initializer might not get called until run-time, but still failed at graph build:
init = lambda shape, dtype: np.zeros(*shape)
state = tf.get_variable("state", shape=[Xinputs.get_shape()[0], self.state_size],initializer=init)
Is there a way to get this constant initial state to be created dynamically or do I need to reset it through the feed_dict with tensor-serving code? Is there a clever way to do this only once within the graph maybe with an tf.Variable.assign?
The solution to the problem was how to obtain the "batch_size" such that the variable is not hard coded.
This was the correct approach from the given example:
Xinputs = tf.placeholder(tf.int32, (None, self.sequence_size, self.num_params), name="input")
state = cell_L1.zero_state(Xinputs.get_shape()[0],Xinputs.dtype)
The problem is the use of "get_shape()[0]", this returns the "shape" of the tensor and takes the batch_size value at [0]. The documentation doesn't seem to be that clear, but this appears to be a constant value so when you load the graph into an inference, this value is still hard coded (maybe only evaluated at graph creation?).
Using the "tf.shape()" function, seems to do the trick. This doesn't return the shape, but a tensor. So this seems to be updated more at run-time. Using this code fragment solved the problem of a training batch of 128 and then loading the graph into TensorFlow-Service inference handling a batch of just 1.
Xinputs = tf.placeholder(tf.int32, (None, self.sequence_size, self.num_params), name="input")
batch_size = tf.shape(Xinputs)[0]
state = self.cell_L1.zero_state(batch_size,Xinputs.dtype)
Here is a good link to TensorFlow FAQ which describes this approach 'How do I build a graph that works with variable batch sizes?':
https://www.tensorflow.org/resources/faq