Proper way to optimize the input in TensorFlow for visualization

I have trained a model in TensorFlow and now I would like to visualize which inputs maximally activate an output. I'd like to know what the cleanest way to do this is.
My idea was to create a trainable input variable that I can assign once per run. Then, using an appropriate loss function and an optimizer whose var_list contains only this input variable, I would update the input until convergence, i.e.:
trainable_input = tf.get_variable(
    'trainable_input',
    shape=data_op.get_shape(),
    dtype=data_op.dtype,
    initializer=tf.zeros_initializer(),
    trainable=True,
    collections=[tf.GraphKeys.LOCAL_VARIABLES])
trainable_input_assign_op = tf.assign(trainable_input, data_op)
data_op = trainable_input
# ... run the rest of the graph building code here, now with a trainable input
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
# loss_op is defined on one of the outputs
train_op = optimizer.minimize(loss_op, var_list=[trainable_input])
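(For reference, the "appropriate loss function" here could be as simple as the negative activation of the unit being visualized; the logits tensor and target_unit index below are placeholders, not names from my actual graph.)
# Hypothetical loss_op: maximize the mean activation of one output unit by
# minimizing its negative.
loss_op = -tf.reduce_mean(logits[:, target_unit])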
However, when I do this I run into issues. If I try to restore the pre-trained graph using a Supervisor, it naturally complains that the new variables created by the AdamOptimizer do not exist in the graph I'm trying to restore. I can remedy this by using get_slots to fetch the variables the AdamOptimizer creates and manually adding them to the tf.GraphKeys.LOCAL_VARIABLES collection, but that feels pretty hacky and I'm not sure what the consequences would be. I can also exclude those variables explicitly from the Saver that is passed to the Supervisor, without adding them to tf.GraphKeys.LOCAL_VARIABLES, but then I get an exception that they never get properly initialized by the Supervisor:
File "/usr/local/lib/python3.5/site-packages/tensorflow/python/training/supervisor.py", line 973, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/usr/local/lib/python3.5/site-packages/tensorflow/python/training/supervisor.py", line 801, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/usr/local/lib/python3.5/site-packages/tensorflow/python/training/coordinator.py", line 386, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python3.5/site-packages/six.py", line 686, in reraise
raise value
File "/usr/local/lib/python3.5/site-packages/tensorflow/python/training/supervisor.py", line 962, in managed_session
start_standard_services=start_standard_services)
File "/usr/local/lib/python3.5/site-packages/tensorflow/python/training/supervisor.py", line 719, in prepare_or_wait_for_session
init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "/usr/local/lib/python3.5/site-packages/tensorflow/python/training/session_manager.py", line 280, in prepare_session
self._local_init_op, msg))
RuntimeError: Init operations did not make model ready. Init op: init, init fn: None, local_init_op: name: "group_deps_5"
op: "NoOp"
input: "^init_1"
input: "^init_all_tables"
, error: Variables not initialized: trainable_input/trainable_input/Adam, trainable_input/trainable_input/Adam_1
I'm not really sure why these variables are not getting initialized since I have used that technique before to exclude some variables from the restore process (GLOBAL and LOCAL) and they seem to get initialized as expected.
In short, my question is whether there is a simple way to add an optimizer to the graph and do a checkpoint restore (where the checkpoint does not contain the optimizer variables) without having to muck around with the internals of the optimizer. If that's not possible, then is there any downside to just adding the optimizer variables to the LOCAL_VARIABLES collection?
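For concreteness, the get_slots workaround I mention above looks roughly like this (a sketch, not necessarily the exact code):
# Mark the Adam slot variables for the input as local so the Saver does not
# expect them in the checkpoint. Note that Adam also keeps beta1/beta2 power
# accumulators that get_slot() does not return, so this may not cover everything.
for slot_name in optimizer.get_slot_names():  # 'm' and 'v' for Adam
    slot_var = optimizer.get_slot(trainable_input, slot_name)
    if slot_var is not None:
        tf.add_to_collection(tf.GraphKeys.LOCAL_VARIABLES, slot_var)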

The same error occurs when I use the slim library; in fact, slim.learning.train() uses tf.train.Supervisor internally. I hope my answer on this GitHub issue helps with your Supervisor problem.
I had the same problem as you and solved it with the following two steps.
1. Pass the parameter saver to slim.learning.train():
ckpt = tf.train.get_checkpoint_state(FLAGS.train_dir)
saver = tf.train.Saver(var_list=optimistic_restore_vars(ckpt.model_checkpoint_path) if ckpt else None)
where the function optimistic_restore_vars is defined as:
def optimistic_restore_vars(model_checkpoint_path):
    reader = tf.train.NewCheckpointReader(model_checkpoint_path)
    saved_shapes = reader.get_variable_to_shape_map()
    var_names = sorted([(var.name, var.name.split(':')[0])
                        for var in tf.global_variables()
                        if var.name.split(':')[0] in saved_shapes])
    restore_vars = []
    name2var = dict(zip(map(lambda x: x.name.split(':')[0], tf.global_variables()),
                        tf.global_variables()))
    with tf.variable_scope('', reuse=True):
        for var_name, saved_var_name in var_names:
            curr_var = name2var[saved_var_name]
            var_shape = curr_var.get_shape().as_list()
            if var_shape == saved_shapes[saved_var_name]:
                restore_vars.append(curr_var)
    return restore_vars
2. Pass the parameter local_init_op to slim.learning.train() to initialize the newly added variables:
local_init_op = tf.global_variables_initializer()
In the end, the code should look like this:
ckpt = tf.train.get_checkpoint_state(FLAGS.train_dir)
saver = tf.train.Saver(var_list=optimistic_restore_vars(ckpt.model_checkpoint_path) if ckpt else None)
local_init_op = tf.global_variables_initializer()
###########################
# Kicks off the training. #
###########################
learning.train(
    train_tensor,
    saver=saver,
    local_init_op=local_init_op,
    logdir=FLAGS.train_dir,
    master=FLAGS.master,
    is_chief=(FLAGS.task == 0),
    init_fn=_get_init_fn(),
    summary_op=summary_op,
    number_of_steps=FLAGS.max_number_of_steps,
    log_every_n_steps=FLAGS.log_every_n_steps,
    save_summaries_secs=FLAGS.save_summaries_secs,
    save_interval_secs=FLAGS.save_interval_secs,
    sync_optimizer=optimizer if FLAGS.sync_replicas else None
)

Related

How to use tf.train.Saver in SessionRunHook?

I have trained many sub-models, each of which is part of the final model. I want to use those pretrained sub-models to initialize the final model's parameters. I tried to use a SessionRunHook to load the parameters from the sub-models' checkpoint files and initialize the final model with them.
I tried the following code but it failed. Any advice is appreciated. Thanks!
The error info is:
Traceback (most recent call last):
File "train_high_api_local.py", line 282, in <module>
tf.app.run()
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 124, in run
_sys.exit(main(argv))
File "train_high_api_local.py", line 266, in main
clf_.train(input_fn=lambda: read_file([tables[0]], epochs_per_eval), steps=None, hooks=[hook_test]) # input yield: x, y
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 314, in train
.......
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 674, in create_session
hook.after_create_session(self.tf_sess, self.coord)
File "train_high_api_local.py", line 102, in after_create_session
saver = tf.train.Saver([ti]) # TODO: ERROR INFO: Graph is finalized and cannot be modified.
.......
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3135, in create_op
self._check_not_finalized()
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2788, in _check_not_finalized
raise RuntimeError("Graph is finalized and cannot be modified.")
RuntimeError: Graph is finalized and cannot be modified.
and the relevant code is:
class SetTensor(session_run_hook.SessionRunHook):
    """ like tf.train.LoggingTensorHook """
    def after_create_session(self, session, coord):
        """ Called when a new TensorFlow session is created: the graph is finalized and ops can no longer be added. """
        graph = tf.get_default_graph()
        ti = graph.get_tensor_by_name("h_1_15/bias:0")
        with session.as_default():
            with tf.name_scope("rewrite"):
                saver = tf.train.Saver([ti])  # TODO: ERROR INFO: Graph is finalized and cannot be modified.
                saver.restore(session, "/Users/zhouliaoming/data/credit_dnn/model_retrain/rm_gene_v2_sall/model.ckpt-2102")
        pass

def main(unused_argv):
    """ train """
    norm_all_func = lambda x: tf.cond(x > 1, lambda: tf.log(x), lambda: tf.identity(x))
    feature_columns = [[tf.feature_column.numeric_column(COLUMNS[i], shape=fi, normalizer_fn=lambda x: tf.py_func(weight_norm2, [x], tf.float32))] for i, fi in enumerate(FEA_DIM)]  # normalized: running OK!
    ## use self-defined model
    param = {"learning_rate": 0.0001, "feature_columns": feature_columns, "isanalysis": FLAGS.isanalysis, "isall": False}
    clf_ = tf.estimator.Estimator(model_fn=model_fn_wide2deep, params=param, model_dir=ckpt_dir)
    hook_test = SetTensor(["h_1_15/bias", "h_1_15/kernel"])
    epochs_per_eval = 1
    for n in range(int(FLAGS.num_epochs / epochs_per_eval)):
        # train num_epochs
        clf_.train(input_fn=lambda: read_file([tables[0]], epochs_per_eval), steps=None, hooks=[hook_test])  # input yield: x, y
SessionRunHook is not meant for this use case. As the error says, you cannot change the graph once it has been finalized, which happens when the monitored session is created, before sess.run() is ever invoked.
You can assign variables using saver.restore() in your "normal code". You don't have to be inside any hooks.
Also, if you want to restore many variables and can match them to their names and shapes in a checkpoint, you might want to take a look at https://gist.github.com/iganichev/d2d8a0b1abc6b15d4a07de83171163d4. It shows some example code to restore a subset of variables.
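As a rough illustration of the "restore in your normal code" approach (the variable prefix and checkpoint path below are only placeholders, and this assumes you manage the session yourself rather than going through Estimator.train):
# Build a Saver over just the variables you want to load from the other
# checkpoint, then restore them after the usual initialization.
vars_to_load = [v for v in tf.global_variables()
                if v.op.name.startswith('h_1_15/')]
subset_saver = tf.train.Saver(var_list=vars_to_load)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    subset_saver.restore(sess, '/path/to/sub_model.ckpt')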
You can do this:
class SaveAtEnd(tf.train.SessionRunHook):
    def begin(self):
        self._saver = ...  # create your saver here
    def end(self, session):
        self._saver.save(session, ...)
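Along the same lines, a restore-at-start hook can avoid the "Graph is finalized" error by building its Saver in begin() (which runs before the graph is finalized) and only calling restore() in after_create_session(). A rough sketch, with the scope prefix and checkpoint path as placeholders:
class RestoreSubsetHook(tf.train.SessionRunHook):
    def __init__(self, scope_prefix, ckpt_path):
        self._scope_prefix = scope_prefix
        self._ckpt_path = ckpt_path

    def begin(self):
        # The graph can still be modified here, so creating a Saver is allowed.
        variables = [v for v in tf.global_variables()
                     if v.op.name.startswith(self._scope_prefix)]
        self._saver = tf.train.Saver(var_list=variables)

    def after_create_session(self, session, coord):
        # The graph is finalized now, but restore() only runs ops the Saver
        # already created in begin().
        self._saver.restore(session, self._ckpt_path)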

How to use tf.train.MonitoredTrainingSession to restore only certain variables

How does one tell a tf.train.MonitoredTrainingSession to restore only a subset of the variables, and perform initialization on the rest?
Starting with the cifar10 tutorial
https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_train.py
I created lists of the variables to restore and initialize, and specified them using a Scaffold that I pass to the MonitoredTrainingSession:
restoration_saver = Saver(var_list=restore_vars)
restoration_scaffold = Scaffold(init_op=variables_initializer(init_vars),
                                ready_op=constant([]),
                                saver=restoration_saver)
but this gives the following error:
RuntimeError: Init operations did not make model ready for local_init. Init op: group_deps, init fn: None, error: Variables not initialized: conv2a/T, conv2b/T, [...]
where the uninitialized variables listed in the error message are the variables in my "init_vars" list.
The exception is raised by SessionManager.prepare_session(). The source code for that method seems to indicate that if the session is restored from a checkpoint, then the init_op is not run. So it looks like you can either have restored variables or initialized variables, but not both.
OK, so as I suspected, I got what I wanted by implementing a new RefinementSessionManager class based on the existing tf.train.SessionManager. The two classes are almost identical, except that I modified the prepare_session method to call the init_op regardless of whether the model was loaded from a checkpoint.
This allows me to load a list of variables from the checkpoint and initialize the remaining variables in the init_op.
My prepare_session method is this:
def prepare_session(self, master, init_op=None, saver=None,
                    checkpoint_dir=None, wait_for_checkpoint=False,
                    max_wait_secs=7200, config=None, init_feed_dict=None,
                    init_fn=None):
    sess, is_loaded_from_checkpoint = self._restore_checkpoint(
        master,
        saver,
        checkpoint_dir=checkpoint_dir,
        wait_for_checkpoint=wait_for_checkpoint,
        max_wait_secs=max_wait_secs,
        config=config)
    # [removed] if not is_loaded_from_checkpoint:
    # we still want to run any supplied initialization on models that
    # were loaded from checkpoint.
    if not is_loaded_from_checkpoint and init_op is None and not init_fn and self._local_init_op is None:
        raise RuntimeError("Model is not initialized and no init_op or "
                           "init_fn or local_init_op was given")
    if init_op is not None:
        sess.run(init_op, feed_dict=init_feed_dict)
    if init_fn:
        init_fn(sess)
    # [...]
Hope this helps somebody else.
The hint from #avital works; to be more complete, pass a Scaffold object into MonitoredTrainingSession with a local_init_op and a ready_for_local_init_op, like so:
model_ready_for_local_init_op = tf.report_uninitialized_variables(
    var_list=var_list)
model_init_tmp_vars = tf.variables_initializer(var_list)
scaffold = tf.train.Scaffold(saver=model_saver,
                             local_init_op=model_init_tmp_vars,
                             ready_for_local_init_op=model_ready_for_local_init_op)
with tf.train.MonitoredTrainingSession(...,
                                       scaffold=scaffold,
                                       ...) as mon_sess:
    ...
You can solve this with the local_init_op argument, which does get run after loading from a checkpoint.
Scaffold's arguments include the following:
init_op
ready_op
local_init_op
ready_for_local_init_op
init_op will only be called when we do NOT restore from a checkpoint:
if not is_loaded_from_checkpoint:
    if init_op is None and not init_fn and self._local_init_op is None:
        raise RuntimeError("Model is not initialized and no init_op or "
                           "init_fn or local_init_op was given")
    if init_op is not None:
        sess.run(init_op, feed_dict=init_feed_dict)
    if init_fn:
        init_fn(sess)
So init_op cannot actually help here. If you are willing to write a new SessionManager, you can follow #user550701's approach. We can also use local_init_op, but it can be a little tricky in distributed situations.
Scaffold generates a default init_op and local_init_op for us (details here):
init_op: will initialize tf.global_variables
local_init_op: will initialize tf.local_variables
We should initialize our own variables without breaking the default mechanism.
One worker situation
You can create local_init_op like this:
target_collection = [] # Put your target tensors here
collection = tf.local_variables() + target_collection
local_init_op = tf.variables_initializer(collection)
ready_for_local_init_op = tf.report_uninitialized_variables(collection)
Distributed situation
We need to take care of duplicate initialization of our target_collection, because local_init_op will be called multiple times across workers. If the variables are local, it makes no difference; if they are global variables, we must make sure they are only initialized once. To solve the duplication problem, we can manipulate the collection variable: on the chief worker it includes both the local variables and our target_collection, while on non-chief workers we only put the local variables into it.
if is_chief:
    collection = tf.local_variables() + target_collection
else:
    collection = tf.local_variables()
All in all, it is a little tricky, but we do not have to hack into tensorflow.
I encountered the same problem, and my solution is:
checkpoint_restore_dir_for_monitered_session = None
scaffold = None
if params.restore:
    checkpoint_restore_dir_for_monitered_session = checkpoint_save_dir
    restore_exclude_name_list = params.restore_exclude_name_list
    if len(restore_exclude_name_list) != 0:
        variables_to_restore, variables_dont_restore = get_restore_var_list(restore_exclude_name_list)
        saver_for_restore = tf.train.Saver(var_list=variables_to_restore, name='saver_for_restore')
        ready_for_local_init_op = tf.report_uninitialized_variables(variables_to_restore.values())
        local_init_op = tf.group([
            tf.initializers.local_variables(),
            tf.initializers.variables(variables_dont_restore)
        ])
        scaffold = tf.train.Scaffold(saver=saver_for_restore,
                                     ready_for_local_init_op=ready_for_local_init_op,
                                     local_init_op=local_init_op)

with tf.train.MonitoredTrainingSession(
        checkpoint_dir=checkpoint_restore_dir_for_monitered_session,
        save_checkpoint_secs=None,  # don't save ckpt
        hooks=train_hooks,
        config=config,
        scaffold=scaffold,
        summary_dir=params.log_dir) as sess:
    pass
In this code fragment, get_restore_var_list gets variables_to_restore and variables_dont_restore.
saver_for_restore restores only the variables in variables_to_restore, which ready_for_local_init_op then checks.
Then local_init_op runs, which initializes local_variables() and variables_dont_restore (with whatever initializer they were defined with, e.g. tf.variance_scaling_initializer).
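get_restore_var_list is not shown in the post; a possible sketch of it, splitting tf.global_variables() on the excluded name list, could look like this:
def get_restore_var_list(exclude_name_list):
    # Hypothetical helper: variables whose names match an excluded pattern are
    # left to local_init_op, the rest are restored from the checkpoint.
    variables_to_restore = {}
    variables_dont_restore = []
    for var in tf.global_variables():
        if any(excluded in var.op.name for excluded in exclude_name_list):
            variables_dont_restore.append(var)
        else:
            variables_to_restore[var.op.name] = var
    return variables_to_restore, variables_dont_restore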

tf.contrib.slim.get_variables_to_restore() does not return any values

Running the code below, tf.contrib.slim.get_variables_to_restore() returns an empty list [] for all_vars, which then causes a failure when calling tf.train.Saver. The detailed error message is shown below.
Am I missing anything?
>>> import tensorflow as tf
>>> inception_exclude_scopes = ['InceptionV3/AuxLogits', 'InceptionV3/Logits', 'global_step', 'final_ops']
>>> inception_checkpoint_file = '/Users/morgan.du/git/machine-learning/projects/capstone/yelp/model/inception_v3_2016_08_28.ckpt'
>>> with tf.Session(graph=tf.Graph()) as sess:
... init_op = tf.global_variables_initializer()
... sess.run(init_op)
... reader = tf.train.NewCheckpointReader(inception_checkpoint_file)
... var_to_shape_map = reader.get_variable_to_shape_map()
... all_vars = tf.contrib.slim.get_variables_to_restore(exclude=inception_exclude_scopes)
... inception_saver = tf.train.Saver(all_vars)
... inception_saver.restore(sess, inception_checkpoint_file)
...
Traceback (most recent call last):
File "<stdin>", line 7, in <module>
File "/Users/morgan.du/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1051, in __init__
self.build()
File "/Users/morgan.du/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1072, in build
raise ValueError("No variables to save")
ValueError: No variables to save
The problem here seems to be that your graph is empty—i.e. it does not contain any variables. You create a new graph on the line with tf.Session(graph=tf.Graph()):, and none of the following lines creates a tf.Variable object.
To restore a pre-trained TensorFlow model, you need to do one of three things:
Rebuild the model graph, by executing the same Python graph building code that was used to train the model in the first place.
Load a "MetaGraph" that contains information about how to reconstruct the graph structure and model variables. See this tutorial for more details on how to create and use a MetaGraph. MetaGraphs are often created alongside checkpoint files, and typically have the extension .meta.
Load a "SavedModel", which contains a "MetaGraph". See the documentation here for more details.

Why can't I access the variable I created using the variable name plus scope path in TensorFlow?

I was trying to get a variable I created in a simple function but I keep getting errors. I am doing:
x = tf.get_variable('quadratic/x')
but Python complains as follows:
python qm_tb_scopes.py
quadratic/x:0
Traceback (most recent call last):
File "qm_tb_scopes.py", line 24, in <module>
x = tf.get_variable('quadratic/x')
File "/Users/my_username/path/tensor_flow_experiments/venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 732, in get_variable
partitioner=partitioner, validate_shape=validate_shape)
File "/Users/my_username/path/tensor_flow_experiments/venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 596, in get_variable
partitioner=partitioner, validate_shape=validate_shape)
File "/Users/my_username/path/tensor_flow_experiments/venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 161, in get_variable
caching_device=caching_device, validate_shape=validate_shape)
File "/Users/my_username/path/tensor_flow_experiments/venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 457, in _get_single_variable
"but instead was %s." % (name, shape))
ValueError: Shape of a new variable (quadratic/x) must be fully defined, but instead was <unknown>.
It seems it's trying to create a new variable, but I am simply trying to get one that is already defined. Why is it doing this?
The whole code is:
import tensorflow as tf

def get_quaratic():
    # x variable
    with tf.variable_scope('quadratic'):
        x = tf.Variable(10.0, name='x')
        # b placeholder (simulates the "data" part of the training)
        b = tf.placeholder(tf.float32, name='b')
        # make model (1/2)(x-b)^2
        xx_b = 0.5*tf.pow(x-b, 2)
        y = xx_b
    return y, x

y, x = get_quaratic()
learning_rate = 1.0
# get optimizer
opt = tf.train.GradientDescentOptimizer(learning_rate)
# gradient variable list = [ (gradient,variable) ]
print x.name
x = tf.get_variable('quadratic/x')
x = tf.get_variable(x.name)
You need to pass the option reuse=True to tf.variable_scope() if you want to get the same variable twice. Note that this only works for variables created with tf.get_variable() in the first place; a variable created directly with tf.Variable(10.0, name='x'), as in your get_quaratic(), cannot be retrieved this way.
See the documentation (https://www.tensorflow.org/versions/r0.9/how_tos/variable_scope/index.html) for more details.
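A minimal sketch of the reuse pattern, assuming x is created with tf.get_variable inside the 'quadratic' scope in the first place:
with tf.variable_scope('quadratic'):
    x = tf.get_variable('x', shape=(),
                        initializer=tf.constant_initializer(10.0))
with tf.variable_scope('quadratic', reuse=True):
    same_x = tf.get_variable('x')  # returns the existing variable, not a new one
assert same_x is x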
Alternatively, you could get the variable once, outside your Python function, and pass it in as an argument in Python. I find that a bit cleaner since it makes it explicit which variables the code uses.
I hope that helps!
This is not the best solution, but try creating the variable through tf.get_variable() with reuse=False to ensure a new variable is created. Then, when obtaining the variable, use tf.get_variable() with reuse=True to get the current variable. Setting reuse to tf.AUTO_REUSE risks the creation of a new variable if the exact var is not present. Also make sure to specify the shape of the variable in tf.get_variable().
import tensorflow as tf

def get_quaratic():
    # x variable
    with tf.variable_scope('quadratic', reuse=False):
        x = tf.get_variable('x', ())
        tf.assign(x, 10)
        # b placeholder (simulates the "data" part of the training)
        b = tf.placeholder(tf.float32, name='b')
        # make model (1/2)(x-b)^2
        xx_b = 0.5*tf.pow(x-b, 2)
        y = xx_b
    return y, x

y, x = get_quaratic()
learning_rate = 1.0
# get optimizer
opt = tf.train.GradientDescentOptimizer(learning_rate)
# gradient variable list = [ (gradient,variable) ]
print(x.name)

with tf.variable_scope('', reuse=True):
    x = tf.get_variable('quadratic/x', shape=())
    print(tf.global_variables())  # there is only 1 variable

How to get the value of a variable defined in tf.name_scope()?

with tf.name_scope('hidden4'):
    weights = tf.Variable(tf.convert_to_tensor(weights4))
    biases = tf.Variable(tf.convert_to_tensor(biases4))
    hidden4 = tf.sigmoid(tf.matmul(hidden3, weights) + biases)
I want to use tf.get_variable to get the variable hidden4/weights defined above, but it fails as shown below:
hidden4weights = tf.get_variable("hidden4/weights:0")
*** ValueError: Variable hidden4/weights:0 already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/pdb.py", line 234, in default
exec code in globals, locals
File "/usr/local/lib/python2.7/cmd.py", line 220, in onecmd
return self.default(line)
Then I tried hidden4/weights.eval(sess), but that also failed.
(Pdb) hidden4/weights.eval(sess)
*** NameError: name 'hidden4' is not defined
tf.name_scope() is mainly used to organize op names for visualizing the graph; it does not affect the names that tf.get_variable() sees.
tf.name_scope(name)
Wrapper for Graph.name_scope() using the default graph.
What I think you are looking for is tf.variable_scope():
The Variable Scope mechanism in TensorFlow consists of two main functions:
tf.get_variable(<name>, <shape>, <initializer>): Creates or returns a variable with the given name.
tf.variable_scope(<scope_name>): Manages namespaces for names passed to tf.get_variable().
with tf.variable_scope('hidden4'):
    # No variable with this name exists in this scope yet, so it creates the variable
    weights = tf.get_variable("weights", <shape>, tf.convert_to_tensor(weights4))  # Shape of a new variable (hidden4/weights) must be fully defined
    biases = tf.get_variable("biases", <shape>, tf.convert_to_tensor(biases4))  # Shape of a new variable (hidden4/biases) must be fully defined
    hidden4 = tf.sigmoid(tf.matmul(hidden3, weights) + biases)

with tf.variable_scope('hidden4', reuse=True):
    hidden4weights = tf.get_variable("weights")

assert weights == hidden4weights
That should do it.
I have solved the problem above:
classifyerlayer_W = [v for v in tf.all_variables() if v.name == "softmax_linear/weights:0"][0]  # find the variable by name "softmax_linear/weights:0"
init = numpy.random.randn(2048, 4382)  # create an array to use for re-initializing the variable
assign_op = classifyerlayer_W.assign(init)  # create an assign operation
sess.run(assign_op)  # run the op to finish the assign