I'm currently building my operations twice, once for training and once for validation, with the variable scope set to reuse=True to ensure I've only got one set of weights to train.
To organize the operations, though, I wrap the operation-building call for training in a
with tf.name_scope('train'):
and do the same for validation. This allows me to create a few summary hooks easily, by simply calling
tf.summary.merge(tf.get_collection(tf.GraphKeys.SUMMARIES, scope='train'))
at the end to get summaries for either the training graph or the validation graph and save these summaries with the appropriate summary saver.
Unfortunately, this also means that a scalar in the training summaries is not displayed on the same plot as the equivalent scalar in the validation (because they are in different name scopes).
Is there either a way to remove the name scope before saving the summary, or a different method of wrapping the summaries for a specific case together without applying the name scope to begin with? Or do I need to manually keep track of the summaries for each case?
EDIT:
Just to clarify, my code looks something like:
with tf.name_scope('train'):
    create_network()  # Summaries created in here.
with tf.name_scope('validation'):
    create_network(reuse=True)  # More summaries in here.
train_summaries = tf.summary.merge(tf.get_collection(tf.GraphKeys.SUMMARIES, scope='train'))
validation_summaries = tf.summary.merge(tf.get_collection(tf.GraphKeys.SUMMARIES, scope='validation'))
# Down here, create the summary saver hooks, etc.
Something like this is done in the multi-GPU CIFAR-10 example code to get rid of unnecessary prefixes:
loss_name = re.sub('%s_[0-9]*/' % cifar10.TOWER_NAME, '', l.op.name)
tf.summary.scalar(loss_name, l)
Perhaps you can report the scalar under the same name from both the validation and the training parts of your code.
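If renaming at summary-creation time proves fiddly, another option (a sketch, assuming you evaluate the scalar value yourself rather than through a merged summary op) is to build the Summary protos by hand with an explicit tag and use one FileWriter per run; TensorBoard overlays runs that share a tag:
import tensorflow as tf

# Sketch: write a scalar with an explicit tag, bypassing graph name scoping.
def write_scalar(writer, tag, value, step):
    summary = tf.Summary(value=[tf.Summary.Value(tag=tag, simple_value=value)])
    writer.add_summary(summary, step)

train_writer = tf.summary.FileWriter('logs/train')
val_writer = tf.summary.FileWriter('logs/validation')
# Both runs report the tag 'loss', so TensorBoard puts them on one plot:
# write_scalar(train_writer, 'loss', train_loss_value, step)
# write_scalar(val_writer, 'loss', val_loss_value, step)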
Related
I use this code to restore my model, but I don't know how to predict after restoring it; which function can I use? I'm a beginner in tensorflow and I have no idea which parameters or functions were saved.
Restoring from the meta graph:
import tensorflow as tf
from sklearn.datasets import load_svmlight_file

sess = tf.Session()
saver = tf.train.import_meta_graph("/home/MachineLearning/model.ckpt.meta")
saver.restore(sess, tf.train.latest_checkpoint('./'))
print("Model restored with success ")
x_predict, y_predict = load_svmlight_file('/MachineLearning/to_predict.csv')
x_predict = x_predict.toarray()
sess.run([], feed_dict)  # I don't know how to use the predict function
These are the results:
$python predict.py
Model restored with success
Traceback (most recent call last):
File "predict.py", line 23, in <module>
sess.run([] ,feed_dict )
NameError: name 'feed_dict' is not defined
You're almost there. Tensorflow is simply a math library. Your graph is a collection of math operations with their associated dependencies (i.e. a directed acyclic graph, or DAG).
When you loaded the graph and associated variables (weights) you loaded all the definitions. Now you need to ask tensorflow to compute some value in the graph. There are lots of values it could compute; the one you want is often named logits (a typical name for the output layer of a neural network). But note that it could be named anything (especially if this isn't a neural network model), so you need to understand the model. You might also want to compute an operation named accuracy, which is defined to compute the accuracy of a particular batch of inputs (again, this depends on your model).
Note that you will need to provide tensorflow with whatever it needs to perform these computations. There is generally a placeholder where you pass in your data (and, during training, a placeholder for your labels; you don't need labels for pure prediction, though computing accuracy does depend on them).
But you will need to get references to these various operations (logits, and accuracy) and placeholders (x is a typical name). Since you loaded your graph from disk you don't have the references (note that an alternative way of loading the model is to re-run the code that builds the model, which gives you easy access to the references you need).
In order to get the right references you can look them up by name. Here's how you would get a list of all the operations:
List of tensor names in graph in Tensorflow
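For instance, a minimal sketch along the lines of that answer:
import tensorflow as tf

# Print the name of every operation in the default graph.
for op in tf.get_default_graph().get_operations():
    print(op.name)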
Then to get a specific OP (operation) by name:
How to get a tensorflow op by name?
So you'll have something like this:
logits = tf.get_default_graph().get_tensor_by_name("logits:0")
x = tf.get_default_graph().get_tensor_by_name("x:0")
accuracy = tf.get_default_graph().get_tensor_by_name("accuracy:0")
Note that the :0 suffix denotes an output of an operation: the tensor name logits:0 refers to the first output of the operation named logits, which is why get_tensor_by_name is used above. Now you have all the references you need and you can use sess.run to perform a specific computation, providing the input data and the OPs you'd like to have computed:
# Note: computing accuracy would also require feeding the labels placeholder.
sess.run([logits, accuracy], feed_dict={x: your_input_data_in_numpy_format})
The names of these elements will vary in your implementation; I've used the most common names. If they weren't given pretty names, it will be hard to identify them and you'll need to look through the original code that produced the graph. In fact, if they weren't named properly, looking them up by name is so painful that it's probably better to re-run the code that produced the original graph than to import the meta graph. Notice that saver.restore only restores the actual data; import_meta_graph is the optional piece, which can be replaced by simply re-building the graph programmatically.
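Putting it all together for the code in the question, a minimal end-to-end sketch; the tensor names "x:0" and "logits:0" are assumptions you'll need to replace with whatever your graph actually uses:
import tensorflow as tf
from sklearn.datasets import load_svmlight_file

with tf.Session() as sess:
    # Rebuild the graph structure from the meta file, then load the weights.
    saver = tf.train.import_meta_graph("/home/MachineLearning/model.ckpt.meta")
    saver.restore(sess, tf.train.latest_checkpoint('./'))

    graph = tf.get_default_graph()
    # "x:0" and "logits:0" are assumptions; inspect graph.get_operations()
    # to find the names your model actually uses.
    x = graph.get_tensor_by_name("x:0")
    logits = graph.get_tensor_by_name("logits:0")

    x_predict, y_predict = load_svmlight_file('/MachineLearning/to_predict.csv')
    predictions = sess.run(logits, feed_dict={x: x_predict.toarray()})
    print(predictions)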
Tensorflow describes writing file summaries to visualize graph execution.
I envision three stages:
training the data (with optimization)
measuring accuracy on the training set (no optimization)
measuring accuracy on the test set (no optimization!)
I'd like all stages in the same script, as in the evaluate function of the wide_and_deep tutorial, but with the low-level API. I'd like three different graphs for stats like loss or AUC, one for each stage.
Suppose I use one session, and in each stage I define an AUC summary op:
# define auc
auc, auc_op = tf.metrics.auc(labels, predictions)
# summary scalar to track it
tf.summary.scalar("auc", auc_op, family=family_name)
# merge all summaries for evaluation and later writing
summary_op = tf.summary.merge_all()
...
summary_writer.add_summary(summary, step_num)
There are three graphs, but the first graph has all three runs on it, and the second graph has the last two runs (see below). What's worse, each stage starts from the previous state. This makes sense, because all the variables from the previous stages are still around.
I could use a different session for each stage, but that would throw away the model as well.
What is the smooth way to handle this?
I'd like to just clear some of the summary variables. I've tried re-initializing some variables, looked at related questions, read about name scope and variable scope, tried not to re-use variables for AUC, read about variables and sharing, and looked into pruning nodes (though I don't understand it). I have not made it work yet.
I am using the low-level API. I saw something like this in the high-level API in _eval_metric_ops, but I don't understand how they 'clear' the different stages. With name_scope?
Do I have to save and load the model into a new session just for this, or is there some clean way to graph each summary separately?
The metric ops will be local variables, so you could run tf.local_variables_initializer() in your Session, which will reset all of your metrics. You could also look through the local variables collection for those with "auc" in the name if you wanted to be a bit more discerning. The high-level way to do this would be to use an Estimator, which will manage metrics for you.
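For example, a sketch of both variants (assuming an active Session named sess):
import tensorflow as tf

# Reset every metric accumulator between stages:
sess.run(tf.local_variables_initializer())

# Or reset only the AUC accumulators by filtering local variables by name:
auc_vars = [v for v in tf.local_variables() if 'auc' in v.name]
sess.run(tf.variables_initializer(auc_vars))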
I want to train a model. Every 1000 steps, I want to evaluate it on the test set and write it to the tensorboard log. However, there's a problem. I have a code like this:
image_b_train, label_b_train = tf.train.shuffle_batch(...)
out_train = model.inference(image_b_train)
accuracy_train = tf.reduce_mean(...)
image_b_test, label_b_test = tf.train.shuffle_batch(...)
out_test = model.inference(image_b_test)
accuracy_test = tf.reduce_mean(...)
where model.inference declares the variables in the model. The problem is that I have a separate queue for the test set, and I can't swap one queue for another with tensorflow.
Currently I solved the problem by creating 2 graphs, one for training and the other for testing. I copy from one graph to the other with tf.train.Saver. Another solution might be to use tf.get_variable, but this is a global variable, and I don't like it because my code becomes less reusable.
Yes, you need two graphs. These graphs can share variables. This can be done by:
Using Keras layers (from tf.contrib.keras) which let you define the model once and use it to compute two inference graphs
Using slim-style layers (from tf.layers) with tf.get_variable and reuse
Using tf.make_template to make your own model-like object which can be called once to build the training graph and once to build the inference graph (see the sketch after this list)
Using tf.estimator.Estimator which lets you define a model function once and runs it automatically for training and evaluation for you
There are other options, but any of these is well-supported and should unblock you.
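For instance, the tf.make_template option might look like this sketch (model_fn is a placeholder model; image_b_train and image_b_test are the batches from your question):
import tensorflow as tf

def model_fn(images):
    # Any variables created in here are shared across calls to the template.
    net = tf.layers.dense(images, 128, activation=tf.nn.relu)
    return tf.layers.dense(net, 10)

model = tf.make_template('model', model_fn)
out_train = model(image_b_train)  # first call creates the variables
out_test = model(image_b_test)    # second call reuses them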
I would like to achieve something like this using tensorflow.
I can only find documentation on saving and restoring variables (weights). However, like #2-2, I want to utilize the output of a hidden layer (tensor) as input of another model. Can this be done?
As far as I'm aware, it is not possible to chain different computation graphs after they have been created; however, you have a few options.
Option 1: Create one large graph and use a control flow op
output_layer, placeholder = build_my_model()
something = tf.where(output_layer < 0, do_something_1(), do_something_2())
where all function calls above should return tensorflow operations.
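One caveat: tf.where selects element-wise and computes both of its inputs. If the condition is really a single scalar decision and only one branch should run, tf.cond (which takes callables) may be a better fit; a sketch reusing the names above:
import tensorflow as tf

# Assumes the condition can be reduced to a single scalar boolean.
# Unlike tf.where, tf.cond runs only the branch that is taken.
pred = tf.reduce_all(output_layer < 0)
something = tf.cond(pred, do_something_1, do_something_2)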
Option 2: Create two graphs and perform the conditional statement inside Python
# Build the first graph
with tf.Graph().as_default() as graph:
    output_layer, placeholder = build_my_model()
# Build the other two graphs
with tf.Graph().as_default() as graph_1:
    something_1 = do_something_1()
with tf.Graph().as_default() as graph_2:
    something_2 = do_something_2()
As a result, you will also end up with three different sessions, and you will need to feed the output from the first session to one of the other two:
# Get the output of the first graph
_output_layer = sess.run(output_layer, {placeholder: ...})
if _output_layer < 0:
    something = sess1.run(something_1, {...})
else:
    something = sess2.run(something_2, {...})
As you can see, if you can get away with the control flow op, your code will be significantly simpler. Another advantage of having everything in one graph is that the entire graph is differentiable and you can train parameters of the first stage of your model conditional on the loss at a later stage.
I am trying to implement the CBOW word2vec model based on the skipgrams implementation on the tensorflow repository:
https://github.com/tensorflow/tensorflow/blob/v0.10.0/tensorflow/models/embedding/word2vec.py
I have previously implemented the simplified version following the TensorFlow tutorials, so I understand that I will have to modify the data batching function as well as a small part of the graph to get the context embedding.
In the skipgram implementation, the data batching function is used in lines 348-351.
(words, counts, words_per_epoch, self._epoch, self._words, examples,
 labels) = word2vec.skipgram(filename=opts.train_data,
                             batch_size=opts.batch_size,
                             window_size=opts.window_size,
                             min_count=opts.min_count,
                             subsample=opts.subsample)
From my understanding, the variables assigned are as follows:
words: terms in the vocabulary
counts: associated counts of terms used in the corpus
words_per_epoch: total word count in the corpus
self._epoch: current count of epochs used
self._words: current count of training examples used
examples: current batch of training examples
labels: current batch of training labels
I have managed to replicate the tensor for words, counts, words_per_epoch, examples and labels. However, self._epoch and self._words have eluded me. If my understanding is correct, I need to be able to track the count of the training examples used. However, this is not provided by the sample batching function. The counts are later used in a multi-threaded manner to terminate the training loop, hence I can't simply use a loop to add up the counts.
I understand that bits of the tensorflow ops are implemented in C++. However, as I am not familiar with C++, I will have to replicate those parts using Python.
It would be great if I could get some suggestions on how to obtain the tensor for self._words. The tensor basically has to increment every time a new batch of examples/labels is fetched. With that, I can simply use self._epoch = self._words // words_per_epoch to get the other tensor.
Figured out the trick while looking at the source code for tensorflow.models.embedding.word2vec_optimized.py. Specifically, how global_step was incremented when loss was called in lines 218-225.
In my case, I would have to do it as so:
# code to prepare the features and labels tensors
data_processed = tf.Variable(0, trainable=False, dtype=tf.int64)
epochs_processed = data_processed // data_per_epoch
inc_op = data_processed.assign_add(batch_size)
with tf.control_dependencies([inc_op]):
    features_batch, labels_batch = tf.train.batch([features, labels],
                                                  batch_size=batch_size)
In this case, the tensor data_processed will always be incremented by batch_size whenever features_batch or labels_batch is evaluated. epochs_processed will be incremented accordingly.
The use of tf.control_dependencies(control_inputs) is key here. It returns a context manager. The operations specified in control_inputs must be executed before the operations defined in the context.
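As a minimal self-contained illustration of that behaviour (a toy counter, not the word2vec code):
import tensorflow as tf

counter = tf.Variable(0, trainable=False)
inc_op = counter.assign_add(1)

with tf.control_dependencies([inc_op]):
    # This read is guaranteed to run after the increment above.
    value = tf.identity(counter)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(value))  # 1
    print(sess.run(value))  # 2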