What is tensorflow.matmul?

From the output of print, it is a function. But according to the official document:
An Operation is a node in a TensorFlow Graph that takes zero or more Tensor objects as input, and produces zero or more Tensor objects as output. Objects of type Operation are created by calling a Python op constructor (such as tf.matmul) or tf.Graph.create_op.
it is a constructor, so I think it is a class name. But printing the return value of tf.matmul shows it is a tensor, not an "object of type Operation". Does the class Tensor inherit from the class Operation? I tried to find the definition of tf.matmul in the TensorFlow source code but could not locate it.

tf.matmul (or tf.linalg.matmul) is a function. You can find its definition in the math_ops module. Its behavior depends on whether you are using eager execution (the default in 2.x) or graph mode (the default in 1.x).
With eager execution, the function receives a pair of eager tensors (tensors that hold their actual values, as opposed to "symbolic" ones) and computes their matrix product right away. What you get back is another eager tensor containing the result.
In graph mode, the function does not run any computation. It receives two symbolic tensors (whose values will not be determined until later), adds a matrix product operation to the current graph, and gives you the symbolic tensor for its result.

Tensors do not inherit from operations in either case. The graph contains nodes, which are operations; operations generally have inputs and/or outputs, which are tensors. In graph mode, functions like tf.linalg.matmul usually give you the resulting tensor rather than the operation, because that is more convenient (you rarely need to access the operation itself).

When you give a name to one of these functions (e.g. name='MyMatMul'), it becomes the name of the operation, and each output tensor of the operation (in most cases there is only one) gets that name plus : and its output index (e.g. MyMatMul:0). When you have a tensor t, you can access the operation that produced it with t.op. When you have an operation op, you can access its input and output tensors with op.inputs and op.outputs, and its type (the kind of operation it represents, like MatMul) with op.type. These properties cannot be accessed with eager execution, as they only make sense when you have a graph.
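A short sketch of both behaviors (the matrix values are arbitrary examples; 'MyMatMul' is the name used above):

import tensorflow as tf

# Eager mode (the default in TF 2.x): the product is computed immediately.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0], [6.0]])
print(tf.matmul(a, b))  # an eager tf.Tensor holding [[17.], [39.]]

# Graph mode: tf.matmul only records a MatMul node in the graph and
# returns its symbolic output tensor.
g = tf.Graph()
with g.as_default():
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[5.0], [6.0]])
    t = tf.matmul(a, b, name='MyMatMul')
    print(t.name)        # 'MyMatMul:0' -- the op name plus ':' plus output index
    print(t.op.type)     # 'MatMul'
    print(t.op.outputs)  # [<tf.Tensor 'MyMatMul:0' ...>]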

Related

Why are variables and constants operations in TensorFlow?

Intuitively, I expected an operation to be something that takes an input and modifies it (add, subtract, divide, square root...). In fact, that's the definition of operation I found on the Internet. Why, then, are variables and constants also operations in TensorFlow?
TensorFlow generalizes your definition of operation as something that takes zero or more inputs and produces zero or more outputs. Concretely, a TensorFlow Operation is defined as:
An Operation is a node in a TensorFlow Graph that takes zero or more Tensor objects as input, and produces zero or more Tensor objects as output.
Therefore:
A constant is an operation without inputs that produces a single Tensor as output.
A variable is a special (stateful) operation that takes one Tensor (initial value) as input and produces another Tensor as output.
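A minimal sketch in graph mode makes the constant case concrete (the name 'my_const' is arbitrary):

import tensorflow as tf

g = tf.Graph()
with g.as_default():
    c = tf.constant(42.0, name='my_const')
    print(c.op.type)          # 'Const'
    print(list(c.op.inputs))  # []: a constant op takes no inputs
    print(c.op.outputs)       # [<tf.Tensor 'my_const:0' ...>]: one output tensor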

Tensorflow: difference get_tensor_by_name vs get_operation_by_name?

The answer here says that one returns an operation while the other returns a tensor. That is pretty obvious from the name and from the documentation. However, suppose I do the following:
logits = tf.add(tf.matmul(inputs, weights), biases, name='logits')
I am following the pattern described in TensorFlow Mechanics 101. Should I restore it as an operation or as a tensor? I am afraid that if I restore it as a tensor I will only get the last computed values for the logits; nonetheless, the post here seems to suggest that there is no difference, or that I should just use get_tensor_by_name. The idea is to compute the logits for a new set of inputs and then make predictions accordingly.
Short answer: you can use both, get_operation_by_name() and get_tensor_by_name(). Long answer:
tf.Operation
When you call
op = graph.get_operation_by_name('logits')
... it returns an instance of type tf.Operation, which is a node in the computational graph that performs some computation on its inputs and produces zero or more outputs. In this case, it's the Add op named logits.
One can always evaluate an op in a session, and if this op needs some placeholder values to be fed in, the engine will force you to provide them. Some ops, e.g. reading a variable, don't have any dependencies and can be executed without placeholders.
In your case, (I assume) the logits are computed from an input placeholder x, so logits doesn't have any value without a particular x.
tf.Tensor
On the other hand, calling
tensor = graph.get_tensor_by_name('logits:0')
... returns an object tensor, which has the type tf.Tensor:
Represents one of the outputs of an Operation.
A Tensor is a symbolic handle to one of the outputs of an Operation. It does not hold the values of that operation's output, but instead provides a means of computing those values in a TensorFlow tf.Session.
So, in other words, tensor evaluation is the same as operation execution, and all the restrictions described above apply as well.
Why is a Tensor useful? A Tensor can be passed as an input to another Operation, thus forming the graph. But in your case, you can treat the two entities as meaning the same thing.
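A sketch of both retrieval paths, assuming a TF1-style graph (the placeholder and variable shapes below are made up for illustration):

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

inputs = tf.placeholder(tf.float32, shape=[None, 3], name='inputs')
weights = tf.Variable(tf.ones([3, 2]))
biases = tf.Variable(tf.zeros([2]))
logits = tf.add(tf.matmul(inputs, weights), biases, name='logits')

graph = tf.get_default_graph()
op = graph.get_operation_by_name('logits')     # the Add operation itself
tensor = graph.get_tensor_by_name('logits:0')  # its (only) output tensor

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed = {inputs: [[1.0, 2.0, 3.0]]}
    print(sess.run(tensor, feed))  # the freshly computed logits values
    sess.run(op, feed)             # also runs the op, but fetches no value (returns None)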

Tensorflow, node which is dead

I'm studying how TensorFlow works, but there is a lot that is hard to understand.
One term I don't fully understand is node. In the source code (the C++ core), there are many different types of node, but I'm curious about dead nodes. They seem to be different from constant nodes. I want to know why dead nodes exist; in other words, what is the role of a dead node?
What is live data vs. dead data? Is it like data that has not been used vs. data that has already been used? I still don't fully understand this term. It can be seen in the function ActivateNodes() (executor.cc).
I think these questions may be quite basic for studying TensorFlow, but I want to understand them exactly.
Thanks
First of all, dead tensors are an implementation detail of TensorFlow's control flow constructs: tf.cond() and tf.while_loop(). These constructs enable TensorFlow to determine whether or not to execute a subgraph based on a data-dependent value.
Let's consider the simpler tf.cond(pred, true_fn, false_fn) case. The value of pred determines whether the ops in true_fn or false_fn will be executed. In the current implementation, pred feeds into a Switch op, which sends a regular tensor on one output and a dead tensor on the other output. If pred is true, the dead tensor is sent along output_false (and vice versa). The tf.cond() implementation is set up so that the ops in true_fn depend on output_true and the ops in false_fn depend on output_false.
When an op receives a dead tensor as one of its inputs, it doesn't execute; instead it sends a dead tensor on all of its outputs. This dead-tensor propagation ensures that only the ops in the taken branch will execute.
How does tf.cond() stop a dead tensor from propagating all the way to the output? A second special op, called a Merge op, handles dead inputs differently. A Merge op has two or more inputs, and it expects all but one of them to be dead; it then forwards the non-dead input to its output. tf.cond() uses Merge ops to combine the results from true_fn and false_fn, so the results of the taken branch are returned as the output of the overall tf.cond() subgraph.
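A minimal tf.cond() sketch (TF1-style graph mode) in which only the taken branch's ops run; the values are arbitrary:

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

pred = tf.placeholder(tf.bool)
x = tf.constant(10)

# Internally, pred feeds a Switch op; the untaken branch receives a dead
# tensor and propagates it, and a Merge op forwards the one live result.
result = tf.cond(pred, lambda: x + 1, lambda: x - 1)

with tf.Session() as sess:
    print(sess.run(result, feed_dict={pred: True}))   # 11 (only true_fn ran)
    print(sess.run(result, feed_dict={pred: False}))  # 9  (only false_fn ran)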

What caching model does TensorFlow use?

I read the question here
TensorFlow - get current value of a Variable
and the answer has left me confused.
On one hand, dga says "And to be very clear: Running the variable will produce only the current value of the variable; it will not run any assign operations associated with it. It's cheap."
On the other hand, Salvador Dali says "@dga yes, if the variable depends on n other variables, they also need to be evaluated."
So, which is it? Does evaluating the variable only return its current value, or does it recompute its value from scratch from the variables it depends on?
What happens if I evaluate the same variable twice in a row? Does TensorFlow have any notion of "stale" variables, i.e. variables that need to be recomputed because their dependencies changed (as in a build system)?
I ask because I work with multiple nets where the partial output of one net becomes the partial input of another net. I want to fetch the gradients computed at the input layer of one net and merge and apply them to the output layer of another net. I was hoping to do this by manually retrieving and storing gradients in the variables of a graph, and then running graph operations to backpropagate the gradients. Thus I need to understand how it all works under the hood.
What I do is similar to this: How to use Tensorflow Optimizer without recomputing activations in reinforcement learning program that returns control after each iteration? However, I can't conclude whether it's possible based on the last answer there (is the experimental support in now?).
Thanks!
@dga is correct. If you pass a tf.Variable object to tf.Session.run(), TensorFlow will return the current value of the variable, and it will not perform any computation. It is cheap (the cost of a memory copy, or possibly a network transfer in the case of a distributed TensorFlow setup). TensorFlow does not retain any history* about how the value of a tf.Variable was updated, so it cannot in general recompute its value from scratch.
(* Technically, TensorFlow remembers the tf.Tensor that was used to initialize each variable, so it is possible to recompute the initial value of the variable.)
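A sketch of this behavior (TF1-style graph mode; the variable and increment are arbitrary examples):

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

v = tf.Variable(0)
inc = tf.assign_add(v, 1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(inc)       # the assign op runs only when explicitly fetched
    print(sess.run(v))  # 1: just reads the current value (a memory copy)
    print(sess.run(v))  # 1 again: no recomputation, no notion of staleness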

At what stage is a tensorflow graph set up?

An optimizer typically runs the same computation graph for many steps until convergence. Does TensorFlow set up the graph at the beginning and reuse it for every step? What if I change the batch size during training? What if I make some minor change to the graph, like changing the loss function? What if I make some major change to the graph? Does TensorFlow pre-generate all possible graphs? Does TensorFlow know how to optimize the entire computation when the graph changes?
As keveman says, from the client's perspective there is a single TensorFlow graph. In the runtime, there can be multiple pruned subgraphs that contain just the nodes that are necessary to compute the values t1, t2 etc. that you fetch when calling sess.run([t1, t2, ...]).
Calling sess.run([t1, t2]) prunes the overall graph (sess.graph) down to the subgraph required to compute those values: i.e. the operations that produce t1 and t2 and all of their antecedents. If you subsequently call sess.run([t3, t4]), the runtime prunes the graph down to the subgraph required to compute t3 and t4. Each time you pass a new combination of values to fetch, TensorFlow computes a new pruned graph and caches it; this is why the first sess.run() with a given set of fetches can be somewhat slower than subsequent calls.
If the pruned graphs overlap, TensorFlow will reuse the "kernel" for the ops that are shared. This is relevant because some ops (e.g. tf.Variable and tf.FIFOQueue) are stateful, and their contents can be used in both pruned graphs. This allows you, for example, to initialize your variables with one subgraph (e.g. sess.run(tf.initialize_all_variables())), train them with another (e.g. sess.run(train_op)), and evaluate your model with a third (e.g. sess.run(loss, feed_dict={x: ...})). It also lets you enqueue elements to a queue with one subgraph, and dequeue them with another, which is the foundation of the input pipelines.
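A sketch of that pattern (TF1-style; the tiny model and learning rate are made up for illustration). Each sess.run() call below exercises a different pruned subgraph, all sharing the same variable state:

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

x = tf.placeholder(tf.float32, shape=[None, 1])
w = tf.Variable([[2.0]])
y = tf.matmul(x, w)
loss = tf.reduce_mean(tf.square(y - 1.0))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())    # subgraph 1: initialization
    sess.run(train_op, feed_dict={x: [[1.0]]})     # subgraph 2: training step
    print(sess.run(loss, feed_dict={x: [[1.0]]}))  # subgraph 3: evaluation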
TensorFlow exposes only one graph that is visible to the user, namely the one specified by the user. The user can run the graph with Session.run() or by calling Tensor.eval() on some tensor. A Session.run() call can specify some tensors to be fed and others to be fetched. Depending on what needs to be fetched, the TensorFlow runtime may internally construct and optimize various data structures, including a pruned version of the user-visible graph. However, this internal graph is not visible to the user in any way. No, TensorFlow doesn't pre-generate all possible graphs. Yes, TensorFlow does perform extensive optimizations on the computation graph. And finally, changing the batch size of a tensor that is fed doesn't change the structure of the graph.
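Illustrating the batch-size point: a placeholder with a None batch dimension accepts any batch size without changing the graph (a small sketch, TF1-style):

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

x = tf.placeholder(tf.float32, shape=[None, 2])  # None: any batch size
y = tf.reduce_sum(x, axis=1)

with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: [[1.0, 2.0]]}))              # batch of 1
    print(sess.run(y, feed_dict={x: [[1.0, 2.0], [3.0, 4.0]]}))  # batch of 2, same graph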