in TensorFlow runtime, how tensors are copied? - tensorflow

I am reading the whole Tensorflow source code, and has been puzzled by one thing. In an Op, we can get the underlying data buffer of an input tensor and change its value, but this change will not reflected outside this op (the input is not a Ref type).
For example,
y = op1(x)
z = op2(x)
in op1, suppose we get the underlying buffer of x and change its value, but when I run y_val, z_val = sess.run([y, z]), it seems that this does not affect the value of z (it x really changes, z should change).
here since x tensor is consumed by two ops, I initially think maybe tensorflow split x into two tensors, one as the input of op1, and the other as input of op2. However, I checked the code, it seems not.
Another possibility is that tensor is copy-on-write, but it also seems not after I checked the code.
Anyone know what really happen here? Thanks a lot.

Related

TFP Linear Regression yhat=model(x_tst) - doesn't work for other data

I cannot see the difference between what I am doing and the working Google TFP example, whose structure I am following. What am I doing wrong/should I be doing differently?
[Setup: Win 10 Home 64-bit 20H2, Python 3.7, TF2.4.1, TFP 0.12.2, running in Jupyter Lab]
I have been building a model step by step following the example of TFP Probabilistic Layers Regression. The Case 1 code runs fine, but my parallel model doesn't and I cannot see the difference that might cause this
yhat = model(x_tst)
to fail with message Input 0 of layer sequential_14 is incompatible with the layer: : expected min_ndim=2, found ndim=1. Full shape received: (2019,) (which is the correct 1D size of x_tst)
For comparison: Google's load_dataset function for the TFP example returns y, x, x_tst, which are all np.ndarray of size 150, whereas I read data from a csv file with pandas.read_csv, split it into train_ and test_datasets and then take 1 col of data as independent variable 'g' and dependent variable 'redz' from the training dataset.
I know x, y, etc. need to be np.ndarray, but one does not create ndarray directly, so I have...
x = np.array(train_dataset['g'])
y = np.array(train_dataset['redz'])
x_tst = np.array(test_dataset['g'])
where x, y, x_tst are all 1-dimensional - just like the TFP example.
The model itself runs
model = tf.keras.Sequential([
tf.keras.layers.Dense(1),
tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t, scale=1)),
])
# Do inference.
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
model.fit(x, y, epochs=1, verbose=False);
(and when plotted gives the expected output for the google data - I don't get this far):
But, per the example when I try to "profit" by doing yhat = model(x_tst) I get the dimensions error given above.
What's wrong?
(If I try mode.predict I think I hit a known bug/gap in TFP; then it fails the assert)
Update - Explicit Reshape Resolves Issue
The hint from Frightera led to further investigation: x_tst had shape (2019,)
Reshaping by x_tst = x_tst.rehape(2019,1) resolved the issue. Is TF inconsistent in its requirements or is there some good reason that the explicit final dimension 1 was required? Who knows. At least predictions can be made now.
In this question Difference between numpy.array shape (R, 1) and (R,), the OP asked for the difference between (R,) and (R,1) but the answers given did not address this specific point.
Similarly in this question Difference between these array shapes in numpy
I believe the answer lies in the numpy glossary, where it says of (n,) that
A parenthesized number followed by a comma denotes a tuple with one
element. The trailing comma distinguishes a one-element tuple from a
parenthesized n.
Which, naturally, echoes the Python statements concerning tuples here
Thus an array of shape (R,) is a tuple describing an array as being 1D of a certain extent R, where the comma is appended to distinguish the tuple (R,) from the non-tuple (R).
However, for a 1D array, there is no sense of row or column ordering; (R,1) is R rows by 1 column, but (1, R) would be 1 row of R columns, and though it shouldn't matter to a 1D iterator either it does or the iterator doesn't correctly recognise ( ,) and thinks it is 2D. (i.e. I don't know the technical details of that part, but these seem to be the only options that account for the behaviour.)
This issue is unrelated to the indeterminacy of size that occurs in tensor definition in Tensorflow. In the context of Tensorflow, Tensors (arrays) may have indeterminate shapes, so that more data may be added along a certain axis as processing occurs, e.g. in batches, in which case the initial Tensor shape includes a leading None to indicate where array expansion is expected to occur. (See e.g. tensor's shape here)

Changing the Contents of a Tensor in TensorFlow

Before I continue, please excuse my ignorance. I have some experience programming before this, but my previous intuition has failed me presently.
Essentially, I need to expand a 1-D vector (size M x 1) of numbers ranging from 0...K, to a 2-D matrix (or Tensor, size M x K) where each row is a 1-D vector (size 1 x K), and each element is a 0 except for the index of the initial value being 1.
Yes, this is a multiclass classification problem for a ML class.
I had the idea of creating a zeros matrix of the correct shape, and then assigning the index of the element I need manually to a 1, but cannot seem to change the values of the already created Variable. I get the error:
TypeError: 'Tensor' object does not support item assignment
Can anyone assist with this? If you feel as though my way of going about creating this final Tensor could use a different approach, any advice would be appreciated.
In tensorflow, the function tf.one_hot() is what you seek. One hot encoding is the term describing the operation you are looking to implement. See https://www.tensorflow.org/api_docs/python/tf/one_hot .

TensorFlow shape checker

Unlike most programming languages, TensorFlow does not regard the shape of an array as part of the type. The downside of this is that, if you make a mistake and accidentally provide data of the wrong shape, it may silently give a wrong answer e.g. Slightly different shape converges to wrong number - why? which makes debugging difficult.
Does there exist a shape checker for TF? That is, a function or program that can analyze a graph (with sample feed_dict if need be) and raise the alarm if there is a shape mismatch?
Tensorflow does offer a shape checker mechanism which is technically the shape argument you should specify while declaring Tensorflow place holders. By default, tensorflow takes [None,None] for shape. But , for example if you do specify the shape while declaring your place holders, then it will raise shape error whenever user enters data of incorrect/conflicting shape. For example
lets say I declared a place holder named X and did specify its shape argument too:
X=tf.placeholder(dtype=tf.float32, shape=[None,256])
Now, this means that number of rows of X can vary but number of features will always be 256. And now , if I mistakenly feed data of shape lets say 1000 rows and 20 features, shape error will be raised.
Also, check this link :https://www.tensorflow.org/api_docs/python/tf/placeholder
Use:
print(str(tf.Shape(test_tensor))) # where test_tensor is
whatever your tensor's name is
Documentation available here: https://www.tensorflow.org/api_docs/python/tf/shape

taking the gradient in Tensorflow, tf.gradient

I am using this function of tensorflow to get my function jacobian. Came across two problems:
The tensorflow documentation is contradicted to itself in the following two paragraph if I am not mistaken:
gradients() adds ops to the graph to output the partial derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys.
Blockquote
Blockquote
Returns:
A list of sum(dy/dx) for each x in xs.
Blockquote
According to my test, it is, in fact, return a vector of len(ys) which is the sum(dy/dx) for each x in xs.
I do not understand why they designed it in a way that the return is the sum of the columns(or row, depending on how you define your Jacobian).
How can I really get the Jacobian?
4.In the loss, I need the partial derivative of my function with respect to input (x), but when I am optimizing with respect to the network weights, I define x as a placeholder whose value is fed later, and weights are variable, in this case, can I still define the symbolic derivative of function with respect to input (x)? and put it in the loss? ( which later when we optimize with respect to weights will bring second order derivative of the function.)
I think you are right and there is a typo there, it was probably meant to be "of length len(ys)".
For efficiency. I can't explain exactly the reasoning, but this seems to be a pretty fundamental characteristic of how TensorFlow handles automatic differentiation. See issue #675.
There is no straightforward way to get the Jacobian matrix in TensorFlow. Take a look at this answer and again issue #675. Basically, you need one call to tf.gradients per column/row.
Yes, of course. You can compute whatever gradients you want, there is no real difference between a placeholder and any other operation really. There are a few operations that do not have a gradient because it is not well defined or not implemented (in which case it will generally return 0), but that's all.

Force copy of tensor when enqueuing

first, I'm not sure if the title is very good, but it was the best I could come up with given my understanding of the situation.
The background is that I'm trying to understand how queues work in tensorflow and ran into the following issue which puzzled me.
I have a variable n, which I enqueue to a tf.FIFOQueue, and then I increment the variable. This is repeated several times, and one would expect a result similar to 0, 1, 2, ... However, when emptying the queue all values are the same.
More precisely, the code is as follows:
from __future__ import print_function
import tensorflow as tf
q = tf.FIFOQueue(10, tf.float32)
n = tf.Variable(0, trainable=False, dtype=tf.float32)
inc = n.assign(n+1)
enqueue = q.enqueue(n)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
sess.run(enqueue)
sess.run(inc)
sess.run(enqueue)
sess.run(inc)
sess.run(enqueue)
sess.run(inc)
print(sess.run(q.dequeue()))
print(sess.run(q.dequeue()))
print(sess.run(q.dequeue()))
Which I expect would print:
0.0
1.0
2.0
Instead I get the following result:
3.0
3.0
3.0
It seems like I'm pushing some pointer to n to the queue, instead of the actual value, which is what I want. However, I don't really have any actual understanding of tensorflow internals, so maybe something else is going on?
I tried changing
enqueue = q.enqueue(n)
to
enqueue = q.enqueue(tf.identity(n))
since answers to How can I copy a variable in tensorflow and In TensorFlow, what is tf.identity used for? gives me the impression that it might help, but it does not change the result. I also tried adding a tf.control_dependencies(), but again, all values are the same when dequeueing.
Edit: The output above is from running the code on a computer with a single CPU, when trying to see if there was some difference between different versions of tensorflow, I noticed if I run the code on a computer with CPU and GPU I get the "expected" result. Indeed, if I run with CUDA_VISIBLE_DEVICES="" I get the result above, and with CUDA_VISIBLE_DEVICES="0" I get the "expected" result.
To force a non-caching read you can do
q.enqueue(tf.add(q, 0))
This is what's currently done by the batch-normalization layer to force a copy.
Semantics of how variables get read vs. referenced are in the process of getting revamped so they are temporarily non-intuitive. In particular, I expected q.enqueue(v.read_value()) to force a non-caching read, but it doesn't fix your example on TF 0.12rc1
Using GPU machine puts variable on GPU, while Queue is CPU only, so enqueue op forces a GPU->CPU copy.
In case it helps, I've found that the other answers despite correct they do not work for all dtypes.
For example, this works fine with floats or ints but fails when n is a string tensor:
q.enqueue(tf.add(n, 0))
This one fails when the queue uses tuples with heterogeneous types (e.g., ints and floats):
q.enqueue_many([[n]])
So, if you see yourself caught in any of these situations try this instead:
q.enqueue(tf.add(n, tf.zeros_like(n)))
Or, to enqueue a tuple t:
q.enqueue([tf.add(n, tf.zeros_like(n)) for n in t])
That works even for string tensors and heterogeneous tuple types.
Hope it helps!
--
Update: it looks like tf.bool types do not work with tf.zeros_like(). For those, an explicit cast to an integer type might be needed.