Tensorflow error using while_loop: "List of Tensors when single Tensor expected" - tensorflow

I'm getting a TypeError("List of Tensors when single Tensor expected") when I run a Tensorflow while_loop. The error is from the third parameter, which should be a list of Tensors, according to the documentation. x, W, Win, Y, temp, and Wout are all previously declared as floats and arrays of floats. cond2 and test2 are functions I've written to be the condition and body. I use an almost identical call earlier in the program with no issues.
t=0
t,x,W,Win,Y,temp,Wout = sess.run(tf.while_loop(cond2, test2,
[t, tf.Variable(x), tf.constant(W),
tf.constant(Win), tf.Variable(Y),
tf.Variable(temp), tf.constant(Wout)],
shape_invariants=[tf.TensorShape(None),
tf.TensorShape(None),
tf.TensorShape(None),
tf.TensorShape(None),
tf.TensorShape(None),
tf.TensorShape(None),
tf.TensorShape(None)]))

I fixed the error by removing the tf.constant() for Wout, since Wout was already declared as a tensor.

This would be easier to diagnose with (a) your definitions for condition and body, and (b) the full error output from TensorFlow (it usually also outputs a full dump of the input tensors when issuing these errors.)
With that said, the source of the problem seems to be that TensorFlow is viewing your loop_vars list as a single Tensor, and/or your cond2 and test2 functions only accept a single argument each. If neither of these is true, then providing more detail would help answer the question (specifically the full error message and the definition for every value/tensor/function you're passing to tf.while_loop. I've found that the majority of while_loop errors can be fixed by paying attention to the tensors in the error output.
The while_loop can throw pretty confusing errors at times so I'd like to help; I'll check back and update/edit my answer if more info is provided.

Related

tf.reshape with the tensor size raises mismatched number of values

I have the following code:
shape = tf.shape(tensor, out_type=tf.int64, name='sparse_shape')
nelems = tf.size(tensor, out_type=tf.int64, name='num_elements')
indices = tf.transpose(
tf.unravel_index(tf.range(nelems, dtype=tf.int64), shape),
name='sparse_indices')
values = tf.reshape(tensor, [nelems], name='sparse_values')
This code snippet is simply transforming a dense tensor into a sparse tensor. However I found that the reshape op sometimes raises an error in runtime:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 906 values, but the requested shape has 1024
It's hard to write a simple demo to reproduce this bad case. So please understand that I cannot provide a reproducible demo.
But notice that my code is very simple. The reshape op is simply reshaping the tensor into a 1D tensor with the dimension size as the tensor's size, which is the number of elements of the tensor (illustrated in TensorFlow's doc). And in my mind, the number of elements here simply means the number of of values in the error message. Thus the above error should never appear.
I tried to use production of the shape as the target dimension size instead of tf.size but it was no use:
shape = tf.shape(tensor, out_type=tf.int64, name='sparse_shape')
# use production as the number of elements
nelems = tf.reduce_prod(shape, name='num_elements')
....
values = tf.reshape(tensor, [nelems], name='sparse_values')
So my question is, why is there a possibility that, for a certain tensor tensor, tf.size(tensor) or tf.shape(tensor) does not tell the actual number of elements of tensor? Can anyone remind if I have missed anything? Thanks.
I have figured out the problem on myself.
Problem:
In my project, the problem is that, tensor is produced by a third-party library. The library called tensor.set_shape([1024]) before returning tensor. While it can't ensure that there must be 1024 elements in tensor.
According to these codes, in TensorFlow's python frontend implementation, when the shape is fully determined, tf.shape and tf.size can go a fast way to get the result without really running the ShapeOp or SizeOp, and returning a constant tensor of the determined shape dimensions as the result.
As a result, in my case, the shape is obviously fully determined as [1024], so the code goes in the fast way and returned tf.constant([1024]). However the real shape of the Tensor object in the backend is [906].
Solution
According to the previously mentioned codes, we can see that tf.shape and tf.size actually calls shape_internal and size_internal defined in tensorflow.python.ops.array_ops. The latter functions takes one more argument optimize with default value True. And if optimize is false, the fast way will be ignored.
So the solution is to replace the tf.shape or tf.size with shape_internal or size_internal, and pass optimize=False.
# internal functions are not exposed by `tensorflow` root package
# so we have to import the `array_ops` package manualy
from tensorflow.python.ops import array_ops
....
shape = tf.shape(tensor, out_type=tf.int64, name='sparse_shape')
#nelems = tf.size(tensor, out_type=tf.int64, name='num_elements')
nelems = array_ops.size_internal(tensor, optimize=False, out_type=tf.int64, name='num_elements')
....
values = tf.reshape(tensor, [nelems], name='sparse_values')

TFP Linear Regression yhat=model(x_tst) - doesn't work for other data

I cannot see the difference between what I am doing and the working Google TFP example, whose structure I am following. What am I doing wrong/should I be doing differently?
[Setup: Win 10 Home 64-bit 20H2, Python 3.7, TF2.4.1, TFP 0.12.2, running in Jupyter Lab]
I have been building a model step by step following the example of TFP Probabilistic Layers Regression. The Case 1 code runs fine, but my parallel model doesn't and I cannot see the difference that might cause this
yhat = model(x_tst)
to fail with message Input 0 of layer sequential_14 is incompatible with the layer: : expected min_ndim=2, found ndim=1. Full shape received: (2019,) (which is the correct 1D size of x_tst)
For comparison: Google's load_dataset function for the TFP example returns y, x, x_tst, which are all np.ndarray of size 150, whereas I read data from a csv file with pandas.read_csv, split it into train_ and test_datasets and then take 1 col of data as independent variable 'g' and dependent variable 'redz' from the training dataset.
I know x, y, etc. need to be np.ndarray, but one does not create ndarray directly, so I have...
x = np.array(train_dataset['g'])
y = np.array(train_dataset['redz'])
x_tst = np.array(test_dataset['g'])
where x, y, x_tst are all 1-dimensional - just like the TFP example.
The model itself runs
model = tf.keras.Sequential([
tf.keras.layers.Dense(1),
tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t, scale=1)),
])
# Do inference.
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
model.fit(x, y, epochs=1, verbose=False);
(and when plotted gives the expected output for the google data - I don't get this far):
But, per the example when I try to "profit" by doing yhat = model(x_tst) I get the dimensions error given above.
What's wrong?
(If I try mode.predict I think I hit a known bug/gap in TFP; then it fails the assert)
Update - Explicit Reshape Resolves Issue
The hint from Frightera led to further investigation: x_tst had shape (2019,)
Reshaping by x_tst = x_tst.rehape(2019,1) resolved the issue. Is TF inconsistent in its requirements or is there some good reason that the explicit final dimension 1 was required? Who knows. At least predictions can be made now.
In this question Difference between numpy.array shape (R, 1) and (R,), the OP asked for the difference between (R,) and (R,1) but the answers given did not address this specific point.
Similarly in this question Difference between these array shapes in numpy
I believe the answer lies in the numpy glossary, where it says of (n,) that
A parenthesized number followed by a comma denotes a tuple with one
element. The trailing comma distinguishes a one-element tuple from a
parenthesized n.
Which, naturally, echoes the Python statements concerning tuples here
Thus an array of shape (R,) is a tuple describing an array as being 1D of a certain extent R, where the comma is appended to distinguish the tuple (R,) from the non-tuple (R).
However, for a 1D array, there is no sense of row or column ordering; (R,1) is R rows by 1 column, but (1, R) would be 1 row of R columns, and though it shouldn't matter to a 1D iterator either it does or the iterator doesn't correctly recognise ( ,) and thinks it is 2D. (i.e. I don't know the technical details of that part, but these seem to be the only options that account for the behaviour.)
This issue is unrelated to the indeterminacy of size that occurs in tensor definition in Tensorflow. In the context of Tensorflow, Tensors (arrays) may have indeterminate shapes, so that more data may be added along a certain axis as processing occurs, e.g. in batches, in which case the initial Tensor shape includes a leading None to indicate where array expansion is expected to occur. (See e.g. tensor's shape here)

How to implement tf.nn.sigmoid_cross_entropy_with_logits

I am currently learning tensorflow, and I have run into an issue with
tf.nn.sigmoid_cross_entropy_with_logits(labels=y,logits=logits). The function description says that both labels and logits must be of the same type. I have the function below that I am using to classify MNIST images. The following are key section of my code
X=tf.placeholder(tf.float32,shape=(None,n_inputs),name="X")
y=tf.placeholder(tf.int32,shape=(None),name="y")
def neuron_layer(X,W,b,n_neurons,name,activation=None):
with tf.name_scope(name):
n_inputs=int(X.get_shape()[1])
stddev=2/np.sqrt(n_inputs)
z=tf.matmul(X,W)+b
if activation=="sigmoid":
return tf.math.sigmoid(z)
else:
return z
with tf.name_scope("dnn"):
hidden1=neuron_layer(X,W1,b1,n_hidden1,"hidden",activation="sigmoid")
logits=neuron_layer(hidden1,W2,b2,n_outputs,"outputs",activation="sigmoid")
with tf.name_scope("loss"):
xentropy=tf.nn.sigmoid_cross_entropy_with_logits(labels=y,logits=logits)
loss=tf.reduce_mean(xentropy,name="loss")
I get the error: input 'y' of 'Mul' Op has type int32 that does not match type float32 of argument 'x
if I change
y=tf.placeholder(tf.float32,shape=(None),name="y"). I get the error
Value passed to parameter 'targets' has DataType float32 not in list of allowed values: int32, int64. Yet logits can only be float32 or float64. Please help me fix the issue. Thanks
As mentioned in the comments, tf.nn.sigmoid_cross_entropy_with_logits is the wrong function. In your case you should use tf.nn.softmax_cross_entropy_with_logits instead (actually, that one yields a deprecation warning, so tf.nn.softmax_cross_entropy_with_logits_v2 is the correct one). Also note, as also mentioned in the comments, that the point of these two functions is that they have a sigmoid (or softmax, respectively) built in, so your model shouldn't have any activation function on the last layer.
Regarding the issue: I just tried it with tensorflow version 1.14.0. There, the issue still occurs if y has type int32. However, it works smoothly if both, y and labels, have type float32.
It's kind of inconsistent that tf.nn.sigmoid_cross_entropy_with_logits does not perform this cast itself, while tf.nn.softmax_cross_entropy_with_logits has no issue with y being int32.

What exactly qualifies as a 'Tensor' in TensorFlow?

I am new to TensorFlow and just went through the eager execution tutorial and came across the tf.decode_csv function. Not knowing about it, I read the documentation. https://www.tensorflow.org/api_docs/python/tf/decode_csv
I don't really understand it.
The documentation says 'records: A Tensor of type string.'
So, my question is: What qualifies as a 'Tensor'?
I tried the following code:
dec_res = tf.decode_csv('0.1,0.2,0.3', [[0.0], [0.0], [0.0]])
print(dec_res, type(dec_res))
l = [[1,2,3],[4,5,6],[7,8,9]]
r = tf.reshape(l, [9,-1])
print(l, type(l))
print(r, type(r))
So the list dec_res contains tf.tensor objects. That seems reasonable to me. But is an ordinary string also a 'Tensor' according to the documentation?
Then I tried something else with the tf.reshape function. In the documentation https://www.tensorflow.org/api_docs/python/tf/reshape it says that 'tensor: A Tensor.' So, l is supposed to be a tensor. But it is not of type tf.tensor but simply a python list. This is confusing.
Then the documentation says
Returns:
A Tensor. Has the same type as tensor.
But the type of l is list where the type of r is tensorflow.python.framework.ops.Tensor. So the types are not the same.
Then I thought that TensorFlow is very generous with things being a tensor. So I tried:
class car(object):
def __init__(self, color):
self.color = color
red_car = car('red')
#test_reshape = tf.reshape(red_car, [1, -1])
print(red_car.color) # to check, that red_car exists.
Now, the line in comments results in an error.
So, can anyone help me to find out, what qualifies as a 'Tensor'?
P.S.: I tried to read the source code of tf.reshape as given in the documentation
Defined in tensorflow/python/ops/gen_array_ops.py.
But this file does not exist in the Github repo. Does anyone know how to read it?
https://www.tensorflow.org/programmers_guide/tensors
TensorFlow, as the name indicates, is a framework to define and run
computations involving tensors. A tensor is a generalization of
vectors and matrices to potentially higher dimensions. Internally,
TensorFlow represents tensors as n-dimensional arrays of base
datatypes.
What you are observing commes from the fact that tensorflow operations (like reshape) can be built from various python types using the function tf.convert_to_tensor:
https://www.tensorflow.org/api_docs/python/tf/convert_to_tensor
All standard Python op constructors apply this function to each of
their Tensor-valued inputs, which allows those ops to accept numpy
arrays, Python lists, and scalars in addition to Tensor objects

Clarification on tf.Tensor.set_shape()

I have an image that is 478 x 717 x 3 = 1028178 pixels, with a rank of 1. I verified it by calling tf.shape and tf.rank.
When I call image.set_shape([478, 717, 3]), it throws the following error.
"Shapes %s and %s must have the same rank" % (self, other))
ValueError: Shapes (?,) and (478, 717, 3) must have the same rank
I tested again by first casting to 1028178, but the error still exists.
ValueError: Shapes (1028178,) and (478, 717, 3) must have the same rank
Well, that does make sense because one is of rank 1 and the other is of rank 3. However, why is it necessary to throw an error, as the total number of pixels still match.
I could of course use tf.reshape and it works, but I think that's not optimal.
As stated on the TensorFlow FAQ
What is the difference between x.set_shape() and x = tf.reshape(x)?
The tf.Tensor.set_shape() method updates the static shape of a Tensor
object, and it is typically used to provide additional shape
information when this cannot be inferred directly. It does not change
the dynamic shape of the tensor.
The tf.reshape() operation creates a new tensor with a different dynamic shape.
Creating a new tensor involves memory allocation and that could potentially be more costly when more training examples are involved. Is this by design, or am I missing something here?
As far as I know (and I wrote that code), there isn't a bug in Tensor.set_shape(). I think the misunderstanding stems from the confusing name of that method.
To elaborate on the FAQ entry you quoted, Tensor.set_shape() is a pure-Python function that improves the shape information for a given tf.Tensor object. By "improves", I mean "makes more specific".
Therefore, when you have a Tensor object t with shape (?,), that is a one-dimensional tensor of unknown length. You can call t.set_shape((1028178,)), and then t will have shape (1028178,) when you call t.get_shape(). This doesn't affect the underlying storage, or indeed anything on the backend: it merely means that subsequent shape inference using t can rely on the assertion that it is a vector of length 1028178.
If t has shape (?,), a call to t.set_shape((478, 717, 3)) will fail, because TensorFlow already knows that t is a vector, so it cannot have shape (478, 717, 3). If you want to make a new Tensor with that shape from the contents of t, you can use reshaped_t = tf.reshape(t, (478, 717, 3)). This creates a new tf.Tensor object in Python; the actual implementation of tf.reshape() does this using a shallow copy of the tensor buffer, so it is inexpensive in practice.
One analogy is that Tensor.set_shape() is like a run-time cast in an object-oriented language like Java. For example, if you have a pointer to an Object but know that, in fact, it is a String, you might do the cast (String) obj in order to pass obj to a method that expects a String argument. However, if you have a String s and try to cast it to a java.util.Vector, the compiler will give you an error, because these two types are unrelated.