tf.reshape with the tensor size raises mismatched number of values - tensorflow

I have the following code:
shape = tf.shape(tensor, out_type=tf.int64, name='sparse_shape')
nelems = tf.size(tensor, out_type=tf.int64, name='num_elements')
indices = tf.transpose(
    tf.unravel_index(tf.range(nelems, dtype=tf.int64), shape),
    name='sparse_indices')
values = tf.reshape(tensor, [nelems], name='sparse_values')
This code snippet simply transforms a dense tensor into a sparse tensor. However, I found that the reshape op sometimes raises an error at runtime:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 906 values, but the requested shape has 1024
It's hard to write a simple demo that reproduces this case, so please understand that I cannot provide a reproducible example.
But notice that my code is very simple. The reshape op merely reshapes the tensor into a 1-D tensor whose dimension is the tensor's size, i.e. the number of elements of the tensor (as described in TensorFlow's documentation). In my mind, the number of elements here means exactly the number of values in the error message, so the above error should never appear.
I tried to use the product of the shape as the target dimension size instead of tf.size, but it was no use:
shape = tf.shape(tensor, out_type=tf.int64, name='sparse_shape')
# use the product of the shape dimensions as the number of elements
nelems = tf.reduce_prod(shape, name='num_elements')
....
values = tf.reshape(tensor, [nelems], name='sparse_values')
So my question is: why is it possible that, for a certain tensor tensor, tf.size(tensor) or tf.shape(tensor) does not report the actual number of elements of tensor? Can anyone point out what I have missed? Thanks.

I have figured out the problem myself.
Problem:
In my project, tensor is produced by a third-party library. The library calls tensor.set_shape([1024]) before returning tensor, even though it cannot guarantee that tensor actually holds 1024 elements.
According to the TensorFlow source code, in the Python frontend, when the static shape is fully determined, tf.shape and tf.size take a fast path: instead of actually running the Shape or Size op, they return a constant tensor built from the statically determined shape.
As a result, in my case the static shape is fully determined as [1024], so the code takes the fast path and returns tf.constant([1024]). However, the real shape of the Tensor object in the backend is [906].
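A hypothetical minimal reproduction of this situation (my own sketch, TF1-style graph mode; the placeholder n stands in for whatever makes the real length dynamic in the third-party library):
n = tf.placeholder(tf.int32, shape=[])   # the real length, only known at runtime
tensor = tf.zeros([n])                   # static shape (?,)
tensor.set_shape([1024])                 # the library's (wrong) claim

nelems = tf.size(tensor, out_type=tf.int64)  # fast path: constant 1024
values = tf.reshape(tensor, [nelems])

with tf.Session() as sess:
    sess.run(values, feed_dict={n: 906})
    # InvalidArgumentError: Input to reshape is a tensor with 906 values,
    # but the requested shape has 1024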
Solution
Looking at the previously mentioned source, we can see that tf.shape and tf.size actually call shape_internal and size_internal, defined in tensorflow.python.ops.array_ops. These internal functions take one extra argument, optimize, with default value True; if optimize is False, the fast path is skipped.
So the solution is to replace tf.shape or tf.size with shape_internal or size_internal and pass optimize=False.
# internal functions are not exposed by the `tensorflow` root package,
# so we have to import the `array_ops` module manually
from tensorflow.python.ops import array_ops
....
shape = tf.shape(tensor, out_type=tf.int64, name='sparse_shape')
#nelems = tf.size(tensor, out_type=tf.int64, name='num_elements')
nelems = array_ops.size_internal(tensor, optimize=False, out_type=tf.int64, name='num_elements')
....
values = tf.reshape(tensor, [nelems], name='sparse_values')

Related

How to return a Tensor type or an IndexedSlices type via tf.cond()?

I want to use the original sparse tensor (tf.IndexedSlices type) when pct < 0.75, and otherwise use a dense tensor (tf.Tensor type, created by tf.convert_to_tensor). Here is the code:
def fn1():
    return tf.convert_to_tensor(sparse_gradient)

def fn2():
    return sparse_gradient

final_gradient = tf.cond(tf.less(pct, tf.constant(value=0.75, dtype=tf.float64)), fn1, fn2)
However, tf.cond needs fn1() and fn2() to have the same return type, so this code throws an error:
ValueError: The two structures don't have the same nested structure.
How can I fix this? The control flow is part of the computation graph, so I have to use tf.cond. Is there any other way to work it out?
I found that it is impossible in static graph mode (eager mode may not have this problem), because the types are fixed when the graph is built, so we cannot choose a different type based on a runtime tensor value.
We can also see this in the merge function, which is a base op of TensorFlow's control flow:
def merge(inputs, name=None):
    """
    ...
    This op handles both `Tensor`s and `IndexedSlices`. If inputs has a mix of
    `Tensor`s and `IndexedSlices`, all inputs are converted to IndexedSlices
    before merging.
    ...
    """

TFP Linear Regression yhat=model(x_tst) - doesn't work for other data

I cannot see the difference between what I am doing and the working Google TFP example, whose structure I am following. What am I doing wrong/should I be doing differently?
[Setup: Win 10 Home 64-bit 20H2, Python 3.7, TF2.4.1, TFP 0.12.2, running in Jupyter Lab]
I have been building a model step by step following the example of TFP Probabilistic Layers Regression. The Case 1 code runs fine, but my parallel model doesn't and I cannot see the difference that might cause this
yhat = model(x_tst)
to fail with message Input 0 of layer sequential_14 is incompatible with the layer: : expected min_ndim=2, found ndim=1. Full shape received: (2019,) (which is the correct 1D size of x_tst)
For comparison: Google's load_dataset function for the TFP example returns y, x, x_tst, which are all np.ndarray of size 150, whereas I read data from a csv file with pandas.read_csv, split it into train_ and test_datasets, and then take one column as the independent variable 'g' and one as the dependent variable 'redz' from the training dataset.
I know x, y, etc. need to be np.ndarray, but one does not create ndarray directly, so I have...
x = np.array(train_dataset['g'])
y = np.array(train_dataset['redz'])
x_tst = np.array(test_dataset['g'])
where x, y, x_tst are all 1-dimensional - just like the TFP example.
The model itself runs
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1),
    tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t, scale=1)),
])

# Do inference.
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
model.fit(x, y, epochs=1, verbose=False)
(and when plotted gives the expected output for the Google data; with my own data I don't get this far).
But, per the example when I try to "profit" by doing yhat = model(x_tst) I get the dimensions error given above.
What's wrong?
(If I try model.predict I think I hit a known bug/gap in TFP; it then fails the assert.)
Update - Explicit Reshape Resolves Issue
The hint from Frightera led to further investigation: x_tst had shape (2019,)
Reshaping by x_tst = x_tst.reshape(2019, 1) resolved the issue. Is TF inconsistent in its requirements, or is there some good reason that the explicit final dimension of 1 was required? Who knows. At least predictions can be made now.
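For reference, a sketch of the same fix that avoids hard-coding the length (reusing x_tst and model from above):
# Dense layers expect inputs of rank >= 2: (batch, features). Using -1
# lets numpy infer the batch size, with an explicit feature axis of 1.
x_tst = x_tst.reshape(-1, 1)   # (2019,) -> (2019, 1)
yhat = model(x_tst)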
In this question Difference between numpy.array shape (R, 1) and (R,), the OP asked for the difference between (R,) and (R,1) but the answers given did not address this specific point.
Similarly in this question Difference between these array shapes in numpy
I believe the answer lies in the numpy glossary, where it says of (n,) that
A parenthesized number followed by a comma denotes a tuple with one
element. The trailing comma distinguishes a one-element tuple from a
parenthesized n.
Which, naturally, echoes the Python statements concerning tuples here
Thus a shape of (R,) is a one-element tuple describing a 1-D array of extent R, where the trailing comma distinguishes the tuple (R,) from the parenthesized expression (R).
However, for a 1-D array there is no sense of row or column ordering; (R, 1) is R rows by 1 column, while (1, R) is 1 row of R columns. Although that shouldn't matter to a 1-D iterator, either it does matter, or the iterator doesn't correctly recognise (R,) and treats the array as 2-D. (I don't know the technical details of that part, but these seem to be the only options that account for the behaviour.)
This issue is unrelated to the indeterminacy of size that occurs when defining tensors in TensorFlow. In TensorFlow, tensors (arrays) may have indeterminate shapes, so that more data may be added along a certain axis as processing occurs, e.g. in batches; in that case the initial tensor shape includes a leading None to indicate where array expansion is expected to occur. (See e.g. tensor's shape here.)
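As a small illustration of that leading None (my own sketch, not from the question's code), Keras writes the flexible batch axis exactly this way:
import tensorflow as tf

inp = tf.keras.Input(shape=(1,))  # one feature per example
print(inp.shape)                  # (None, 1): the batch axis is unspecified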

How do I get and use value from a tensor within a TF 2.0 Dataset map step?

I'm using TensorFlow Alpha 2.0.
I have TFRecords files I'm reading from, each one holding a short video clip with each frame encoded as jpeg byte string to save space:
{
    'numframes': tf.io.FixedLenFeature([], tf.int64),
    'frames': tf.io.VarLenFeature(tf.string)
}
I have a map step in my tf.data.Dataset pipeline that successfully parses each example:
def parse_tfrecord(p):
    return tf.io.parse_single_example(p, example_schema)
My next step is to read out the number of frames from numframes and run the tf.io.decode_jpeg function on each frame in frames.values[i] with i being from range(numframes):
def parse_jpegs(p):
    numframes = p['numframes']
    return tf.map_fn(tf.io.decode_jpeg,
                     [p['frames'].values[i] for i in range(numframes)])
My dataset pipeline for completeness:
def dataset():
    dataset = tf.data.Dataset.list_files("*.tfrecord")
    dataset = tf.data.TFRecordDataset(dataset)
    dataset = dataset.shuffle(1000).repeat()
    dataset = dataset.map(parse_tfrecord)
    dataset = dataset.map(parse_jpegs)
    return dataset
If I exclude the dataset.map(parse_jpegs) line it all works alright, showing me something like {'frames': <tensorflow.python.framework.sparse_tensor.SparseTensor at 0x7f394c285518>, 'numframes': <tf.Tensor: id=2937, shape=(), dtype=int64, numpy=25>}
(Note that the numframes tensor includes a numpy value of 25. I can get that outside my dataset pipeline with the tensor.numpy() method)
Within that map function, though, I can't call .numpy() to get the value out of the tensor, and when I print the tensor itself no value is shown, as if it hasn't been evaluated yet.
What is the best way to parse all these frames within the dataset pipeline?
EDIT: Error message I'm getting is TypeError: 'Tensor' object cannot be interpreted as an integer in parse_jpegs when trying to get numframes. This makes sense to me why a tensor can't be interpreted as an int, but how can I get the value from that tensor to use to set the range?
The problem I'm running into comes down to the fact that each "frames" object has a different number of frames. If I can apply tf.io.decode_jpeg to each frame in that list without needing to record number of frames separately I would be fine with that, but I have "numframes" here so I know how many frames need to be decoded in my "frames" list.
EDIT: I'll leave the question up for anyone else who might find it helpful, but I ended up just returning the raw byte strings and doing the decode_jpeg in a separate generator function outside the dataset API. It was much easier that way, even if it might be slower.
In my specific case, I found out that map_fn assumes by default that the output tensor has the same dtype as the input tensor. Here, tf.io.decode_jpeg takes in a string (of bytes) and outputs a uint8 array, which was causing problems. Passing the output dtype explicitly, as in tf.map_fn(..., dtype=tf.uint8), seems to have fixed it for me! Maybe not exactly as written, since I have continued tinkering with it since asking the question, but I got it working now.
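Putting the pieces together, a hedged sketch of the map step (assuming every frame in a clip decodes to the same height, width, and channel count, which tf.map_fn requires by default):
def parse_jpegs(p):
    # p['frames'] is a SparseTensor (from the VarLenFeature); its .values
    # field is already a 1-D string tensor of all encoded frames, so no
    # Python-level range() over numframes is needed.
    return tf.map_fn(tf.io.decode_jpeg, p['frames'].values, dtype=tf.uint8)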

What exactly qualifies as a 'Tensor' in TensorFlow?

I am new to TensorFlow. I just went through the eager execution tutorial and came across the tf.decode_csv function. Not knowing it, I read the documentation. https://www.tensorflow.org/api_docs/python/tf/decode_csv
I don't really understand it.
The documentation says 'records: A Tensor of type string.'
So, my question is: What qualifies as a 'Tensor'?
I tried the following code:
dec_res = tf.decode_csv('0.1,0.2,0.3', [[0.0], [0.0], [0.0]])
print(dec_res, type(dec_res))
l = [[1,2,3],[4,5,6],[7,8,9]]
r = tf.reshape(l, [9,-1])
print(l, type(l))
print(r, type(r))
So the list dec_res contains tf.Tensor objects. That seems reasonable to me. But is an ordinary string also a 'Tensor' according to the documentation?
Then I tried something else with the tf.reshape function. In the documentation https://www.tensorflow.org/api_docs/python/tf/reshape it says that 'tensor: A Tensor.' So, l is supposed to be a tensor. But it is not of type tf.Tensor; it is simply a Python list. This is confusing.
Then the documentation says
Returns:
A Tensor. Has the same type as tensor.
But the type of l is list, whereas the type of r is tensorflow.python.framework.ops.Tensor. So the types are not the same.
Then I thought that TensorFlow is very generous with things being a tensor. So I tried:
class car(object):
    def __init__(self, color):
        self.color = color

red_car = car('red')
#test_reshape = tf.reshape(red_car, [1, -1])
print(red_car.color)  # to check that red_car exists
Now, the commented-out line results in an error.
So, can anyone help me to find out, what qualifies as a 'Tensor'?
P.S.: I tried to read the source code of tf.reshape as given in the documentation
Defined in tensorflow/python/ops/gen_array_ops.py.
But this file does not exist in the GitHub repo. Does anyone know how to read it?
https://www.tensorflow.org/programmers_guide/tensors
TensorFlow, as the name indicates, is a framework to define and run
computations involving tensors. A tensor is a generalization of
vectors and matrices to potentially higher dimensions. Internally,
TensorFlow represents tensors as n-dimensional arrays of base
datatypes.
What you are observing comes from the fact that TensorFlow operations (like reshape) can be built from various Python types using the function tf.convert_to_tensor:
https://www.tensorflow.org/api_docs/python/tf/convert_to_tensor
All standard Python op constructors apply this function to each of
their Tensor-valued inputs, which allows those ops to accept numpy
arrays, Python lists, and scalars in addition to Tensor objects
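A short sketch of this in action (my own illustration, reusing the car class from the question):
import numpy as np
import tensorflow as tf

# These all succeed because a conversion to Tensor is registered:
tf.convert_to_tensor([[1, 2, 3], [4, 5, 6]])  # Python list
tf.convert_to_tensor(np.zeros((2, 3)))        # numpy array
tf.convert_to_tensor(7.0)                     # Python scalar

# A custom object fails because no conversion function is registered:
# tf.convert_to_tensor(red_car)  # raises an error, just like tf.reshape did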

Clarification on tf.Tensor.set_shape()

I have an image that is 478 x 717 x 3 = 1028178 pixels, with a rank of 1. I verified it by calling tf.shape and tf.rank.
When I call image.set_shape([478, 717, 3]), it throws the following error.
"Shapes %s and %s must have the same rank" % (self, other))
ValueError: Shapes (?,) and (478, 717, 3) must have the same rank
I tested again by first casting to 1028178, but the error still exists.
ValueError: Shapes (1028178,) and (478, 717, 3) must have the same rank
Well, that does make sense, because one is of rank 1 and the other is of rank 3. However, why is it necessary to throw an error, given that the total number of pixels still matches?
I could of course use tf.reshape and it works, but I think that's not optimal.
As stated on the TensorFlow FAQ
What is the difference between x.set_shape() and x = tf.reshape(x)?
The tf.Tensor.set_shape() method updates the static shape of a Tensor
object, and it is typically used to provide additional shape
information when this cannot be inferred directly. It does not change
the dynamic shape of the tensor.
The tf.reshape() operation creates a new tensor with a different dynamic shape.
Creating a new tensor involves memory allocation and that could potentially be more costly when more training examples are involved. Is this by design, or am I missing something here?
As far as I know (and I wrote that code), there isn't a bug in Tensor.set_shape(). I think the misunderstanding stems from the confusing name of that method.
To elaborate on the FAQ entry you quoted, Tensor.set_shape() is a pure-Python function that improves the shape information for a given tf.Tensor object. By "improves", I mean "makes more specific".
Therefore, when you have a Tensor object t with shape (?,), that is a one-dimensional tensor of unknown length. You can call t.set_shape((1028178,)), and then t will have shape (1028178,) when you call t.get_shape(). This doesn't affect the underlying storage, or indeed anything on the backend: it merely means that subsequent shape inference using t can rely on the assertion that it is a vector of length 1028178.
If t has shape (?,), a call to t.set_shape((478, 717, 3)) will fail, because TensorFlow already knows that t is a vector, so it cannot have shape (478, 717, 3). If you want to make a new Tensor with that shape from the contents of t, you can use reshaped_t = tf.reshape(t, (478, 717, 3)). This creates a new tf.Tensor object in Python; the actual implementation of tf.reshape() does this using a shallow copy of the tensor buffer, so it is inexpensive in practice.
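A sketch of the contrast (my own illustration, TF1-style, using a placeholder for the unknown-length vector):
import tensorflow as tf

t = tf.placeholder(tf.float32, shape=(None,))  # static shape (?,)

t.set_shape((1028178,))        # OK: refines (?,) to (1028178,)
print(t.get_shape())           # (1028178,)

# t.set_shape((478, 717, 3))   # ValueError: rank 1 vs. rank 3

reshaped_t = tf.reshape(t, (478, 717, 3))  # a new Tensor with a new dynamic shape
print(reshaped_t.get_shape())  # (478, 717, 3)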
One analogy is that Tensor.set_shape() is like a run-time cast in an object-oriented language like Java. For example, if you have a pointer to an Object but know that, in fact, it is a String, you might do the cast (String) obj in order to pass obj to a method that expects a String argument. However, if you have a String s and try to cast it to a java.util.Vector, the compiler will give you an error, because these two types are unrelated.