How to convert python list of tf.Tensors (of variable length) to tf.Tensor of the tensors - tensorflow

I have a python list of tensorflow tensors. These tensors are of variable length. An example of one is:
tf.Tensor(
[-5.6968699e-04 -1.8224530e-03 1.9018153e-04 2.4998413e-05
5.7804082e-06 9.0757676e-04 1.7357236e-03 3.7930862e-04
-1.1174149e-03 9.7289361e-04 1.3030922e-03 4.9432577e-04
-7.0594731e-05 -1.9857733e-04 8.9881440e-05 3.3402088e-04
9.7116083e-04 5.0820946e-04 -2.0063705e-04 -3.1353189e-03
-2.9622321e-03 2.9554308e-04 -1.1153796e-03 9.8816957e-04
-4.6766747e-04 -2.7386995e-04 -5.6890573e-04 3.5687000e-03
-1.3535956e-03 4.5281884e-04 -3.5806431e-04 -8.6313725e-04
-6.7768141e-04 2.2069726e-05 -4.3477840e-04 -1.5338012e-03
-2.7985810e-03 -1.4244686e-03 6.5509509e-04 -1.2790617e-04
1.1837900e-03 -5.8377518e-05 -6.3234463e-04 1.7508399e-03
2.9831685e-04 -2.2373318e-04 -2.8749602e-04 1.7911429e-03
-3.7155824e-04 1.2438967e-03 8.0730570e-05 1.0137054e-03
-2.6455871e-04 -7.6767977e-04 -1.1590059e-03 9.9610852e-04
-1.9824551e-04 -2.7367761e-03 6.6492974e-04 -1.3874021e-03
2.5623629e-04 -1.7116729e-03 -1.4603567e-04 2.9647996e-04], shape=(64,), dtype=float32)
But not all of these tensors have the same dimensionality, so I can't use tf.convert_to_tensor() without getting an error:
'Shapes of all inputs must match: values[0].shape = [8,8,4,32] != values[1].shape = [32] [Op:Pack] name: packed'
How can I convert this list of tf.Tensors to a tf.Tensor of tf.Tensors?
The reason I want to do this is as follows:
In my code I am calling the Adam optimizer as follows:
self.dqn_architecture.optimizer.apply_gradients(zip(dqn_architecture_grads, trainable_vars))
But I noticed the following showing up in my logs:
2023-02-17 20:05:44,776 5 out of the last 5 calls to <function _BaseOptimizer._update_step_xla at 0x7f55421ab6d0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2023-02-17 20:05:44,822 6 out of the last 6 calls to <function _BaseOptimizer._update_step_xla at 0x7f55421ab6d0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
On further investigation I found that I am passing Python lists of tensors to the optimizer, as opposed to tensors of tensors, i.e. case (3).
I’ve also noticed that there seems to be a memory leak, as my RAM usage continues to grow the more I train the model. This makes sense, because on Stack Overflow I read that:
'Passing python scalars or lists as arguments to tf.function will always build a new graph. To avoid this, pass numeric arguments as Tensors whenever possible'
So, I believe the solution would be to pass a tensor of these tensors as opposed to a list. But, on trying to convert the lists to tensors using tf.convert_to_tensor(), I get the error:
'Shapes of all inputs must match: values[0].shape = [8,8,4,32] != values[1].shape = [32] [Op:Pack] name: packed'
because the tensors have varying dimensionality.
I tried using tf.ragged.constant too, but also got an error:
raise ValueError("all scalar values must have the same nesting depth")
Any help would be appreciated. Really need to get this sorted. :)

tf.convert_to_tensor() only works when all the tensors in the list have the same shape. In your case each tensor has a different shape. For that situation TensorFlow provides a dedicated kind of tensor, the RaggedTensor, which holds tensors of different lengths as a single tensor. Here is an example for your case.
import tensorflow as tf
# Create a list of variable-length tensors.
tensors = [
    tf.constant([1, 2, 3]),
    tf.constant([4, 5]),
    tf.constant([6, 7, 8, 9]),
]
# Stack them into a single RaggedTensor.
ragged_tensors = tf.ragged.stack(tensors)
print(ragged_tensors)
<tf.RaggedTensor [[1, 2, 3], [4, 5], [6, 7, 8, 9]]>
As you can see above, each row has a different length. If you want this RaggedTensor to become a regular Tensor, call ragged_tensors.to_tensor(): the shorter rows are padded with zeros so that all rows have the same length.
ragged_tensors.to_tensor()
<tf.Tensor: shape=(3, 4), dtype=int32, numpy=
array([[1, 2, 3, 0],
       [4, 5, 0, 0],
       [6, 7, 8, 9]], dtype=int32)>
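As a small extension of the example above, the sketch below pads with a value other than 0 and recovers the original row lengths. It assumes TensorFlow 2.x in eager mode, and the padding value -1 is an arbitrary choice for illustration. Note that tf.ragged.stack expects all inputs to have the same rank, so it would probably not apply directly to a list that mixes shapes such as [8,8,4,32] and [32].

```python
import tensorflow as tf

tensors = [
    tf.constant([1, 2, 3]),
    tf.constant([4, 5]),
    tf.constant([6, 7, 8, 9]),
]
ragged = tf.ragged.stack(tensors)

# Pad with -1 instead of the default 0 so padding can be told apart from data.
dense = ragged.to_tensor(default_value=-1)
print(dense.numpy().tolist())    # [[1, 2, 3, -1], [4, 5, -1, -1], [6, 7, 8, 9]]

# Per-row lengths, useful for masking out the padding later.
lengths = ragged.row_lengths()
print(lengths.numpy().tolist())  # [3, 2, 4]
```

Keeping the row lengths alongside the padded tensor makes it possible to ignore the padded positions in later computations.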

Related

Is broadcasting in Tensorflow a view or a copy?

Please clarify whether broadcasting in Tensorflow allocates a new memory buffer for the broadcast result.
In the Tensorflow document Introduction to Tensors - Broadcasting, one sentence says (emphasis added):
Most of the time, broadcasting is both time and space efficient, as the broadcast operation never materializes the expanded tensors in memory
However in another sentence it says:
Unlike a mathematical op, for example, broadcast_to does nothing special to save memory. Here, you are materializing the tensor.
print(tf.broadcast_to(tf.constant([1, 2, 3]), [3, 3]))
tf.broadcast_to says it is a broadcast operation.
Broadcast an array for a compatible shape.
Then according to "the broadcast operation never materializes the expanded tensors in memory" statement above, it should not be materializing.
Please help clarify what the document is actually saying.
The document says that broadcasting inside an ordinary operation normally never materializes the expanded tensor in memory, which is what makes it both time and space efficient.
x = tf.constant([1, 2, 3])
y = tf.constant(2)
print(x * y)
tf.Tensor([2 4 6], shape=(3,), dtype=int32)
But if we want to see what the tensor looks like after broadcasting, we use tf.broadcast_to, which of course has to materialize the tensor.
x = tf.constant([1, 2, 3, 4])
y = tf.broadcast_to(x, [3, 4])
print(y)
tf.Tensor(
[[1 2 3 4]
[1 2 3 4]
[1 2 3 4]], shape=(3, 4), dtype=int32)
According to the documentation:
When doing broadcasted operations such as multiplying a tensor by a scalar, broadcasting (usually) confers some time or space benefit, as the broadcasted tensor is never materialized.
However, broadcast_to does not carry with it any such benefits. The newly-created tensor takes the full memory of the broadcasted shape. (In a graph context, broadcast_to might be fused to subsequent operation and then be optimized away, however.)
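The difference between the two forms can be sketched as follows. This is a minimal illustration assuming TensorFlow 2.x in eager mode; it only compares element counts and results and does not measure actual memory use.

```python
import tensorflow as tf

x = tf.constant([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12]])
row = tf.constant([1, 0, 1, 0])

# Implicit broadcasting: `row` stays a 4-element tensor; only the result is (3, 4).
implicit = x * row

# Explicit broadcasting: a full (3, 4) tensor is created first, then multiplied.
expanded = tf.broadcast_to(row, x.shape)
explicit = x * expanded

print(int(tf.size(row)), int(tf.size(expanded)))           # 4 12
print(bool(tf.reduce_all(tf.equal(implicit, explicit))))   # True
```

Both forms give the same result; the explicit one simply carries a 12-element operand where the implicit one carries a 4-element operand.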

Create a TF Dataset of SparseTensors with from_generator

I have a generator that yields tf.sparse.SparseTensors. I want to turn this into a Tensorflow Dataset, but am running into some issues. I am using TF2. First, unlike with regular Tensors, you cannot simply yield them and provide the correct data types for output_types. For a sparse tensor of [1,0,0,0,5,0], the error looks like this:
tensorflow.python.framework.errors_impl.InvalidArgumentError: TypeError: `generator` yielded an element that could not be converted to the expected type. The expected type was int64, but the yielded element was SparseTensor(indices=tf.Tensor(
E [[0]
E [4]], shape=(2, 1), dtype=int64), values=tf.Tensor([1 5], shape=(2,), dtype=int64), dense_shape=tf.Tensor([6], shape=(1,), dtype=int64)).
After doing some looking around on the internet, I found this open issue, https://github.com/tensorflow/tensorflow/issues/16689, and tried to do something similar: read the indices, values, and shape as separate tensors into a TF Dataset, and then map over the dataset to create the sparse tensor. This is not working as shown in some of the examples in the GitHub issue: tf.sparse.SparseTensor(indices, values, shape) does not seem to accept indices and shape in the form of a tf.Tensor. It will happily take in a list or NumPy array, but not a Tensor. Since map is not eager, I cannot call .numpy() on the Tensor either. What is the best way to get this to work? I see there is tf.py_function/tf.numpy_function, which could help, but constructing the output type can be tricky (though not impossible) for my use case, since the incoming data is not fixed and can have a mix of sparse and dense tensors.
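The approach described above (separate indices, values and shape, followed by a map) can be sketched as follows. This is a minimal sketch, assuming TensorFlow 2.4 or newer (for the output_signature argument) and int64 dtypes for indices and shape; sparse_parts is a hypothetical generator written only for illustration. Under those assumptions, tf.sparse.SparseTensor should accept the symbolic tensors it receives inside map.

```python
import numpy as np
import tensorflow as tf


def sparse_parts():
    """Hypothetical generator yielding (indices, values, dense_shape) triples."""
    # Components of the sparse vector [1, 0, 0, 0, 5, 0].
    yield (np.array([[0], [4]], dtype=np.int64),
           np.array([1, 5], dtype=np.int64),
           np.array([6], dtype=np.int64))
    # Components of the sparse vector [0, 0, 7, 0, 0, 0].
    yield (np.array([[2]], dtype=np.int64),
           np.array([7], dtype=np.int64),
           np.array([6], dtype=np.int64))


dataset = tf.data.Dataset.from_generator(
    sparse_parts,
    output_signature=(
        tf.TensorSpec(shape=(None, 1), dtype=tf.int64),  # indices
        tf.TensorSpec(shape=(None,), dtype=tf.int64),    # values
        tf.TensorSpec(shape=(1,), dtype=tf.int64),       # dense_shape
    ),
)

# Reassemble the SparseTensor from its three dense components.
sparse_dataset = dataset.map(
    lambda indices, values, dense_shape: tf.sparse.SparseTensor(
        indices, values, dense_shape))

# Densify each element only to inspect the result.
rows = [tf.sparse.to_dense(sp).numpy().tolist() for sp in sparse_dataset]
print(rows)  # [[1, 0, 0, 0, 5, 0], [0, 0, 7, 0, 0, 0]]
```

If the indices or shape arrive as int32, casting them with tf.cast(..., tf.int64) before constructing the SparseTensor is probably needed.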

Keras Backend: Difference between random_normal and random_normal_variable

My neural network has a custom layer, which takes an input vector x, generates a normally distributed tensor A and returns both A (used in subsequent layers) and the product Ax. Assuming I want to reuse the value stored in A at the output of the custom layer, in a second different layer, is there any subtle aspect that I need to factor while determining which Keras backend function (K.backend.random_normal or K.backend.random_normal_variable) I should use in order to generate A?
a) The backend function random_normal returns a tensor storing a different value following each call (see code snippet below). To me, this suggests that random_normal acts as a generator of normally distributed values. Does this mean that one should not use random_normal to generate a normally distributed tensor if they want to hold its value following calls?
b) The backend function random_normal_variable appears safer (see code snippet below) as it retains value across calls.
Is my conceptual understanding correct? Or am I missing something basic?
I am using Keras 2.1.2 and Tensorflow 1.4.0.
Experiment with random_normal (value changes across calls):
In [5]: A = K.random_normal(shape = (2,2), mean=0.0, stddev=0.5)
In [6]: K.get_value(A)
Out[6]: array([[ 0.4459489 , -0.82019573],
[-0.39853573, -0.33919844]], dtype=float32)
In [7]: K.get_value(A)
Out[7]: array([[-0.37467018, 0.42445764],
[-0.573843 , -0.3468301 ]], dtype=float32)
Experiment with random_normal_variable (value holds across calls):
In [9]: B = K.random_normal_variable(shape=(2,2), mean=0., scale=0.5)
In [10]: K.get_value(B)
Out[10]: array([[ 0.07700552, 0.28008622],
[-0.69484973, -1.32078779]], dtype=float32)
In [11]: K.get_value(B)
Out[11]: array([[ 0.07700552, 0.28008622],
[-0.69484973, -1.32078779]], dtype=float32)
From my understanding, this is due to the fact that random_normal_variable returns an instantiated Variable while random_normal returns a Tensor.
K.random_normal(shape=(2,2), mean=0.0, stddev=0.5)
<tf.Tensor 'random_normal:0' shape=(2, 2) dtype=float32>
K.random_normal_variable(shape=(2,2), mean=0.0, scale=0.5)
<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32_ref>
As for why the values vary for the Tensor and not for the Variable, I think the answer to this thread sums it up well:
Variable is basically a wrapper on Tensor that maintains state across multiple calls to run [...]
The answer also mentions that a variable normally needs to be initialized before it can be evaluated. You did not have to do that here because the returned variable is already initialized, thanks to a call to tensorflow.random_normal_initializer within the random_normal_variable function. Hope this clarifies why your code has this behaviour.
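For readers on newer versions, the same distinction can be sketched in TensorFlow 2.x. This is an illustration under that assumption, not the Keras 2.1.2 backend API used in the question; the function name sample is hypothetical. A random op inside a tf.function is re-executed on every call, while a tf.Variable is sampled once and then keeps its state.

```python
import numpy as np
import tensorflow as tf


@tf.function
def sample():
    # The random op runs again on every call, so each call draws new values.
    return tf.random.normal(shape=(2, 2), mean=0.0, stddev=0.5)


draw_1 = sample().numpy()
draw_2 = sample().numpy()
print(np.array_equal(draw_1, draw_2))  # False (the draws differ)

# A Variable is initialized once from a random draw and then holds that value.
A = tf.Variable(tf.random.normal(shape=(2, 2), mean=0.0, stddev=0.5))
read_1 = A.numpy()
read_2 = A.numpy()
print(np.array_equal(read_1, read_2))  # True
```

So if the value of A has to be reused by a second layer, storing it in a variable (or passing the already computed tensor along) avoids drawing a fresh sample.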

shape of a sparse tensor without invoking run()

The shape attribute of a sparse tensor returns a Tensor object, which seems to be of no use for extracting the actual shape of the sparse tensor without resorting to the run function.
To clarify what I mean, first consider a sparse tensor:
a = tf.SparseTensor(indices=[[0, 0, 0], [1, 2, 1]], values=[1.0+2j, 2.0], shape=[3, 4, 2])
a.shape returns:
tf.Tensor 'SparseTensor_1/shape:0' shape=(3,) dtype=int64
This is kind of no use.
Now, consider a dense tensor:
a = tf.constant(np.random.normal(0.0, 1.0, (4, 4)).astype(dtype=np.complex128))
a.get_shape() returns:
TensorShape([Dimension(4), Dimension(4)])
I can use this output and cast it into a list or tuple of integers without ever invoking run(). However, I cannot do the same for sparse tensor, unless I first convert sparse tensor to dense (which is not implemented for complex sparse tensor yet) and then call get_shape() method on it, but this is kind of redundant, defeats the purpose of using a sparse tensor in the first place and also leads to error down the road if the input sparse tensor is complex.
Is there a way to obtain the shape of a sparse tensor without invoking run() or converting it to a dense tensor first?
tf.SparseTensor is implemented as a triple of dense Tensors under the hood. The shape of a SparseTensor is just a Tensor; if you want to know its value, your best bet is to evaluate it using session.run:
print(sess.run(a.shape))
In general, Tensorflow does not promise to compute an exact shape even for dense tensors at graph construction time; shapes are best effort and may not even have a fixed value. So even for a dense Tensor you may have to evaluate the Tensor using run to get a precise shape.
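In more recent releases the statically known part of the shape can usually be read without running anything. The sketch below assumes TensorFlow 2.x, where the constructor argument is named dense_shape and SparseTensor.shape returns a TensorShape. It builds the tensor inside an explicit graph to show that no session is involved. It relies on the shape being built from constants; if the shape came from a placeholder or a computation, some entries would likely be unknown.

```python
import tensorflow as tf

with tf.Graph().as_default():
    a = tf.sparse.SparseTensor(
        indices=[[0, 0, 0], [1, 2, 1]],
        values=tf.constant([1.0 + 2j, 2.0 + 0j], dtype=tf.complex128),
        dense_shape=[3, 4, 2],
    )
    # Static shape inferred at graph-construction time, as a list of ints.
    static_shape = a.shape.as_list()
    # Constant-folds the dense_shape tensor if its value is statically known.
    dense_shape_value = tf.get_static_value(a.dense_shape)

print(static_shape)                # [3, 4, 2]
print(dense_shape_value.tolist())  # [3, 4, 2]
```

tf.get_static_value returns None when the value cannot be determined statically, so callers should be prepared for that case.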

Tensorflow reshape tensor gives None dimension

I have used the model described here on the 0.6.0 branch. The code can be found here. I have done some minor changes to the linked code.
In my code I create two models, one for training and one for validation, very similar as it is done in the Tensorflow Tutorial.
with tf.variable_scope("model", reuse=None, initializer=initializer):
m = PTBModel_User(is_training=True, config=config, name='Training model')
with tf.variable_scope("model", reuse=True, initializer=initializer):
mtest = PTBModel_User(is_training=False, config=config_valid, name='Validation model')
The first model, the one for training, seems to be created just fine, but the second, used for validation, does not. The output gets a None dimension! The line I'm referring to is line 134 in the linked code:
output = tf.reshape(tf.concat(1, outputs), [-1, size])
I've added these lines right after the reshape of the output:
output_shape = output.get_shape()
print("Model num_steps:", num_steps)
print("Model batch_size:", batch_size)
print("Output dims", output_shape[0], output_shape[1])
and that gives me this:
Model num_steps: 400
Model batch_size: 1
Output dims Dimension(None) Dimension(650)
This problem only happens with the 'validation model', not with the 'training model'. For the 'training model' I get expected output:
Model num_steps: 400
Model batch_size: 2
Output dims Dimension(800) Dimension(650)
(Note that with the 'validation model' I use a batch_size=1 instead of batch_size=2 that I use for the training model)
From what I understand, using -1 as input to the reshape function, will figure the output shape out automagically! But then why do I get None? Nothing in my config fed to the model has a None value.
Thank you for all the help and tips!
TL;DR: A dimension being None simply means that shape inference could not determine an exact shape for the output tensor, at graph-building time. When you run the graph, the tensor will have the appropriate run-time shape.
If you're not interested in how shape inference works, you can stop reading now.
Shape inference applies local rules, based on a "shape function" that takes the shapes of the inputs to an operation and computes (possibly incomplete) shapes for the outputs of an operation. To figure out why tf.reshape() gives an incomplete shape, we have to look at its inputs, and work backwards:
The shape argument to tf.reshape() includes a [-1], which means "figure the output shape automagically" based on the shape of the tensor input.
The tensor input is the output of tf.concat() on the same line.
The inputs to tf.concat() are computed by a tf.mul() in BasicLSTMCell.__call__(). The tf.mul() op multiplies the result of a tf.tanh() and a tf.sigmoid() op.
The tf.tanh() op produces an output of size [?, hidden_size], and the tf.sigmoid() op produces an output of size [batch_size, hidden_size].
The tf.mul() op performs NumPy-style broadcasting. A dimension will only be broadcast if it has size 1. Consider three cases where we compute tf.mul(x, y):
1. If x has shape [1, 10], and y has shape [5, 10], then broadcasting will happen, and the output shape will be [5, 10].
2. If x has shape [1, 10], and y has shape [1, 10], then there will be no broadcasting, and the output shape will be [1, 10].
3. However, if x has shape [1, 10], and y has shape [?, 10], there is insufficient static information to tell whether broadcasting will happen (even though we happen to know that case 2 applies at runtime).
Therefore, when batch_size is 1, the tf.mul() op produces an output with the shape [?, hidden_size]; but when batch_size is greater than 1, the output shape is [batch_size, hidden_size].
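The rule above can be modelled in a few lines of plain Python. This is a simplified sketch of static broadcast shape inference, not TensorFlow's actual implementation; the function names are hypothetical and None stands for an unknown ("?") dimension.

```python
from itertools import zip_longest


def infer_broadcast_dim(a, b):
    """Statically broadcast two dimension sizes; None means 'unknown'."""
    if a is None or b is None:
        known = b if a is None else a
        # A known size greater than 1 must win: the unknown side is assumed
        # to be either 1 or equal to it.
        if known is not None and known > 1:
            return known
        # Known size 1 (or both unknown): the result depends on run-time values.
        return None
    if a == 1:
        return b
    if b == 1 or a == b:
        return a
    raise ValueError(f"Incompatible dimensions: {a} and {b}")


def infer_broadcast_shape(x, y):
    """Align shapes from the trailing dimension; missing dims count as 1."""
    pairs = zip_longest(reversed(x), reversed(y), fillvalue=1)
    return [infer_broadcast_dim(a, b) for a, b in pairs][::-1]


print(infer_broadcast_shape([1, 10], [5, 10]))     # [5, 10]
print(infer_broadcast_shape([1, 10], [1, 10]))     # [1, 10]
print(infer_broadcast_shape([1, 10], [None, 10]))  # [None, 10]
print(infer_broadcast_shape([2, 10], [None, 10]))  # [2, 10]
```

The last two calls mirror the batch_size=1 and batch_size=2 models: a known size of 1 paired with an unknown size stays unknown, while a known size of 2 settles the result.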
Where shape inference breaks down, it can be appropriate to use the Tensor.set_shape() method to add information. This would potentially be useful in the BasicLSTMCell implementation, where we know more than it is possible to infer about the shapes of the outputs.