Is broadcasting in TensorFlow a view or a copy?

Please clarify whether broadcasting in TensorFlow allocates a new memory buffer when it broadcasts.
In the TensorFlow document Introduction to Tensors - Broadcasting, one sentence says (emphasis added):
Most of the time, broadcasting is both time and space efficient, as the broadcast operation never materializes the expanded tensors in memory
However in another sentence it says:
Unlike a mathematical op, for example, broadcast_to does nothing special to save memory. Here, you are materializing the tensor.
print(tf.broadcast_to(tf.constant([1, 2, 3]), [3, 3]))
tf.broadcast_to says it is a broadcast operation.
Broadcast an array for a compatible shape.
Then, according to the statement above that "the broadcast operation never materializes the expanded tensors in memory", it should not materialize the tensor.
Please help clarify what the document is actually saying.

It says that a broadcast operation normally never materializes the expanded tensor in memory, which is what makes it both time and space efficient.
x = tf.constant([1, 2, 3])
y = tf.constant(2)
print(x * y)
tf.Tensor([2 4 6], shape=(3,), dtype=int32)
But if we want to see what the tensor looks like after broadcasting, we use tf.broadcast_to, which of course needs to materialize the tensor.
x = tf.constant([1, 2, 3, 4])
y = tf.broadcast_to(x, [3, 4])
print(y)
tf.Tensor(
[[1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]], shape=(3, 4), dtype=int32)
According to the documentation
When doing broadcasted operations such as multiplying a tensor by a scalar, broadcasting (usually) confers some time or space benefit, as the broadcasted tensor is never materialized.
However, broadcast_to does not carry with it any such benefits. The newly-created tensor takes the full memory of the broadcasted shape. (In a graph context, broadcast_to might be fused to subsequent operation and then be optimized away, however.)
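As a concrete sketch of the difference (assuming TF 2.x eager execution; the shapes are illustrative), the implicit broadcast inside a math op never builds the expanded operand, while tf.broadcast_to returns a full tensor of the target shape:
import tensorflow as tf

x = tf.constant([1, 2, 3])

# Implicit broadcasting: the scalar is combined with the (3,) vector
# without allocating an expanded intermediate tensor.
implicit = x * tf.constant(2)          # shape (3,)

# Explicit broadcasting: a real (3, 3) tensor is allocated and filled.
explicit = tf.broadcast_to(x, [3, 3])  # shape (3, 3), fully materialized

print(implicit.shape, explicit.shape)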

Related

How to convert python list of tf.Tensors (of variable length) to tf.Tensor of the tensors

I have a python list of tensorflow tensors. These tensors are of variable length. An example of one is:
tf.Tensor(
[-5.6968699e-04 -1.8224530e-03 1.9018153e-04 2.4998413e-05
5.7804082e-06 9.0757676e-04 1.7357236e-03 3.7930862e-04
-1.1174149e-03 9.7289361e-04 1.3030922e-03 4.9432577e-04
-7.0594731e-05 -1.9857733e-04 8.9881440e-05 3.3402088e-04
9.7116083e-04 5.0820946e-04 -2.0063705e-04 -3.1353189e-03
-2.9622321e-03 2.9554308e-04 -1.1153796e-03 9.8816957e-04
-4.6766747e-04 -2.7386995e-04 -5.6890573e-04 3.5687000e-03
-1.3535956e-03 4.5281884e-04 -3.5806431e-04 -8.6313725e-04
-6.7768141e-04 2.2069726e-05 -4.3477840e-04 -1.5338012e-03
-2.7985810e-03 -1.4244686e-03 6.5509509e-04 -1.2790617e-04
1.1837900e-03 -5.8377518e-05 -6.3234463e-04 1.7508399e-03
2.9831685e-04 -2.2373318e-04 -2.8749602e-04 1.7911429e-03
-3.7155824e-04 1.2438967e-03 8.0730570e-05 1.0137054e-03
-2.6455871e-04 -7.6767977e-04 -1.1590059e-03 9.9610852e-04
-1.9824551e-04 -2.7367761e-03 6.6492974e-04 -1.3874021e-03
2.5623629e-04 -1.7116729e-03 -1.4603567e-04 2.9647996e-04], shape=(64,), dtype=float32)
But not all of these tensors have the same dimensionality so I can't use tf.convert_to_tensor() without getting an error
'Shapes of all inputs must match: values[0].shape = [8,8,4,32] != values[1].shape = [32] [Op:Pack] name: packed'
How can I convert this list of tf.Tensors to a tf.Tensor of tf.Tensors?
The reason I want to do this is as follows:
In my code I am calling the Adam optimizer as follows:
self.dqn_architecture.optimizer.apply_gradients(zip(dqn_architecture_grads, trainable_vars))
But I noticed the following showing up in my logs:
2023-02-17 20:05:44,776 5 out of the last 5 calls to <function _BaseOptimizer._update_step_xla at 0x7f55421ab6d0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2023-02-17 20:05:44,822 6 out of the last 6 calls to <function _BaseOptimizer._update_step_xla at 0x7f55421ab6d0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
On further investigation I found that I am passing Python lists of tensors to the optimizer as opposed to tensors of tensors, i.e. case (3) above.
I’ve also noticed that there seems to be a memory leak as my RAM usage continues to grow the more I train the model. This makes sense because on stackoverflow I read that:
'Passing python scalars or lists as arguments to tf.function will always build a new graph. To avoid this, pass numeric arguments as Tensors whenever possible'
So, I believe the solution would be to pass a tensor of these tensors as opposed to a list. But, on trying to convert the lists to tensors using tf.convert_to_tensor(), I get the error:
'Shapes of all inputs must match: values[0].shape = [8,8,4,32] != values[1].shape = [32] [Op:Pack] name: packed'
because the tensors have varying dimensionality.
I tried using tf.ragged.constant too. But also got the error:
raise ValueError("all scalar values must have the same nesting depth")
Any help would be appreciated. Really need to get this sorted. :)
The tf.convert_to_tensor() method works when all the tensors have the same shape. In your case each tensor has a different shape, so TensorFlow provides a kind of tensor that can hold tensors of different shapes as a single tensor, known as a ragged tensor. Let's work through an example for your case.
import tensorflow as tf

# create a list of variable-length tensors
tensors = [
    tf.constant([1, 2, 3]),
    tf.constant([4, 5]),
    tf.constant([6, 7, 8, 9]),
]

# now stack the tensors into a single ragged tensor
ragged_tensors = tf.ragged.stack(tensors)
print(ragged_tensors)
# <tf.RaggedTensor [[1, 2, 3], [4, 5], [6, 7, 8, 9]]>
Notice above that each tensor has a different size. If you want this ragged tensor to become a normal dense tensor, just call the ragged_tensors.to_tensor() method and the differently sized tensors will be padded into a normal tensor.
ragged_tensors.to_tensor()
<tf.Tensor: shape=(3, 4), dtype=int32, numpy=
array([[1, 2, 3, 0],
       [4, 5, 0, 0],
       [6, 7, 8, 9]], dtype=int32)>
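As a brief aside on the tf.ragged.constant error mentioned in the question (a small sketch; the values are illustrative): tf.ragged.constant is meant for nested Python lists, whereas tf.ragged.stack takes a list of already-built tf.Tensor objects, which is why it is the better fit here.
import tensorflow as tf

# tf.ragged.constant builds a RaggedTensor from nested Python lists...
rt_from_lists = tf.ragged.constant([[1, 2, 3], [4, 5], [6, 7, 8, 9]])

# ...while tf.ragged.stack works on a list of existing tensors, which
# sidesteps the nesting-depth complaint when the inputs are tensors.
rt_from_tensors = tf.ragged.stack([tf.constant([1, 2, 3]), tf.constant([4, 5])])

print(rt_from_lists)
print(rt_from_tensors)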

Loop in tensorflow

I changed my question to explain my issue better:
I have a function, output_image = my_func(x), where x should have a shape like (1, 4, 4, 1).
Please help me to fix the error in this part:
out = tf.Variable(tf.zeros([1, 4, 4, 3]))
index = tf.constant(0)

def condition(index):
    return tf.less(index, tf.subtract(tf.shape(x)[3], 1))

def body(index):
    out[:, :, :, index].assign(my_func(x[:, :, :, index]))
    return tf.add(index, 1), out

out = tf.while_loop(condition, body, [index])
ValueError: The two structures don't have the same nested structure.
First structure: type=list str=[]
Second structure: type=list str=[<tf.Tensor 'while_10/Add_3:0' shape=() dtype=int32>, <tf.Variable 'Variable_2:0' shape=(1, 4, 4, 3) dtype=float32_ref>]
More specifically: The two structures don't have the same number of elements. First structure: type=list str=[<tf.Tensor 'while_10/Identity:0' shape=() dtype=int32>]. Second structure: type=list str=[<tf.Tensor 'while_10/Add_3:0' shape=() dtype=int32>, <tf.Variable 'Variable_2:0' shape=(1, 4, 4, 3) dtype=float32_ref>]
I tested my code: I can get a result from out = my_func(x[:, :, :, i]) with different values of i, and the while_loop works when I comment out the line out[:, :, :, index].assign(my_func(x[:, :, :, index])). Something is wrong in that line.
I also understand that there is no for-loop and the like, just tf.while_loop. Why is that?
Control structures are hard to get right and hard to optimize. In your case, what if the next example in the same batch has 5 channels? You would need to run 5 loop iterations and either mess up or waste compute resources on the first example, which has only 3 channels.
You need to think what exactly you are trying to achieve. Commonly you would have different weights for each channel so the system can't just create them out of thin air, they need to be trained properly.
If you just want to apply the same logic 3 times, just re-arrange your tensor to be (3, 4, 4, 1), as in the sketch below. You get 3 results and you do what you want with them.
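A minimal sketch of that re-arrangement (the input here is a dummy tensor; my_func stays whatever the question defines it to be):
import tensorflow as tf

x = tf.zeros([1, 4, 4, 3])  # dummy NHWC tensor with 3 channels

# Move the channel axis into the batch axis: (1, 4, 4, 3) -> (3, 4, 4, 1).
x_per_channel = tf.transpose(x, perm=[3, 1, 2, 0])

# Ops applied to x_per_channel now see the 3 channels as 3 independent
# batch examples, so no explicit loop over channels is needed.
print(x_per_channel.shape)  # (3, 4, 4, 1)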
Usually when you actually need for-loops (when handling sequences) you pad the examples so that they all have the same length and generate a model where the loop is unrolled (you would have 3 different operations, one for each iteration of the loop). Look for dynamic_rnn or static_rnn (the first one can handle different lengths for each batch).
I also understand that there is no for-loop and the like, just tf.while_loop. Why is that?
According to Implementation of Control Flow in TensorFlow
They should fit well with the dataflow model of TensorFlow, and should be amenable to parallel and distributed execution and automatic differentiation.
I think distributed dataflow graphs and automatic differentiation across devices could have been the constraints that led to the introduction of so few loop primitives.
There are several diagrams in that document which distributed-computing experts can probably understand better; a more thorough explanation is beyond me.

How to handle padding when using sequence_length parameter in TensorFlow dynamic_rnn

I'm trying to use the dynamic_rnn function in Tensorflow to speed up training. After doing some reading, my understanding is that one way to speed up training is to explicitly pass a value to the sequence_length parameter in this function. After a bit more reading, and finding this SO explanation, it seems like what I need to pass is a vector (maybe defined by a tf.placeholder) that contains the length of each sequence within a batch.
Here's where I'm confused: in order to take advantage of this, should I pad each of my batches to the longest-length sequence within the batch instead of the longest-length sequence in the training set? How does Tensorflow handle the remaining zeros/pad-tokens in any of the shorter sequences? Also, is the main advantage here really speed, or just extra assurance that we're masking pad-tokens during training? Any help/context would be appreciated.
should I pad each of my batches to the longest-length sequence within the batch instead of the longest-length sequence in the training set?
The sequences within a batch must be aligned, i.e., they have to have the same length. So the general answer to your question is "yes". But different batches don't have to be of the same length, so you can stratify input sequences into groups that have roughly the same size and pad them accordingly. This technique is called bucketing, and you can read about it in this tutorial.
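A rough sketch of bucketing with the tf.data API (one of several ways to do it; older tutorials use tf.contrib helpers instead, and the toy sequences here are purely illustrative):
import tensorflow as tf

# Toy variable-length sequences of token ids.
dataset = tf.data.Dataset.from_generator(
    lambda: ([1] * n for n in [3, 7, 4, 12, 9]),
    output_types=tf.int32,
    output_shapes=[None])

# Group sequences of similar length into the same padded batch,
# so little compute is wasted on padding.
dataset = dataset.apply(tf.data.experimental.bucket_by_sequence_length(
    element_length_func=lambda seq: tf.shape(seq)[0],
    bucket_boundaries=[5, 10],       # buckets: <5, 5-9, >=10
    bucket_batch_sizes=[2, 2, 2]))   # batch size for each bucket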
How does Tensorflow handle the remaining zeros/pad-tokens in any of the shorter sequences?
It's pretty intuitive. tf.nn.dynamic_rnn returns two tensors: output and states. Suppose the actual sequence length is t and the padded sequence length is T.
Then output will contain zeros for steps after t, and states will contain the t-th cell state, ignoring the states of the trailing padded steps.
Here's an example:
import numpy as np
import tensorflow as tf

n_steps = 2
n_inputs = 3
n_neurons = 5

X = tf.placeholder(dtype=tf.float32, shape=[None, n_steps, n_inputs])
seq_length = tf.placeholder(tf.int32, [None])
basic_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X,
                                    sequence_length=seq_length, dtype=tf.float32)

X_batch = np.array([
    # t = 0      t = 1
    [[0, 1, 2], [9, 8, 7]],  # instance 0
    [[3, 4, 5], [0, 0, 0]],  # instance 1
    [[6, 7, 8], [6, 5, 4]],  # instance 2
])
seq_length_batch = np.array([2, 1, 2])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    outputs_val, states_val = sess.run([outputs, states], feed_dict={
        X: X_batch,
        seq_length: seq_length_batch
    })
    print(outputs_val)
    print()
    print(states_val)
Note that instance 1 is padded, so outputs_val[1,1] is a zero vector and states_val[1] == outputs_val[1,0]:
[[[ 0.76686853  0.8707901  -0.79509073  0.7430128   0.63775384]
  [ 1.          0.7427926  -0.9452815  -0.93113345 -0.94975543]]

 [[ 0.9998851   0.98436266 -0.9620067   0.61259484  0.43135557]
  [ 0.          0.          0.          0.          0.        ]]

 [[ 0.99999994  0.9982034  -0.9934515   0.43735617  0.1671598 ]
  [ 0.99999785 -0.5612586  -0.57177305 -0.9255771  -0.83750355]]]

[[ 1.          0.7427926  -0.9452815  -0.93113345 -0.94975543]
 [ 0.9998851   0.98436266 -0.9620067   0.61259484  0.43135557]
 [ 0.99999785 -0.5612586  -0.57177305 -0.9255771  -0.83750355]]
Also, is the main advantage here really speed, or just extra assurance that we're masking pad-tokens during training?
Of course, batch processing is more efficient than feeding the sequences one by one. But the main advantage of specifying the length is that you get the reasonable state out of the RNN, i.e., the padded items don't affect the result tensor. You will get exactly the same result (and the same speed) if you don't set the length but select the right states manually.
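For completeness, here is a small sketch of that manual selection, assuming outputs has shape [batch, time, units] and seq_length holds the true lengths, as in the example above:
# Gather the output at the last valid time step of each sequence; for a
# BasicRNNCell this matches the `states` value returned when
# sequence_length is set.
batch_size = tf.shape(outputs)[0]
last_step = tf.stack([tf.range(batch_size), seq_length - 1], axis=1)
last_outputs = tf.gather_nd(outputs, last_step)  # shape [batch, n_neurons]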

When would I want to set a stride in the batch or channel dimension for TensorFlow convolution?

TensorFlow implements a basic convolution operation with tf.nn.conv2d.
I am specifically interested in the "strides" parameter, which lets you set the stride of the convolution filter -- how far across the image you shift the filter each time.
The example given in one of the early tutorials, with an image stride of 1 in each direction, is
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
The strides array is explained more in the linked docs:
In detail, with the default NHWC format...
Must have strides[0] = strides[3] = 1. For the most common case of the same horizontal and vertical strides, strides = [1, stride, stride, 1].
Note the order of "strides" matches the order of inputs: [batch, height, width, channels] in the NHWC format.
Obviously, having a stride other than 1 for batch and channels wouldn't make sense, right? (Your filter should always go across every batch and every channel.)
But why is it even an option to put something other than 1 in strides[0] and strides[3], then? (It is an "option" only in the sense that you could put something other than 1 in the Python list you pass in, disregarding the documentation quote above.)
Is there a situation where I would have a non-one stride for the batch or channels dimension, e.g.
tf.nn.conv2d(x, W, strides=[2, 1, 1, 2], padding='SAME')
If so, what would that example even mean in terms of the convolution operation?
There might be a situation where you send a video in chunks, so your batch is a sequence of frames. Assuming that neighbouring frames are quite similar, you could skip some of them by increasing the batch stride. That is as far as I understand it; I don't know about the channel stride, though.
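Since the documentation pins strides[0] and strides[3] to 1, a sketch of that frame-skipping idea would subsample the batch before convolving rather than rely on a batch stride (all shapes here are made up for illustration):
import tensorflow as tf

frames = tf.zeros([8, 32, 32, 3])   # a chunk of 8 video frames (NHWC)
kernel = tf.zeros([3, 3, 3, 16])    # 3x3 filters, 3 input -> 16 output channels

# Keep every other frame, which has the effect a batch stride of 2 would
# have, then convolve with the usual strides.
every_other = frames[::2]           # shape (4, 32, 32, 3)
feature_maps = tf.nn.conv2d(every_other, kernel,
                            strides=[1, 1, 1, 1], padding='SAME')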

What are the parameters of TensorFlow's dynamic_rnn for this simple data set?

I want to train an RNN language model using TensorFlow.
My training data is a sequence of 5 tokens represented with integers like so
x = [0, 1, 2, 3, 4]
I want the unrolled length of the RNN to be 4, and the training batch size to be 2. (I chose these values in order to require padding.)
Each token has an embedding of length 3 like so
0 -> [0, 0, 0]
1 -> [10, 10, 10]
2 -> [20, 20, 20]
3 -> [30, 30, 30]
4 -> [40, 40, 40]
What should I pass as parameters to tf.nn.dynamic_rnn?
This is mostly a repost of "How is the input tensor for TensorFlow's tf.nn.dynamic_rnn operator structured?".
That was helpfully answered by Eugene Brevdo. However he slightly misunderstood my question because I didn't have enough TensorFlow knowledge to ask it clearly. (Specifically he thought I meant the batch size to be 1.) Rather than risk additional confusion by editing the original question, I think it is clearest if I just rephrase it here.
I'm trying to figure this out for myself by writing an Example TensorFlow RNN Language Model.
Most RNN cells require floating-point inputs, so you should first do an embedding lookup on your integer tensor to go from the categorical values to floating-point vectors in your dictionary/embedding. I believe the function is tf.nn.embedding_lookup. The output of that should be a 3-tensor shaped batch x time x embedding_depth (in your case, the embedding depth is 3).
You can feed embedding_lookup an integer tensor shaped batch_size x time.
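Putting that together for the shapes in the question, here is a minimal sketch using the TF 1.x APIs discussed in this thread (the cell size and placeholder names are illustrative, not prescribed):
import tensorflow as tf

n_steps = 4     # unrolled length
n_units = 6     # hypothetical RNN cell size

# The embedding table from the question: token i -> [10*i, 10*i, 10*i].
embeddings = tf.constant([[0, 0, 0], [10, 10, 10], [20, 20, 20],
                          [30, 30, 30], [40, 40, 40]], dtype=tf.float32)

token_ids = tf.placeholder(tf.int32, [None, n_steps])    # batch x time
seq_length = tf.placeholder(tf.int32, [None])            # true lengths

inputs = tf.nn.embedding_lookup(embeddings, token_ids)   # batch x time x 3
cell = tf.nn.rnn_cell.BasicRNNCell(num_units=n_units)
outputs, state = tf.nn.dynamic_rnn(cell, inputs,
                                   sequence_length=seq_length,
                                   dtype=tf.float32)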