crop a tensor with a tensor inside a tf graph - tensorflow

I have a tensor which needs to be cropped using the indices held in another tensor.
ex - Input (None, 5, 5, 10) -- tensor
BoundingBox (None, 2) -- tensor
I want an operation that does the following:
Output (None, 3, 2, 10) -- tensor
if BoundingBox[0, 0] = 3 and BoundingBox[0, 1] = 2.
This is the same as tf.image.crop_to_bounding_box, but that function does not accept a tensor-valued bounding box as input. Please help.

Unfortunately this isn't possible with 'standard' tensor operations because the dimensions of the output could vary.
Consider the example where bounding_box[0] == [3, 2] and bounding_box[1] == [4, 2]: your output shape would then need to be (None, 3 or 4, 2, 10), and (of course) having a dimension of "3 or 4" is not allowed for standard tensors.
TensorFlow does, however, have the concept of a Ragged Tensor, which could conceivably be used to represent crops of different dimensions, but this is an unusual case and is unlikely to suit most mainstream downstream training operations. Still, it could be worth reading up on it to see if it fits your use case: link
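For illustration, here is a minimal sketch (not a drop-in replacement for tf.image.crop_to_bounding_box), assuming eager execution, a known batch size, crops anchored at the origin, and a hypothetical boxes tensor standing in for BoundingBox:

import tensorflow as tf

# Hypothetical inputs: a batch of 2 feature maps and per-example (height, width).
images = tf.random.normal([2, 5, 5, 10])
boxes = tf.constant([[3, 2], [4, 2]])

# Slice each example to its own crop size; the resulting heights differ.
crops = [images[i, :boxes[i, 0], :boxes[i, 1], :] for i in range(2)]
print([c.shape for c in crops])  # [(3, 2, 10), (4, 2, 10)]

# A RaggedTensor can hold components of differing sizes in one batch.
ragged = tf.ragged.stack(crops)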

Related

Keras/TensorFlow: What is the order of the weight tensor dimensions of a convolutional layer?

In channels_last format, the shape of the data tensor is (batch_size, height, width, channels) and the shape of the weight tensor is apparently (see reference 2) (rows, cols, input_depth, output_depth).
In channels_first format, the shape of the data tensor is (batch_size, channels, height, width) and the shape of the weight tensor is what?
I've looked high and low for the answer to that question. When I run my code and use model.get_weights() to get the weight and bias tensors, it appears that the format of the weight tensors is the same in channels_first as in channels_last. Yet, when I output the weight tensors to a file and read them back into my C/C++ code which is hand-crafted and doesn't use TensorFlow, it doesn't appear to be working. The results are numerically nonsensical. Maybe there is some other problem, but I would like to obtain a definitive answer to this question.
BTW, the reason I'm switching between channels_last and channels_first is that I need to be able to develop my code on a CPU machine and then run large training sessions on a GPU machine.
Any help is appreciated.
References:
Data tensor shape is explained here.
Weight tensor shape is partially explained here.
You can find the answer in the TF/Keras source code, keras/keras/layers/convolutional/base_conv.py: data_format=channels_first or data_format=channels_last takes effect during the forward computation, but in the weight definition the kernel shape is always kept as:
kernel_shape = self.kernel_size + (input_channel // self.groups, self.filters)
So the weight format you see via model.get_weights() is the same in channels_first as in channels_last.
In detail, the convolution op is ultimately performed by conv1d, conv2d, conv3d, etc., in gen_nn_ops, which are defined and executed in C/C++. Each of these operations receives data_format to adjust the inputs, but not the kernels (weights/filters).
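You can verify this quickly from Python; a minimal sketch using tf.keras (the layer sizes are illustrative):

import tensorflow as tf

# The kernel is built as (rows, cols, input_depth, output_depth) for both formats.
for fmt, shape in [("channels_last", (32, 32, 3)), ("channels_first", (3, 32, 32))]:
    layer = tf.keras.layers.Conv2D(filters=8, kernel_size=3, data_format=fmt)
    layer.build((None,) + shape)
    print(fmt, layer.kernel.shape)  # both print (3, 3, 3, 8)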

Strange output of Conv2D in tflite graph

I have a tflite graph, a fragment of which is depicted in the attached picture.
I needed to debug its behavior, and already at the first step I got quite puzzling results.
When I feed a zeros tensor as input, I expect the tensor after the first Conv2D to consist only of the values from the Conv2D bias (since all kernel elements get multiplied by zeros), but instead I got a tensor of seemingly random data. Here is the code snippet:
import numpy as np
import tensorflow as tf

def test_graph(path=PATH_DEFAULT):
    interp = tf.lite.Interpreter(path)
    interp.allocate_tensors()
    input_details = interp.get_input_details()
    in_idx = input_details[0]['index']
    zeros = np.zeros(shape=(1, 256, 256, 3), dtype=np.float32)
    interp.set_tensor(in_idx, zeros)
    interp.invoke()
    # index of output of first conv2d operator is 3 (see netron pic)
    after_conv_2d = interp.get_tensor(3)
    # shape of bias is just [count of output channels]
    n, h, w, c = after_conv_2d.shape
    # if we feed zeros as input, the only values we should get are the bias values,
    # since all kernel elements are multiplied by zeros
    uniq_vals_cnt = len(np.unique(after_conv_2d))
    assert uniq_vals_cnt <= c, f"There are {uniq_vals_cnt} in output, should be <= than {c}"
output:
AssertionError: There are 287928 in output, should be <= than 24
Can someone help me with my misunderstanding?
It seems my assumption that I can get any intermediate tensor from the interpreter was wrong; we can only do it for outputs, even though the interpreter does not raise an error and even gives tensors of the right shape for indices of non-output tensors.
One way to debug such a graph would be to make all tensors outputs, but it seems the easiest way to do that is converting the tflite file to pb with toco and then converting the pb back to tflite with the new outputs specified. This way is not ideal, though, because toco support for tflite -> pb conversion was removed after 1.9, and using versions before that can break (in my case it breaks) on some graphs.
More on this here:
tflite: get_tensor on non-output tensors gives random values
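As an aside, a hedged sketch for recent TF releases (2.5 and later, where tf.lite.Interpreter accepts experimental_preserve_all_tensors; PATH_DEFAULT is the model path from the question):

import numpy as np
import tensorflow as tf

# Ask the interpreter to keep intermediate tensors alive, so get_tensor()
# on non-output indices returns meaningful values instead of stale memory.
interp = tf.lite.Interpreter(model_path=PATH_DEFAULT,
                             experimental_preserve_all_tensors=True)
interp.allocate_tensors()
in_idx = interp.get_input_details()[0]['index']
interp.set_tensor(in_idx, np.zeros((1, 256, 256, 3), dtype=np.float32))
interp.invoke()
after_conv_2d = interp.get_tensor(3)  # intermediate tensor, now preserved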

How conv2D function change the input layer

In my ResNet32 network coded using TensorFlow, the input size is 32 x 32 x 3 and the output of the layer is 32 x 32 x 32. Why are 32 channels used?
tf.contrib.layers.conv2d(
    inputs,
    num_outputs,  # how to determine the number of channels to be used in my layer?
    kernel_size,
    stride=1,
    padding='SAME',
    data_format=None,
    rate=1,
    activation_fn=tf.nn.relu,
    normalizer_fn=None,
    normalizer_params=None,
    weights_initializer=initializers.xavier_initializer(),
    weights_regularizer=None,
    biases_initializer=tf.zeros_initializer(),
    biases_regularizer=None,
    reuse=None,
    variables_collections=None,
    outputs_collections=None,
    trainable=True,
    scope=None
)
Thanks in advance,
The 3 in the input indicates that the input image is RGB (a color image); these are also known as color channels. If it were a black-and-white image it would have been 1 (a monochrome image).
The 32 in the output represents the number of neurons/features/channels you are using, so you are re-representing the image, which arrives in 3 color channels, with 32 channels.
This helps the network learn a more complex and varied set of features of the image; for example, it can help the network learn edges better.
By setting stride=2 you can reduce the spatial size of the input tensor so that the height and width of the output tensor become half of those of the input tensor. That means, if your input tensor shape is (batch, 32, 32, 3) (3 is for the RGB channels) and it passes through a convolution layer having 32 kernels/filters with stride=2, then the shape of the output tensor will be (batch, 16, 16, 32). Alternatively, pooling is also widely used to reduce the output tensor size. A minimal sketch of this shape arithmetic follows below.
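A minimal sketch using tf.keras (the long-deprecated tf.contrib.layers API from the question computes shapes the same way):

import tensorflow as tf

x = tf.random.normal([1, 32, 32, 3])
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=3, strides=2, padding='same')
print(conv(x).shape)  # (1, 16, 16, 32): spatial size halved, 32 output channels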
The ability to learn hierarchical representations by stacking conv layers is considered the key to the success of CNNs. In a CNN, as we go deeper, the spatial size of the tensor decreases while the number of channels increases, which helps handle variations in the appearance of complex target objects. This reduction of spatial size drastically decreases the required number of arithmetic operations and the computation time, with the motive of extracting prominent features that contribute to the final output/decision. However, finding the optimal number of filters/kernels/output channels is time consuming, and therefore people follow proven earlier architectures, e.g. VGG.

Does the shape of a tensor for an image affect the resulting output?

I am representing images of size 100px by 100px, so I can use the shape (None, 100, 100, 3) or the shape (None, 10000, 3).
I can't find any clear explanation on Google; will the following two tensors produce similar results?
(None, 100, 100, 3)
(None, 10000, 3)
I assume either is sufficient, as I would have thought the neural network will still learn just as well if the image is in a single row. Your thoughts?
For the 1st shape: (100, 100, 3)
This is a 3-dimensional tensor. If you are working with Dense layers, they require two-dimensional input. Yes, 1D convolutional layers exist, but they are reserved for totally different use cases.
A convolutional layer passes a kernel over the input in definite strides and gathers spatial information. The kernel's output then gets pooled so that the information is retained but with fewer dimensions.
Hence, the learning with this shape would be far better, as learning of spatial features takes place. This is excellent for image classification.
For the 2nd shape: (10000, 3)
This is a 2-dimensional tensor and would work with 1D convolutional layers and Dense layers.
1D convolutions pass the kernel along only one axis. The features of the image would also get aligned along a single straight line (all the columns would get lined up), which destroys the spatial structure of the image.
Hence, an image is a 2D object and must be kept in its original dimensions to facilitate learning. A 1D tensor has other uses, like text classification, human activity recognition, etc.
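A small sketch contrasting the two layouts from the question: the 2D kernel sees 3x3 pixel neighbourhoods, while the 1D kernel only sees 3 consecutive positions along the flattened axis.

import tensorflow as tf

img_2d = tf.keras.Input(shape=(100, 100, 3))
img_1d = tf.keras.Input(shape=(10000, 3))
y2 = tf.keras.layers.Conv2D(16, kernel_size=3)(img_2d)  # spatial neighbourhoods
y1 = tf.keras.layers.Conv1D(16, kernel_size=3)(img_1d)  # one axis only
print(y2.shape, y1.shape)  # (None, 98, 98, 16) (None, 9998, 16)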

feeding a convolutional neural network with variable sized inputs in tensorflow

I am trying to pass a list of 2D numpy arrays with different sizes to a convolutional neural network using the feed_dict parameter.
x = tf.placeholder(tf.float32, [batch_size, None, None, None])
y = tf.placeholder(tf.float32, [batch_size, 1])
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
optimizer.run(feed_dict={x: batch[0], y: batch[1], keep_prob: 0.5})
and I am getting the following error :
ValueError: setting an array element with a sequence.
I understand that batch[0] has to contain arrays of the same size.
I am trying to find a way to apply the optimization using a variable-sized batch of arrays, but all the suggested solutions ask to resize the arrays, which is not possible in my case because these arrays are not images; they contain DNA fragments of different sizes (any modification to any element of an array would cause a loss of important information).
Does anyone have an idea?
The matrix provided needs to have a consistent size across rows and columns; one row or column cannot be a different size than any other.

Matrix #1    Matrix #2
1 2 3        1 2 3
None         4 5 6
None         7 8 9

No operations will work on Matrix #1, which is essentially what you have. If you want to feed in variable-sized matrices (different sizes among matrices, but consistent sizes within each matrix's rows and columns), this may solve your problem
Args:
    shape: The shape of the tensor to be fed (optional). If the shape is
        not specified, you can feed a tensor of any shape.
Or, if you are looking for a sparse tensor (tf.sparse_placeholder() -- undefined elements are set to zero), this question may help.
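For completeness, a minimal sketch of the unspecified-shape approach (TF1-style, run here through tf.compat.v1; each feed may have its own dimensions, but every single feed must still be one rectangular array):

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

x = tf.placeholder(tf.float32, shape=None)  # any shape accepted at feed time
shape_op = tf.shape(x)

with tf.Session() as sess:
    print(sess.run(shape_op, feed_dict={x: np.ones((3, 5), np.float32)}))  # [3 5]
    print(sess.run(shape_op, feed_dict={x: np.ones((7, 2), np.float32)}))  # [7 2]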