I've recently used the MNIST data set to build a model for predicting handwritten digits. I now want to use my own images. My images are 28x28 pixels (like the MNIST set), but when I try to convert them into a tensor using tf.image.decode_png, I get a 3D tensor of shape [28, 28, 4]. From reading around, I believe the extra 4 is colour-channel related. How can I convert to [28, 28], ignoring any colour information (if that is the actual problem; maybe I'm missing something entirely)?
Thank you!
As you correctly guessed, you got a 3D tensor because your image has colour channels -- the 4 is RGBA (red, green, blue, plus an alpha channel). You can use something like tf.image.rgb_to_grayscale on the RGB channels (or simply decode with a single channel) to get what you want.
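For example, a minimal sketch (assuming the PNGs really are 28x28 and that discarding the colour and alpha information is acceptable) that decodes straight to a single grayscale channel and then drops it:

import tensorflow as tf

# Decode directly to one grayscale channel instead of RGBA.
image_string = tf.read_file("my_digit.png")             # hypothetical file name
image = tf.image.decode_png(image_string, channels=1)   # shape [28, 28, 1]
image = tf.squeeze(image, axis=-1)                       # shape [28, 28]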
This seems so basic, but for some reason, I can't find any clear documentation on it.
So let's say I know my ONNX model wants an input of shape [245, 245, 3]. The second argument in the constructor Ort::Value::CreateTensor wants a linear array of the data to fill the tensor. What is the order of the linear array?
For example, are the first three values in the linear array the BGR values for the 0-th pixel in the image, or are they the B-channel values of the first three pixels in the image? And as for the ordering of pixels in the image: row-major?
The short answer is: ONNX only supports NCHW.
As a reference, please check the section My converted TensorFlow model is slow - why? on onnxruntime.ai. This is the only "official" material talking about the data format that I have found so far.
It's row-major. The format of inputs in ONNX is NCHW, where C is the number of channels; in this case C = 3. The ordering of the channels (BGR or RGB) depends on the model. For example, the YOLO model takes an image of 3 (RGB) x 416px x 416px.
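As an illustration of what that layout means for the flat buffer (a sketch in numpy rather than the onnxruntime C++ API; the array contents are made up): for an NCHW input, the buffer holds all of channel 0 in row-major order, then all of channel 1, and so on.

import numpy as np

# HWC image as most loaders give it to you (values are made up).
hwc = np.arange(2 * 3 * 3, dtype=np.float32).reshape(2, 3, 3)   # [H=2, W=3, C=3]
# Reorder to CHW, add the batch dimension -> NCHW, then flatten row-major.
nchw = np.transpose(hwc, (2, 0, 1))[np.newaxis, ...]            # [1, 3, 2, 3]
flat = np.ascontiguousarray(nchw).ravel()                       # the buffer you would hand to CreateTensor
# flat[0:6] is the entire first channel, not the 3 channel values of pixel (0, 0).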
I still have not understood a concept. One reason we use fully convolutional layers at the end of a CNN is to handle different image sizes during training. My question is: if this is the case, why do we always crop or squeeze images to square sizes at the input? Please do not say the question is a duplicate, that we use square images to make things easier, that I should check pyramid pooling, and so on.
For example, here's a link.
DeepLab can accept images of different sizes, but in its code there is a target_size of 513. Now, if a CNN can accept images of different sizes, why do we need a target_size? If it is for converting images into a standard format, why 513?
During training, we have to specify a batch size. What is our batch shape in this case: (5, None, None, None)? Is it possible to have images of different sizes in one batch?
I have read many posts and I am still confused about these questions:
- How can we train a model on images of different sizes (assume the sizes themselves are standard)? I see some code uses a batch size of one; I do not think that is a solution.
- Is there any snippet of code that shows how to define batches for a model like FCN so that it accepts a dataset with different image sizes?
- In this paper (here's a link) my problem was explained, but the authors again resized images into a square format. If we can use batches comprising images of different sizes, why did they propose the idea of using square images between 180 by 180 and 224 by 224?
Has DeepLab used this part (link) to convert images into a standard format, or for some other reason?
width, height = image.size
resize_ratio = 1.0 * 513 / max(width, height)
target_size = (int(resize_ratio * width), int(resize_ratio * height))
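(In other words, the longer side is scaled to 513 pixels and the aspect ratio is preserved: a 1024x768 image, for example, gets resize_ratio = 513/1024 ≈ 0.50 and target_size = (513, 384).)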
I could not find the place in their code where they train the model on the PASCAL dataset.
I expected to find simple Keras or TensorFlow code that clearly shows we can apply a CNN model such as FCN or DeepLab to a dataset such as PASCAL VOC2012 (for segmentation) with images of different sizes, without any resizing or cropping. I am still looking.
Thank you in advance for detailed answers. Please do not repeat answers like "you can use a batch size of one", "square images are common and better", "you can add black margins to the images", "the fully connected layer is the problem", "you can use global max pooling", and so on. I am looking for code that works on images of different sizes.
I could not find the place in the DeepLab model on the TensorFlow GitHub where it accepts batches with different sizes (here).
Also, in this FCN (here) the model is trained on the COCO dataset with a target_size of 320 by 320. Why? An FCN should accept any size.
Also, could someone explain how we can have a batch of images with different sizes? Could we have an np array of differently sized images, i.e. a batch of shape [5, None, None, 3] where each of the 5 images has a different size?
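(To make the question concrete, here is a tiny sketch with made-up sizes of what numpy itself gives you for such a collection: only an object array holding the five images, not a dense [5, None, None, 3] block, which is exactly why I am asking how a framework batches them.)

import numpy as np

# Five stand-in images with different heights and widths.
images = [np.zeros((h, w, 3), dtype=np.float32)
          for h, w in [(100, 150), (200, 120), (160, 160), (90, 300), (250, 250)]]

batch = np.array(images, dtype=object)   # an object array of 5 arrays, not one dense block
print(batch.shape)                        # (5,)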
I also found another confusing part in semantic segmentation: using Keras augmentation we cannot augment an image with more than 4 channels. Does that mean that, using Keras augmentation, we cannot train on the PASCAL dataset with its 21 channels?
I am using https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/3 to extract image feature vectors. However, I'm confused about how to preprocess the images before passing them through the module.
Based on the related GitHub explanation, the following should be done:
import tensorflow as tf

image_path = "path/to/the/jpg/image"
image_string = tf.read_file(image_path)
image = tf.image.decode_jpeg(image_string, channels=3)
image = tf.image.convert_image_dtype(image, tf.float32)
# All other transformations (during training), in my case:
image = tf.random_crop(image, [224, 224, 3])
image = tf.image.random_flip_left_right(image)
# During testing:
image = tf.image.resize_image_with_crop_or_pad(image, 224, 224)
However, using the aforementioned transformations, the results I am getting suggest that something might be wrong. Moreover, the ResNet paper says that the images should be preprocessed as follows:
A 224×224 crop is randomly sampled from an image or its
horizontal flip, with the per-pixel mean subtracted...
and I can't quite understand what that means. Can someone point me in the right direction?
Looking forward to your answers!
The image modules on TensorFlow Hub all expect pixel values in range [0,1], like you get in your code snippet above. This makes it easy and safe to switch between modules.
Inside the module, the input values are scaled to the range that the network was trained for. The module https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/3 has been published from a TF-Slim checkpoint (see documentation), which uses yet another convention for normalizing inputs than He et al. -- but all this is taken care of.
To demystify the language in He et al.: it refers to the mean R, G and B values aggregated over all pixels of the dataset they studied, following the old wisdom that normalizing inputs to zero mean helps neural networks train better. However, later papers on image classification no longer expended this degree of attention to dataset-specific preprocessing.
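As a minimal sketch of what that per-channel normalization amounts to (the data here is a made-up float32 tensor standing in for a whole training set; in practice the mean is computed once over the real dataset and then reused):

import tensorflow as tf

dataset = tf.random_uniform([100, 224, 224, 3])          # placeholder for training images
channel_mean = tf.reduce_mean(dataset, axis=[0, 1, 2])   # shape [3]: mean R, G and B values
normalized = dataset - channel_mean                       # zero-mean inputs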
The quotation from the ResNet paper you mentioned is based on the following explanation from the AlexNet paper:
ImageNet consists of variable-resolution images, while our system requires a constant input dimensionality. Therefore, we down-sampled the images to a fixed resolution of 256×256. Given a rectangular image, we first rescaled the image such that the shorter side was of length 256, and then cropped out the central 256×256 patch from the resulting image. We did not pre-process the images in any other way, except for subtracting the mean activity over the training set from each pixel.
So in the ResNet paper, a similar process consists in taking a 224×224-pixel part of the image (or of its horizontally flipped version) to ensure the network is given constant-sized inputs, and then centering the values by subtracting the mean.
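If it helps, here is a rough sketch of the AlexNet-style pipeline quoted above, using the same TF 1.x image ops as the snippet earlier in the question (mean_image is assumed to be a per-pixel mean precomputed over the training set):

import tensorflow as tf

# Rescale so the shorter side is 256, take the central 256x256 patch,
# then subtract the precomputed per-pixel training-set mean.
def alexnet_style_preprocess(image, mean_image):
    shape = tf.shape(image)
    h = tf.cast(shape[0], tf.float32)
    w = tf.cast(shape[1], tf.float32)
    scale = 256.0 / tf.minimum(h, w)
    new_size = tf.cast(tf.round(tf.stack([h * scale, w * scale])), tf.int32)
    image = tf.image.resize_images(image, new_size)                    # shorter side -> 256
    image = tf.image.resize_image_with_crop_or_pad(image, 256, 256)    # central 256x256 crop
    return image - mean_image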
I have an image data tensor with shape B×H×W×C and a position tensor with shape B×H×W×2. The values in the position tensor are pixel coordinates, and I want to sample pixels from the image data tensor according to these coordinates. I have tried doing this by reshaping the tensor to a one-dimensional tensor, but I find it really inconvenient. I wonder whether I could implement it with a more convenient approach, like a matrix mapping (e.g. remap in OpenCV).
I would first ask whether you are sure the position matrix isn't redundant. If the position matrix entries simply correspond to the pixel locations in the image array, then, for a given application, however you access the position matrix could be applied directly to the image data instead.
Perhaps as a starting point, running
sess = tf.Session()
np_img, np_pos = sess.run([tf_img, tf_pos], feed_dict={...})
will convert tensors to numpy arrays, which may make your operations easier.
Otherwise, a 1-D tensor isn't that bad, and there are TF functions that make reshaping easy.
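For instance, once you have numpy arrays, advanced indexing can do the sampling directly. Here is a sketch with made-up shapes, assuming the positions are (row, column) pairs (if you want to stay in TensorFlow, tf.gather_nd can express the same lookup):

import numpy as np

B, H, W, C = 2, 4, 5, 3                                          # made-up sizes
np_img = np.random.rand(B, H, W, C).astype(np.float32)           # [B, H, W, C]
np_pos = np.stack([np.random.randint(0, H, size=(B, H, W)),      # row coordinates
                   np.random.randint(0, W, size=(B, H, W))],     # column coordinates
                  axis=-1)                                        # [B, H, W, 2]

batch_idx = np.arange(B)[:, None, None]                           # broadcasts over H and W
sampled = np_img[batch_idx, np_pos[..., 0], np_pos[..., 1]]       # [B, H, W, C]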
In many documents there are images of each filter, like "Example".
I want to visualize my convolution filters like the "Example" image, but I don't know how to do that.
How can I visualize my convolution filters?
Think of each convolutional filter as an x-by-x matrix, where x is the size of the filter. So your task is to put those matrices on a plot grid. I have made an example of how to plot convolutional filters and the output of convolutional layers using the MNIST dataset; see the conviz repository on GitHub. Hope it helps you.
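As a minimal sketch of that idea (assuming the kernel weights have already been fetched into a numpy array of shape [x, x, in_channels, out_channels]; the random array below just stands in for real weights):

import numpy as np
import matplotlib.pyplot as plt

filters = np.random.rand(5, 5, 1, 32)                # stand-in for fetched conv weights
n, cols = filters.shape[-1], 8
rows = int(np.ceil(n / cols))
fig, axes = plt.subplots(rows, cols, figsize=(cols, rows))
for i, ax in enumerate(axes.flat):
    if i < n:
        ax.imshow(filters[:, :, 0, i], cmap="gray")  # i-th filter, first input channel
    ax.axis("off")
plt.show()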
No, those aren't the filters. You can read this paper, which describes the procedure for converting layer L's filters into these images.
In short, it takes some filter and uses a technique similar to, but not the same as, back-propagation to convert the filter into an image.
The result of the 2D convolution is a tensor of shape [batch, height, width, channels]. An image can be represented as a [height, width, channels] matrix. So all you need to do is grab a few images from your batch and add them to your summary with tf.summary.image().
For a tutorial on how to do this, take a look at this answer.
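A minimal sketch of that approach (TF 1.x summaries assumed; conv_out below is just a placeholder for your actual convolution output):

import tensorflow as tf

conv_out = tf.random_uniform([8, 28, 28, 16])                       # placeholder conv output
one_channel = conv_out[:, :, :, 0:1]                                # single channel -> [batch, H, W, 1]
tf.summary.image("conv_activations", one_channel, max_outputs=3)    # log 3 of the batch images to TensorBoard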