Is there any method to shrink images without cropping in TensorFlow?

I found many methods in tf.image for resizing images, but almost all of them crop, pad, or interpolate. Can the interpolation-based methods shrink images? I just want to shrink my images without cropping.
Thanks a lot!

Well, it seems you need tf.image.resize_images?
res = tf.image.resize_images(images, size, method=tf.image.ResizeMethod.BILINEAR, align_corners=False)
The default resize method is BILINEAR:
import numpy as np
import tensorflow as tf

val = np.random.rand(100, 70, 3)
x = tf.constant(val)
y = tf.image.resize_images(x, (30, 30))
with tf.Session() as sess:
    a = sess.run(y)  # shape (30, 30, 3)

At the end of the day, images are matrices/tensors. To shrink an image directly you will have to do some lossy compression, namely average or max pooling over groups of pixels (matrix values), and then save the results out as pre-processed images for your model. Try looking into the TensorFlow max-pool and average-pool functions:
tf.nn.max_pool(value, ksize, strides, padding, data_format='NHWC', name=None)
tf.nn.avg_pool(value, ksize, strides, padding, data_format='NHWC', name=None)
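As a rough sketch (not from the original answer), here is how average pooling could be used to shrink a single image by a factor of two; the image size and the 2x factor are illustrative assumptions:
import numpy as np
import tensorflow as tf

# Illustrative only: shrink one 100x70 RGB image by 2x with average pooling.
img = np.random.rand(1, 100, 70, 3).astype(np.float32)   # NHWC, batch of 1
x = tf.constant(img)
pooled = tf.nn.avg_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
with tf.Session() as sess:
    small = sess.run(pooled)   # shape (1, 50, 35, 3)
Max pooling works the same way with tf.nn.max_pool; it keeps the strongest response in each window instead of the average.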
Depending on the application, you may just want to "shrink" images using a model that "creates features", such as a Convolutional Neural Network (CNN), whose outputs you can then feed to your model. However, you would still have to process your original-sized images on every run.
Another, more advanced approach is to apply some sort of decomposition, such as the SVD, to get a lower-dimensional representation of your images. However, most such methods are linear techniques and might not preserve everything exactly. Yet another option is to train an autoencoder and then save the lower-dimensional results out as the pre-processed data.
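As a rough, hypothetical illustration of the decomposition idea (the rank of 20 and the image size are arbitrary choices, and this uses NumPy rather than TensorFlow):
import numpy as np

# Low-rank (truncated SVD) approximation of a single grayscale image.
img = np.random.rand(100, 70)                        # stand-in for a grayscale image
U, s, Vt = np.linalg.svd(img, full_matrices=False)
k = 20                                               # arbitrary number of components kept
compressed = (U[:, :k], s[:k], Vt[:k])               # much smaller representation
reconstructed = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # approximate 100x70 image
The compressed factors are what you would store as pre-processed data; the reconstruction shows how much detail the chosen rank preserves.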
Hope this helps!

Related

Why we have target_size for DeepLab while CNN can accept any sizes?

I still have not understood a concept. One reason we use a fully convolutional layer at the end of a CNN is to handle different image sizes during training. My question is: if this is the case, why do we always crop or squeeze images to square sizes at the input? Please do not say the question is a duplicate, that we use square images to make things easier, check pyramid pooling, and so on.
For example, Here's a link
DeepLab can accept images of different sizes. But in its code there is a target_size of 513. Now, if a CNN can accept images of different sizes, why do we need target_size? If this is for converting images into a standard format, why 513?
During training, we should specify a batch size. What is our batch size in this case: (5, None, None, None)? Is it possible to have images of different sizes in a batch?
I read many posts and still, I am confused with these questions:
- How can we train a model on images of different sizes (imagine that the sizes are otherwise standard)? I see some code uses a batch size of one; I don't think that is a solution.
- Is there any snippet of code that shows how we can define batches for a model like FCN so that it accepts a dataset with images of different sizes?
- In this paper: Here's a link my problem is explained, but the authors again resized images to a square format; if we can use batches comprised of images with different sizes, why did they propose the idea of using square images between 180 by 180 and 224 by 224?
Has DeepLab used this part: link to convert images into a standard format, or for some other reason?
width, height = image.size
resize_ratio = 1.0 * 513 / max(width, height)
target_size = (int(resize_ratio * width), int(resize_ratio * height))
I could not find the place in their code where they train the model on the PASCAL dataset.
I expected to find simple Keras or TensorFlow code that clearly shows we can apply a CNN model such as FCN or DeepLab to a dataset such as PASCAL VOC2012 (for segmentation) with images of different sizes, without any resizing or cropping. I am still looking.
Thank you in advance for detailed answers. Please do not repeat answers like: you can use a batch size of one, square images are common and better, you can add black margins to the images, the fully connected layer is the problem, you can use global max pooling, and so on. I am looking for code that works on images of different sizes.
I could not find the place in the DeepLab model on TensorFlow GitHub where it accepts batches of different sizes. here
Also, here FCN is trained on the COCO dataset with a target_size of 320 by 320. Why? FCN should accept any size.
Also, could someone explain how we can have a batch of images of different sizes? Could we have a NumPy array of differently sized images? Batch = [5, None, None, 3], each of the 5 with a different size.
I also found another confusing part in semantic segmentation. Using Keras augmentation we cannot augment an image with more than 4 channels. That means that, using Keras augmentation, we cannot train on the PASCAL dataset with 21 channels. ??

Resnet50 image preprocessing

I am using https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/3 to extract image feature vectors. However, I'm confused about how to preprocess the images before passing them through the module.
Based on the related Github explanation, it's said that the following should be done:
image_path = "path/to/the/jpg/image"
image_string = tf.read_file(image_path)
image = tf.image.decode_jpeg(image_string, channels=3)
image = tf.image.convert_image_dtype(image, tf.float32)
# All other transformations (during training), in my case:
image = tf.random_crop(image, [224, 224, 3])
image = tf.image.random_flip_left_right(image)
# During testing:
image = tf.image.resize_image_with_crop_or_pad(image, 224, 224)
However, using the aforementioned transformations, the results I am getting suggest that something might be wrong. Moreover, the ResNet paper says that the images should be preprocessed by:
A 224×224 crop is randomly sampled from an image or its
horizontal flip, with the per-pixel mean subtracted...
which I can't quite understand. Can someone point me in the right direction?
Looking forward to your answers!
The image modules on TensorFlow Hub all expect pixel values in range [0,1], like you get in your code snippet above. This makes it easy and safe to switch between modules.
Inside the module, the input values are scaled to the range that the network was trained for. The module https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/3 has been published from a TF-Slim checkpoint (see documentation), which uses yet another convention for normalizing inputs than He&al. -- but all this is taken care of.
To demystify the language in He&al.: it refers to the mean R, G and B values aggregated over all pixels of the dataset they studied, following the old wisdom that normalizing inputs to zero mean helps neural networks train better. However, later papers on image classification no longer expended this degree of attention to dataset-specific preprocessing.
The citation from the ResNet paper you mentioned is based on the following explanation from the AlexNet paper:
ImageNet consists of variable-resolution images, while our system requires a constant input dimensionality. Therefore, we down-sampled the images to a fixed resolution of 256×256. Given a rectangular image, we first rescaled the image such that the shorter side was of length 256, and then cropped out the central 256×256 patch from the resulting image. We did not pre-process the images in any other way, except for subtracting the mean activity over the training set from each pixel.
So in the ResNet paper, a similar process consists of taking a 224×224-pixel crop of the image (or of its horizontally flipped version) to ensure the network is given constant-sized inputs, and then centering it by subtracting the mean.
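For illustration only, the paper-style preprocessing described above might be sketched roughly as follows in TF 1.x, reusing the image_path from the question's snippet; the per-channel mean values are the commonly quoted ImageNet RGB means and are an assumption here, not something required by the TF Hub module (which, as noted, expects values in [0,1] and handles its own scaling):
# Rough sketch of He et al.-style preprocessing (NOT needed for the TF Hub module).
image = tf.image.decode_jpeg(tf.read_file(image_path), channels=3)
image = tf.to_float(image)                              # pixel values in [0, 255]
image = tf.random_crop(image, [224, 224, 3])            # random 224x224 crop
image = tf.image.random_flip_left_right(image)          # or its horizontal flip
imagenet_mean = tf.constant([123.68, 116.78, 103.94])   # assumed per-channel RGB means
image = image - imagenet_mean                           # subtract the mean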

How to use the black/white image as the input to tensorflow

When implementing reinforcement learning with TensorFlow, the inputs are black/white images. Each pixel can be represented as a single bit, 1/0.
Can I give the data directly to TensorFlow, with each bit as a feature? Or do I have to expand the bits to bytes before sending them to TensorFlow? I'm new to TensorFlow, so some example code would be nice.
Thanks
You can load the image data directly as you normally would; the image being binary has no effect other than that the input channel width becomes 1.
Whenever you put an image through a convnet, each output filter generally learns features across all input channels, so in the case of a binary image there is a separate kernel for each input-channel/output-channel combination in the first layer (and here there is only 1 input channel).
Each layer is defined by its number of filters, and there is a 2D kernel per input channel for every filter, so you will have weights/parameters equal to input_channels * number_of_filters * filter_dims; here, for the first layer, input_channels is one.
Since you asked for some sample code.
Let your image be in a tensor X; simply use
X_out = tf.layers.conv2d(X, filters=6, kernel_size=[height, width])  # height/width are the kernel dimensions; tf.layers.conv2d creates the filter variables for you
After that you can apply an activation; this will make your output image have 6 channels. If you face any problems or have any doubts, feel free to comment. For theoretical clarification, check out https://www.coursera.org/learn/convolutional-neural-networks/lecture/nsiuW/one-layer-of-a-convolutional-network
Edit
Since the question was about a simple neural net, not a convnet, here is the code for that.
X_train_orig is the variable in which the images are stored at (n_x, n_x) resolution; n_x is used later.
You will need to flatten the input.
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T
This first flattens each image into a row and then transposes the result so that each image becomes a column.
Then you will create the placeholder tensor X as:
X = tf.placeholder(tf.float32, [n_x*n_x, None])  # first dimension matches your input layer; binary pixels can be fed as 0.0/1.0 floats (tf.bool would not work with tf.matmul)
Let W1, b1 be the weight and bias respectively.
Z1 = tf.add(tf.matmul(W1,X),b1) #Linear Transformation step
A1 = tf.nn.relu(Z1) #Activation Step
And you keep on building your graph. I think that answers your question; if not, let me know.
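Putting the snippets above together, a minimal self-contained sketch might look like the following; the image size of 28, the hidden-layer width of 25, and the random data are illustrative assumptions, not values from the question:
import numpy as np
import tensorflow as tf

n_x = 28                                                     # illustrative image side length
X_train_orig = np.random.randint(0, 2, (100, n_x, n_x))      # 100 fake binary images
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T.astype(np.float32)

X = tf.placeholder(tf.float32, [n_x * n_x, None])            # one column per image
W1 = tf.Variable(tf.random_normal([25, n_x * n_x]) * 0.01)   # weights for a 25-unit hidden layer
b1 = tf.Variable(tf.zeros([25, 1]))                          # bias

Z1 = tf.add(tf.matmul(W1, X), b1)                            # linear step
A1 = tf.nn.relu(Z1)                                          # activation step

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(A1, feed_dict={X: X_train_flatten})       # shape (25, 100)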

MNIST-like issue. Convolutional Neural Network

This should be easy for some, but I'm a bit new to TensorFlow, and all my research has brought me to multi-thousand-line repos, so I'm just curious whether there is a simpler alternative for a beginner. I had an idea which takes a 200x260 color image as input and outputs a one-hot vector between 1 and 10. I realize it is very similar to MNIST, but TensorFlow does not have any documentation on how the mnist library turns its images into a usable form. Does anybody have any ideas on how to turn a folder of about 200 images (yes, I know, small) into a usable form? I already have my one-hot vectors. Also, I set my placeholder shape as tf.placeholder(tf.float32, [None, 200, 260, 3]). Would that work? I would really prefer to maintain color as well. Thanks for any tips!
First, you can import all of your images using imread from skimage.
For example:
import skimage.io
my_image = skimage.io.imread('./path/myimage.png')
Then, if all of them are already the size you want (200x260), you can normalize them by dividing by 255 (so the values lie between 0 and 1). If not, you can use resize from skimage; this will resize and normalize the images for you.
For example:
import skimage.transform
my_image = skimage.transform.resize(my_image, (200, 260))
To visualize an image, you can use imshow from matplotlib.pyplot to plot it.
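For example, assuming my_image was loaded or resized as above:
import matplotlib.pyplot as plt
plt.imshow(my_image)   # display the (200, 260, 3) image
plt.show()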
For a convenient next_batch function like the one built into TensorFlow's MNIST helpers, you can use the following code:
i = 0
def next_batch(batch_size):
    global i  # needed because i is reassigned inside the function
    x = training_images[i:i + batch_size]
    y = training_labels[i:i + batch_size]
    i = (i + batch_size) % len(training_images)
    return x, y
Then you can create your CNN and train on the images. The placeholder you created for X looks right.
I also struggled with this in the beginning, but the best way that I know to get data into TensorFlow is to convert your images into the TFRecord format, especially if you have a large dataset that doesn't fit into RAM. That way TensorFlow can load your data as needed (you need to provide input functions to convert the files back).
Although this might not be, and certainly isn't, the easiest way, it is probably the best in the long run in case you want to add more images.
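As a rough sketch of what writing such a file could look like in TF 1.x (the file names and labels here are made up for illustration):
import tensorflow as tf

# Illustrative only: store raw image bytes plus an integer label per example.
with tf.python_io.TFRecordWriter('train.tfrecord') as writer:
    for path, label in [('img0.png', 3), ('img1.png', 7)]:   # made-up examples
        with open(path, 'rb') as f:
            img_bytes = f.read()
        example = tf.train.Example(features=tf.train.Features(feature={
            'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_bytes])),
            'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        }))
        writer.write(example.SerializeToString())
Reading the records back requires a parsing function (e.g. with tf.parse_single_example) that undoes this encoding.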
The easiest way would just be to load your images using Pillow or any other image library (I'm assuming you're using TensorFlow with Python) and hand them over to TensorFlow when running your session.
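A minimal sketch of that simpler route, assuming PNG files on disk and the 200x260x3 placeholder from the question (the file names are made up):
import numpy as np
import tensorflow as tf
from PIL import Image

x = tf.placeholder(tf.float32, [None, 200, 260, 3])

paths = ['img0.png', 'img1.png']                             # made-up file names
batch = np.stack([np.asarray(Image.open(p).convert('RGB').resize((260, 200)),
                             dtype=np.float32) / 255.0       # normalize to [0, 1]
                  for p in paths])

with tf.Session() as sess:
    sess.run(x, feed_dict={x: batch})                        # stand-in for running the real model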

Neural Network with my own dataset

I have downloaded many face images from the web. In order to learn TensorFlow, I want to feed those images to a simple fully-connected neural network with a single hidden layer. I found example code here.
Since I am a beginner, I don't know how to train, evaluate, and test the network with the downloaded images. The code's author used a '.mat' file and a '.pkl' file. I don't understand how he organized the training and test sets.
In order to run the code with my images:
Do I need to divide my images into training, test, and validation folders and turn each folder into a .mat file? How am I going to provide labels for the training?
Besides, I don't understand why he used a '.pkl' file.
All in all, I would like to change this code so that I can measure test, training, and validation classification performance with my image dataset.
It might be an easy question, but it is important for me as it is a starting step. Thanks for your understanding.
First, you don't have to use .mat files or pickles. TensorFlow expects NumPy arrays.
For instance, let's say you have 70000 images of size 28x28 (=784 dimensions) belonging to 10 classes. Let's also assume that you'd like to train a simple feedforward neural network to classify the images.
The first step would be to split the images between train and test (and validation, but let's put that aside for the sake of simplicity). For the sake of the example, let's imagine that you randomly chose 60000 images for your training set and 10000 for your test set.
The second step would be to ensure that your data has the right format. Here, you'd like your training set to consist of one NumPy array of shape (60000, 784) for the images and another of shape (60000, 10) for the labels (if you use one-hot encoding to represent your classes). As for your test set, you should have an array of shape (10000, 784) for the images and one of shape (10000, 10) for the labels.
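As a rough illustration with made-up arrays (the 60000/10000 split here is sequential rather than random, just to keep the sketch short):
import numpy as np

images = np.random.rand(70000, 784).astype(np.float32)     # stand-in for the real images
labels = np.random.randint(0, 10, 70000)                   # stand-in class indices

train_images, test_images = images[:60000], images[60000:]
train_labels = np.eye(10, dtype=np.int64)[labels[:60000]]  # one-hot, shape (60000, 10)
test_labels = np.eye(10, dtype=np.int64)[labels[60000:]]   # one-hot, shape (10000, 10)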
Once you have these big NumPy arrays, you should define placeholders that will allow you to feed data to your network during training and evaluation.
images = tf.placeholder(tf.float32, shape=[None, 784])
labels = tf.placeholder(tf.int64, shape=[None, 10])
The None here means that you can feed a batch of any size, i.e. as many images as you want, as long as your NumPy array is of shape (anything, 784).
The third step consists of defining your model as well as the loss function and the optimizer.
The fourth step consists of training your network by feeding it random batches of data through the placeholders created above. As your network trains, you can periodically print its performance, such as the training loss/accuracy as well as the test loss/accuracy.
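A minimal sketch of steps three and four, assuming the placeholders above and the arrays from the earlier sketch; a single linear layer stands in for the real model, and the learning rate, batch size, and step count are arbitrary:
import numpy as np
import tensorflow as tf

# Model, loss, and optimizer (step three) - a single linear layer for illustration.
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(images, W) + b
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=tf.cast(labels, tf.float32), logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

# Training loop (step four), feeding random batches through the placeholders.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        idx = np.random.choice(len(train_images), 100)   # random batch of 100
        sess.run(train_op, feed_dict={images: train_images[idx],
                                      labels: train_labels[idx]})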
You can find a complete and very simple example here.