combine dimensions without distortion pytorch images - numpy

I have a PyTorch tensor with shape [shapes, images, H, W, C];
that is, for each shape I have 12 images. I want to reshape it to [shapes*images, H, W, C] (stacking all images together) while keeping the images undistorted.
When I do
shape, images, H, W, c = img.shape
img = img.view((shape * images, W, H,c))
the images are corrupted.
How can I preserve the images?
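A likely cause is that the view call above lists W before H, so the same memory is reinterpreted with a different row length. Below is a minimal sketch in NumPy, with made-up sizes, showing that merging only the two leading axes keeps every image intact. In PyTorch the equivalent should be img.view(shape * images, H, W, c) or img.flatten(0, 1), assuming the tensor is contiguous (otherwise use reshape).

```python
import numpy as np

# Hypothetical sizes: 2 shapes, 12 images each, 4x6 pixels, 3 channels.
shapes, images, H, W, C = 2, 12, 4, 6, 3
img = np.arange(shapes * images * H * W * C).reshape(shapes, images, H, W, C)

# Merge only the two leading axes; H, W, C keep their original order.
merged = img.reshape(shapes * images, H, W, C)

# Image j of shape i lands at index i * images + j, pixel for pixel.
assert np.array_equal(merged[1 * images + 5], img[1, 5])

# Swapping H and W in the target shape reinterprets the same memory with a
# different row length, which scrambles every non-square image.
wrong = img.reshape(shapes * images, W, H, C)
```

If the goal were instead to reorder axes (for example H, W, C to C, H, W), that would need a transpose or permute and not a reshape.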


Transform 3D Tensor to 4D

I am using the VGG16 Model, which expects a 4D Tensor as input. When I call, ytrain, ...) my xtrain is a list of 3D Tensor [size, size, features] - so in this case: [224,224,3]
What I want is 4D Tensors with [len(images), size, size, features]
How could I modify my code to get there?
I tried tf.expand_dims and tf.concat but it didn't work.
# Transforming my image to a 3D Tensor
image =
image = tf.image.decode_jpeg(image, channels=3)
image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE])
image = image / 255.0
Error msg after
Error when checking input: expected input_1 to have 4 dimensions, but got array with shape (224, 224, 3)
It looks like you are reading in only a single image and passing that. If that's the case, you can add a dimension of 1 as the first axis of the image. There are lots of ways to do that.
Using reshape:
image = image.reshape(1, 224, 224, 3)
Using some fancy numpy slicing notation to add an axis (personal favorite):
image = image[None, ...]
Using numpy.expand_dims() as explained in Abhijit's answer.
I imagine you want to be reading a bunch of images in though. Possibly an issue with your input process? Can you wrap your read in a loop and read multiple files? Something like:
images = []
for file in image_files:
    image =
    # ...
images = np.asarray(images)
numpy.expand_dims(image, axis=0)
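As a minimal sketch of both cases, assuming the images are already decoded and resized to the same size: the zero arrays below are stand-ins for real image data, and xtrain here is a hypothetical list, not the asker's actual pipeline. For TensorFlow tensors, tf.stack should play the same role as np.stack.

```python
import numpy as np

IMG_SIZE = 224

# Stand-ins for decoded, resized images; real code would load files here.
xtrain = [np.zeros((IMG_SIZE, IMG_SIZE, 3), dtype=np.float32) for _ in range(5)]

# Several images: stack along a new leading axis -> (5, 224, 224, 3).
batch = np.stack(xtrain, axis=0)

# One image: add a batch axis of length 1 -> (1, 224, 224, 3).
single = np.expand_dims(xtrain[0], axis=0)
same = xtrain[0][None, ...]
```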

Read different size and format images to form a queue in Tensorflow

I have a problem with TensorFlow. I want to read some BMP and JPEG images into a queue in TensorFlow, and these images have different sizes.
The input is a list of image paths and a list of labels.
Currently I use "tf.train.slice_input_producer" (to generate the queue), "tf.image.decode_image" (to read images in different formats), and "tf.image.resize_images" (to resize the images to the same size).
However, I have some problems. "tf.image.resize_images" needs the image shape, but "tf.image.decode_image" does not provide one. If I set a fixed image shape manually, reading images of a different size raises an error.
Is there a better way to read images of different sizes and formats in TensorFlow?
images = tf.convert_to_tensor(image_list)
labels = tf.convert_to_tensor(label_list)
input_queue = tf.train.slice_input_producer([images, labels]) #Slice_input producer shuffles the data by default.
image = tf.read_file(input_queue[0])
image = tf.image.decode_image(image, channels=3) # for different format
label = input_queue[1]
image.set_shape([640, 480, 3]) # if I dont set the shape, "tf.image.resize_images" cannot work, if I set it, it is fixed...
image = tf.image.resize_images(image, [160, 120])
image_batch, label_batch = tf.train.batch([image , label], batch_size=batch_size)
return image_batch, label_batch

Tensorflow Object Detection API 1-channel image

Is there any way to use the pre-trained models in the TensorFlow Object Detection API, which were trained on RGB images, for single-channel grayscale (depth) images?
I tried the following approach to perform object detection on Grayscale (1 Channel images) using a pre-trained model (faster_rcnn_resnet101_coco_11_06_2017) in Tensorflow. It did work for me.
The model was trained on RGB Images, So I just had to modify certain code in object_detection_tutorial.ipynb, available in the Tensorflow Repo.
First Change:
Note that the existing code in the ipynb was written for 3-channel images, so change the load_image_into_numpy_array function from:
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)
To:
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    channel_dict = {'L': 1, 'RGB': 3}  # 'L' for grayscale, 'RGB' for 3-channel images
    return np.array(image.getdata()).reshape(
        (im_height, im_width, channel_dict[image.mode])).astype(np.uint8)
Second Change: Grayscale images have only data in 1 channel. To perform object detection we need 3 channels(the inference code was written for 3 channels)
This can be achieved in two ways.
a) Duplicate the single channel data into two more channels
b) Fill the other two channels with Zeros.
Both of them will work, I used the first method
In the ipynb, go to the section where you read the images and convert them into numpy arrays (the for loop at the end of the ipynb).
Change the code From:
for image_path in TEST_IMAGE_PATHS:
    image =
    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    image_np = load_image_into_numpy_array(image)
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
To this:
for image_path in TEST_IMAGE_PATHS:
    image =
    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    image_np = load_image_into_numpy_array(image)
    if image_np.shape[2] != 3:
        image_np = np.broadcast_to(image_np, (image_np.shape[0], image_np.shape[1], 3)).copy()  # Duplicating the content
        ## Adding zeros to the other channels
        ## This adds red color in the background, not recommended
        # z = np.zeros(image_np.shape[:-1] + (2,), dtype=image_np.dtype)
        # image_np = np.concatenate((image_np, z), axis=-1)
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
That's it. Run the file and you should see the results.
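The two channel-filling options described above can be checked in isolation. This is a minimal sketch on a tiny made-up array (the 2x3 image and the variable names are hypothetical), not the notebook's actual data.

```python
import numpy as np

# Hypothetical 2x3 grayscale image with a trailing channel axis of length 1.
gray = np.arange(6, dtype=np.uint8).reshape(2, 3, 1)

# Option a: copy the single channel into all three channels.
duplicated = np.broadcast_to(gray, gray.shape[:2] + (3,)).copy()

# Option b: keep the data in channel 0 and zero-fill the other two.
zeros = np.zeros(gray.shape[:2] + (2,), dtype=gray.dtype)
zero_filled = np.concatenate((gray, zeros), axis=-1)

# Add the batch axis the model expects: (1, height, width, 3).
batch = np.expand_dims(duplicated, axis=0)
```

The .copy() matters because np.broadcast_to returns a read-only view; copying gives a normal writable array.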
These are my results

regarding the image scaling operations for running vgg model

While reading the TensorFlow implementation of the VGG model, I noticed that the author performs some scaling operations on the input RGB images, such as the following. I have two questions: what does VGG_MEAN mean, and how are those values obtained? Secondly, why do we need to subtract these mean values to get BGR?
VGG_MEAN = [103.939, 116.779, 123.68]
def build(self, rgb):
    """
    load variable from npy to build the VGG
    :param rgb: rgb image [batch, height, width, 3] values scaled [0, 1]
    """
    start_time = time.time()
    print("build model started")
    rgb_scaled = rgb * 255.0
    # Convert RGB to BGR
    red, green, blue = tf.split(3, 3, rgb_scaled)
    assert red.get_shape().as_list()[1:] == [224, 224, 1]
    assert green.get_shape().as_list()[1:] == [224, 224, 1]
    assert blue.get_shape().as_list()[1:] == [224, 224, 1]
    bgr = tf.concat(3, [
        blue - VGG_MEAN[0],
        green - VGG_MEAN[1],
        red - VGG_MEAN[2],
    ])
    assert bgr.get_shape().as_list()[1:] == [224, 224, 3]
First off: the opencv code you'd use to convert RGB to BGR is:
from cv2 import cvtColor, COLOR_RGB2BGR
img = cvtColor(img, COLOR_RGB2BGR)
In your code, the code that does this is:
bgr = tf.concat(3, [
    blue - VGG_MEAN[0],
    green - VGG_MEAN[1],
    red - VGG_MEAN[2],
])
Images aren't [Height x Width] matrices, they're [H x W x C] cubes, where C is the color channel. In RGB to BGR, you're swapping the first and third channels.
Second: you don't subtract the mean to get BGR. You subtract it to center each color channel's values around zero, so values will be in the range of, say, [-125, 130], rather than the range of [0, 255].
See: Subtract mean from image
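To make the two separate steps concrete, here is a minimal NumPy sketch of the same preprocessing: the channel swap and the mean subtraction. The function name preprocess and the pure-red test image are made up for illustration; only the VGG_MEAN values come from the question.

```python
import numpy as np

# Per-channel means in BGR order, as given in the question.
VGG_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def preprocess(rgb):
    """Map an RGB batch [batch, H, W, 3] in [0, 1] to mean-centered BGR."""
    rgb_scaled = rgb * 255.0
    bgr = rgb_scaled[..., ::-1]   # reverse the channel axis: RGB -> BGR
    return bgr - VGG_MEAN         # broadcasts over the last axis

# Hypothetical input: one 224x224 image that is pure red.
rgb = np.zeros((1, 224, 224, 3), dtype=np.float32)
rgb[..., 0] = 1.0
out = preprocess(rgb)
```

For this input the blue and green channels come out negative and the red channel, now last, comes out positive, which shows the values are centered around zero instead of living in [0, 255].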
I wrote a python script to get the BGR channel means over all images in a directory, which might be useful to you:
The mean values come from averaging each color channel over the training data.
The RGB -> BGR conversion is there because of the OpenCV channel convention.
The model is ported from Caffe, which I believe relies on OpenCV functionalities and uses the OpenCV convention of BGR channels.

TensorFlow MNIST example feeding own images

I am trying to learn TensorFlow, so I was trying to understand their example with smaller dimensions. Suppose I have image1, image2, image3 three 28x28 matrices which hold grayscale values (0..255). image1 is the training image, image2 is the validation image, and image3 is the test image. I was trying to understand how I can feed my own images into the MNIST example they have here.
I am particularly interested in replacing the following line with my own imageset:
X, Y, testX, testY = mnist.load_data(one_hot=True)
Your help is much appreciated.
Suppose your image is a numpy array, of shape [1, 28, 28, 1].
You can just feed this numpy array to the node X or testX. Even though X is not a placeholder, you can provide its value to TensorFlow.
X_value = ... # numpy array
# ... same for Y_value, testX_value, testY_value
feed_dict = {X: X_value, Y: Y_value, testX: testX_value, testY: testY_value}, feed_dict=feed_dict)
mnist.load_data(one_hot=True) is nothing but some preprocessing of the data. If you have some images in hand, you can just make them an ndarray and feed them into the graph. For example, if you have a node named images, you can feed the images using feed_dict = {images: some_image}.
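A minimal sketch of preparing such arrays from a raw 28x28 matrix follows. The helper names to_batch and one_hot, the label 7, and the scaling to [0, 1] are assumptions for illustration; check what value range and label format the example you are adapting actually expects.

```python
import numpy as np

# Stand-in for a 28x28 grayscale matrix with values 0..255.
image1 = np.random.randint(0, 256, size=(28, 28))

def to_batch(image):
    """Scale to [0, 1] (an assumption) and reshape to [1, 28, 28, 1]."""
    return (image.astype(np.float32) / 255.0).reshape(1, 28, 28, 1)

def one_hot(label, num_classes=10):
    """Return a [1, num_classes] one-hot row for an integer label."""
    row = np.zeros((1, num_classes), dtype=np.float32)
    row[0, label] = 1.0
    return row

X_value = to_batch(image1)
Y_value = one_hot(7)   # hypothetical label for image1
```

The same two helpers would apply to image2 and image3 to build the validation and test arrays.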