RGB to gray filter doesn't preserve the shape - numpy

I have 209 cat/non-cat images and I am looking to augment my dataset. To do so, I am using the following code to convert each NumPy array of RGB values to greyscale. The problem is that I need the dimensions to stay the same for my neural network to work, but the output has different dimensions. The code:
import numpy as np

def rgb2gray(rgb):
    return np.dot(rgb[..., :3], [0.2989, 0.5870, 0.1140])
Normal image dimensions: (64, 64, 3)
After applying the filter: (64, 64)
I know that the missing 3 is probably the RGB channel axis or something similar, but I cannot find a way to have a "dummy" third dimension that would not affect the actual image. Can someone provide an alternative to the rgb2gray function that maintains the dimensions?

The whole point of applying that greyscale filter is to reduce the number of channels from 3 (i.e. R,G and B) down to 1 (i.e. grey).
If you really, really want to get a 3-channel image that looks just the same but takes 3x as much memory, just make all 3 channels equal:
grey = np.dstack((grey, grey, grey))
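A minimal sketch of both ways to keep a third axis, assuming a (64, 64, 3) input as in the question. The function names rgb2gray_1ch and rgb2gray_3ch are illustrative and do not come from the original posts.

```python
import numpy as np

WEIGHTS = np.array([0.2989, 0.5870, 0.1140])

def rgb2gray_1ch(rgb):
    """Grey image with a singleton channel axis: (H, W, 3) -> (H, W, 1)."""
    grey = np.dot(rgb[..., :3], WEIGHTS)
    return grey[..., np.newaxis]

def rgb2gray_3ch(rgb):
    """Grey image repeated over three channels: (H, W, 3) -> (H, W, 3)."""
    grey = np.dot(rgb[..., :3], WEIGHTS)
    return np.dstack((grey, grey, grey))

rgb = np.random.rand(64, 64, 3)  # random stand-in for one image
print(rgb2gray_1ch(rgb).shape)   # (64, 64, 1)
print(rgb2gray_3ch(rgb).shape)   # (64, 64, 3)
```

The three-channel version only makes sense if the network must accept the original RGB images and the grey ones through the same input layer; otherwise the single-channel version carries the same information in a third of the memory.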

def rgb2gray(rgb):
    # Row k holds channel k's weight repeated 3 times, so every output channel gets the same grey value.
    weights = [[0.2989] * 3, [0.5870] * 3, [0.1140] * 3]
    return np.dot(rgb[..., :3], weights)

Related

How to add random values (a random number at a specific spot) to an X-ray image with TensorFlow

I want to predict disease, and I would like to add some noise or disruption to the image, either at a specific spot or at random spots. Is there a method or solution for this?
Is there any way to add noise (random values) to an image with TensorFlow?
I read the image, convert it to an array, make a copy of it, and then add some numbers to it. Is that right?
I have also noticed that after the conversion the array contains only zeros and ones, even though the image is in RGB form.
I expect some values in the array (and therefore in the image) to change to other values, so that imshow shows some noise (different from Gaussian noise) and the input to the model differs from the original image.
I have tried the code below, but the operands do not match between (224,224,3) and (224,224).
When I set color_mode to grayscale the operation works, but I do not see much change in the image.
Replacing img.size with img.height did not work either.
img = tf.keras.preprocessing.image.load_img("/content/person1_bacteria_2.jpeg",color_mode="rgb",target_size=(256, 256))
nois_factor = 0.3
n = nois_factor * np.random.randn(*img.size)
noise_image = img + n
plt.imshow(noise_image)

Implement CVAE for a single image

I have a multi-dimensional, hyper-spectral image (channels, width, height = 15, 2500, 2500). I want to compress its 15 channels into 5 channels, so the output would be (channels, width, height = 5, 2500, 2500). One simple way to do this is to apply PCA. However, its performance is not very good, so I want to use a Variational AutoEncoder (VAE).
When I saw the available solution in Tensorflow or keras library, it shows an example of clustering the whole images using Convolutional Variational AutoEncoder(CVAE).
https://www.tensorflow.org/tutorials/generative/cvae
https://keras.io/examples/generative/vae/
However, I have a single image. What is the best practice for implementing a CVAE in this case? Is it to generate sample images using a moving-window approach?
One way of doing it would be to have a CVAE that takes as input (and output) values of all the spectral features for each of the spatial coordinates (the stacks circled in red in the picture). So, in the case of your image, you would have 2500*2500 = 6250000 input data samples, which are all vectors of length 15. And then the dimension of the middle layer would be a vector of length 5. And, instead of 2D convolutions that are normally used along the spatial domain of images, in this case it would make sense to use 1D convolution over the spectral domain (since the values of neighbouring wavelengths are also correlated). But I think using only fully-connected layers would also make sense.
As a disclaimer, I haven’t seen CVAEs used in this way before, but like this you would also get many data samples, which is needed for the learning to generalise well.
Another option would indeed be what you suggested: generate the samples (patches) using a moving window (maybe with a stride that is half the patch size). Even though you wouldn't necessarily get enough data samples for the CVAE to generalise really well on all HSI images, I guess it doesn't matter if it overfits, since you want to use it on that same image.
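The two ways of turning one image into many training samples can be sketched with NumPy alone. This is only the data preparation, not a CVAE; the helper names pixel_samples and patch_samples, the patch size and the stride are assumptions for illustration, and a small random cube stands in for the real (15, 2500, 2500) image.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def pixel_samples(cube):
    """(channels, H, W) -> (H*W, channels): one spectral vector per pixel."""
    return cube.reshape(cube.shape[0], -1).T

def patch_samples(cube, patch, stride):
    """(channels, H, W) -> (n_patches, channels, patch, patch) via a moving window."""
    windows = sliding_window_view(cube, (patch, patch), axis=(1, 2))
    windows = windows[:, ::stride, ::stride]   # (C, ny, nx, patch, patch)
    windows = np.moveaxis(windows, 0, 2)       # (ny, nx, C, patch, patch)
    return windows.reshape(-1, cube.shape[0], patch, patch)

# Small random stand-in for the hyper-spectral image.
cube = np.random.rand(15, 100, 100).astype(np.float32)

pixels = pixel_samples(cube)
patches = patch_samples(cube, patch=32, stride=16)  # stride = half the patch size
print(pixels.shape)   # (10000, 15)
print(patches.shape)  # (25, 15, 32, 32)
```

With per-pixel samples the encoder would map vectors of length 15 to a latent of length 5; with patches, the spatial context is kept at the cost of far fewer samples.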

Why do we take the transpose after flattening an image

I am currently trying to learn deep learning and NumPy. In an example I was given, a test set of 60 128x128 images of carrots is reshaped using
`carrots_test.reshape(carrots_test.shape[0],-1)`
The example then went on to add a T to the end. I understand that this means a transpose, but why would you transpose this new flattened image?
I understand what it is to flatten an image and why, but I can't intuitively see why we need to transpose it (swap the rows and columns).
There is no global reason to do it. Your application expects the shape to be (elements, images), not (images, elements). A reshape only adjusts the shape of the buffer. transpose adjusts the strides of the dimensions and compensates by rearranging the shape.
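A small sketch of what the two steps do to the shape, assuming single-channel images and using random data as a stand-in for the carrot test set:

```python
import numpy as np

# Stand-in for the test set: 60 images of 128x128 pixels (single channel assumed).
carrots_test = np.random.rand(60, 128, 128)

flat = carrots_test.reshape(carrots_test.shape[0], -1)  # one image per row
print(flat.shape)    # (60, 16384)  -> (images, elements)
print(flat.T.shape)  # (16384, 60)  -> (elements, images)

# The transpose is a view on the same buffer: only shape and strides are swapped.
print(np.shares_memory(flat, flat.T))  # True
print(flat.strides, flat.T.strides)
```

Column i of the transposed array is image i, which is the layout expected by code that multiplies a weight matrix by a matrix of column vectors.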

conv2d on non-rectangular image in Tensorflow

I have a dataset of images which are half black in an upper triangular fashion, i.e. all pixels below the main diagonal are black.
Is there a way in Tensorflow to give such an image to a conv2d layer and mask or limit the convolution to only the relevant pixels?
If black translates to 0, then you don't need to do anything: the convolution multiplies each 0 by its weight, so those pixels contribute nothing to the result. If black is not 0, you can multiply the data by a binary mask to make it 0.
Note that outputs computed entirely from black pixels will still contain the bias term, if the layer has one.
You could multiply the result with a binary mask to 0 out the areas you don't want populated. This way you can also decide to drop results that have too many black cells, like around the diagonal.
You can also write your own custom operation that does what you want. I would recommend against it because you only get a speedup of at most 2 (the other operations will lower it). You probably get more performance by running on a GPU.
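The masking idea can be sketched in plain NumPy. This is not a TensorFlow layer: conv2d_valid is a hypothetical helper that computes the same "valid" cross-correlation a conv2d layer would for one channel, and the image size, kernel size and bias value are arbitrary assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(image, kernel, bias=0.0):
    """Plain 'valid' 2D cross-correlation for a single channel, for illustration."""
    windows = sliding_window_view(image, kernel.shape)  # (H-k+1, W-k+1, k, k)
    return np.einsum("ijkl,kl->ij", windows, kernel) + bias

size, k = 8, 3
image = np.random.rand(size, size)
mask = np.triu(np.ones((size, size)))  # 1 on and above the main diagonal, 0 below
masked_image = image * mask            # force the lower triangle to exactly 0

kernel = np.random.rand(k, k)
out = conv2d_valid(masked_image, kernel, bias=0.5)

# A window lying entirely in the black region yields only the bias term.
print(out[-1, 0])  # 0.5

# Count the valid pixels under each window, then zero the outputs
# whose window contains any black pixel.
coverage = conv2d_valid(mask, np.ones((k, k)))
out_mask = coverage == k * k
out_clean = out * out_mask
```

Lowering the threshold on coverage keeps outputs near the diagonal whose windows are only partly black.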

Reshaping numpy 3D array

I have a dataset with dimensions: (32, 32, 73257) where 32x32 are pixels of a single image.
How do I reshape it to (73257, 1024) so that every image is unrolled in a row?
So far, I did:
self.train_data = self.train_data.reshape(n_training_examples, number_of_pixels*number_of_pixels)
and it looks like I got garbage instead of normal pictures. I am assuming that reshaping was performed across wrong dimension...??
As suggested in the comments, first get every image in a column, then transpose:
self.train_data = self.train_data.reshape(-1, n_training_examples).T
The memory layout of your array will not be changed by any of these operations, so two contiguous pixels of any image will lie 73257 bytes apart (assuming a uint8 image), which may not be the best option if you want to process your data one image at a time. You will need to time and validate this, but creating a copy of the array may prove advantageous performance-wise:
self.train_data = self.train_data.reshape(-1, n_training_examples).T.copy()
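A small check of the approach above, using a tiny random array as a stand-in for the (32, 32, 73257) dataset. The variable names side, data, rows and rows_alt are illustrative; the np.moveaxis variant is an equivalent alternative, not part of the original answer.

```python
import numpy as np

n_training_examples, side = 5, 4  # small stand-in for 73257 images of 32x32
data = np.random.randint(0, 256, size=(side, side, n_training_examples), dtype=np.uint8)

# Direct reshape: each row mixes pixels from different images.
wrong = data.reshape(n_training_examples, side * side)

# One image per column first, then transpose: one image per row (still a view).
rows = data.reshape(-1, n_training_examples).T

# Equivalent alternative: move the image axis to the front, then flatten.
rows_alt = np.moveaxis(data, -1, 0).reshape(n_training_examples, -1)

print(rows.shape)                                      # (5, 16)
print(np.array_equal(rows[2], data[:, :, 2].ravel()))  # True
print(np.array_equal(rows, rows_alt))                  # True

# The transposed view is not C-contiguous; .copy() makes each image contiguous in memory.
print(rows.flags["C_CONTIGUOUS"], rows.copy().flags["C_CONTIGUOUS"])  # False True
```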