Why do we take the transpose after flattening an image - numpy

I am currently trying to learn deep learning and NumPy. In an example I was given, a test set of 60 128x128 images of carrots is reshaped with
`carrots_test.reshape(carrots_test.shape[0], -1)`
The example then adds a `.T` to the end. I understand that this means a transpose, but why would you transpose the flattened images?
I understand what it means to flatten an image and why it is done, but I can't intuitively see why we need to transpose it (swap the rows and columns).

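There is no universal reason to do it. Your application simply expects the shape to be (elements, images) rather than (images, elements). A reshape only changes how the same buffer is divided into dimensions; the order of the elements in memory stays the same. A transpose swaps the strides of the dimensions and swaps the shape to match, so each flattened image becomes a column instead of a row.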
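To make the difference concrete, here is a minimal sketch. The array `carrots_test` is a hypothetical stand-in filled with dummy values, since the real data is not shown in the question.

```python
import numpy as np

# Hypothetical stand-in for the test set: 60 images of 128x128 pixels.
carrots_test = np.arange(60 * 128 * 128, dtype=np.float32).reshape(60, 128, 128)

# Flatten each image into one row: shape (images, elements).
flat = carrots_test.reshape(carrots_test.shape[0], -1)

# Transpose so that each image becomes one column: shape (elements, images).
flat_T = flat.T

print(flat.shape, flat.strides)      # (60, 16384) (65536, 4)
print(flat_T.shape, flat_T.strides)  # (16384, 60) (4, 65536)

# The transpose is a view on the same buffer; no pixel data is copied.
print(np.shares_memory(flat, flat_T))  # True

# Column i of the transposed array is image i, flattened.
assert np.array_equal(flat_T[:, 5], carrots_test[5].ravel())
```

Whether you want rows or columns per image depends only on the convention of the code that consumes the array, for example whether it computes `W @ X` with one example per column.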

Related

PCA dimension reduction, scikit-learn

I have a dataset of size (21263, 81). I need to reduce the first 81 columns to 2 dimensions and plot the result.
However, the last column is a continuous variable (temperature), not a categorical variable as I normally see. When plotting the 2D figure, how do I write the Python code to use the last column to control the point size?
Thank you.

Implement CVAE for a single image

I have a multi-dimensional, hyperspectral image (channels, width, height = 15, 2500, 2500). I want to compress its 15 channels into 5 channels, so the output would be (channels, width, height = 5, 2500, 2500). One simple way to do this is to apply PCA. However, its performance is not so good, so I want to use a Variational AutoEncoder (VAE).
When I looked at the available solutions in the TensorFlow and Keras libraries, they show examples of clustering whole images using a Convolutional Variational AutoEncoder (CVAE).
https://www.tensorflow.org/tutorials/generative/cvae
https://keras.io/examples/generative/vae/
However, I have a single image. What is the best practice for implementing a CVAE in this case? Should I generate sample images using a moving-window approach?
One way of doing it would be to have a CVAE that takes as input (and output) values of all the spectral features for each of the spatial coordinates (the stacks circled in red in the picture). So, in the case of your image, you would have 2500*2500 = 6250000 input data samples, which are all vectors of length 15. And then the dimension of the middle layer would be a vector of length 5. And, instead of 2D convolutions that are normally used along the spatial domain of images, in this case it would make sense to use 1D convolution over the spectral domain (since the values of neighbouring wavelengths are also correlated). But I think using only fully-connected layers would also make sense.
As a disclaimer, I haven't seen CVAEs used in this way before, but this approach would also give you many data samples, which are needed for the learning to generalise well.
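The data layout for this per-pixel option could look like the sketch below. It uses a small random array in place of the real (15, 2500, 2500) image, and a random linear projection as a placeholder for the trained encoder; the projection is only there to show the shapes and is not a VAE.

```python
import numpy as np

# Small hypothetical stand-in for the (15, 2500, 2500) hyperspectral image.
channels, height, width = 15, 8, 8
rng = np.random.default_rng(0)
image = rng.random((channels, height, width), dtype=np.float32)

# One sample per spatial coordinate: move channels last, then flatten the
# two spatial axes. Result shape: (height * width, 15).
pixels = np.moveaxis(image, 0, -1).reshape(-1, channels)

# Placeholder for the trained encoder (assumption: any map from 15 to 5 values).
latent_dim = 5
projection = rng.random((channels, latent_dim), dtype=np.float32)
codes = pixels @ projection  # shape (height * width, 5)

# Fold the per-pixel codes back into a (5, height, width) image.
compressed = np.moveaxis(codes.reshape(height, width, latent_dim), -1, 0)

print(pixels.shape)      # (64, 15)
print(compressed.shape)  # (5, 8, 8)
```

Sample `row * width + col` of `pixels` is the spectrum at position `(row, col)`, so the codes can be placed back at the right spatial location after encoding.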
Another option would indeed be what you suggested: generate the samples (patches) using a moving window (maybe with a stride that is half the patch size). Even though you wouldn't necessarily get enough data samples for the CVAE to generalise really well to all HSI images, I guess it doesn't matter if it overfits, since you want to use it on that same image.
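A minimal sketch of the moving-window option, assuming NumPy 1.20 or newer for `sliding_window_view`. The image size and the patch size of 8 are arbitrary illustrative choices, not values from the question.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Small hypothetical stand-in for the hyperspectral image.
channels, height, width = 15, 16, 16
image = np.arange(channels * height * width, dtype=np.float32).reshape(
    channels, height, width
)

patch = 8            # illustrative patch size
stride = patch // 2  # stride of half the patch size, as suggested above

# All windows over the two spatial axes: shape (15, 9, 9, 8, 8).
windows = sliding_window_view(image, (patch, patch), axis=(1, 2))

# Keep every `stride`-th window in each direction: shape (15, 3, 3, 8, 8).
windows = windows[:, ::stride, ::stride]

# One training sample per window: shape (9, 15, 8, 8).
patches = np.moveaxis(windows, 0, 2).reshape(-1, channels, patch, patch)

print(patches.shape)  # (9, 15, 8, 8)
```

The final `reshape` copies the data, so for the full 2500x2500 image it may be preferable to extract patches in batches rather than all at once.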

RGB to gray filter doesn't preserve the shape

I have 209 cat/noncat images and I am looking to augment my dataset. To do so, I am using the following code to convert each NumPy array of RGB values to greyscale. The problem is that my neural network needs all inputs to have the same dimensions, but the converted images have different dimensions from the originals. The code:
import numpy as np

def rgb2gray(rgb):
    # Weighted sum of the R, G and B channels
    return np.dot(rgb[..., :3], [0.2989, 0.5870, 0.1140])
Normal image dimensions: (64, 64, 3)
After applying the filter: (64, 64)
I know that the missing 3 is probably the RGB channels, but I cannot find a way to add a "dummy" third dimension that would not affect the actual image. Can someone provide an alternative to the rgb2gray function that maintains the dimensions?
The whole point of applying that greyscale filter is to reduce the number of channels from 3 (i.e. R,G and B) down to 1 (i.e. grey).
If you really, really want to get a 3-channel image that looks just the same but takes 3x as much memory, just make all 3 channels equal:
grey = np.dstack((grey, grey, grey))
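Alternatively, do both steps in a single dot product, with each weight repeated across the three output channels:
def rgb2gray(rgb):
    # Row i holds the weight of input channel i, repeated for each output channel
    return np.dot(rgb[..., :3], [[0.2989, 0.2989, 0.2989], [0.5870, 0.5870, 0.5870], [0.1140, 0.1140, 0.1140]])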
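If the network only needs a channel axis to be present, a single-channel array of shape (64, 64, 1) may be enough. The sketch below shows both variants; the function names `rgb2gray_keepdims` and `rgb2gray_3ch` are made up for illustration.

```python
import numpy as np

# Luma weights for the R, G and B channels, as used in the question.
WEIGHTS = np.array([0.2989, 0.5870, 0.1140])

def rgb2gray_keepdims(rgb):
    """Greyscale image with one trailing channel axis, e.g. (64, 64, 1)."""
    grey = np.dot(rgb[..., :3], WEIGHTS)
    return grey[..., np.newaxis]

def rgb2gray_3ch(rgb):
    """Greyscale image repeated over three identical channels, e.g. (64, 64, 3)."""
    grey = np.dot(rgb[..., :3], WEIGHTS)
    return np.repeat(grey[..., np.newaxis], 3, axis=-1)

# Random stand-in for one 64x64 RGB image.
rgb = np.random.default_rng(0).random((64, 64, 3))
print(rgb2gray_keepdims(rgb).shape)  # (64, 64, 1)
print(rgb2gray_3ch(rgb).shape)       # (64, 64, 3)
```

Both functions also accept a whole batch such as (209, 64, 64, 3), because the dot product acts on the last axis only.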

Should I transpose a Tensor when feeding it into a CNN

I am using a custom dataset with images of different sizes in the Lab format (Lightness, a, b), which are fed into a CNN. The input layer has 3 in-channels, so my idea was to split all 3 channels (L, a, b) and feed those into the network. Next, I was wondering whether each tensor needs to be transposed. My concern is that it would lose its dimensions, which vary from image to image, and I would not be able to reconstruct the image in the end. Any thoughts or ideas on how I should normalize the image?
You can normalise without transposing the image or splitting it by channel:
torchvision.transforms.Normalize(mean=[l_channel_mean, a_channel_mean, b_channel_mean], std=[l_channel_std, a_channel_std, b_channel_std])
The only required transform is the one that converts the images to tensors:
torchvision.transforms.ToTensor()
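To see why neither step loses the image dimensions, here is a NumPy sketch of the same arithmetic: a change from height x width x channel to channel-first layout, followed by a per-channel `(x - mean) / std`. It illustrates the computation only and is not torchvision code; the image is random dummy data, and the statistics are computed from that single image purely for illustration.

```python
import numpy as np

# Hypothetical Lab image in height x width x channel layout (any size works).
rng = np.random.default_rng(0)
lab_hwc = rng.random((48, 64, 3)).astype(np.float32)

# Channel-first layout (C, H, W), the layout ToTensor produces from H x W x C.
lab_chw = np.transpose(lab_hwc, (2, 0, 1))

# Per-channel statistics; in practice these would come from the training set.
mean = lab_chw.mean(axis=(1, 2))
std = lab_chw.std(axis=(1, 2))

# Per-channel normalisation, broadcast over the spatial axes.
normalized = (lab_chw - mean[:, None, None]) / std[:, None, None]

# The operation is invertible, so the original image can be recovered.
restored = normalized * std[:, None, None] + mean[:, None, None]

print(normalized.shape)  # (3, 48, 64)
```

The height and width are untouched by the normalisation, so images of different sizes keep their own dimensions.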

Reshaping numpy 3D array

I have a dataset with dimensions: (32, 32, 73257) where 32x32 are pixels of a single image.
How do I reshape it to (73257, 1024) so that every image is unrolled in a row?
So far, I did:
self.train_data = self.train_data.reshape(n_training_examples, number_of_pixels*number_of_pixels)
and it looks like I got garbage instead of normal pictures. I assume the reshape was performed across the wrong dimension?
As suggested in the comments, first get every image in a column, then transpose:
self.train_data = self.train_data.reshape(-1, n_training_examples).T
The memory layout of your array is not changed by either of these operations, so two adjacent pixels of any image will lie 73257 bytes apart (assuming a uint8 image), which may not be ideal if you want to process your data one image at a time. You will need to time and validate this, but creating a copy of the array may prove advantageous performance-wise:
self.train_data = self.train_data.reshape(-1, n_training_examples).T.copy()
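A small sketch that checks this on toy data, using 5 images of 4x4 pixels instead of 73257 images of 32x32. The variable names are illustrative and the array is assumed to be C-contiguous, as in the question.

```python
import numpy as np

n_examples, side = 5, 4

# Toy images with distinct values, then stacked along the last axis to mimic
# the (32, 32, 73257) layout from the question: shape (4, 4, 5).
images = np.arange(n_examples * side * side, dtype=np.uint8).reshape(
    n_examples, side, side
)
train_data = np.ascontiguousarray(np.moveaxis(images, 0, -1))

# Naive reshape: each row mixes pixels from different images.
naive = train_data.reshape(n_examples, side * side)
print(np.array_equal(naive[0], images[0].ravel()))  # False

# Reshape to one image per column, then transpose: one image per row.
rows = train_data.reshape(-1, n_examples).T
print(np.array_equal(rows[0], images[0].ravel()))   # True

# Adjacent pixels of one image are n_examples bytes apart in the view...
print(rows.strides)    # (1, 5)

# ...and 1 byte apart after the copy, which stores each image contiguously.
packed = rows.copy()
print(packed.strides)  # (16, 1)
```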