How to convert an RGB image to a single-channel image (but not grayscale) - tensorflow

I have a 3-channel image in which each channel has some information encoded in the form of colors.
I want to convert it to a single-channel image with all that information retained. When I convert it to 1 channel (using grayscale) I lose all the color information and get a tensor with zero values, and visualizing this image shows a totally black image.
So, is there any way to change the 3-channel image into a 1-channel image that is not grayscale?

You probably have to keep the 3 channels. A 1-channel image cannot represent colors, since you need the additional channel dimension to encode them.
Why would you want to drop the channels and keep the color information at the same time?
In typical image processing with deep learning, tensors have dimensions such as [Batch x Channel x Height x Width] (more common in PyTorch) or [Batch x Height x Width x Channel] (more common in TensorFlow).
What is the real problem with 3 channels?
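For reference, a minimal shape check (my own sketch; the single 64 x 64 random image is just a placeholder) of what grayscale conversion does in TensorFlow:

import tensorflow as tf

rgb = tf.random.uniform([1, 64, 64, 3])    # [Batch x Height x Width x Channel]
grey = tf.image.rgb_to_grayscale(rgb)      # weighted sum of R, G and B
print(grey.shape)                          # (1, 64, 64, 1): the channel axis is collapsed

Whatever was encoded per channel is mixed into that single value, which is why keeping the 3 channels is usually the right call.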

Related

Yolo Training: multiple objects in one image

I have a set of training images that contain many small objects (10-20). The image resolution is high (9000x6000).
Is it better to split the images into crops of the specific objects before running YOLO training, or just leave them as they are?
Does YOLO resize the entire image, or does it 'extract' the annotated objects first before resizing?
If it is the former, I am concerned that the resolution will be bad. Imagine 20 objects in a 416x416 image.
Does YOLO resize the entire image, or does it 'extract' the annotated objects first before resizing?
Yes, the entire image will be resized in the case of YOLO; it does not extract the annotated objects before resizing.
Since your input images have very high resolution, what you can do is:
YOLO can handle objects of about 25 x 25 pixels effectively with a network input size of 608 x 608. So if the objects in your original images are larger than roughly 250 x 250, you can train on the images as they are (with a 608 x 608 network size): even after the images are resized to the network size, the objects will still be larger than 25 x 25. This should give you good accuracy.
The rough arithmetic: (6000 / 600) x 25 = 250, approximating the 608-pixel network size by 600.
If the object sizes in the original images are smaller than about 200 x 200, split each input image into 8 smaller units/blocks, say tiles of 2250 x 3000 (a 4 x 2 grid). Train these tiles as individual images; each big image (9000 x 6000) then corresponds to 8 training images, and each tile may contain anywhere from zero to many objects. You can also operate in a sliding-window manner (a tiling sketch follows below).
The method you choose for training should be used for inference as well.
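Here is a minimal tiling sketch (my own illustration, not code from the referenced thread), assuming non-overlapping 2250 x 3000 tiles; an overlapping sliding window follows the same pattern, and the object annotations must be shifted into each tile's coordinate frame:

import numpy as np

def tile_image(image, tile_w=2250, tile_h=3000):
    # Cut a (height, width, channels) array into non-overlapping tiles
    h, w = image.shape[:2]
    return [image[y:y + tile_h, x:x + tile_w]
            for y in range(0, h, tile_h)
            for x in range(0, w, tile_w)]

dummy = np.zeros((6000, 9000, 3), dtype=np.uint8)   # stands in for a real 9000 x 6000 photo
tiles = tile_image(dummy)
print(len(tiles), tiles[0].shape)                   # 8 (3000, 2250, 3)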
For training on objects of all sizes, use one of the following models (use this if you train on the original images as they are):
Yolov4-custom
Yolov3-SPP
Yolov3_5l
If all of the objects you want to detect are small, then for effective detection use Yolov4 with the following changes (use this if you split the original image into 8 blocks):
Set layers = 23 instead of layers = 54
Set stride=4 instead of stride=2 (this change is made in both places where it appears in the cfg)
References:
Refer to this relevant GitHub thread
darknet documentation

Does the Kernel slide over each time dimension individually in Conv1D convolutions?

I am dying to understand one question that I cannot find any answer to:
When doing Conv1D on a multivariate time series, is the KERNEL convolved across ALL dimensions or across each dimension individually? Is the size of the kernel [kernel_size x 1] or [kernel_size x num_dims]?
The thing is that I input an 800 by 10 time series into a Conv1D(filters=16, kernel_size=6)
and I get 800 by 16 as output, whereas I would expect to get 800 by 16 by 10, because each time-series dimension would be convolved with the filter individually.
What is the case?
Edit: Toy example for discussion:
We have 3 input channels, 800 time steps long. We have a kernel 6 time steps wide, meaning the effective kernel dimensions are [3, 1, 6].
At each time step, 6 time steps in each channel are convolved with the kernel. Then all the kernel's elements are summed.
If this is correct, what is 1D about this convolution, if the image of the convolution operation is clearly 2-dimensional with [3 x 6]?
When you convolve an "image" with multiple channels, you sum across all the channels, and then stack up the filters you use to get a new "image" with (# of filters) channels. The thing that's a bit difficult for some people to understand is that the filter itself is actually (kernel_size x 1 x number of channels). In other words, your filters have depth.
So given that you're inputting this as an 800 x 1 "image" with 10 channels, you will end up with an 800 x 1 x 16 image, since you stack 16 filters. Of course the 1s aren't really important for Conv1D and can be ignored, so tl;dr 800 x 10 -> 800 x 16 in this case.
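A quick shape check in Keras (my own sketch; padding="same" is an assumption made to reproduce the 800-step output length mentioned in the question):

import numpy as np
import tensorflow as tf

x = np.random.rand(1, 800, 10).astype("float32")    # (batch, time, channels)
conv = tf.keras.layers.Conv1D(filters=16, kernel_size=6, padding="same")
y = conv(x)
print(y.shape)              # (1, 800, 16): one output channel per filter
print(conv.kernel.shape)    # (6, 10, 16): each filter spans all 10 input channels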
Response to part 2:
We have 3 input channels, 800 time steps long. We have a kernel 6 time steps wide, meaning the effective kernel dimensions are [3, 1, 6].
This is essentially correct.
At each time step, 6 time steps in each channel are convolved with the kernel. Then all the kernel's elements are summed.
Yes, this is essentially correct. We end up with a slightly smaller image, because we repeat this operation each time we slide the kernel along the time axis, giving a new image of roughly 795 x 1 x 1 (800 - 6 + 1 with no padding). We then repeat this operation (# of filters) times and stack the results along the third dimension, so we end up with roughly 795 x 1 x (# of filters).
If this is correct, what is 1D about this convolution, if the image of the convolution operation is clearly 2-dimensional with [3 x 6]?
For something to require Conv2D, it needs to have a second spatial dimension greater than 1. For example, a color photograph might be 224 x 224 with 3 color channels, so it would be 224 x 224 x 3.
Notably, when we perform Conv2D we also slide our kernel in an additional direction, for example up and down. This is not required when you simply add more channels, since they are just added to the sum for that cell. Since we are only sliding along one axis in your example (time), we only need Conv1D.
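To make the "sliding along one axis" point concrete, here is a hedged sketch (my own, not from the library docs) that treats the series as an 800 x 1 "image" with 10 channels; a Conv2D with a (6, 1) kernel can only slide along the time axis and reproduces the same shapes as the Conv1D above:

import numpy as np
import tensorflow as tf

x = np.random.rand(1, 800, 1, 10).astype("float32")    # (batch, time, 1, channels)
conv2d = tf.keras.layers.Conv2D(filters=16, kernel_size=(6, 1), padding="same")
y = conv2d(x)
print(y.shape)               # (1, 800, 1, 16)
print(conv2d.kernel.shape)   # (6, 1, 10, 16): the kernel depth covers all input channels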

RGB to gray filter doesn't preserve the shape

I have 209 cat/non-cat images and I am looking to augment my dataset. To do so, I am using the following code to convert each NumPy array of RGB values into a grey-filtered version. The problem is that I need the dimensions to stay the same for my neural network to work, but they end up different. The code:
import numpy as np

def rgb2gray(rgb):
    # Weighted sum over the R, G, B channels collapses the last axis
    return np.dot(rgb[..., :3], [0.2989, 0.5870, 0.1140])
Normal image dimensions: (64, 64, 3)
After applying the filter: (64, 64)
I know that the missing 3 is probably the RGB channels or something, but I cannot find a way to add a "dummy" third dimension that would not affect the actual image. Can someone provide an alternative to the rgb2gray function that maintains the dimensions?
The whole point of applying that greyscale filter is to reduce the number of channels from 3 (i.e. R,G and B) down to 1 (i.e. grey).
If you really, really want to get a 3-channel image that looks just the same but takes 3x as much memory, just make all 3 channels equal:
grey = np.dstack((grey, grey, grey))
def rgb2gray(rgb):
    # Each column repeats the same luminance weights, so every output channel
    # holds the same grey value and the (64, 64, 3) shape is preserved
    weights = [[0.2989, 0.2989, 0.2989],
               [0.5870, 0.5870, 0.5870],
               [0.1140, 0.1140, 0.1140]]
    return np.dot(rgb[..., :3], weights)
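A quick shape check (a random 64 x 64 array standing in for a real image):

import numpy as np

img = np.random.rand(64, 64, 3)
print(rgb2gray(img).shape)   # (64, 64, 3): all three channels hold the same grey values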

Does input size affect MobileNet-SSD's aspect ratios and real anchor ratios? (TensorFlow API)

I have recently been using the TensorFlow Object Detection API. The default SSD-MobileNet v1 uses 300 x 300 images as training input, but I am going to set the image width and height to different values, for instance 320 x 180. Do the aspect ratios in the .config file represent the real width/height ratio of the anchors, or are they only meaningful for square images?
You can change the size to different values; the general guidance is to preserve the aspect ratio of the original image, while the size itself can differ.
The aspect ratios represent the real ratio of the anchors. You can use them with different input ratios, but you will get the best results if your input ratio is close to square.
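As a rough illustration (a sketch of the usual SSD anchor construction, not the API's exact code), an anchor's normalized height and width follow from a scale s and an aspect_ratio a like this:

import math

def anchor_size(scale, aspect_ratio):
    # Normalized height/width of an SSD-style anchor for a given scale and aspect ratio
    ratio_sqrt = math.sqrt(aspect_ratio)
    return scale / ratio_sqrt, scale * ratio_sqrt

print(anchor_size(0.2, 2.0))   # roughly (0.141, 0.283): twice as wide as tall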

How to adjust Pixel Spacing and Slice Thickness in DICOM data?

I have a large DICOM MRI dataset for several patients. For each patient there is a folder containing many 2D slices as .dcm files, and the data of each patient has different dimensions. For example:
patient1: PixelSpacing=0.8mm,0.8mm, SliceThickness=2mm, SpacingBetweenSlices=1mm, 400x400 pixels
patient2: PixelSpacing=0.625mm,0.625mm, SliceThickness=2.4mm, SpacingBetweenSlices=1mm, 512x512 pixels
So my question is: how can I convert all of them to {Pixel Spacing} = 1mm, 1mm and {Slice Thickness} = 1mm?
Thanks.
These are two different questions:
About harmonizing positions and pixel spacing, these links will be helpful:
Finding the coordinates (mm) of identical slice locations for two MR datasets acquired in the same scanning session
Interpolation between two images with different pixelsize
http://nipy.org/nibabel/dicom/dicom_orientation.html
Basically, you want to build your target volume and interpolate each of its pixels from the nearest neighbors in the source volumes.
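A minimal resampling sketch (my own, assuming the slices have already been stacked into a NumPy volume and that SciPy is acceptable); the links above describe the interpolation in more detail:

import numpy as np
from scipy import ndimage

def resample_to_isotropic(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    # volume: (slices, rows, cols); spacing: source spacing in mm along each axis
    zoom_factors = np.array(spacing) / np.array(new_spacing)
    return ndimage.zoom(volume, zoom_factors, order=1)   # order=1: linear interpolation

# e.g. patient1, with 1 mm spacing between slices and 0.8 mm in-plane pixel spacing:
# resampled = resample_to_isotropic(volume, spacing=(1.0, 0.8, 0.8))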
About modifying the slice thickness: if you really want to modify the slice thickness rather than the slice distance, I do not see any way to do this correctly with the source data you have. This is because the thickness says which width of the raw data was used to calculate the values for a slice in your stack (e.g. by averaging or calculating an integral). With a slice thickness of 2 or 2.4 mm in the source volumes, you will not be able to reconstruct the gray values at a thickness of 1 mm. If your question was referring to slice distance rather than slice thickness, the first part of this answer applies.