I want to train YOLO on custom objects to detect gender from a surveillance camera stream.
I see that the default YOLO input layer is 416x416. Should I stick with this, or would it be better to use a bigger input size, e.g. 640x480?
(The original image size could be from 2 to 4 MPx.)
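For what it's worth, assuming I train with the standard Darknet framework, the input resolution is just the width/height pair in the [net] section of the .cfg file; as I understand it both values must be multiples of 32, which 640x480 satisfies. A sketch of what I would change (other fields left at typical values):

    [net]
    # input resolution; both values must be multiples of 32
    width=640
    height=480
    channels=3
    batch=64
    subdivisions=16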
I am doing a regression analysis with a CNN to extract a parameter from data that comes as a 2D array. The array represents a kind of map of the underlying parameter I am trying to extract. I converted the arrays into JPGs/PNGs and fed them into a 3-channel 2D CNN model. So far the CNN is able to extract the underlying parameter from the images, but the images are generated by converting each array with matplotlib's plt.imshow() function, which gives a color (3-channel) image and compresses the data.
The issue is the loss of information when the array is compressed during conversion to RGB images. So I tried building a CNN model where I feed the raw array into the network directly, without converting it to an image, but the regression is very poor, whereas for the same datasets the regression is quite good if I feed the converted JPG or PNG images.
I suspect the 3 channels in the image are why the CNN performs better with images. Logically speaking, when an array is converted to RGB it is mapped to 0-255 levels for each channel; isn't that the same as feature scaling the data, just from 0 to 255 instead of 0 to 1?
[Plot: prediction for color images]
[Plot: prediction for raw array]
So I tried scaling the raw array to 0-1, stacking it 3 times to make a 3-channel raw array, and feeding that into the network, but the prediction was still quite poor.
If my logic is correct, I want to make use of the 3 channels in the CNN to extract the parameter from the raw array without loss of information. Is there any way to do it? What else can I implement to get predictions from the raw 2D array that are similar to those from the images?
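In case it helps, this is roughly what I tried for the scaling-and-stacking step (a sketch with a hypothetical 128x128 array and a toy Keras regressor, not my actual model):

    import numpy as np
    import tensorflow as tf

    def to_three_channel(arr):
        # per-sample min-max scaling to [0, 1], then replicate the single
        # channel 3 times: (H, W) -> (H, W, 3)
        arr = arr.astype("float32")
        arr = (arr - arr.min()) / (arr.max() - arr.min() + 1e-8)
        return np.repeat(arr[..., np.newaxis], 3, axis=-1)

    # hypothetical stand-ins for my real data and regression targets
    X_raw = np.random.rand(100, 128, 128)
    y = np.random.rand(100)

    X = np.stack([to_three_channel(a) for a in X_raw])     # (100, 128, 128, 3)

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=X.shape[1:]),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1),                          # single regression output
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=1, verbose=0)

One thing I am aware of is that plt.imshow() also applies a colormap, so the three channels of the saved images are not identical copies of the array; if that matters, applying the same colormap to the full-resolution array (e.g. plt.cm.viridis(scaled_array)[..., :3]) would reproduce the images without the resolution loss of saving figures.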
I have made a music genre classification convolutional NN model in TensorFlow and I face an issue:
I have 999 music spectrogram images in my X, along with the appropriate labels (y). The images are of size (128, 1293).
When I check the shape of X it shows (999,), which is just the number of images; however, I want to expand X to include the image dimensions (with 1 color channel), i.e. (999, 128, 1293, 1), to feed into my CNN.
I read somewhere that you can do it with cv2.resize(), but I am unsure about that method. Any kind of help would be appreciated.
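I am not sure whether something like the following is the right approach (a sketch with a stand-in object array in place of my real X):

    import numpy as np

    # stand-in for my data: a (999,) object array whose elements are (128, 1293) spectrograms
    X = np.empty(999, dtype=object)
    for i in range(999):
        X[i] = np.zeros((128, 1293))

    X = np.stack(X)            # stack the 999 arrays -> (999, 128, 1293)
    X = X[..., np.newaxis]     # add the single colour channel -> (999, 128, 1293, 1)
    print(X.shape)             # (999, 128, 1293, 1)

(If np.stack complains, I assume that means the spectrograms are not all exactly (128, 1293) and would need padding or trimming first.)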
I'm using pretrained VGG19 in my modified neural style transfer code (Gatys' algorithm), but my PC doesn't let me use the input image at its original size (the original height is 2499 px, but with 20 GB RAM I can only go up to 1000 px).
From what I've read, the solution would be to decrease the batch_size. So my question is: how can I modify the VGG19 .h5 file to change the batch_size inside it? Or can I override its batch_size in my code?
Assuming the pretrained model was trained on ImageNet, the maximum input size for a single sample is 224x224.
If you try to pass a larger input, it's possible your deep learning framework will reshape it into many images to be classified at once.
If you resize your input data to 224x224, you will run with a single image (a batch size of 1).
You could make a custom implementation of your model that takes larger input sizes. However, sizing down to 224x224 generally gets good results, depending on the task.
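If you are loading the model through Keras (the .h5 file suggests so), the convolutional base can also be instantiated at a custom input size directly, since only the dense head is tied to 224x224. A rough sketch:

    import tensorflow as tf

    # standard ImageNet-sized convolutional base
    vgg_224 = tf.keras.applications.VGG19(weights="imagenet", include_top=False,
                                          input_shape=(224, 224, 3))

    # a larger custom size also works for the conv base; memory grows with image area,
    # so (1000, 800, 3) here is just a hypothetical size that fits in your RAM
    vgg_big = tf.keras.applications.VGG19(weights="imagenet", include_top=False,
                                          input_shape=(1000, 800, 3))

Note that the batch size is not stored in the .h5 weights file; it is simply the leading dimension of whatever tensor you pass at call time, so a single style-transfer image is already a batch of 1 and the memory limit comes from the image resolution rather than the batch size.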
I want to retrain a quantized MobileNet-SSD V2 model, so I downloaded the unlabeled folder from COCO. This model requires an input size of 300x300, but I succeeded in retraining it once on pictures of a different size and it worked (poorly, but it worked).
Also, the code that uses the retrained model resizes the input from the camera to 500x500 and it works. So my question is: why is the required input stated as 300x300 if it works with other sizes too? Do I need to resize the whole dataset to 300x300 before I label it? I know the model applies convolutions to the input, so I don't think the size really matters (correct me if I'm wrong); as I understand it, the convolution just runs until it reaches the end of the input.
Thanks for helping!
If I understand correctly, you are using the TF Object Detection API.
A given model, such as mobilenet-v2-ssd, contains 3 main blocks:
[Preprocessing (normalizing and resizing)] --> [Detector (backbone + detection heads)] --> [Postprocessing (bbox decoding + NMS)]
When they talk about the required input, it refers to the detector. The checkpoint itself contains the full pipeline, which means the preprocessing unit will do the work for you, so there is no need to resize to 300x300 beforehand.
If for some reason you intend to inject the input yourself directly into the detector, you have to do the same preprocessing that was done during training.
BTW:
In the training config file (https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config)
you can see the resizing that was defined:
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
- the normalization is the MobileNet normalization (changing the dynamic range of the input from [0, 255] to [-1, 1]).
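If you do feed the detector directly, the equivalent preprocessing is roughly the following (a sketch assuming an 8-bit RGB frame):

    import cv2
    import numpy as np

    def preprocess(frame):
        # mimic the pipeline: fixed_shape_resizer to 300x300,
        # then MobileNet normalization from [0, 255] to [-1, 1]
        resized = cv2.resize(frame, (300, 300))
        normalized = resized.astype(np.float32) / 127.5 - 1.0
        return normalized[np.newaxis, ...]   # add a batch dimension -> (1, 300, 300, 3)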
I want to retrain the last Inception or MobileNet layer so that it classifies my own objects (about 5-15 classes).
I also want this to work with INPUT_SIZE == 64x64 or 32x32 (not 224 like the default Inception model).
I found some articles about retraining models:
https://hackernoon.com/creating-insanely-fast-image-classifiers-with-mobilenet-in-tensorflow-f030ce0a2991
https://medium.com/@daj/creating-an-image-classifier-on-android-using-tensorflow-part-3-215d61cb5fcd
For MobileNet they say:
the input image size, either '224', '192', '160', or '128'
so it seems I can't train with 64 or 32 (which is bad): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/image_retraining/retrain.py#L80
What about Inception models? Can I somehow train models to work with small input image sizes (to get results faster)?
The objects I want to classify from such small images will already be cropped from their parent image (for example from camera frames); they could be traffic/road signs cropped by fast cascade classifiers (LBP/Haar) trained to detect everything that looks like a sign's shape (triangle, rhombus, circle).
So 64x64 images that fully contain only the object of interest should be enough for classification.
No, you still can: use the smallest option, which would be 128. It will just scale your 32x32 or 64x64 images up, which is fine.
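If you prefer to do the upscaling yourself before handing the crops to the retrained graph, something like this is enough (OpenCV sketch, with a hypothetical crop file name):

    import cv2

    crop = cv2.imread("sign_crop.png")   # hypothetical 64x64 crop from the cascade stage
    resized = cv2.resize(crop, (128, 128), interpolation=cv2.INTER_LINEAR)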
It's not possible with those classifiers, but it becomes possible with the TensorFlow Object Detection API (we can set any input size): https://github.com/tensorflow/models/tree/master/research/object_detection
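For example, the image_resizer block in the pipeline config (the same block quoted above for the 300x300 model) can be set to a smaller size such as 64x64:

    image_resizer {
      fixed_shape_resizer {
        height: 64
        width: 64
      }
    }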