Is it possible to feed multiple images as input to a convolutional neural network - tensorflow

I have a dataset of many images where each class has images at 5 magnifications (x10, x20, x30, x40, x50), but they are not sequence data. All images are in RGB mode with size 512x512. I want to give these 5 images as one input to the CNN, and I don't know how.
Also, there is another problem: once the model has been trained well on the 5-image pipeline, will it still work when I have only one image (one magnification, x10 for example)?

You have asked two questions.
For the 1st one, there are two ways to do it. The simpler one is to design the model so that the input size is 5×512×512×3 (the five images stacked into one tensor) and train on that; the other is the multi-branch design described below for your 2nd question.
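For concreteness, here is a minimal sketch of the stacked-input option, assuming tf.keras and concatenating the five RGB images along the channel axis (so the input is 512×512×15; a 5×512×512×3 input with Conv3D would work similarly). The layer sizes and class count are placeholders:

```python
import tensorflow as tf

num_classes = 4  # placeholder: the number of classes in your dataset

inputs = tf.keras.Input(shape=(512, 512, 15))  # 5 RGB images x 3 channels each
x = tf.keras.layers.Conv2D(32, 3, activation="relu")(inputs)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Conv2D(64, 3, activation="relu")(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
```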
For your 2nd question, you need to design your model to handle absent or missing features. The most flexible design I can think of is the following:
You have 5 inputs, one per image, and each image goes through its own CNN branch; after one or a few layers you merge the branches together.
For each input, you can add an extra feature: a boolean indicating whether the current image is present or absent. During training, you should feed combinations of all 5 and also combinations where some are absent, so that your model learns to handle the absence of one or more of the 5 input images.
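A hedged sketch of this multi-branch design (all layer sizes are placeholders, and the present/absent flag is realized by multiplying each branch's features by a 0/1 input):

```python
import tensorflow as tf

num_classes = 4  # placeholder

def branch(inp):
    # one small CNN branch per magnification (filter counts are placeholders)
    x = tf.keras.layers.Conv2D(16, 3, activation="relu")(inp)
    x = tf.keras.layers.MaxPooling2D(4)(x)
    x = tf.keras.layers.Conv2D(32, 3, activation="relu")(x)
    return tf.keras.layers.GlobalAveragePooling2D()(x)

image_inputs = [tf.keras.Input(shape=(512, 512, 3)) for _ in range(5)]
flag_inputs = [tf.keras.Input(shape=(1,)) for _ in range(5)]  # 1.0 = present, 0.0 = absent

# zero out a branch's features when its image is marked absent
features = [branch(img) * flag for img, flag in zip(image_inputs, flag_inputs)]
merged = tf.keras.layers.Concatenate()(features)
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(merged)
model = tf.keras.Model(image_inputs + flag_inputs, outputs)
```

During training you would randomly zero out some flags (and blank the corresponding images) so the merged features learn to cope with missing magnifications.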
I hope I was clear enough and it helps.
Good luck.

Related

Misclassification using VGG16 Net

I use Faster RCNN to classify 33 items, but most of them are misclassified among each other. All items are snack packets and sweet packets, like in the links below.
https://redmart.com/product/lays-salt-and-vinegar-potato-chips
https://www.google.com/search?q=ice+breakers&source=lnms&tbm=isch&sa=X&ved=0ahUKEwj5qqXMofHfAhUQY48KHbIgCO8Q_AUIDigB&biw=1855&bih=953#imgrc=TVDtryRBYCPlnM:
https://www.google.com/search?biw=1855&bih=953&tbm=isch&sa=1&ei=S5g-XPatEMTVvATZgLiwDw&q=disney+frozen+egg&oq=disney+frozen+egg&gs_l=img.3..0.6353.6886..7047...0.0..0.43.116.3......1....1..gws-wiz-img.OSreIYZziXU#imgrc=TQVYPtSi--E7eM:
So color and shape are similar.
What could be the best way to solve this misclassification problem?
Fine-tuning is a way to use features learned on some big dataset for our own problem: instead of training the complete network again, we freeze the weights of the lower layers of the network and add a few layers at the end, as required, then train it on our dataset. The advantage is that we don't need to train all the millions of parameters, only a few. Another is that we don't need a large dataset to fine-tune.
You can find more here. This is another useful resource, where the author explains it in more detail (with code).
Note: this is also known as transfer learning.
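As an illustration only, here is a minimal tf.keras sketch of this fine-tuning recipe using an ImageNet-pretrained VGG16; the head sizes are my assumptions, not something from the answer:

```python
import tensorflow as tf

base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained lower layers

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),     # small new head
    tf.keras.layers.Dense(33, activation="softmax"),   # 33 item classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```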

CNN - Choose between two classes shown in 1 image

I am currently running a pretty vanilla CNN that sorts images of digits into different classes (not OCR). However, I also have a class called "bad_captures" for images that don't contain digits. Some images share features with the bad captures but also contain digits. Currently the model predicts these images as bad, but I need it to focus on the digits. Any advice on how to do this?
You cannot explicitly tell the model which features to weigh more. What you can do, however, is train a simple classifier that tells you whether the image is a bad_capture or not. Then you put that in front of your model, and only if the image is not a bad capture do you process it further. That way your CNN will not have to worry about picking between the two kinds of classes; it will focus only on getting the right digit from the image.
Another way would be to figure out which features are confusing the model and remove them explicitly in preprocessing. You can also try different preprocessing in general to see whether it has any effect on model performance.
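A small sketch of that cascade (the two model objects and the 0.5 threshold are hypothetical placeholders, assuming Keras-style models):

```python
import numpy as np

def classify(image, bad_capture_model, digit_model, threshold=0.5):
    """Return a digit label, or None if the image is a bad capture."""
    # stage 1: cheap binary filter (sigmoid output, 1.0 = bad capture)
    p_bad = bad_capture_model.predict(image[np.newaxis])[0, 0]
    if p_bad >= threshold:
        return None  # the digit CNN never sees bad captures
    # stage 2: digit classifier runs only on images that passed the filter
    digit_probs = digit_model.predict(image[np.newaxis])[0]
    return int(np.argmax(digit_probs))
```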

convolutional neural network image recognition

Currently I am working on a project with a convolutional network using TensorFlow. I have set up the network and now I need to train it, but I don't have a clue what the training images should look like, e.g. what percentage of the image the object should occupy.
It's a cigarette that I have to detect, and I have tried around 280 individual pictures where the cigarette is about 2-5% of the image. I'm thinking of scrapping those pictures and taking new ones where the cigarette is about 30-50% of the image.
All the pictures are taken outside in a street environment.
So my question is: is there any kind of rule regarding good pictures in a training set?
I will report back when I have tried my own solution.
The object you are trying to recognise is too small. Of the samples, I think the first one will be the best bet for you. A convolutional neural network works by doing convolution operations on image pixels; in the second picture the background is too large compared to the object you are trying to recognise, and training on such data will not help you.
Just trying to answer your rule question:
Make sure the cigarette occupies most of the image; 50% to 90% works, in my experience. You can still identify cigarettes covering 2-3% of the area, but you would need millions of images with varying backgrounds.
A CNN learns from the input image. Looking at the sample images you shared (I guess they were all taken on roadside pavements and grass areas), the CNN may not learn to find the cigarette; if the background occupies most of the image, it will instead learn to detect the common background. Please make sure to keep different background patterns.
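If re-shooting every picture is impractical, random zoom and translation augmentation can at least vary how large the cigarette appears relative to the background, though it is only a partial substitute for genuinely varied photos. A minimal tf.keras sketch, with ranges that are guesses to be tuned:

```python
import tensorflow as tf

# ranges are guesses; tune them on your own data
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomZoom(height_factor=(-0.4, 0.0)),  # zoom in by up to 40%
    tf.keras.layers.RandomTranslation(0.1, 0.1),
])
# apply on the fly: train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))
```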

TensorFlow Custom Object Detection Disappointing Result - Why?

I started with the TF Object Detection API two weeks ago and managed to train a model to recognize a custom object, in my case a Mecanum wheel.
Here are the details:
No. of training images = 125
All training images are around 500 x 500 (plus or minus)
Transfer Learning
Model used = ssd_mobilenet_v1_coco
batch size = 2
total steps run = 12715
loss is around 0.5000 - 2.5000; sometimes it fluctuates to more than 10, and I am not sure why
Here's the result:
The first image is encouraging.
The second image starts to disappoint me a little: I expect the model to detect FOUR Mecanum wheels (four boxes). Why doesn't it?
Then I suspected there was something wrong with my trained model. I tried the sample test images (the third and fourth images), and then I was sure that this is totally not the model I first aimed for.
I have been reading this post, where I think our problems are quite similar (and he managed to solve it). He mentioned that the input image needs to be less than 600 x 1024, so I tried with a fifth image and, unsurprisingly, the result is again disappointing.
I went through the tutorial series by sentdex, and in the comment sections I noticed that many people face this problem too. So, what to do now?
125 images? You will not be able to get very good results with so few images. If you want to validate that this is indeed the problem, try training on subsets of your original 125 images.
For example, how bad is the output when you train on 10 images?
Does it get better when you use 50 images?
Does it get better yet when you use 125 images?
If the accuracy improves with increasing dataset size, you can extrapolate and guess that with 1000 images you will do even better. I would guess that this is your problem.
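A sketch of that ablation, where `list_training_images`, `train_model`, and `evaluate_map` are placeholders for your own data listing, training loop, and evaluation metric:

```python
import random

all_images = list_training_images()  # hypothetical helper returning the 125 image paths
random.seed(0)  # fix the subsets so runs are comparable

for n in (10, 50, 125):
    subset = random.sample(all_images, n)
    model = train_model(subset)                  # placeholder: your training pipeline
    print(n, "images ->", evaluate_map(model))   # placeholder: your eval metric
```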

How to make a model of 10000 Unique items using tensorflow? Will it scale?

I have a use case with around 100 images each of 10,000 unique items. At test time on live data I am given 10 items, all from the 10,000 set, and I know which 10 they are, but only then. I have to match the 10 items with their names. What would be an efficient way to recognise these items? I have full control of the training and testing environment backgrounds. If I make one model of all 10,000 items, will it scale? Or should I make 10,000 different models and run the 10 items through the 10 models I have pretrained?
Your question is about something called "one-vs-all classification". If you do a Google search for that, the first hit is a video lecture by Andrew Ng that is almost certainly worth watching.
The question has been studied for a long time and in a plethora of contexts. The answer very much depends on what model you use, but I'll assume that, since you're doing image classification, you are using convolutional neural networks; after all, they're the state of the art for most such tasks.
In the context of convolutional networks, there is something called "multi-task learning" that you should read up on. Boiled down to a single sentence, the concept is that the more you ask the network to learn, the better it gets at the individual tasks. So in this case you are almost certain to do better training one model on 10,000 classes than training 10,000 models, each performing a one-vs-all classification.
Take, for example, the 1,000-class ImageNet dataset and CIFAR-10's 10-class dataset. It has been demonstrated in numerous papers that first training on ImageNet's 1,000 classes and then simply replacing the last layer with a 10-class output and retraining on CIFAR-10 produces a better result than training on CIFAR-10 alone. There are admittedly multiple reasons for this (ImageNet is a larger dataset), but the richness of the class labels, i.e. multi-task learning, is certainly among them.
So that was a long-winded way of saying: use one model with 10,000 classes.
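As an illustration (the backbone choice is my assumption, not something from the answer), a single shared network with a 10,000-way softmax head might look like this in tf.keras:

```python
import tensorflow as tf

backbone = tf.keras.applications.ResNet50(include_top=False,
                                          weights="imagenet", pooling="avg")
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(10_000, activation="softmax"),  # one output per item
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Since you know the 10 candidate items at test time, you can additionally restrict the argmax to just those 10 class indices instead of all 10,000.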
An aside:
If you want to get really, really interesting and jump into the realm of research-level thinking, you might consider a one-hot vector of 10,000 classes rather sparse and start thinking about whether you could reduce the dimensionality of your output layer using an embedding. An embedding is a dense vector, say of size 100 as a good starting point. Class labels then turn into clusters of points in your 100-dimensional space. I bet your network will perform even better under these conditions.
If this little aside didn't make sense, it's completely safe to ignore it; your 10,000-class output is fine. But if it did pique your interest, look up Word2Vec, and read this really nice post on how face recognition is achieved using embeddings: https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78. You might also consider using an autoencoder to generate an embedding for the images (though I favor triplet embeddings, as typically used in face recognition, myself).
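To make the aside concrete, here is a hedged sketch of an embedding head plus nearest-neighbour matching against the 10 known items. The sizes and backbone are assumptions, and the metric-learning loss is omitted; in practice you would train this with a triplet or contrastive loss:

```python
import tensorflow as tf

backbone = tf.keras.applications.ResNet50(include_top=False,
                                          weights="imagenet", pooling="avg")
embed = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(100),  # the 100-d embedding from the aside
    tf.keras.layers.Lambda(lambda v: tf.math.l2_normalize(v, axis=1)),
])

def match(query_img, reference_embs):
    """Index of the closest reference embedding (cosine similarity on unit vectors)."""
    q = embed(tf.expand_dims(query_img, 0))                # (1, 100)
    sims = tf.matmul(q, reference_embs, transpose_b=True)  # (1, 10) similarities
    return int(tf.argmax(sims, axis=1)[0])
```

At test time you embed the 10 known items once, stack those vectors into `reference_embs`, and match each live image to its nearest neighbour, so the model never has to produce a 10,000-way decision.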