YOLO vs Inception on unique images - tensorflow

I have images of unique products that are used at my workplace. I can't imagine that the inception database already has similar items that it has been trained on.
I tried to train a model using YOLO. It was taking a very very long time. Maybe 7minutes between epochs; and I wanted to do 1000 epochs due to small data size.
I used tiny-yolov2-voc cfg/weight on 1.0 GPU. I had a video of the item but i broke it up into frames so i could annotate. I then attempted to train on the images (not video). The products are healthcare related. Basically anything that a hospital would use.
Ive also used the inception method on images I got from Google. I noticed that inception method was very fast and resulted in accurate predictions. However, i'm worried that my images are too unique for inception to work.
Which method is best to use?
If you recommend YOLO, can you please provide suggestions on how to speed up the training phase?
If you recommend inception, can you please provide an explanation why it would work on unique images? I guess i'm having trouble understanding how inception knows which item i'm trying to train on without me providing annotations.
Thanks in advance

Just my impression (no recommendation or even related experience)
Having a look at the Hardware recommendations related to darknet a assumption is that you might stock up your own hardware to get faster results.
I read about the currently three different versions of YOLO and expect there are lot's of GFLOPS training included if you download the recommended files, but if the models never fit to your products then for you they never might be very helpful.
I must admit I've neither been active with YOLO nor with Tensorflow, so my impression might not be helpful at all.
If you see some videos of YOLO you can remark that sometimes a camel is labeled with horse and the accuracy seems being bad but it depends on the threshold that is applied to the images, so the videos look amazing as it seems the recognition is done so fast but with higher accuracy the process would slow down - also depending on the trained motives.
They never hide it though, they explain on an image where a dog is labeled as cow and a horse as sheep (Version 2) that in combination with darknet it's getting much faster but less accurate too, so usage of darknet is an important aspect too.
The information about details seems being quite bad on the websites of YOLO, they present it more like you'd do with a popstar, in comparison the website of Tensorflow looks more academic and is informing about the mathematics behind the framework.
Concerning Tensorflow I don't know about the hardware-recommendations, but as you wrote your results are useful, probably they are a bit or even much less.
My impression is that YOLO is primary intended for real-time detection in (live-)videos and needs much training for high accuracy. So depending on your use-case it might be right but you'd to invest in hardware probably for professional usage.
This is not an opinion against Tensorflow but that I had to verify more and it seems taking more time to get an impression. Concerning Tensorflow in the moment I even can't say if it can be used for real-time-detection, how accurate it is then and if the results are then still better then those of YOLO.
My assumption is that concerning both solutions it's a matter of involved elements (like the decision if to include darknet for speed), configuration, training and adjustments. Probably there is always something to increase in speed and accuracy, so investing in a system for recognition won't be static process with fixed end in timeline, but a steady process.
This is just a short overview of my impressions, I've never any experience with any recognition-software and hardly recommend that you make any decision based on my words.
Just if you want to do use any recognition software professional, especially for real-time-recognition, then you've to invest in hardware probably.

To my understanding of your problem you need you need inception with the capability of identifying your unique images. In this circumstance you can use transfer-learning on the inception model. With transfer-learning you can still train inception your own pictures while retaining the previous knowledge of inception.
More on transfer-learning

Related

Which model should I use for object recognition on mobile devices?

I am working on a project where I need the app to recognize a special character, just one small graphic, from a photographed document. Something similar to the example from the picture. More specifically, the app would use this character to determine the corners of the document.
Something like this
Which model would be suitable for that, Mobile SSD, Yolo or something completely different? Approximately how many photos and how much time would it take me to successfully train the model to 90%+ detection? And is TensorFlow Model Maker a good option?
I already tried to train it with Model Maker but the results were really disappointing. I have used
efficientdet_lite0
model. The photos were taken with a phone in high resolution, tagged with labelImg. About 40 for training, five each for validation and test.
It would mean a lot to me if someone would tell me if I am at least on the right track. Thank you very much in advance.
efficientdet or yolo should be good enough for you use case. Yolov5 recommends 10000 annotated instances of each class for good results, this is given that the variability of the class representation is large. I would not start with anything less than a few hundred.

TensorFlow 2 Detection Model Zoo metrics

I know it's a banality, but i'm really confused on what Speed (ms) and COCO mAP means HERE.
I get the idea, lower speed and higher mAP are better, but can i ask what does those metrics mean?
I have to write a report about a project that uses one of the model listed in the github model of tensorflow, so i would like a technical description of those two if possible. About COCO mAP i found something already, i'm trying to understand it, but nothing related to Speed. What does speed measure?
I'm sorry about the stupid question, but i like to fully understand things
It refers to inference speed, how much time it take for the network to provide an output based on your input.

How to train your own(w/o YOLO etc.) object detector in tf/keras

I successfully trained multi-classificator model, that was really easy with simple class related folder structure and keras.preprocessing.image.ImageDataGenerator with flow_from_directory (no one-hot encoding by hand btw!) after i just compile fit and evaluate - extremely well done pipeline by Keras!
BUT! when i decided to make my own (not cats, not dogs, not you_named) object detector - this is became a nightmare...
TFRecord and tf.Example are just madness! but ok, i almost get it (my dataset is small, i have plenty of ram, but who cares, write f. boilerplate, so much meh...)
The main thing - i just can't find any docs/tutorial how to make it with plain simple tf/keras, everyone just want to build up it on top of someone model, YOLO SSD FRCNN, even if they trying to detect completely new objects!!!
There two links about OD in official docs, and they both using some models underneath.
So my main question WHY ??? or i just blind..? -__-
It becomes a nightmare because Object Detection is way way harder than classification. The most simple object detector is this: first train a classifier on all your objects. Then when you want to detect objects in your image, slide a window over your image, and classify each window. Then, if your classifier is certain that a certain window is one of the objects, mark it as a successful detection.
But this approach has a lot of problems, mainly it's way (like waaaay) too slow. So, researcher improved it and invented RCNNs. That had it problems, so they invented Faster-RCNN, YOLO and SSD, all to make it faster and more accurate.
You won't find any tutorials online on how to implement the sliding window technique because it's not useful anyway, and you won't find any tutorials on how to implement the more advanced stuff because, well, the networks get complicated pretty quick.
Also note that using YOLO doesn't mean you should use the same weights as in YOLO. You can always train YOLO from scratch on your own data if you want by randomly initiliazing all the weights in the network layers. So the even if they trying to detect completely new objects!!! you mentioned isn't really valid. Also also note that I still would advise you to do use the weights they used in Yolo network. Transfer Learning is generally looked at as being a good idea, especially when starting out and especially in the image processing world, as many images share common features (like edges, for example).
I am having pretty much the same problem as my images are B/W diagrams, quite different from regular pictures, I want to train a custom model on just only diagrams.
I have found this documentation section in Tensorflow models repo:
https://github.com/tensorflow/models/blob/master/research/object_detection/README.md
It has a couple of sections explaining how to bring your own model and dataset in "extras" that could be a starting point.

How to fix incorrect guess in image recognition

I'm very new to this stuff so please bear with me. I followed a quick simple video about image recognition/classification in YT and the program indeed could classify the image with a high percentage. But then I do have some other images that was incorrectly classified.
On tensorflow site: https://www.tensorflow.org/tutorials/image_retraining#distortions
However, one should generally avoid point-fixing individual errors in
the test set, since they are likely to merely reflect more general
problems in the (much larger) training set.
so here are my questions:
What would be the best way to correct the program's guess? eg. image is B but the app returned with the results "A - 70%, B - 30%"
If the answer to one would be to retrain again, how do I go about retraining the program again without deleting the previous bottlenecks files created? ie. I want the program to keep learning while retaining previous data I already trained it to recognize.
Unfortunately there is often no easy fix, because the model you are training is highly complex and very hard for a human to interpret.
However, there are techniques you can use to try and reduce your test error. First make sure your model isn't overfitting or underfitting by observing the difference between train and test errors. If either is the case then try applying standard techniques, such as choosing a deeper model and/or using more filters if underfitting or adding regularization if overfitting.
Since you say you are already classifying correctly a high percentage of the time, I would start inspecting misclassified examples directly to try and gain insight into what you might be able to improve.
If possible, try and observe what your misclassified images have in common. If you are lucky they will all fall into one or a small number of categories. Here are some examples of what you might see and possible solutions:
Problem: Dogs facing left are misclassified as cats
Solution: Try augmenting your training set with rotations
Problem: Darker images are being misclassified
Solution: Make sure you are normalizing your images properly
It is also possible that you have reached the limits of your current approach. If you still need to do better consider trying a different approach like using a pretrained network for image recognition, such as VGG.

Use summarization model without training

The tensorflow text summarization model as described here https://github.com/tensorflow/models/tree/master/textsum requires a multi GPU architecture in order to train. My repeated attempts at training the model has resulted in memory exceptions, machine crashing for various reasons. Is the trained summarisation model available so can make use of the summarization model without the need for training? The summarization model is trained using the not free Gigaword dataset, if the trained model is not available from Google is this a factor in reason why ?
So as far as I can tell, no one has put the trained model out there that is referenced. I too was originally running into memory issues on my macbook pro and eventually ended up using my gaming laptop which had a much better GPU.
The other option of course is to take advantage of AWS and use something like their g2.2xlarge instance. They also have their P2 instances as well, but I have not checked that out yet.
With regards to the Gigaword dataset, it simply comes down to licensing. It is not a free license from LDC and often many of the academics working on this have the dataset provided to them via their Universities or companies. I have not had luck finding it, however LDC did get back to me and advised that they do have other article datasets that have a pricetag of around $300 which is much more reasonable for those of use just trying to learn TF. That said, if you didn't want to buy anything, you can always write your own page scraper and format the data for the textsum model. https://github.com/tensorflow/models/pull/379/files
Hope this helps some. Good luck!