I start the course of tensorflow in udacity, and simultaneously I am looking on the web for the topic.
I suppose that the typical use cases are well solved already, in a better way that i can achieve by my own. In other words in some place exists trained models for usual cases ready to use. I found zooModels that if I undestand properly is the thing that i looking for. but I can't realize that there does not exist a ocr model published that can recognise a number in a image:
image example
Do i need to train my own model? Is there a repository that i don't know?
Related
I am using a Teachable Machine model which i trained to recognize some specific objects, the issue with it, however, is that it does not recognize when there is nothing, basically it always assumes that one of the objects is there. One potential solution I am considering is combining two models like the YOLO V2 Tflite model in the same app. Would this be even possible/efficient? If it is what would be the best way to do it?
If anyone knows a solution to get teachable machine to recognize when the object is not present that would probably be a much better solution.
Your problem can be solved making a model ensemble: Train a classifier that learns to know if your specific objects are not in the visual space, and then use your detection model.
However, I really recommend you to upload your model to an online service and consume it via an API. As I know tflite package just supports well MobileNet based models.
I had the same problem, just create another class called whatever you want(for example none) and put some non-related images in it, then train the model.
Now whenever there is nothing in the field, it should output none.
I successfully trained multi-classificator model, that was really easy with simple class related folder structure and keras.preprocessing.image.ImageDataGenerator with flow_from_directory (no one-hot encoding by hand btw!) after i just compile fit and evaluate - extremely well done pipeline by Keras!
BUT! when i decided to make my own (not cats, not dogs, not you_named) object detector - this is became a nightmare...
TFRecord and tf.Example are just madness! but ok, i almost get it (my dataset is small, i have plenty of ram, but who cares, write f. boilerplate, so much meh...)
The main thing - i just can't find any docs/tutorial how to make it with plain simple tf/keras, everyone just want to build up it on top of someone model, YOLO SSD FRCNN, even if they trying to detect completely new objects!!!
There two links about OD in official docs, and they both using some models underneath.
So my main question WHY ??? or i just blind..? -__-
It becomes a nightmare because Object Detection is way way harder than classification. The most simple object detector is this: first train a classifier on all your objects. Then when you want to detect objects in your image, slide a window over your image, and classify each window. Then, if your classifier is certain that a certain window is one of the objects, mark it as a successful detection.
But this approach has a lot of problems, mainly it's way (like waaaay) too slow. So, researcher improved it and invented RCNNs. That had it problems, so they invented Faster-RCNN, YOLO and SSD, all to make it faster and more accurate.
You won't find any tutorials online on how to implement the sliding window technique because it's not useful anyway, and you won't find any tutorials on how to implement the more advanced stuff because, well, the networks get complicated pretty quick.
Also note that using YOLO doesn't mean you should use the same weights as in YOLO. You can always train YOLO from scratch on your own data if you want by randomly initiliazing all the weights in the network layers. So the even if they trying to detect completely new objects!!! you mentioned isn't really valid. Also also note that I still would advise you to do use the weights they used in Yolo network. Transfer Learning is generally looked at as being a good idea, especially when starting out and especially in the image processing world, as many images share common features (like edges, for example).
I am having pretty much the same problem as my images are B/W diagrams, quite different from regular pictures, I want to train a custom model on just only diagrams.
I have found this documentation section in Tensorflow models repo:
https://github.com/tensorflow/models/blob/master/research/object_detection/README.md
It has a couple of sections explaining how to bring your own model and dataset in "extras" that could be a starting point.
I'm trying to train a model for a sentence classification task. The input is a sentence (a vector of integers) and the output is a label (0 or 1). I've seen some articles here and there about using Bert and GPT2 for text classification tasks. However, I'm not sure which one should I pick to start with. Which of these recent models in NLP such as original Transformer model, Bert, GPT2, XLNet would you use to start with? And why? I'd rather to implement in Tensorflow, but I'm flexible to go for PyTorch too.
Thanks!
It highly depends on your dataset and is part of the data scientist's job to find which model is more suitable for a particular task in terms of selected performance metric, training cost, model complexity etc.
When you work on the problem you will probably test all of the above models and compare them. Which one of them to choose first? Andrew Ng in "Machine Learning Yearning" suggest starting with simple model so you can quickly iterate and test your idea, data preprocessing pipeline etc.
Don’t start off trying to design and build the perfect system.
Instead, build and train a basic system quickly—perhaps in just a few
days
According to this suggestion, you can start with a simpler model such as ULMFiT as a baseline, verify your ideas and then move on to more complex models and see how they can improve your results.
Note that modern NLP models contain a large number of parameters and it is difficult to train them from scratch without a large dataset. That's why you may want to use transfer learning: you can download pre-trained model and use it as a basis and fine-tune it to your task-specific dataset to achieve better performance and reduce training time.
I agree with Max's answer, but if the constraint is to use a state of the art large pretrained model, there is a really easy way to do this. The library by HuggingFace called pytorch-transformers. Whether you chose BERT, XLNet, or whatever, they're easy to swap out. Here is a detailed tutorial on using that library for text classification.
EDIT: I just came across this repo, pytorch-transformers-classification (Apache 2.0 license), which is a tool for doing exactly what you want.
Well like others mentioned, it depends on the dataset and multiple models should be tried and best one must be chosen.
However, sharing my experience, XLNet beats all other models so far by a good margin. Hence if learning is not the objective, i would simple start with XLNET and then try a few more down the line and conclude. It just saves time in exploring.
Below repo is excellent to do all this quickly. Kudos to them.
https://github.com/microsoft/nlp-recipes
It uses hugging face transformers and makes them dead simple. 😃
I have used XLNet, BERT, and GPT2 for summarization tasks (English only). Based on my experience, GPT2 works the best among all 3 on short paragraph-size notes, while BERT performs better for longer texts (up to 2-3 pages). You can use XLNet as a benchmark.
My problem statement is as follows :
" Object Detection and Localization using Tensorflow and convolutional neural network "
What i did ?
I am done with the cat detection from images using tflearn library.I successfully trained a model using 25000 images of cats and its working fine with good accuracy.
Current Result :
What i wanted to do?
If my image consist of two or more than two objects in the same image for example cat and dog together so my result should be 'cat and dog' and apart from this i have to find the exact location of these two objects on the image(bounding box)
I came across many high level libraries like darknet , SSD but not able to get the concept behind it.
Please guide me about the approach to solve the problem.
Note : I am using supervised learning techniques.
Expected Result :
You have several ways to go about it.
The most straight forward way is to get some suggested bounding boxes using some bounding box suggestion algorithm like selective search and run on each on of the suggestion the classification net that you already trained. This approach is the approach taken by R-CNN.
For more advanced algorithm based on the above approach i suggest you read about Fast-R-CNN and Faster R-CNN.
Look at Object detection with R-CNN? for some basic explanation.
Darknet and SSD are based on a different approach if you want to undestand them you can read about them on
http://www.cs.unc.edu/~wliu/papers/ssd.pdf
https://pjreddie.com/media/files/papers/yolo.pdf
Image localization is a complex problem with many different implementations achieving the same result with different efficiency.
There are 2 main types of implementation
-Localize objects with regression
-Single Shot Detectors
Read this https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/object_localization_and_detection.html to get a better idea.
Cheers
I have done a similar project (detection + localization) on Indian Currencies using PyTorch and ResNet34. Following is the link of my kaggle notebook, hope you find it helpful. I have manually collected images from the internet and made bounding box around them and saved their annotation file (Pascal VOC) using "LabelImg" annotation tool.
https://www.kaggle.com/shweta2407/objectdetection-on-custom-dataset-resnet34
Currently, there are a lot of deep learning models developed in Caffe instead of tensorflow. If I want to re-write these models in tensorflow, how to start? I am not familiar with Caffe structure. It seems to me that there are some files storing the model architecture only. My guess is that I only need to understand and transfer those architecture design into Tensorflow. The input/output/training will be re-written anyway. Is this thought meaningful?
I see some Caffe implementation also need to hack into the original Caffe framework down to the C++ level, and make some modifications. I am not sure under what kind of scenario the Caffe model developer need to go that deep? If I just want to re-implement their models in Tensorflow, do I need to go to check their C++ modifications, which are sometimes not documented at all.
I know there are some Caffe-Tensorflow transformation tool. But there are always some constraints, and I think re-write the model directly maybe more straightforward.
Any thougts, suggestions, and link to tutorials are highly appreciated.
I have already asked a similar question.
To synthetise the possible answers :
You can either use pre-existing tools like etheron's kaffe(which is really simple to use). But its simplicity comes at a cost: it is not easy to debug.
As #Yaroslav Bulatov answered start from scratch and try to make each layer match. In this regard I would advise you to look at ry's github which is a remarkable example where you basically have small helper functions which indicate how to reshape the weights appropriately from caffe to Tensorflow, which is the only real thing you have to do to make simple models match and also provides activations check layer by layer.