I have a custom NER model that is working fine to find locations:
"I like London and Berlin"
LOC London
LOC Berlin
I am trying to add another custom NER label to it, following this guide. However, when I add a new label, the model loses what it knew about the previous label types. Namely:
"Do you like horses? I like London and Berlin"
ANIMAL horses
ANIMAL London
ANIMAL Berlin
(I have simply added the second sentence to the test_text variable from the link above; that's why no additional code is provided.)
Can anybody shed some light on this? I am using spaCy 2.2.4
The poor results are due to a concept called catastrophic forgetting. You can get more information here.
tl;dr
As you train your model on new entities, it forgets what it previously learnt.
To make sure the old knowledge is not forgotten, you need to feed the model examples of the other entity types as well during retraining. By doing this, you ensure that the model does not self-tune and skew itself towards predicting everything as the new entity being trained.
You can read about possible solutions that can be implemented here.
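As a rough sketch of that idea (spaCy 2.x API, matching the version mentioned above): mix sentences annotated with the old LOC label into the training data for the new ANIMAL label, so both labels stay represented during retraining. The model path and example sentences below are placeholders, not your actual data.

import random
import spacy

# Placeholder training data: keep examples of the old LOC label in the mix
# while adding the new ANIMAL label (offsets are character start/end).
TRAIN_DATA = [
    ("I like London and Berlin",
     {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
    ("Do you like horses?",
     {"entities": [(12, 18, "ANIMAL")]}),
]

nlp = spacy.load("your_custom_model")  # hypothetical path to the existing LOC model
ner = nlp.get_pipe("ner")
ner.add_label("ANIMAL")

other_pipes = [p for p in nlp.pipe_names if p != "ner"]
with nlp.disable_pipes(*other_pipes):  # only update the NER component
    optimizer = nlp.resume_training()
    for _ in range(30):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            nlp.update([text], [annotations], drop=0.35, sgd=optimizer, losses=losses)
        print(losses)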
I'm working on an object detection project where I have to identify the type of animals and their posture given an image/video. For this purpose, I have two custom YOLOv4 models which are trained separately. Model 1 identifies the type of animal and Model 2 identifies the posture of the animal. I have converted these models to TensorFlow models.
Now, since both models use the same image/video as input, I want to combine the outputs of both models, and the final output should display the bounding boxes from both models.
I'm stuck at this point, I have been researching the solution for this and I'm confused with various methods. Could anyone help me with this?
I don't think you need an object detection model as the pose identifier, because you've already localized the animal with the 1st net.
The easiest (and clearly not very accurate) solution that I see is to use a classifier on top of the detections (the cropped bounding box as input). In that case the animal's anatomy is not taken into account explicitly, but I guess that approach is still a good baseline.
For further experiments you can take a look at these and these solutions for animal pose estimation, but they are more complex to use.
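For what it's worth, here is a rough sketch of that baseline, assuming both networks were exported as Keras models and the detector returns (boxes, scores, class_ids) with boxes in pixel coordinates; the file names, confidence threshold and input size are placeholders rather than your actual converted YOLOv4 models.

import numpy as np
import tensorflow as tf

animal_detector = tf.keras.models.load_model("animal_detector")        # hypothetical export of model 1
posture_classifier = tf.keras.models.load_model("posture_classifier")  # hypothetical export of model 2

def detect_and_classify(frame):
    """Detect animals, then classify the posture of each cropped detection."""
    results = []
    boxes, scores, class_ids = animal_detector.predict(frame[np.newaxis, ...])
    for box, score, class_id in zip(boxes[0], scores[0], class_ids[0]):
        if score < 0.5:                                   # assumed confidence threshold
            continue
        y1, x1, y2, x2 = box.astype(int)                  # assumed [y1, x1, y2, x2] layout
        crop = tf.image.resize(frame[y1:y2, x1:x2], (224, 224))
        posture = posture_classifier.predict(crop[tf.newaxis, ...])
        results.append({"box": (x1, y1, x2, y2),
                        "animal": int(class_id),
                        "posture": int(np.argmax(posture))})
    return results                                        # draw one box per entry, with both labels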
I was wondering, which would be better for the following:
I want to create a model to help distinguish between car models; take the Mercedes C250 (2014) and Mercedes C63 (2014) as an example.
I understand object detection helps to identify multiple, well... objects in a given image. However, having watched a tutorial online and seen how IBM Cloud lets you annotate specifics such as the badge on the car, certain detailing, etc., would object detection work better for me as opposed to just an image classifier?
I understand that the more data is fed in, the better the results, but in a general sense, what should the approach be? An image classifier or object detection? Or maybe something else? I've used and trained multiple image classifiers but I am not at all happy with the results.
Any help or suggestions would be much appreciated.
Object detection is better, because a simple image classifier breaks down if you have more than one different car in a photo.
I am currently testing out custom object detection using the Tensorflow API. But I don't quite seem to understand the theory behind it.
So if I, for example, download a version of MobileNet and use it to train on, let's say, red and green apples, does it forget all the things that it has already been trained on? And if so, why is it then beneficial to use MobileNet over building a CNN from scratch?
Thanks for any answers!
Does it forget all the things that it has already been trained on?
Yes, if you re-train a CNN previously trained on a large database with a new database containing fewer classes, it will "forget" the old classes. However, the old pre-training can help it learn the new classes; this training strategy is called "transfer learning" or "fine-tuning", depending on the exact approach.
As a rule of thumb it is generally not a good idea to create a new network architecture from scratch as better networks probably already exist. You may want to implement your custom architecture if:
You are learning CNNs and deep learning
You have a specific need and you proved that other architectures won't fit or will perform poorly
Usually, one takes an existing pre-trained network and specializes it for their specific task using transfer learning.
A lot of scientific literature is available for free online if you want to learn. You can start with the YOLO series, and with R-CNN, Fast R-CNN and Faster R-CNN for detection networks.
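To make the benefit concrete, here is a minimal transfer-learning sketch using the Keras API; the dataset loading is left as a comment and the two classes (red apple / green apple) are just the example from the question.

import tensorflow as tf

base = tf.keras.applications.MobileNetV2(include_top=False,
                                         weights="imagenet",   # keep the ImageNet features
                                         input_shape=(224, 224, 3),
                                         pooling="avg")
base.trainable = False                                          # freeze the pre-trained backbone

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(2, activation="softmax"),             # new head: red apple / green apple
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# train_ds / val_ds would come from e.g. tf.keras.utils.image_dataset_from_directory(...)
# model.fit(train_ds, validation_data=val_ds, epochs=5)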
The main concept behind object detection is that it divides the input image into a grid of N patches, and then for each patch it generates a set of sub-patches with different aspect ratios, let's say M rectangular sub-patches. In total you need to classify MxN sub-patches.
In general, the idea is then to analyze each sub-patch within each patch. You pass the sub-patch to the classifier in your model and, depending on the model's training, it will classify it as containing a green apple/red apple/nothing. If it is classified as a red apple, then this sub-patch is the bounding box of the detected object.
So actually, there are two parts you are interested in:
Generating as many sub-patches as possible to cover as many portions of the image as possible (of course, the more sub-patches, the slower your model will be), and
The classifier. The classifier is normally an already existing network (MobileNet, VGG, ResNet...). This part is commonly used as the "backbone" and it will extract the features of the input image. With the classifier you can either choose to train it "from zero", so that your weights are adjusted to your specific problem, OR you can load the weights learned on another known problem and reuse them in yours, so you won't need to spend time training them. In that case, they will also classify the objects the classifier was originally trained on (see the sketch below).
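As a toy illustration of these two parts (not how the TensorFlow Object Detection API actually implements it, which uses learned region proposals or anchors instead of exhaustive crops), a naive sliding-window version could look like this; the classifier argument, the patch size and the stride are assumptions for the sketch.

import numpy as np
import tensorflow as tf

def detect_by_patches(image, classifier, patch=128, aspect_ratios=(0.5, 1.0, 2.0), stride=64):
    """Classify every sub-patch and keep the ones predicted as an object."""
    h, w, _ = image.shape
    detections = []
    for y in range(0, h - patch, stride):              # the N grid positions
        for x in range(0, w - patch, stride):
            for ar in aspect_ratios:                   # the M sub-patches per position
                ph, pw = patch, int(patch * ar)
                if y + ph > h or x + pw > w:
                    continue
                crop = tf.image.resize(image[y:y + ph, x:x + pw], (224, 224))
                probs = classifier.predict(crop[tf.newaxis, ...], verbose=0)[0]
                label = int(np.argmax(probs))
                if label != 0:                         # assume class 0 means "nothing"
                    detections.append(((x, y, x + pw, y + ph), label, float(probs[label])))
    return detections                                  # a real detector adds non-max suppression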
Take a look at the Mask R-CNN implementation. I find the way they explain the process very interesting. In this architecture, you will not only generate a bounding box but also segment the object of interest.
I am new to the machine learning field and, based on what I have seen on YouTube and read on the internet, I conjectured that it might be possible to count pedestrians in a video using TensorFlow's Object Detection API.
Consequently, I did some research on TensorFlow, read the documentation about how to install it, and finally downloaded and installed it. Using the sample files provided on GitHub, I adapted the code from the object_detection notebook provided here -> https://github.com/tensorflow/models/tree/master/research/object_detection.
I executed the adapted code on the videos that I collected, while making changes to the visualization_utils.py script so as to report the number of objects that cross a defined region of interest on the screen. That is, I collected the bounding box dimensions (left, right, top, bottom) of the person class and counted all the detections that crossed the defined region of interest (imagine a set of two virtual vertical lines on the video frame with left and right pixel values, and then comparing the detected bounding box's left & right values with the predefined values). However, when I use this procedure I am missing a lot of pedestrians even though they are detected by the program. That is, the program correctly classifies them as persons, but sometimes they don't meet the criteria that I defined for counting and as such they are not counted. I want to know if there is a better way of counting unique pedestrians with this code rather than the simplistic method that I am trying to develop. Is the approach that I am using the right one? Could there be other, better approaches? I would appreciate any kind of help.
Please go easy on me as I am not a machine learning expert and just a novice.
You are using a pretrained model which is trained to identify people in general. I think you're saying that some people are pedestrians whereas some other people are not pedestrians, for example, someone standing waiting at the light is a pedestrian, but someone standing in their garden behind the street is not a pedestrian.
If I'm right, then you've reached the limitations of what you'll get with this model and you will probably have to train a model yourself to do what you want.
Since you're new to ML, building your own dataset and training your own model probably sounds like a tall order; there's a learning curve, to be sure. So I'll suggest the easiest way forward: use the object detection model to identify people, then train a new binary classification model (about the easiest model to train) to decide whether a particular person is a pedestrian or not (you will create a dataset of images and 1/0 values labelling them as pedestrian or not). I suggest this because a boolean classification model is about as easy a model as you can get and there are dozens of tutorials you can follow. Here's a good one:
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/neural_network.ipynb
A few things to note when doing this:
When you build your dataset you will want a set of images, at least a few thousand along with the 1/0 classification for each (pedestrian or not pedestrian).
You will get much better results if you start with a model that is pretrained on imagenet than if you train it from scratch (though this might be a reasonable step-2 as it's an extra task). Especially if you only have a few thousand images to train it on.
Since your images will have multiple people in it you have a problem of identifying which person you want the model to classify as a pedestrian or not. There's no one right way to do this necessarily. If you have a yellow box surrounding the person the network may be successful in learning this notation. Another valid approach might be to remove the other people that were detected in the image by deleting them and leaving that area black. Centering on the target person may also be a reasonable approach.
My last bullet point illustrates a problem with the idea as I've proposed it. The best solution would be to alter the object detection network to output both a bounding box per person and a pedestrian/non-pedestrian classification with it; or to only train the model to identify pedestrians, specifically, in the first place. I mention this as the more optimal route, but I consider it a more advanced task than my first suggestion, with a more complex dataset to manage. It's probably not the first thing you want to tackle as you learn your way around ML.
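A minimal sketch of that suggested second-stage model, i.e. a binary pedestrian / not-pedestrian classifier applied to the person crops produced by the detector, starting from ImageNet weights as suggested above; the dataset directory name is a placeholder.

import tensorflow as tf

backbone = tf.keras.applications.MobileNetV2(include_top=False,
                                             weights="imagenet",
                                             input_shape=(224, 224, 3),
                                             pooling="avg")
backbone.trainable = False                              # reuse the pre-trained features

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(1, activation="sigmoid"),     # 1 = pedestrian, 0 = not a pedestrian
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# crops_ds would be a dataset of person crops labelled 1/0, e.g. built with
# tf.keras.utils.image_dataset_from_directory("person_crops/", label_mode="binary")
# model.fit(crops_ds, epochs=5)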
Hi, I have built my CNN with two classes, dogs and cats. I have trained it and now I am able to classify dog and cat images. But what if I want to introduce a class for new, unclassified objects? For example, if I feed my network a flower image, the network gives me a wrong classification. I want to build my network with a third class for new, unclassified objects. But how can I build this third class? Which images do I have to use to get a class for new objects that are different from dogs or cats?
At the end of my network I use softmax, and my code is developed using TensorFlow. Could someone provide me with some suggestions? Thanks.
You need to add a third "something else" class to your network. There are several ways you can go about it. In general, if you have a class that you want to detect you should have examples for that class, so you could add images without cats or dogs to your training data labelled with the new class. However, this is a bit tricky, because the new class is, by definition, everything in the universe but dogs and cats, so you cannot possibly expect to have enough data to train for it. In practice, though, if you have enough examples the network will probably learn that the third class is triggered whenever the first two are not.
Another option that I have used in the past is to model the "default" class slightly differently from the regular ones. So, instead of trying to actually learn what a "not cat or dog" image is, you can just explicitly say that it is whatever does not activate the cat or dog neurons. I did this by replacing the last layer's softmax with sigmoids (so the loss would be sigmoid cross-entropy instead of softmax cross-entropy, and the output would no longer be a categorical probability distribution, but honestly it didn't make much difference performance-wise in my case), then expressing the "default" class as 1 minus the maximum activation value over every other class. So, if no class had an activation of 0.5 or greater (i.e. a 50% estimated probability of being that class), the "default" class would be the highest scoring one. You can explore this and other similar schemes.
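A small sketch of that sigmoid scheme, with an illustrative (not production-quality) network; the threshold logic follows the 0.5 rule described above, and the layer sizes are placeholders.

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="sigmoid"),     # independent cat / dog activations
])
model.compile(optimizer="adam", loss="binary_crossentropy")  # sigmoid cross-entropy per class

def predict_with_default(model, image, class_names=("cat", "dog")):
    """Return the predicted class, or "other" when no class reaches 0.5."""
    scores = model.predict(image[np.newaxis, ...], verbose=0)[0]
    other_score = 1.0 - scores.max()                    # the "default" class activation
    if other_score >= scores.max():                     # i.e. no class scored 0.5 or more
        return "other", float(other_score)
    return class_names[int(scores.argmax())], float(scores.max())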
You should just add images to your dataset that are neither dogs nor cats, label them as "Other", and treat "Other" as a normal class in all your code. In particular you'll get a softmax over 3 classes.
The images you're using can be anything (except cats and dogs of course), but should be of the same kind as the ones you'll probably be testing against when using your network. So for instance if you know you'll be testing on images of dogs, cats, and other animals, train with other animals, not with pictures of flowers. If you don't know what you'll be testing with, try to get very varied images, from different sources, etc, so that the network learns well that this class is "anything but cats and dogs" (the wide range of images in the real world that fall in this category should be reflected in your training dataset).
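A minimal sketch of this approach, assuming a directory-per-class layout (the paths, image size and the small network are placeholders); the only structural change to the existing cat/dog network is the third unit in the softmax head.

import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train",                                       # contains cat/, dog/ and other/ sub-folders
    class_names=["cat", "dog", "other"],
    image_size=(160, 160))

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(160, 160, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),     # cat, dog, other
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)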