Cells detection using deep learning techniques - tensorflow

I have to analyse some images of drops, taken using a microscope, which may contain some cell. What would be the best thing to do in order to do it?
Every acquisition of images returns around a thousand pictures: every picture contains a drop and I have to determine whether the drop has a cell inside or not. Every acquisition dataset presents with a very different contrast and brightness, and the shape of the cells is slightly different on every setup due to micro variations on the focus of the microscope.
I have tried to create a classification model following the guide "TensorFlow for poets", defining two classes: empty drops and drops containing a cell. Unfortunately the result wasn't successful.
I have also tried to label the cells and giving to an object detection algorithm using DIGITS 5, but it does not detect anything.
I was wondering if these algorithms are designed to recognise more complex object or if I have done something wrong during the setup. Any solution or hint would be helpful!
Thank you!
This is a collage of drops from different samples: the cells are a bit different from every acquisition, due to the different setup and ambient lights

This kind of problem should definitely be possible. I would suggest starting with a cifar 10 convolutional neural network tutorial and customizing it for your problem.
In future posts you should tell us how your training is progressing. Make sure you're outputting the following information every few steps (maybe every 10-100 steps):
Loss/cost function output, you should see your loss decreasing over time.
Classification accuracy on the current batch of your training data
Classification accuracy on a held out test set (if you've implemented test set evaluation, you might implement this second)
There are many, many, many things that can go wrong, from bad learning rates, to preprocessing steps that go awry. Neural networks are very hard to debug, they are very resilient to bugs, making it hard to even know if you have a bug in your code. For that reason make sure you're visualizing everything.
Another very important step to follow is to save the images exactly as you are passing them to tensorflow. You will have them in a matrix form, you can save that matrix form as an image. Do that immediately before you pass the data to tensorflow. Make sure you are giving the network what you expect it to receive. I can't tell you how many times I and others I know have passed garbage into the network unknowingly, assume the worst and prove yourself wrong!
Your next post should look something like this:
I'm training a convolutional neural network in tensorflow
My loss function (sigmoid cross entropy) is decreasing consistently (show us a picture!)
My input images look like this (show us a picture of what you ACTUALLY FEED to the network)
My learning rate and other parameters are A, B, and C
I preprocessed the data by doing M and N
The accuracy the network achieves on training data (and/or test data) is Y
In answering those questions you're likely to solve 10 problems along the way, and we'll help you find the 11th and, with some luck, last one. :)
Good luck!

Related

Does changing a token name in an image caption model affect performance?

If I train an image caption model then stop to rename a few tokens:
Should I train the model from scratch?
Or can I reload the model and continue training from the last epoch with the updated vocabulary?
Will either approach effect model accuracy/performance differently?
I would go for option 2.
When training the model from scratch, you are initializing the model's weights randomly and then you fit them based on your problem. However, if, instead of using random weights, you use weights that have already been trained for a similar problem, you may decrease the convergence time. This option is kind similar to the idea of transfer learning.
Just to give the other team a voice: So what is actually the difference between training from scratch and reloading a model and continuing training?
(2) will converge faster, (1) will probably have a better performance and should thus be chosen. Do we actually care about training times when we trade them off with performance - do you really? See you do not.
The further your model is already converged to a specific problem, the harder it gets to get it back into another optimum. Now you might be lucky and the chance, that you are going down the right rabid hole, rises with similar tasks and similar data. Yet with a change in your setup this can not be guaranteed.
Initializing a few epochs on other than your target domain, definitely makes sense and is beneficial, yet the question arises why you would not train on your target domain from the very beginning.
Note: For a more substantial read I'd like to refer you to this paper, where they explain in more depth why domain is of the essence and transfer learning could mess with your final performance.
It depends on the number of tokens being relabeled compared to the total amount. Just because you mentioned there are few of them, then the optimal solution in my opinion is clear.
You should start the training from scratch but initialize the weights with the values they had from wherever the previous training stopped (again mentioning that it is crucial that the samples that are being re-labeled are not of substantial amount). This way, the model will likely converge faster than starting with random weights and also better than trying to re-fit ("forget") what it managed to learn from the previous training.
Topologically speaking you are initializing in a position where the model is closer to a global minimum but has not made any steps towards a local minimum.
Hope this helps.

Do I need every class in a training image for object detection?

I just try to dive into TensorFlows Object Detection. I have a very small training set of circa 40 images yet. Each image can have up to 3 classes. But now the question came into my mind: Does every training image need every class? Is that important for efficient training? Or is it okay if an image may only have one of the object classes?
I get a very high total loss with ~8.0 and thought this might be the reason for this but I couldn't find an answer.
In general machine learning systems can cope with some amount of noise.
An image missing labels or having the wrong labels is fine as long as overall you have sufficient data for the model to figure it out.
40 examples for image classification sounds very small. It might work if you start with a pre-trained image network and there are few classes that are very easy to distinguish.
Ignore absolute the loss value, it doesn't mean anything. Look at the curve to see that the loss is decreasing and stop the training when the curve flattens out. Compare the loss value to a test dataset to check if the values are sufficiently similar (you are not overfitting). You might be able to compare to another training of the exact same system (to check if the training is stable for example).

Counting Pedestrians Using TensorFlow's Object Detection

I am new to machine learning field and based on what I have seen on youtube and read on internet I conjectured that it might be possible to count pedestrians in a video using tensorflow's object detection API.
Consequently, I did some research on tensorflow and read documentation about how to install tensorflow and then finally downloaded tensorflow and installed it. Using the sample files provided on github I adapted the code related to object_detection notebook provided here ->https://github.com/tensorflow/models/tree/master/research/object_detection.
I executed the adapted code on the videos that I collected while making changes to visualization_utils.py script so as to report number of objects that cross a defined region of interest on the screen. That is I collected bounding boxes dimensions (left,right,top, bottom) of person class and counted all the detection's that crossed the defined region of interest (imagine a set of two virtual vertical lines on video frame with left and right pixel value and then comparing detected bounding box's left & right values with predefined values). However, when I use this procedure I am missing on lot of pedestrians even though they are detected by the program. That is the program correctly classifies them as persons but sometimes they don't meet the criteria that I defined for counting and as such they are not counted. I want to know if there is a better way of counting unique pedestrians using the code rather than using the simplistic method that I am trying to develop. Is the approach that I am using the right one ? Could there be other better approaches ? Would appreciate any kind of help.
Please go easy on me as I am not a machine learning expert and just a novice.
You are using a pretrained model which is trained to identify people in general. I think you're saying that some people are pedestrians whereas some other people are not pedestrians, for example, someone standing waiting at the light is a pedestrian, but someone standing in their garden behind the street is not a pedestrian.
If I'm right, then you've reached the limitations of what you'll get with this model and you will probably have to train a model yourself to do what you want.
Since you're new to ML building your own dataset and training your own model probably sounds like a tall order, there's a learning curve to be sure. So I'll suggest the easiest way forward. That is, use the object detection model to identify people, then train a new binary classification model (about the easiest model to train) to identify if a particular person is a pedestrian or not (you will create a dataset of images and 1/0 values to identify them as pedestrian or not). I suggest this because a boolean classification model is about as easy a model as you can get and there are dozens of tutorials you can follow. Here's a good one:
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/neural_network.ipynb
A few things to note when doing this:
When you build your dataset you will want a set of images, at least a few thousand along with the 1/0 classification for each (pedestrian or not pedestrian).
You will get much better results if you start with a model that is pretrained on imagenet than if you train it from scratch (though this might be a reasonable step-2 as it's an extra task). Especially if you only have a few thousand images to train it on.
Since your images will have multiple people in it you have a problem of identifying which person you want the model to classify as a pedestrian or not. There's no one right way to do this necessarily. If you have a yellow box surrounding the person the network may be successful in learning this notation. Another valid approach might be to remove the other people that were detected in the image by deleting them and leaving that area black. Centering on the target person may also be a reasonable approach.
My last bullet-point illustrates a problem with the idea as it's I've proposed it. The best solution would be to alter the object detection network to ouput both a bounding box per-person, and also a pedestrian/non pedestrian classification with it; or to only train the model to identify pedestrians, specifically, in the first place. I mention this as more optimal, but I consider it a more advanced task than my first suggestion, and a more complex dataset to manage. It's probably not the first thing you want to tackle as you learn your way around ML.

How to fix incorrect guess in image recognition

I'm very new to this stuff so please bear with me. I followed a quick simple video about image recognition/classification in YT and the program indeed could classify the image with a high percentage. But then I do have some other images that was incorrectly classified.
On tensorflow site: https://www.tensorflow.org/tutorials/image_retraining#distortions
However, one should generally avoid point-fixing individual errors in
the test set, since they are likely to merely reflect more general
problems in the (much larger) training set.
so here are my questions:
What would be the best way to correct the program's guess? eg. image is B but the app returned with the results "A - 70%, B - 30%"
If the answer to one would be to retrain again, how do I go about retraining the program again without deleting the previous bottlenecks files created? ie. I want the program to keep learning while retaining previous data I already trained it to recognize.
Unfortunately there is often no easy fix, because the model you are training is highly complex and very hard for a human to interpret.
However, there are techniques you can use to try and reduce your test error. First make sure your model isn't overfitting or underfitting by observing the difference between train and test errors. If either is the case then try applying standard techniques, such as choosing a deeper model and/or using more filters if underfitting or adding regularization if overfitting.
Since you say you are already classifying correctly a high percentage of the time, I would start inspecting misclassified examples directly to try and gain insight into what you might be able to improve.
If possible, try and observe what your misclassified images have in common. If you are lucky they will all fall into one or a small number of categories. Here are some examples of what you might see and possible solutions:
Problem: Dogs facing left are misclassified as cats
Solution: Try augmenting your training set with rotations
Problem: Darker images are being misclassified
Solution: Make sure you are normalizing your images properly
It is also possible that you have reached the limits of your current approach. If you still need to do better consider trying a different approach like using a pretrained network for image recognition, such as VGG.

How does deep learning work in real time?

I just have made api which recommends movies based on user input using Django. It’s designed as to execute deep learning function for every 10 inputs from each user. For this each 10 inputs, it takes around 10 seconds to give output. How does google or amazon can do real time data update without delay?
When it comes to execution there is no deep learning, there is just deep model. Let us assume that this is a regular feed forward model, then all you have to do is perform K (depth) matrix multiplications. For convolutional layers it is much more, but still matrix multiplications. All these operations are extremely simple to parallelize. In particular just running them on GPU will give you ~20x boost. Using tensorflow which can scatter computations across many cores/cpus/machines also works the same way. There are also numerous other optimizations possible, like training a small net to replicate behaviour of the big one, making use of sparsity of some matrices (if you are using relus - many neurons will produce zeros, thus - sparse matrices appear) etc.
That being said - network that takes 10s to process 10 inputs sounds like a horrible implementation, or the network is really huge, so before going for general optimization schemes - make sure your current code is fine. For example if you use tensorflow etc. it is important that many things before pushing your data take time - like loading libraries, starting sessions, session .run call itself etc.