Why is my transfer learning implementation in TensorFlow throwing an error after a few iterations? - tensorflow

I am using the Inception v1 architecture for transfer learning. I have downloaded the checkpoint file, nets, and pre-processing file from the GitHub repository below:
https://github.com/tensorflow/models/tree/master/slim
I have 3700 images and am pulling the last pooling layer's filters out of the graph for each image and appending them to a list. With every iteration the RAM usage increases, finally killing the run at around 2000 images. Can you tell me what mistake I have made?
https://github.com/Prakashvanapalli/TensorFlow/blob/master/Transfer_Learning/inception_v1_finallayer.py
Even if I remove the list appending and just try to print the results, this still happens. I guess the mistake is in the way I am calling the graph. When I watch my RAM usage, it grows with every iteration, and I don't know why, since I am not saving anything and nothing differs from the 1st iteration.
From my point of view, I am just sending one image, getting the outputs, and saving them. So it should work irrespective of how many images I send.
I have tried it on both GPU (6GB) and CPU (32GB).

You seem to be storing images in your graph as tf.constants. These will be persistent, and will cause memory issues like you're experiencing. Instead, I would recommend either placeholders or queues. Queues are very flexible, and can be very high performance, but can also get quite complicated. You may want to start with just a placeholder.
For a full-complexity example of an image input pipeline, you could look at the Inception model.
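For reference, here is a minimal sketch of the placeholder approach (TF1-style; load_images() and the pooling op are stand-ins for your own image loader and Inception graph). The key point is that the graph is built once, and only numpy data flows through feed_dict inside the loop:

    import numpy as np
    import tensorflow as tf

    # Build the graph ONCE, outside the image loop.
    image_ph = tf.placeholder(tf.float32, shape=[1, 224, 224, 3], name='input_image')
    # Stand-in for the InceptionV1 last pooling layer you are extracting.
    features = tf.layers.average_pooling2d(image_ph, pool_size=7, strides=7)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        all_features = []
        for img in load_images():  # assumed generator yielding HxWx3 numpy arrays
            # Only data is fed here; no new nodes (tf.constant, etc.) are added
            # per image, so memory stays flat however many images you process.
            feats = sess.run(features, feed_dict={image_ph: img[np.newaxis]})
            all_features.append(feats)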

Related

Does changing a token name in an image caption model affect performance?

If I train an image caption model then stop to rename a few tokens:
Should I train the model from scratch?
Or can I reload the model and continue training from the last epoch with the updated vocabulary?
Will either approach affect model accuracy/performance differently?
I would go for option 2.
When training the model from scratch, you initialize the model's weights randomly and then fit them to your problem. However, if, instead of using random weights, you use weights that have already been trained for a similar problem, you may decrease the convergence time. This option is similar in spirit to the idea of transfer learning.
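A minimal Keras-style sketch of option 2, assuming a saved captioning model; build_caption_model(), the file names and the dataset objects are hypothetical placeholders for your own pipeline:

    import tensorflow as tf

    # Case 1: only token names changed (vocabulary size and ids are unchanged).
    # The saved weights still line up, so reload and keep fitting.
    model = tf.keras.models.load_model('caption_model_last_epoch.h5')  # hypothetical path
    model.fit(train_dataset, epochs=5, validation_data=val_dataset)

    # Case 2: the vocabulary size changed. Rebuild the model and copy over the
    # layers whose shapes still match; mismatched layers (embedding/output) are
    # re-initialized and re-learned.
    new_model = build_caption_model(vocab_size=new_vocab_size)  # hypothetical builder
    new_model.load_weights('caption_weights_last_epoch.h5',
                           by_name=True, skip_mismatch=True)
    new_model.fit(train_dataset, epochs=5, validation_data=val_dataset)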
Just to give the other team a voice: what is actually the difference between training from scratch and reloading a model and continuing training?
(2) will converge faster, while (1) will probably end up with better performance and should thus be chosen. Do we really care about training time when we trade it off against performance? Usually we do not.
The further your model has already converged toward a specific problem, the harder it gets to move it back into another optimum. You might be lucky, and the chance that you are going down the right rabbit hole rises with similar tasks and similar data, yet with a change in your setup this cannot be guaranteed.
Initializing with a few epochs on something other than your target domain can definitely make sense and be beneficial, yet the question arises why you would not train on your target domain from the very beginning.
Note: For a more substantial read I'd like to refer you to this paper, where they explain in more depth why domain is of the essence and transfer learning could mess with your final performance.
It depends on the number of tokens being relabeled compared to the total amount. Since you mentioned there are only a few of them, the optimal solution in my opinion is clear.
You should restart the training, but initialize the weights with the values they had where the previous training stopped (again, it is crucial that the relabeled samples are not a substantial fraction). This way the model will likely converge faster than starting from random weights, and also do better than trying to re-fit ("forget") what it managed to learn from the previous training.
Topologically speaking, you are initializing at a position where the model is closer to a global minimum but has not yet made any steps towards a local minimum.
Hope this helps.

Tensorflow inference run time high on first data point, decreases on subsequent data points

I am running inference using one of the models from TensorFlow's object detection module. I'm looping over my test images in the same session and doing sess.run(). However, on profiling these runs, I realize the first run always has a higher time as compared to the subsequent runs.
I found an answer here as to why that happens, but there was no solution on how to fix it.
I'm deploying the object detection inference pipeline on an Intel i7 CPU. The time for one session.run(), for 1,2,3, and the 4th image looks something like (in seconds):
1. 84.7132628
2. 1.495621681
3. 1.505012751
4. 1.501652718
Just as background on what I have tried:
I tried the TFRecords approach TensorFlow gave as a sample here. I hoped it would work better because it doesn't use a feed_dict, but since more I/O operations are involved, I'm not sure it will be ideal. I tried making it work without writing to disk, but I always got some error regarding the encoding of the image.
I tried using TensorFlow datasets to feed the data, but I wasn't sure how to provide the input, since during inference I need to provide input for the "image_tensor" key in the graph. Any ideas on how to use this to provide input to a frozen graph?
Any help will be greatly appreciated!
TLDR: Looking to reduce the run time of inference for the first image - for deployment purposes.
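For what it's worth, a common way to hide that first-run cost in deployment is a warm-up pass on a dummy input right after loading the model, so real requests only ever see the steady-state latency. A minimal sketch, assuming the usual tensor names from an object-detection frozen graph:

    import numpy as np
    import tensorflow as tf

    with tf.Session(graph=detection_graph) as sess:  # detection_graph: your loaded frozen graph
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        boxes = detection_graph.get_tensor_by_name('detection_boxes:0')

        # Warm-up: one dummy run absorbs the one-off graph optimization and
        # memory allocation, so the slow first run happens before real traffic.
        dummy = np.zeros((1, 600, 600, 3), dtype=np.uint8)
        sess.run(boxes, feed_dict={image_tensor: dummy})

        # Real inferences now run at the steady-state latency.
        for image in test_images:  # test_images is assumed
            sess.run(boxes, feed_dict={image_tensor: image[np.newaxis]})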
Even though I have seen the first inference take longer, the difference shown there (84 s vs. 1.5 s) seems a bit unbelievable. Are you also counting the time to load the model in this metric? Could that account for the large time difference? Is the topology so complex that the difference could be justified?
My suggestions:
Try OpenVINO: see if the topology you are working on is supported in OpenVINO. OpenVINO is known to speed up inference workloads by a substantial amount thanks to its ability to optimize network operations. Also, the time taken to load an OpenVINO model is comparatively lower in most cases.
Regarding the TFRecords approach, could you please share the exact error and the stage at which you got it?
Regarding TensorFlow datasets, could you please check out https://github.com/tensorflow/tensorflow/issues/23523 and https://www.tensorflow.org/guide/datasets. Regarding the "image_tensor" key in the graph, I hope your original inference pipeline gives you some clues.
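On the last point, one way to wire a tf.data pipeline into a frozen graph without a feed_dict is to remap its "image_tensor" input via input_map when importing the graph. A rough sketch; the file names, the decode function and the output tensor names are assumptions based on the usual object-detection export:

    import tensorflow as tf

    dataset = tf.data.TFRecordDataset('test_images.tfrecord')  # hypothetical file
    dataset = dataset.map(decode_example)  # decode_example: your own parse/decode fn
    dataset = dataset.batch(1)
    images = dataset.make_one_shot_iterator().get_next()

    graph_def = tf.GraphDef()
    with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())

    # Replace the graph's "image_tensor" placeholder with the iterator output,
    # so the input pipeline feeds the frozen graph directly.
    outputs = tf.import_graph_def(
        graph_def,
        input_map={'image_tensor:0': images},
        return_elements=['detection_boxes:0', 'detection_scores:0'],
        name='')

    with tf.Session() as sess:
        while True:
            try:
                boxes, scores = sess.run(outputs)
            except tf.errors.OutOfRangeError:
                break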

How to fix incorrect guess in image recognition

I'm very new to this stuff so please bear with me. I followed a quick, simple YouTube video about image recognition/classification, and the program could indeed classify the image with a high percentage. But I have some other images that were incorrectly classified.
On the TensorFlow site: https://www.tensorflow.org/tutorials/image_retraining#distortions
"However, one should generally avoid point-fixing individual errors in the test set, since they are likely to merely reflect more general problems in the (much larger) training set."
so here are my questions:
What would be the best way to correct the program's guess? E.g. the image is B but the app returned the results "A - 70%, B - 30%".
If the answer to 1 is to retrain, how do I go about retraining the program without deleting the previously created bottleneck files? I.e. I want the program to keep learning while retaining the data I have already trained it to recognize.
Unfortunately there is often no easy fix, because the model you are training is highly complex and very hard for a human to interpret.
However, there are techniques you can use to try and reduce your test error. First make sure your model isn't overfitting or underfitting by observing the difference between train and test errors. If either is the case then try applying standard techniques, such as choosing a deeper model and/or using more filters if underfitting or adding regularization if overfitting.
Since you say you are already classifying correctly a high percentage of the time, I would start inspecting misclassified examples directly to try and gain insight into what you might be able to improve.
If possible, try and observe what your misclassified images have in common. If you are lucky they will all fall into one or a small number of categories. Here are some examples of what you might see and possible solutions:
Problem: Dogs facing left are misclassified as cats
Solution: Try augmenting your training set with rotations
Problem: Darker images are being misclassified
Solution: Make sure you are normalizing your images properly
It is also possible that you have reached the limits of your current approach. If you still need to do better consider trying a different approach like using a pretrained network for image recognition, such as VGG.
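As an illustration of the augmentation and normalization suggestions above, here is a minimal Keras sketch; the directory path and image size are placeholders:

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(
        rescale=1.0 / 255,       # normalize pixel values to [0, 1]
        rotation_range=20,       # random rotations
        horizontal_flip=True,    # mirrored copies help with left/right-facing objects
        validation_split=0.2)

    train_gen = datagen.flow_from_directory(
        'data/train', target_size=(224, 224), subset='training')
    val_gen = datagen.flow_from_directory(
        'data/train', target_size=(224, 224), subset='validation')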

How to train a CNN without losing the already trained weights (memory loss effect)

From what I've researched, there is nothing that handles this "memory loss" effect in TensorFlow (which I consider the most complete library for the purpose). With that in mind, I would like to formulate my question related to this.
Consider the hypothetical scenario of a large set of images used for training; the training took hours and gave me accurate weights. Now I would like to make my model even more robust by presenting even more images, of course without the need to present the initial set again. I'd like to add knowledge without losing the memory I already have!
Is there something in tensorflow that could handle this scenario? Or should I just train again with the whole set?
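There is no built-in cure for this kind of forgetting, but the basic mechanics of continuing from saved weights look like this (a TF1-style sketch; build_model(), the checkpoint paths and the new-data iterator are hypothetical). Note that training only on the new images can still overwrite old knowledge unless some of the original samples are mixed back in:

    import tensorflow as tf

    loss, train_op = build_model()  # hypothetical: the same graph used in the first run
    saver = tf.train.Saver()

    with tf.Session() as sess:
        saver.restore(sess, 'checkpoints/model.ckpt')  # start from the learned weights
        for batch in new_image_batches():              # hypothetical iterator over the new images
            sess.run(train_op, feed_dict=batch)
        saver.save(sess, 'checkpoints/model_updated.ckpt')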

Deep networks on Cloud ML

I am trying to train a very deep model on Cloud ML, however I am having serious memory issues that I am not managing to work around. The model is a very deep convolutional neural network to auto-tag music.
The model for this can be found in the image below. A batch of 20 with a tensor of 12x38832x1 is fed into the network.
The music was originally 465894x1 samples, which was then split into 12 windows, hence 12x38832x1. When using the map_fn function, each loop gets a separate 38832x1 sample (conv1d).
Processing windows one at a time yields better results than processing the whole piece of music with one CNN. The split was done prior to storing the data in TFRecords in order to minimise the processing needed during training. The data is loaded into a queue with a maximum queue size of 200 samples (i.e. 10 batches).
Once dequeued, the batch is transposed so the 12 dimension comes first, which can then be used in the map_fn function for processing the windows. It is not transposed prior to being queued because the first dimension needs to match the batch dimension of the output, which is [20, 50], where 20 is the batch size of the data and 50 are the different tags.
For each window, the data is processed and the results of each map_fn call are super-pooled using a smaller network. The processing of the windows is done by a very deep neural network, which is giving me problems: every configuration I have tried runs out of memory.
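A rough sketch of the window handling described above, with the shapes from the question; window_cnn() stands in for the deep per-window network:

    import tensorflow as tf

    batch = tf.placeholder(tf.float32, shape=[20, 12, 38832, 1])  # [batch, windows, samples, 1]

    # Put the 12 windows first so map_fn iterates over them.
    windows_first = tf.transpose(batch, perm=[1, 0, 2, 3])        # [12, 20, 38832, 1]

    # Each map_fn step sees one [20, 38832, 1] window for the conv1d stack.
    per_window = tf.map_fn(window_cnn, windows_first, dtype=tf.float32)

    # per_window is then pooled across the window axis and fed to the smaller
    # "super-pooling" network that produces the [20, 50] tag predictions.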
As a model I am using one similar to the Census TensorFlow Model.
First and foremost, I am not sure if this is the best option, since for evaluation a separate graph is built rather than sharing variables. This requires double the amount of parameters.
Secondly, as a cluster setup, I have been using one complex_l master, 3 complex_l workers and 3 large_model parameter servers. I do not know if I am underestimating the amount of memory needed here.
My model has previously worked with a much smaller network. However, increasing it in size started giving me bad out of memory errors.
My questions are:
The memory requirement is big, but I am sure it can be handled on Cloud ML. Am I underestimating the amount of memory needed? What are your suggestions about the cluster for such a network?
When using a tf.train.Server in the dispatch function, do you need to pass the cluster_spec so it is used in the replica_device_setter, or does it allocate on its own? When not passing it, and setting log_device_placement in tf.ConfigProto, all the variables seem to end up on the master worker (see the sketch after these questions). In the Census example's task.py this is not passed on; can I assume that is correct?
How does one calculate how much memory is needed for a model (rough estimate to select the cluster)?
Is there any other TensorFlow core tutorial on how to set up such big jobs? (other than Census)
When training a big model with distributed between-graph replication, does the whole model need to fit on the worker, or does the worker only run ops and then transmit the results to the PS? Does that mean the workers can get by with low memory, enough just for individual ops?
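Regarding the second question above, a hedged sketch of passing the cluster spec into replica_device_setter (the addresses and the variable are illustrative):

    import tensorflow as tf

    cluster = tf.train.ClusterSpec({
        'ps':     ['ps-0:2222'],
        'worker': ['worker-0:2222'],
        'master': ['master-0:2222'],
    })
    server = tf.train.Server(cluster, job_name='worker', task_index=0)

    # Without cluster=..., replica_device_setter has no ps tasks to spread
    # variables over, which matches the "everything ends up on the master"
    # placement described above.
    with tf.device(tf.train.replica_device_setter(cluster=cluster)):
        weights = tf.get_variable('weights', shape=[1024, 50])
        # ... rest of the model graph ...

    # log_device_placement=True prints where each op/variable was actually placed.
    config = tf.ConfigProto(log_device_placement=True)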
PS: With smaller models the network trained successfully. I am trying to deepen the network for better ROC.
Questions coming up from ongoing troubleshooting:
When using the replica_device_setter with the cluster parameter, I noticed that the master has very little memory and CPU usage, and checking the log placement there are very few ops on the master. I checked the TF_CONFIG that is loaded and it says the following for the cluster field:
u'cluster': {u'ps': [u'ps-4da746af4e-0:2222'], u'worker': [u'worker-4da746af4e-0:2222'], u'master': [u'master-4da746af4e-0:2222']}
On the other hand, the tf.train.ClusterSpec documentation only shows workers. Does that mean the master is not considered a worker? What happens in that case?
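For reference, a sketch of how that TF_CONFIG is commonly turned into a ClusterSpec: the master is simply a third job name alongside ps and worker (in the Census-style samples it acts as the chief worker):

    import json
    import os
    import tensorflow as tf

    tf_config = json.loads(os.environ.get('TF_CONFIG', '{}'))
    cluster_spec = tf.train.ClusterSpec(tf_config.get('cluster', {}))
    task = tf_config.get('task', {})

    # On Cloud ML the task type is 'master', 'worker' or 'ps'; the master is
    # the chief and behaves like a worker with extra bookkeeping duties.
    server = tf.train.Server(cluster_spec,
                             job_name=task.get('type'),
                             task_index=task.get('index', 0))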
Is the error memory-related or something else? An EOF error?