Long wait when training a model with ML - TensorFlow

I have a problem with long waits when I run my training model with machine learning using CNNs. Maybe this is because my PC has such bad specs for machine learning.
I have 50,000 images for my X_train and must wait up to an hour or more until training is done.
I hope someone can solve my problem. Thanks a lot.

I would recommend you use Google Colab. It's free to use. You can access it within Google Drive; make sure to change the runtime to GPU. For cases such as CNNs, using a GPU can make your training process a lot faster.
Also, I don't know how you are handling images, but if you are using TensorFlow/Keras I would also recommend ImageDataGenerator, so that instead of loading all the images into memory at once, you load only the images needed for each batch. It can save the computer some resources.
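For example, here is a minimal sketch of that approach, assuming the images are organized in one sub-folder per class (the directory name and sizes are placeholders):

    import tensorflow as tf

    # Stream images from disk in batches instead of loading all 50,000 at once.
    datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)

    train_gen = datagen.flow_from_directory(
        "train_dir",             # hypothetical path: train_dir/<class_name>/*.jpg
        target_size=(224, 224),  # images are resized on the fly
        batch_size=32,
        class_mode="categorical",
    )

    # model.fit(train_gen, epochs=10)  # only one batch is ever held in memory

Because the generator yields one batch at a time, peak memory stays roughly constant no matter how many images are on disk.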

Related

Why does memory usage not go down even after stopping my TensorFlow object detection script?

I am training an object detection pipeline developed with the TensorFlow library. My problem is that, even after stopping the script, memory usage stays really high and does not go down. Can somebody recommend a remedy for this problem?
I am using TensorFlow 2.6 and the TensorFlow Object Detection API to train on my data.
Even after I re-ran my script (model_main_tf2) after stopping the older runs, those older runs were still consuming a lot of memory (under the same name, model_main_tf2), as can be seen in the figure below.
Try running:

    tf.keras.backend.clear_session()

after you are finished running a model.
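A minimal sketch of where that call could go, assuming a Keras workflow (the toy model and data here are placeholders):

    import numpy as np
    import tensorflow as tf

    x = np.random.rand(64, 10).astype("float32")
    y = np.random.randint(0, 2, size=(64,))

    for run in range(3):
        model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid")])
        model.compile(optimizer="adam", loss="binary_crossentropy")
        model.fit(x, y, epochs=1, verbose=0)
        tf.keras.backend.clear_session()  # free graph state before the next run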

Machine learning and model training

I am working on a machine learning project where I am training my model on Google Colab.
I have cloned the repository, and the model is built with the TensorFlow framework.
However, my dataset is very large. Two questions come to mind before running the model:
1) If I leave my model to train overnight, what is the smartest way to know that the training has completed or was interrupted? (Some notification through email, or . . . ?)
2) What happens if the internet connection breaks in between?
My Google searches are not giving me an understandable answer. I would appreciate any help with these queries.
A maximum of two instances can run concurrently, and they are linked to your Google account. Keep backing up your weights, and re-train if it takes more than 12 hours.
For such long jobs it's always better to invest in a VPS, but to answer your questions:
The maximum lifetime of a job on Colab with the browser open is 12 hours, so it's a good idea to periodically save your model weights. A script that backs up weights while training is a good idea (see the sketch below).
If the internet connection breaks, the notebook will keep running for 90 minutes before the instance is considered idle and is recycled. It's similar to closing your browser.
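As a rough sketch of the weight-backup idea, assuming Google Drive is already mounted in the notebook (the Drive path and the toy model and data are placeholders):

    import numpy as np
    import tensorflow as tf

    x = np.random.rand(128, 10).astype("float32")
    y = np.random.randint(0, 2, size=(128,))

    model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid")])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    backup_cb = tf.keras.callbacks.ModelCheckpoint(
        "/content/drive/MyDrive/ckpt/weights_{epoch:02d}.h5",  # hypothetical Drive path
        save_weights_only=True,
        save_freq="epoch",  # write a backup at the end of every epoch
    )

    model.fit(x, y, epochs=5, callbacks=[backup_cb], verbose=0)

If the session dies, you can rebuild the model and call model.load_weights() on the latest checkpoint file instead of starting over.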

Deep Learning with TensorFlow on Compute Engine VM

I'm actually new to machine learning, but this topic is very interesting for me, so I'm using TensorFlow to classify some images from the MNIST dataset. I run this code on a Compute Engine VM on Google Cloud, because my computer is too weak for this. The code actually runs well, but the problem is that each time I log in to my VM and run the same code, I have to wait while my model trains on the CNN before I can run some tests, experiment with my data, plot results, or import some external images to improve my accuracy, etc.
Is there some way to save the result of training my model just once, somewhere, so that when I decide to log in to the same VM tomorrow I don't have to wait while my model trains again? Is that possible to do?
Or is there maybe another way to do something similar?
You can save a trained model in TensorFlow and then use it later by loading it; that way you only have to train your model once and can use it as many times as you want. To do that, you can follow the TensorFlow documentation on that topic, where you will find information on how to save and load the model. In short, you will have to use the SavedModelBuilder class to define the type and location of your saved model, and then add the MetaGraphs and variables you want to save. Loading the saved model for later use is even easier, as you only have to run a command pointing to the location of the file to which the model was exported.
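A bare-bones sketch of that flow with the TF1-style API described above (the graph here is a trivial stand-in for your trained model, and the export directory is a placeholder):

    import tensorflow as tf

    export_dir = "/tmp/mnist_saved_model"  # hypothetical export location

    # Build a trivial stand-in graph.
    x = tf.placeholder(tf.float32, shape=[None, 784], name="x")
    w = tf.Variable(tf.zeros([784, 10]))
    logits = tf.matmul(x, w, name="logits")

    builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # ... training would happen here, once ...
        builder.add_meta_graph_and_variables(
            sess, [tf.saved_model.tag_constants.SERVING])
    builder.save()

    # Tomorrow, on the same VM, load instead of retraining:
    with tf.Session(graph=tf.Graph()) as sess:
        tf.saved_model.loader.load(
            sess, [tf.saved_model.tag_constants.SERVING], export_dir)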
On the other hand, I would strongly recommend changing your working environment in a way that is more profitable for you. In Google Cloud you have the Cloud ML Engine service, which might be good for the type of work you are doing. It allows you to train your models and perform predictions without needing an instance running all the required software. I happen to have worked a little with TensorFlow recently; at first I was also working with a virtualized instance, but after following some tutorials I was able to save some money by migrating my work to ML Engine, as you are only charged for usage. If you are using your VM only for that purpose, take a look at it.
You can of course consult all the available documentation, but as a first quickstart, if you are interested in ML Engine, I recommend you to have a look at how to train your models and how to get your predictions.

Why is my transfer learning implementation on TensorFlow throwing an error after a few iterations?

I am using the Inception v1 architecture for transfer learning. I have downloaded the checkpoint file, nets, and pre-processing file from the GitHub repository below:
https://github.com/tensorflow/models/tree/master/slim
I have 3700 images, and for each image I am pulling the last pooling layer's filters out of the graph and appending them to a list. With every iteration the RAM usage increases, finally killing the run at around 2000 images. Can you tell me what mistake I have made?
https://github.com/Prakashvanapalli/TensorFlow/blob/master/Transfer_Learning/inception_v1_finallayer.py
Even if I remove the list appending and just try to print the results, this still happens. I guess the mistake is in the way I am calling the graph. When I watch my RAM usage, it grows heavier with every iteration, and I don't know why, as I am not saving anything and nothing differs from the first iteration.
From my point of view, I am just sending one image, getting the outputs, and saving them. So it should work irrespective of how many images I send.
I have tried it on both GPU (6 GB) and CPU (32 GB).
You seem to be storing images in your graph as tf.constants. These will be persistent, and will cause memory issues like you're experiencing. Instead, I would recommend either placeholders or queues. Queues are very flexible, and can be very high performance, but can also get quite complicated. You may want to start with just a placeholder.
For a full-complexity example of an image input pipeline, you could look at the Inception model.
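To illustrate the placeholder suggestion above, here is a hedged sketch: the graph is built once, and every image is fed through a single placeholder, so no new nodes are added per iteration (the feature op is a stand-in for the inception_v1 pooling endpoint, and the random arrays stand in for your real images):

    import numpy as np
    import tensorflow as tf  # TF1-style graph API, matching the question's setup

    # Build the graph ONCE: one placeholder, one feature op.
    image_ph = tf.placeholder(tf.float32, shape=[1, 224, 224, 3], name="image")
    features = tf.reduce_mean(image_ph, axis=[1, 2])  # stand-in for the pooling layer

    with tf.Session() as sess:
        outputs = []
        for _ in range(3700):
            img = np.random.rand(1, 224, 224, 3).astype("float32")  # your real image here
            # feed_dict reuses the same graph node, so memory stays flat
            outputs.append(sess.run(features, feed_dict={image_ph: img}))

By contrast, wrapping each image in tf.constant(img) inside the loop grows the graph by one node per image, which matches the steadily increasing RAM usage described in the question.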

Real Time Object detection using TensorFlow

I have just started experimenting with deep learning and computer vision technologies. I came across this awesome tutorial. I have set up the TensorFlow environment using Docker and trained my own set of objects, and it gave good accuracy when I tested it.
Now I want to make this more real-time. For example, instead of giving an image of an object as the input, I want to use a webcam and make it recognize the object with the help of TensorFlow. Can you guys point me to the right place to start this work?
You may want to look at TensorFlow Serving so that you can decouple compute from sensors (and distribute the computation), or at our C++ API. Beyond that, TensorFlow was written emphasizing throughput rather than latency, so batch samples as much as you can. You don't need to run TensorFlow on every frame, so input from a webcam should definitely be in the realm of possibilities. Making the network smaller and buying better hardware are popular options.
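As a hedged starting point for the frame-skipping idea, assuming OpenCV for capture and a model exported as a SavedModel (the export path, signature name, and skip interval are all placeholders):

    import cv2
    import tensorflow as tf

    model = tf.saved_model.load("exported_model")    # hypothetical export path
    detect_fn = model.signatures["serving_default"]  # assumed signature name

    cap = cv2.VideoCapture(0)  # default webcam
    FRAME_SKIP = 5             # run the network only every 5th frame
    frame_idx = 0
    last_result = None

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % FRAME_SKIP == 0:
            # Most detection SavedModels take a batched uint8 tensor; adjust to yours.
            inp = tf.convert_to_tensor(frame[None, ...], dtype=tf.uint8)
            last_result = detect_fn(inp)
        frame_idx += 1
        cv2.imshow("detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()

Reusing the last result on skipped frames keeps the display smooth while the network runs at a fraction of the camera's frame rate.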