How can I keep the gpu running for more than 12 hours in google colaboratory? - tensorflow

I am trying to train a model on google colaboratoy but the problem is that the GPU gets disconnected after 12 hours and hence I am not able to train my model beyond a certain point. Is there a way to keep GPU connected for longer times?

Related

Colab pro plus notebook disconnected after 13 hours of training

I took the colab pro plus offer and last night I wanted to train a model for 15 hours. But the notebook is disconnected after 13 hours. So I lost all my checkpoints and I have to start training again. I took the colab pro plus offer to avoid these inconveniences. How is it that this happened despite the colab pro plus offer? What can I do to prevent this from happening again?

Google Colab - Session Crashed error while training

I am trying to train a model on around 4500 text sentences. The embedding is however heavy. The session crashes for number of training clauses > 350. It works fine and displays results till 350 sentences though.
Error message:
Your session crashed after using all available RAM.
Runtime type - GPU
I have hereby attached the screenshot of logs.
I am considering training model by batches, but I am a newbie and finding it difficult to have my way around it. Any help will be appreciated. session logs attached
This is basically because of out of memory on Google colab.
Google colab provides ~12GB Free RAM, to extend it 25 GB follow the instruction mentioned here.

TensorFlow model serving on Google AI Platform online prediction too slow with instance batches

I'm trying to deploy a TensorFlow model to Google AI Platform for Online Prediction. I'm having latency and throughput issues.
The model runs on my machine in less than 1 second (with only an Intel Core I7 4790K CPU) for a single image. I deployed it to AI Platform on a machine with 8 cores and an NVIDIA T4 GPU.
When running the model on AI Platform on the mentioned configuration, it takes a little less than a second when sending only one image. If I start sending many requests, each with one image, the model eventually blocks and stops responding. So I'm instead sending batches of images on each request (from 2 to 10, depending on external factors).
The problem is that I expected the batched requests to be almost constant in time. When sending 1 image, the CPU utilization was around 10% and GPU 12%. So I expected that a batch of 9 images would use ~100% of the hardware and respond in the same time ~1 sec, but this is not the case. A batch of 7 to 10 images takes anywhere from 15 to 50 seconds to be processed.
I already tried to optimize my model. I was using map_fn, replaced that with manual loops, switched from Float 32 to Float 16, tried to vectorize the operations as much as possible, but it's still in the same situation.
What am I missing here?
I'm using the latest AI Platform runtime for online prediction (Python 3.7, TensorFlow 2.1, CUDA 10.1).
The model is a large version of YOLOv4 (~250MB in SavedModel format). I've built a few postprocessing algorithms in TensorFlow that operates on the output of the model.
Last but not least, I also tried debugging with TensorBoard, and it turns out that the YOLOv4 part of the TensorFlow Graph is taking ~90% of the processing time. I expected this particular part of the model to be highly parallel.
Thanks in advance for any help with this. Please ask me for any information that you may need to better understand the issue.
UPDATE 2020-07-13: as suggested in a comment below, I also tried running the model on CPU, but it's really slow and suffers of the same problems than with GPU. It doesn't seem to process images from a single request in parallel.
Also, I think I'm running into issues with TensorFlow Serving due to the rate and amount of requests. I used the tensorflow/serving:latest-gpu Docker image locally to test this further. The model answers 3 times faster on my machine (GeForce GTX 1650) than on AI Platform, but its really inconsistent with response times. I'm getting the following response times (<amount of images> <response time in milliseconds>):
3 9004
3 8051
11 4332
1 222
3 4386
3 3547
11 5101
9 3016
10 3122
11 3341
9 4039
11 3783
11 3294
Then, after running for a minute, I start getting delays and errors:
3 27578
3 28563
3 31867
3 18855
{
message: 'Request failed with status code 504',
response: {
data: { error: 'Timed out waiting for notification' },
status: 504
}
}
For others with the same problem as me when using AI Platform:
As stated in a comment from the Google Cloud team here, AI Platform does not execute batches of instances at once. They plan on adding the feature, though.
We've since moved on from AI Platform to a custom deployment of NVIDIA's Triton Inference Server hosted on Google Cloud Compute Engine. We're getting much better performance than we expected, and we can still apply many more optimizations to our model provided by Triton.
Thanks to everyone who tried to help by replying to this answer.
From the Google Cloud documentation:
If you use a simple model and a small set of input instances, you'll find that there is a considerable difference between how long it takes to finish identical prediction requests using online versus batch prediction. It might take a batch job several minutes to complete predictions that are returned almost instantly by an online request. This is a side-effect of the different infrastructure used by the two methods of prediction. AI Platform Prediction allocates and initializes resources for a batch prediction job when you send the request. Online prediction is typically ready to process at the time of request.
This has to do, like the quote says, with the difference in node allocations, specially with:
Node allocation for online prediction:
Keeps at least one node ready over a period of several minutes, to handle requests even when there are none to handle. The ready state ensures that the service can serve each prediction promptly.
You can learn more about that here
The model is a large version of YOLOv4 (~250MB in SavedModel format). I've built a few postprocessing algorithms in TensorFlow that operates on the output of the model.
What are the postprocessing modifications have you made to the YOLOv4? Is it possible that the source of the slowdown are from those operations? One test you can do to validate this hypothesis locally is to benchmark an unmodified version of YOLOv4 against the benchmarks you've already made for your modified version.
Last but not least, I also tried debugging with TensorBoard, and it turns out that the YOLOv4 part of the TensorFlow Graph is taking ~90% of the processing time. I expected this particular part of the model to be highly parallel.
It would be interesting to take a look at the "debugging output" you're mentioning here. If you use https://www.tensorflow.org/guide/profiler#install_the_profiler_and_gpu_prerequisites, what are the breakdown of the most expensive operations? I've had some experience digging into TF ops -- I've found some strange bottlenecks due to CPU <-> GPU data transfer bottlenecks in some cases. Would be happy to hop on a call sometime and take a look with you if you shoot me a DM.

machine learning and model training

I am working on a machine learning project where I am my training my model on Google Colab.
I have cloned the repository and model is build up with tensor flow framework.
However, my data-set is too large. Before running the model I have two questions which are coming to mind:
1) If I leave my model overnight to get trained, what is the smartest way to know that my training is completed/left in between? (Any notification through email . . or ?)
2) What happens, if the internet connection breaks in between
My Google search is not providing me understandable answer. I would appreciate any help with solutions for my queries.
Maximum of 2 instances can be run concurrently and are linked to your Google account. Keep backing up your weights, and re-train if it takes more than 12 hours.
For such long jobs, it's always better to invest in a VPS, but to answer your questions,
The maximum lifetime of a job on Colab with the browser open is 12 hours. Therefore, it's a good idea to periodically save your model weights. A script to backup weights while training is a good idea.
If the internet connection breaks, the notebook will run for 90 minutes before the instance is considered to be idle and will be recycled. It's similar to closing your browser.

Distributed training in Tensorflow using multiple GPUs in Google Colab

I have recently become interested in incorporating distributed training into my Tensorflow projects. I am using Google Colab and Python 3 to implement a Neural Network with customized, distributed, training loops, as described in this guide:
https://www.tensorflow.org/tutorials/distribute/training_loops
In that guide under section 'Create a strategy to distribute the variables and the graph', there is a picture of some code that basically sets up a 'MirroredStrategy' and then prints the number of generated replicas of the model, see below.
Console output
From what I can understand, the output indicates that the MirroredStrategy has only created one replica of the model, and thereofore, only one GPU will be used to train the model. My question: is Google Colab limited to training on a single GPU?
I have tried to call MirroredStrategy() both with, and without, GPU acceleration, but I only get one model replica every time. This is a bit surprising because when I use the multiprocessing package in Python, I get four threads. I therefore expected that it would be possible to train four models in parallel in Google Colab. Are there issues with Tensorflows implementation of distributed training?
On google colab, you can only use one GPU, that is the limit from Google. However, you can run different programs on different gpu instances so by creating different colab files and connect them with gpus but you can not place the same model on many gpu instances in parallel.
There are no problems with mirrored startegy, talking from personal experience it works fine if you have more than one GPU.