Lifetime of a colab VM without GPU or TPU - google-colaboratory

I've read frequently (here, here, and in many other places) that the VMs at Google Colab time out after 12 h. However, those reports are always about TPU- and GPU-accelerated VMs.
What about runtimes without hardware acceleration? Is there a time limit? Is it also 12 hours?

Quoted directly from the Colaboratory FAQ:
Notebooks run by connecting to virtual machines that have maximum lifetimes that can be as much as 12 hours. Notebooks will also disconnect from VMs when left idle for too long. Maximum VM lifetime and idle timeout behavior may vary over time, or based on your usage.
In short, yes. Non-accelerated runtimes also time out after 12 hours.
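If you want a rough idea of how much of that lifetime remains, one sketch is to read the VM's uptime from a Colab cell: the backend is a fresh Linux VM, so /proc/uptime approximately tracks session age. (The assumption that the VM clock starts near session provisioning, and the 12-hour figure itself, are not guarantees.)

```python
# Minimal sketch: estimate how long the current (Linux) VM has been alive
# by reading /proc/uptime, as a rough proxy for time used of the ~12 h cap.

def vm_uptime_hours(proc_uptime_text: str) -> float:
    """Parse the first field of /proc/uptime (seconds of uptime) into hours."""
    seconds = float(proc_uptime_text.split()[0])
    return seconds / 3600.0

with open("/proc/uptime") as f:
    hours = vm_uptime_hours(f.read())
print(f"VM has been up for {hours:.2f} h of the ~12 h maximum")
```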

Related

How long can I run Google Colab if I don't use GPU

I know there is a limit on how much GPU you can use on Google Colab, but what if you are just running a regular CPU script? Is there a limit to how long I can run it for?
I found this question, but it is unclear whether it refers to runtimes with or without a GPU.
From their docs:
Notebooks run by connecting to virtual machines that have maximum lifetimes that can be as much as 12 hours. Notebooks will also disconnect from VMs when left idle for too long. Maximum VM lifetime and idle timeout behavior may vary over time, or based on your usage.
If your notebook is not idle, the limit is 12 hours; if it is idle, the VM disconnects roughly 90 minutes after it becomes idle. This applies whether you use a GPU or only the CPU.

Virtual Environment CPU Allocation

I am currently attempting to spec out a virtual environment and I am having a hard time understanding how many cores or "cpu's" I can apply to virtual machines.
Can someone let me know how many usable cores I have in the attached image spec?
In other words, how many cores can I assign to VMs before I hit my limit or run into issues with performance?
Server spec
2× Xeon Silver 4214 (2.2 GHz, 12 cores) per server
4 servers total. Based on this, I should have 192 virtual cores that I can allocate? Or am I wrong?
You have 48 logical processors on one server with the listed CPUs. Now think of it this way: you might have other VMs that will consume some amount of resources such as CPU and RAM. If you assign, let's say, 16 vCPUs to a VM, will the other hosts in your cluster (I assume you clustered all 4 hosts) be able to handle the load of the other VMs plus this one with 16 vCPUs?
You should check the VMs' usage while idle and under some load, so you can then calculate how many vCPUs each VM should have before you experience major performance issues.
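The arithmetic behind those numbers can be checked with a quick sketch, assuming Hyper-Threading is enabled (2 threads per physical core on the Xeon Silver 4214):

```python
# Logical-processor math for the cluster in the question.
# Assumes Hyper-Threading is on (2 threads per physical core).

def logical_processors(sockets: int, cores_per_socket: int,
                       threads_per_core: int = 2) -> int:
    """Logical CPUs visible to the hypervisor on one host."""
    return sockets * cores_per_socket * threads_per_core

per_server = logical_processors(sockets=2, cores_per_socket=12)
cluster_total = 4 * per_server
print(per_server, cluster_total)  # 48 192
```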

What is the official way to do long-running training on google colab?

I have a neural net that takes about 7-15 days to train on several GPUs.
Google Colab disconnects after running the script for a few hours. There are a couple of "hacks" you can do to keep the session alive, but this is obviously not the official standard.
Once I have written my script in google colab, how should I go about running the script for a long period of time? I'm connected and using google's deep learning VM's.
There is currently no way of running scripts for such long times (i.e. days) in the free version of Colab; in fact, it is clear from the Resource Limits section of the official FAQ that the maximum running time is 12 hours (emphasis added):
How long can notebooks run in Colab?
Notebooks run by connecting to virtual machines that have maximum lifetimes that can be as much as 12 hours. Notebooks will also disconnect from VMs when left idle for too long. Maximum VM lifetime and idle timeout behavior may vary over time, or based on your usage. This is necessary for Colab to be able to offer computational resources for free. Users interested in longer VM lifetimes and more lenient idle timeout behaviors that don't vary as much over time may be interested in Colab Pro.
So, if you really need running times in the order of days, you should consider Colab Pro.
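Short of Colab Pro, the usual mitigation is to checkpoint training state to mounted Google Drive so a fresh session can resume where the old one died. A minimal, framework-agnostic sketch (the Drive path and the state layout are assumptions; real training would save model weights via the framework's own checkpoint API):

```python
# Sketch: periodically persist training state so a session killed at the
# 12-hour limit can resume. In Colab you would point `ckpt` at a mounted
# Drive folder, e.g. "/content/drive/MyDrive/state.pkl" (hypothetical path).
import os
import pickle

def save_checkpoint(state: dict, path: str) -> None:
    """Write atomically so a mid-write disconnect can't corrupt the file."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)  # atomic rename on POSIX

def load_checkpoint(path: str) -> dict:
    """Return saved state, or a fresh-start state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"epoch": 0}
    with open(path, "rb") as f:
        return pickle.load(f)

# Resume-or-start training loop skeleton.
ckpt = "state.pkl"
state = load_checkpoint(ckpt)
for epoch in range(state["epoch"], 5):
    # ... one epoch of training would go here ...
    state = {"epoch": epoch + 1}
    save_checkpoint(state, ckpt)
```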

Is there any limitations for google colab other than the session timeout after 12 hours?

One of the limitations is that we can get only 12 continuous hours per session. Are there any limitations on the usage of the GPU and TPU?
Yes, you can only use 1 GPU with a limited memory of 12 GB, and the TPU has 64 GB of High Bandwidth Memory. You can read about it in this article.
So, if you want to use a large dataset, I would recommend using tf.data.Dataset to prepare it before training.
If you want to use GPUs, you can use any TF version. But for the TPU I would recommend TF 1.14.
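The point of the tf.data.Dataset recommendation is to stream and batch a large dataset lazily instead of loading it all into the 12 GB of accelerator memory at once. A plain-Python sketch of that idea (tf.data's `from_generator(...).batch(...).prefetch(...)` chain does the real, optimized version; this just illustrates the lazy-batching concept without TensorFlow installed):

```python
# Lazy batching: consume a large source in fixed-size chunks rather than
# materializing the whole dataset in memory.
from itertools import islice
from typing import Iterable, Iterator, List

def batched(source: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield successive batches from `source`; the last batch may be short."""
    it = iter(source)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

print(list(batched(range(10), 4)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```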
From Colab's documentation,
In order to be able to offer computational resources for free, Colab needs to maintain the flexibility to adjust usage limits and hardware availability on the fly. Resources available in Colab vary over time to accommodate fluctuations in demand, as well as to accommodate overall growth and other factors.
In a nutshell, Colab has dynamic resource provisioning, so they can change the hardware automatically if it is being taxed too much.
Google giveth and Google taketh away.
Link

Google-colaboratory: No backend with GPU available

Here it is described how to use gpu with google-colaboratory:
Simply select "GPU" in the Accelerator drop-down in Notebook Settings (either through the Edit menu or the command palette at cmd/ctrl-shift-P).
However, when I select gpu in Notebook Settings I get a popup saying:
Failed to assign a backend
No backend with GPU available. Would you like to use a runtime with no accelerator?
When I run:
import tensorflow as tf

device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
Of course, I get GPU device not found. It seems the description is incomplete. Any ideas what needs to be done?
You need to configure the notebook with a GPU device:
Click Edit -> Notebook settings -> Hardware accelerator -> GPU
You'll need to try again later when a GPU is available. The message indicates that all available GPUs are in use.
The FAQ provides additional info:
How may I use GPUs and why are they sometimes unavailable?
Colaboratory is intended for interactive use. Long-running background computations, particularly on GPUs, may be stopped. Please do not use Colaboratory for cryptocurrency mining. Doing so is unsupported and may result in service unavailability. We encourage users who wish to run continuous or long-running computations through Colaboratory's UI to use a local runtime.
There seems to be a cooldown on continuous training with GPUs. So, if you encounter the error dialog, try again later, and perhaps try to limit long-term training in subsequent sessions.
My reputation is just slightly too low to comment, but here's a bit of additional info for @Bob Smith's answer regarding the cooldown period.
There seems to be a cooldown on continuous training with GPUs. So, if you encounter the error dialog, try again later, and perhaps try to limit long-term training in subsequent sessions.
Based on my own recent experience, I believe Colab will allocate you at most 12 hours of GPU usage, after which there is roughly an 8-hour cooldown period before you can use compute resources again. In my case, I could not connect to an instance even without a GPU. I'm not entirely sure about this next bit, but I think if you run, say, 3 instances at once, your 12 hours are depleted 3 times as fast. I don't know after what period of time the 12-hour limit resets, but I'd guess maybe a day.
Anyway, a few details are still missing, but the main takeaway is that if you exceed your limit, you'll be locked out from connecting to an instance for 8 hours (which is a great pain if you're actively working on something).
After Reset runtime didn't work, I did:
Runtime -> Reset all runtimes -> Yes
I then got a happy:
Found GPU at: /device:GPU:0
This is the precise answer to your question.
According to a post from Colab:
Overall usage limits, as well as idle timeout periods, maximum VM lifetime, GPU types available, and other factors, vary over time. GPUs and TPUs are sometimes prioritized for users who use Colab interactively rather than for long-running computations, or for users who have recently used less resources in Colab. As a result, users who use Colab for long-running computations, or users who have recently used more resources in Colab, are more likely to run into usage limits and have their access to GPUs and TPUs temporarily restricted. Users with high computational needs may be interested in using Colab's UI with a local runtime running on their own hardware.
Google Colab uses TensorFlow 2.0 by default; change it to TensorFlow 1 by adding the line
%tensorflow_version 1.x
before any Keras or TensorFlow code.