Equivalents of Google Colab with more RAM - google-colaboratory

I'm currently working on a project creating simulations from defined models. Most of the testing on down-sized models has taken place in Google Colab to make use of the GPU accelerator option. However, when scaling up to the full-sized models I now exceed the maximum RAM for Google Colab. Is there an equivalent service which allows for 25GB of RAM rather than 12GB? Access to GPU acceleration is still essential.
Note: I am outside the US so Google Colab Pro is not an option

Related

What are paid alternative services to Kaggle?

Kaggle provides 30 hours of GPU usage for free per week. Sometimes I max out this quota and need more, or I need a more powerful GPU.
I tried the Google Colab Pro subscription; however, it was slower than Kaggle. Are there any cloud computing services for deep learning similar to Kaggle or Google Colab Pro? I prefer to use a fixed type of GPU for a fixed number of hours for a fixed monthly fee, which is why I don't like AWS: it is pay-as-you-go, and in general AWS looks too complex for my needs, which are learning deep learning by implementing and experimenting with different architectures.

Colab pro very slow

I am using Colab Pro to train a model, and training is endless even though it is a simple regression model.
I am set to CPU with elevated RAM, but it is still super slow.
My file comes from my Google Drive; could that be the problem here?
Thanks,

Since TensorflowJS can use the GPU via WebGL, why would I need an nVIDIA GPU?

So TensorFlowJS can use WebGL to do GPU computations and train deep learning models. Why isn't this more popular than using CUDA with an nVIDIA GPU? Most people just trying to prototype machine learning models would love to do so on their personal computer, but many of us resort to using expensive cloud services like AWS (although more recently Google Colab helps) for ML training if we don't have a computer with an nVIDIA GPU. I'm sure nVIDIA GPUs are faster than whatever GPU is in my MacBook, but probably any GPU will offer at least an order-of-magnitude speedup over even a fast CPU and allow for model prototyping, so why aren't we all using WebGL GPGPU? There must be a catch I just don't know about.
The WebGL backend uses the GLSL language to define functions and uploads data as shaders - it "works", but you pay a huge cost to compile the GLSL and upload the shaders: warmup time for semi-complex models is immense (we're talking minutes just to start up). And then the memory overhead is 100-200% of what the model would normally need - for larger models you're GPU-memory-bound, so you don't want to waste that.
By the way, once a model is warmed up and fits in memory, actual inference time using WebGL is OK.
On the other hand, nVidia's CUDA libraries provide direct access to the GPU, so TF compiled to use them is always going to be much more efficient.
Unfortunately, not many GPU vendors provide libraries like CUDA, so most ML is done on nVidia GPUs.
Then there is the next level, when you're using a TPU instead of a GPU - then there is no WebGL to begin with.
If I select WebGPU with the TFJS benchmark (https://tensorflow.github.io/tfjs/e2e/benchmarks/local-benchmark/index.html) it responds with "WebGPU is not supported. Please use Chrome Canary browser with flag "--enable-unsafe-webgpu" enabled...."
So when that's ready will it be competitive with CUDA? On my laptop it is about 15% faster than WebGL on that benchmark.

Training on Google Colab is slower than on local machine despite having better GPU - why?

I've got a DL model to train, and since the data is quite large I store it on my Google Drive, which I mount to my Google Colab instance at the beginning of each session. However, I have noticed that training the exact same model with the exact same script is 1.5-2 times slower on Google Colab than on my personal laptop. The thing is, I checked the Google Colab GPU and it has 12GB RAM (I'm not sure how to check the exact model), while my laptop GPU is an RTX 2060, which has only 6GB RAM. Therefore, as a new user of Google Colab, I've been wondering what the reason might be. Is it because data loading from the mounted Google Drive with a torch DataLoader slows down the process? Or maybe because my personal hard drive is an SSD and Google Colab might not have an SSD attached to my instance? How can I check whether something in my Google Colab setup is slowing down the training?
The resources for Google Colaboratory are dynamically assigned to user instances. Short, interactive processes are preferred over long-running data loading and computation; further info can be found in the documentation:
https://research.google.com/colaboratory/faq.html#resource-limits
Specifically quoted from the above link
"GPUs and TPUs are sometimes prioritized for users who use Colab interactively rather than for long-running computations, or for users who have recently used less resources in Colab...As a result, users who use Colab for long-running computations, or users who have recently used more resources in Colab, are more likely to run into usage limits"
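Beyond resource prioritization, a quick way to check whether data loading from the mounted Drive is the bottleneck is to time batch fetches separately from the compute step. A minimal, framework-agnostic sketch (the `fake_loader` below is a hypothetical stand-in for a Drive-backed torch DataLoader, which works the same way since a DataLoader is just an iterable of batches):

```python
import time

def time_batches(loader):
    """Return per-batch fetch times (seconds) for one pass over `loader`."""
    timings = []
    it = iter(loader)
    while True:
        start = time.perf_counter()
        try:
            next(it)          # this call blocks on data loading / Drive I/O
        except StopIteration:
            break
        timings.append(time.perf_counter() - start)
    return timings

# Hypothetical stand-in for a slow Drive-backed loader:
def fake_loader(n_batches=5, delay=0.01):
    for _ in range(n_batches):
        time.sleep(delay)     # simulates per-batch I/O latency
        yield object()

times = time_batches(fake_loader())
print(f"mean fetch time: {sum(times) / len(times):.4f} s over {len(times)} batches")
```

If the mean fetch time dominates your per-step compute time, copying the dataset from the mounted Drive to the Colab VM's local disk (e.g. somewhere under /content) before training usually helps, since mounted Drive I/O is much slower than local disk.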

TPU custom chip available with Google Cloud ML

Which type of hardware is used as part of Google Cloud ML when using TensorFlow? Only CPUs, or are Tensor Processing Units (the custom chips) also available?
cf this article
Cloud ML currently focuses on CPUs. GPUs and TPUs will be available in the future.
Cloud TPUs are available to the public as of 2018-06-27: https://cloud.google.com/tpu/docs/release-notes
This was announced at Google Next '18:
https://www.blog.google/products/google-cloud/empowering-businesses-and-developers-to-do-more-with-ai/
At the time of writing (December 2017), GPUs are available, see https://cloud.google.com/ml-engine/docs/training-overview
If you use the gcloud command-line utility, you can e.g. add the --scale-tier BASIC_GPU option when submitting jobs to ml-engine. This currently runs your TensorFlow code on a Tesla K80.
There is also a CUSTOM scale tier which allows more complex configurations and gives access to P100 GPUs (see https://cloud.google.com/ml-engine/docs/using-gpus ).
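As a concrete sketch of the gcloud invocation described above (the job name, bucket, package path, and runtime version are placeholder assumptions; the ml-engine command group reflects the era of this answer):

```shell
# Submit a training job on the BASIC_GPU scale tier (a single Tesla K80).
# trainer/ is a hypothetical Python package containing trainer/task.py.
gcloud ml-engine jobs submit training my_job_001 \
    --module-name trainer.task \
    --package-path trainer/ \
    --region us-central1 \
    --scale-tier BASIC_GPU \
    --runtime-version 1.4 \
    -- \
    --train-data gs://my-bucket/data/train.csv
```

Everything after the bare `--` is passed through as arguments to your own trainer module rather than interpreted by gcloud; swapping `--scale-tier BASIC_GPU` for `--scale-tier CUSTOM` plus a config file is how you reach the P100 configurations mentioned above.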
The TPU service is in 'alpha' status according to https://cloud.google.com/tpu/ and one needs to sign up to learn more.