CUDNN_STATUS_NOT_SUPPORTED in Google Colab, dlib face detection project - google-colaboratory

I was trying out dlib's deep-learning-based face detector (MMOD), and it worked perfectly fine without any errors. After the weekend, I reran my Google Colab notebook and got the following error:
RuntimeError: Error while calling cudnnConvolutionBiasActivationForward( context(), &alpha1, descriptor(data), data.device(), (const cudnnFilterDescriptor_t)filter_handle, filters.device(), (const cudnnConvolutionDescriptor_t)conv_handle, (cudnnConvolutionFwdAlgo_t)forward_algo, forward_workspace, forward_workspace_size_in_bytes, &alpha2, out_desc, out, descriptor(biases), biases.device(), identity_activation_descriptor(), out_desc, out) in file /tmp/pip-install-fdw8qrx_/dlib_e3176ea453c4478d8dbecc372b81297e/dlib/cuda/cudnn_dlibapi.cpp:1237. code: 9, reason: CUDNN_STATUS_NOT_SUPPORTED
It is literally the same code I had previously saved to GitHub, now running in Google Colab.
Any ideas about what could have happened over the weekend, and how to fix it? Thank you!

So after I tried EVERYTHING I could come up with (running the code on a different machine and on a different platform, checking whether there were any library updates), I went through my committed version on GitHub and realized that the dlib library had been updated, without it being announced anywhere...
So yeah, note for my future self: always print the version right after importing your tools; it might save DAYS of trying to figure out what on earth happened.
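A minimal sketch of that note-to-self: record the interpreter and library versions at the top of the notebook, so a rerun weeks later can be compared against the output committed to GitHub. The dlib import is the library from this question; everything else is standard library.

```python
import sys
import platform

# Record the environment at the top of the notebook, so a later rerun
# can be diffed against the committed output.
print("Python:", sys.version.split()[0])
print("Platform:", platform.platform())

# dlib may not be installed everywhere, so guard the import.
try:
    import dlib
    print("dlib:", dlib.__version__)
except ImportError:
    print("dlib: not installed")
```

If the printed dlib version differs between the committed run and the failing run, a silent upgrade (as happened here) is the first suspect.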

Related

Frequent disconnects while busy running in Google Colab

In the last week or two I have seen frequent disconnects while trying to run a lengthy training run. A month or two ago this seemed to be working pretty reliably. My code has definitely changed but those internal details seem unrelated to the operation of Colab.
(On the other hand, I did switch my local machine from an Intel MacBook Pro running Big Sur to an M1 (Apple Silicon) MacBook Pro running Monterey. I assume that does not matter to Colab running in the cloud, via a Chrome browser.)
I see two kinds of disconnects:
There are “faux disconnects,” which seem like false positives from the disconnect detector. These last less than a second, then the computation continues apparently unscathed. A black notification slides up from the lower left corner of the window, then slides back. See a link to a video of this below.
Then there are “real disconnects.” I start a computation that I expect to run for several hours and see “faux disconnects” happen frequently, but less than an hour into the computation I find the Colab window idle, with no status information and a Reconnect button in the upper right corner.
Link to video. I started this session around 1:03 pm. This video was recorded at 1:35 pm. Normally the training session should have run for several hours. Instead it died at 1:52 pm (~50 minutes into the run). See some additional comments in an issue at GitHub.
Can anyone help me understand how to get past this? I am currently unable to make progress in my work because I cannot complete a training run before my Colab runtime decides to disconnect.
Edit:
FYI: since once a “real disconnect” happens it is too late to look at the (no longer connected) runtime's log, and since this seems to run for about an hour before disconnecting, I saved a log file when a run was about 10 minutes in.
Edit on August 1, 2022:
My real problem is the “real disconnect” on my real Colab notebook. But my notebook is overly complicated, so it is not a good test case. I tried to make a small test case, see Colab notebook: DisconnectTest.ipynb. It contains a generic MNIST-based Keras/TensorFlow benchmark from the innertubes. I made a screen-grab video of the first 2.5 minutes of a run. While this run completes OK (that is, there are no “real disconnects”), it had several “faux disconnects.” The first one is at 1:36. These seem fairly benign, but they do disrupt the Resources panel on the right. This makes it hard to know whether the source of the “real disconnect” has anything to do with exhausting resources.
As I described in a parallel post on Colab's Issue #2965 on Github, this appears to be “some interaction between Colab and Chrome (and perhaps macOS Monterey (Version 12.5.1), and perhaps M1 Apple Silicon). Yet Colab seems to work fine on M1/Monterey/Safari.”
As described there, a trivial Colab example fails on Chrome browser but works fine on Safari.

How to get a "Tesla P100" in Google Colab programmatically?

Maybe you have heard that Google Colab has a P100 GPU, which is much faster than all the other GPUs except the V100 (the V100 is available only in Colab Pro). Because it is so powerful, the P100 is pretty rare in Colab Free. I never got a "Tesla P100" in Colab before, so I tried to write a program that factory-resets the runtime until the text "Tesla P100-PCIE..." appears in nvidia-smi (if you create a code cell containing !nvidia-smi, you'll see your GPU's model). I tried to do it with Selenium, but it failed because of a "This browser may not be secure" error. Then I tried JavaScript (in the Google DevTools console), but it failed with an error I don't understand. So I'm here.
[Q] How to get a "Tesla P100" in Google Colab programmatically?
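As a sketch of just the detection step (not the factory-reset loop), the GPU model can be read from Python by querying nvidia-smi as a subprocess; on a CPU-only runtime the call fails and the function returns None. The function name here is hypothetical, not a Colab API.

```python
import subprocess

def gpu_name():
    """Return the GPU model reported by nvidia-smi, or None if unavailable."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            text=True,
        )
        return out.strip()
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None

# On a P100 runtime this would start with "Tesla P100", so a script could
# stop resetting the runtime once that condition holds.
print(gpu_name())
```

Note that Colab exposes no official way to request a specific GPU model, so any reset loop is a lottery, not a guarantee.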

Why does Colab stop training and the browser become unresponsive?

I am trying to train a model for image recognition using Yolo version 3 with this notebook:
https://drive.google.com/file/d/1YnZLp6aIl-iSrL4tzVQgxJaE1N2_GfFH/view?usp=sharing
But for some reason, everything works fine except the final training. The training starts, and after 5-10 minutes (seemingly at random) it stops working. The browser becomes unresponsive (I am unable to do anything inside that tab), and after several minutes Colab disconnects completely.
I have tried this ten or more times and I always get the same result. I tried it in both Chrome Canary and regular Chrome (latest versions), as well as in incognito windows, but I always get the same result.
Any ideas? Why is that happening?
Eager to know your thoughts about this.
All the best,
Fab.
Problem solved. I tried the same process in Firefox and discovered that Google Drive's auto-saving feature was conflicting with the process! So I simply had to use Colab's "playground" mode instead, as explained here:
https://stackoverflow.com/questions/58207750/how-to-disable-autosave-in-google-colab#:~:text=1%20Answer&text=Open%20the%20notebook%20in%20playground,Save%20a%20copy%20in%20Drive.
No idea why Chrome didn't give me any feedback about that, but Firefox saved my day!
Following @fabrizio-ferrari's answer, I disabled output saving, but the problem persisted.
Runtime -> Change runtime type -> Omit code cell output when saving this notebook
I moved to Firefox and the problem disappeared.

Installing TensorFlow on a BeagleBone Black

For a project of mine, I am working on a BeagleBone Black (BBB); my objective is to detect fire in real-time video. I tried installing TensorFlow, but neither the normal installation nor installing the pre-compiled binary gave positive results. I get an error saying 'is not a supported wheel on this platform'. Does the BBB support TensorFlow? If yes, could you please help me with this issue?
If not, can anyone of you suggest an object detection API which is supported by BBB?
Thanks in advance.
screenshot of the error
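The 'is not a supported wheel on this platform' message means the tags in the wheel's filename do not match the machine pip is running on; the BBB is a 32-bit ARMv7 board, while most prebuilt TensorFlow wheels target x86_64 or aarch64. A standard-library-only sketch of what the board reports (the exact "armv7l" value is an assumption about the BBB):

```python
import platform
import struct

# A wheel's platform and Python tags must match this machine.
# A BeagleBone Black is expected to report "armv7l" (32-bit ARM) here,
# which x86_64/aarch64 TensorFlow wheels do not match.
print("machine:", platform.machine())
print("word size:", struct.calcsize("P") * 8, "bit")
print("python tag: cp{}{}".format(*platform.python_version_tuple()[:2]))
```

Comparing this output against the tags embedded in the wheel's filename shows immediately whether pip could ever accept it.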
Some people have had success installing TensorFlow on the BBB by following a guide found on GitHub. You can find it here: https://github.com/samjabrahams/tensorflow-on-raspberry-pi/blob/master/GUIDE.md. Hope it helps!
Please note that this should have been added as a comment, but unfortunately I cannot do that (yet).

Unable to Unrar in Google Colab

I'm using Google Colab to learn and tinker with ML and TensorFlow. I had a huge dataset in multiple multi-part RAR files. I simply tried
!unrar e zip-file 'extdir'
but after successfully extracting a couple of archives it starts throwing errors, specifically input/output errors.
Does Google block you after a couple of GBs have been extracted?
I have already tried resetting the runtime environment and changing the runtime from Py2 to Py3, but nothing made a difference.
True, it doesn't work after a couple of runs.
Try unrar-free, the free version of unrar.
Check out the help manual below:
https://helpmanual.io/help/unrar-free/
No, Google doesn't block you for extracting large files. Also, unrar-free gave the same error as before. Instead, you can install p7zip and extract RAR v5 archives, or you can use 7z directly. This solved the exact same problem I was facing (I had a RAR file of ~20 GiB).
!apt install p7zip-full p7zip-rar
or
!7z e zip-file