Install RAPIDS library on Google Colab notebook

I was wondering if I could install the RAPIDS library (for executing machine learning tasks entirely on the GPU) in a Google Colaboratory notebook.
I've done some research, but I haven't been able to find a way to do that...

This is now possible with the new T4 instances.
To enable cuGraph too, you can replace the wget command with:
!conda install -c nvidia/label/cuda10.0 -c rapidsai/label/cuda10.0 -c pytorch \
-c numba -c conda-forge -c numba -c defaults \
boost cudf=0.6 cuml=0.6 python=3.6 cugraph=0.6 -y

Dec 2019 update
New process for RAPIDS v0.11+
Because RAPIDS v0.11 has dependencies (pyarrow) which were not covered by the prior install script, the notebooks-contrib repo (which contains RAPIDS demo notebooks, e.g. colab_notebooks, and the Colab install script) now follows the RAPIDS standard version-specific branch structure*, and some Colab users still enjoy v0.10, our honorable notebooks-contrib overlord taureandyernv has updated the script, which now updates the pyarrow library to 0.15.x if running v0.11 or higher.
Here's the code cell to run in Colab for v0.11:
# Install RAPIDS
!wget -nc
import sys, os
dist_package_index = sys.path.index("/usr/local/lib/python3.6/dist-packages")
sys.path = sys.path[:dist_package_index] + ["/usr/local/lib/python3.6/site-packages"] + sys.path[dist_package_index:]
if os.path.exists(''):  # This file only exists if you're using RAPIDS version 0.11 or higher
    exec(open("").read(), globals())
For a walkthrough of setting up Colab and implementing this script, see How to Install RAPIDS in Google Colab
* e.g. branch-0.11 for v0.11 and branch-0.12 for v0.12, with the default set to the current version
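The sys.path manipulation in the cell above can be isolated into a small helper, which makes the intent easier to see: conda's site-packages is placed just ahead of Colab's dist-packages so that conda-installed packages win on import. This is only a sketch. The helper name prepend_before is hypothetical and not part of the install script, and putting the new entry at the front when the anchor directory is missing is an assumption (the original cell would raise ValueError in that case).

```python
def prepend_before(paths, anchor, new_entry):
    """Return a copy of `paths` with `new_entry` placed just before `anchor`.

    Hypothetical helper (not part of the RAPIDS install script). If `anchor`
    is absent, `new_entry` goes to the front; if `new_entry` is already
    present, the list is returned unchanged.
    """
    if new_entry in paths:
        return list(paths)
    index = paths.index(anchor) if anchor in paths else 0
    return paths[:index] + [new_entry] + paths[index:]


dist = "/usr/local/lib/python3.6/dist-packages"
site = "/usr/local/lib/python3.6/site-packages"
example = prepend_before(["/content", dist], dist, site)
print(example)
```

In a notebook this would be applied as sys.path = prepend_before(sys.path, dist, site).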

Looks like various subparts are not yet pip-installable, so the only way to get them on Colab would be to build them on Colab, which might be more effort than you're interested in investing in this :) The issue to watch is the one for rapidsai/cudf (presumably the other rapidsai/ libs will follow suit).

Latest solution:
!wget -nc
import sys, os
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'
This was pushed a few days ago; see issues #104 or #110, or the full script, for more info.
Note: installation currently requires a Tesla T4 instance. Checking for this can be done with:
# check gpu type
import pynvml
pynvml.nvmlInit()  # NVML must be initialized before any device query
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
device_name = pynvml.nvmlDeviceGetName(handle)
# your dolphin is broken, please reset & try again
if device_name != b'Tesla T4':
    raise Exception("""Unfortunately this instance does not have a T4 GPU.
Please make sure you've configured Colab to request a GPU instance type.
Sometimes Colab allocates a Tesla K80 instead of a T4. Resetting the instance.
If you get a K80 GPU, try Runtime -> Reset all runtimes...""")
# got a T4, good to go
print('Woo! You got the right kind of GPU!')
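One caveat with the check above: depending on the pynvml version, nvmlDeviceGetName may return bytes or str, so comparing against b'Tesla T4' alone can reject a valid T4. A minimal sketch that accepts both is shown here. The helper names are hypothetical, and matching on the substring "T4" instead of the exact name is an assumption that loosens the original check.

```python
def normalize_gpu_name(name):
    """Decode bytes to str so both return types compare the same way."""
    return name.decode("utf-8") if isinstance(name, bytes) else str(name)


def is_t4(name):
    """True if the reported device name mentions a T4 (e.g. 'Tesla T4')."""
    return "T4" in normalize_gpu_name(name)


def first_gpu_name():
    """Query GPU 0 via NVML; requires pynvml and an NVIDIA driver."""
    import pynvml  # imported lazily so the helpers above work without it
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        return normalize_gpu_name(pynvml.nvmlDeviceGetName(handle))
    finally:
        pynvml.nvmlShutdown()


print(is_t4(b"Tesla T4"), is_t4("Tesla K80"))
```

On a Colab GPU runtime, is_t4(first_gpu_name()) would replace the direct comparison.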


Tensorflow serving failing with std::bad_alloc

I'm trying to run tensorflow-serving using docker compose (served model + microservice) but the tensorflow serving container fails with the error below and then restarts.
microservice | To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
tensorflow-serving | terminate called after throwing an instance of 'std::bad_alloc'
tensorflow-serving | what(): std::bad_alloc
tensorflow-serving | /usr/bin/ line 3: 7 Aborted
(core dumped) tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$#"
I monitored the memory usage and it seems like there's plenty of memory. I also increased the resource limit using Docker Desktop but still get the same error. Each request to the model is fairly small as the microservice is sending tokenized text with batch size of one. Any ideas?
I was encountering the same problem, and this fix worked for me:
I uninstalled tensorflow, tensorflow-gpu, etc. and reinstalled them at 2.9.0 (then trained and built my model).
I ran docker pull and docker run with tensorflow/serving:2.8.0 (this did the trick and finally got rid of the problem).
Had the same error when using tensorflow/serving:latest. Based on Hanafi's response, I used tensorflow/serving:2.8.0 and it worked.
For reference, I used
sudo docker run -p 8501:8501 --mount type=bind,source= \
-e MODEL_NAME=[MODEL_NAME] -t tensorflow/serving:2.8.0
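Once the pinned container is up, a quick way to confirm it no longer crashes is to send a single request to the REST port mapped above (8501). The following is a sketch, not the asker's microservice: the model name "my_model" and the token ids are placeholders, and it assumes the model takes one tensor of token ids per instance, which may not match your serving signature.

```python
import json
from urllib import request


def predict_url(model_name, host="localhost", port=8501):
    """URL of the TensorFlow Serving REST predict endpoint for a model."""
    return f"http://{host}:{port}/v1/models/{model_name}:predict"


def predict_body(token_ids):
    """Row-format payload with a single instance (batch size of one)."""
    return json.dumps({"instances": [list(token_ids)]}).encode("utf-8")


def predict(model_name, token_ids, timeout=10.0):
    """POST one instance to the server and return the decoded JSON reply."""
    req = request.Request(
        predict_url(model_name),
        data=predict_body(token_ids),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


print(predict_url("my_model"))
print(predict_body([101, 2023, 102]).decode("utf-8"))
```

Calling predict("my_model", [101, 2023, 102]) requires the container to be running; the two print lines only show what would be sent.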
The issue is fixed in TensorFlow and TensorFlow Serving 2.11 (not yet released), and the fix is included in the nightly release of TF Serving. You can build the nightly Docker image or use the pre-compiled version.
TensorFlow 2.9 and 2.10 were also patched to fix this issue. Refer to the PRs here [1, 2].

I failed to convert caffe model into mlmodel using coremltools 5

I am trying to convert a Caffe model using coremltools v5.
This is my code:
import coremltools
caffe_model = ('oxford102.caffemodel', 'deploy.prototxt')
labels = 'flower-labels.txt'
coreml_model = coremltools.converters.caffe.convert(
I convert using the command below,
and I get an error message like the one below:
error message
Has anybody faced this problem and found a solution?
I just came across this as I was having the same problem. The caffe support is not available in the newer versions of coremltools API. To make this code run an older version of coremltools (such as 3.4) must be used, which requires using Python 2.7 - which is best done in a virtual environment.
I assume you've solved your issue already, but I added this in case anyone else stumbles onto this question.
There are several solutions, depending on your case:
I had the same issue on my M1 Mac. You can resolve it by duplicating your Terminal app and running the copy with Rosetta. (This worked for me.)
cd ~/.virtualenvs/<your venv name here>/bin
mkdir bk; cp python bk; mv -f bk/python .;rmdir bk
codesign -s - --preserve-metadata=identifier,entitlements,flags,runtime -f python
For more solutions, you can follow this issue on GitHub.
I had the same error running Python 3.7.
In the virtualenv, the solution is to run:
pip install coremltools==3.0
There's no need to change Python versions; just rerun the script.
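Before rerunning the conversion, it can help to confirm that the installed coremltools actually ships the Caffe converter. A minimal sketch follows; module_available is a hypothetical helper, and it only tests whether the module can be imported, not whether a conversion will succeed.

```python
import importlib


def module_available(dotted_name):
    """Return True if `dotted_name` can be imported in this environment."""
    try:
        importlib.import_module(dotted_name)
    except Exception:  # ImportError, or errors raised while the package loads
        return False
    return True


has_caffe_converter = module_available("coremltools.converters.caffe")
print("Caffe converter importable:", has_caffe_converter)
```

If this prints False, the environment has either no coremltools or a version without the Caffe converter, which matches the answers above.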

Problems running ImageDataBunch in Deepnote

I'm having trouble running this line of code in Deepnote, does anyone know why?
data = ImageDataBunch.from_folder(path, train="train", valid ="test",ds_tfms=get_transforms(), size=(256,256), bs=32, num_workers=4).normalize()
The error says:
NameError: name 'ImageDataBunch' is not defined
I imported the fastai library beforehand, so I don't get it!
The FastAI setup in Deepnote is not that straightforward. It's best to use a custom environment where you set stuff up in a Dockerfile and everything works afterwards in the notebook. I am not sure if the ImageDataBunch or whatever you're trying to do works the same way in FastAI v1 and v2, but here are the details for v1.
This is a Dockerfile which sets up the FastAI environment via conda:
# This is Dockerfile
FROM deepnote/python:3.9
RUN wget -O ~/
RUN bash ~/ -b -p $HOME/miniconda
ENV PATH $HOME/miniconda/bin:$PATH
RUN $HOME/miniconda/bin/conda install python=3.9 ipykernel -y
RUN $HOME/miniconda/bin/conda install -c fastai -c pytorch fastai -y
RUN $HOME/miniconda/bin/python -m ipykernel install --user --name=conda
After that, you can test the fastai imports in the notebook:
import fastai
from import *
And if you download and unpack this sample MNIST dataset, you should be able to load the data like you suggested:
data = ImageDataBunch.from_folder(path, train="train", valid ="test",ds_tfms=get_transforms(), size=(256,256), bs=32, num_workers=4).normalize()
Feel free to check out or clone my Deepnote project to continue working on this.
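As for the NameError itself, it is consistent with how Python imports work in general: a plain import binds only the module name, while a star import copies the module's public names into the current namespace. The sketch below demonstrates this with the standard-library json module as a stand-in, since fastai may not be installed; the same reasoning would apply to ImageDataBunch and whichever fastai submodule exports it.

```python
namespace = {}

# A plain import binds only the module name.
exec("import json", namespace)
defined_after_plain_import = "loads" in namespace

# A star import copies the module's public names into the namespace.
exec("from json import *", namespace)
defined_after_star_import = "loads" in namespace

print(defined_after_plain_import, defined_after_star_import)
```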

How to set up Spark to use pandas managed by anaconda?

We've updated Spark from version 2.2 to 2.3, but the admins didn't update pandas, so our jobs fail with the following error:
ImportError: Pandas >= 0.19.2 must be installed; however, your version was 0.18.1
Our admin team suggested creating a virtual environment with the latest version from Anaconda (using the command conda create -n myenv anaconda).
I did that, and after activating the environment with source activate myenv and logging into pyspark2, I found it was picking up the new version of pandas.
But when I submit a job using the spark2-submit command, it does not work. I added the configuration below to the spark2-submit command:
--conf spark.pyspark.virtualenv.enabled=true
--conf spark.pyspark.virtualenv.type=conda
--conf spark.pyspark.virtualenv.requirements=/home/<user>/.conda/requirements_conda.txt
--conf spark.pyspark.virtualenv.bin.path=/home/<user>/.conda/envs/myenv/bin
I also zipped the whole python 2.7 folder and passed it in the --py-files option along with the other .py files (--py-files /home/<user>/), but I still get the same version issue for pandas.
I tried to follow the instructions specified in the URL, but still no luck.
How can I fix this so that spark2-submit uses the proper pandas?
I think you may need to define environment variables such as SPARK_HOME and PYTHONPATH pointing to the corresponding locations in your virtualenv.
export SPARK_HOME=path_to_spark_in_virtualenv
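To verify which pandas the job really sees, it can help to compare versions explicitly, both on the driver and on the executors. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the minimum version is taken from the error message in the question, and executor_pandas_versions needs a live SparkContext, so it is defined but not called here.

```python
import re


def version_tuple(version):
    """Turn '0.19.2' into (0, 19, 2); anything past three numbers is ignored."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def meets_minimum(installed, required="0.19.2"):
    """True if the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)


def executor_pandas_versions(spark_context, partitions=4):
    """Collect the distinct pandas versions seen by executor Python workers."""
    def probe(_):
        import pandas  # imported on the executor, not the driver
        return pandas.__version__
    rdd = spark_context.parallelize(range(partitions), partitions)
    return rdd.map(probe).distinct().collect()


print(meets_minimum("0.18.1"), meets_minimum("0.23.4"))
```

If executor_pandas_versions(sc) still reports 0.18.1 after the environment change, the executors are not using the conda environment's interpreter.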

Shipping and using virtualenv in a pyspark job

PROBLEM: I am attempting to run a spark-submit script from my local machine to a cluster of machines. The work done by the cluster uses numpy. I currently get the following error:
Importing the multiarray numpy extension module failed. Most
likely you are trying to import a failed build of numpy.
If you're working with a numpy git repo, try `git clean -xdf` (removes all
files not under version control). Otherwise reinstall numpy.
Original error was: cannot import name multiarray
In my local environment I have set up a virtualenv that includes numpy as well as a private repo I use in my project and various other libraries. I created a zip file (lib/ from the site-packages directory at venv/lib/site-packages, where 'venv' is my virtual environment. I ship this zip to the remote nodes. My shell script for performing the spark-submit looks like this:
$SPARK_HOME/bin/spark-submit \
--deploy-mode cluster \
--master yarn \
--conf spark.pyspark.virtualenv.enabled=true \
--conf spark.pyspark.virtualenv.type=native \
--conf spark.pyspark.virtualenv.requirements=${parent}/requirements.txt \
--conf spark.pyspark.virtualenv.bin.path=${parent}/venv \
--py-files "${parent}/lib/" \
--num-executors 1 \
--executor-cores 2 \
--executor-memory 2G \
--driver-memory 2G \
I also know that on the remote nodes there is a /usr/local/bin/python2.7 folder that includes a python 2.7 install.
so in my conf/ I have set the following:
export PYSPARK_PYTHON=/usr/local/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python2.7
When I run the script I get the error above. If I screen print the installed_distributions I get a zero length list []. Also my private library imports correctly (which says to me it is actually accessing my site-packages.). My file looks something like this:
from myprivatelibrary.bigData.spark import spark_context
spark = spark_context()
import numpy as np
spark.parallelize(range(1, 10)).map(lambda x: np.__version__).collect()
I expect this to import numpy correctly especially since I know numpy works correctly in my local virtualenv. I suspect this is because I'm not actually using the version of python that is installed in my virtualenv on the remote node. My question is first, how do I fix this and second how do I use my virtualenv installed python on the remote nodes instead of the python that is just manually installed and currently sitting on those machines? I've seen some write-ups on this but frankly they are not well written.
With --conf spark.pyspark.{} and export PYSPARK_PYTHON=/usr/local/bin/python2.7 you set options for your local environment / your driver. To set options for the cluster (executors) use the following syntax:
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON
Furthermore, I guess you should make your virtualenv relocatable (this is experimental, however). <edit 20170908> This means that the virtualenv uses relative instead of absolute links. </edit>
What we did in such cases: we shipped an entire anaconda distribution over hdfs.
<edit 20170908>
If we are talking about different environments (MacOs vs. Linux, as mentioned in the comment below), you cannot just submit a virtualenv, at least not if your virtualenv contains packages with binaries (as is the case with numpy). In that case I suggest you create yourself a 'portable' anaconda, i.e. install Anaconda in a Linux VM and zip it.
Regarding --archives vs. --py-files:
--py-files adds python files/packages to the python path. From the spark-submit documentation:
For Python applications, simply pass a .py file in the place of instead of a JAR, and add Python .zip, .egg or .py files to the search path with --py-files.
--archives means these are extracted into the working directory of each executor (only yarn clusters).
However, a crystal-clear distinction is lacking, in my opinion - see for example this SO post.
In the given case, add the via --archives, and your 'other python files' via --py-files.
See also: Running Pyspark with Virtualenv, a blog post by Henning Kropp.
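Putting the --archives and --py-files advice together, the resulting command line could be assembled as in the sketch below. Everything in it is a placeholder or an assumption: the archive path, the alias "env", the file names, and the layout of the archive (bin/python at its top level). It also sets spark.executorEnv.PYSPARK_PYTHON in addition to the appMasterEnv setting mentioned above, which is an addition not taken from the answer.

```python
def spark_submit_args(app, archive, alias="env", py_files=()):
    """Build a spark-submit argv list for a YARN cluster-mode job.

    All names here are placeholders. `archive` is unpacked into each
    container's working directory under `alias`, so the interpreter path
    is given relative to that directory.
    """
    python = f"./{alias}/bin/python"
    args = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--archives", f"{archive}#{alias}",
        "--conf", f"spark.yarn.appMasterEnv.PYSPARK_PYTHON={python}",
        "--conf", f"spark.executorEnv.PYSPARK_PYTHON={python}",
    ]
    if py_files:
        args += ["--py-files", ",".join(py_files)]
    return args + [app]


print(" ".join(spark_submit_args("job.py", "hdfs:///envs/anaconda.zip",
                                 py_files=["helpers.py"])))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting problems with the # in the archive alias.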