TensorFlow Serving failing with std::bad_alloc

I'm trying to run tensorflow-serving using docker compose (served model + microservice) but the tensorflow serving container fails with the error below and then restarts.
microservice | To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
tensorflow-serving | terminate called after throwing an instance of 'std::bad_alloc'
tensorflow-serving | what(): std::bad_alloc
tensorflow-serving | /usr/bin/tf_serving_entrypoint.sh: line 3: 7 Aborted (core dumped) tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"
I monitored the memory usage and it seems like there's plenty of memory. I also increased the resource limit using Docker Desktop but still get the same error. Each request to the model is fairly small as the microservice is sending tokenized text with batch size of one. Any ideas?

I was encountering the same problem, and this fix worked for me:
I uninstalled tensorflow, tensorflow-gpu, etc. and reinstalled them at 2.9.0 (then trained and built my model).
Then I ran docker pull and docker run with tensorflow/serving:2.8.0 (this did the trick and finally got rid of the problem).

Had the same error when using tensorflow/serving:latest. Based on Hanafi's response, I used tensorflow/serving:2.8.0 and it worked.
For reference, I used
sudo docker run -p 8501:8501 \
    --mount type=bind,source=[PATH_TO_MODEL_DIRECTORY],target=/models/[MODEL_NAME] \
    -e MODEL_NAME=[MODEL_NAME] -t tensorflow/serving:2.8.0
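Once the container is up with port 8501 published as above, a request from the microservice side could look roughly like the sketch below. It only builds the request, using TensorFlow Serving's REST predict endpoint. The model name `my_model` and the token IDs are made-up placeholders, and the right shape of `instances` depends on your model's signature.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501):
    """Build (without sending) a REST predict request for TensorFlow Serving."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical model name and token IDs, batch size of one.
request = build_predict_request("localhost", "my_model", [[101, 2023, 2003, 102]])
print(request.full_url)

# To actually send it once the container is running:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to log or inspect the payload when the server side is crashing.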

The issue is fixed in TensorFlow and TensorFlow Serving 2.11 (not yet released), and the fix is included in the nightly release of TF Serving. You can build the nightly Docker image or use the pre-compiled version.
TensorFlow 2.9 and 2.10 were also patched to fix this issue. See the PRs here [1, 2].

Related

Setting up GPU support in Airflow containers with Docker-compose - (GPU support with Tensorflow)

I am having some difficulties in starting airflow using docker-compose with appropriate GPU libraries to run my machine learning tasks.
The airflow-scheduler throws this error:
airflow-scheduler_1 | 2022-03-21 12:33:36.919960: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
Basically, there are no CUDA libraries installed in /usr/local within the airflow container, hence the error. I have installed nvidia-container-runtime and set the daemon's default runtime in the daemon.json file:
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
    sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
And I have managed to use the runtime:nvidia in the docker-compose.yaml file. This way within the airflow container I can see nvidia-smi. However CUDA libraries are still missing.
Is there a way to install these libraries automatically (ideally FROM tensorflow/tensorflow:latest-gpu) as these set the CUDA libraries within the container?
On the other hand, if I am not using docker-compose I can start a container with docker:
docker run -it --gpus all tensorflow/tensorflow:latest-gpu
This container has all the libraries that I need. However, I would like to use docker-compose, as it makes running multiple containers and setting up the network much easier, so I would like to avoid this approach.
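For comparison, the Compose counterpart of `--gpus all` is a device reservation on the service. Below is a minimal sketch that builds such a service definition as a Python dict and prints it as JSON, which Compose can read because JSON is a subset of YAML. The image name `my-airflow-gpu:latest` is hypothetical (for instance, an image built FROM tensorflow/tensorflow:latest-gpu with Airflow installed on top), and the `deploy.resources.reservations.devices` syntax assumes a reasonably recent Compose version.

```python
import json


def gpu_service(image, command=None):
    """Return a Compose service definition that reserves all NVIDIA GPUs."""
    service = {
        "image": image,
        "deploy": {
            "resources": {
                "reservations": {
                    "devices": [
                        {"driver": "nvidia", "count": "all", "capabilities": ["gpu"]}
                    ]
                }
            }
        },
    }
    if command is not None:
        service["command"] = command
    return service


# "my-airflow-gpu:latest" is a hypothetical image name used for illustration.
compose = {"services": {"airflow-scheduler": gpu_service("my-airflow-gpu:latest")}}

# JSON is valid YAML, so this output can be saved and used as a Compose file.
print(json.dumps(compose, indent=2))
```

Note that the reservation only exposes the GPU devices and driver; the CUDA user-space libraries still have to come from the image itself.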
Alternatively, I can use Docker from within Airflow by mounting the Docker socket into the airflow container, so that I can start a new container from Airflow. This way I get all the CUDA libraries as well. However, it seems very counter-intuitive, and I am having difficulty understanding why I can't set all of this up within the airflow container in the first place.
import docker

client = docker.from_env()
# run the container
response = client.containers.run(
    # The container you wish to call
    'tensorflow/tensorflow:latest-gpu',
    # The command to run inside the container
    'find / -name "libcudart.so.11.0"',
    # Passing the GPU access
    device_requests=[
        docker.types.DeviceRequest(count=-1, capabilities=[['gpu']])
    ]
)
I would appreciate it if you could point me in the right direction.
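A quick way to confirm from inside a container whether the CUDA libraries are visible to the dynamic loader, without scanning the whole filesystem, is sketched below. It uses only the standard library. The short names (`cudart` for libcudart.so.*, and so on) are illustrative, and `find_library` depends on the container's loader configuration, so treat a "missing" result as a hint rather than proof.

```python
import ctypes.util


def missing_libraries(names):
    """Return the library names that the dynamic loader cannot locate."""
    return [name for name in names if ctypes.util.find_library(name) is None]


# "cudart" corresponds to libcudart.so.*; the other names are illustrative.
print(missing_libraries(["cudart", "cublas", "cudnn"]))
```

Running this in both the airflow container and tensorflow/tensorflow:latest-gpu should show which libraries the airflow image lacks.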

I failed to convert caffe model into mlmodel using coremltools 5

I am trying to convert a Caffe model using coremltools v5.
This is my code:
import coremltools

caffe_model = ('oxford102.caffemodel', 'deploy.prototxt')
labels = 'flower-labels.txt'
coreml_model = coremltools.converters.caffe.convert(
    caffe_model,
    class_labels=labels,
    image_input_names='data'
)
coreml_model.save('FlowerClassifier.mlmodel')
I run the conversion with the command below:
python3 convert-script.py
And I get an error message like the one below.
[image: error message]
Has anybody faced this problem and found a solution?
I just came across this as I was having the same problem. Caffe support is not available in the newer versions of the coremltools API. To make this code run, an older version of coremltools (such as 3.4) must be used, which requires using Python 2.7; this is best done in a virtual environment.
I assume you've solved your issue already, but I added this in case anyone else stumbles onto this question.
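Before rerunning the conversion, it may help to check whether the installed coremltools still ships the Caffe converter at all. The sketch below is a generic standard-library check; `has_module` is a helper name introduced here, not part of coremltools.

```python
import importlib.util


def has_module(dotted_name):
    """Return True if the module can be located (parent packages get imported)."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no valid spec.
        return False


if has_module("coremltools.converters.caffe"):
    print("Caffe converter is available in this environment.")
else:
    print("Caffe converter not found; an older coremltools may be needed.")
```

Running this inside the virtual environment you plan to use tells you whether downgrading is necessary before touching the conversion script.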
There are several solutions, depending on your case:
I had the same issue on my M1 Mac. You can resolve it by duplicating your Terminal and running the copy with Rosetta. (This worked for me.)
cd ~/.virtualenvs/<your venv name here>/bin
mkdir bk; cp python bk; mv -f bk/python .;rmdir bk
codesign -s - --preserve-metadata=identifier,entitlements,flags,runtime -f python
For more solutions, you can follow this issue on GitHub.
I had the same error running Python 3.7.
In the virtualenv, the solution is to run:
pip install coremltools==3.0
You don't have to change Python versions; just rerun the script.

Tensorflow Serving Compiling Failure For CPU AVX AVX2

I used the method from the official TFX documentation to build the TF Serving devel image with Docker. The OS is macOS on an Intel CPU.
Here is the docker build script:
#!/bin/bash
USER=$1
TAG=$2
TF_SERVING_VERSION_GIT_BRANCH="2.4.1"
git clone --branch="${TF_SERVING_VERSION_GIT_BRANCH}" https://github.com/tensorflow/serving
TF_SERVING_BUILD_OPTIONS="--copt=-mavx --local_ram_resources=4096"
cd serving && \
    docker build --pull -t $USER/tensorflow-serving-devel:$TAG \
    --build-arg TF_SERVING_VERSION_GIT_BRANCH="${TF_SERVING_VERSION_GIT_BRANCH}" \
    --build-arg TF_SERVING_BUILD_OPTIONS="${TF_SERVING_BUILD_OPTIONS}" \
    -f tensorflow_serving/tools/docker/Dockerfile.devel .
Then I ran the shell script for more than 3 hours and got the following failure:
I cannot see the details because the log output from Docker is clipped by the builder.
Has anyone met a similar problem and can help with this?
Thanks a lot in advance!
These instruction sets are not available on all machines, especially with older processors.
If you'd like to apply generally recommended optimizations, including utilizing platform-specific instruction sets for your processor, you can add --config=nativeopt to Bazel build commands when building TensorFlow Serving.
tools/run_in_docker.sh bazel build --config=nativeopt tensorflow_serving/...
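To see which instruction sets a given machine advertises before choosing flags such as -mavx, the CPU flags can be read from /proc/cpuinfo. The sketch below assumes an x86 Linux environment (for example, inside the build container), where the relevant line is named `flags`. The sample text is made up for illustration.

```python
def parse_cpu_flags(cpuinfo_text):
    """Collect the feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def unsupported(required, flags):
    """Return the required instruction sets that are missing from `flags`."""
    return [name for name in required if name not in flags]


sample = "processor : 0\nflags : fpu sse4_2 avx\n"  # made-up sample text
print(unsupported(["avx", "avx2"], parse_cpu_flags(sample)))  # → ['avx2']

# On a real Linux machine or container:
# with open("/proc/cpuinfo") as handle:
#     print(unsupported(["avx", "avx2"], parse_cpu_flags(handle.read())))
```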

TensorFlow serving S3 and Docker

I’m trying to find a way to use Tensorflow serving with the ability to add new models and new versions of models. Can I point tensorflow serving to an S3 bucket?
I also need it to run as a container. Is this possible, or do I need to implement another program to pull down the model, add it to a shared volume, and ask TensorFlow to update the models in the file system?
Or do I need to build my own docker image to be able to pull the content from s3?
I found that I could use the TF S3 connection information (even though it isn't documented for the TF Serving Docker container). Example docker run command:
docker run -p 8501:8501 \
    -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
    -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
    -e MODEL_BASE_PATH=s3://path/bucket/models \
    -e MODEL_NAME=model_name \
    -e S3_ENDPOINT=s3.us-west-1.amazonaws.com \
    -e AWS_REGION=us-west-1 \
    -e TF_CPP_MIN_LOG_LEVEL=3 \
    -t tensorflow/serving
Note: the log level was set because of this bug.
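If the container is launched from a script, the same command can be assembled programmatically. The sketch below reuses only the environment variable names from the command above. `serving_command` is a helper name introduced here, and passing the two AWS secrets as bare `-e NAME` (so Docker copies them from the calling environment) is a design choice of this sketch, not something the original command does.

```python
import shlex


def serving_command(model_base_path, model_name, region, image="tensorflow/serving"):
    """Assemble a `docker run` argv for TF Serving reading models from S3."""
    env = {
        "MODEL_BASE_PATH": model_base_path,
        "MODEL_NAME": model_name,
        "S3_ENDPOINT": f"s3.{region}.amazonaws.com",
        "AWS_REGION": region,
        "TF_CPP_MIN_LOG_LEVEL": "3",
    }
    argv = ["docker", "run", "-p", "8501:8501"]
    # A bare `-e NAME` makes Docker take the value from the calling
    # environment, which keeps the secrets out of the command line.
    for secret in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
        argv += ["-e", secret]
    for name, value in env.items():
        argv += ["-e", f"{name}={value}"]
    argv += ["-t", image]
    return argv


argv = serving_command("s3://path/bucket/models", "model_name", "us-west-1")
print(shlex.join(argv))  # shlex.join needs Python 3.8+
```

The resulting list can be handed to `subprocess.run(argv, check=True)` without going through a shell.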
I've submitted a very detailed answer (but using DigitalOcean Spaces instead of S3), here:
How to deploy TensorFlow Serving using Docker and DigitalOcean Spaces
Since the implementation piggy-backs off an S3-like interface, I thought I'd add the link here in case someone needs a more comprehensive example.

Install RAPIDS library on Google Colab notebook

I was wondering if I could install RAPIDS library (executing machine learning tasks entirely on GPU) in Google Colaboratory notebook?
I've done some research but I've not been able to find the way to do that...
This is now possible with the new T4 instances https://medium.com/rapids-ai/run-rapids-on-google-colab-for-free-1617ac6323a8
To enable cuGraph too, you can replace the wget command with:
!conda install -c nvidia/label/cuda10.0 -c rapidsai/label/cuda10.0 -c pytorch \
    -c numba -c conda-forge -c numba -c defaults \
    boost cudf=0.6 cuml=0.6 python=3.6 cugraph=0.6 -y
Dec 2019 update
New process for RAPIDS v0.11+
Because RAPIDS v0.11 has dependencies (pyarrow) that were not covered by the prior install script, because the notebooks-contrib repo, which contains RAPIDS demo notebooks (e.g. colab_notebooks) and the Colab install script, now follows the standard RAPIDS version-specific branch structure*, and because some Colab users still enjoy v0.10, our honorable notebooks-contrib overlord taureandyernv has updated the script, which now updates the pyarrow library to 0.15.x when running v0.11 or higher.
Here's the code cell to run in Colab for v0.11:
# Install RAPIDS
!wget -nc https://raw.githubusercontent.com/rapidsai/notebooks-contrib/890b04ed8687da6e3a100c81f449ff6f7b559956/utils/rapids-colab.sh
!bash rapids-colab.sh
import sys, os
dist_package_index = sys.path.index("/usr/local/lib/python3.6/dist-packages")
sys.path = sys.path[:dist_package_index] + ["/usr/local/lib/python3.6/site-packages"] + sys.path[dist_package_index:]
sys.path
if os.path.exists('update_pyarrow.py'):  ## This file only exists if you're using RAPIDS version 0.11 or higher
    exec(open("update_pyarrow.py").read(), globals())
For a walkthrough of setting up Colab and implementing this script, see How to Install RAPIDS in Google Colab.
* e.g. branch-0.11 for v0.11 and branch-0.12 for v0.12, with the default set to the current version
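The sys.path manipulation in the v0.11 cell places site-packages just ahead of dist-packages so that the freshly installed RAPIDS packages are found first. Here is a small standalone sketch of that splice on a plain list. `insert_before` is a helper name introduced here, and falling back to appending when the anchor is absent is an added safeguard (the original `sys.path.index` call would raise in that case).

```python
def insert_before(paths, anchor, new_entry):
    """Return a copy of `paths` with `new_entry` placed just before `anchor`.

    If `new_entry` is already present, nothing changes. If `anchor` is
    absent, `new_entry` is appended instead of raising ValueError.
    """
    paths = list(paths)
    if new_entry in paths:
        return paths
    try:
        index = paths.index(anchor)
    except ValueError:
        return paths + [new_entry]
    return paths[:index] + [new_entry] + paths[index:]


demo = ["/content", "/usr/local/lib/python3.6/dist-packages"]
print(insert_before(
    demo,
    "/usr/local/lib/python3.6/dist-packages",
    "/usr/local/lib/python3.6/site-packages",
))
```

Skipping the insert when the entry already exists also makes the cell safe to rerun without growing sys.path.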
Looks like various subparts are not yet pip-installable so the only way to get them on colab would be to build them on colab, which might be more effort than you're interested in investing in this :)
https://github.com/rapidsai/cudf/issues/285 is the issue to watch for rapidsai/cudf (presumably the other rapidsai/ libs will follow suit).
Latest solution:
!wget -nc https://github.com/rapidsai/notebooks-extended/raw/master/utils/rapids-colab.sh
!bash rapids-colab.sh
import sys, os
sys.path.append('/usr/local/lib/python3.6/site-packages/')
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'
This was pushed a few days ago; see issues #104 or #110, or the full rapids-colab.sh script, for more info.
Note: installation currently requires a Tesla T4 instance; checking for this can be done with:
# check gpu type
!nvidia-smi
import pynvml
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
device_name = pynvml.nvmlDeviceGetName(handle)
# your dolphin is broken, please reset & try again
if device_name != b'Tesla T4':
    raise Exception("""Unfortunately this instance does not have a T4 GPU.
Please make sure you've configured Colab to request a GPU instance type.
Sometimes Colab allocates a Tesla K80 instead of a T4. Resetting the instance.
If you get a K80 GPU, try Runtime -> Reset all runtimes...""")
# got a T4, good to go
else:
    print('Woo! You got the right kind of GPU!')
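One caveat with comparing against b'Tesla T4': depending on the installed pynvml version, the device name may come back as bytes or as str (treat that as an assumption to verify in your runtime). A small, more tolerant check could be sketched as follows. `is_t4` is a helper name introduced here.

```python
def is_t4(device_name):
    """Return True if an NVML device name refers to a T4, for bytes or str input."""
    if isinstance(device_name, bytes):
        device_name = device_name.decode("utf-8", errors="replace")
    return "T4" in device_name.split()


print(is_t4(b"Tesla T4"), is_t4("Tesla T4"), is_t4(b"Tesla K80"))  # → True True False
```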