What is the best way to run TensorFlow 2.0 with AWS SageMaker?
As of today (Aug 7th, 2019) AWS does not provide TensorFlow 2.0 SageMaker containers, so my understanding is that I need to build my own.
What is the best base image to use? Is there an example Dockerfile?
EDIT: Amazon SageMaker does now support TF 2.0 and higher.
SageMaker + TensorFlow docs: https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/using_tf.html
Supported Tensorflow versions (and Docker URIs): https://aws.amazon.com/releasenotes/available-deep-learning-containers-images
Original answer
Here is an example Dockerfile that uses the underlying SageMaker Containers library (this is what is used in the official pre-built Docker images):
FROM tensorflow/tensorflow:2.0.0b1
RUN pip install sagemaker-containers
# Copies the training code inside the container
COPY train.py /opt/ml/code/train.py
# Defines train.py as script entrypoint
ENV SAGEMAKER_PROGRAM train.py
For more information on this approach, see https://docs.aws.amazon.com/sagemaker/latest/dg/build-container-to-train-script-get-started.html
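For illustration, here is a minimal sketch of what the train.py copied into the container above might look like. It assumes the SageMaker Containers library passes hyperparameters as command-line arguments and exposes paths through the SM_MODEL_DIR and SM_CHANNEL_TRAINING environment variables; the --epochs hyperparameter and the "training" channel name are hypothetical, and the actual model code is left out.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths."""
    parser = argparse.ArgumentParser()
    # Hypothetical hyperparameter, for illustration only.
    parser.add_argument("--epochs", type=int, default=1)
    # Assumption: these environment variables are set inside the training
    # container; the fallbacks are the conventional /opt/ml locations.
    parser.add_argument(
        "--model-dir",
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    parser.add_argument(
        "--train",
        default=os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training"),
    )
    # parse_known_args tolerates extra hyperparameters we do not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    args = parse_args()
    print("epochs=%d model_dir=%s train=%s" % (args.epochs, args.model_dir, args.train))
    # Build and fit the tf.keras model here, then save it under args.model_dir.
```

Reading the paths from the environment with local fallbacks means the same script can be smoke-tested outside SageMaker before building the image.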
Update on 10 nov 2019:
There is now a way to use TensorFlow 2 in SageMaker, even though there is no shortcut to start TF 2 directly from the SageMaker console.
Start a conda Python3 Kernel
Make some updates (one in each code cell):
!pip install --upgrade pip # pip 19.0 or higher is required for TF 2
!pip install --upgrade setuptools # Otherwise you'll get annoying warnings about bad installs
Install Tensorflow 2
!pip install --user --upgrade tensorflow
According to the docs, this installs into $HOME.
Note:
If you are using a GPU-based SageMaker instance, replace tensorflow with tensorflow-gpu.
You can now use TF 2 in your instance. This only needs to be done once, as long as the instance remains up.
To test, just run in the next cell:
import tensorflow as tf
print(tf.__version__)
You should see 2.0.0 or higher.
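If you want the notebook to check this for you rather than reading the printed version, a small sketch like the following could work. The helper name is_tf2_or_newer is made up for this example, and the hint about restarting the kernel is an assumption based on how notebooks cache already-imported packages.

```python
def is_tf2_or_newer(version):
    """Return True if a version string such as '2.0.0-beta1' has major version >= 2."""
    major = version.split(".")[0]
    return int(major) >= 2


try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow is not installed in this environment

if tf is None:
    print("TensorFlow is not importable from this kernel")
elif not is_tf2_or_newer(tf.__version__):
    # Assumption: an old version was already loaded before the upgrade.
    print("Kernel still sees TensorFlow", tf.__version__, "- try restarting the kernel")
else:
    print("TensorFlow", tf.__version__, "is ready")
```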
As of now, the best image to build on for TensorFlow 2.0 is 2.0.0b1, which is the latest TensorFlow 2.0 image currently available. Please find the link here. There is also a 2.0.0b1-py3 image, which comes with Python 3 (3.5 for Ubuntu 16-based images; 3.6 for Ubuntu 18-based images).
Here from the SageMaker team. To use SageMaker seamlessly, it's recommended that you try out Amazon SageMaker Studio. It's similar to a Jupyter Notebook instance but has a variety of SageMaker services already integrated within it. A huge plus point is that you can select from a variety of kernels and for TensorFlow 2.0 you can select any of these that already have all the required packages installed:
Python 3 (TensorFlow 2.1 Python 3.6 CPU Optimized)
Python 3 (TensorFlow 2.1 Python 3.6 GPU Optimized)
Python 3 (TensorFlow 2.3 Python 3.7 CPU Optimized)
Python 3 (TensorFlow 2.3 Python 3.7 GPU Optimized)
In Studio, you can select the Kernel drop-down located in the top-right corner of your SageMaker Studio instance:
Shown below is a screenshot of the SageMaker Studio UI listing the available kernels that are compatible with TensorFlow 2.0. Note that all kernels within SageMaker Studio are continuously tested and work seamlessly with all SageMaker services.
Related
Does anyone with experience using vast.ai for cloud GPU computing know whether, when renting more than one GPU, you need to do some setup to take advantage of the extra GPUs?
I ask because I can't see any difference in speed when renting 6 or 8 GPUs instead of just one. I'm new to using vast.ai for cloud GPU computing.
I am using this default docker:
Official docker images for deep learning framework TensorFlow (http://www.tensorflow.org)
Successfully loaded tensorflow/tensorflow:nightly-gpu-py3
And just installing keras afterwards:
pip install keras
I have also checked the available GPUs using this and all the GPUs are detected correctly:
from keras import backend as K
K.tensorflow_backend._get_available_gpus()
cheers
Solution:
Finally I found the solution myself. I just used another docker image with an older version of tensorflow (2.0.0), and the error disappeared.
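Separately from the image change, on the original question of whether extra setup is needed: as far as I know, a plain Keras model does not spread itself over several GPUs automatically. Here is a minimal sketch using tf.distribute.MirroredStrategy, assuming TensorFlow 2.x and tf.keras rather than the standalone keras package from the question. The model, layer sizes and helper names are made up for illustration.

```python
def global_batch_size(per_replica_batch, num_replicas):
    """Batch size to pass to model.fit so each GPU gets per_replica_batch samples."""
    return per_replica_batch * max(1, num_replicas)


def build_distributed_model(input_dim=32):
    """Create and compile a small tf.keras model under MirroredStrategy."""
    import tensorflow as tf  # imported lazily; assumes TensorFlow 2.x

    # By default the strategy uses every GPU visible to TensorFlow.
    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        # Variables created inside the scope are mirrored across the GPUs.
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu", input_shape=(input_dim,)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")
    return strategy, model
```

A call would then look like strategy, model = build_distributed_model() followed by model.fit(x, y, batch_size=global_batch_size(64, strategy.num_replicas_in_sync)). Scaling the batch size matters because the batch passed to fit is split across the replicas; with an unchanged batch each GPU simply gets a smaller slice, which may be one reason no speed-up is visible.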
I have installed TensorFlow r1.14 and want to use TF-TRT. However, the following error occurs:
"ModuleNotFoundError: No module named 'tensorflow.contrib.tensorrt'"
when running the sample code. The same error occurs with TensorFlow r1.13. So my question is: do I need to install the tensorflow.contrib.tensorrt library separately? If yes, how?
Additionally, I can run the sample code of the TensorRT, e.g. sampleINT8, successfully. Click here to see my successful sample code run.
This leads me to believe that TensorRT is installed properly. However, the TF-TRT still doesn't work.
Any help would be greatly appreciated!
In TF 1.14, TF-TRT was moved to the core from contrib.
You need to import it like this: from tensorflow.python.compiler.tensorrt import trt_convert as trt
https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/image-classification/image_classification.py#L22
This is the correct answer for Linux.
However, if you're using Windows: the TensorRT Python API (and therefore TF-TRT) is not supported for Windows at the moment, so the TensorFlow python packages aren't built with TensorRT.
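If the same script has to run on both older and newer TensorFlow versions, one option is to try the new module path first and fall back to the contrib one. This is only a sketch: the helper first_importable is made up for this example, and the two modules do not expose the same API, so the calling code still has to branch on which one was found.

```python
import importlib


def first_importable(module_names):
    """Return the first module in module_names that imports cleanly, else None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Newer location first (TF 1.14+), then the old contrib location.
TRT_MODULE_CANDIDATES = [
    "tensorflow.python.compiler.tensorrt.trt_convert",
    "tensorflow.contrib.tensorrt",
]

trt = first_importable(TRT_MODULE_CANDIDATES)
if trt is None:
    print("TF-TRT is not importable in this TensorFlow installation")
else:
    # A successful import does not by itself prove that the build was
    # compiled with TensorRT support.
    print("Using TF-TRT module:", trt.__name__)
```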
To be able to import tensorflow.contrib.tensorrt you need to have tensorflow-gpu version >= 1.7 installed on your system. You could try installing the tensorflow-gpu library with:
pip install tensorflow-gpu
Check out the Windows section of the GPU documentation as well. Also, I would try updating your TensorFlow version with:
pip install --upgrade tensorflow
to ensure you're up to date there as well. Check out this section of the TensorFlow documentation for additional support.
Hopefully that helps!
2 possibilities
Have you installed tensorflow-gpu instead of tensorflow?
From your screenshot it looks like you're using Windows. I had the same problem. There seems to be no tensorrt module under contrib in the Windows TF distribution, whereas the Linux one has it (I tried 1.13.1).
I built a Python TensorFlow package and uploaded it to run on ML Engine.
"tensorflow-gpu==1.8.0" (no tensorflow) is set to be required in my setup.py.
The ML engine run fails at "import tensorflow as tf" saying "No module named tensorflow".
The ML engine run works fine when I'm only requiring "tensorflow==1.8.0" but I believe tensorflow-gpu is needed to use GPU.
Any ideas how to solve this issue?
Thanks
You need to set --runtime-version=1.8 when submitting the job. Consequently, you don't need to manually specify TF in setup.py. In fact, if that's the only package you are requiring, you can omit setup.py altogether.
Update 2018/06/29:
Explanation: different versions of TensorFlow require different versions of NVIDIA's drivers and software stack. The --runtime-version is guaranteed to have the right version of the drivers for that particular version of TensorFlow. You can technically set the version of tensorflow-gpu in your setup.py, but that version must be compatible with the NVIDIA stack present in the --runtime-version you've selected (defaults to the very old TF 1.0).
This also happens when you have multiple Python versions installed. In that case you have to run the TensorFlow installation with the relevant Python version, for example "python3 setup.py" instead of "python setup.py".
I would like to use TensorFlow 1.3 (and maybe 1.4) on Cloud ML. I'm running jobs on multi-GPU machines on Cloud ML.
I do that by specifying the tensorflow version in the setup.py as shown below:
from setuptools import setup
REQUIRED_PACKAGES = ['tensorflow==1.3.0']
setup(
name='my-image-classification',
install_requires=REQUIRED_PACKAGES,
version='1.0',
packages=['my_image_classification',
'my_image_classification/foo',
'my_image_classification/bar',
'my_image_classification/utils'],
)
Which version of the cuDNN library is installed on Cloud ML? Is it compatible with TensorFlow 1.3 and later?
I was able to start the jobs, but the performance is 10X lower than the expected value, and I'm curious whether there is a problem with how the underlying libraries are linked.
Edit:
I'm pretty confident now that the cuDNN versions on Cloud ML don't match what is required for TensorFlow 1.3. I noticed that TensorFlow 1.3 jobs are missing the "Creating Tensorflow device (/gpu:0...) " logs which appear when I run a job with the default TensorFlow available on Cloud ML.
DISCLAIMER: using anything but 1.0, 1.2 is not officially supported as of 2017/11/01.
You need to specify the GPU-enabled version of TensorFlow:
REQUIRED_PACKAGES = ['tensorflow-gpu==1.3.0']
But the version of pip is out-of-date so you need to force that to update first.
When trying to install TensorFlow in a conda environment, I got the following error message and could not make any progress:
tensorflow-1.1.0-cp35-cp35m-win_amd64.whl is not a supported wheel on this platform
Have you tried uninstalling and re-installing TensorFlow using pip within your Conda environment? I.e.:
pip uninstall tensorflow
Followed by:
pip install tensorflow
If it doesn't work, the issue may be with your Python installation. TensorFlow only supports 64-bit Python 3.5+ on Windows (see more info here).
Perhaps you have Python's default installation, which comes in a 32-bit version. If that's the case, you can download the 64-bit Python 3.5 or later from here to run in your Conda environment and then you should be able to install/run TensorFlow without any issues.
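To check which case applies, you can ask the interpreter inside the Conda environment for its version and bitness and compare them with the tags in the wheel name. This is a rough sketch; python_bits and wheel_tags are helper names made up for this example, and the wheel name below is only an example of the usual name-version-python-abi-platform layout.

```python
import struct
import sys


def python_bits():
    """Return 32 or 64 depending on the pointer size of the running interpreter."""
    return struct.calcsize("P") * 8


def wheel_tags(filename):
    """Split a wheel filename into its (python, abi, platform) tags."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    return tuple(stem.split("-")[-3:])


print("Python %d.%d, %d-bit" % (sys.version_info[0], sys.version_info[1], python_bits()))
print(wheel_tags("tensorflow-1.1.0-cp35-cp35m-win_amd64.whl"))
```

A cp35 / win_amd64 wheel needs a 64-bit CPython 3.5 interpreter, so a 32-bit Python or a different minor version would be expected to produce the "not a supported wheel" message.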
Make sure that the Python version installed in the environment is 3.5, not 3.6. Since 3.6 was released, Conda automatically sets that version as the default for Python 3. However, it is still not supported by TensorFlow.
You can work with the TensorFlow library, along with other essential libraries, by using a Dockerfile. Using Docker for the environment is a good way to run experiments in a reproducible manner, as described in this blog.
You can also try datmo to set up the environment and track machine learning projects, making them reproducible with the datmo CLI tool.