Setting up GPU support in Airflow containers with docker-compose (GPU support with TensorFlow)

I am having some difficulty starting Airflow with docker-compose and the appropriate GPU libraries to run my machine learning tasks.
The airflow-scheduler throws this error:
airflow-scheduler_1 | 2022-03-21 12:33:36.919960: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
Basically, there are no CUDA libraries installed under /usr/local within the Airflow container, hence the error. I have installed nvidia-container-runtime and set the default runtime in the daemon.json file:
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
I have also managed to set runtime: nvidia in the docker-compose.yaml file. This way I can run nvidia-smi inside the Airflow container. However, the CUDA libraries are still missing.
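For reference, the relevant part of my docker-compose.yaml looks roughly like this (the service name and image tag are illustrative, not my exact setup):

services:
  airflow-scheduler:
    image: apache/airflow:2.2.4
    runtime: nvidia
    environment:
      # expose all GPUs to the container via the NVIDIA runtime
      - NVIDIA_VISIBLE_DEVICES=all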
Is there a way to install these libraries automatically (ideally building FROM tensorflow/tensorflow:latest-gpu, since that image already ships the CUDA libraries inside the container)?
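Something along these lines is what I have in mind, sketched below (whether installing Airflow on top of the TensorFlow image this way is viable is exactly what I am unsure about; the version pin is arbitrary):

# Hypothetical custom image: start from the GPU-enabled TensorFlow image,
# which ships the CUDA libraries, and add Airflow on top
FROM tensorflow/tensorflow:latest-gpu
RUN pip install --no-cache-dir apache-airflow==2.2.4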
On the other hand, if I am not using docker-compose I can start a container with docker:
docker run -it --gpus all tensorflow/tensorflow:latest-gpu
This container has all the libraries that I need. However, I would like to use docker-compose, since it makes running multiple containers and setting up the networking much easier, so I would like to avoid this approach.
Alternatively, I can use Docker inside Airflow by mounting the Docker socket into the Airflow container, so that I can launch a new container from Airflow. That way all the CUDA libraries are available as well; however, it feels very counter-intuitive, and I am having difficulty understanding why I can't set all of this up within the Airflow container in the first place.
import docker

client = docker.from_env()

# run the container
response = client.containers.run(
    # the image you wish to run
    'tensorflow/tensorflow:latest-gpu',
    # the command to run inside the container
    'find / -name "libcudart.so.11.0"',
    # pass GPU access through to the container
    device_requests=[
        docker.types.DeviceRequest(count=-1, capabilities=[['gpu']])
    ]
)
I would appreciate it if you could point me in the right direction.

Related

Do I need to set aws config in docker compose as volume?

In my project I need to configure an AWS bucket download, because it always hits a read timeout or connection error when downloading a fairly large file in my deployment. I have a .aws/config in my root directory, and in my Dockerfile I use "ADD . .", which adds all the files in the project. I build the image with Docker Compose. However, for some reason the AWS config values are not being used. Is there a way to pass these values to Docker so that they are actually used?
This is my "config" file, which is in ".aws" in the root of the project:
[default]
read_timeout = 1200
connect_timeout = 1200
http_socket_timeout = 1200
s3 =
max_concurrent_requests = 2
multipart_threshold = 8MB
multipart_chunksize = 8MB
My Dockerfile looks like this:
FROM python:3.7.7-stretch AS BASE
RUN apt-get update \
    && apt-get --assume-yes --no-install-recommends install \
    build-essential \
    curl \
    git \
    jq \
    libgomp1 \
    vim
WORKDIR /app
# upgrade pip version
RUN pip install --no-cache-dir --upgrade pip
RUN pip3 install boto3
ADD . .
I expected that, through "ADD . .", boto3 would use the config file, but unfortunately that is not the case.
Perhaps this would answer your question on why the ADD command didn't work.
Hidden file .env not copied using Docker COPY
Instead of relying on the local config of the machine where the Docker image is built, you might want to put the configuration as an explicit file in your repo, copy it over to ~/.aws/config (or anywhere in the container) and reference it by setting its path in the AWS_CONFIG_FILE environment variable; or use any one of the methods defined in the AWS documentation below:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
wherein you can define your configuration as part of your Python code or declare the values as environment variables.
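For example, a minimal Dockerfile sketch (the in-image path is an arbitrary choice):

# copy the repo's AWS config into the image and point boto3 at it explicitly
COPY .aws/config /app/.aws/config
ENV AWS_CONFIG_FILE=/app/.aws/config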

Files under /home in singularity container are not accessible

Could someone please let me know how one can access files in /home within a singularity container?
I created a docker image. In this image, some packages are built and installed under /home. Some of those are also added to PYTHONPATH within the docker image. If I run the image, then a docker container is created. Within this container I can access all files under /home and use the Python modules that I added. This is a fully working docker image.
I wanted to use these packages and Python modules on an HPC system, so I converted the Docker image to a Singularity image. Then I used the singularity shell <image_name.sif> command to access the shell in the container. After that, I see the prompt below.
Singularity> cat /etc/*-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.6 LTS"
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
Singularity>
The host OS on the HPC system is Red Hat Linux. Since cat /etc/*-release shows Ubuntu, the /etc directory appears to be the one inside the container, which looks reasonable. However, when I type ls /home, I see the contents of /home on the host OS. How could I find the files in /home within the container?
If I type any command to run the packages installed in /home within the container, the Singularity shell prints command not found. Also, if I run the Python interpreter, I cannot import any of the modules installed within the container. Although the Python version matches the one in the container, the modules are not found; PYTHONPATH includes paths like /home/<a_directory_name>, but the interpreter cannot locate the modules there. Even though the Docker image is fully functional, the corresponding Singularity image is completely useless.
How could I use the packages and Python modules installed in /home in the singularity container?
By default Singularity automatically mounts $HOME into the container, which will shadow anything that was installed there during image creation.
To skip this, use the --no-home flag when running your singularity command. Additional options, such as mounting home to a different location, are described in the online and CLI documentation.
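For example (assuming the image file is named image_name.sif):

# skip the automatic $HOME bind mount so the image's own /home stays visible
singularity shell --no-home image_name.sif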

How to install schema registry

I am looking for options to install the Confluent Schema Registry. Is it possible to download and install the registry alone and make it work with an existing Kafka setup?
Thanks
Assuming you have Zookeeper/Kafka running already, you can easily run the Confluent Schema Registry using Docker by running the following command:
docker run -p 8081:8081 \
  -e SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=host.docker.internal:2181 \
  -e SCHEMA_REGISTRY_HOST_NAME=localhost \
  -e SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081 \
  -e SCHEMA_REGISTRY_DEBUG=true \
  confluentinc/cp-schema-registry:5.3.2
Parameters:
-p 8081:8081 - publishes port 8081 from the container to your machine
SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL - your Zookeeper host and port; I'm using host.docker.internal to resolve the local machine that is hosting Zookeeper (outside of the container)
SCHEMA_REGISTRY_HOST_NAME - the hostname advertised in Zookeeper. This is required if you are running the Schema Registry with multiple nodes, since it defaults to the Java canonical hostname for the container, which may not always be resolvable in a Docker environment
SCHEMA_REGISTRY_LISTENERS - the host and port the Schema Registry listens on
SCHEMA_REGISTRY_DEBUG - run in debug mode
Note: the command above uses version 5.3.2; make sure this version is aligned with your Kafka version.
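Once the container is up, a quick sanity check is to query the REST API; the /subjects endpoint should return a JSON list (empty on a fresh install):

curl http://localhost:8081/subjects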
Yes, you can use your existing Kafka setup; just match it to a compatible version of the Confluent Platform. Here are the docs on getting started:
https://docs.confluent.io/current/schema-registry/docs/intro.html#installation
tl;dr download the platform to pull out the pieces you need, or get the Docker image and point it at your Kafka cluster.

How can I develop in docker container with intellij?

I know IntelliJ has a Docker container plugin; however, it doesn't seem to allow me to develop inside the container itself. The idea is simple: I don't want to configure my host to have the correct environment tools. I'd rather just have a Docker container set up and use IntelliJ to find libs, functionality, and such within the container itself.
This would be incredibly helpful for C++, Java, and Scala development. It would also be useful for debugging.
So, is it possible to develop within a Docker container with IntelliJ?
So you just want to work within a container just as you would within a full-blown VM, right? Then you should just run a container, attach a display (to run IDEA), and start configuring your development environment.
For the display part I'd test some of the answers given in Can you run GUI apps in a docker container?. There are some very cool answers in that topic showing various approaches to running GUI apps within a container.
Shouldn't the approach rather be:
Have a local repository and a local IDE. In the repository, have a Dockerfile and possibly a docker-compose.yml that spins up the environment required to run the project.
Mount your local drive with the sources into Docker (volumes), so changes made in your local folder are reflected in the container, and vice versa.
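For illustration, a minimal docker-compose.yml along those lines (the service name and paths are placeholders):

services:
  dev:
    build: .
    volumes:
      # mount local sources so host edits show up inside the container
      - ./src:/app/src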
Please look at this example of an IntelliJ IDEA Community and JDK 8 developer image based on Alpine Linux (taken from https://raw.githubusercontent.com/shaharv/docker/master/alpine/dev/Dockerfile):
# Alpine 3.8 C++/Java Developer Image
#
# For IntelliJ and GUI (X11), run the image with:
# $ XSOCK=/tmp/.X11-unix && sudo docker run -i -v $XSOCK:$XSOCK -e DISPLAY -u developer -t [image-name]
#
# Then run IntelliJ with:
# /idea-IC-191.6707.61/bin/idea.sh
FROM alpine:3.8
ENV LANG C.UTF-8
RUN set -ex && \
    apk add --no-cache --update \
    # basic packages
    bash bash-completion coreutils file grep openssl openssh nano sudo tar xz \
    # debug tools
    gdb musl-dbg strace \
    # docs and man
    bash-doc man man-pages less less-doc \
    # GUI fonts
    font-noto \
    # user utils
    shadow
RUN set -ex && \
    apk add --no-cache --update \
    # C++ build tools
    cmake g++ git linux-headers libpthread-stubs make
RUN set -ex && \
    apk add --no-cache --update \
    # Java tools
    gradle openjdk8 openjdk8-dbg
# Install IntelliJ Community
RUN set -ex && \
    wget https://download-cf.jetbrains.com/idea/ideaIC-2019.1.1-no-jbr.tar.gz && \
    tar -xf ideaIC-2019.1.1-no-jbr.tar.gz && \
    rm ideaIC-2019.1.1-no-jbr.tar.gz
# Create a new user with no password
ENV USERNAME developer
RUN set -ex && \
    useradd --create-home --key MAIL_DIR=/dev/null --shell /bin/bash $USERNAME && \
    passwd -d $USERNAME
# Set additional environment variables
ENV JAVA_HOME /usr/lib/jvm/java-1.8-openjdk
ENV JDK_HOME /usr/lib/jvm/java-1.8-openjdk
ENV JAVA_EXE /usr/lib/jvm/java-1.8-openjdk/bin/java
There is a better way to do this now with JetBrains Gateway. Roughly:
Make sure the OpenSSH server is installed in the container (recent Ubuntu images ship with it already) and run the container with the SSH port exposed, e.g. -p 220:22 (I like 220).
Modify /etc/ssh/sshd_config to enable root login and password authentication, then start the SSH service with service ssh start (or service ssh restart after config changes). Set a password for the root user, i.e. passwd root (or go through the extra steps to set up a new user).
Open JetBrains Gateway and SSH into the container with the fields set thus: user=root, host=localhost, and port=220 (or whatever you chose). Note that you will also need to specify a project location; in my use case this is the root directory of a Java application repository, which means Java, Maven, and whatever other tools must be installed in the container at some point, but this doesn't affect the ability to connect.
Assuming you connect with no issues, Gateway installs an IDE backend inside the container (which takes about 10 minutes) and then starts an IDE client, a light version of IntelliJ (or whichever IDE you selected), that is honestly a bit buggy at the time of writing. But it works, and it has unblocked some of my colleagues stuck on Windows machines with few options to upgrade to Macs in the current chip shortage environment.
Note that any time you restart the container you also need to restart the SSH service, unless you script it to start automatically when the container does.
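A sketch of the in-container SSH setup described above (the sed patterns are illustrative; adjust them to your sshd_config):

# inside the container: allow root login over SSH with a password
passwd root
sed -i 's/^#\?PermitRootLogin .*/PermitRootLogin yes/' /etc/ssh/sshd_config
sed -i 's/^#\?PasswordAuthentication .*/PasswordAuthentication yes/' /etc/ssh/sshd_config
service ssh restart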

docker run cannot find name flag argument

I have recently set up an RStudio application on Google Container Engine using Docker and the rocker/rstudio image. Now I want to start my saved container with a name, using the following ssh command line:
sudo docker -d -p 8787:8787 --name samplename user/laatste
which returns the following error
flag provided but not defined: --name
I have tried with and without quotes, equals signs, and double and single hyphens, before, between, and after the other flags and arguments, but the same error keeps returning.
version information:
Client version: 1.5.0
Client API version: 1.17
Go version (client): go1.4.1
Git commit (client): a8a31ef
OS/Arch (client): linux/amd64
Server version: 1.5.0
Server API version: 1.17
Go version (server): go1.4.1
Git commit (server): a8a31ef
The reason I want to name the container is that I want to run standard (static) startup and shutdown scripts with the Google Compute instance to automatically save and load changes made in R. The container name is used to identify the container to be saved. Any other solution for this is also very welcome.
I guess you wanted to do:
sudo docker run -d -p 8787:8787 --name samplename user/laatste
You forgot to specify the command (run) here.