GitHub Packages: Inconsistency when running Docker Image Locally and in GitHub Actions - cmake

I have been using GitHub Actions to build and publish a Docker image to the GitHub Container Registry according to the documentation. I am getting inconsistent behavior when I pull the new image and test it locally.
I have a CMake project in C++ that runs a simple hello world with an INTERFACE library and a SHARED library.
When I build a Docker image locally and test it, this is the output (which works fine):
*************************************
*** DBSCAN Cluster Segmentation ***
*************************************
--cloudfile: required.
Usage: program [options]
Optional arguments:
-h --help shows help message and exits [default: false]
-v --version prints version information and exits [default: false]
--cloudfile input cloud file [required]
--octree-res octree resolution [default: 120]
--eps epsilon value [default: 40]
--minPtsAux minimum auxiliar points [default: 5]
--minPts minimum points [default: 5]
-o --output-dir output dir to save clusters [default: "-"]
--ext cluster output extension [pcd, ply, txt, xyz] [default: "pcd"]
-d --display display clusters in the pcl visualizer [default: false]
--cal-eps calculate the value of epsilon with the distance to the nearest n points [default: false]
In GitHub Actions I am using this workflow:
name: Demo Push

on:
  push:
    # Publish `master` as Docker `latest` image.
    branches: ["test-github-packages"]

    # Publish `v1.2.3` tags as releases.
    tags:
      - v*

  # Run tests for any PRs.
  pull_request:

env:
  IMAGE_NAME: dbscan-octrees

jobs:
  # Push image to GitHub Packages.
  # See also https://docs.docker.com/docker-hub/builds/
  push:
    runs-on: ubuntu-latest
    permissions:
      packages: write
      contents: read
    steps:
      - uses: actions/checkout@v3
        with:
          submodules: recursive
      - name: Build image
        run: docker build --file Dockerfile --tag $IMAGE_NAME --label "runnumber=${GITHUB_RUN_ID}" .
      - name: Test image
        run: |
          docker run --rm \
            --env="DISPLAY" \
            --env="QT_X11_NO_MITSHM=1" \
            --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
            dbscan-octrees:latest
      - name: Log in to registry
        # This is where you will update the PAT to GITHUB_TOKEN
        run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
      - name: Push image
        run: |
          IMAGE_ID=ghcr.io/${{ github.repository_owner }}/$IMAGE_NAME
          # Change all uppercase to lowercase
          IMAGE_ID=$(echo $IMAGE_ID | tr '[A-Z]' '[a-z]')
          # Strip git ref prefix from version
          VERSION=$(echo "${{ github.ref }}" | sed -e 's,.*/\(.*\),\1,')
          # Strip "v" prefix from tag name
          [[ "${{ github.ref }}" == "refs/tags/"* ]] && VERSION=$(echo $VERSION | sed -e 's/^v//')
          # Use Docker `latest` tag convention
          [ "$VERSION" == "master" ] && VERSION=latest
          echo IMAGE_ID=$IMAGE_ID
          echo VERSION=$VERSION
          docker tag $IMAGE_NAME $IMAGE_ID:latest
          docker push $IMAGE_ID:latest
The compilation and test steps work fine with no errors (check this run). The problem is with the newly generated image after it is pushed to the GitHub Container Registry: when I pull it locally and test it, the program crashes with an "Illegal Instruction (core dumped)" error. I have tried to debug the problem and there is no compilation error, link error, or anything like that. I found out that this might be related to the linking of the SHARED library, but it is strange, because the image works when it is built in the GitHub Actions runner, so I don't understand why the pushed image fails.
I found this post suggesting the error might be related to GitHub changing the container during installation.
Hope someone can help me with this.
This is the output of the Test image step in the workflow:
[screenshot: workflow output]
This is the error after pulling the newly generated image and testing it locally:
[screenshot: Illegal instruction error]
I have even compared the bad binary (the GitHub version inside the Docker image) with the good one (compiled locally) using ghex, and the binary generated by GitHub after pushing the new image is slightly bigger than the good one.
[screenshot: binary comparison]
[screenshot: binary sizes]
Issue
CPU AVX instruction set not supported by local PC
Solution
Enable compilation flags in CMake to disable AVX support

Description
After digging with binary analysis tools, debugging, etc., I discovered that the problem was related to AVX CPU support on the GitHub Actions runner. My computer does not support AVX-optimized instructions, so I had to add a compilation flag to my shared libraries to disable AVX support. This flag tells the compiler on the GitHub Actions runner to build the project without AVX instructions or other CPU-specific optimizations that the standard GitHub Actions environment would otherwise enable.
Analysis tools:
ldd binary
strace binary <-- this one allowed me to identify the SIGILL signal
container-diff
log error
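For reference, two quick checks along the same lines (standard Linux tools; the binary path below is only a placeholder):
# Does the local CPU advertise AVX at all? (prints nothing useful if unsupported)
grep -o -m1 'avx[^ ]*' /proc/cpuinfo || echo "no AVX support"
# Rough count of VEX-encoded (AVX) instructions in the suspect binary.
objdump -d ./my_binary | grep -c -E ' v(mov|add|mul|fmadd)[a-z0-9]*'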
Using the strace tool I got the following error:
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x55dcd7324bc0} ---
+++ killed by SIGILL (core dumped) +++
Illegal instruction (core dumped)
This error gave me the signal to search for, and since my project uses the Point Cloud Library (PCL), I found a solution to my specific problem: compiling the project with -mno-avx, according to this post.
Solution
In the CMakeLists.txt file of each SHARED library, define the following compilation flag:
target_compile_options(${PROJECT_NAME} PUBLIC -mno-avx)
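If you still want AVX builds on machines that do support it, one option (a sketch of mine, not part of the original fix) is to gate the flag behind a CMake option and configure with -DENABLE_AVX=ON locally:
# Portable builds by default; opt back into AVX on hosts that support it.
option(ENABLE_AVX "Allow the compiler to emit AVX instructions" OFF)
if(NOT ENABLE_AVX)
  target_compile_options(${PROJECT_NAME} PUBLIC -mno-avx)
endif()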
New issue
I resolved the major issue, but then one of my shared libraries hit the same error, which I tried to fix with more of these flags.
After a lot of testing, using the CPU-X software, I detected the proper architecture-specific options for my PC with the following GCC command:
gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
output:
/usr/lib/gcc/x86_64-linux-gnu/9/cc1 -E -quiet -v -imultiarch
x86_64-linux-gnu - -march=haswell -mmmx -mno-3dnow -msse -msse2
-msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -mno-aes -mno-sha
-mpclmul -mpopcnt -mabm -mno-lwp -mno-fma -mno-fma4 -mno-xop
-mno-bmi -mno-sgx -mno-bmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm
-mno-avx -mno-avx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle
-mrdrnd -mno-f16c -mfsgsbase -mno-rdseed -mno-prfchw -mno-adx
-mfxsr -mno-xsave -mno-xsaveopt -mno-avx512f -mno-avx512er
-mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt
-mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl
-mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps
-mno-avx5124vnniw -mno-clwb -mno-mwaitx -mno-clzero -mno-pku
-mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni
-mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-avx512vpopcntdq
-mno-movdiri -mno-movdir64b -mno-waitpkg -mno-cldemote
-mno-ptwrite --param l1-cache-size=32
--param l1-cache-line-size=64 --param l2-cache-size=3072
-mtune=haswell -fasynchronous-unwind-tables
-fstack-protector-strong -Wformat -Wformat-security
-fstack-clash-protection -fcf-protection
Final solution
I have fixed the execution error with the following flags in my SHARED library:
# MMX, SSE(1, 2, 3, 3S, 4.1, 4.2), CLMUL, RdRand, VT-x, x86-64
target_compile_options(${PROJECT_NAME} PRIVATE -Wno-cpp
-mmmx
-msse
-msse2
-msse3
-mssse3
-msse4.2
-msse4.1
-mno-sse4a
-mno-avx
-mno-avx2
-mno-fma
-mno-fma4
-mno-f16c
-mno-xop
-mno-bmi
-mno-bmi2
-mrdrnd
-mno-3dnow
-mlzcnt
-mfsgsbase
-mpclmul
)
Now, the docker image stored in the GitHub Container Registry is working as expected on my local PC.
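An alternative I did not test on this project, but which avoids enumerating individual instruction sets, is to target the generic x86-64 baseline and let the compiler choose safe defaults:
# Hypothetical alternative: plain x86-64 baseline instead of per-feature flags.
target_compile_options(${PROJECT_NAME} PRIVATE -march=x86-64 -mtune=generic)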
Related posts
What is the proper architecture-specific options (-m) for Sandy Bridge based Pentium?
using cmake to make a library without sse support (windows version)
https://github.com/PointCloudLibrary/pcl/issues/5248
Compile errors with Assembler messages
https://github.com/PointCloudLibrary/pcl/issues/1837

Related

Tensorflow Serving Compiling Failure For CPU AVX AVX2

I use the method in the TFX official documentation to compile the TF Serving devel image in Docker. The OS is macOS, with an Intel CPU.
Here is the docker build code for it:
#!/bin/bash
USER=$1
TAG=$2
TF_SERVING_VERSION_GIT_BRANCH="2.4.1"
git clone --branch="${TF_SERVING_VERSION_GIT_BRANCH}" https://github.com/tensorflow/serving
TF_SERVING_BUILD_OPTIONS="--copt=-mavx --local_ram_resources=4096"
cd serving && \
docker build --pull -t $USER/tensorflow-serving-devel:$TAG \
    --build-arg TF_SERVING_VERSION_GIT_BRANCH="${TF_SERVING_VERSION_GIT_BRANCH}" \
    --build-arg TF_SERVING_BUILD_OPTIONS="${TF_SERVING_BUILD_OPTIONS}" \
    -f tensorflow_serving/tools/docker/Dockerfile.devel .
Then I ran the shell script for more than 3 hours and it ended with a failure. I cannot see the details because the log output from Docker is clipped by the builder.
Has anyone met a similar problem and can help on this topic?
Thanks a lot in advance!
These instruction sets are not available on all machines, especially with older processors.
If you'd like to apply generally recommended optimizations, including utilizing platform-specific instruction sets for your processor, you can add --config=nativeopt to Bazel build commands when building TensorFlow Serving.
tools/run_in_docker.sh bazel build --config=nativeopt tensorflow_serving/...
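Conversely, if the machine that will run the image does not support AVX, a sketch based on the question's own script is simply to drop the -mavx copt so the resulting binaries stay portable:
# Build without forcing AVX; keep only the RAM limit from the original script.
TF_SERVING_BUILD_OPTIONS="--local_ram_resources=4096"
cd serving && \
docker build --pull -t $USER/tensorflow-serving-devel:$TAG \
    --build-arg TF_SERVING_VERSION_GIT_BRANCH="${TF_SERVING_VERSION_GIT_BRANCH}" \
    --build-arg TF_SERVING_BUILD_OPTIONS="${TF_SERVING_BUILD_OPTIONS}" \
    -f tensorflow_serving/tools/docker/Dockerfile.devel .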

CMake incremental compilation through toolchain upgrade

I am trying to find a way to enable incremental compilation with CMake through a toolchain upgrade. Here is the problematic scenario:
Branch main uses g++-9 (using CMAKE_CXX_COMPILER=g++-9)
A new branch uses g++-10 (using CMAKE_CXX_COMPILER=g++-10)
Commits are happening on both branches
Incremental builds on one branch work fine
Switching to the other branch and explicitly invoking CMake fails
My question is the following: I'm looking for the proper way to make the invocation of CMake succeed and rebuild the whole project from scratch when a toolchain change happens.
Here is a script that will make it quick and easy to reproduce the problem. This script requires Docker. It will create folders Sources and Build at the location where it is executed to avoid littering your filesystem. It then creates Dockerfiles to build docker containers with both g++ and cmake. It then creates a dummy Hello World C++ CMake project. Finally, it creates a folder for build artifacts and then executes the build with g++-9 and then g++-10. The second build fails because CMake generates an error.
#!/bin/bash
set -e
mkdir -p Sources
mkdir -p Build
# Creates a script that will be executed inside the docker container to perform builds
cat << EOF > Sources/Compile.sh
cd /Build \
&& cmake /Sources \
&& make \
&& ./IncrementalBuild
EOF
# Creates a Dockerfile that will be used to have both gcc-9 and cmake
cat << EOF > Sources/Dockerfile-gcc9
FROM gcc:9
RUN apt-get update && apt-get install -y cmake
RUN ln -s /usr/local/bin/g++ /usr/local/bin/g++-9
ADD Compile.sh /Compile.sh
RUN chmod +x /Compile.sh
ENTRYPOINT /Compile.sh
EOF
# Creates a Dockerfile that will be used to have both gcc-10 and cmake
cat << EOF > Sources/Dockerfile-gcc10
FROM gcc:10
RUN apt-get update && apt-get install -y cmake
RUN ln -s /usr/local/bin/g++ /usr/local/bin/g++-10
ADD Compile.sh /Compile.sh
RUN chmod +x /Compile.sh
ENTRYPOINT /Compile.sh
EOF
# Creates a dummy C++ program that will be compiled
cat << EOF > Sources/main.cpp
#include <iostream>
int main()
{
    std::cout << "Hello World!\n";
}
EOF
# Creates CMakeLists.txt that will be used to compile the dummy C++ program
cat << EOF > Sources/CMakeLists.txt
cmake_minimum_required(VERSION 3.9)
project(IncrementalBuild CXX)
add_executable(IncrementalBuild main.cpp)
set_target_properties(IncrementalBuild PROPERTIES CXX_STANDARD 17)
EOF
# Build the docker images with both Dockerfiles created earlier
docker build -t cmake-gcc:9 -f Sources/Dockerfile-gcc9 Sources
docker build -t cmake-gcc:10 -f Sources/Dockerfile-gcc10 Sources
# Run a build with g++-9
echo ""
echo "### Compiling with g++-9 and then running the result..."
docker run --rm --user $(id -u):$(id -g) -v $(pwd)/Sources:/Sources -v $(pwd)/Build:/Build -e CXX=g++-9 cmake-gcc:9
echo ""
# Run a build with g++-10
echo "### Compiling with g++-10 and then running the result..."
docker run --rm --user $(id -u):$(id -g) -v $(pwd)/Sources:/Sources -v $(pwd)/Build:/Build -e CXX=g++-10 cmake-gcc:10
echo ""
# Print success if we reach this point
echo "SUCCESS!"
I'm looking for the proper way to make the invocation of CMake succeed and rebuild all the project from scratch when a toolchain change happens.
The proper way is to use a fresh binary directory: either remove the binary directory when you switch toolchains and let CMake recreate it, or use a separate directory for each toolchain.
Use a Build/gcc10 binary directory for gcc-10 builds and Build/gcc9 for gcc-9 builds.
With today's CMake there is no need to cd Build and mkdir: use cmake -S . -B Build. Also, do not call make directly; prefer cmake --build Build so you can switch generators later.
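For example, with the directory names above (a sketch, run from the repository root rather than inside the containers):
# One binary directory per toolchain; each keeps its own CMake cache and artifacts.
cmake -S Sources -B Build/gcc9 -DCMAKE_CXX_COMPILER=g++-9
cmake --build Build/gcc9
cmake -S Sources -B Build/gcc10 -DCMAKE_CXX_COMPILER=g++-10
cmake --build Build/gcc10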
"If you change the toolchain, you should start with a fresh build. There are too many things that assume the toolchain doesn’t change and while you may be able to find workarounds which appear to work, I recommend you always use a fresh build tree for a different toolchain. This same logic also applies if you update the existing toolchain in-place (e.g. you update to a newer version of GCC on Linux, a newer version of Xcode on macOS, etc.). CMake queries compiler capabilities and caches the results. If you change the toolchain in a way that CMake can’t catch, then you end up with stale cached capabilities being used for the new/updated toolchain. Please don’t do that." - Craig Scott
So essentially I don't think it's possible. You just need to blow away your build. The best thing you can do is alert users if CMake isn't doing it for you.
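One way to surface the problem explicitly (my own sketch, not from the answer; CMake already caches the compiler, this just makes the failure clearer) is a small check in CMakeLists.txt:
# Record the compiler this build tree was configured with and fail fast
# if a later configure run uses a different one.
if(DEFINED _CONFIGURED_CXX_COMPILER AND NOT _CONFIGURED_CXX_COMPILER STREQUAL CMAKE_CXX_COMPILER)
  message(FATAL_ERROR "Toolchain changed from ${_CONFIGURED_CXX_COMPILER} to ${CMAKE_CXX_COMPILER}; use a fresh build directory.")
endif()
set(_CONFIGURED_CXX_COMPILER "${CMAKE_CXX_COMPILER}" CACHE INTERNAL "Compiler used to configure this build tree")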
Perhaps also reply on this thread:
https://discourse.cmake.org/t/how-to-change-toolchain-without-breaking-developer-workflows/1166
Or start another Discourse topic.

Tensorflow Serving: no versions of servable half_plus_two found under base path /models

I'm doing TensorFlow Serving with Docker (see here for the docs). The server runs on our infra here. I've succeeded at requesting my model when the command used to run the container is something like:
tensorflow_model_server --port=8500 --rest_api_port=8501 \
--model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME}
A curl request to the server returns the expected answer. The problem occurs when I try to use the model_config_file parameter. Command:
tensorflow_model_server --port=8500 --rest_api_port=8501 \
--model_config_file=/serving/models.conf
Config file is:
model_config_list: {
  config: {
    name: "half_plus_two",
    base_path: "/models/",
    model_platform: "tensorflow"
  }
}
When I run the container with this command, I get the error:
No versions of servable half_plus_two found under base path /models/
(I've also tried removing the trailing slash from the base_path, without success.) I've seen this post on SO that reminds us to use a version directory under the model dir, and I have one. My /models dir is:
models
  |
  - half_plus_two
      |
      - 1
          |
          - saved_model.pb
          - variables
          - assets
Can someone help?
Had the same thing on Windows 10. I finally:
Noticed that I had forgotten to clone the tensorflow/serving repository to my local machine
Ran it in the Ubuntu WSL-2 console, since from the Windows command line I could not convince Docker to correctly map the container's /models/half_plus_two to my local path (the -v option in the following command):
docker run -t --rm -p 8501:8501 \
    -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
    -e MODEL_NAME=half_plus_two \
    tensorflow/serving &
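For the model_config_file variant from the question, the equivalent invocation would look roughly like this (the host paths are assumptions; the official image forwards extra arguments to tensorflow_model_server). Note that base_path must point at the directory containing the numeric version folders, e.g. /models/half_plus_two rather than /models/, which may also explain the original error:
docker run -t --rm -p 8501:8501 \
    -v "$(pwd)/models:/models" \
    -v "$(pwd)/models.conf:/serving/models.conf" \
    tensorflow/serving \
    --model_config_file=/serving/models.conf &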

How to dynamically pass parameter to the RPM during installation

We need to dynamically pass a variable during RPM installation and capture it in the spec file in order to trigger a script in %post.
Following is the command
RPM Install Command
sudo rpm -Uvh --force abc.noarch.rpm --define '_ip 10.1.2.4' --define 'version 3'
**abc.spec**
Name: abc
Version: 1
Release: 1.0
Summary: Test
%{!?_ip: %define _ip 0.0.0.0 }
%{!?_version: %define _version 0 }
%post
echo "ip:::: %{_ip}"
echo "VESION:::: %{_version}"
So when I run the RPM with the above command , I get the following output.
[root@test solution]$ sudo rpm -Uvh --force abc.noarch.rpm --define '_ip 10.1.2.4' --define 'version 3'
Preparing... ################################# [100%]
Updating / installing...
1:abc ################################# [ 50%]
ip:::: 0.0.0.0
VESION:::: 0
Though I pass different values on the CLI, the arguments I pass are not captured in the spec file.
I need input on how to capture the values I am passing on the CLI.
The option --define defines a macro. Macros are evaluated when building an RPM from a SRC.RPM using rpmbuild. The binary package (arch or noarch, it does not matter) has every macro already expanded, even %bindir etc.
The RPM ecosystem was designed to be non-interactive. This is a big difference from the DEB ecosystem, where questions can be asked using debconf.
You cannot work around it. You cannot even ask by reading STDIN directly, because rpm closes that descriptor before executing scriptlets.
The best practice is to use configuration files, e.g. /etc/abc/ip.conf. And:
either instruct the user to alter that file manually (or using Ansible) and store their correct data there,
or do NOT distribute /etc/abc/ip.conf in the main abc package and instead require abc-config. Then create one or more config packages that look like:
Name: abc-testing-config
Provides: abc-config
...
%files
/etc/abc/ip.conf
You then instruct users to install abc abc-testing-config. Or it can be abc abc-EMEA-config, etc.
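A minimal sketch of how the main package's scriptlet could then consume such a file (the file location and variable names are illustrative, not from the original answer):
%post
# Source the locally provided configuration, if present, and fall back to defaults.
if [ -f /etc/abc/ip.conf ]; then
    . /etc/abc/ip.conf
fi
echo "ip:::: ${ip:-0.0.0.0}"
echo "VERSION:::: ${version:-0}"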

Whether drone.io support reusing docker container for build

I have set up drone.io locally and created a .drone.yml for the CI build. But I found that Drone removes the Docker container after finishing the build. Does it support reusing the Docker container? I am working on a Gradle project and the initial build takes a long time to download the Java dependencies.
UPDATE 1
I used the command below to set the admin user when running the drone-server container.
docker run -d \
-e DRONE_GITHUB=true \
-e DRONE_GITHUB_CLIENT="xxxx" \
-e DRONE_GITHUB_SECRET="xxxx" \
-e DRONE_SECRET="xxxx" \
-e DRONE_OPEN=true \
-e DRONE_DATABASE_DRIVER=mysql \
-e DRONE_DATABASE_DATASOURCE="root:root@tcp(mysql:3306)/drone?parseTime=true" \
-e DRONE_ADMIN="joeyzhao0113" \
--restart=always \
--name=drone-server \
--link=mysql \
drone/drone:0.5
After doing this, I log in to the Drone server as the user joeyzhao0113 but fail to enable the Trusted flag on the settings page. The popup dialog reports that the setting was saved successfully (see the screenshot below), but the flag always stays disabled.
No, it is not possible to re-use a Docker container for your Drone build. Build containers are ephemeral and are destroyed at the end of every build.
That being said, it doesn't mean your problem cannot be solved.
I think a better way to phrase this question would be "how do I prevent my builds from having to re-download dependencies"? There are two solutions to this problem.
Option 1, Cache Plugin
The first, recommended solution is to use a plugin to cache and restore your dependencies. Cache plugins such as the volume cache and s3 cache are community-contributed plugins.
pipeline:
  # restores the cache from a local volume
  restore-cache:
    image: drillster/drone-volume-cache
    restore: true
    mount: [ /drone/.gradle, /drone/.m2 ]
    volumes:
      - /tmp/cache:/cache
  build:
    image: maven
    environment:
      - M2_HOME=/drone/.m2
      - MAVEN_HOME=/drone/.m2
      - GRADLE_USER_HOME=/drone/.gradle
    commands:
      - mvn install
      - mvn package
  # rebuild the cache in case new dependencies were
  # downloaded during your build
  rebuild-cache:
    image: drillster/drone-volume-cache
    rebuild: true
    mount: [ /drone/.gradle, /drone/.m2 ]
    volumes:
      - /tmp/cache:/cache
Option 2, Custom Image
The second solution is to create a Docker image with your dependencies, publish it to DockerHub, and use it as your build image in your .drone.yml file.
pipeline:
  build:
    image: some-image-with-all-my-dependencies
    commands:
      - mvn package
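A minimal sketch of such an image for a Maven project (the base image tag and the dependency:go-offline step are assumptions, not part of the original answer):
# Pre-fetch dependencies at image build time so CI builds start with a warm cache.
FROM maven:3-jdk-8
COPY pom.xml /tmp/pom.xml
RUN mvn -B -f /tmp/pom.xml dependency:go-offline
Publish the image to DockerHub and reference it via image: in your .drone.yml as shown above.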