Checkpoint Resuming Throwing Assertion - ARM Arch - gem5

I am trying to create and resume from a checkpoint for an ARM compiled binary (LLVM Test Suite).
I cross compiled the LLVM Test Suite with the following command in a Makefile:
./arm-linux-gnueabihf-gcc -O0 -ggdb3 -std=c99 -static $< -o $@
(basically using the arm-linux-gnueabihf-gcc cross compiler version 7.4)
I created the checkpoints using the following command:
./build/ARM/gem5.opt --outdir=chkpt_only/ configs/example/se.py --checkpoint-dir chkpt_only/ --take-checkpoints=0,20000000000 --cpu-type=AtomicSimpleCPU --cmd=../../../Benchmarks/LLVM_Test_Suite/SingleSource/Benchmarks/Stanford/Towers
I tried to resume from the checkpoint with the following command:
./build/ARM/gem5.opt --outdir=chkpt_only/ configs/example/se.py -r 1 --checkpoint-dir chkpt_only/ --cpu-type=O3_ARM_v7a_3 --caches --cmd=../../../Benchmarks/LLVM_Test_Suite/SingleSource/Benchmarks/Stanford/Towers
The above seems to work when the --cpu-type is In-order but for any O3 CPU I get the following assertion:
gem5.opt: build/ARM/cpu/o3/rename_map.hh:282: const PhysRegId* UnifiedRenameMap::lookup(const RegId&) const: Assertion `vecMode == Enums::Elem' failed.
Can someone please help me to understand/fix this assertion?
PS: The git commit is 2775f55447edb344d99f30273ad93fea515d7e2b
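For context on the -r 1 argument above: as far as I understand the se.py scripts of that era, the number is an index into the checkpoint directories found under --checkpoint-dir, ordered by the tick in their name. A minimal sketch of that selection, assuming the directories are named cpt.<tick> (nth_checkpoint is a hypothetical helper, not part of gem5):

```shell
# nth_checkpoint DIR N: print the N-th checkpoint directory name in DIR,
# ordered by tick. Hypothetical helper; assumes directories named cpt.<tick>.
nth_checkpoint() {
  dir="$1"
  n="$2"
  tick=$(ls "$dir" | sed -n 's/^cpt\.\([0-9][0-9]*\)$/\1/p' | sort -n | sed -n "${n}p")
  if [ -n "$tick" ]; then
    printf 'cpt.%s\n' "$tick"
  else
    return 1
  fi
}

# Demo on a throwaway directory with two fake checkpoints.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/cpt.1000" "$demo_dir/cpt.500"
nth_checkpoint "$demo_dir" 1   # the checkpoint with the lowest tick
nth_checkpoint "$demo_dir" 2   # the next one
rm -r "$demo_dir"
```

Running it against chkpt_only/ shows which checkpoint a given -r value would pick, which helps rule out restoring from an unexpected one.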

Related

Compile errors running the ot-br-posix ./script/setup on RPi4

I'm trying to run the ./script/setup, but get compile errors:
Please note that only 65 steps are listed below because I restarted the setup script; the initial number of steps was closer to 465.
[1/65] Building CXX object src/common/CMakeFiles/otbr-common.dir/mainloop.cpp.o
FAILED: src/common/CMakeFiles/otbr-common.dir/mainloop.cpp.o
/usr/bin/c++ -DHAVE_LIBSYSTEMD=1 -DOTBR_ENABLE_BACKBONE_ROUTER=1 -DOTBR_ENABLE_BORDER_AGENT=1 -DOTBR_ENABLE_BORDER_ROUTING=1 -DOTBR_ENABLE_BORDER_ROUTING_COUNTERS=1 -DOTBR_ENABLE_DBUS_SERVER=1 -DOTBR_ENABLE_DNSSD_DISCOVERY_PROXY=1 -DOTBR_ENABLE_NAT64=1 -DOTBR_ENABLE_NOTIFY_UPSTART=1 -DOTBR_ENABLE_REST_SERVER=1 -DOTBR_ENABLE_SRP_ADVERTISING_PROXY=1 -DOTBR_ENABLE_SRP_SERVER_AUTO_ENABLE_MODE=1 -DOTBR_ENABLE_VENDOR_INFRA_LINK_SELECT=0 -DOTBR_MESHCOP_SERVICE_INSTANCE_NAME="\"OpenThread BorderRouter\"" -DOTBR_PACKAGE_NAME=\"OpenThread_BorderRouter\" -DOTBR_PACKAGE_VERSION=\"0.3.0-0cdef3c\" -DOTBR_PRODUCT_NAME=\"BorderRouter\" -DOTBR_SYSLOG_FACILITY_ID=LOG_USER -DOTBR_VENDOR_NAME=\"OpenThread\" -I../../include -I../../src -Ithird_party/openthread/repo/etc/cmake -I../../third_party/openthread/repo/etc/cmake -I../../third_party/openthread/repo/include -I../../third_party/openthread/repo/src/posix/platform/include -I../../third_party/openthread/repo/src -Wall -Wextra -Werror -Wfatal-errors -Wuninitialized -Wno-missing-braces -std=c++11 -MD -MT src/common/CMakeFiles/otbr-common.dir/mainloop.cpp.o -MF src/common/CMakeFiles/otbr-common.dir/mainloop.cpp.o.d -o src/common/CMakeFiles/otbr-common.dir/mainloop.cpp.o -c ../../src/common/mainloop.cpp
In file included from /usr/include/c++/8/list:63,
from ../../src/common/mainloop_manager.hpp:41,
from ../../src/common/mainloop.cpp:30:
/usr/include/c++/8/bits/stl_list.h:811:19: error: expected ‘)’ before ‘&’ token
list(_InputIterat&... __args)`
compilation terminated due to -Wfatal-errors.
I receive a lot more errors, but they follow the same pattern as above.
I have followed the guide from openthread.io to set up an OpenThread Border Router.
The execution of the bootstrap script ran smoothly.
Additional information:
Git local repository path: ~/src/openthread/ot-br-posix
Command for executing the setup script:
pi@raspberrypi:~/src/openthread/ot-br-posix$> INFRA_IF_NAME=eth0 ./script/setup
RPi OS: Recommended image from the guide Raspberry Pi OS lite
Libgcc versions:
libgcc-8-dev/oldstable,now 8.3.0-6+rpi1 armhf [installed,automatic]
libgcc1/oldstable,now 1:8.3.0-6+rpi1 armhf [installed]
Cmake versions:
cmake-data/oldstable,now 3.16.3-3~bpo10+1 all [installed,automatic]
cmake/oldstable,now 3.16.3-3~bpo10+1 armhf [installed]

How to build tensorflow op with bazel with additional include directories

I got tensorflow binaries (already compiled)
I have added to tensorflow source:
tensorflow\core\user_ops\icp_op_kernel.cc - contains:
https://github.com/tensorflow/models/blob/master/research/vid2depth/ops/icp_op_kernel.cc
tensorflow\core\user_ops\BUILD - contains:
load("//tensorflow:tensorflow.bzl", "tf_custom_op_library")
tf_custom_op_library(
name = "icp_op_kernel.so",
srcs = ["icp_op_kernel.cc"],
)
I am trying to build with:
bazel build --config opt //tensorflow/core/user_ops:icp_op_kernel.so
And I get:
tensorflow/core/user_ops/icp_op_kernel.cc(16): fatal error C1083: Cannot open include file: 'pcl/point_types.h': No such file or directory
This is because bazel doesn't know where the pcl include files are.
I have installed pcl and the include directory is in:
C:\Program Files\PCL 1.6.0\include\pcl-1.6
How do I tell bazel to also include this directory?
I will probably also need to add C:\Program Files\PCL 1.6.0\lib to the link step. How do I do that?
You don't need bazel for building ops if it fails.
I have implemented customized ops both in CPU and GPU, and basically follow the two Tensorflow tutorials.
For CPU ops, follow Tensorflow tutorial on Build the op library:
TF_CFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )
TF_LFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )
g++ -std=c++11 -shared zero_out.cc -o zero_out.so -fPIC ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2
Note on gcc version >=5: gcc uses the new C++ ABI since version 5. The binary pip packages available on the TensorFlow website are built with gcc4 that uses the older ABI. If you compile your op library with gcc>=5, add -D_GLIBCXX_USE_CXX11_ABI=0 to the command line to make the library compatible with the older abi.
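The ABI note above can be sketched as a small helper that only prints the g++ command and appends the extra define when the compiler is gcc 5 or newer. This is an illustration under assumptions: op_build_cmd is a hypothetical name, and TF_CFLAGS / TF_LFLAGS are kept as plain strings with placeholder paths so the snippet runs without TensorFlow installed.

```shell
# op_build_cmd GCC_MAJOR SRC OUT: print (not run) the g++ command for a CPU op.
# Hypothetical helper; TF_CFLAGS and TF_LFLAGS hold placeholder values here.
TF_CFLAGS="-I/path/to/tensorflow/include"
TF_LFLAGS="-L/path/to/tensorflow -ltensorflow_framework"

op_build_cmd() {
  gcc_major="$1"
  src="$2"
  out="$3"
  cmd="g++ -std=c++11 -shared $src -o $out -fPIC $TF_CFLAGS $TF_LFLAGS -O2"
  # gcc 5 and later default to the new C++ ABI, so ask for the old one.
  if [ "$gcc_major" -ge 5 ]; then
    cmd="$cmd -D_GLIBCXX_USE_CXX11_ABI=0"
  fi
  printf '%s\n' "$cmd"
}

op_build_cmd 4 zero_out.cc zero_out.so   # no ABI define
op_build_cmd 7 zero_out.cc zero_out.so   # ABI define appended
```

In a real build the major version could come from gcc -dumpversion | cut -d. -f1, and the flags from the tf.sysconfig calls shown earlier.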
For GPU ops, check the current official GPU ops building instructions on Tensorflow adding GPU op support
nvcc -std=c++11 -c -o cuda_op_kernel.cu.o cuda_op_kernel.cu.cc \
${TF_CFLAGS[@]} -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
g++ -std=c++11 -shared -o cuda_op_kernel.so cuda_op_kernel.cc \
cuda_op_kernel.cu.o ${TF_CFLAGS[@]} -fPIC -lcudart ${TF_LFLAGS[@]}
As the instructions say, note that if your CUDA libraries are not installed in /usr/local/lib64, you'll need to specify the path explicitly in the second (g++) command above. For example, add -L /usr/local/cuda-8.0/lib64/ if your CUDA is installed in /usr/local/cuda-8.0.
Also note that in some Linux settings, additional options are needed for the nvcc compile step. Add -D_MWAITXINTRIN_H_INCLUDED to the nvcc command line to avoid errors from mwaitxintrin.h.

How to edit the linker flags bazel uses to build syntaxnet/tensorflow

I can't get Tensorflow with Syntaxnet to build with CUDA on Ubuntu 16.04.
I have built it successfully without CUDA on this system.
Most likely the error is rooted in the configuration. The bazel build of tensorflow with CUDA generates linker commands for shared libraries with the linker option
-pie for generating executables with position independent code. This causes the error "undefined reference to `main'".
/home/patrick/.cache/bazel/_bazel_patrick/5b9c9cf56f3e0138be05b0752b134bcb/external/com_google_absl/absl/base/BUILD.bazel:28:1: Linking of rule '@com_google_absl//absl/base:spinlock_wait' failed (Exit 1):
crosstool_wrapper_driver_is_not_gcc failed: error executing command
(cd /home/patrick/.cache/bazel/_bazel_patrick/5b9c9cf56f3e0138be05b0752b134bcb/execroot/__main__ && exec env - \
CUDA_TOOLKIT_PATH=/usr/local/cuda \
CUDNN_INSTALL_PATH=/usr/local/cuda \
GCC_HOST_COMPILER_PATH=/usr/bin/gcc \
LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/extras/CUPTI/lib64:/usr/local/cuda-9.0/nvvm/lib64 \
NCCL_INSTALL_PATH=/usr \
PATH=/home/patrick/bin:/home/patrick/.local/bin:/usr/local/cuda/bin:/usr/bin:/bin \
PWD=/proc/self/cwd \
PYTHON_BIN_PATH=/usr/bin/python \
PYTHON_LIB_PATH=/usr/local/lib/python2.7/dist-packages \
TF_CUDA_CLANG=0 \
TF_CUDA_COMPUTE_CAPABILITIES=6.1 \
TF_CUDA_VERSION=9.0 \
TF_CUDNN_VERSION=7 \
TF_NCCL_VERSION=2 \
TF_NEED_CUDA=1 \
TF_NEED_OPENCL_SYCL=0 \
external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -shared -o bazel-out/k8-opt/bin/external/com_google_absl/absl/base/libspinlock_wait.so -Wl,-no-as-needed -B/usr/bin/ -pie -Wl,-z,relro,-z,now -no-canonical-prefixes -pass-exit-codes '-Wl,--build-id=md5' '-Wl,--hash-style=gnu' -Wl,--gc-sections -Wl,@bazel-out/k8-opt/bin/external/com_google_absl/absl/base/libspinlock_wait.so-2.params)
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/Scrt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status
This linking command succeeds when removing the option -pie.
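The failing command above passes both -shared and -pie on the same link line, which is what pulls in Scrt1.o and its reference to main. A minimal sketch for spotting that combination in a list of linker arguments (mixes_shared_and_pie is a hypothetical helper, and the argument list in the demo is shortened from the log):

```shell
# mixes_shared_and_pie ARGS...: succeed if both -shared and -pie are present.
# Hypothetical helper for inspecting a link command; it does not run a linker.
mixes_shared_and_pie() {
  has_shared=0
  has_pie=0
  for arg in "$@"; do
    case "$arg" in
      -shared) has_shared=1 ;;
      -pie)    has_pie=1 ;;
    esac
  done
  [ "$has_shared" -eq 1 ] && [ "$has_pie" -eq 1 ]
}

if mixes_shared_and_pie -shared -o libspinlock_wait.so -Wl,-no-as-needed -pie; then
  echo "conflict: -shared and -pie on one link line"
fi
```

Feeding it the arguments from a --verbose_failures log (or from the .params file) shows quickly whether a given rule is affected.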
Help would be appreciated to either find a way to edit the linker flags Bazel uses or to get a hint to the configuration error I made from users that encountered a similar problem. I don't think that posting the configuration steps I did will lead to other suggestions than the ones I already read on other posts. The build process looks too shaky for me.
I already had a look at the definition in the CROSSTOOL and BUILD files. I did not edit them and they look Ok (-pie is only enabled for linking executables).
I work with
Bazel 0.15.2
Tensorflow 1.8.0
Ubuntu 16.04
gcc 5.4
CUDA 9.0
CUDNN 7.1
NCCL 2.1

Tensorflow Serving Compile Error Using Docker on OSX

I'm trying to install TensorFlow serving on OSX El Capitan using Docker but keep running into an error. Here is the tutorial I'm following:
https://tensorflow.github.io/serving/docker.html
Here is the command causing the error:
bazel test tensorflow_serving/...
Here's the error I'm getting:
for (int i = 0; i < suffix.size(); ++i) {
^
ERROR: /root/.cache/bazel/_bazel_root/f8d1071c69ea316497c31e40fe01608c/external/tf/tensorflow/core/kernels/BUILD:212:1: C++ compilation of rule '@tf//tensorflow/core/kernels:mirror_pad_op' failed: gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-std=c++0x' -iquote external/tf -iquote ... (remaining 65 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4.
gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
Solved! Looks like the issue was running out of memory in the VM.
Here's how I fixed it:
1) When creating the machine, make sure it has more memory (mine was only 1GB). Here is how you create a docker machine with 4GB:
docker-machine create -d virtualbox --virtualbox-memory 4096 default
2) When running the bazel command pass in a parameter limiting the amount of memory to use. Here I'm running the command using only 2GB:
bazel build -c opt --copt=-mavx --verbose_failures --local_resources 2048,2.0,1.0 -j 1 //tensorflow_serving/example:mnist_export
Where the original command was:
bazel build //tensorflow_serving/example:mnist_export
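The first value in --local_resources is the memory budget in MB; the other two are CPU cores and I/O capacity. As a rough sketch, the value can be derived from the VM's total memory. The choice of half the total mirrors the 2048-of-4096 numbers above and is my assumption, not a Bazel rule; both function names are hypothetical.

```shell
# local_resources_for TOTAL_MB: print a --local_resources value that gives
# Bazel half of TOTAL_MB, keeping the CPU and I/O values from the post.
local_resources_for() {
  total_mb="$1"
  printf '%s,2.0,1.0\n' "$(( total_mb / 2 ))"
}

# total_mem_mb: total RAM in MB as seen inside a Linux VM or container.
total_mem_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

local_resources_for 4096
if [ -r /proc/meminfo ]; then
  local_resources_for "$(total_mem_mb)"
fi
```

The printed value would then be passed as --local_resources "$(local_resources_for "$(total_mem_mb)")" on Bazel versions that still accept that flag.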

g++: error: unrecognized option ‘--end-group’

I had been using Ubuntu 10.10 for quite some time on my development PC. My code used to build without errors with g++ version 4.4.5. Recently I upgraded my system to 11.10, which comes with g++ version 4.6.1.
Now, when I try to build the same piece of code with this compiler, I get this error:
g++: error: unrecognized option ‘--end-group’
The Make file line, where this error is thrown is:
$(TARGET): $(OBJS)
g++ $(LDFLAGS) $^ $ -Wl,--start-group $(ARCHIVE_LIBS) --end-group -o $(TARGET)
cp -f $(TARGET) ../../../bin/
Can someone please shed some light on this? I have googled but did not find any clue.
Thanks and Regards,
Souvik
--end-group is a linker flag so you should prefix it with -Wl, i.e.
g++ $(LDFLAGS) $^ $ -Wl,--start-group $(ARCHIVE_LIBS) -Wl,--end-group -o $(TARGET)
I am not sure why this worked before.
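To make the pairing harder to get wrong, the group can be generated in one place. A small sketch, assuming a hypothetical helper named group_libs and placeholder archive names; it only prints the arguments, so both --start-group and --end-group always carry the -Wl, prefix that hands them to the linker:

```shell
# group_libs ARCHIVES...: print the archives wrapped in a GNU ld group,
# with both group options passed through the compiler driver via -Wl,.
group_libs() {
  printf '%s' '-Wl,--start-group'
  for lib in "$@"; do
    printf ' %s' "$lib"
  done
  printf ' %s\n' '-Wl,--end-group'
}

group_libs liba.a libb.a
# prints: -Wl,--start-group liba.a libb.a -Wl,--end-group
```

On a command line this would be used as g++ main.o $(group_libs liba.a libb.a) -o app; the group makes the linker rescan the listed archives until circular references between them are resolved.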