I need to adapt a set of programs and scripts written for a first-generation Raspberry Pi (which runs an ARM11 CPU) to run on an Allwinner H6-based board (an ARM Cortex-A53 CPU).
I have already changed CMAKE_SYSTEM_PROCESSOR from ARCH armv7l to ARCH aarch64.
To configure the build, I previously ran
cmake -D CMAKE_CXX_FLAGS="-march=armv7-a" /..path
and I thought of replacing -march=armv7-a with -march=armv8-a.
My question is: is this correct for compiling for the 64-bit Allwinner H6? Why can't I write aarch64 directly instead of armv8-a? And finally, what is the difference between "armv8" and "armv8-a"?
Sorry, I am a little bit confused here.
1) Yes, -march=armv8-a is correct, but it is less specific than,
say, -mtune=cortex-a53, since the Allwinner H6 uses Cortex-A53 cores.
My guess is that you cannot write -march=aarch64 instead of -march=armv8-a because this would be too generic: after all, you can already specify ‘armv8-a’, ‘armv8.1-a’, ‘armv8.2-a’, ‘armv8.3-a’, ‘armv8.4-a’ and ‘armv8.5-a’, as documented here.
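armv8 is the umbrella name for ARMv8-A, ARMv8-M and ARMv8-R. A, R and M are 'profiles' in Arm terminology and target different types of applications: A (Application) for general-purpose processors that run full operating systems such as Linux, R (Real-time) for real-time systems, and M (Microcontroller) for microcontrollers.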
See here, here and here for more details.
I have cross-compiled a program for a HummingBoard-Pro (ARM processor).
The software just receives some data using the lcm protocol.
With the cross-compiled binary, the data received by the application is invalid, whereas with the binary compiled on the board everything works fine.
- The source code is exactly the same!
- I cross-compiled using cmake and a specific ARM toolchain.
Example output of the cross-compiled software:
first value 5.73599e+107
second value 5.73599e+107
third value 5.73599e+107
Example output of the software compiled on the board:
first value 1
second value 2
third value 3
Note: This is my first attempt at cross-compilation, so something has probably gone wrong, but I really have no idea what.
CMakelists file
cmake_minimum_required(VERSION 3.1)
set(main_project_dir ${CMAKE_CURRENT_SOURCE_DIR})
set(external_dir ${main_project_dir}/external)
set(external_lcm_dir ${external_dir}/lcm_dir)
set(external_lcm ${external_lcm_dir}/lcm)
set(external_lcm_build ${external_lcm}/build)
set(external_lcm_gen_exe /usr/local/bin/lcm-gen)
set(lcm_input_file ${main_project_dir}/lcm_format_files/lcm_input_files/indrive.sensors.vanet.lcm)
set(lcm_libraries ${main_project_dir}/external/lcm_dir/lcm/build/lcm)
set(lmc_libraries_header ${main_project_dir}/external/lcm_dir/lcm/)
set(lcm_autogenerated_dir ${main_project_dir}/build/lcm_autogenerated_classes)
add_custom_target(
generate-lcm
COMMAND ${external_lcm_gen_exe} -x ${lcm_input_file} --cpp-hpath ${lcm_autogenerated_dir}
COMMENT "=================== Generating lcm files..."
)
add_subdirectory(testSender)
add_subdirectory(testReceiver)
TOOLCHAIN FILE
SET (CMAKE_SYSTEM_NAME Linux)
SET (CMAKE_SYSTEM_VERSION 1)
SET (CMAKE_SYSTEM_PROCESSOR arm)
INCLUDE_DIRECTORIES(/usr/hummingboard/usr/include /usr/hummingboard/include /usr/hummingboard/usr/include/arm-linux-gnueabihf/)
LINK_DIRECTORIES(/usr/hummingboard/usr/lib /usr/hummingboard/lib /usr/hummingboard/lib/arm-linux-gnueabihf )
SET(CMAKE_PREFIX_PATH /usr/arm-linux-gnueabihf/lib/
/usr/hummingboard/
/usr/hummingboard/lib/arm-linux-gnueabihf/
/usr/hummingboard/usr
/usr/hummingboard/usr/lib/arm-linux-gnueabihf/
)
SET (CMAKE_C_COMPILER /usr/bin/arm-linux-gnueabi-gcc)
SET (CMAKE_CXX_COMPILER /usr/bin/arm-linux-gnueabi-g++)
SET (CMAKE_FIND_ROOT_PATH /usr/hummingboard/ /usr/hummingboard/usr)
SET (CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
SET (CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
SET (CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
SET (CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
Turning my comments into an answer
Your toolchain file looks like a mixture of two GNU toolchains, which is not supported and could explain the strange behavior of your software.
For example, I would expect there to be a /usr/hummingboard/bin directory. And shouldn't there be an arm-linux-gnueabihf-gcc to match /usr/arm-linux-gnueabihf/lib/?
My guess is that you are mixing hard-float (hf) with soft-float libraries, and native compilers with cross-compilers.
This shows up in the value 5.73599e+107 = 0x7f800000, which means infinity.
To find the root cause, I recommend checking your floating-point settings. Compare the compiler command lines of the two builds (working vs. non-working) using verbose makefiles.
References
Assign infinity to float
Using CMake with GNU Make: How can I see the exact commands?
I am having an issue with the LAPACK/BLAS libraries when compiling a C code that needs them.
The issues are, when I run "make", I get:
file.c:(.text+0x1c41): undefined reference to `zgesvd_'
file.c:(.text+0x1c9c): undefined reference to `zgetrf_'
../file.a(SpatialOrientation.o): In function `myfunction.c':myfunction.c:(.text+0x7be): undefined reference to `dsyev_'
And several other such lines, all referring to similar missing references.
I have chased this error down to being something to do with BLAS. I followed the directions given at this excellent link for installing BLAS and put the relevant directory on the path. I also changed my Makefile accordingly to find these libraries.
Any help on this issue would be really appreciated!
Just to update, I recently installed itpp as well, also following the instructions here, since it seemed my missing references were linked to that. No changes so far...
Thanks for your help!
The problem is solved! Hooray! I just danced around my office...
For those who have the same problem, here is what I did:
1) Follow the instructions given here to build the LAPACK and BLAS libraries. To paraphrase, for a Scientific Linux 6 machine, they are:
wget http://www.netlib.org/lapack/lapack.tgz
tar xvzf lapack.tgz
cd lapack-3.3.0   # if the version number changes, use the right directory here
mv make.inc.example make.inc
2) Then (the important bit, also recommended here):
Edit make.inc and add the -m64 -fPIC flags to the Fortran compiler options FORTRAN, OPTS, NOOPT and LOADER.
Then run
make blaslib
make
Now you have, in lapack-3.6.1 (or whatever your directory is called after this process), two files:
librefblas.a and liblapack.a.
3) The next thing I did was to copy librefblas.a and liblapack.a into separate subdirectories, i.e. /path/lib/lapack for liblapack.a and /path/lib/blas for librefblas.a. Note that -lblas in the next step makes the linker search for libblas.a (or libblas.so), so librefblas.a may need to be renamed to libblas.a or linked with -lrefblas instead.
4) Then put those directories in your makefile, like this:
LIBDIR1 = /path/lib/lapack
LIBDIR2 = /path/lib/blas
LIBS = -L$(LIBDIR1) -llapack -L$(LIBDIR2) -lblas $(SYSLIBS)
LIBSMPI = -L$(LIBDIR1) -llapack -L$(LIBDIR2) -lblas $(MPILIBS) $(SYSLIBS)
I also added /path/lib/lapack and /path/lib/blas to my LD_LIBRARY_PATH (and PATH, just in case...):
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/lib/lapack:/path/lib/blas
export PATH=$PATH:/path/lib/lapack:/path/lib/blas
Then go to wherever your Makefile is and type
make
Yay yay yay!
By the way, with the latest version of lapack and blas, obtained in step 1), I compiled with gcc version 5.1.0 and the corresponding mpicc (openmpi 1.10.2).
Hope this helps someone else and shares the absolute delight.
I downloaded the win32 version of libblas.dll ("Prebuilt dynamic libraries using Mingw") from https://icl.cs.utk.edu/lapack-for-windows/lapack/#libraries and tried to build the g77 BLAS sample "blas3_d_prb.f" from http://people.sc.fsu.edu/~jburkardt/f77_src/blas3_d/blas3_d.html with my g77 compiler. I have already tried converting "libblas.lib" to "libblas.a" with reimp, pexports, etc., but without success.
I hope somebody has experience using libblas.dll with g77 (linking the DLL with g77 seems to be tricky). I would also like to confirm the calling convention used by "libblas.dll": stdcall or cdecl (which g77 follows)?
Thanks.
I finally worked out the problems with compiling this particular Fortran BLAS program:
Actually, you need the sources (blas0.f and blas3_d.f), not libblas.dll (since it is unknown which sources were used to build it).
blas0.f is also required for the auxiliary functions used, e.g. r8mat_test, r8mat_print, etc.
Compile each library, i.e. blas0.f and blas3_d.f, to an object file with these commands:
g77 -c blas0.f
g77 -c blas3_d.f
This will produce the object files blas0.o and blas3_d.o. Then compile the main program like this (PS: replace the trim functions in blas3_d_prb.f with len_trim):
G77.EXE blas3_d_prb.f blas0.o blas3_d.o -o yourblas.exe
This generates the yourblas.exe binary for Windows.
Does the Objective-C compiler in Xcode know better, or is it faster if I use bit shifts for multiplications and divisions by powers of 2?
NSInteger parentIndex = index >> 1; // integer division by 2
Isn't this a bit 1980's? Don't processors run these instructions in the same time these days? I remember back in my 68000 days when a div was 100+ cycles and a shift only 3 or 4... not sure this is the case any more as processors have moved on.
Why don't you get the compiler to generate the assembly file, have a look at what it generates, and run some benchmarks?
I found this on the web which may help you... although it's for 'C' I think most of the options will be the same.
Q: How can I peek at the assembly code generated by GCC?
Q: How can I create a file where I can see the C code and its assembly translation together?
A: Use the -S (note: capital S) switch to GCC, and it will emit the assembly code to a file with a .s extension. For example, the following command:
gcc -O2 -S -c foo.c
will leave the generated assembly code on the file foo.s.
If you want to see the C code together with the assembly it was converted to, use a command line like this:
gcc -c -g -Wa,-a,-ad [other GCC options] foo.c > foo.lst
which will output the combined C/assembly listing to the file foo.lst.
If you need to both get the assembly code and compile/link the program, you can either give the -save-temps option to GCC (which will leave all the temporary files, including the .s file, in the current directory), or use the -Wa,-aln=foo.s option, which instructs the assembler to output the assembly translation of the C code (together with the hex machine code and some additional info) to the file named after the =.
I'm using cmake 2.8.1 on Mac OSX 10.6 with CUDA 3.0.
So I added a CUDA target which needs BLOCK_SIZE set to some number in order to compile.
cuda_add_executable(SimpleTestsCUDA
SimpleTests.cu
BlockMatrix.cpp
Matrix.cpp
)
set_target_properties(SimpleTestsCUDA PROPERTIES COMPILE_FLAGS -DBLOCK_SIZE=3)
When running make VERBOSE=1 I noticed that nvcc is invoked without -DBLOCK_SIZE=3, which results in an error, because BLOCK_SIZE is used in the code but defined nowhere. I used the same definition for a CPU target (using add_executable(...)) and there it worked.
So now the question: how do I figure out what cmake does with the set_target_properties line when it points to a CUDA target? Googling around hasn't helped so far, and a workaround would be cool.
I think the best way to do this is by adding "OPTIONS -DBLOCK_SIZE=3" to cuda_add_executable. So your line would look like this:
cuda_add_executable(SimpleTestsCUDA
SimpleTests.cu
BlockMatrix.cpp
Matrix.cpp
OPTIONS -DBLOCK_SIZE=3
)
Or you can set it before cuda_add_executable:
SET(CUDA_NVCC_FLAGS -DBLOCK_SIZE=3)
The only workaround I found so far is using remove_definitions:
remove_definitions(-DBLOCK_SIZE=3)
add_definitions(-DBLOCK_SIZE=32)
Doing this before a target seems to help.