exp() from math.h in C not working inside a loop (embedded)

I'm currently working on a PID regulator for a school project.
Since the coronavirus has shut down the school, we can't get any hardware to test it with.
So I want to simulate the PID regulator on a microcontroller (PSoC 5LP).
To do that, I'm implementing a function that returns the step response of the chosen DC motor.
When I compute the output of the transfer function "(-3.47*exp(-6.36*s)+3.47)", I get the right result if I define the value of "s" manually in the code.
But when s is incremented in the code, the build fails with: undefined reference to "exp".
The problem seems to be that I can't call exp() in a loop while incrementing its argument.
I tried it in another project, where the only thing that happens is that a loop runs 10 times and prints exp() of the number of iterations so far; that does not work either.
float step_respons(){
double s = 0.1;
snprintf(outpurBuffer, sizeof(outpurBuffer), "%f \r\n", (-3.47*exp(-6.36*s)+3.47));
pc_uart_PutString(outpurBuffer);
snprintf(outpurBuffer, sizeof(outpurBuffer), "%f \r\n", (-3.47*exp(-6.36*1)+3.47));
pc_uart_PutString(outpurBuffer);
// These work just fine
pc_uart_PutString("Loop:\r\n");
for(s = 0.1; s < 2; s++){
snprintf(outpurBuffer, sizeof(outpurBuffer), "%f %f \r\n", s,(-3.47*exp(-6.36*s)+3.47));
pc_uart_PutString(outpurBuffer);
// This does not work (if "s" is replaced with a number, it works fine.)
};
return 0;
};
So it works if I set s manually inside the loop, but as soon as it gets incremented the build fails.
I've also tried using another variable and incrementing that one instead, and it still does not work:
for(s = 0.1; s < 2; s++){
snprintf(outpurBuffer, sizeof(outpurBuffer), "%f %f \r\n", s,(-3.47*exp(-6.36*s)+3.47));
pc_uart_PutString(outpurBuffer);
};
This does not work.
for(s = 0.1; s < 2; s++){
snprintf(outpurBuffer, sizeof(outpurBuffer), "%f %f \r\n", s,(-3.47*exp(-6.36*1)+3.47));
pc_uart_PutString(outpurBuffer);
};
This does work, but the result does not change between iterations.
As you can see, the only change in the loop is that "s" has been replaced by a fixed value.
double temp = 0.5;
for(s = 0.1; s < 2; s++){
snprintf(outpurBuffer, sizeof(outpurBuffer), "%f %f \r\n", s,(-3.47*exp(-6.36*temp)+3.47));
pc_uart_PutString(outpurBuffer);
};
This works.
double temp = 0.5;
for(s = 0.1; s < 2; s++){
snprintf(outpurBuffer, sizeof(outpurBuffer), "%f %f \r\n", s,(-3.47*exp(-6.36*temp)+3.47));
pc_uart_PutString(outpurBuffer);
temp += 0.5;
};
This does not.
This is the error: Build error: undefined reference to 'exp'
The log from the compiler output:
--------------- Build Started: 04/28/2020 01:34:42 Project: Plotterkode, Configuration: ARM GCC 5.4-2016-q2-update Debug ---------------
The code generation step is up to date.
The compile step is up to date, no work needs to be done.
arm-none-eabi-ar.exe -rs .\CortexM3\ARM_GCC_541\Debug\Plotterkode.a .\CortexM3\ARM_GCC_541\Debug\cy_em_eeprom.o .\CortexM3\ARM_GCC_541\Debug\CyDmac.o .\CortexM3\ARM_GCC_541\Debug\CyFlash.o .\CortexM3\ARM_GCC_541\Debug\CyLib.o .\CortexM3\ARM_GCC_541\Debug\cyPm.o .\CortexM3\ARM_GCC_541\Debug\CySpc.o .\CortexM3\ARM_GCC_541\Debug\cyutils.o .\CortexM3\ARM_GCC_541\Debug\pc_uart.o .\CortexM3\ARM_GCC_541\Debug\pc_uart_PM.o .\CortexM3\ARM_GCC_541\Debug\pc_uart_INT.o .\CortexM3\ARM_GCC_541\Debug\pc_uart_BOOT.o .\CortexM3\ARM_GCC_541\Debug\pc_uart_IntClock.o .\CortexM3\ARM_GCC_541\Debug\isr_pc_uart.o .\CortexM3\ARM_GCC_541\Debug\Pin_adc_input.o .\CortexM3\ARM_GCC_541\Debug\MISO.o .\CortexM3\ARM_GCC_541\Debug\MOSI.o .\CortexM3\ARM_GCC_541\Debug\SCLK.o .\CortexM3\ARM_GCC_541\Debug\pot_adc_sar.o .\CortexM3\ARM_GCC_541\Debug\pot_adc_sar_INT.o .\CortexM3\ARM_GCC_541\Debug\pot_adc_sar_PM.o .\CortexM3\ARM_GCC_541\Debug\ui_spi.o .\CortexM3\ARM_GCC_541\Debug\ui_spi_PM.o .\CortexM3\ARM_GCC_541\Debug\ui_spi_INT.o .\CortexM3\ARM_GCC_541\Debug\pot_adc_sar_IRQ.o .\CortexM3\ARM_GCC_541\Debug\pot_adc_sar_theACLK.o .\CortexM3\ARM_GCC_541\Debug\ui_spi_IntClock.o .\CortexM3\ARM_GCC_541\Debug\motor_pwm.o .\CortexM3\ARM_GCC_541\Debug\motor_pwm_PM.o .\CortexM3\ARM_GCC_541\Debug\Clock.o .\CortexM3\ARM_GCC_541\Debug\pin_pwm_x.o .\CortexM3\ARM_GCC_541\Debug\pin_pwm_y.o .\CortexM3\ARM_GCC_541\Debug\pin_enable.o .\CortexM3\ARM_GCC_541\Debug\pin_border.o .\CortexM3\ARM_GCC_541\Debug\isr_border.o .\CortexM3\ARM_GCC_541\Debug\pin_led.o .\CortexM3\ARM_GCC_541\Debug\isr_goal_left.o .\CortexM3\ARM_GCC_541\Debug\pin_goal_right.o .\CortexM3\ARM_GCC_541\Debug\pin_goal_left.o .\CortexM3\ARM_GCC_541\Debug\isr_goal_right.o .\CortexM3\ARM_GCC_541\Debug\isr_ui_spi.o .\CortexM3\ARM_GCC_541\Debug\AMux.o .\CortexM3\ARM_GCC_541\Debug\CyBootAsmGnu.o
arm-none-eabi-ar.exe: creating .\CortexM3\ARM_GCC_541\Debug\Plotterkode.a
arm-none-eabi-gcc.exe -Wl,--start-group -o C:\Users\Vikto\Desktop\plottercode\Plotterkode.cydsn\CortexM3\ARM_GCC_541\Debug\Plotterkode.elf .\CortexM3\ARM_GCC_541\Debug\main.o .\CortexM3\ARM_GCC_541\Debug\cyfitter_cfg.o .\CortexM3\ARM_GCC_541\Debug\cymetadata.o .\CortexM3\ARM_GCC_541\Debug\Cm3Start.o .\CortexM3\ARM_GCC_541\Debug\Plotterkode.a "C:\Program Files (x86)\Cypress\PSoC Creator\4.2\PSoC Creator\psoc\content\cycomponentlibrary\CyComponentLibrary.cylib\CortexM3\ARM_GCC_541\Debug\CyComponentLibrary.a" -mcpu=cortex-m3 -mthumb -L Generated_Source\PSoC5 -Wl,-Map,.\CortexM3\ARM_GCC_541\Debug/Plotterkode.map -T Generated_Source\PSoC5\cm3gcc.ld -specs=nano.specs -u _printf_float -Wl,--gc-sections -u_printf_float -g -ffunction-sections -Og -ffat-lto-objects -Wl,--end-group
.\CortexM3\ARM_GCC_541\Debug\main.o: In function `step_respons':
C:\Users\Vikto\Desktop\plottercode\Plotterkode.cydsn/./plotter_position.h:125: undefined reference to `exp'
collect2.exe: error: ld returned 1 exit status
The command 'arm-none-eabi-gcc.exe' failed with exit code '1'.
--------------- Build Failed: 04/28/2020 01:34:42 ---------------

You are most likely not linking the library that contains the exp() function.
It works when you pass it a literal (or a value the compiler can deduce at compile time) because GCC treats exp() as a built-in: it computes the result itself and puts the constant into the code, so no call to exp() is ever emitted.
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
exp() lives in libm, so adding -lm to the link command (or -Wl,-lm, or however else you specify libm in your build system) should fix it.
If it is already there, try moving it to the end of the list of objects and libraries, because the linker resolves symbols in the order it sees them.

Related

Problem with sprintf/printf with FreeRTOS on STM32F7

For two days I have been trying to get printf/sprintf working in my project...
MCU: STM32F722RETx
I tried newlib, heap_3, heap_4, etc.; nothing works. HardFault_Handler is hit every time.
Now I am trying the simple implementation from this link and still have the same problem. I suppose my device has some problem with double numbers, because the program ends up in HardFault_Handler from the line if (value != value) in the _ftoa function (which is strange, because this STM32 has an FPU).
Do you have any idea? (I am now using heap_4.c.)
My compiler options:
target_compile_options(${PROJ_NAME} PUBLIC
$<$<COMPILE_LANGUAGE:CXX>:
-std=c++14
>
-mcpu=cortex-m7
-mthumb
-mfpu=fpv5-d16
-mfloat-abi=hard
-Wall
-ffunction-sections
-fdata-sections
-O1 -g
-DLV_CONF_INCLUDE_SIMPLE
)
Linker options:
target_link_options(${PROJ_NAME} PUBLIC
${LINKER_OPTION} ${LINKER_SCRIPT}
-mcpu=cortex-m7
-mthumb
-mfloat-abi=hard
-mfpu=fpv5-sp-d16
-specs=nosys.specs
-specs=nano.specs
# -Wl,--wrap,malloc
# -Wl,--wrap,_malloc_r
-u_printf_float
-u_sprintf_float
)
Linker script:
/* Highest address of the user mode stack */
_estack = 0x20040000; /* end of RAM */
/* Generate a link error if heap and stack don't fit into RAM */
_Min_Heap_Size = 0x200; /* required amount of heap */
_Min_Stack_Size = 0x400; /* required amount of stack */
/* Specify the memory areas */
MEMORY
{
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 256K
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 512K
}
UPDATE:
I don't think it is a stack problem: I have set configCHECK_FOR_STACK_OVERFLOW to 2, but the hook function is never called. I found a strange thing. This version works:
float d = 23.5f;
char buffer[20];
sprintf(buffer, "temp %f", 23.5f);
but this version does not:
float d = 23.5f;
char buffer[20];
sprintf(buffer, "temp %f",d);
I have no idea why passing the variable by value triggers HardFault_Handler...
You can implement a hard fault handler that at least gives you the stack pointer, and from it the stacked register values, for the location where the fault occurred. This should provide more insight.
https://www.freertos.org/Debugging-Hard-Faults-On-Cortex-M-Microcontrollers.html
It should tell you whether the issue is a floating-point fault within the MCU or a bad branch, possibly caused by a linking problem.
I also had errors with printf when using FreeRTOS on my SiFive HiFive Rev B.
To solve it, I rewrote the _fstat and _write functions to change where printf sends its output:
/*
 * Retarget functions for printf()
 */
#include <errno.h>
#include <sys/stat.h>
void uart_putc(int c);
int _fstat (int file, struct stat * st) {
    (void)file;
    (void)st;
    errno = ENOSYS; /* errno values are positive */
    return -1;
}
int _write (int file, char * ptr, int len) {
    int i;
    (void)file;
    /* Send each character to the UART port unchanged */
    for (i = 0; i < len; i++) uart_putc((int)*ptr++);
    return len; /* report the number of bytes written */
}
And create a uart_putc function for UART0 of the SiFive HiFive Rev B hardware:
#include <stdint.h>
void uart_putc(int c)
{
#define uart0_txdata (*(volatile uint32_t*)(0x10013000)) // uart0 txdata register
#define UART_TXFULL (1u << 31) // uart0 txdata full flag
    while ((uart0_txdata & UART_TXFULL) != 0) { } // wait until the TX FIFO has room
    uart0_txdata = c;
}
The newlib C runtime library (used in many embedded toolchains) internally uses its own malloc-family routines. newlib maintains some internal buffers and requires some support for thread safety:
http://www.nadler.com/embedded/newlibAndFreeRTOS.html
A hard fault can also be caused by unaligned memory access:
https://www.keil.com/support/docs/3777.htm

What does JVM interpreter (NOT the JIT compiler) actually do?

Please note that my question is around JVM interpreter, not JIT compiler. JIT compiler converts java bytecodes to native machine code. As such, this MUST mean that the interpreter within the JVM DOES NOT convert bytecodes to machine code. Hence the question: in essence what does the interpreter do? If someone can help me answer this with a simple example of bytecodes equivalent of 1+1 = 2, i.e. what does the interpreter do with respect to executing this add operation? (My implicit question is, if interpreter does not translate to machine code which CPU then executes the ADD operation, how then is this operation performed? what machine code is ACTUALLY executed to support this ADD operation?)
The expression 1+1 will compile to the following bytecode:
iconst_1
iconst_1
iadd
(Actually, it will just compile to iconst_2 because the Java compiler performs constant-folding, but let's ignore that for the purposes of this answer.)
So to find out exactly what the interpreter does for those instructions, we should look at its source code. The relevant sections for iconst_1 and iadd start at line 983 and line 1221 respectively, so let's take a look:
#define OPC_CONST_n(opcode, const_type, value) \
CASE(opcode): \
SET_STACK_ ## const_type(value, 0); \
UPDATE_PC_AND_TOS_AND_CONTINUE(1, 1);
OPC_CONST_n(_iconst_m1, INT, -1);
OPC_CONST_n(_iconst_0, INT, 0);
OPC_CONST_n(_iconst_1, INT, 1);
// goes on for several other constants
//...
#define OPC_INT_BINARY(opcname, opname, test) \
CASE(_i##opcname): \
if (test && (STACK_INT(-1) == 0)) { \
VM_JAVA_ERROR(vmSymbols::java_lang_ArithmeticException(), \
"/ by zero", note_div0Check_trap); \
} \
SET_STACK_INT(VMint##opname(STACK_INT(-2), \
STACK_INT(-1)), \
-2); \
UPDATE_PC_AND_TOS_AND_CONTINUE(1, -1); \
// and then the same thing for longs instead of ints
OPC_INT_BINARY(add, Add, 0);
// other operators
The whole thing is inside a switch-statement that examines the opcode of the current instruction.
If we expand the macro-magic, replace the surrounding code with an extremely simplified template and make some simplifying assumptions (such as the stack only consisting of ints), we end up with something like this:
enum OpCode {
    _iconst_1, _iadd
};
// ...
int* stack = new int[calculate_maximum_stack_size()];
size_t top_of_stack = 0;
size_t program_counter = 0;
while(program_counter < program_size) {
    switch(opcodes[program_counter]) {
    case _iconst_1:
        // SET_STACK_INT(1, 0);
        stack[top_of_stack] = 1;
        // UPDATE_PC_AND_TOS_AND_CONTINUE(1, 1);
        program_counter += 1;
        top_of_stack += 1;
        break;
    case _iadd:
        // SET_STACK_INT(VMintAdd(STACK_INT(-2), STACK_INT(-1)), -2);
        stack[top_of_stack - 2] = stack[top_of_stack - 1] + stack[top_of_stack - 2];
        // UPDATE_PC_AND_TOS_AND_CONTINUE(1, -1);
        program_counter += 1;
        top_of_stack += -1;
        break;
    }
}
So for 1+1 the sequence of operations would be:
stack[0] = 1;
stack[1] = 1;
stack[0] = stack[1] + stack[0];
And top_of_stack would be 1, so we'd end with a stack that contains the value 2 as its only element.

Rust optimizing out loops?

I was doing some very simple benchmarks to compare performance of C and Rust. I used a function adding integers 1 + 2 + ... + n (something that I could verify by a computation by hand), where n = 10^10.
The code in Rust looks like this:
fn main() {
let limit: u64 = 10000000000;
let mut buf: u64 = 0;
for u64::range(1, limit) |i| {
buf = buf + i;
}
io::println(buf.to_str());
}
The C code is as follows:
#include <stdio.h>
int main()
{
unsigned long long buf = 0;
for(unsigned long long i = 0; i < 10000000000; ++i) {
buf = buf + i;
}
printf("%llu\n", buf);
return 0;
}
I compiled and run them:
$ rustc sum.rs -o sum_rust
$ time ./sum_rust
13106511847580896768
real 6m43.122s
user 6m42.597s
sys 0m0.076s
$ gcc -Wall -std=c99 sum.c -o sum_c
$ time ./sum_c
13106511847580896768
real 1m3.296s
user 1m3.172s
sys 0m0.024s
Then I tried with optimizations flags on, again both C and Rust:
$ rustc sum.rs -o sum_rust -O
$ time ./sum_rust
13106511847580896768
real 0m0.018s
user 0m0.004s
sys 0m0.012s
$ gcc -Wall -std=c99 sum.c -o sum_c -O9
$ time ./sum_c
13106511847580896768
real 0m16.779s
user 0m16.725s
sys 0m0.008s
These results surprised me. I did expect the optimizations to have some effect, but the optimized Rust version is 100000 times faster :).
I tried changing n (the only limitation was u64, the run time was still virtually zero), and even tried a different problem (1^5 + 2^5 + 3^5 + ... + n^5), with similar results: executables compiled with rustc -O are several orders of magnitude faster than without the flag, and are also many times faster than the same algorithm compiled with gcc -O9.
So my question is: what's going on? :) I could understand a compiler optimizing 1 + 2 + .. + n = (n*n + n)/2, but I can't imagine that any compiler could derive a formula for 1^5 + 2^5 + 3^5 + .. + n^5. On the other hand, as far as I can see, the result must've been computed somehow (and it seems to be correct).
Oh, and:
$ gcc --version
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
$ rustc --version
rustc 0.6 (dba9337 2013-05-10 05:52:48 -0700)
host: i686-unknown-linux-gnu
Yes, compilers do use the 1 + ... + n = n*(n+1)/2 optimisation to remove the loop, and there are similar tricks for any power of the summation variable, e.g. the sums of k^1 are triangular numbers, the sums of k^2 are pyramidal numbers, the sums of k^3 are squared triangular numbers, etc. In general, there is even a formula to calculate ∑_k k^p for any p.
You can use a more complicated expression, so that the compiler doesn't have any tricks to remove the loop. e.g.
fn main() {
let limit: u64 = 1000000000;
let mut buf: u64 = 0;
for u64::range(1, limit) |i| {
buf += i + i ^ (i*i);
}
io::println(buf.to_str());
}
and
#include <stdio.h>
int main()
{
unsigned long long buf = 0;
for(unsigned long long i = 0; i < 1000000000; ++i) {
buf += i + i ^ (i * i);
}
printf("%llu\n", buf);
return 0;
}
which gives me
real 0m0.700s
user 0m0.692s
sys 0m0.004s
and
real 0m0.698s
user 0m0.692s
sys 0m0.000s
respectively (with -O for both compilers).

Building release version of a project with Android NDK r6

I am compiling the helloworld example of Android NDK r6b using Cygwin on Windows Vista. I have noticed that the following code takes between 14 and 20 ms on my Android phone (it has an 800 MHz CPU, Qualcomm MSM7227T chipset, with hardware floating-point support):
float *v1, *v2, *v3, tot;
int num = 50000;
v1 = new float[num];
v2 = new float[num];
v3 = new float[num];
// Initialize vectors. RandomEqualREAL() returns a floating point number in a specified range.
for ( int i = 0; i < num; i++ )
{
v1[i] = RandomEqualREAL( -10.0f, 10.0f );
if (v1[i] == 0.0f) v1[i] = 1.0f;
v2[i] = RandomEqualREAL( -10.0f, 10.0f );
if (v2[i] == 0.0f) v2[i] = 1.0f;
}
clock_t start = clock() / (CLOCKS_PER_SEC / 1000);
tot = 0.0f;
for ( int k = 0; k < 1000; k++)
{
for ( int i = 0; i < num; i++ )
{
v3[i] = v1[i] / (v2[i]);
tot += v3[i];
}
}
clock_t end = clock() / (CLOCKS_PER_SEC / 1000);
printf("tot %f, time %f\n", tot, (end-start)/1000.0f);
On my 2.4 GHz notebook it takes 0.45 ms (timings taken while the system was busy running other programs such as Chrome, two or three IDEs, open PDFs, etc.). I wonder whether the helloworld application is built as a release version. I noticed that g++ gets called with -msoft-float.
Does this mean that it is using floating-point emulation?
What command-line options do I need to use in order to build an optimized version of the program? How do I specify those options?
This is how g++ gets called:
/cygdrive/d/android/android-ndk-r6b/toolchains/arm-linux-androideabi-4.4.3/prebu
ilt/windows/bin/arm-linux-androideabi-g++ -MMD -MP -MF D:/android/workspace/hell
oworld/obj/local/armeabi/objs/ndkfoo/ndkfoo.o.d.org -fpic -ffunction-sections -f
unwind-tables -fstack-protector -D__ARM_ARCH_5__ -D__ARM_ARCH_5T__ -D__ARM_ARCH_
5E__ -D__ARM_ARCH_5TE__ -Wno-psabi -march=armv5te -mtune=xscale -msoft-float -f
no-exceptions -fno-rtti -mthumb -Os -fomit-frame-pointer -fno-strict-aliasing -f
inline-limit=64 -ID:/android/workspace/helloworld/jni/boost -ID:/android/workspa
ce/helloworld/jni/../../mylib/jni -ID:/android/android-ndk-r6b/sources/cxx-stl/g
nu-libstdc++/include -ID:/android/android-ndk-r6b/sources/cxx-stl/gnu-libstdc++/
libs/armeabi/include -ID:/android/workspace/helloworld/jni -DANDROID -Wa,--noex
ecstack -fexceptions -frtti -O2 -DNDEBUG -g -ID:/android/android-ndk-r6b/plat
forms/android-9/arch-arm/usr/include -c D:/android/workspace/helloworld/jni/ndk
foo.cpp -o D:/android/workspace/helloworld/obj/local/armeabi/objs/ndkfoo/ndkfoo.
o && ( if [ -f "D:/android/workspace/helloworld/obj/local/armeabi/objs/ndkfoo/nd
kfoo.o.d.org" ]; then awk -f /cygdrive/d/android/android-ndk-r6b/build/awk/conve
rt-deps-to-cygwin.awk D:/android/workspace/helloworld/obj/local/armeabi/objs/ndk
foo/ndkfoo.o.d.org > D:/android/workspace/helloworld/obj/local/armeabi/objs/ndkf
oo/ndkfoo.o.d && rm -f D:/android/workspace/helloworld/obj/local/armeabi/objs/nd
kfoo/ndkfoo.o.d.org; fi )
Prebuilt : libstdc++.a <= <NDK>/sources/cxx-stl/gnu-libstdc++/libs/armeabi
/
cp -f /cygdrive/d/android/android-ndk-r6b/sources/cxx-stl/gnu-libstdc++/libs/arm
eabi/libstdc++.a /cygdrive/d/android/workspace/helloworld/obj/local/armeabi/libs
tdc++.a
SharedLibrary : libndkfoo.so
/cygdrive/d/android/android-ndk-r6b/toolchains/arm-linux-androideabi-4.4.3/prebu
ilt/windows/bin/arm-linux-androideabi-g++ -Wl,-soname,libndkfoo.so -shared --sys
root=D:/android/android-ndk-r6b/platforms/android-9/arch-arm D:/android/workspac
e/helloworld/obj/local/armeabi/objs/ndkfoo/ndkfoo.o D:/android/workspace/hellow
orld/obj/local/armeabi/libstdc++.a D:/android/android-ndk-r6b/toolchains/arm-lin
ux-androideabi-4.4.3/prebuilt/windows/bin/../lib/gcc/arm-linux-androideabi/4.4.3
/libgcc.a -Wl,--no-undefined -Wl,-z,noexecstack -lc -lm -lsupc++ -o D:/androi
d/workspace/helloworld/obj/local/armeabi/libndkfoo.so
Install : libndkfoo.so => libs/armeabi/libndkfoo.so
mkdir -p /cygdrive/d/android/workspace/helloworld/libs/armeabi
install -p /cygdrive/d/android/workspace/helloworld/obj/local/armeabi/libndkfoo.
so /cygdrive/d/android/workspace/helloworld/libs/armeabi/libndkfoo.so
/cygdrive/d/android/android-ndk-r6b/toolchains/arm-linux-androideabi-4.4.3/prebu
ilt/windows/bin/arm-linux-androideabi-strip --strip-unneeded D:/android/workspac
e/helloworld/libs/armeabi/libndkfoo.so
Edit.
I have run the command adb shell cat /proc/cpuinfo. This is the result:
Processor : ARMv6-compatible processor rev 5 (v6l)
BogoMIPS : 532.48
Features : swp half thumb fastmult vfp edsp java
CPU implementer : 0x41
CPU architecture: 6TEJ
CPU variant : 0x1
CPU part : 0xb36
CPU revision : 5
Hardware : GELATO Global board (LGE LGP690)
Revision : 0000
Serial : 0000000000000000
I don't understand what swp, half, thumb, fastmult, vfp, edsp and java mean, but I don't like that 'vfp'! Does it mean virtual floating point? That processor should have a floating-point unit...
You are right, -msoft-float is a synonym for -mfloat-abi=soft (see list of gcc ARM options) and means floating point emulation.
For hardware floating point the following flags can be used:
LOCAL_CFLAGS += -march=armv6 -marm -mfloat-abi=softfp -mfpu=vfp
To see which floating-point unit you really have on your device, check the output of the adb shell cat /proc/cpuinfo command. Some units are compatible with one another: vfp < vfpv3-d16 < vfpv3 < neon, so if you have vfpv3, then vfp also works for you.
Also you might want to add the line
APP_OPTIM := release
into your Application.mk file. This setting overrides automatic 'debug' mode for native part of application if the manifest sets android:debuggable to 'true'
But even with all these settings NDK will put -march=armv5te -mtune=xscale -msoft-float into the beginning of compiler options. And this behavior can not be changed without modifications in NDK sources (these options are hardcoded in file $NDKROOT\toolchains\arm-linux-androideabi-4.4.3\setup.mk).

What compilers can detect pure mathematical functions and optimize them (without telling you so)?

I have seen that GCC is not able to detect pure mathematical functions and it needs you to provide the attribute "const" to indicate that.
What compilers can detect pure mathematical functions and optimize them (without telling you so)?
To do so is inherently risky in languages that have pointers and lack global compilation and analysis. So, if an operation is not declared const, the compiler must assume it could have side effects.
Example:
//getx.cpp
int GetX(int input)
{
int* pData = (int*) input;
*pData = 50;
return 0;
}
// gety.cpp
int GetY(int input)
{
return GetX(input + 4);
}
// main.cpp
int main()
{
int arg[] { 0, 4 };
return GetY((int)arg);
}
The compiler while compiling GetY can't tell that GetX treats its argument as a pointer and dereferences and modifies data in a non-functional, side-effect-prone manner. That information is only available during linking so you'd have to re-invent the concept of linking to include a lot of code generation and analysis to support such a feature.
It's not really (afaik) the compiler that does this, but when writing C# in Visual Studio when using the plugin ReSharper, you can get compile time hints that indicate that it is possible to declare something as const. On the other hand, that doesn't go under the category "without telling you so", so it might not be what you're looking for...
It seems that gcc now does: doing "gcc -O2 -S" on the following code, and reading the assembly, the call to foo() from within test() is identified as pure and moved outside of the loop:
#include <stdio.h>
double __attribute__((noinline)) foo(double x)
{
x = x + 1;
x = x * x;
if (x > 20)
x -= 1;
x -= x * x;
return x;
}
void test(int iters, double x)
{
int i;
for (i = 0; i < iters; ++i) {
printf("%g\n", foo(x));
}
}
This is Fedora 22, gcc 5.1.1, x86_64. I haven't tried, but with -flto, I would expect this to work across compilation units.
Also, it is worth noting that today gcc has the command line options -Wsuggest-attribute=pure and -Wsuggest-attribute=const.