Systick interrupt function not getting called on stm32F103RB nucleo board - interrupt

I am trying to implement a very simple program that calls a function from the SysTick interrupt on an stm32f103rb board. The program runs fine but it never calls the interrupt function. I have gone through many forums and experimented with different register values, but I am not sure what I am missing here. My startup program and test program are below:
startup.s ::
.data
arr: .4byte 0x20001000 # Read-only array of bytes
.4byte start+1
.4byte reset1
.4byte reset2
.4byte reset3
.4byte reset4
.4byte reset5
.4byte reset6
.4byte reset7
.4byte reset8
.4byte reset9
.4byte reset10
.4byte reset11
.4byte reset12
.4byte reset13
.4byte reset14
.4byte reset15
.4byte reset16
eoa:
.text
reset1: b reset1
reset2: b reset2
reset3: b reset3
reset4: b reset4
reset5: b reset5
reset6: b reset6
reset7: b reset7
reset8: b reset8
reset9: b reset9
reset10: b reset10
reset11: b sysTickFunc
reset12: b sysTickFunc
reset13: b sysTickFunc
reset14: b sysTickFunc
reset15: b sysTickFunc
reset16: b sysTickFunc
start: # Label, not really required
mov r4, #4 # Load register r4 with the value 4
mov r5, #5 # Load register r5 with the value 5
add r6, r4, r5 # Add r4 and r5 and store the sum in r6
ldr r4, =0x40021000
str r1, [r4]
cpsie i
b test_func
stop: b stop # Infinite loop to stop execution
Test.c - test function and systick function implementation::
#define SYSTICK_CTRL (volatile unsigned int *)( 0xE000E010 )
#define SYSTICK_LOAD (volatile unsigned int *)( 0xE000E014 )
#define SYSTICK_VAL (volatile unsigned int *)( 0xE000E018 )
#define PORT_C_CRL (volatile unsigned int *)( 0x40011000 )
#define PORT_C_CRH (volatile unsigned int *)( 0x40011004 )
#define PORT_C_ODR (volatile unsigned int *)( 0x4001100C )
#define APB2 (volatile unsigned int *)( 0x40021018 )
#define AHB (volatile unsigned int *)( 0x40021014 )
void sysTickFunc(void);
void test_func(void)
{
volatile unsigned int * p;
unsigned int x;
p = SYSTICK_CTRL;
*p = 7; /*CLKSRC to processor clock, TICK INT is 1, COUNTER ENABLE is 1 */
p = SYSTICK_LOAD;
*p = 20;
p = (volatile unsigned int *)( 0xE000E01C );
*p = 0x00002328;
x = 0;
/* loop in while and check if systick function is called */
while(1)
{
x++;
}
}
void sysTickFunc(void)
{
while(1)
{
asm("mov r0,0xCCCCCCCC;");
asm("mov r1,0xDDDDDDDD;");
asm("mov r2,0xBBBBBBBB;");
asm("mov r3,0x11111111;");
asm("mov r4,0x22222222;");
asm("mov r5,0x33333333;");
asm("mov r6,0x44444444;");
}
}
Linker file:
SECTIONS {
. = 0x08000000;
.data : { * (.data)}
. = 0x08003000;
.text : {* (.text)}
}
build script:
arm-none-eabi-gcc -nostdlib -mcpu=cortex-m3 -mthumb -g -o add.elf -T stm.ld test.c startup.s
arm-none-eabi-objcopy -O binary add.elf add.bin
dd if=/dev/zero of=flash.bin bs=4096 count=4096
dd if=add.bin of=flash.bin bs=4096 conv=notrunc
Can someone help me find what is wrong in my code? When I run it under arm gdb on Ubuntu, my SysTick function is never called. (I know I have put the SysTick function in many places in the vector table. It was an attempt to rule out the possibility of sysTickFunc not being at the right location in the vector table.)
Thanks
Ravi

Issue #1
How are you checking whether the SysTick handler got called? Looking at the code, the handler just sets a few registers, so I assume you're checking those.
This will not work as you might think, because exception entry stores r0-r3, r12, lr, pc and xPSR on the stack. You might want to read the Cortex-M3 Technical Reference Manual (chapter on Exceptions) for details.
Issue #2
Your linker file lists .text at an offset. That won't work, since reset vectors need to be at beginning. You want to place reset vectors in a separate section that goes at beginning of FLASH, then put .text and .data. Don't forget to also put .data and .bss into RAM.
Issue #3
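For start you store the address + 1 (Thumb bit set), but for the other vectors you don't. Every entry in the vector table needs bit 0 set, otherwise the core faults as soon as it takes that exception.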
Issue #4
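SysTick is part of the Cortex-M3 core, so it needs no peripheral clock enable and is not enabled through the NVIC: the TICKINT bit in the control register enables the exception, and interrupts must not be masked globally (your cpsie i covers that). You should, however, write the reload value and clear the current value before you enable the counter, not after.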
I'd also suggest you write clearer code - it makes things easier for you, and for us, trying to read it.
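A minimal sketch of the set-up order, with a handler that makes the interrupt observable without inspecting CPU registers. The register layout is that of the SysTick block at 0xE000E010; the names systick_regs, systick_start and tick_count are made up for illustration, and the register block is passed in as a pointer so the logic can also be exercised off-target:

```c
#include <stdint.h>

/* SysTick register block; on a Cortex-M3 it sits at 0xE000E010. */
typedef struct {
    volatile uint32_t ctrl;   /* control and status */
    volatile uint32_t load;   /* reload value, 24 bits wide */
    volatile uint32_t val;    /* current value, any write clears it */
    volatile uint32_t calib;  /* calibration value */
} systick_regs;

#define SYSTICK_ENABLE    (1u << 0)
#define SYSTICK_TICKINT   (1u << 1)
#define SYSTICK_CLKSOURCE (1u << 2)

/* Incremented by the handler; volatile because other code polls it. */
volatile uint32_t tick_count;

/* Hypothetical helper: program the reload value first, clear the counter,
   and only then enable the counter together with its exception. */
void systick_start(systick_regs *st, uint32_t reload)
{
    st->load = reload & 0x00FFFFFFu;
    st->val  = 0;
    st->ctrl = SYSTICK_CLKSOURCE | SYSTICK_TICKINT | SYSTICK_ENABLE;
}

/* Handler body: count the tick and return instead of spinning forever. */
void sysTickFunc(void)
{
    tick_count++;
}
```

On the target you would pass (systick_regs *)0xE000E010 and a reload value that suits your core clock, then either set a breakpoint in sysTickFunc or watch tick_count change from the debugger.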

Related

Problem with sprintf/printf with FreeRTOS on STM32F7

For two days I have been trying to get printf/sprintf working in my project...
MCU: STM32F722RETx
I tried to use newLib, heap3, heap4, etc. Nothing works: HardFault_Handler runs every time.
Now I am trying to use the simple implementation from this link and I still have the same problem. I suppose my device has some problem with double numbers, because the program enters HardFault_Handler from the line if (value != value) in the _ftoa function (which is strange, because this STM32 has an FPU).
Do you guys have any idea? (Now I am using heap_4.c)
My compiler options:
target_compile_options(${PROJ_NAME} PUBLIC
$<$<COMPILE_LANGUAGE:CXX>:
-std=c++14
>
-mcpu=cortex-m7
-mthumb
-mfpu=fpv5-d16
-mfloat-abi=hard
-Wall
-ffunction-sections
-fdata-sections
-O1 -g
-DLV_CONF_INCLUDE_SIMPLE
)
Linker options:
target_link_options(${PROJ_NAME} PUBLIC
${LINKER_OPTION} ${LINKER_SCRIPT}
-mcpu=cortex-m7
-mthumb
-mfloat-abi=hard
-mfpu=fpv5-sp-d16
-specs=nosys.specs
-specs=nano.specs
# -Wl,--wrap,malloc
# -Wl,--wrap,_malloc_r
-u_printf_float
-u_sprintf_float
)
Linker script:
/* Highest address of the user mode stack */
_estack = 0x20040000; /* end of RAM */
/* Generate a link error if heap and stack don't fit into RAM */
_Min_Heap_Size = 0x200; /* required amount of heap */
_Min_Stack_Size = 0x400; /* required amount of stack */
/* Specify the memory areas */
MEMORY
{
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 256K
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 512K
}
UPDATE:
I don't think it is a stack problem: I have set configCHECK_FOR_STACK_OVERFLOW to 2, but the hook function is never called. I found a strange thing. This version works:
float d = 23.5f;
char buffer[20];
sprintf(buffer, "temp %f", 23.5f);
but this version does not:
float d = 23.5f;
char buffer[20];
sprintf(buffer, "temp %f",d);
I have no idea why passing the variable by value ends up in HardFault_Handler...
You can implement a hard fault handler that at least recovers the stack pointer and the stacked registers from the point where the fault occurred. This should provide more insight.
https://www.freertos.org/Debugging-Hard-Faults-On-Cortex-M-Microcontrollers.html
It should let you know if your issue is due to a floating point error within the MCU or if it is due to a branching error possibly caused by some linking problem
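The technique on that page comes down to decoding the eight words the core pushes on exception entry. A minimal sketch, assuming the basic frame only (with the FPU in use the core may push additional floating-point registers above these eight words, which this sketch ignores); fault_frame and decode_fault_frame are hypothetical names:

```c
#include <stdint.h>

/* Words pushed by a Cortex-M core on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;   /* LR at the time of the fault */
    uint32_t pc;   /* address to look up in the map file or with addr2line */
    uint32_t psr;
} fault_frame;

/* Hypothetical decoder: 'sp' is the MSP or PSP value captured on handler entry. */
fault_frame decode_fault_frame(const uint32_t *sp)
{
    fault_frame f;
    f.r0  = sp[0];
    f.r1  = sp[1];
    f.r2  = sp[2];
    f.r3  = sp[3];
    f.r12 = sp[4];
    f.lr  = sp[5];
    f.pc  = sp[6];
    f.psr = sp[7];
    return f;
}
```

The pc field is usually the most useful one: it points at (or just after) the instruction that faulted, which tells you whether you are inside the floating-point formatting code or somewhere unexpected.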
I also had errors with printf when using FreeRTOS on my SiFive HiFive Rev B.
To solve it, I rewrote the _fstat and _write functions to change the output function of printf:
/*
* Retarget functions for printf()
*/
#include <errno.h>
#include <stdint.h> /* uint32_t, used by uart_putc below */
#include <sys/stat.h>
extern void uart_putc(int c);
int _fstat (int file, struct stat * st) {
(void)file;
(void)st;
errno = ENOSYS; /* errno values are positive */
return -1;
}
int _write (int file, char * ptr, int len) {
int i;
(void)file;
/* Send each character to the UART port */
for (i = 0; i < len; i++) uart_putc((int)*ptr++);
/* Report the number of bytes written, as callers of write() expect */
return len;
}
And create another uart_putc function for UART0 of SiFive HiFive Rev B hardware:
void uart_putc(int c)
{
#define uart0_txdata (*(volatile uint32_t*)(0x10013000)) // uart0 txdata register
#define UART_TXFULL (1 << 31) // uart0 txdata flag
while ((uart0_txdata & UART_TXFULL) != 0) { }
uart0_txdata = c;
}
The newlib C-runtime library (used in many embedded tool chains) internally uses its own malloc-family routines. newlib maintains some internal buffers and requires some support for thread-safety:
http://www.nadler.com/embedded/newlibAndFreeRTOS.html
A hard fault can also be caused by an unaligned memory access:
https://www.keil.com/support/docs/3777.htm

Addressing pins of Register in microcontrollers

I'm working with Keil and an LM3S316 microcontroller. Usually we address registers in microcontrollers in the form:
#define GPIO_PORTC_DATA_R (*((volatile uint32_t *)0x400063FC))
My question is how I can access a single pin of a register. For example, if I have this function:
char process_key(int a)
{ PC_0 = a ;}
How can I get PC_0 and how to define it?
Thank you
Given say:
#define PIN0 (1u<<0)
#define PIN1 (1u<<1)
#define PIN2 (1u<<2)
// etc...
Then:
char process_key(int a)
{
if( a != 0 )
{
// Set bit
GPIO_PORTC_DATA_R |= PIN0 ;
}
else
{
// Clear bit
GPIO_PORTC_DATA_R &= ~PIN0 ;
}
}
A generalisation of this idiomatic technique is presented at How do you set, clear, and toggle a single bit?
However the read-modify-write implied by |= / &= can be problematic if the register might be accessed in different thread/interrupt contexts, as well as adding a possibly undesirable overhead. Cortex-M3/4 parts have a feature known as bit-banding that allows individual bits to be addressed directly and atomically. Given:
volatile uint32_t* getBitBandAddress( volatile const void* address, int bit )
{
__IO uint32_t* bit_address = 0;
uint32_t addr = reinterpret_cast<uint32_t>(address);
// This bit manipulation makes the function valid for RAM
// and Peripheral bitband regions
uint32_t word_band_base = addr & 0xf0000000u;
uint32_t bit_band_base = word_band_base | 0x02000000u;
uint32_t offset = addr - word_band_base;
// Calculate bit band address
bit_address = reinterpret_cast<__IO uint32_t*>(bit_band_base + (offset * 32u) + (static_cast<uint32_t>(bit) * 4u));
return bit_address ;
}
Then you can have:
char process_key(int a)
{
static volatile uint32_t* PC0_BB_ADDR = getBitBandAddress( &GPIO_PORTC_DATA_R, 0 ) ;
*PC0_BB_ADDR = a ;
}
You could of course determine and hard-code the bit-band address; for example:
#define PC0 (*((volatile uint32_t *)0x420C7F80u))
Then:
char process_key(int a)
{
PC0 = a ;
}
Details of the bit-band address calculation can be found in the ARM Cortex-M Technical Reference Manual, and there is an on-line calculator here.
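For a plain C project, the same calculation can be sketched without the C++ casts. bitband_alias is a hypothetical name, and the function only makes sense for addresses in the first 1 MB of the SRAM (0x20000000) or peripheral (0x40000000) region, which is all the bit-band alias covers:

```c
#include <stdint.h>

/* Returns the bit-band alias address for one bit of a word.
   Valid only for the first 1 MB of the SRAM or peripheral region. */
uint32_t bitband_alias(uint32_t address, unsigned bit)
{
    uint32_t region = address & 0xF0000000u;   /* 0x20000000 or 0x40000000 */
    uint32_t offset = address - region;        /* byte offset inside the region */

    /* Each bit of the region maps to one 32-bit word in the alias,
       so a byte spans 32 alias bytes and a bit spans 4. */
    return (region | 0x02000000u) + offset * 32u + bit * 4u;
}
```

For GPIO_PORTC_DATA_R at 0x400063FC this gives 0x420C7F80 for bit 0 and 0x420C7F88 for bit 2, which is a quick way to check a hard-coded alias.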

sem_open - valgrind complains about uninitialised bytes

I have a trivial program:
#include <fcntl.h> /* O_CREAT */
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
const char sname[]="xxx";
sem_t *pSemaphor;
if ((pSemaphor = sem_open(sname, O_CREAT, 0644, 0)) == SEM_FAILED) {
perror("semaphore initialization");
exit(1);
}
sem_unlink(sname);
sem_close(pSemaphor);
}
When I run it under valgrind, I get the following error:
==12702== Syscall param write(buf) points to uninitialised byte(s)
==12702== at 0x4E457A0: __write_nocancel (syscall-template.S:81)
==12702== by 0x4E446FC: sem_open (sem_open.c:245)
==12702== by 0x4007D0: main (test.cpp:15)
==12702== Address 0xfff00023c is on thread 1's stack
==12702== in frame #1, created by sem_open (sem_open.c:139)
The code was extracted from a bigger project where it ran successfully for years, but now it is causing segmentation fault.
The valgrind error from my example is the same as seen in the bigger project, but there it causes a crash, which my small example doesn't.
I see this with glibc 2.27-5 on Debian. In my case I only open the semaphores right at the start of a long-running program and it seems harmless so far - just annoying.
Looking at the code for sem_open.c which is available at:
https://code.woboq.org/userspace/glibc/nptl/sem_open.c.html
It seems that valgrind is complaining about the line (270 as I look now):
if (TEMP_FAILURE_RETRY (__libc_write (fd, &sem.initsem, sizeof (sem_t)))
== sizeof (sem_t)
However sem.initsem is properly initialised earlier in a fairly baroque manner, firstly by explicitly setting fields in the sem.newsem (part of the union), and then once that is done by a call to memset (L226-228):
/* Initialize the remaining bytes as well. */
memset ((char *) &sem.initsem + sizeof (struct new_sem), '\0',
sizeof (sem_t) - sizeof (struct new_sem));
I think that this particular shenanigans is all quite optimal, but we need to make sure that all of the fields of new_sem have actually been initialised... we find the definition in https://code.woboq.org/userspace/glibc/sysdeps/nptl/internaltypes.h.html and it is this wonderful creation:
struct new_sem
{
#if __HAVE_64B_ATOMICS
/* The data field holds both value (in the least-significant 32 bytes) and
nwaiters. */
# if __BYTE_ORDER == __LITTLE_ENDIAN
# define SEM_VALUE_OFFSET 0
# elif __BYTE_ORDER == __BIG_ENDIAN
# define SEM_VALUE_OFFSET 1
# else
# error Unsupported byte order.
# endif
# define SEM_NWAITERS_SHIFT 32
# define SEM_VALUE_MASK (~(unsigned int)0)
uint64_t data;
int private;
int pad;
#else
# define SEM_VALUE_SHIFT 1
# define SEM_NWAITERS_MASK ((unsigned int)1)
unsigned int value;
int private;
int pad;
unsigned int nwaiters;
#endif
};
So if we __HAVE_64B_ATOMICS then the structure has a data field which contains both the value and the nwaiters, otherwise these are separate fields.
In the initialisation of sem.newsem we can see that these are initialised correctly, as follows:
#if __HAVE_64B_ATOMICS
sem.newsem.data = value;
#else
sem.newsem.value = value << SEM_VALUE_SHIFT;
sem.newsem.nwaiters = 0;
#endif
/* pad is used as a mutex on pre-v9 sparc and ignored otherwise. */
sem.newsem.pad = 0;
/* This always is a shared semaphore. */
sem.newsem.private = FUTEX_SHARED;
I'm doing all of this on a 64-bit system, so I think that valgrind is complaining about the initialisation of the 64-bit sem.newsem.data with a 32-bit value since from:
value = va_arg (ap, unsigned int);
We can see that value is defined simply as an unsigned int which will usually still be 32 bits even on a 64-bit system (see What should be the sizeof(int) on a 64-bit machine?), but that should just be an implicit cast to 64-bits when it is assigned.
So I think this is not a bug - just valgrind getting confused.
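To make the shape of that initialisation easier to follow, here is a small stand-alone sketch of the same pattern: the leading struct is set field by field, then the tail of the buffer is cleared with memset. The types and names are hypothetical stand-ins, not glibc's, and the 32-byte size is an assumption for illustration only:

```c
#include <string.h>

/* Hypothetical stand-ins for the types discussed above, not glibc's. */
struct inner {
    unsigned long long data;
    int private_flag;
    int pad;
};

union image {
    struct inner part;
    unsigned char bytes[32];
};

/* Field-by-field initialisation followed by a memset of the tail. */
void init_image(union image *img, unsigned int value)
{
    img->part.data = value;        /* implicit widening to 64 bits */
    img->part.pad = 0;
    img->part.private_flag = 0;
    memset(img->bytes + sizeof(struct inner), 0,
           sizeof img->bytes - sizeof(struct inner));
}
```

With this approach, every byte that belongs to a named field or to the tail is written. Padding bytes inside the leading struct, on a platform that had any, would be touched by neither step, which is the kind of thing a byte-level checker reports.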

How unwind ARM Cortex M3 stack

The HardFault_Handler of an ARM Cortex-M3 STM32 can only get the values of a few registers (r0, r1, r2, r3, lr, pc, xPSR) from the moment of the crash. There is no FP or SP in the stacked frame, so I cannot unwind the stack.
Is there any solution for this? Thanks a lot.
[update]
Following a web instruction to let ARMGCC(Keil uvision IDE) generate FP by adding a compiling option "--use_frame_pointer", but I could not find the FP in the stack. I am a real newbie here. Below is my demo code:
int test2(int i, int j)
{
return i/j;
}
int main()
{
SCB->CCR |= 0x10;
int a = 10;
int b = 0;
int c;
c = test2(a,b);
}
enum { r0 = 0, r1, r2, r3, r11, r12, lr, pc, psr};
void Hard_Fault_Handler(uint32_t *faultStackAddress)
{
uint32_t r0_val = faultStackAddress[r0];
uint32_t r1_val = faultStackAddress[r1];
uint32_t r2_val = faultStackAddress[r2];
uint32_t r3_val = faultStackAddress[r3];
uint32_t r12_val = faultStackAddress[r12];
uint32_t r11_val = faultStackAddress[r11];
uint32_t lr_val = faultStackAddress[lr];
uint32_t pc_val = faultStackAddress[pc];
uint32_t psr_val = faultStackAddress[psr];
}
I have two questions here:
1. I am not sure where FP (r11) sits in the stacked frame, or whether it is pushed onto the stack at all. I assumed it comes before r12, because I compared the assembly output before and after adding the option "--use_frame_pointer". I also compared the values read in Hard_Fault_Handler, and it seems r11 is not on the stack, because the r11 value I read points to a place that is not my code.
[update] I have confirmed that FP is pushed into the stack. The second question still needs to be answered.
See below snippet code:
Without the option "--use_frame_pointer"
test2 PROC
MOVS r0,#3
BX lr
ENDP
main PROC
PUSH {lr}
MOVS r0,#0
BL test2
MOVS r0,#0
POP {pc}
ENDP
with the option "--use_frame_pointer"
test2 PROC
PUSH {r11,lr}
ADD r11,sp,#4
MOVS r0,#3
MOV sp,r11
SUB sp,sp,#4
POP {r11,pc}
ENDP
main PROC
PUSH {r11,lr}
ADD r11,sp,#4
MOVS r0,#0
BL test2
MOVS r0,#0
MOV sp,r11
SUB sp,sp,#4
POP {r11,pc}
ENDP
2. Seems like FP is not in the input parameter faultStackAddress of Hard_Fault_Handler(), where can I get the caller's FP to unwind the stack?
[update again]
Now I understood the last FP(r11) is not stored in the stack. All I need to do is to read the value of r11 register, then I can unwind the whole stack.
So now my final question is how to read it using inline assembler of C. I tried below code, but failed to read the correct value from r11 following the reference of http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0472f/Cihfhjhg.html
volatile int top_fp;
__asm
{
mov top_fp, r11
}
r11's value is 0x20009DCC
top_fp's value is 0x00000004
[update 3] Below is my whole code.
int test5(int i, int j, int k)
{
char a[128] = {0} ;
a[0] = 'a';
return i/j;
}
int test2(int i, int j)
{
char a[18] = {0} ;
a[0] = 'a';
return test5(i, j, 0);
}
int main()
{
SCB->CCR |= 0x10;
int a = 10;
int b = 0;
int c;
c = test2(a,b); //create a divide by zero crash
}
/* The fault handler implementation calls a function called Hard_Fault_Handler(). */
#if defined(__CC_ARM)
__asm void HardFault_Handler(void)
{
TST lr, #4
ITE EQ
MRSEQ r0, MSP
MRSNE r0, PSP
B __cpp(Hard_Fault_Handler)
}
#else
void HardFault_Handler(void)
{
__asm("TST lr, #4");
__asm("ITE EQ");
__asm("MRSEQ r0, MSP");
__asm("MRSNE r0, PSP");
__asm("B Hard_Fault_Handler");
}
#endif
void Hard_Fault_Handler(uint32_t *faultStackAddress)
{
volatile int top_fp;
__asm
{
mov top_fp, r11
}
//TODO: use top_fp to unwind the whole stack.
}
[update 4] Finally, I figured it out. My solution:
Note: To access r11 we have to use the embedded assembler (see here), which took me a long time to figure out.
//we have to use embedded assembler.
__asm int getRegisterR11()
{
mov r0,r11
BX LR
}
//call it from Hard_Fault_Handler function.
/*
Function call stack frame:
FP1(r11) -> | lr |(High Address)
| FP2|(prev FP)
| ...|
Current FP(r11) ->| lr |
| FP1|(prev FP)
| ...|(Low Address)
With FP, we can access lr(link register) which is the address to return when the current functions returns(where you were).
Then (current FP - 1) points to prev FP.
Thus we can unwind the stack.
*/
void unwindBacktrace(uint32_t topFp, uint16_t* backtrace)
{
uint32_t nextFp = topFp;
int j = 0;
//#define BACK_TRACE_DEPTH 5
//loop backtrace using FP(r11), save lr into an uint16_t array.
for(int i = 0; i < BACK_TRACE_DEPTH; i++)
{
uint32_t lr = *((uint32_t*)nextFp);
if ((lr >= 0x08000000) && (lr <= 0x08FFFFFF))
{
backtrace[j*2] = LOW_16_BITS(lr);
backtrace[j*2 + 1] = HIGH_16_BITS(lr);
j += 1;
}
nextFp = *((uint32_t*)nextFp - 1);
if (nextFp == 0)
{
break;
}
}
}
#if defined(__CC_ARM)
__asm void HardFault_Handler(void)
{
TST lr, #4
ITE EQ
MRSEQ r0, MSP
MRSNE r0, PSP
B __cpp(Hard_Fault_Handler)
}
#else
void HardFault_Handler(void)
{
__asm("TST lr, #4");
__asm("ITE EQ");
__asm("MRSEQ r0, MSP");
__asm("MRSNE r0, PSP");
__asm("B Hard_Fault_Handler");
}
#endif
void Hard_Fault_Handler(uint32_t *faultStackAddress)
{
//get back trace
int topFp = getRegisterR11();
unwindBacktrace(topFp, persistentData.faultStack.back_trace);
}
Very primitive method to unwind the stack in such case is to read all stack memory above SP seen at the time of HardFault_Handler and process it using arm-none-eabi-addr2line. All link register entries saved on stack will be transformed into source line (remember that actual code path goes the line before LR points to). Note, if functions in between were called using branch instruction (b) instead of branch and link (bl) you'll not see them using this method.
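A minimal sketch of that scan, assuming the stack contents have already been captured into an array. The function name and the code address range are assumptions; on Thumb-only cores a saved LR has bit 0 set, which the sketch uses as an extra filter and strips before the value is handed to addr2line:

```c
#include <stddef.h>
#include <stdint.h>

/* Collects words that look like Thumb return addresses: inside
   [code_start, code_end) and with bit 0 set. Returns how many were stored. */
size_t scan_return_addresses(const uint32_t *stack, size_t words,
                             uint32_t code_start, uint32_t code_end,
                             uint32_t *out, size_t out_max)
{
    size_t found = 0;

    for (size_t i = 0; i < words && found < out_max; i++) {
        uint32_t w = stack[i];

        if ((w & 1u) != 0u && w >= code_start && w < code_end) {
            out[found++] = w & ~1u;   /* strip the Thumb bit */
        }
    }
    return found;
}
```

This is a heuristic: stale data left on the stack and stored function pointers can produce false positives, so treat the result as a list of candidates, not a guaranteed call chain.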
(I don't have enough reputation points to write comments, so I'm editing my answer):
UPDATE for question 2:
Why do you expect Hard_Fault_Handler to have any arguments? Hard_Fault_Handler is usually a function whose address is stored in the vector (exception) table. When the processor exception happens, Hard_Fault_Handler is executed. No argument passing is involved. Still, all registers at the time the fault happens are preserved. Specifically, if you compiled without omit-frame-pointer you can just read the value of R11 (or R7 in Thumb-2 mode). However, to be sure that Hard_Fault_Handler in your code is actually the real hard fault handler, look into the startup.s code and check whether Hard_Fault_Handler is entry 3 of the vector table (the fourth word, after the initial SP, Reset and NMI). If another function is there, it means Hard_Fault_Handler is just called from that function explicitly. See this article for details. You can also read my blog :) There is a chapter about the stack which is based on an Android example, but a lot of things are the same in general.
Also note, most probably in faultStackAddress should be stored a stack pointer, not a frame pointer.
UPDATE 2
OK, let's clarify some things. Firstly, please paste the code from which you call Hard_Fault_Handler. Secondly, I guess you call it from within the real HardFault exception handler. In that case you cannot expect R11 to be at faultStackAddress[r11]. You already mentioned this in the first sentence of your question. There will be only r0-r3, r12, lr, pc and psr.
You've also written:
But there is no FP and SP in the stack. Thus I could not unwind the
stack. Is there any solution for this?
The SP is not "in the stack" because you already have it in one of the stack pointer registers (msp or psp). See again THIS ARTICLE. Also, FP is not crucial for unwinding the stack, because you can do it without it (by "navigating" through the saved link registers). Another thing: if you dump the memory below your SP, you can expect FP to be right next to the saved LR, if you really need it.
Answering your last question: I don't know how you're verifying this code or how you're calling it (you need to paste the full code). You can look at the assembly of that function and see what's happening under the hood. Another thing you can do is follow this post as a template.

are 2^n exponent calculations really less efficient than bit-shifts?

if I do:
int x = 4;
pow(2, x);
Is that really that much less efficient than just doing:
1 << 4
?
Yes. An easy way to show this is to compile the following two functions that do the same thing and then look at the disassembly.
#include <stdint.h>
#include <math.h>
uint32_t foo1(uint32_t shftAmt) {
return pow(2, shftAmt);
}
uint32_t foo2(uint32_t shftAmt) {
return (1 << shftAmt);
}
cc -arch armv7 -O3 -S -o - shift.c (I happen to find ARM asm easier to read but if you want x86 just remove the arch flag)
_foo1:
# BB#0:
push {r7, lr}
vmov s0, r0
mov r7, sp
vcvt.f64.u32 d16, s0
vmov r0, r1, d16
blx _exp2
vmov d16, r0, r1
vcvt.u32.f64 s0, d16
vmov r0, s0
pop {r7, pc}
_foo2:
# BB#0:
movs r1, #1
lsl.w r0, r1, r0
bx lr
You can see that foo2 takes only two instructions plus the return, whereas foo1 takes several more. It has to move the data to the FP registers (vmov), convert the integer to a double (vcvt.f64.u32), call the exp2 function, then convert the answer back to an unsigned integer (vcvt.u32.f64) and move it from the FP registers back to the general-purpose registers.
Yes. Though by how much I can't say. The easiest way to determine that is to benchmark it.
The pow function uses doubles... At least, if it conforms to the C standard. Even if that function used bitshift when it sees a base of 2, there would still be testing and branching to reach that conclusion, by which time your simple bitshift would be completed. And we haven't even considered the overhead of a function call yet.
For equivalency, I assume you meant to use 1 << x instead of 1 << 4.
Perhaps a compiler could optimize both of these, but it's far less likely to optimize a call to pow. If you need the fastest way to compute a power of 2, do it with shifting.
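If you go the shifting route, it is worth guarding the two edge cases the plain expression leaves open. A minimal sketch (pow2_u32 is a made-up name, and returning 0 for out-of-range exponents is just one possible convention):

```c
#include <stdint.h>

/* Returns 2^n for n in 0..31, and 0 when the result does not fit in 32 bits. */
uint32_t pow2_u32(unsigned n)
{
    if (n >= 32u) {
        return 0;   /* shifting by the full width or more is undefined */
    }
    /* The operand is unsigned, so n == 31 is well defined as well. */
    return (uint32_t)1 << n;
}
```

With a plain int operand, 1 << 31 overflows a 32-bit signed int, which is why the sketch casts the 1 to uint32_t before shifting.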
Update... Since I mentioned it's easy to benchmark, I decided to do just that. I happen to have Windows and Visual C++ handy so I used that. Results will vary. My program:
#include <Windows.h>
#include <cstdio>
#include <cstdlib> // srand, rand, system
#include <cmath>
#include <ctime>
LARGE_INTEGER liFreq, liStart, liStop;
inline void StartTimer()
{
QueryPerformanceCounter(&liStart);
}
inline double ReportTimer()
{
QueryPerformanceCounter(&liStop);
double milli = 1000.0 * double(liStop.QuadPart - liStart.QuadPart) / double(liFreq.QuadPart);
printf( "%.3f ms\n", milli );
return milli;
}
int main()
{
QueryPerformanceFrequency(&liFreq);
const size_t nTests = 10000000;
int x = 4;
int sumPow = 0;
int sumShift = 0;
double powTime, shiftTime;
// Make an array of random exponents to use in tests.
const size_t nExp = 10000;
int e[nExp];
srand( (unsigned int)time(NULL) );
for( int i = 0; i < nExp; i++ ) e[i] = rand() % 31;
// Test power.
StartTimer();
for( size_t i = 0; i < nTests; i++ )
{
int y = (int)pow(2, (double)e[i%nExp]);
sumPow += y;
}
powTime = ReportTimer();
// Test shifting.
StartTimer();
for( size_t i = 0; i < nTests; i++ )
{
int y = 1 << e[i%nExp];
sumShift += y;
}
shiftTime = ReportTimer();
// The compiler shouldn't optimize out our loops if we need to display a result.
printf( "Sum power: %d\n", sumPow );
printf( "Sum shift: %d\n", sumShift );
printf( "Time ratio of pow versus shift: %.2f\n", powTime / shiftTime );
system("pause");
return 0;
}
My output:
379.466 ms
15.862 ms
Sum power: 157650768
Sum shift: 157650768
Time ratio of pow versus shift: 23.92
That depends on the compiler, but in general (when the compiler is not totally braindead) yes: the shift is one CPU instruction, while the other is a function call, which involves saving the current state and setting up a stack frame, and that requires many instructions.
Generally yes, as bit shift is very basic operation for the processor.
On the other hand, many compilers optimise the code so that raising 2 to a power ends up as just a bit shift.