I have a problem with missing symbols when linking static libraries and .o files into a shared library. I have checked the symbol table of the static library, and the functions I need are listed there normally, like this:
...
00000000 g F .text 000000b0 av_int2dbl
...
000000b0 g F .text 00000060 av_int2flt
But when I generate the shared library, av_int2dbl, av_int2flt and some other functions are missing, even though they are all listed normally in the static library's symbol table. As a crude workaround, I added a dummy function to a .o file and referenced the missing functions from it. After that, the DYNAMIC SYMBOL TABLE of the shared library gained some of the functions that were missing before, but strangely av_int2dbl and av_int2flt are still missing.
Could anybody tell me by what rule symbols are removed when a shared library is generated?
If ld removes all unreferenced symbols, why do functions defined in the .o files (which are not referenced from anywhere else) still exist in the shared library? And why, although av_int2dbl and av_int2flt are called explicitly in the dummy function, does the disassembly contain no calls to these two functions?
Below is the dummy function defined in the .o file:
int my_dummy_funcs(void)
{
av_rdft_init(0x01,0x1);
av_rdft_calc(NULL, NULL);
av_rdft_end(NULL);
av_int2dbl(1);
av_int2flt(1);
av_resample(NULL,NULL,NULL,NULL,0,0,0);
av_resample_close(NULL);
av_resample_init(0,0,0,0,0,1.0);
return 0;
}
The disassembly of the dummy function is as follows:
0008951c <my_dummy_funcs>:
8951c: e3a00001 mov r0, #1
89520: e92d40d0 push {r4, r6, r7, lr}
89524: e1a01000 mov r1, r0
89528: e3a04000 mov r4, #0
8952c: e24dd010 sub sp, sp, #16
89530: eb03a21e bl 171db0 <av_rdft_init>
89534: e1a01004 mov r1, r4
89538: e1a00004 mov r0, r4
8953c: e3a06000 mov r6, #0
89540: eb03a22f bl 171e04 <av_rdft_calc>
89544: e1a00004 mov r0, r4
89548: eb03a231 bl 171e14 <av_rdft_end>
8954c: e1a01004 mov r1, r4
89550: e1a02004 mov r2, r4
89554: e1a03004 mov r3, r4
89558: e1a00004 mov r0, r4
8955c: e58d4000 str r4, [sp]
89560: e58d4004 str r4, [sp, #4]
89564: e3a07000 mov r7, #0
89568: e58d4008 str r4, [sp, #8]
8956c: eb0e4a62 bl 41befc <av_resample>
89570: e1a00004 mov r0, r4
89574: e3437ff0 movt r7, #16368 ; 0x3ff0
89578: eb0e4a4a bl 41bea8 <av_resample_close>
8957c: e1a00004 mov r0, r4
89580: e1a01004 mov r1, r4
89584: e1a02004 mov r2, r4
89588: e1a03004 mov r3, r4
8958c: e58d4000 str r4, [sp]
89590: e1cd60f8 strd r6, [sp, #8]
89594: eb0e495f bl 41bb18 <av_resample_init>
89598: e1a00004 mov r0, r4
8959c: e28dd010 add sp, sp, #16
895a0: e8bd80d0 pop {r4, r6, r7, pc}
but when i generate shared library, av_int2dbl and av_int2flt and some else functions missed
The most likely reason: they are marked as HIDDEN in the regular symbol table, and that tells the linker to not export them in the dynamic symbol table.
You can verify this hypothesis by running
readelf -s libfoo.a | grep av_int2dbl
(and learn to use readelf instead of objdump on ELF platforms).
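To illustrate what hidden visibility means at the source level, here is a minimal sketch. It assumes GCC or Clang attribute syntax, and the function names are hypothetical; it is not taken from the library in the question. When such a file is linked into a shared library, a function marked hidden stays callable from inside the library but is normally left out of the dynamic symbol table, and readelf -s reports HIDDEN in its visibility column.

```c
/* Sketch only: GCC/Clang-specific attributes, hypothetical function names. */

/* Default visibility: eligible for export in the dynamic symbol table. */
__attribute__((visibility("default")))
int exported_add(int a, int b)
{
    return a + b;
}

/* Hidden visibility: usable inside the library, not exported from it. */
__attribute__((visibility("hidden")))
int hidden_add(int a, int b)
{
    return a + b;
}
```

Both functions behave identically when called from within the same library; only what the dynamic linker can see differs. Building with -fvisibility=hidden makes hidden the default for every symbol that is not explicitly marked otherwise.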
Related
I have written code for the Raspberry Pi Pico. The program uses one LED and two buttons: one button turns the LED on and the other turns it off. I am pretty new to the Raspberry Pi, so I don't know much. I am using a virtual machine to run cmake and make, but I can't turn my code into a uf2 file, because I have not defined my link_gpio_get function in the sdlink.c file. I don't know how to do that, so the build is failing due to an undefined reference...
.EQU LED_PIN1, 0
.EQU BUT_PIN1, 1
.EQU BUT_PIN2, 2
.EQU GPIO_IN, 0
.EQU GPIO_OUT, 1
.thumb_func
.global main
main:
MOV R0, #LED_PIN1
BL gpio_init
MOV R0, #LED_PIN1
MOV R1, #GPIO_OUT
BL link_gpio_set_dir @ Initialize PIN1
MOV R0, #BUT_PIN1
BL gpio_init
MOV R0, #BUT_PIN1
MOV R1, #GPIO_IN
BL link_gpio_set_dir
MOV R0, #BUT_PIN2
BL gpio_init
MOV R0, #BUT_PIN2
MOV R1, #GPIO_IN
BL link_gpio_set_dir
wait_on:
MOV R0, #BUT_PIN1 @ Wait for turn on button
BL link_gpio_get
CMP R0, #1
BEQ turn_on
B wait_on
turn_on:
MOV R0, #LED_PIN1
MOV R1, #1
BL link_gpio_put @ Turn on led
B wait_off
turn_off:
MOV R0, #LED_PIN1
MOV R1, #0
BL link_gpio_put @ Turn off led
B wait_on
wait_off:
MOV R0, #BUT_PIN2 @ Wait for off
BL link_gpio_get
CMP R0, #1
BEQ turn_off
B wait_off
Here is my sdlink.c file
/* C wrapper functions for the RP2040 SDK
* Inline functions gpio_set_dir and gpio_put.
*/
#include "hardware/gpio.h"
void link_gpio_set_dir(int pin, int dir)
{
gpio_set_dir(pin, dir);
}
void link_gpio_put(int pin, int value)
{
gpio_put(pin, value);
}
I've been working on producing uf2 output using cmake on Windows 10. After watching a YouTube video, reviewing a Hackster article, and making my own edits, I was able to get it working.
I'm not sure what OS you are using, but hopefully these links and my edit can help you identify the issue with your project.
https://www.youtube.com/watch?v=mUF9xjDtFfY
https://www.hackster.io/lawrence-wiznet-io/how-to-setup-raspberry-pi-pico-c-c-sdk-in-window10-f2b816
The following is the edit that allowed me to build and produce output. I hope it helps!
After you've cloned the pico-examples project, navigate to the pico-examples directory. I opened pico_sdk_import.cmake in a text editor and changed line 6 from if (DEFINED ENV{PICO_SDK_PATH} AND (NOT PICO_SDK_PATH)) to if (DEFINED ENV{PICO_SDK_PATH})
If you can provide a link to where you obtained the code you posted, maybe I can help further to figure out what sdlink.c should contain.
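In the meantime, here is a minimal sketch of what the missing wrapper could look like. It follows the pattern of the two wrappers already in sdlink.c and is an assumption, not the original author's file. gpio_get is the SDK's inline function for reading a pin. The host-only stand-in and the HAVE_PICO_SDK macro are hypothetical additions so the sketch also compiles without the SDK.

```c
#include <stdbool.h>

/* Use the real SDK header when it is available on the include path. */
#if defined(__has_include)
#  if __has_include("hardware/gpio.h")
#    include "hardware/gpio.h"
#    define HAVE_PICO_SDK 1
#  endif
#endif

#ifndef HAVE_PICO_SDK
/* Host-only stand-in (assumption for illustration): pretends that
 * only GPIO 1 reads high, so the wrapper can be tried off-target. */
static bool gpio_get(unsigned int gpio)
{
    return gpio == 1u;
}
#endif

/* Wrapper callable from assembly: returns 1 if the pin reads high,
 * 0 otherwise, which matches the "CMP R0, #1" test in the main loop. */
int link_gpio_get(int pin)
{
    return gpio_get((unsigned int)pin) ? 1 : 0;
}
```

With a definition like this in sdlink.c, the BL link_gpio_get instructions in the assembly file have a symbol to resolve against at link time.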
The relationship between the variable size and the data bus size was confusing for me so I decided to get to the bottom of it by examining the assembly code.
I compiled the source code below in the STM32CubeIDE Version 1.2.0.
#define BUFFER_SIZE ((uint8_t)0x20)
uint8_t aTxBuffer[BUFFER_SIZE];
int i;
for(i=0; i<BUFFER_SIZE; i++){
aTxBuffer[i]=0xFF; /* TxBuffer init */
}
Looking at the assembly code confirmed my suspicion. Unless I misunderstood it grossly, this code will allocate an array with total size of BUFFER_SIZE * DATA_BUS_SIZE (Which is 32 bits on Cortex-M) but we will use only the least significant byte of each memory address.
for(i=0; i<BUFFER_SIZE; i++)
//reset i to 0
800051c: 4b09 ldr r3, [pc, #36] ; (8000544 <main+0x3c>)
800051e: 2200 movs r2, #0
8000520: 601a str r2, [r3, #0]
8000522: e009 b.n 8000538 <main+0x30>
{
//store 0xFF in each member of TxBuffer
aTxBuffer[i]=0xFF; /* TxBuffer init */
8000524: 4b07 ldr r3, [pc, #28] ; (8000544 <main+0x3c>)
8000526: 681b ldr r3, [r3, #0]
8000528: 4a07 ldr r2, [pc, #28] ; (8000548 <main+0x40>)
800052a: 21ff movs r1, #255 ; 0xff
800052c: 54d1 strb r1, [r2, r3]
for(i=0; i<BUFFER_SIZE; i++)
//increment i
800052e: 4b05 ldr r3, [pc, #20] ; (8000544 <main+0x3c>)
8000530: 681b ldr r3, [r3, #0]
8000532: 3301 adds r3, #1
8000534: 4a03 ldr r2, [pc, #12] ; (8000544 <main+0x3c>)
8000536: 6013 str r3, [r2, #0]
//compare i with 31; jump to 8000524 if i is lower or the same (i <= 31)
8000538: 4b02 ldr r3, [pc, #8] ; (8000544 <main+0x3c>)
800053a: 681b ldr r3, [r3, #0]
800053c: 2b1f cmp r3, #31
800053e: d9f1 bls.n 8000524 <main+0x1c>
//pointer to i in SRAM
8000544: 2000002c .word 0x2000002c
//pointer to TxBuffer in SRAM
8000548: 20000064 .word 0x20000064
As SRAM is at a premium in embedded devices, I believe there must be some clever way to optimize its usage. One naive solution I can think of is to allocate the buffer as uint32_t and use bit shifting to access the higher bytes, but this seems costly in terms of speed. What is the recommended practice here?
Bus size does not matter in this case. Memory usage will be the same.
Some Cortex cores do not allow unaligned access. What is unaligned access? An unaligned memory access occurs when you try to access (as a single operation) N bytes of data starting from an address that is not evenly divisible by N (i.e. addr % N != 0). In our case N can be 1, 2 or 4.
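The definition above can be written as a small check. This is a sketch with a hypothetical helper name, not something from any vendor library:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if accessing n bytes at addr as a single operation would be
 * unaligned, i.e. addr is not evenly divisible by n. n is 1, 2 or 4. */
int is_unaligned(uintptr_t addr, size_t n)
{
    return (addr % n) != 0;
}
```

A byte access (N = 1) can never be unaligned, which is why a uint8_t array can be stored byte by byte at any address on any of these cores.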
Your example should be analyzed with optimizations turned on.
#define BUFFER_SIZE ((uint8_t)0x20)
uint8_t aTxBuffer[BUFFER_SIZE];
void init(uint8_t x)
{
for(int i=0; i<BUFFER_SIZE; i++)
{
aTxBuffer[i]=x;
}
}
The STM32F0, which does not allow unaligned access, has to store the data byte by byte:
init:
ldr r3, .L5
movs r2, r3
adds r2, r2, #32
.L2:
strb r0, [r3]
adds r3, r3, #1
cmp r3, r2
bne .L2
bx lr
.L5:
.word aTxBuffer
but the STM32F4 stores full 32-bit words (4 bytes) at a time, which is faster (fewer operations):
init:
movs r3, #0
bfi r3, r0, #0, #8
bfi r3, r0, #8, #8
ldr r2, .L3
bfi r3, r0, #16, #8
bfi r3, r0, #24, #8
str r3, [r2] @ unaligned
str r3, [r2, #4] @ unaligned
str r3, [r2, #8] @ unaligned
str r3, [r2, #12] @ unaligned
str r3, [r2, #16] @ unaligned
str r3, [r2, #20] @ unaligned
str r3, [r2, #24] @ unaligned
str r3, [r2, #28] @ unaligned
bx lr
.L3:
.word aTxBuffer
The SRAM consumption is exactly the same in both cases.
The given code does not use more than BUFFER_SIZE*8 bits for aTxBuffer.
Note the following line in your assembly
800052c: 54d1 strb r1, [r2, r3]
Note the b suffix to the instruction here, indicating 'byte'.
In effect, the instruction translates to 'store 1 byte of value 0xFF (stored in r1) at aTxBuffer (stored in r2) + i (stored in r3)'.
So, while the assembly doesn't indicate the end of the buffer, it certainly accesses all bytes in the aTxBuffer array without any waste.
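The same point can be checked from C without reading any assembly. The sketch below reuses the declaration from the question; the two helper functions are hypothetical names added for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define BUFFER_SIZE ((uint8_t)0x20)
uint8_t aTxBuffer[BUFFER_SIZE];

/* Total number of bytes the array occupies in memory. */
size_t buffer_bytes(void)
{
    return sizeof aTxBuffer;
}

/* Distance in bytes between two neighbouring elements. */
size_t element_stride(void)
{
    return (size_t)((uintptr_t)&aTxBuffer[1] - (uintptr_t)&aTxBuffer[0]);
}
```

The array occupies 32 bytes and its elements sit 1 byte apart, regardless of how wide the data bus is.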
It's possible that your minimal example doesn't capture the problem you face in your actual code but I find it unlikely that the compiler will have such wasted bytes, especially one for an embedded device.
If you do find that to be the case, you can simply allocate a uint32_t array of the same size in bits (or one element larger) and cast the address of its first element to a uint8_t pointer. You can then access the individual bytes through that pointer as normal.
Note that such programming should be avoided and is only shown as an example. Specifically, this makes it difficult for compilers to analyze pointer aliasing which makes some optimizations difficult. It also creates some burden on the user; careful memory management will be required to avoid mistakes (for example, you should free only one of these pointers to avoid a double-free error).
Example:
#include <stdint.h>
#define BUFFSIZE 0x20
// number of elements in int32 will be BUFFSIZE / 4
#define BUFFSIZE_IN_INT_32 (BUFFSIZE >> 2)
// allocate the buffer
uint32_t uint32_array[BUFFSIZE_IN_INT_32];
// point to 1 byte sized elements
uint8_t * aTxBuffer = (uint8_t *)(uint32_array);
// use aTxBuffer as you like
Note that I assume BUFFSIZE is divisible by 4. If that is not the case, increase BUFFSIZE_IN_INT_32 by 1.
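A rounding-up division handles both cases without a manual adjustment. This is a sketch; the WORDS_FOR_BYTES macro is a hypothetical name, and the buffer size is deliberately chosen so that it is not divisible by 4.

```c
#include <stdint.h>

/* Hypothetical size, deliberately not a multiple of 4. */
#define BUFFSIZE 0x21

/* Number of 32-bit words needed to hold n bytes, rounded up. */
#define WORDS_FOR_BYTES(n) (((n) + 3u) / 4u)

/* Word-sized backing storage, large enough for BUFFSIZE bytes. */
uint32_t uint32_array[WORDS_FOR_BYTES(BUFFSIZE)];

/* Byte-wise view of the same storage. */
uint8_t *aTxBuffer = (uint8_t *)uint32_array;
```

For 0x20 bytes this yields 8 words, and for 0x21 bytes it yields 9 words, so the last partial word is always covered.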
I'm trying to have a set of operations executed on different targets such as ARM, Bfin... but every time I write some simple C code and compile it, each operation gets two loads and one store, which is unnecessary:
ldr r2, [fp, #-24]
ldr r3, [fp, #-28]
add r3, r2, r3
str r3, [fp, #-20]
ldr r2, [fp, #-36]
ldr r3, [fp, #-40]
add r3, r2, r3
str r3, [fp, #-32]
ldr r2, [fp, #-44]
ldr r3, [fp, #-48]
add r3, r2, r3
str r3, [fp, #-20]
ldr r3, [fp, #-16]
add r3, r3, #1
str r3, [fp, #-16]
When I turn on any optimization options, even -O1, it simply calculates the result and stores it in the output:
subl $24, %esp
movl $4, 4(%esp)
movl $.LC0, (%esp)
Is there any way I can have the operations performed without fetching the same variables over and over again? I've tried gcc -fgcse-lm and -fgcse-sm, but that didn't work.
It depends on the operation. GCC can't figure out a high-level optimization for
int a(int b, int c)
{
b-=c;
c-=b;
b-=c;
c-=b;
b-=c;
c-=b;
return c;
}
If you want to do benchmarking and avoid constant folding and dead code elimination of the optimizer in gcc, you need to use non-constants as input and make sure the result goes somewhere.
For instance, instead of using
int main(int argc, char** argv) {
int a = 1;
int b = 2;
start_clock();
int c = a + b;
int d = c + a;
int e = d + b;
stop_clock();
output_time_needed();
return 0;
}
You should use something like
int main(int argc, char** argv) {
int a = argc;
int b = argc + 1;
start_clock();
int c = a + b;
int d = c + a;
int e = d + b;
stop_clock();
output_time_needed();
return e;
}
I am writing an OS and trying to use the PIT. I have written a handler and an ISR entry for IRQ0 (interrupt 32), but the handler is not being called at all. I am pretty sure I am not setting up the ISR entry correctly. Any suggestions? Here is my ASM code:
mov dword EAX, irq_common_stub
mov byte [_NATIVE_IDT_Contents + 0x100], AL
mov byte [_NATIVE_IDT_Contents + 0x101], AH
mov byte [_NATIVE_IDT_Contents + 0x102], 0x8
mov byte [_NATIVE_IDT_Contents + 0x105], 0x8E
shr dword EAX, 0x10
mov byte [_NATIVE_IDT_Contents + 0x106], AL
mov byte [_NATIVE_IDT_Contents + 0x107], AH
My code to init the PIT is
public static void PIT_Init(uint frequency)
{
uint divisor = 1193180 / frequency;
GruntyOS.IO.Ports.Outb(0x43, 0x36);
byte l = (byte)(divisor & 0xFF);
byte h = (byte)((divisor >> 8) & 0xFF);
GruntyOS.IO.Ports.Outb(0x40, l);
GruntyOS.IO.Ports.Outb(0x40, h);
}
The handler is
public static void HandlePIT()
{
GruntyOS.IO.Ports.Outb(0xA0, 0x20);
GruntyOS.IO.Ports.Outb(0x20, 0x20);
print("Tick: " + Tick.ToString());
Tick++;
}
Which is called from
irq_common_stub:
pusha
mov ax, ds
push eax
mov ax, 0x10
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
call System_Void__GruntyOS_Entry_HandlePIT__
pop ebx
mov ds, bx
mov es, bx
mov fs, bx
mov gs, bx
popa
add esp, 8
sti
iret
Maybe this will help. It's a simple kernel that is capable of handling IRQs and exceptions.
http://www.osdever.net/bkerndev/Docs/irqs.htm
http://www.ni.com/white-paper/2874/en
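For checking the entry itself, here is a sketch of the byte layout of one 32-bit protected-mode IDT gate, written the same byte-by-byte way as the assembly in the question. The function name and the flat byte array are assumptions for illustration. Each gate is 8 bytes, so vector 32 starts at offset 32 * 8 = 0x100, which matches the offsets used in the question.

```c
#include <stdint.h>

/* Writes one 8-byte interrupt gate into a raw IDT image.
 *   bytes 0-1: handler address bits 0..15
 *   bytes 2-3: code segment selector
 *   byte  4  : reserved, must be zero
 *   byte  5  : type and attributes (0x8E = present, ring 0,
 *              32-bit interrupt gate)
 *   bytes 6-7: handler address bits 16..31 */
void idt_write_gate(uint8_t *idt, unsigned vector, uint32_t handler,
                    uint16_t selector, uint8_t type_attr)
{
    uint8_t *g = idt + vector * 8u;

    g[0] = (uint8_t)(handler & 0xFFu);
    g[1] = (uint8_t)((handler >> 8) & 0xFFu);
    g[2] = (uint8_t)(selector & 0xFFu);
    g[3] = (uint8_t)(selector >> 8);
    g[4] = 0;
    g[5] = type_attr;
    g[6] = (uint8_t)((handler >> 16) & 0xFFu);
    g[7] = (uint8_t)((handler >> 24) & 0xFFu);
}
```

The assembly in the question never writes bytes 0x103 and 0x104, so it relies on the table having been zeroed beforehand. The sketch covers only the gate layout, not loading the table with lidt or remapping and unmasking the PIC so that IRQ0 arrives at vector 32.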
I'm new to programming in any sort of assembly, and since I've heard that NASM-style assembly for Linux is simpler than DOS-based assembly, I decided to give it a try.
This is my program thus far:
section .data
opening: db 'Opening file...',10
openingLen: equ $-opening
opened: db 'File opened.',10
openedLen: equ $-opened
bad_params: db 'Usage: writeFile filename.ext',10
bad_paramsLen: equ $-bad_params
not_opened: db 'Unable to open file. Halted.',10
not_openedLen: equ $-not_opened
hello: db 'Hello, this is written to a file'
helloLen: equ $-hello
success: db 'Successfully wrote to file.',10
successLen: equ $-success
section .bss
file: resd 1
section .text
global _start:
_start:
pop ebx ; pop number of params
test ebx,2 ; make sure there are only 2
jne bad_param_list
pop ebx
mov eax,4 ; write out opening file msg
mov ebx,1
mov ecx,opening
mov edx,openingLen
int 80h
mov eax,5 ; open file
pop ebx
mov ecx,64
mov edx,777o ; permissions of file
int 80h
mov dword [file],eax
test dword [file],0
jle bad_open
mov eax,4 ; write successful open message
mov ebx,1
mov ecx,opened
mov edx,openedLen
int 80h
mov ebx,file ; write to file (4 already in eax)
mov ecx,hello
mov edx,helloLen
int 80h
mov eax,6 ; close file
mov ebx,file
int 80h
mov eax,4 ; write successfully written msg
mov ebx,1
mov ecx,success
mov edx,successLen
int 80h
mov eax,1 ; exit
mov ebx,0
int 80h
bad_param_list:
mov eax,4 ; write that params are bad
mov ebx,1
mov ecx,bad_params
mov edx,bad_paramsLen
int 80h
mov eax,1 ; exit with code 1
mov ebx,1
int 80h
bad_open:
mov eax,4 ; write that we couldn't open the file
mov ebx,1
mov ecx,not_opened
mov edx,not_openedLen
int 80h
mov eax,1 ; exit with code 2
mov ebx,2
int 80h
The goal is to write a string of text to a file without library functions; I'm only using the Linux kernel. I had a few problems with missing brackets here and there, and all the rest of mistakes that you'd expect from a noob to assembly, but I think this is mostly under control now.
Here's my issue: From what I know, the first four lines of this program should pop the number of arguments off the stack, jump to bad_param_list if there is not only one parameter (aside from the program name), and pop the program name off the stack.
But this is not what happens. Here's some sample I/O, reformatted for clarity:
$./writeFile
Opening file...
Unable to open file. Halted.
$./writeFile x
Usage: writeFile filename.ext
$./writeFile x x
Usage: writeFile filename.ext
$./writeFile x x x
Opening file...
Unable to open file. Halted.
$./writeFile x x x x
Opening file...
Unable to open file. Halted.
$./writeFile x x x x x
Usage: writeFile filename.ext
$./writeFile x x x x x x
Usage: writeFile filename.ext
What I've noticed is that if you take the number of arguments including the name of the program, divide by 2, and discard the decimal, if the answer is odd, you'll get my usage error, but if the answer is even, you'll get the unable to open error. This is true up until at least 10 arguments!
How the heck did I manage to do this? And how do I get it to have the expected result?
Instead of
test ebx,2
you want
cmp ebx,2
test performs a bitwise AND between the arguments and throws the result away, except for setting the flags. So, in particular ZF will be set if the two arguments have no 1-bits in positions that match. (In your particular case, this works out as setting ZF to the complement of the second-to-lowest bit of ebx).
Conversely cmp subtracts its arguments and throws away the result after setting flags. In that case, ZF will be set if the two arguments are equal.
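The difference can be modelled in a few lines of C. This is a sketch with hypothetical helper names that mimics only how the zero flag is set; it does not emulate the other flags.

```c
/* ZF after "test a, b": set when the bitwise AND is zero. */
int zf_after_test(unsigned a, unsigned b)
{
    return (a & b) == 0;
}

/* ZF after "cmp a, b": set when the subtraction yields zero,
 * that is, when the operands are equal. */
int zf_after_cmp(unsigned a, unsigned b)
{
    return (a - b) == 0;
}
```

With the argument count in ebx, test ebx,2 sets ZF for counts 1, 4 and 5 and clears it for 2, 3, 6 and 7. Since jne jumps when ZF is clear, this reproduces the alternating pattern in the sample runs. cmp ebx,2 sets ZF only when the count is exactly 2.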