How does an ELF file determine the offset values of each segment?

This is the command I ran:
readelf -l helloworld
And this is the output:
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000002d8 0x00000000000002d8 R 0x8
INTERP 0x0000000000000318 0x0000000000400318 0x0000000000400318
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000004d0 0x00000000000004d0 R 0x1000
LOAD 0x0000000000001000 0x0000000000401000 0x0000000000401000
0x00000000000001d5 0x00000000000001d5 R E 0x1000
LOAD 0x0000000000002000 0x0000000000402000 0x0000000000402000
0x0000000000000148 0x0000000000000148 R 0x1000
LOAD 0x0000000000002e10 0x0000000000403e10 0x0000000000403e10
0x0000000000000214 0x0000000000000218 RW 0x1000
DYNAMIC 0x0000000000002e20 0x0000000000403e20 0x0000000000403e20
0x00000000000001d0 0x00000000000001d0 RW 0x8
NOTE 0x0000000000000338 0x0000000000400338 0x0000000000400338
0x0000000000000020 0x0000000000000020 R 0x8
NOTE 0x0000000000000358 0x0000000000400358 0x0000000000400358
0x0000000000000044 0x0000000000000044 R 0x4
GNU_PROPERTY 0x0000000000000338 0x0000000000400338 0x0000000000400338
0x0000000000000020 0x0000000000000020 R 0x8
GNU_EH_FRAME 0x0000000000002020 0x0000000000402020 0x0000000000402020
0x000000000000003c 0x000000000000003c R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
GNU_RELRO 0x0000000000002e10 0x0000000000403e10 0x0000000000403e10
0x00000000000001f0 0x00000000000001f0 R 0x1
My question is, where do values like 0x0000000000000318 in the INTERP offset come from? And given that all the offset information for every segment exists, how can you extract those values exactly if you have the whole ELF as a vector of bytes?

where do values like 0x0000000000000318 in the INTERP offset come from?
From the program header table, whose offset can be found in the e_phoff field of the ELF header.
And if you can get all the offset information for every segment, how can you get those values exactly if you have all the hex in the elf as a vector?
By "hex in the elf as a vector" you probably mean "I have the entire contents of the file in memory".
The answer is: you cast the pointer to in-memory data to Elf32_Ehdr* or Elf64_Ehdr* as appropriate, and go from there.
This answer has sample code which should get you started.

Related

Why does u64::trailing_zeros() generate branched assembly when branchless works?

This function:
pub fn g(n: u64) -> u32 {
n.trailing_zeros()
}
generates assembly with a branch:
playground::g:
testq %rdi, %rdi
je .LBB0_1
bsfq %rdi, %rax
retq
.LBB0_1:
movl $64, %eax
retq
This alternative function:
pub fn g(n: u64) -> u32 {
if n == 0 { u32::MAX } else { n.trailing_zeros() }
}
generates assembly without a branch:
playground::g:
bsfq %rdi, %rcx
xorl %eax, %eax
cmpq $1, %rdi
sbbl %eax, %eax
orl %ecx, %eax
retq
It turns out that the branch gets created only when the constant returned is 64. Returning 0, or u32::MAX, or any other number generates branchless assembly.
Why is this? Is it just a quirk of the optimizer, or is there a reason?
I'm trying to create performant, branchless code.
Using Rust 1.65 release profile
trailing_zeros corresponds to the cttz LLVM intrinsic.
That intrinsic just so happens to compile to the following instructions on x86-64:
g: # #g
test rdi, rdi
je .LBB0_1
bsf rax, rdi
ret
.LBB0_1:
mov eax, 64
ret
The output of that intrinsic is the bit width of the integer when the input value is 0. LLVM is able to recognize the redundant operation and remove it, which is why u64::BITS or just 64 in your conditional result in the same machine code as just the intrinsic.
It appears that using any other number results in the compiler recognizing the intrinsic branch as dead code, which is therefore removed:
e: # #e
xor ecx, ecx
bsf rax, rdi
cmove eax, ecx
ret
Instead, a single conditional move is generated. I believe this variance in output is just a quirk of LLVM's x86-64 backend when certain intrinsics are involved.
You can reproduce the same discrepancy with C using clang. godbolt
It might be worth opening an LLVM issue for this, but only if the branchless version is actually better.
This LLVM issue may be related.

PCI Interrupt Not Assigned

The legacy interrupt assignment for a PCI interface is receiving interrupt 0.
We are evaluating the Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit. We have a PMC interface that is on a PCI-e carrier inserted into the PCI-e slot on the board.
When the driver is loaded the interrupt for the board is assigned interrupt 0 from the OS (Linux 16.0.4). Interrupt 0 is clearly not correct.
The device tree for PCI should be assigning interrupts. We do see the misc interrupt assigned, but the intx interrupt is not being reported, or rather is returning 0 from the OS.
How can we determine why the interrupt is not being reported? What changes can we make to determine where the problem lies?
Here is the device tree entry for pcie --
ZynqMP> fdt print /amba/pcie
pcie#fd0e0000 {
compatible = "xlnx,nwl-pcie-2.11";
status = "okay";
#address-cells = <0x00000003>;
#size-cells = <0x00000002>;
#interrupt-cells = <0x00000001>;
msi-controller;
device_type = "pci";
interrupt-parent = <0x00000004>;
interrupts = <0x00000000 0x00000076 0x00000004 0x00000000 0x00000075 0x00000004 0x00000000 0x00000074 0x00000004 0x00000000 0x00000073 0x00000004 0x00000000 0x00000072 0x00000004>;
interrupt-names = "misc", "dummy", "intx", "msi1", "msi0";
msi-parent = <0x00000023>;
reg = <0x00000000 0xfd0e0000 0x00000000 0x00001000 0x00000000 0xfd480000 0x00000000 0x00001000 0x00000080 0x00000000 0x00000000 0x01000000>;
reg-names = "breg", "pcireg", "cfg";
ranges = <0x02000000 0x00000000 0xe0000000 0x00000000 0xe0000000 0x00000000 0x10000000 0x43000000 0x00000006 0x00000000 0x00000006 0x00000000 0x00000002 0x00000000>;
interrupt-map-mask = <0x00000000 0x00000000 0x00000000 0x00000007>;
bus-range = <0x00000000 0x000000ff>;
interrupt-map = * 0x000000007ff8495c [0x00000060];
power-domains = <0x00000025>;
clocks = <0x00000003 0x00000017>;
xlnx,pcie-mode = "Root Port";
linux,phandle = <0x00000023>;
phandle = <0x00000023>;
legacy-interrupt-controller {
interrupt-controller;
#address-cells = <0x00000000>;
#interrupt-cells = <0x00000001>;
linux,phandle = <0x00000024>;
phandle = <0x00000024>;
};
};

Why do these two variables sync up in NASM

I am a beginner in NASM and I have encountered something I cannot understand. Given this code:
global main
extern printf
section .text
main:
mov qword [VAR_0], 1 ; Init first variable
mov qword [VAR_1], 2 ; Init second variable
mov rdi, format ; Print first variable -> outputs 2
mov rsi, [VAR_0]
mov eax, 0
call printf
mov rdi, format ; Print second variable -> outputs 2
mov rsi, [VAR_1]
mov eax, 0
call printf
section .bss
VAR_0: resq 0
VAR_1: resq 0
section .data
format db "%d", 10, 0
Why does the program output
2
2
Instead of
1
2
I am compiling it with
nasm -felf64 test.s
gcc test.o
And simply running it as
./a.out
I am at my wits' end with this.
The problem is that you are misusing the resq directive. The proper use is:
IDENTIFIER: resq number_of_quadwords_to_reserve
In your case you have:
VAR_0: resq 0
This reserves a total of zero quadwords, so VAR_0 and VAR_1 resolve to the same address and the second store overwrites the first. Changing each of these to:
VAR_0: resq 1
VAR_1: resq 1
reserves one quadword per variable and will correct the behavior that you are observing.

Assembly variables

I am new to assembly and am confused by how some variables magically obtain values from nowhere, like in this code I have (the program shifts all entered symbols up by one ASCII code):
.model small
.stack 100h
.data
Enterr db 10, 13, "$"
buffer db 255
number db ?
symb db 255 dup (?)
.code
START:
MOV ax, @data
MOV ds, ax
MOV ah, 10
MOV dx, offset buffer
INT 21h
MOV ah, 9
MOV dx, offset ENTERR
INT 21h
MOV bx, offset symb
MOV cl, number
MOV ch, 0
CMP cx, 0
JE terminate
cycle:
INC byte ptr [bx]
INC bx
LOOP cycle
MOV byte ptr [bx], '$'
MOV ah, 9
MOV dx, offset symb
INT 21h
terminate:
MOV ah, 4Ch
MOV al, 0
INT 21h
END START
Just before the loop, cx has the number of symbols entered, and the cycle takes place from there on. This value of cx was obtained when the variable "number" was copied to cl. How did the variable "number" obtain such a value? Replacing
MOV cl, number
with
MOV cl, [number]
does not affect the program. Why is that? Does every variable defined by
variable db ?
have the same value, i.e. the number of symbols entered? (I am using TASM)

Assembly $ - operator [duplicate]

This question already has an answer here:
What is $ in nasm assembly language? [duplicate]
(1 answer)
Closed 9 years ago.
I came across this following code:
SYS_EXIT equ 1
SYS_WRITE equ 4
STDIN equ 0
STDOUT equ 1
section .text
global _start ;must be declared for using gcc
_start: ;tell linker entry point
mov eax, SYS_WRITE
mov ebx, STDOUT
mov ecx, msg1
mov edx, len1
int 0x80
mov eax, SYS_WRITE
mov ebx, STDOUT
mov ecx, msg2
mov edx, len2
int 0x80
mov eax, SYS_WRITE
mov ebx, STDOUT
mov ecx, msg3
mov edx, len3
int 0x80
mov eax,SYS_EXIT ;system call number (sys_exit)
int 0x80 ;call kernel
section .data
msg1 db 'Hello, programmers!',0xA,0xD
len1 equ $ - msg1
msg2 db 'Welcome to the world of,', 0xA,0xD
len2 equ $ - msg2
msg3 db 'Linux assembly programming! '
len3 equ $- msg3
With intuition I can make out that len1, len2 and len3 are variables holding the lengths of the three strings, and that the $ - operator is fetching those lengths.
But I am not able to understand how the syntax to find the length works. Can anyone please tell me how it does, and give me links for further reading to understand this concept?
Thanks in advance...
$ evaluates to the "current address", so $ - msg1 means "the current address minus the address with the label msg1". This calculates the length of the string that starts at msg1.
Your snippet looks like it might be NASM. Is it? Anyway, NASM has documentation of its special tokens $ and $$.