Some questions about ELF file format

I am trying to learn how ELF files are structured and perhaps how to make one manually.
I am working on an aarch64 Linux OS, and the ELF files I am inspecting are in elf64-littleaarch64 format.
I am trying to learn this on my own, but I have got stuck on a few questions...
When I run xxd code, the first number in each line of the output gives the offset of the bytes within the file. But when I run objdump -D code, the first number is something like 4000b0, which corresponds to 000000b0 in xxd. Why is there a 4 at the beginning?
In objdump, the bytecode is, for example, 11000a94, which 'means'
add w20, w20, #2 in assembly. I know that 11 is the opcode, but what does 000a94 mean? I thought it should be the operands, but I am adding the value 2 and can't find the number 2 anywhere in it.
If you have a good article to recommend, or can explain this to me, I will be very grateful!

xxd shows the offset of the bytes within the file on disk. objdump -D shows (nominally) the address in memory where those bytes will be loaded when the program is run. It is common for the two to differ by a round number. In particular, 0x400000 may correspond to a single higher-level page-table entry; see Why Linux/gnu linker chose address 0x400000?, which is about x86-64, but I think ARM64 is similar (I haven't checked). It has nothing to do with the fact that 0x40 is ASCII '@'; that's just a coincidence.
Note that if ASLR is in use, the actual memory address will be chosen randomly every time the program is run and will not match what objdump shows you, though the difference will still be a multiple of the page size.
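To see that mapping concretely, you can read the program headers yourself. Below is a minimal Python sketch (not part of the original answer; it assumes a 64-bit little-endian ELF such as the elf64-littleaarch64 file from the question, and takes the file name as its first argument) that prints the file offset and load address of each PT_LOAD segment; the difference between the two columns is exactly the round offset seen between xxd and objdump.

#!/usr/bin/env python3
# Sketch: dump p_offset vs p_vaddr for each PT_LOAD segment of a 64-bit
# little-endian ELF, to show how file offsets map to load addresses.
import struct, sys

with open(sys.argv[1], "rb") as f:      # e.g. the 'code' binary from the question
    data = f.read()

assert data[:4] == b"\x7fELF" and data[4] == 2   # this sketch handles ELF64 only

e_phoff = struct.unpack_from("<Q", data, 0x20)[0]      # program header table offset
e_phentsize = struct.unpack_from("<H", data, 0x36)[0]  # size of one program header
e_phnum = struct.unpack_from("<H", data, 0x38)[0]      # number of program headers

PT_LOAD = 1
for i in range(e_phnum):
    base = e_phoff + i * e_phentsize
    p_type, _p_flags = struct.unpack_from("<II", data, base)
    p_offset, p_vaddr = struct.unpack_from("<QQ", data, base + 8)
    if p_type == PT_LOAD:
        print(f"LOAD: file offset {p_offset:#010x} -> load address {p_vaddr:#010x}")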

Well, I was too quick to ask this question, but now I will answer it too.
The 40 at the beginning of the addresses in objdump is the hex representation of the char '@', which means 'at' and points to an address, very simple!
Little Endian has CPU addresses stored in 5 bits instead of 6 or 8. That means that I should look at the binary value of the objdump code: 11000a94 --> 10001000000000000101010010100, which can be divided into [10001][00000000000010][10100][10100], i.e. [opcode][value][first address][second address].
Both answers are wrong; see the accepted answer.
I will still leave them here, though.
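As for what the rest of 11000a94 means: A64 instructions are fixed 32-bit words, so 11 by itself is not the whole opcode. For ADD (immediate) the word breaks down into sf/op/S (bits 31-29), the fixed bits 100010 (bits 28-23), a shift flag (bit 22), a 12-bit immediate (bits 21-10), Rn (bits 9-5) and Rd (bits 4-0); the missing 2 sits in the immediate field. A short Python sketch decoding 0x11000a94 along those lines (an illustration based on the Arm A64 encoding, not the accepted answer itself):

# Decode 0x11000a94 using the A64 ADD (immediate) field layout.
insn = 0x11000a94                 # stored in the file as the bytes 94 0a 00 11

sf    = insn >> 31                # 0 -> 32-bit (w) registers
fixed = (insn >> 23) & 0x3f       # 0b100010 -> add/sub immediate class
sh    = (insn >> 22) & 1          # immediate shift flag
imm12 = (insn >> 10) & 0xfff      # the immediate operand
rn    = (insn >> 5) & 0x1f        # source register number
rd    = insn & 0x1f               # destination register number

print(f"fixed={fixed:06b} sh={sh} imm12={imm12} rn=w{rn} rd=w{rd}")
# fixed=100010 sh=0 imm12=2 rn=w20 rd=w20  ->  add w20, w20, #2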

Related

Raku operator for 2's complement arithmetic?

I sometimes use this:
$ perl -e "printf \"%d\", ((~18446744073709551592)+1)"
24
I can't seem to do it with Raku. The best I could get is:
$ raku -e "say +^18446744073709551592"
-18446744073709551593
So: how can I make Raku give me the same answer as Perl?
Gotta go with (my variant¹ of) Liz's custom op (in her comment below).
sub prefix:<²^>(uint $a) { (+^ $a) + 1 }
say ²^ 18446744073709551592; # 24
My original "semi-educated wild guess"² that turned out to be acceptable to #zentrunix and the basis for Liz's op:
say (+^ my uint $ = 18446744073709551592) + 1; # 24
\o/ It works!³
Footnotes
¹ I flipped the two-character op because I wanted to follow the +^ form, have it sub-vocalize as "two's complement", and avoid it looking like ^2.
² One line of thinking was about the particular integer. I saw that 18446744073709551592 is close to 2**64. Another was that integers are limited precision in Perl unless you do something to make them otherwise, whereas in Raku they are arbitrary precision unless you do something to make them otherwise. A third line of thinking came from reading the doc for prefix +^ which says "converts the number to binary using as many bytes as needed" which I interpreted as meaning that the representation is somehow important. Hmm. What if I try an int variable? Overflow. (Of course.) uint? Bingo.
³ I've no idea if this solution is right for the wrong reasons. Or even worse. One thing that's concerning is that uint in Raku is defined to correspond to the largest native unsigned integer size supported by the Raku compiler used to compile the Raku code. (Iirc.) In practice today this means Rakudo and whatever underlying platform is being targeted, and I think that almost certainly means C's uint64_t in almost all cases. I imagine perl has some similar platform-dependent definition. So my solution, if it is a reasonable one, is presumably only portable to the degree that the Raku compiler (which in practice today means Rakudo) agrees with the perl binary (which in practice today means P5P's perl) when run on some platform. See also @p6steve's comment below.
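The arithmetic behind footnote ² is easy to check independently: at a fixed width of 64 bits, the two's complement of n is 2**64 - n, or equivalently "invert and add one" truncated to 64 bits. A small Python sketch (an illustration only, separate from both the Perl and Raku one-liners):

n = 18446744073709551592
MASK = (1 << 64) - 1        # 64-bit word, matching Perl's native integers / Raku's uint

print(2**64 - n)            # 24
print((~n + 1) & MASK)      # 24: invert, add one, truncate to 64 bits
print((n - 1) ^ MASK)       # 24: subtract one, then invert - same result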
'Long-hand' answer:
raku -e 'put ( (18446744073709551592.base(2) - 0b1).comb.map({!$_.Int+0}).join.parse-base(2));'
OR
raku -e 'say 18446744073709551592.base(2).comb.map({!$_.Int+0}).join.parse-base(2) + 1;'
Sample Output: 24
The answers above (should?) implement "Two's-Complement" encoding directly. Neither uses Raku's +^ twos-complement operator. The first one subtracts one from the binary representation, then inverts. The second one inverts first, then adds one. Neither answer feels truly correct, yet the same answer as Perl5 is obtained (24).
Looking at the Raku Docs page, one would conclude that the "twos-complement" of a positive number would be negative, hence it's not clear what the Perl (and now Raku) answers represent. Hopefully the foregoing is somewhat useful.
https://docs.raku.org/routine/+^

Trying to replicate a CRC made with ielftool in srec_cat

So I'm trying to figure out a way to calculate a CRC with srec_cat before putting the code on a microcontroller. Right now, my post-build script uses the ielftool from IAR to do the calculation and insert it into the correct spot in the hex file.
I'm wondering how I can produce the same CRC with srec_cat, using the same hex file of course.
Here is the ielftool command that produces the CRC32 that I want to replicate:
--checksum APP_SYS_ApplicationCrc:4,crc32:1mi,0xffffffff;0x08060000-0x081fffff
APP_SYS_ApplicationCrc is the symbol where the checksum will be stored, with a 4-byte offset added
crc32 is the algorithm
1 specifies one’s complement
m reverses the input bytes and the final checksum
i initializes the checksum value with the start value
0xffffffff is the start value
And finally, 0x08060000-0x081fffff is the memory range for which the checksum will be calculated
I've tried a lot of things, but this, I think, is the closest I've gotten to the same command so far with srec_cat:
-crop 0x08060000 0x081ffffc -Bit_Reverse -crc32_b_e 0x081ffffc -CCITT -Bit_Reverse
-crop 0x08060000 0x081ffffc specifies, more or less, the memory range for which the CRC will be calculated
-Bit_Reverse should do the same thing as m in the ielftool when put in the right spot
-crc32_b_e is the algorithm. (I'm not sure yet if I need big endian _b_e or little endian _l_e)
0x081ffffc is the location in memory to place the CRC
-CCITT The initial seed (start value in ielftool) is all one bits (it's the default, but I figured I'd throw it in there)
Does anyone have ideas of how I can replicate the ielftool's CRC? Or am I just trying in vain?
I'm new to CRCs and don't know much more than the basics. Does it even matter whether I have exactly the same algorithm? Won't the CRC still work when I put the code on a board?
Note: I'm currently using ielftool 10.8.3.1326 and srec_cat 1.63
After many days of trying to figure out how to get the CRCs from each tool to match (and to make sure I was giving both tools the same data), I finally found a solution.
Based on Mark Adler's comment above, I was trying to figure out how to get the CRC of a small amount of data, such as an unsigned int. I finally had a lightbulb moment this morning and realized that I simply needed to put a uint32_t with the value 123456789 in the code of the project I was already working on. Then I would place the variable at a specific location in memory using:
#pragma location=0x08060188
__root const uint32_t CRC_Data_Test = 123456789; //IAR specific pragma and keyword
This way I knew the variable's location and length, so I could tell ielftool and srec_cat to calculate the CRC only over the area of that variable in memory.
I then took the ELF file from the compiled project and created an Intel hex file, so I could more easily check that the correct variable data was at the correct address.
Next I sent the elf file through ielftool with this command:
ielftool proj.elf --checksum APP_SYS_ApplicationCrc:4,crc32:1mi,0xffffffff;0x08060188-0x0806018b proj.elf
And I sent the hex file through srec_cat with this command:
srec_cat proj.hex -intel -crop 0x08060188 0x0806018c -crc32_b_e 0x081ffffc -o proj_srec.hex -intel
After converting the ELF with the CRC to a hex file and comparing the two hex files, I saw that the CRCs were very similar. The only difference was the endianness. Changing -crc32_b_e to -crc32_l_e got both tools to give me 9E 6C DF 18 as the CRC.
I then changed the memory address ranges for the CRC calculation to what they originally were (see the question) and I once again got the same CRC with both ielftool and srec_cat.
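For anyone who wants a third, tool-independent sanity check, the plain IEEE CRC-32 (reflected, seeded with 0xffffffff, final result inverted) over the same test bytes can be computed in a few lines of Python with zlib. Note this is only the standard zlib CRC-32; whether it matches the crc32:1mi / -crc32_l_e variants byte-for-byte depends on the reversal and seeding options discussed above, so treat it as a cross-check rather than a drop-in replacement for either tool.

# Sanity check: standard CRC-32 (zlib) over the little-endian bytes of the
# uint32_t test value 123456789 used above. May or may not equal the
# ielftool/srec_cat results, depending on their byte/bit-reversal options.
import struct
import zlib

test_data = struct.pack("<I", 123456789)    # 15 CD 5B 07, as stored by a little-endian MCU
crc = zlib.crc32(test_data) & 0xFFFFFFFF
print(f"{crc:08X}")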

Failure to read full line including embedded zero bytes

Lua script:
i=io.read()
print(i)
Command line:
echo -e "sala\x00m" | lua ll.lua
Output:
sala
I want it to print all characters from the input, like this:
salam
in HEX editor:
0000000: 7361 6c61 006d 0a sala.m.
How can I print all characters from the input?
You tripped over one of the few places where the Lua standard library is still not 8-bit-clean.
Specifically, file reading line-by-line is not embedded-0 proof.
The reason it isn't yet is an unfortunate combination of:
Only standard C90 or equally portable constructs are allowed for the core, which does not provide for efficient 0-clean text parsing.
Every solution discussed to date on the mailinglist under that constraint has considerable overhead.
Embedded 0-bytes in text files are quite rare.
Workarounds:
Use a modified library that fixes the "*l" and "*L" formats for file:read(...)
Parse the raw data yourself (read a block of a given size by passing a number, or read as much as possible using "*a")
Badger the Lua developers/maintainers for a bugfix until they give in.

How to write a custom assembly compiler (sort of) in VB.NET

I've been trying to write a simple script compiler for a custom language used by the Game Boy Advance's Z80 processor.
All I want it to do is look at a human-readable command, take it and its arguments and convert it into a hexadecimal value into a ROM file. That's it. Each command is a byte, and each may take a different number of arguments - arguments can be either 8, 16, or 32 bits and each command has a specific number of arguments that it takes.
All of this sort of code is handled by the game and converted into workable machine code within the game's memory, so I'm not writing a full-on assembly compiler if you will. The game automatically knows how many args a command has, what each command does, exactly how to execute it as it is, etc.
For instance, you have command 0x4E, which takes in one 8-bit argument and another 32-bit argument. In hex that would obviously be 4E XX YY YY YY YY. I want my compiler to read it from text as foo 0xXX 0xYYYYYYYY and directly write it into a file as the former.
My question is, how would I do that in VB.NET? I know it's probably a very simple answer, but I see a lot of different options for writing it to a file; some work and most don't for me. Could you give me some sample code showing how I would do this?
Writing an assembly compiler, as I understand it, is not so simple. I recommend you use one that is already written; see: Software Development Tools for Z80 Family
If you are still interested in writing it yourself, here are the steps:
Write the text you want to translate to some file (or memory stream)
Read it line by line
Parse each line, either by splitting it into an array or with regular expressions
Identify the command and its arguments (as far as I remember, some commands do not have arguments)
Translate the command to hex (with a collection or dictionary of commands)
Write the results to an array, remembering the references for jump addresses
When everything is translated, resolve the addresses and write them to the right places.
I think the trickiest part is dealing with symbolic addresses.
If you are still interested, write a first piece of code (or ask how to do it) and continue from there.
This sounds like an assembler, even if it is for a 'custom language'.
Start by parsing the command lines: use the String.Split method to convert the string to an array of strings. The first element in the array is your foo; you can then look that up and output 4E, then convert the subsequent elements to bytes.
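To make the split/look-up/write flow concrete, here is a minimal sketch in Python (an illustration of the structure only: the mnemonics, the opcode table, and the argument byte order are assumptions, and the same shape maps directly onto VB.NET's String.Split, a Dictionary, and a BinaryWriter):

# Hypothetical opcode table: mnemonic -> (opcode byte, argument sizes in bytes).
# The real table would come from the game's script-command documentation.
OPCODES = {
    "foo": (0x4E, (1, 4)),   # foo 0xXX 0xYYYYYYYY  ->  4E XX YY YY YY YY
    "nop": (0x02, ()),       # a command with no arguments
}

def assemble_line(line):
    parts = line.split()
    opcode, arg_sizes = OPCODES[parts[0]]
    out = bytes([opcode])
    for text, size in zip(parts[1:], arg_sizes):
        value = int(text, 0)                    # accepts 0x... or decimal
        out += value.to_bytes(size, "little")   # assumed little-endian, as on the GBA
    return out

with open("script.txt") as src, open("script.bin", "wb") as rom:
    for raw in src:
        line = raw.split(";")[0].strip()        # allow ';' comments, skip blank lines
        if line:
            rom.write(assemble_line(line))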

Reading unformatted data, Intel ifort vs IBM xlf

I'm trying to switch from Intel ifort to IBM xlf, but when reading "unformatted output data" (by unformatted I mean the values are not all the same length), there is a problem. Here is an example:
program main
implicit none
real(8) a,b
open(unit=10,file='1.txt')
read (10,*) a
read (10,*) b
write(*,'(E20.14E2)') a,b
close(10)
end program
1.txt:
0.10640229631236
8.5122792850319D-02
using ifort I get output:
0.10640229631236E+00
0.85122792850319E-01
using xlf I get output:
' in the input file. The program will recover by assuming a zero in its place.e invalid digit '
0.10640229631236E+00
0.85122792850319E-01
Since the data in 1.txt is unformatted, I can't use a fixed format to read it. Does anyone know how to resolve this warning?
(Question answered in the comments. See Question with no answers, but issue solved in the comments (or extended in chat) )
@M.S.B wrote:
Is there an apostrophe in the input file? Or any character besides digits, decimal point and "D"? Your reads are "list directed".
The OP wrote:
Yes, it seems there is some character after 0.10640229631236 that causes this warning. When I write those numbers to a new file by hand (starting a new line after 0.10640229631236 with the Enter key), the warning goes away. I ran cat -v on these two files: with the file that gives the warning I get 0.10640229631236^M 8.5122792850319D-02, while with the no-warning file I get 0.10640229631236 8.5122792850319D-02. Do you know what that ^M stands for and where it comes from?
@agentp gave the link:
'^M' character at end of lines
which explains that ^M is the Windows character for carriage return.
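In practice the fix is simply to strip the carriage returns from 1.txt before reading it (dos2unix does this, as does tr -d '\r'). For completeness, the same cleanup as a tiny Python sketch (file name taken from the question):

# Remove Windows-style line endings so list-directed reads no longer see a
# stray ^M at the end of each number.
with open("1.txt", "rb") as f:
    data = f.read()

with open("1.txt", "wb") as f:
    f.write(data.replace(b"\r\n", b"\n").replace(b"\r", b"\n"))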