Why is this uint32_t ordered this way in memory? - objective-c

I'm learning about endianness, and I read that Intel processors are usually little-endian. I'm on an Intel Mac and thought I'd try it for myself to see it in action. I define a uint32_t and then try to print out the 4 bytes as they are ordered in memory.
uint32_t integer = 1027;
uint8_t * bytes = (uint8_t*)&integer;
printf("%04x\n", integer);
printf("%x%x%x%x\n", bytes[0], bytes[1], bytes[2], bytes[3]);
Output:
0403
3400
I expected to see the bytes either in reverse order (3040) or unchanged, but what's output is neither. What am I missing?
I'm actually compiling it as Objective-C using Xcode, if that makes any difference.

Because data is stored in units of bytes (8 bits) on today's typical computers.
On machines that use little endian, the first byte is 03, the second byte is 04, and the third and fourth bytes are 00.
In the second line of output, the first digit 3 represents the first byte (03) and the second digit 4 represents the second byte (04); each of the two zero bytes prints as a single 0. To show each byte with 2 digits, specify a width in the format, like
printf("%02x%02x%02x%02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);

That is endianness.
There are two different approaches for storing data in memory. Little endian and big endian.
In big endian the most significant byte is stored first.
In little endian the least significant byte is stored first.
Your system is little endian, as the data is stored as
03 04 00 00
On a big endian system, it would have been
00 00 04 03
For printing, use %02x to get the leading zeros printed.
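If you want to check the byte order programmatically rather than by eye, one common sketch (not the only way) is to look at which byte of a known multi-byte value ends up at the lowest address:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t probe = 0x01020304;
    uint8_t first = *(uint8_t *)&probe;   /* the byte stored at the lowest address */

    if (first == 0x04)
        puts("little endian: least significant byte stored first");
    else if (first == 0x01)
        puts("big endian: most significant byte stored first");
    else
        puts("some other byte order");
    return 0;
}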


What exactly is the size of an ELF symbol (both for 64 & 32 bit) & how do you parse it

According to Oracle's documentation on the ELF file format, a 64-bit ELF symbol is 30 bytes in size (8 + 1 + 1 + 4 + 8 + 8). However, when I use readelf to print out the section headers of an ELF file and then inspect the "EntSize" (entry size) member of the symbol table section header, it says that the symbol entries are in fact only hex 0x18 (dec 24) in size.
I have attached a picture of readelf's output next to the Oracle documentation. The highlighted characters under "SYMTAB" are the "EntSize" member.
As I am about to write an ELF parser, I am curious as to which I should believe: the read value of the EntSize member, or the documentation?
I have also attempted to look for an answer in this ELF documentation; however, it doesn't seem to go into any detail about the 64-bit ELF structures.
It should be noted that the ELF file I run readelf on, in the above picture, is a 64-bit executable.
EI_CLASS, the byte just after the ELF magic number, contains the "class" of the ELF file, with the value 2 meaning a 64-bit class.
When the 32-bit standard was drafted, there were competing popular 64-bit architectures. The 32-bit standard was a bit vague about the 64-bit case, as it was quite possible at that time to imagine multiple competing 64-bit standards.
https://www.uclibc.org/docs/elf-64-gen.pdf
should cover the 64 bit standard with better attention to the 64 bit layouts.
The way you "parse" it is to read the bytes in the order described in the struct.
typedef struct {
    Elf64_Word    st_name;
    unsigned char st_info;
    unsigned char st_other;
    Elf64_Half    st_shndx;
    Elf64_Addr    st_value;
    Elf64_Xword   st_size;
} Elf64_Sym;
The first 4 bytes are the st_name, the next byte is the st_info, the next byte is the st_other, the next 2 bytes are the st_shndx, and then come the 8-byte st_value and the 8-byte st_size. Of course, it is critical to know where the struct "starts" within the file, and the spec above should help with that.
In these type names, Elf64_Word is a 4-byte (32-bit) field, Elf64_Half is 2 bytes, and Elf64_Addr and Elf64_Xword are 8 bytes each; the "64" refers to the ELF class, not to every field being 64 bits wide.
So the Elf64_Sym has 4 + 1 + 1 + 2 + 8 + 8 bytes in it, or 24 bytes, which is exactly the 0x18 EntSize that readelf reports.
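A minimal parsing sketch along those lines, assuming the whole file is already in memory, the symbol table's file offset is known (the symtab_offset parameter here is hypothetical), and the file's byte order matches the host's:

#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t st_name;   /* Elf64_Word  - 4 bytes */
    uint8_t  st_info;   /* 1 byte */
    uint8_t  st_other;  /* 1 byte */
    uint16_t st_shndx;  /* Elf64_Half  - 2 bytes */
    uint64_t st_value;  /* Elf64_Addr  - 8 bytes */
    uint64_t st_size;   /* Elf64_Xword - 8 bytes */
} Elf64_Sym;

_Static_assert(sizeof(Elf64_Sym) == 24, "expect no padding on common ABIs");

/* Copy the i-th symbol entry out of the file image. */
static Elf64_Sym read_sym(const uint8_t *file, uint64_t symtab_offset, uint64_t i) {
    Elf64_Sym sym;
    memcpy(&sym, file + symtab_offset + i * sizeof(Elf64_Sym), sizeof sym);
    return sym;
}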

How can a 32-byte address represent more than 32 characters?

I have just started studying solidity and coding in general, and I tend to see things like this:
Click for image
I am confused as to how a "32 bytes hash" can include more than 32 characters (even after the "0x000"). I was under the impression that each byte can represent a character. I often see references, as well, saying things like "32 bytes address (64 bytes hex address)". But how can a 64 byte hex address be represented if it is a 32 bytes address - would you still need a byte per character? I know this is probably a stupid/noob question, and I'm probably missing something obvious, but I can't quite figure it out.
One byte covers the range 00000000 - 11111111 in binary, or 0x00 - 0xFF in hex. As you can see, one byte is represented in hex as a 2-character string. Therefore, a 32-byte value written out in hex is 64 characters long.
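A tiny sketch that makes the count concrete: a 32-byte value (the array contents here are made up) printed in hex produces a 64-character string:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t hash[32] = {0xde, 0xad, 0xbe, 0xef};   /* a made-up 32-byte value */

    char text[65];                                 /* 2 characters per byte + NUL */
    for (int i = 0; i < 32; i++)
        sprintf(&text[i * 2], "%02x", hash[i]);

    printf("%s\n", text);                          /* a 64-character hex string */
    return 0;
}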
The 32-bit address points to the first byte of 32, 64, 1000 or 100 million sequential bytes. All the others follow and are stored at address + 1, + 2, + 3, and so on.

Extracting GPS metadata from hex of JPG image

I am trying to extract GPS metadata from hex following this tutorial, but cannot understand why at the end the latitude and longitude have length 24 and values 42 and 73:
http://itbrigadeinc.com/post/2012/03/06/Anatomy-of-a-JPG-image.aspx
http://www.itbrigadeinc.com/post/2012/03/16/Seeing-the-EXIF-data-for-a-JPG-image.aspx
I found the tags for latitude (00 02 00 05 00 00 00 03 00 00 02 42) and longitude (00 04 00 05 00 00 00 03 00 00 02 5A). As I understood it, if count = 3, then the values for both of them should follow in the last 4 bytes of the tags. But 02 42 and 02 5A are not "42" and "73"...
Could someone explain to me what is wrong?
Please, don't recommend any tools - I need to do it manually.
You need to also consider the size of each value. The count is three, but the size of each is larger than one byte. Therefore it won't fit in the four bytes, and those four bytes represent an offset to the value.
GPS data is usually stored as three rational numbers, where each rational number is two 32-bit integers (numerator, denominator). Therefore you have three values for latitude, but each is 8 bytes. The 24 bytes won't fit within the TIFF tag, so it is stored somewhere else in the file, and the four bytes you're seeing are a pointer to it. You need to look into the spec to find out where that pointer is relative to, as it's probably not the start of the file.
Check out my metadata extractor libraries (in Java and C#) for reference.
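A sketch of that tag-reading logic, assuming big-endian ("MM") TIFF data like the bytes quoted in the question and that the 12-byte IFD entry has already been located (the per-type field sizes come from the TIFF/EXIF spec; the function names are made up):

#include <stdint.h>
#include <stdio.h>

/* Byte sizes of TIFF field types 1..12 (type 5 = RATIONAL = 8 bytes). */
static const uint32_t type_size[13] = {0, 1, 1, 2, 4, 8, 1, 1, 2, 4, 8, 4, 8};

static unsigned be16(const uint8_t *p) { return (unsigned)(p[0] << 8 | p[1]); }
static uint32_t be32(const uint8_t *p) {
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}

/* entry points at one 12-byte IFD entry; type is assumed to be in 1..12. */
static void describe_entry(const uint8_t *entry) {
    unsigned tag   = be16(entry);
    unsigned type  = be16(entry + 2);
    uint32_t count = be32(entry + 4);
    uint32_t total = count * type_size[type];

    if (total <= 4)
        printf("tag 0x%04x: value fits inline in the last 4 bytes\n", tag);
    else
        printf("tag 0x%04x: %u bytes, stored at offset 0x%x from the TIFF header\n",
               tag, (unsigned)total, (unsigned)be32(entry + 8));
}

int main(void) {
    /* the GPSLatitude entry quoted in the question */
    const uint8_t latitude_entry[12] = {0x00, 0x02, 0x00, 0x05,
                                        0x00, 0x00, 0x00, 0x03,
                                        0x00, 0x00, 0x02, 0x42};
    describe_entry(latitude_entry);
    return 0;
}

For the latitude entry (tag 0x0002, type 5, count 3) this reports 24 bytes at offset 0x242, which is why 02 42 is a pointer rather than the value itself.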
Apparently the data type here, with its 24-byte value, is a PropertyTagTypeRational
https://msdn.microsoft.com/en-us/library/ms534414(v=vs.85).aspx
Specifies that the value data member is an array of pairs of unsigned long integers. Each pair represents a fraction; the first integer is the numerator and the second integer is the denominator.
Mostly gotten from: Getting GPS data from an image's EXIF in C#
This bit of Python code might also give a good hint at how you can decode the data: http://eran.sandler.co.il/2011/05/20/extract-gps-latitude-and-longitude-data-from-exif-using-python-imaging-library-pil/
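Once the 24 bytes have been located, turning the three rationals (degrees, minutes, seconds) into a decimal coordinate is mostly arithmetic; a small sketch, again assuming big-endian data:

#include <stdint.h>

static uint32_t be32(const uint8_t *p) {
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}

/* p points at 24 bytes: three (numerator, denominator) pairs of 32-bit integers. */
static double gps_coordinate(const uint8_t *p) {
    double deg = (double)be32(p)      / be32(p + 4);
    double min = (double)be32(p + 8)  / be32(p + 12);
    double sec = (double)be32(p + 16) / be32(p + 20);
    return deg + min / 60.0 + sec / 3600.0;
}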

How is a hex file converted into binary in a microcontroller?

I am new to embedded programming. I am using a compiler to convert source code into hex, and I will burn it into a microcontroller. My question is: a microcontroller (all ICs) will support binary numbers only (0 & 1), so how does it work with a hex file?
The software that loads the program/data into the flash reads whatever format it supports, which may be Intel hex, Motorola S-record, ELF, COFF, a raw binary, or something else, and then does the right thing to program the flash with just the relevant ones and zeros.
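To give a feel for what that loader does, here is a hedged sketch that decodes a single Intel HEX record (the format is ':' followed by a byte count, a 16-bit address, a record type, the data, and a checksum, all written as ASCII hex pairs); the record used in main() is the well-known documentation example:

#include <stdio.h>

static unsigned hexpair(const char *s) {       /* two ASCII hex digits -> value */
    unsigned v = 0;
    sscanf(s, "%2x", &v);
    return v;
}

/* Decode one record; the checksum (last byte pair) is not verified here. */
static void decode_record(const char *line) {
    unsigned count = hexpair(line + 1);                           /* number of data bytes */
    unsigned addr  = hexpair(line + 3) << 8 | hexpair(line + 5);  /* load address */
    unsigned type  = hexpair(line + 7);                           /* 00 = data, 01 = EOF */

    printf("record type %02u: %u bytes for address 0x%04x:", type, count, addr);
    for (unsigned i = 0; i < count; i++)
        printf(" %02x", hexpair(line + 9 + 2 * i));
    printf("\n");
}

int main(void) {
    decode_record(":10010000214601360121470136007EFE09D2190140");
    return 0;
}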
First of all, the PC you are using right now has a processor inside, which works just like any other microcontroller. You are using it to browse the internet, although it's all "1s and 0s on the inside". And I am presuming your actual firmware doesn't come even close to running what your PC is running at this moment.
microcontroller will support binary numbers only (0 & 1)
Your idea that a "microcontroller only supports binary numbers (0 & 1)" is a misconception. At its very lowest level, yes, a microcontroller contains a bunch of transistors, and each of them can store only two states of information (a bit).
But the reason for this is simply that it is a practical way to physically store one small chunk of data.
If you check the assembly instruction manual for your uC architecture, you will see a large number of instructions operating on different data widths (bits grouped into 8, 16 or larger chunks). If your controller is, say, 16-bit, then this will be the basic word size for most instructions, and the one that will be the most efficient. When programming in C, this will also be the size of the "special" int type which all smaller integral types get promoted to.
In other words, bits are just the building blocks of your hardware, and most of the time they shouldn't even concern you at the firmware level, let alone at higher application levels. Compare it to a human life form: the human body is made of cells, but it is also capable of doing more than a single-cell organism, isn't it?
I am using a compiler to convert source code into hex
Actually, you are using the compiler to create the machine code for your particular microcontroller architecture. "Hex", or more precisely the Intel HEX file format, is just one of several file formats used for storing machine code in a file, and conveniently it is a plain-text ASCII file which you can easily open in Notepad.
To clarify, let's say you wrote a simple line of C code like this:
a = b + c;
Your compiler needs to know which architecture you are targeting, in order to convert this to machine code. For a fictional uC architecture, this will first get compiled to the following fictional assembly language:
// compiler decides that a,b,c will be stored at addresses 0x1000, 1004, 1008
mov ax, (0x1004) // move value from address 0x1004 to accumulator
add ax, (0x1008) // add value from address 0x1008 to accumulator
mov (0x1000), ax // move value from accumulator to address 0x1000
Each of these instructions has its own instruction opcode, which can be found inside the assembly instruction manual. If the instruction operates on one or more parameters, uC will know that the bytes following the instruction are data bytes:
// mov ax, (addr) --> opcode 0x10
// add ax, (addr) --> opcode 0x20
// mov (addr), ax --> opcode 0x30
mov ax, (0x1004) // 0x10 (0x10 0x04)
add ax, (0x1008) // 0x20 (0x10 0x08)
mov (0x1000), ax // 0x30 (0x10 0x00)
Now you've got your machine-code, which, written as hex values, becomes:
10 10 04 20 10 08 30 10 00
And converted to binary becomes:
00010000000100000000010000100000...
To transfer this to your controller, you will use a file format which your flash uploader knows how to read, which is what Intel Hex is most commonly used for.
Once transferred to your microcontroller, it will be stored as a bunch of bits in its flash memory, but the controller is designed to read these bits in chunks of 8 or more bits and evaluate them as instruction opcodes or data, depending on the context. For the example above, it will read the first 8 bits, and seeing that it is the instruction opcode 0x10 (which takes an additional address parameter), it will read the next two bytes to form the address 0x1004. It will then execute the instruction and advance the instruction pointer.
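To make that fetch/decode step concrete, here is a toy interpreter for the fictional three-opcode machine sketched above (everything here, opcodes included, is invented purely for illustration):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* The 9 machine-code bytes from the example above, as they would sit in flash. */
    uint8_t flash[] = {0x10, 0x10, 0x04, 0x20, 0x10, 0x08, 0x30, 0x10, 0x00};
    uint8_t mem[0x2000] = {0};          /* fictional data memory */
    mem[0x1004] = 2;                    /* b */
    mem[0x1008] = 3;                    /* c */

    uint8_t ax = 0;                     /* accumulator */
    size_t pc = 0;                      /* instruction pointer */
    while (pc < sizeof flash) {
        uint8_t opcode = flash[pc++];
        uint16_t addr = (uint16_t)(flash[pc] << 8 | flash[pc + 1]);
        pc += 2;
        switch (opcode) {
        case 0x10: ax = mem[addr];  break;   /* mov ax, (addr) */
        case 0x20: ax += mem[addr]; break;   /* add ax, (addr) */
        case 0x30: mem[addr] = ax;  break;   /* mov (addr), ax */
        }
    }
    printf("a = %d\n", mem[0x1000]);    /* prints: a = 5 */
    return 0;
}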
Hex, Decimal, Binary, they are all just ways of representing a number.
AA in hex is the same as 170 in decimal and 10101010 in binary (and 252 in octal).
The reason the hex representation is used is that it is very convenient when working with microcontrollers, as one hex character fits into 1 nibble. Hence F is 1111, FF is 1111 1111, and so forth.
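The same value pushed through different printf conversion specifiers shows that only the textual representation changes, not the number:

#include <stdio.h>

int main(void) {
    unsigned value = 0xAA;
    printf("hex: %x  dec: %u  oct: %o\n", value, value, value);
    /* prints: hex: aa  dec: 170  oct: 252 */
    return 0;
}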

How to obtain number of entries in ELF's symbol table?

Consider a standard hello world program in C, compiled using GCC without any switches. As readelf -s says, it contains 64 symbols. It also says that the .symtab section is 1024 bytes long. However, each symbol table entry is 18 bytes, so how is it possible that it contains 64 entries? It should be 56 entries. I'm constructing my own program which reads the symbol table, and it does not see those "missing" entries, as it reads until the section end. How does readelf know how far to read?
As one can see in elf.h, symbol entry structure looks like that:
typedef struct elf32_sym {
    Elf32_Word    st_name;
    Elf32_Addr    st_value;
    Elf32_Word    st_size;
    unsigned char st_info;
    unsigned char st_other;
    Elf32_Half    st_shndx;
} Elf32_Sym;
Elf32_Word and Elf32_Addr are 32-bit values, Elf32_Half is 16-bit, and the chars are 8-bit. That means the size of the structure is 16 bytes, not 18. Therefore a 1024-byte section gives exactly 64 entries.
The entries are aligned to each other and padded with blanks, hence the size mismatch. Check out this mail thread for a similar discussion.
As for your code, I suggest checking out the source for readelf, especially the function process_symbol_table() in binutils/readelf.c.
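If you want to count the entries yourself rather than assume a fixed entry size, the usual approach (and effectively what readelf does) is to divide the section's sh_size by its sh_entsize; a sketch using the structures from <elf.h>:

#include <elf.h>
#include <stddef.h>

/* shdr points at the .symtab section header, already read from the file. */
static size_t symbol_count(const Elf32_Shdr *shdr) {
    /* sh_entsize is non-zero for sections that hold a table of fixed-size entries. */
    return shdr->sh_entsize ? shdr->sh_size / shdr->sh_entsize : 0;   /* 1024 / 16 = 64 */
}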
The file size of an ELF data type can differ from the size of its in-memory representation.
You can use the elf32_fsize() and elf64_fsize() functions in libelf to retrieve the file size of an ELF data type.
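For example, a small sketch using libelf (link with -lelf); elf32_fsize() returns the on-disk size of one entry of the given type:

#include <libelf.h>
#include <stdio.h>

int main(void) {
    (void)elf_version(EV_CURRENT);     /* initialise libelf's version handling */
    size_t entry_size = elf32_fsize(ELF_T_SYM, 1, EV_CURRENT);
    printf("one Elf32_Sym occupies %zu bytes in the file\n", entry_size);   /* 16 */
    return 0;
}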