How can a 32-byte address represent more than 32 characters? - solidity

I have just started studying Solidity and coding in general, and I tend to see things like this:
[image: example of a 32-byte hash]
I am confused as to how a "32 bytes hash" can include more than 32 characters (even after the "0x000"). I was under the impression that each byte can represent one character. I also often see references saying things like "32 bytes address (64 bytes hex address)". But how can a 64-byte hex address be represented if it is a 32-byte address - wouldn't you still need a byte per character? I know this is probably a stupid/noob question, and I'm probably missing something obvious, but I can't quite figure it out.

One byte covers the range 00000000 - 11111111 in binary, or 0x00 - 0xFF in hex. As you can see, one byte is represented in hex as a two-character string. Therefore, a 32-byte value written in hex is 64 characters long.
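To make the byte-to-character relationship concrete, here is a minimal C sketch (the 32 bytes below are just dummy data, not a real hash): each byte prints as exactly two hex characters, so the output after the "0x" prefix is 64 characters long.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* 32 arbitrary bytes standing in for a hash value */
    uint8_t hash[32];
    for (int i = 0; i < 32; i++) {
        hash[i] = (uint8_t)(i * 7 + 3);  /* dummy data */
    }

    /* each byte becomes exactly two hex characters: 32 bytes -> 64 characters */
    printf("0x");
    for (int i = 0; i < 32; i++) {
        printf("%02x", hash[i]);
    }
    printf("\n");
    return 0;
}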

The 32-bit address points to the first byte of a block of 32, 64, 1000 or 100 million sequential bytes; all the other bytes follow at address + 1, + 2, + 3, and so on.

Related

What exactly is the size of an ELF symbol (both for 64 & 32 bit) & how do you parse it

According to Oracle's documentation on the ELF file format, a 64-bit ELF symbol is 30 bytes in size (8 + 1 + 1 + 4 + 8 + 8). However, when I use readelf to print out the section headers of an ELF file and then inspect the "EntSize" (entry size) member of the symbol table section header, it says that the symbol entries are in fact only hex 0x18 (dec 24) in size.
I have attached a picture of readelf's output next to the Oracle documentation. The highlighted characters under "SYMTAB" are the "EntSize" member.
As I am about to write an ELF parser, I am curious as to which I should believe: the read value of the EntSize member, or the documentation?
I have also attempted to look for an answer in this ELF documentation; however, it doesn't seem to go into any detail on the 64-bit ELF structures.
It should be noted that the ELF file I run readelf on in the above picture is a 64-bit executable.
EI_CLASS, the byte just after the ELF magic number, contains the "class" of the ELF file, with the value "2" (in hex, of course) meaning a 64-bit class.
When the 32-bit standard was drafted, there were competing popular 64-bit architectures. The 32-bit standard was a bit vague about the 64-bit standard, as it was quite possible at that time to imagine multiple competing 64-bit standards.
https://www.uclibc.org/docs/elf-64-gen.pdf
should cover the 64-bit standard, with better attention to the 64-bit layouts.
The way you "parse" it is to read the bytes in the order described in the struct.
typedef struct {
    Elf64_Word    st_name;   /* 4 bytes: symbol name (string table offset) */
    unsigned char st_info;   /* 1 byte:  type and binding attributes */
    unsigned char st_other;  /* 1 byte:  visibility */
    Elf64_Half    st_shndx;  /* 2 bytes: section index */
    Elf64_Addr    st_value;  /* 8 bytes: symbol value */
    Elf64_Xword   st_size;   /* 8 bytes: size of the object */
} Elf64_Sym;
The first 4 bytes are the st_name (an Elf64_Word is a 32-bit field even in the 64-bit format), the next byte is st_info, and so on. Of course, it is critical to know where the struct "starts" within the file, and the spec above should help with that.
The "64" in the type names refers to the 64-bit object format, not to every field being 64 bits wide: Elf64_Word is 4 bytes, Elf64_Half is 2 bytes, and Elf64_Addr and Elf64_Xword are 8 bytes each.
So Elf64_Sym has 4 + 1 + 1 + 2 + 8 + 8 bytes in it, or 24 bytes (0x18) - which matches the EntSize that readelf reports, so the read value is the one to believe.
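As a quick sanity check, here is a minimal, self-contained C sketch (the typedefs are written out by hand following the ELF-64 spec rather than pulled from <elf.h>) that prints the field offsets and the total size of the struct; on typical ABIs it comes out to 24 bytes, matching readelf's EntSize of 0x18.

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/* ELF-64 primitive types as defined in the ELF-64 object file format spec */
typedef uint32_t Elf64_Word;   /* 4 bytes */
typedef uint16_t Elf64_Half;   /* 2 bytes */
typedef uint64_t Elf64_Addr;   /* 8 bytes */
typedef uint64_t Elf64_Xword;  /* 8 bytes */

typedef struct {
    Elf64_Word    st_name;
    unsigned char st_info;
    unsigned char st_other;
    Elf64_Half    st_shndx;
    Elf64_Addr    st_value;
    Elf64_Xword   st_size;
} Elf64_Sym;

int main(void) {
    printf("st_name  at offset %zu\n", offsetof(Elf64_Sym, st_name));
    printf("st_info  at offset %zu\n", offsetof(Elf64_Sym, st_info));
    printf("st_other at offset %zu\n", offsetof(Elf64_Sym, st_other));
    printf("st_shndx at offset %zu\n", offsetof(Elf64_Sym, st_shndx));
    printf("st_value at offset %zu\n", offsetof(Elf64_Sym, st_value));
    printf("st_size  at offset %zu\n", offsetof(Elf64_Sym, st_size));
    printf("sizeof(Elf64_Sym) = %zu\n", sizeof(Elf64_Sym));  /* 24 on typical ABIs */
    return 0;
}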

Why is this uint32_t ordered this way in memory?

I'm learning about endianness, and I read that Intel processors are usually little-endian. I'm on an Intel Mac and thought I'd try it for myself to see it in action. I define a uint32_t and then try to print out the 4 bytes as they are ordered in memory.
uint32_t integer = 1027;
uint8_t * bytes = (uint8_t*)&integer;
printf("%04x\n", integer);
printf("%x%x%x%x\n", bytes[0], bytes[1], bytes[2], bytes[3]);
Output:
0403
3400
I expected to see the bytes either in reverse order (3040) or unchanged, but the output is neither. What am I missing?
I'm actually compiling it as Objective-C using Xcode, if that makes any difference.
Because data is saved in units of bytes (8 bits) on today's typical computers.
On machines that use little endian, the first byte is 03, the second byte is 04, and the third and fourth bytes are 00.
The first digit in the second line, 3, represents the first byte, and the second digit, 4, represents the second byte. To show each byte with 2 digits, specify a width in the format, like
printf("%02x%02x%02x%02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);
That is endianness.
There are two different approaches for storing data in memory. Little endian and big endian.
In big endian the most significant byte is stored first.
In little endian the least significant byte is stored first.
Your system is little endian, as the data is stored as
03 04 00 00
On a big endian system, it would have been
00 00 04 03
For printing, use %02x to get the leading zeros printed.
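Putting the two answers together, here is a minimal, complete version of the experiment (a sketch reusing the variable names from the question, with the %02x fix from the answers applied; the %08x width is a slight change from the question's %04x so the full 32-bit value is visible):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    uint32_t integer = 1027;                 /* 0x00000403 */
    uint8_t *bytes = (uint8_t *)&integer;

    /* the value printed as a number, independent of byte order */
    printf("%08" PRIx32 "\n", integer);      /* 00000403 */

    /* the bytes as they sit in memory; little endian gives 03 04 00 00 */
    printf("%02x %02x %02x %02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);
    return 0;
}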

Hexadecimal numbers vs. hexadecimal encoding (with base64 as well)

Encoding with hexadecimal numbers seems to be different from using hexadecimals to represent numbers. For example, the hex number 0x40 to me should be equal to 64, or "BA" in base 64, but when I put it through this hex to base64 converter, I get the output QA==, which to me is equal to some number times 64. Why is this?
Also, when I check the integer value of the hex string deadbeef I get 3735928559, but when I check it in other places I get 222 173 190 239. Why is this?
Addendum: So I guess it is because it is easier to break the number into bit chunks than to treat it as a whole number when encoding? That is pretty confusing to me, but I guess I get it.
You may wish to read this:
http://en.wikipedia.org/wiki/Base64
In summary, base64 specifies its own encoding, in which each character stands for a 6-bit value rather than for its ASCII code.
For the second part, one source is treating the entire string as a single 32-bit integer, and the other is splitting it into bytes and giving the value of each byte.
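A small C sketch may make both points concrete (the alphabet string below is the standard base64 alphabet; encoding a single byte like 0x40 only exercises the padded case):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    /* deadbeef read as one 32-bit number vs. as four separate bytes */
    uint32_t whole = 0xdeadbeef;
    uint8_t bytes[4] = { 0xde, 0xad, 0xbe, 0xef };
    printf("as one integer : %" PRIu32 "\n", whole);    /* 3735928559 */
    printf("byte by byte   : %u %u %u %u\n",
           bytes[0], bytes[1], bytes[2], bytes[3]);     /* 222 173 190 239 */

    /* base64 regroups bits into 6-bit chunks; one byte 0x40 = 01000000
       -> 010000 (16 -> 'Q') and 00 plus four padding zero bits (0 -> 'A'),
       followed by "==" to mark the missing input bytes */
    const char *b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    uint8_t b = 0x40;
    char out[5] = { b64[b >> 2], b64[(b & 0x03) << 4], '=', '=', '\0' };
    printf("0x40 in base64 : %s\n", out);               /* QA== */
    return 0;
}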

Why does this code encode the random salt as hexadecimal digits first?

I'm looking at some existing code that is generating a salt which is used as input into an authentication hash.
The salt is 16 bytes long, and is generated by first using an OS random number generator to get 8 bytes of random data.
Then each byte in the 8 byte buffer is used to place data into 2 bytes of the 16 byte buffer as follows:
out[j] = hexTable[data[i] & 0xF];
out[j-1] = hexTable[data[i] >> 4 & 0xF];
Here out is the 16-byte salt, data is the initial 8-byte buffer, j and i are just loop counters, and hexTable is just an array of the hex digits, i.e. 0 to F.
Why is all this being done? Why isn't the 16 byte salt just populated with random data to begin with? Why go through this elaborate process?
Is what is being done here a standard way of generating salts? What's the benefit and point of this over just generating 16 random bytes in the first place?
This is simply conversion of your 8 random bytes to 16 hexadecimal digits.
It seems that someone misunderstood the concept of salt, or what input your hash needs, and thought it only accepts hexadecimal digits.
Maybe the salt is also stored somewhere where it is easier to store hexadecimal digits than raw bytes, and the programmer thought it would be good to be able to reuse the stored salt as-is (i.e. without converting it back to bytes first).
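For context, the two lines in the question are presumably the body of a loop roughly like this sketch (the names out, data and hexTable and the buffer sizes come from the question; the exact loop structure and the sample bytes are assumptions):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    static const char hexTable[] = "0123456789ABCDEF";
    uint8_t data[8] = { 0x3f, 0xa2, 0x07, 0xc4, 0x99, 0x10, 0xde, 0x5b };  /* stand-in for random bytes */
    char out[16];

    /* each input byte becomes two hex characters: high nibble, then low nibble */
    for (int i = 0, j = 1; i < 8; i++, j += 2) {
        out[j]     = hexTable[data[i] & 0xF];         /* low nibble */
        out[j - 1] = hexTable[(data[i] >> 4) & 0xF];  /* high nibble */
    }

    printf("%.16s\n", out);  /* prints 3FA207C49910DE5B */
    return 0;
}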

Do certain characters take more bytes than others?

I'm not very experienced with lower-level things such as how many bytes a character is. I tried finding out whether one character equals one byte, but without success.
I need to set a delimiter used for socket connections between a server and clients. This delimiter has to be as small (in bytes) as possible, to minimize bandwidth.
The current delimiter is "#". Would choosing another delimiter decrease my bandwidth?
It depends on what character encoding you use to translate between characters and bytes (which are not at all the same thing):
In ASCII or ISO 8859, each character is represented by one byte
In UTF-32, each character is represented by 4 bytes
In UTF-8, each character uses between 1 and 4 bytes
In ISO 2022, it's much more complicated
US-ASCII characters (of which # is one) will take only 1 byte in UTF-8, which is the most popular encoding that allows multibyte characters.
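To illustrate, here is a small C sketch (assuming the source file and terminal use UTF-8, as is typical on modern systems): strlen counts bytes, not characters, so ASCII characters occupy 1 byte while other characters may occupy more.

#include <stdio.h>
#include <string.h>

int main(void) {
    /* byte counts depend on the encoding; these string literals are UTF-8 */
    printf("\"#\" is %zu byte(s)\n", strlen("#"));   /* 1 byte            */
    printf("\"é\" is %zu byte(s)\n", strlen("é"));   /* 2 bytes in UTF-8  */
    printf("\"€\" is %zu byte(s)\n", strlen("€"));   /* 3 bytes in UTF-8  */
    return 0;
}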
It depends on the encoding. In single-byte character sets such as ANSI and the various ISO 8859 character sets, it is one byte per character. Some encodings, such as UTF-8, are variable-width, where the number of bytes needed to encode a character depends on the character being encoded.
The answer, of course, is that it depends. If you are in a pure ASCII environment, then yes, every character takes 1 byte, but if you are in a Unicode environment (all of Windows, for example), then characters can range from 1 to 4 bytes in size.
If you choose a character from the ASCII set, then yes, your delimiter is as small as possible.
No, all characters are 1 byte, unless you're using Unicode or wide characters (for accents and other symbols for example).
A character is 1 byte, or 8 bits, long, which gives 256 possible combinations to form characters with. 1-byte characters are called ASCII characters. They only use 7 bits (even though 8 are available, the 8th bit is not used) to form the standard alphabet and the various symbols used when teletypes and typewriters were still common.
You can find an ASCII chart showing what numbers correspond to what characters here.
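For example, a minimal sketch: in C, a character in an ASCII-compatible encoding is just a small integer, so you can print its code directly; '#' is 35.

#include <stdio.h>

int main(void) {
    char c = '#';
    /* in ASCII, '#' has the code 35 (0x23) and occupies a single byte */
    printf("'%c' has code %d and takes %zu byte\n", c, c, sizeof c);
    return 0;
}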