x86 binary bloat - 32-bit offsets when 8-bits would do - optimization

I'm using clang+LLVM 2.9 to compile various workloads for x86 with the -Os option. Small binary size is important and I must use static linking. All binaries are 32-bit.
I notice that many instructions use addressing modes with 32-bit displacements when only 8 bits are actually used. For example:
89 84 24 d4 00 00 00 mov %eax,0xd4(%esp)
Why didn't the compiler/assembler choose the compact 8-bit displacement?
89 44 24 d4 mov %eax,0xd4(%esp)
In fact, these wasted addressing bytes are over 2% of my entire binary!
I looked at LLVM's link-time optimization and tried --emit-llvm, but it didn't address or help with this issue.
Is there some link-time optimization that can use knowledge of the actual displacements to choose the smaller instruction form?
Thanks for any help!

In x86, offsets (displacements) are signed, which lets you address data on both sides of the base address. The range of an 8-bit offset is therefore -128 to 127. Your instruction references data 212 bytes forward (0xD4 in decimal). Encoded as an 8-bit offset, that byte would be sign-extended to -44, which is not what you want.
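To make that concrete, here is a minimal Python sketch (illustrative only, not what the assembler actually does) of the disp8 range check and the sign extension involved:

def fits_disp8(disp):
    # x86 disp8 is a signed byte, so only -128..127 can use the short form
    return -128 <= disp <= 127

def sign_extend8(byte):
    # what a raw byte means when interpreted as a signed 8-bit displacement
    return byte - 256 if byte >= 0x80 else byte

print(fits_disp8(0xD4))      # False: 212 is out of range for disp8
print(sign_extend8(0xD4))    # -44: what the "compact" encoding would really address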

Related

Are there unused bits in aarch64 instruction encoding?

As per this link about aarch64 instruction encoding, there are unused bits in some instructions, like the x bits in the listing below for the LDR instruction. But I can't find any documentation about unused bits in the ARMv8 manual. Are these unused bits valid according to the ARMv8 manual?
xxx1 1101 x1ii iiii iiii iinn nnnt tttt - ldr Ft ADDR_UIMM12
That link is from 2012, when the ARMv8 architecture was released, so there was not a lot of information about it at the time. The 'x' in that listing relates to how that tool decodes the instruction; I'm not sure how they derive it, and it does not look correct to me.
You can find all the values for the encoding in the ARM Architecture Reference Manual. Look at the LDR instructions that use immediate values (e.g. LDR (immediate), page 693, specifically the Unsigned offset variant on the following page).
You will see there that the two most significant bits are used for the size of the register (size == 10 is for W registers (32 bits) and size == 11 for X registers (64 bits)).
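If it helps, here is a tiny Python sketch that pulls out that size field; the example word is one I computed by hand for "ldr x1, [sp]", so treat it as illustrative rather than authoritative:

def ldr_imm_size(insn):
    # bits [31:30] of LDR (immediate) select the register size: 0b10 = W, 0b11 = X
    return (insn >> 30) & 0b11

insn = 0xF94003E1                 # believed to encode "ldr x1, [sp]"
print(bin(ldr_imm_size(insn)))    # 0b11 -> X (64-bit) register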
In the ARM Architecture Reference Manual, encodings that are not used are usually marked Unallocated Encoding, Reserved Encoding, or something similar.
Also, there are plenty of free encodings left, probably kept for future use, for example for the Scalable Vector Extension (SVE). You can see all the used and free encodings in the slides by Nigel Stephens from the Hot Chips 28 conference on August 22, 2016; look at slide 8, where the grey squares are free, unused encodings.

What is the difference between “SHA-2” and “SHA-256”

I'm a bit confused about the difference between SHA-2 and SHA-256 and often hear them used interchangeably. I think SHA-2 is a "family" of hash algorithms and SHA-256 is a specific algorithm in that family. Can anyone clear up the confusion?
The SHA-2 family consists of multiple closely related hash functions. It is essentially a single algorithm in which a few minor parameters are different among the variants.
The initial spec only covered 224, 256, 384 and 512 bit variants.
The most significant difference between the variants is that some are 32 bit variants and some are 64 bit variants. In terms of performance this is the only difference that matters.
On a 32 bit CPU SHA-224 and SHA-256 will be a lot faster than the other variants because they are the only 32 bit variants in the SHA-2 family. Executing the 64 bit variants on a 32 bit CPU will be slow due to the added complexity of performing 64 bit operations on a 32 bit CPU.
On a 64 bit CPU, SHA-224 and SHA-256 will be a little slower than the other variants. Because they process only 32 bits at a time, they have to perform more operations to get through the same number of bytes. You do not get quite a doubling in speed from switching to a 64 bit variant, because the 64 bit variants have more rounds than the 32 bit variants (80 versus 64).
The internal state is 256 bits in size for the two 32 bit variants and 512 bits in size for all four 64 bit variants. So the number of possible sizes for the internal state is less than the number of possible sizes for the final output. Going from a large internal state to a smaller output can be good or bad depending on your point of view.
If you keep the output size fixed it can in general be expected that increasing the size of the internal state will improve security. If you keep the size of the internal state fixed and decrease the size of the output, collisions become more likely, but length extension attacks may become easier. Making the output size larger than the internal state would be pointless.
Because the 64 bit variants are both faster (on 64 bit CPUs) and likely to be more secure (due to the larger internal state), two new variants were introduced that use 64 bit words but shorter outputs. Those are the ones known as SHA-512/224 and SHA-512/256.
The reason for wanting variants with output that much shorter than the internal state is usually either that such a long output is impractical for some usages, or that the output needs to be used as a key for some algorithm that takes an input of a certain size.
Simply truncating the final output to your desired length is also possible. For example, the HMAC construction specifies truncating the final hash output to the desired MAC length. Because HMAC feeds the output of one invocation of the hash into another invocation, using a hash with a shorter output results in an HMAC with less internal state. For this reason it is likely to be slightly more secure to use HMAC-SHA-512 and truncate the output to 384 bits than to use HMAC-SHA-384.
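As a rough Python sketch of that truncation idea (the key and message below are just placeholders):

import hmac, hashlib

key, msg = b"placeholder key", b"placeholder message"

tag_a = hmac.new(key, msg, hashlib.sha384).digest()          # HMAC-SHA-384
tag_b = hmac.new(key, msg, hashlib.sha512).digest()[:48]     # HMAC-SHA-512 truncated to 384 bits

print(len(tag_a), len(tag_b))   # both 48 bytes, but the tags themselves differ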
The final output of SHA-2 is simply the internal state (after processing the length-padded input) truncated to the desired number of output bits. The reason SHA-384 and SHA-512 of the same input look so different is that a different IV is specified for each variant.
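You can see the effect of the different IVs directly; a small Python sketch:

import hashlib

data = b"hello"
h384 = hashlib.sha384(data).hexdigest()
h512_first_384_bits = hashlib.sha512(data).hexdigest()[:96]   # 96 hex digits = 384 bits

print(h384 == h512_first_384_bits)   # False: SHA-384 is not just truncated SHA-512, its IV differs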
Wikipedia:
The SHA-2 family consists of six hash functions with digests (hash values) that are 224, 256, 384 or 512 bits: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, SHA-512/256.

How to write integer value "60" in 16bit binary, 32bit binary & 64bit binary

How to write integer value "60" in other binary formats?
8bit binary code of 60 = 111100
16bit binary code of 60 = ?
32bit binary code of 60 = ?
64bit binary code of 60 = ?
is it 111100000000 for 16 bit binary?
why does 8bit binary code contain 6bits instead of 8?
I googled for the answers but wasn't able to find them. Please provide answers, as I'm still a beginner in this area.
Imagine you're writing the decimal value 60. You can write it using 2 digits, 4 digits or 8 digits:
1. 60
2. 0060
3. 00000060
In our decimal notation, the most significant digits are to the left, so increasing the number of digits for representation, without changing the value, means just adding zeros to the left.
Now, in most binary representations, this would be the same. The decimal 60 needs only 6 bits to represent it, so an 8bit or 16bit representation would be the same, except for the left-padding of zeros:
1. 00111100
2. 00000000 00111100
Note: Some OSs, software, hardware or storage devices might have different endianness, which means they might store 16bit values with the least significant byte first, then the most significant byte. Binary notation is still MSB-on-the-left, as above, but reading the memory of such little-endian devices will show that any 16bit chunk is internally byte-reversed:
1. 00111100 - 8bit - still the same.
2. 00111100 00000000 - 16bit, bytes are flipped.
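A short Python sketch of the padding and of the little-endian byte order (struct is used here only to show how a 16-bit value is laid out in memory):

import struct

n = 60
print(format(n, '08b'))    # 00111100               (8-bit)
print(format(n, '016b'))   # 0000000000111100       (16-bit)
print(format(n, '032b'))   # 32-bit: same pattern, zero-padded to 32 digits

# little-endian storage of the 16-bit value: least significant byte first
print(struct.pack('<H', n).hex())   # '3c00' -> bytes 0x3C, then 0x00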
Every number has exactly one binary representation; wider widths just add leading zeros (which are usually not shown).
On a 16/32/64 bit system, 111100 (decimal 60) looks the same with more zeros added in front of the number.
So in 16 bits it would be 0000000000111100,
in 32 bits 00000000000000000000000000111100,
and so on.
For storage, endianness matters ... otherwise, zeros are prefixed up to the bit width, so 60 would be...
8bit: 00111100
16bit: 0000000000111100

What does alignment to 16-byte boundary mean in x86

Intel's official optimization guide has a chapter on converting from MMX commands to SSE, where it states the following:
Computation instructions which use a memory operand that may not be aligned to a 16-byte boundary must be replaced with an unaligned 128-bit load (MOVDQU) followed by the same computation operation that uses instead register operands.
(chapter 5.8 Converting from 64-bit to 128-bit SIMD Integers, pg. 5-43)
I can't understand what they mean by "may not be aligned to a 16-byte boundary", could you please clarify it and give some examples?
Certain SIMD instructions, which perform the same operation on multiple pieces of data, require that the memory address of that data is aligned to a certain byte boundary. This effectively means that the address of the memory your data resides in needs to be divisible by the number of bytes required by the instruction.
So in your case the alignment is 16 bytes (128 bits), which means the memory address of your data needs to be a multiple of 16. E.g. 0x00010 would be 16 byte aligned, while 0x00011 would not be.
How to get your data to be aligned depends on the programming language (and sometimes compiler) you are using. Most languages that have the notion of a memory address will also provide you with means to specify the alignment.
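For example, here is a quick way to inspect (not enforce) alignment from Python, using ctypes just to get hold of a raw address:

import ctypes

buf = (ctypes.c_char * 64)()       # a plain 64-byte buffer
addr = ctypes.addressof(buf)

# 16-byte aligned means the address is a multiple of 16 (its low 4 bits are zero)
print(hex(addr), addr % 16 == 0)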
I'm guessing here, but could it be that "may not be aligned to a 16-byte boundary" means that this memory location has been aligned to a smaller value (4 or 8 bytes) before for some other purposes and now to execute SSE instructions on this memory you need to load it into a register explicitly?
Data that's aligned on a 16 byte boundary has a memory address that is a multiple of 16, i.e. its four least significant bits are zero.
Similarly, memory aligned on a 4 byte (32 bit) boundary has a memory address that's a multiple of four, because you group four bytes together to form a 32 bit word.

Calculate size in Hex Bytes

What is the proper way to calculate the size (in hex bytes) of a code segment? I am given:
IP = 0848 CS = 1488 DS = 1808 SS = 1C80 ES = 1F88
The practice exercise I am working on asks what is the size (in hex bytes) of the code segment and gives these choices:
A. 3800 B. 1488 C. 0830 D. 0380 E. none of the above
The correct answer is A. 3800, but I haven't a clue as to how to calculate this.
How to calculate the length:
Note CS. Find the segment register that's nearest to it, but greater.
Take the difference between the two, and multiply by 0x10 (read: tack on a 0).
In your example, DS is the nearest: 1808 - 1488 == 380, and 380 x 10 = 3800 (all values in hex).
BTW, this only works on the 8086 and other, similarly boneheaded CPUs, and in real mode on x86. In protected mode on x86 (which is to say, unless you're writing a boot sector or a simple DOS program), the value of the segment register has very little to do with the size of the segment, and thus the stuff above simply doesn't apply.
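For reference, the same real-mode calculation as a small Python sketch (register values taken from the question):

regs = {'CS': 0x1488, 'DS': 0x1808, 'SS': 0x1C80, 'ES': 0x1F88}

cs = regs['CS']
next_seg = min(v for v in regs.values() if v > cs)   # nearest segment base above CS
size = (next_seg - cs) * 0x10                        # paragraphs (16-byte units) -> bytes

print(hex(size))   # 0x3800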