GMP variable's bit size

In the GMP library,
_mp_size holds the number of limbs of an integer.
Using mpz_init or mpz_random, we can create integers of
1 limb (32 bits), 2 limbs (64 bits), 3 limbs (96 bits), and so on.
Can't we create an integer variable of 8 or 16 bits, i.e. a size other than a multiple of 32 bits?
Can you show code for that?
Thank you.

The GNU GMP library is for numbers that exceed the ranges provided by the standard C types. Use (unsigned) char or (unsigned) short for 8 and 16 bit integers, respectively.
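A minimal sketch of the plain-C route, using the exact-width types from <stdint.h> (C99) as an alternative to char/short when the width must be exact. The helper names are hypothetical. Note that unsigned arithmetic on these types wraps modulo 2^8 or 2^16, whereas a GMP integer simply grows.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: add two 8-bit unsigned values.
 * The sum is computed as int, then truncated, so it wraps modulo 256. */
uint8_t add_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Hypothetical helper: same for 16 bits, wrapping modulo 65536. */
uint16_t add_u16(uint16_t a, uint16_t b) {
    return (uint16_t)(a + b);
}

/* Print the width and maximum value of each type. */
void print_widths(void) {
    printf("uint8_t:  %zu bits, max %u\n", sizeof(uint8_t) * 8, (unsigned)UINT8_MAX);
    printf("uint16_t: %zu bits, max %u\n", sizeof(uint16_t) * 8, (unsigned)UINT16_MAX);
}
```

For example, add_u8(200, 100) yields 44 because 300 does not fit in 8 bits.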

This would be of limited utility, because most modern processors use at least a 32-bit word size.

I don't think you can. Here's an excerpt from a discussion at http://gmplib.org/list-archives/gmp-discuss/2004-June/001200.html:
The limb size is compiled into the
library, and is determined from the
available types of tghe [sic] processor and
the host environment.

Related

What is the bit width of a single webassembly instruction?

I know that WebAssembly currently supports a 32-bit architecture, so I am supposing that, like RV32, its base instruction set has instructions that are 32 bits wide (of course, RV32 also supports 16-bit compressed instructions and 48-bit ones). RISC-V's instructions are interpreted mostly as left-endian (in terms of bit indices).
For example, RISC-V has the instruction lui (load upper immediate), which embeds a 20-bit immediate in the instruction, has a 5-bit field encoding the destination register, and a 7-bit field specifying the opcode. Among other things, the two lowest bits of the opcode indicate whether the instruction is compressed. The specification lists lui under the LUI opcode.
RISC-V instructions have a variety of layouts, also given in the specification; lui, for example, uses the "U" format, so we know exactly where the 20-bit immediate and the 5-bit destination register sit in the encoding.
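To make the U-format layout concrete, here is a sketch that packs and unpacks a lui instruction. The opcode value 0x37 (0b0110111) and the field positions are those of the RISC-V base ISA; the function names are hypothetical.

```c
#include <stdint.h>

#define RV_OPCODE_LUI 0x37u  /* 0b0110111, the LUI opcode */

/* Hypothetical encoder for "lui rd, imm20" in U-type layout:
 * bits 31..12 = imm[19:0], bits 11..7 = rd, bits 6..0 = opcode. */
uint32_t rv_encode_lui(unsigned rd, uint32_t imm20) {
    return ((imm20 & 0xFFFFFu) << 12) | ((rd & 0x1Fu) << 7) | RV_OPCODE_LUI;
}

/* Field extractors for the same layout. */
unsigned rv_rd(uint32_t insn)      { return (insn >> 7) & 0x1Fu; }
uint32_t rv_u_imm20(uint32_t insn) { return insn >> 12; }

/* Low two bits 0b11 mean a 32-bit (or longer) instruction;
 * 0b00, 0b01 and 0b10 mark a 16-bit compressed instruction. */
int rv_is_compressed(uint32_t insn) { return (insn & 0x3u) != 0x3u; }
```

For instance, lui x5, 0x12345 encodes as 0x123452B7, whose low two bits are 0b11.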
What is the bit width of a wasm instruction? What are the possible layouts of a wasm instruction? Are there compressed instruction formats for webassembly, such as 16-bit instructions for very common operations?
If webassembly instructions are variable-width, how is the width of an instruction encoded for the interpreter?
Binary Wasm bytecode has variable-length instructions, not fixed-width ones like a RISC CPU. https://en.wikipedia.org/wiki/WebAssembly#Code_representation has an example.
It's not intended to be executed directly, but rather JIT-compiled into native machine code. A fixed-width format that needed multiple instructions for some 32- or 64-bit constants would make more work for the JIT optimizer; it would also be less compact in the Wasm binary and mean more instructions to parse.
It's much better for the JIT optimizer to know that the ultimate goal is to materialize a whole constant, since some ISAs can do that in one instruction while others need it split into parts that depend on the ISA: e.g. 20:12 for RISC-V (lui/addi), 16:16 for ARM movw/movt or MIPS lui/ori (16-bit pieces with AArch64 movz/movk), or, if the constant only has set bits in a narrow region, ARM rotated immediates may still manage it in one instruction. AArch64 bitmask immediates can also materialize a constant like 0x01010101 (or 0x0101010101010101) in a single 32-bit instruction.
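As an illustration of the RISC-V 20:12 split mentioned above, this sketch computes the lui/addi immediates for an arbitrary 32-bit constant. It is a hypothetical helper, not code from any particular JIT. The +0x800 rounding compensates for addi sign-extending its 12-bit immediate.

```c
#include <stdint.h>

/* Hypothetical helper: split a constant for "lui rd, hi20; addi rd, rd, lo12".
 * lo12 is the sign-extended low 12 bits; if it is negative, hi20 is
 * rounded up by one so that (hi20 << 12) + lo12 still equals value. */
void rv_split_const(uint32_t value, uint32_t *hi20, int32_t *lo12) {
    int32_t lo = (int32_t)(value & 0xFFFu);
    if (lo >= 0x800) lo -= 0x1000;                 /* sign-extend bit 11 */
    *lo12 = lo;
    *hi20 = ((value + 0x800u) >> 12) & 0xFFFFFu;   /* unsigned add wraps mod 2^32 */
}

/* Recombine the pieces the way the two instructions would (mod 2^32). */
uint32_t rv_join_const(uint32_t hi20, int32_t lo12) {
    return (hi20 << 12) + (uint32_t)lo12;
}
```

For 0x12345FFF this gives hi20 = 0x12346 and lo12 = -1, which is exactly the kind of re-splitting a JIT has to do itself for its target.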
TL;DR: Don't make the JIT put the pieces back together before breaking them down again into asm that works for the target machine.
And in general, variable-length isn't much of a problem for a stream that will be parsed once by software anyway, not decoded repeatedly by hardware every time through a loop.
Examples
A lot of WebAssembly instructions take up one byte. For example, the left-shift instructions i32.shl and i64.shl use the single-byte opcodes 0x74 and 0x86 with no following immediates, while the i32.const instruction starts with 0x41 and takes 2 to 6 bytes in total.
Instruction   Opcode
i32.const     0x41
i64.const     0x42
f32.const     0x43
f64.const     0x44

i32.shl       0x74
i64.shl       0x86

i32.eqz       0x45
i32.eq        0x46
i64.eqz       0x50
i64.eq        0x51
And so on. The values here are taken from MDN; see its Numeric Instructions page.
Encoding Numbers
Some instructions, such as the const instructions above, take an immediate, which increases the overall size of the instruction. Immediates are encoded in LEB128, signed or unsigned depending on the instruction; the specification says which.
LEB128 works roughly like this: the value's bits are padded to a multiple of seven and split into 7-bit groups, least significant group first; each group goes into one byte whose high bit is set if more bytes follow. The encoded numbers are constrained to their type's maximum width. Floating-point immediates are encoded as IEEE 754 values in little-endian byte order (4 bytes for f32, 8 for f64).
The const instructions are followed by the respective literal.
All other numeric instructions are plain opcodes without any immediates.
Source: https://webassembly.github.io/spec/core/binary/instructions.html#numeric-instructions
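The 2-to-6-byte range quoted earlier for i32.const follows from signed LEB128: one opcode byte plus at most five 7-bit groups for a 32-bit value. A minimal encoder sketch; the function names are hypothetical, and 0x41 is the i32.const opcode from the table above.

```c
#include <stddef.h>
#include <stdint.h>

#define WASM_OP_I32_CONST 0x41u

/* Arithmetic shift right by 7 that avoids the implementation-defined
 * behaviour of >> on negative signed values. */
static int64_t asr7(int64_t v) {
    return v < 0 ? ~(~v >> 7) : v >> 7;
}

/* Hypothetical: write the signed LEB128 encoding of v, return byte count. */
size_t sleb128_encode(int64_t v, uint8_t *out) {
    size_t n = 0;
    for (;;) {
        uint8_t byte = (uint8_t)(v & 0x7F);   /* low 7 bits of the value */
        v = asr7(v);
        /* Stop once the remaining bits are just the sign extension of bit 6. */
        int done = (v == 0 && !(byte & 0x40)) || (v == -1 && (byte & 0x40));
        out[n++] = done ? byte : (uint8_t)(byte | 0x80);  /* 0x80: more follow */
        if (done) return n;
    }
}

/* Hypothetical: encode "i32.const value" (opcode + sLEB128 immediate).
 * out needs room for 6 bytes; returns the instruction length. */
size_t wasm_encode_i32_const(int32_t value, uint8_t *out) {
    out[0] = WASM_OP_I32_CONST;
    return 1 + sleb128_encode(value, out + 1);
}
```

Small values such as 0 or -1 need two bytes in total, 64 already needs three (bit 6 would otherwise read as a sign bit), and INT32_MIN needs the full six.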
Wasm instructions are represented with a unique opcode (typically 1 byte, more for newer instructions), followed by the encodings of the immediate operands, for instructions that have them. There is no fixed length; it depends on both the opcode and the immediate values.
For example:
i32.add is opcode 0x6A with no immediates;
i64.const i is opcode 0x42, followed by a variable-length encoding of i in LEB128 format;
br_table l* ld is opcode 0x0E, followed by a variable-length encoding of the length of l* in LEB128, followed by as many variable-length encodings of the label indices in l*, followed by the variable-length encoding of label index ld.
See the binary grammar in the specification for details. A Wasm decoder is essentially "parsing" the binary input according to this grammar.
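A decoder therefore never needs an explicit instruction length: the opcode determines which immediates follow, and each LEB128 immediate ends at the first byte whose high bit is clear. A sketch of unsigned LEB128 plus the br_table layout described above (opcode 0x0E); the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: write the unsigned LEB128 encoding of v, return byte count. */
size_t uleb128_encode(uint64_t v, uint8_t *out) {
    size_t n = 0;
    do {
        uint8_t byte = v & 0x7F;
        v >>= 7;
        if (v != 0) byte |= 0x80;   /* continuation bit: more bytes follow */
        out[n++] = byte;
    } while (v != 0);
    return n;
}

/* Hypothetical: decode unsigned LEB128 from in[0..len). Returns bytes
 * consumed, or 0 if the input is truncated or longer than ten bytes
 * (overflow bits in the tenth byte are not checked). The width is only
 * known once a byte with the high bit clear has been read. */
size_t uleb128_decode(const uint8_t *in, size_t len, uint64_t *value) {
    uint64_t result = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < len; i++) {
        if (shift >= 64) return 0;
        result |= (uint64_t)(in[i] & 0x7F) << shift;
        shift += 7;
        if (!(in[i] & 0x80)) {
            *value = result;
            return i + 1;
        }
    }
    return 0;
}

/* Hypothetical: encode "br_table l* ld" as opcode 0x0E, uLEB128 label
 * count, each label index, then the default label index. */
size_t wasm_encode_br_table(const uint32_t *labels, uint32_t count,
                            uint32_t default_label, uint8_t *out) {
    size_t n = 0;
    out[n++] = 0x0E;
    n += uleb128_encode(count, out + n);
    for (uint32_t i = 0; i < count; i++)
        n += uleb128_encode(labels[i], out + n);
    n += uleb128_encode(default_label, out + n);
    return n;
}
```

For example, 624485 encodes as the three bytes E5 8E 26, and br_table with labels {0, 1} and default 2 is the five bytes 0E 02 00 01 02.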
Here are some citations from the current specification v2.0 related to the instructions (as "seen" by the specification itself):
some instructions also have static immediate arguments, typically
indices or type annotations, which are part of the instruction itself.
Some instructions are structured in that they bracket nested sequences of instructions.
In relation to the nesting:
Implementations typically impose additional restrictions on a number of aspects of a WebAssembly module or execution
Then, one of the noted implementation limitations is:
the nesting depth of structured control instructions
Since the maximum nesting depth of instructions is not fixed by the specification but left to the implementation, the specification itself places no limit on the length of an instruction, whether it is encoded in binary or text form.
Even if we ignore structured instructions (which we should not), many instructions take vectors as arguments. A vector's length is limited to 2^32-1. If my memory serves me right, there was also an instruction taking a vector of vectors as an argument.

What exactly is the size of an ELF symbol (both for 64 & 32 bit) & how do you parse it

According to Oracle's documentation on the ELF file format, a 64-bit ELF symbol is 30 bytes in size (8 + 1 + 1 + 4 + 8 + 8). However, when I use readelf to print the section headers of an ELF file and then inspect the "EntSize" (entry size) member of the symbol table section header, it says the symbol entries are only 0x18 (decimal 24) bytes in size.
I have attached a picture of readelf's output next to the Oracle documentation. The highlighted characters under "SYMTAB" are the "EntSize" member.
As I am about to write an ELF parser, I am curious which one I should believe: the EntSize value read from the file, or the documentation?
I have also looked for an answer in this ELF documentation; however, it doesn't seem to go into any detail about the 64-bit ELF structures.
It should be noted that the ELF file I ran readelf on, in the above picture, is a 64-bit executable.
EI_CLASS, the byte right after the ELF magic number, holds the "class" of the ELF file; the value 2 (ELFCLASS64) means a 64-bit file.
When the 32-bit standard was drafted, there were competing popular 64-bit architectures. The 32-bit standard was a bit vague about 64 bits, as it was quite possible at that time to imagine multiple competing 64-bit standards.
https://www.uclibc.org/docs/elf-64-gen.pdf
should cover the 64 bit standard with better attention to the 64 bit layouts.
The way you "parse" it is to read the bytes in the order described in the struct.
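typedef struct {
Elf64_Word st_name;      /* 4 bytes: offset into the symbol string table */
unsigned char st_info;   /* 1 byte: binding (high nibble) and type (low nibble) */
unsigned char st_other;  /* 1 byte: visibility */
Elf64_Half st_shndx;     /* 2 bytes: index of the related section */
Elf64_Addr st_value;     /* 8 bytes: symbol value, usually an address */
Elf64_Xword st_size;     /* 8 bytes: size of the object */
} Elf64_Sym;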
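The first 4 bytes are st_name (Elf64_Word is a 32-bit type even in ELF64), the next byte is st_info, and so on. Of course, it is critical to know where the struct "starts" within the file, and the spec above should help with that.
"64" in the type names means the ELF64 variant of the type, not that every field is 64 bits wide: Elf64_Half is 2 bytes, Elf64_Word is 4 bytes, and Elf64_Addr and Elf64_Xword are 8 bytes.
So Elf64_Sym has 4+1+1+2+8+8 bytes in it, or 24 bytes, which matches the EntSize of 0x18 that readelf reports.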
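Since EI_CLASS says this is a 64-bit file, each symbol is parsed with the 24-byte layout. A sketch that reads the fields at explicit byte offsets instead of relying on the compiler's struct layout. It assumes a little-endian file (EI_DATA = ELFDATA2LSB); the names Sym64, rd_le and parse_sym64_le are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one ELF64 symbol. On disk it is 24 bytes:
 * st_name(4) st_info(1) st_other(1) st_shndx(2) st_value(8) st_size(8). */
typedef struct {
    uint32_t st_name;
    uint8_t  st_info;
    uint8_t  st_other;
    uint16_t st_shndx;
    uint64_t st_value;
    uint64_t st_size;
} Sym64;

#define SYM64_FILE_SIZE 24

/* Read an nbytes-wide little-endian integer starting at p. */
static uint64_t rd_le(const uint8_t *p, int nbytes) {
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* Decode one symbol entry starting at p (assumes little-endian ELF). */
Sym64 parse_sym64_le(const uint8_t *p) {
    Sym64 s;
    s.st_name  = (uint32_t)rd_le(p + 0, 4);
    s.st_info  = p[4];
    s.st_other = p[5];
    s.st_shndx = (uint16_t)rd_le(p + 6, 2);
    s.st_value = rd_le(p + 8, 8);
    s.st_size  = rd_le(p + 16, 8);
    return s;
}

/* st_info packs the binding (high nibble) and the type (low nibble). */
unsigned sym_bind(uint8_t info) { return info >> 4; }
unsigned sym_type(uint8_t info) { return info & 0xF; }
```

Entry i of the table then starts at the section's file offset plus i times the section's EntSize.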

What is the minimal number of bytes that must be changed to skip a function?

Suppose you have an ELF file that hits a segmentation fault in a function named print_debug.
Since that function is not relevant to the program, you want to "cancel" it manually using Hexedit.
The function is 100 bytes long.
What is the minimal number of bytes you must change to fix the file?
The answer options:
1
2
99
The answers
The answer is: it depends on the instruction set.
On i*86 and x86_64 you can use a single-byte RET, but on a typical RISC machine you would need 4 bytes, and on ARM in Thumb mode you will need 2 (I think).
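To make those byte counts concrete, here is a sketch that patches the start of a function image in memory with a return instruction for each case. The encodings are standard (x86 near RET = 0xC3, RISC-V ret = jalr x0, 0(x1), Thumb bx lr = 0x4770); the helper names are hypothetical, and the patch only "cancels" the function safely if callers do not rely on its return value or side effects.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: encode RISC-V "jalr rd, offset(rs1)" (I-type, opcode 0x67,
 * funct3 0). The "ret" pseudo-instruction is jalr x0, 0(x1). */
uint32_t rv_encode_jalr(unsigned rd, unsigned rs1, int32_t offset) {
    return ((uint32_t)(offset & 0xFFF) << 20) | ((rs1 & 0x1Fu) << 15)
         | (0u << 12) /* funct3 */ | ((rd & 0x1Fu) << 7) | 0x67u;
}

/* Each patch overwrites the function's first bytes with a return and
 * reports how many bytes it wrote (0 if the function is too short). */
size_t patch_x86_ret(uint8_t *func, size_t func_len) {
    if (func_len < 1) return 0;
    func[0] = 0xC3;                            /* x86 / x86-64 near RET */
    return 1;
}

size_t patch_riscv_ret(uint8_t *func, size_t func_len) {
    if (func_len < 4) return 0;
    uint32_t ret = rv_encode_jalr(0, 1, 0);    /* 0x00008067 */
    for (int i = 0; i < 4; i++)                /* stored little-endian */
        func[i] = (uint8_t)(ret >> (8 * i));
    return 4;
}

size_t patch_thumb_ret(uint8_t *func, size_t func_len) {
    if (func_len < 2) return 0;
    func[0] = 0x70; func[1] = 0x47;            /* Thumb "bx lr" = 0x4770 */
    return 2;
}
```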

How to obtain number of entries in ELF's symbol table?

Consider the standard hello world program in C, compiled with GCC without any switches. According to readelf -s, it contains 64 symbols. It also says the .symtab section is 1024 bytes long. However, each symbol table entry has 18 bytes, so how can it contain 64 entries? It should be 56 entries. I'm writing my own program that reads the symbol table, and it does not see those "missing" entries, as it reads until the end of the section. How does readelf know how far to read?
As one can see in elf.h, the symbol entry structure looks like this:
typedef struct elf32_sym {
Elf32_Word st_name;
Elf32_Addr st_value;
Elf32_Word st_size;
unsigned char st_info;
unsigned char st_other;
Elf32_Half st_shndx;
} Elf32_Sym;
Elf32_Word and Elf32_Addr are 32-bit values, Elf32_Half is 16 bits, and chars are 8 bits. That means the size of the structure is 16 bytes, not 18. Therefore a 1024-byte section holds exactly 64 entries.
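The usual approach is to take the entry size from the section header's sh_entsize field instead of hard-coding a struct size, and divide sh_size by it. A sketch of that calculation; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: number of entries in a fixed-entry-size section such as
 * .symtab, i.e. sh_size / sh_entsize. Returns 0 for a malformed header
 * (sh_entsize of 0, or a size that is not a whole number of entries). */
uint64_t section_entry_count(uint64_t sh_size, uint64_t sh_entsize) {
    if (sh_entsize == 0 || sh_size % sh_entsize != 0) return 0;
    return sh_size / sh_entsize;
}

/* Hypothetical: pointer to entry 'index', stepping by sh_entsize bytes
 * so the walk follows the file layout; NULL if out of range. */
const uint8_t *section_entry(const uint8_t *section, uint64_t sh_size,
                             uint64_t sh_entsize, uint64_t index) {
    if (index >= section_entry_count(sh_size, sh_entsize)) return NULL;
    return section + index * sh_entsize;
}
```

With sh_size = 1024 and sh_entsize = 16 this yields 64 entries; an assumed entry size of 18 does not even divide the section size evenly.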
The entries are aligned to each other and padded with blanks, therefore the size mismatch. Check out this mailthread for a similar discussion.
As for your code, I suggest checking out the source of readelf, especially the function process_symbol_table() in binutils/readelf.c.
The file size of an ELF data type can differ from the size of its in-memory representation.
You can use the elf32_fsize() and elf64_fsize() functions in libelf to retrieve the file size of an ELF data type.

GMP variable's bit size

How can I find the size of a declared variable in GMP? Or, how can we decide the size of an integer in GMP?
mpz_random(temp,1);
The manual says this function allocates 1 limb (= 32 bits on my computer) to temp,
but it holds only a 9-digit number.
I don't think a 32-bit number holds only 9 digits.
So please help me understand the size of an integer variable in GMP.
Thanks in advance.
mpz_sizeinbase(num, 2) will give you the size in 'used' bits.
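For intuition about what that call returns, this pure-C sketch computes the same quantity for a value that fits in 64 bits. It is not part of GMP, and bit_length_u64 is a hypothetical name; for an mpz_t, the library call itself is what you want.

```c
#include <stdint.h>

/* Hypothetical: number of significant bits in x, i.e. what
 * mpz_sizeinbase(num, 2) reports for a non-negative value of this size.
 * Like GMP, it returns 1 for zero. */
unsigned bit_length_u64(uint64_t x) {
    unsigned bits = 0;
    while (x != 0) {
        bits++;
        x >>= 1;
    }
    return bits == 0 ? 1 : bits;
}
```

A 9-digit value such as 999999999 needs only 30 bits, well within one 32-bit limb.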
32 bits (4 bytes) really can store only 9 full decimal digits:
2^32 = 4 294 967 296
so only 9 decimal digits are full (the 10th can only range from 0 up to 4, so it is not full).
You can also compute this with logarithms:
log10(2^32) = 32 * log10(2)
Google gives
log10(2^32) = 9.63295986
Everything is correct.
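The same result falls out of exact integer arithmetic: count how many powers of ten fit below 2^32. A small sketch (hypothetical function name) that avoids floating-point logarithms and agrees with floor(bits * log10(2)).

```c
#include <stdint.h>

/* Hypothetical: largest d such that every d-digit decimal number fits in
 * 'bits' unsigned bits, i.e. the largest d with 10^d <= 2^bits.
 * Valid for 1 <= bits <= 63. */
unsigned full_decimal_digits(unsigned bits) {
    uint64_t limit = (uint64_t)1 << bits;   /* 2^bits */
    uint64_t p = 1;                         /* 10^d */
    unsigned d = 0;
    while (p <= limit / 10) {               /* equivalent to 10 * p <= limit */
        p *= 10;
        d++;
    }
    return d;
}
```

For 32 bits this gives 9; for 8 and 16 bits it gives 2 and 4.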
You can check the number of limbs in a debugger. A GMP integer has the internal field _mp_size, whose absolute value is the count of limbs used to hold the current value of the variable (its sign is the sign of the number; 0 is a special case, represented with _mp_size = 0). Here's an example I ran in Visual C++ on a build with 32-bit limbs (see my article How to Install and Run GMP on Windows Using MPIR):
mpz_set_ui(temp, 1073741824); // 2^30  (_mp_size = 1)
mpz_mul(temp, temp, temp);    // 2^60  (_mp_size = 2 with 32-bit limbs; 1 with 64-bit limbs)
mpz_mul(temp, temp, temp);    // 2^120 (_mp_size = 4 with 32-bit limbs; 2 with 64-bit limbs)
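The sizes in those comments follow from ceil(bit length / limb size), since 2^n has n+1 significant bits. A sketch (hypothetical helper, not a GMP function) that reproduces them and shows what a build with 64-bit limbs would report instead; in real code, mpz_size(temp) returns the limb count without a debugger.

```c
/* Hypothetical: limbs needed for a value with 'bits' significant bits,
 * i.e. ceil(bits / limb_bits). Zero bits gives 0 limbs, matching
 * GMP's _mp_size == 0 for the value 0. */
unsigned limbs_for_bits(unsigned bits, unsigned limb_bits) {
    return (bits + limb_bits - 1) / limb_bits;
}
```

So 2^30 (31 bits), 2^60 (61 bits) and 2^120 (121 bits) need 1, 2 and 4 limbs of 32 bits, but 1, 1 and 2 limbs of 64 bits.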