What happens exactly when you assign mload(0x40) to an address variable? - solidity

Size of an address is 20 bytes, which is smaller than the size of a slot. I didn't find any reference for what exactly happens in this case:
address tempBytes;
assembly {
tempBytes := mload(0x40) // 0x40 is the free memory pointer
}

The free memory pointer holds the address of the first unallocated byte in memory. Assuming there are no in-memory variables before this snippet, the first free memory slot is at position 0x80 (decimal 128, the beginning of the 5th 32-byte slot).
This means mload(0x40), which reads the value of the pointer stored at 0x40, returns 0x80 (the value of the pointer).
Docs: https://docs.soliditylang.org/en/v0.8.16/internals/layout_in_memory.html
mload returns a full 32-byte word, while an address is only 20 bytes wide. The value 0x80 easily fits in 20 bytes, so as an address it is simply 0x80 prepended with leading zeros.
Applied to your example:
mload(0x40) returns 0x80
this value is typecast to address
the value of address tempBytes is 0x0000000000000000000000000000000000000080 (20 bytes, ends with hex80)
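
The steps above can be modelled outside the EVM. The following is only a sketch in Python, assuming a fresh memory area in which the free memory pointer slot still holds 0x80. The names memory, mload and as_address are made up for illustration, and the 160-bit mask only models which bits belong to an address; it is not a claim about what the compiler emits.

```python
# Toy model of EVM memory (assumption: nothing has been allocated yet,
# so the free memory pointer slot at 0x40 still holds 0x80).
memory = bytearray(0x80)
memory[0x40:0x60] = (0x80).to_bytes(32, "big")  # free memory pointer slot


def mload(offset: int) -> int:
    """Read one 32-byte big-endian word starting at `offset`."""
    return int.from_bytes(memory[offset:offset + 32], "big")


def as_address(word: int) -> str:
    """Keep the low 160 bits and format them as a 20-byte hex string."""
    return "0x" + format(word & ((1 << 160) - 1), "040x")


temp_bytes = as_address(mload(0x40))
print(temp_bytes)  # 0x, then 38 zeros, then 80
```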

TwinCAT 3 - SizeOf returning wrong structure size

I have a structure, and am trying to get the size of this structure. SizeOf returns 16, but I am expecting 14 as answer.
2+2+4+2+2+2=14
By using pointers I noticed that there are 2 empty bytes at the end of the structure.
If I replace the UDINT with UINT then the size is correct. If I put the UDINT at the end of the structure, then the two empty bytes are placed after iCrateCnt.
This leads me to believe that the sizeOf is working properly, but for some unknown reason there are two additional bytes placed somewhere in my structure that I am not using.
Why is this happening and how can it be solved?
The unexpected size returned by SIZEOF() is due to so-called 'padding bytes'.
Where these padding bytes occur depends on:
The system that is used (Tc2 x86, Tc2 ARM, Tc3)
The data types that are used
The order in which these data types (i.e. the variables) are defined
For more information about padding bytes see Alignment and Structures
As Kolyur has rightfully mentioned the attribute Pack_Mode can be used to control these padding bytes.
For example in Tc3:
TYPE HMI_POPUPSTRUCT : // The total size of this struct is 8 bytes
STRUCT
    bVar1: BOOL; // At byte 0
    // At byte 1 there will be a padding byte
    bVar2: INT;  // At bytes 2 and 3
    bVar3: BOOL; // At byte 4
    bVar4: BOOL; // At byte 5
    bVar5: BOOL; // At byte 6
    // At byte 7 there will be a padding byte (the 8th byte)
END_STRUCT
END_TYPE
When inserting either
{attribute 'pack_mode' := '0'}
or
{attribute 'pack_mode' := '1'}
just above the struct, there won't be any padding bytes, resulting in a struct size of 6 bytes instead of 8.
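
To see where the padding comes from, here is a rough layout calculator in Python. It is a sketch under a stated assumption: every field is aligned to its own size (capped at the pack value, if one is given) and the total size is rounded up to the largest alignment used. The function name layout and its pack parameter are made up for illustration; this is not TwinCAT code.

```python
def layout(field_sizes, pack=None):
    """Return (offsets, total_size) for fields laid out in declaration order.

    Assumption: each field is aligned to its own size, capped at `pack`
    when a pack value is given, and the total size is rounded up to the
    largest alignment used.
    """
    offsets, offset, max_align = [], 0, 1
    for size in field_sizes:
        align = size if pack is None else min(size, pack)
        offset = (offset + align - 1) // align * align  # round up to alignment
        offsets.append(offset)
        offset += size
        max_align = max(max_align, align)
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


# BOOL, INT, BOOL, BOOL, BOOL from the example struct (sizes in bytes)
print(layout([1, 2, 1, 1, 1]))          # ([0, 2, 4, 5, 6], 8)
print(layout([1, 2, 1, 1, 1], pack=1))  # ([0, 1, 3, 4, 5], 6)
# Field sizes from the question: 2+2+4+2+2+2
print(layout([2, 2, 4, 2, 2, 2]))       # ([0, 2, 4, 8, 10, 12], 16)
```

Under the same assumption, moving the 4-byte field to the end, layout([2, 2, 2, 2, 2, 4]), still gives 16, with the two padding bytes now sitting directly in front of the 4-byte field. That matches what the question observed.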
The pack_mode attribute can be used to eliminate unused bytes in a structure.
https://infosys.beckhoff.com/english.php?content=../content/1033/tc3_plc_intro/2529746059.html&id=3686945105176987925

What does this notation mean in TLS documentation?

For example, looking at RFC 7301, which defines ALPN:
enum {
application_layer_protocol_negotiation(16), (65535)
} ExtensionType;
The (16) is the enum value to be used, but how should I read the (65535) part?
From the same document:
opaque ProtocolName<1..2^8-1>;
struct {
ProtocolName protocol_name_list<2..2^16-1>
} ProtocolNameList;
...how should I read the <1..2^8-1> and <2..2^16-1> parts?
The notation is described in https://www.rfc-editor.org/rfc/rfc8446.
For "enumerateds" (enums), see https://www.rfc-editor.org/rfc/rfc8446#section-3.5, which says that the value in brackets is the value of that enum member, and that the enum occupies as many octets as required by the highest documented value.
Thus, if you want to leave some room, you need an un-named enum member with a sufficiently high value.
One may optionally specify a value without its associated tag to force the width definition without defining a superfluous element.
In the following example, Taste will consume two bytes in the data stream but can only assume the values 1, 2, or 4.
enum { sweet(1), sour(2), bitter(4), (32000) } Taste;
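
As a quick check of the width rule, the number of octets follows from the largest value listed. This is a small Python sketch; enum_width is a made-up helper name, not something from the RFC.

```python
def enum_width(max_value: int) -> int:
    """Octets needed to hold the largest value listed in a TLS enum."""
    return max(1, (max_value.bit_length() + 7) // 8)


print(enum_width(32000))  # 2, the Taste example
print(enum_width(65535))  # 2, ExtensionType
print(enum_width(255))    # 1
```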
For vectors, see https://www.rfc-editor.org/rfc/rfc8446#section-3.4. This says:
Variable-length vectors are defined by specifying a subrange of legal lengths, inclusively, using the notation <floor..ceiling>. When these are encoded, the actual length precedes the vector's contents in the byte stream. The length will be in the form of a number consuming as many bytes as required to hold the vector's specified maximum (ceiling) length.
So the notation <1..2^8-1> means that ProtocolName must be at least one octet, and up to 255 octets in length.
Similarly <2..2^16-1> means that protocol_name_list must have at least 2 octets (not entries), and can have up to 65535 octets (not entries).
In this particular case, the minimum of 2 octets is because it must contain at least one entry, which is itself at least 2 octets long (u8 length prefix, at least one octet in the value).
To make the octets/entries distinction clear, later in that section, it says:
uint16 longer<0..800>;
/* zero to 400 16-bit unsigned integers */
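
The length-prefix rule can be tried out with a few lines of Python. This is only a sketch of the encoding described above, not a TLS implementation; encode_vector and encode_protocol_name_list are names chosen for illustration.

```python
def encode_vector(payload: bytes, floor: int, ceiling: int) -> bytes:
    """Prefix `payload` with its length, as for a TLS <floor..ceiling> vector."""
    if not floor <= len(payload) <= ceiling:
        raise ValueError("payload length outside the declared range")
    prefix_octets = (ceiling.bit_length() + 7) // 8  # octets needed for ceiling
    return len(payload).to_bytes(prefix_octets, "big") + payload


def encode_protocol_name_list(names):
    """Encode ALPN names: each is <1..2^8-1>, the whole list is <2..2^16-1>."""
    body = b"".join(encode_vector(name, 1, 2**8 - 1) for name in names)
    return encode_vector(body, 2, 2**16 - 1)


encoded = encode_protocol_name_list([b"h2", b"http/1.1"])
print(encoded.hex())  # 000c0268320868747470 2f312e31 without the space
```

The two-octet prefix 000c is the length of the list in octets (12), not the number of entries (2). The shortest legal list, a single one-octet name, encodes to 00 02 01 followed by that octet, which is where the minimum of 2 comes from.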

Soft question - Memory address of the last byte allocated of an array

The variable int A[10] is allocated from the HEX address DDDD04BA. I would like to find the HEX address of the last byte allocated of the array. Why we subtract 1 from:
DDDD04BA+28=DDDD04E2, DDDD04E2-1=DDDD04E1? Also, why is the HEX address of the first byte of A[8] DDDD04BA+32=DDDD04DA? Shouldn't it be +36 instead, since it starts from A[0]?
Address of first element of array is &A[0]=DDDD04BA
Address of element A[1] is obtained by adding 4 to this value
Address of element A[i] is &A[0]+4*i
And the last element A[9] is at address &A[0]+4*9=&A[0]+36=&A[0]+0x24
A[9] is formed of 4 bytes. First is at address &A[9] and last at address &A[9]+3
We get the result 0xDDDD04BA+0x24+3=DDDD04E1
For the same reason, &A[8]=&A[0]+8*4=0xDDDD04DA
Do not forget that array indexing starts at 0, so to find the address of element i, you just have to add i*sizeof(array_element) to the base address.
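
The arithmetic can be double-checked in Python. This is a sketch that assumes sizeof(int) is 4, as the answer does; the constant and function names are made up.

```python
BASE = 0xDDDD04BA   # &A[0], from the question
ELEM_SIZE = 4       # assumption: sizeof(int) == 4
COUNT = 10          # int A[10]


def element_address(i: int) -> int:
    """Address of the first byte of A[i]."""
    return BASE + i * ELEM_SIZE


first_byte_a8 = element_address(8)
last_byte = element_address(COUNT - 1) + ELEM_SIZE - 1
one_past_end = BASE + COUNT * ELEM_SIZE

print(f"{first_byte_a8:08X}")  # DDDD04DA
print(f"{last_byte:08X}")      # DDDD04E1
print(f"{one_past_end:08X}")   # DDDD04E2
```

DDDD04E2 is the first address after the array, which is why 1 is subtracted to get the last byte that belongs to it.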

Memory addresses, pointers, variables, values - what goes on behind the scenes

This is going to be a pretty loaded question but ever since I started learning about pointers I've been very curious about what happens behind the scenes when a program is run.
As far as I know, computer memory is commonly thought of as a long strip of memory divided evenly into individual bytes. Certainly pictures such as the following evoke such a metaphor:
One thing I've been wondering, what do the memory addresses themselves represent? I'm sure it's no coincidence that memory addresses appear as 8 digit hexadecimal values (eg/ 00EB5748). Why is this?
Furthermore, when I declare a variable x, what is happening at the memory level? Is the compiler simply reserving a random address (+however many consecutive addresses it needs for the variable type) for data storage?
Now suppose x is an unsigned int that occupies 2 bytes of memory (ie values ranging from 0 to 65535). When I declare x = 12, what is happening? What is it that I'm making equal to 12? When I draw conceptual diagrams, I usually have a box for an address (say &x) pointing to a variable (x) that occupies seemingly nothing, and I'm sure that can't be a fully accurate picture of what's going on.
And what's happening at the binary level? Is the address 00EB5748 treated as 111010110101011101001000 and storing a value of 12 somewhere, or 1100?
Mostly my confusion & curiosity stems from the relationship between memory addresses and actual values being declared (eg/ 12, 'a', -355.2). As another example, suppose our address 00EB5748 is pointing to a char 's' whose value is 115 according to ASCII charts. Is the address describing a position that stores the value 115 in 1 byte, by flipping the appropriate 1s and 0s at that position in memory?
Just open any book. You will see pages. Every page has a number. Consecutive pages are numbered with consecutive numbers. Do you have any confusion about numbered pages? I think not. Then you should not have any confusion about computer memory either.
Books were the main storage devices before the computer era. Computer memory borrowed its basic concept from books: a book has pages -> computer memory has memory cells, a book has page numbers -> computer memory has memory addresses.
One thing I've been wondering, what do the memory addresses themselves represent?
Numbers. Every memory cell has a number, like every page in a book.
Furthermore, when I declare a variable x, what is happening at the memory level? Is the compiler simply reserving a random address (+however many consecutive addresses it needs for the variable type) for data storage?
The memory manager marks some memory cells as occupied and tells the compiler the address of the first reserved cell. The compiler associates the name and type of the variable with this address. (This picture is from my head, so it may be inaccurate.)
When I declare x = 12, what is happening?
When you declared the variable x, memory cells were reserved for it. Now you write 12 into these memory cells. Note that 12 is binary coded in some way, depending on the type of x. If x is an unsigned int that occupies 2 memory cells, then one cell will contain 0 and the other will contain 12, because the binary integer representation of 12 is
0000 0000 0000 1100
|_______| |_______|
  cell      cell
If 12 is a floating-point number, it will be coded in a different way.
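
The bytes that actually end up in those cells can be inspected with Python's struct module. This is a sketch: which of the two cells holds the 12 depends on the machine's byte order, so both orders are shown, and the float line assumes the 4-byte IEEE 754 format.

```python
import struct

# 12 as a 2-byte unsigned integer
print(struct.pack(">H", 12).hex(" "))    # 00 0c  (big-endian)
print(struct.pack("<H", 12).hex(" "))    # 0c 00  (little-endian)

# 12.0 as a 4-byte IEEE 754 float: a completely different bit pattern
print(struct.pack(">f", 12.0).hex(" "))  # 41 40 00 00

# A char is stored as its character code
print(ord("s"))                          # 115
```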
A memory address is simply the position of a given byte in memory. The zeroth byte is at 0x00000000. The tenth at 0x0000000A. The 65535th at 0x0000FFFF. And so on.
Local variables live on the stack*. When compiling a block of code, the compiler counts how many bytes are needed to hold all the local variables, and then increments the stack pointer so that all the variables can fit below it (along with some other stuff like frame pointers and return addresses and whatnot). Then it just remembers that, for example, local variable x is at an offset -2 from the stack pointer, foo is at an offset -4 and so on, and uses those addresses whenever those variables are referenced in the following code.
Since the compiler knows that x is at address (stack pointer - 2), that's the location that is set to the value 12 when you do x = 12.
Not entirely sure if I understand this question, but say you want to read the memory at address 0x00EB5748. The control unit in the CPU reads the instruction, sees that it is a load instruction, and passes the address (in binary of course) to the load/store unit, along with some other junk like how many bytes to read. Then the LSU sends that address to some memory (probably L1 cache), and after a certain time gets the value 12 back. Then this data is available to, say, put in a register, or send to the ALU to do arithmetic, or whatever.
That seems to be accurate, yes. Going back to the first question, an address simply means "byte number 0xWHATEVER in memory".
Hope this clarified things a bit at least.
*I should probably explain the stack as well. A stack is a portion of memory reserved for local variables (and some other stuff). It starts at a fixed location in memory, and stops at the memory address contained in a special register called the stack pointer. To begin with, the stack is empty, so the stack pointer just contains the start of the stack. As you put more data on the stack, the SP is incremented. This means that you can always put more data on it simply by putting it at the address in the SP, and then incrementing the SP so that once again anything past that address is free memory.
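
The footnote's description can be turned into a toy model. This is a Python sketch of the simplified, upward-growing stack described there, with made-up names (ToyStack, push); it is not how any particular CPU or compiler works. On many real machines the stack grows toward lower addresses instead.

```python
class ToyStack:
    """Toy model of the upward-growing stack described in the footnote."""

    def __init__(self, size: int = 16) -> None:
        self.memory = bytearray(size)
        self.sp = 0  # stack pointer: address of the first free byte

    def push(self, data: bytes) -> int:
        """Store `data` at the stack pointer and return its address."""
        if self.sp + len(data) > len(self.memory):
            raise OverflowError("stack overflow")
        address = self.sp
        self.memory[address:address + len(data)] = data
        self.sp += len(data)  # everything from sp onward is free again
        return address


stack = ToyStack()
x_addr = stack.push((12).to_bytes(2, "little"))   # x = 12, two bytes
foo_addr = stack.push((7).to_bytes(2, "little"))  # hypothetical second local
print(x_addr, foo_addr, stack.sp)                 # 0 2 4
```

With the stack pointer at 4, foo sits at offset -2 from it and x at offset -4, which is the kind of fixed offset the compiler remembers for each local variable.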

Should there be a difference between an empty BSTR and a NULL BSTR?

When maintaining a COM interface should an empty BSTR be treated the same way as NULL?
In other words should these two function calls produce the same result?
// Empty BSTR
CComBSTR empty(L""); // Or SysAllocString(L"")
someObj->Foo(empty);
// NULL BSTR
someObj->Foo(NULL);
Yes - a NULL BSTR is the same as an empty one. I remember we had all sorts of bugs that were uncovered when we switched from VS6 to 2003 - the CComBSTR class had a change to the default constructor that allocated it using NULL rather than an empty string. Such bugs appear when you, for example, treat a BSTR as a regular C style string and pass it to some function like strlen, or try to initialise a std::string with it.
Eric Lippert discusses BSTRs in great detail in Eric's Complete Guide To BSTR Semantics:
Let me list the differences first and then discuss each point in excruciating detail.
A BSTR must have identical semantics for NULL and for "". A PWSZ frequently has different semantics for those.
A BSTR must be allocated and freed with the SysAlloc* family of functions. A PWSZ can be an automatic-storage buffer from the stack or allocated with malloc, new, LocalAlloc or any other memory allocator.
A BSTR is of fixed length. A PWSZ may be of any length, limited only by the amount of valid memory in its buffer.
A BSTR always points to the first valid character in the buffer. A PWSZ may be a pointer to the middle or end of a string buffer.
When allocating an n-byte BSTR you have room for n/2 wide characters. When you allocate n bytes for a PWSZ you can store n / 2 - 1 characters -- you have to leave room for the null.
A BSTR may contain any Unicode data including the zero character. A PWSZ never contains the zero character except as an end-of-string marker. Both a BSTR and a PWSZ always have a zero character after their last valid character, but in a BSTR a valid character may be a zero character.
A BSTR may actually contain an odd number of bytes -- it may be used for moving binary data around. A PWSZ is almost always an even number of bytes and used only for storing Unicode strings.
The easiest way to handle this dilemma is to use CComBSTR and check for .Length() to be zero. That works for both empty and NULL values.
However, keep in mind that an empty BSTR must be released or there will be a memory leak. I saw some of those recently in others' code. They are quite hard to find if you are not looking carefully.
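
The two rules that cause most of the trouble (NULL behaves like "", and a zero character can be valid data) can be illustrated with a small Python model. This is only an analogy, with None standing in for a NULL BSTR and made-up function names; it is not COM code.

```python
from typing import Optional


def bstr_len(value: Optional[str]) -> int:
    """Length in characters; None (standing in for NULL) counts as empty."""
    return 0 if value is None else len(value)


def c_strlen(value: str) -> int:
    """What a C-style scan would report: it stops at the first zero character."""
    return len(value.split("\0", 1)[0])


print(bstr_len(None), bstr_len(""))          # 0 0
print(bstr_len("a\0b"), c_strlen("a\0b"))    # 3 1
```

A callee written in this style never has to distinguish the two cases, which is the behaviour the answers above recommend.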