CIL hex code to call method in another assembly

For example, I'm writing code in assembly A, and the method I want to call is in assembly B at 0x06000DF2. This is the hex dnSpy creates for me: 6F8701000A, but I don't know how it is calculated. Please explain it to me. Thank you!

The first byte (6F) indicates that it is the callvirt instruction; the remaining 4 bytes are the metadata token for the method, in little-endian byte order:
callvirt 0x0A000187
The metadata token is a reference to a particular row in a particular table in the metadata of the current module (the module that contains the IL). The high-order byte indicates the type of token (and hence, which metadata table to look in), while the remaining 3 bytes indicate the row number within the table. 0x0A indicates that the target row is in the MemberRef table, and the referenced record will provide the details necessary to find the correct member.
The MemberRef table is described in ECMA-335 Partition II, section 22.25.
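For illustration, here is a minimal sketch in C (using the exact bytes from the question) of how those five bytes decode:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* The 5 bytes dnSpy emitted: one opcode byte + 4-byte little-endian token */
    const uint8_t il[] = { 0x6F, 0x87, 0x01, 0x00, 0x0A };

    uint32_t token = (uint32_t)il[1]
                   | (uint32_t)il[2] << 8
                   | (uint32_t)il[3] << 16
                   | (uint32_t)il[4] << 24;

    uint8_t  table = token >> 24;          /* 0x0A: the MemberRef table */
    uint32_t row   = token & 0xFFFFFF;     /* 0x000187: row 391 in that table */

    printf("callvirt 0x%08X (table 0x%02X, row %u)\n", token, table, row);
    return 0;
}

This prints the token 0x0A000187, matching dnSpy's disassembly above; the opcode value 0x6F for callvirt is listed in ECMA-335 Partition III.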

How to decide when to reflect or XOR CRC data?

I found multiple optimal CRC-32 polynomials on the CRC Polynomial Zoo site of Philip Koopman. Now I want to generate a CRC lookup table for one of the polynomials, using the pycrc software.
To generate a CRC lookup table you have to provide the following information for the chosen polynomial:
Reflected in (boolean)
Reflected out (boolean)
XOR in (hex value)
XOR out (hex value)
For some polynomials I found the above parameters in a specification (for instance an AUTOSAR specification for the polynomial "F4ACFB13"), but what parameters should I choose if there is no specification for a certain polynomial? The Koopman site doesn't seem to provide the recommended parameters to use.
I already tried to find an explanation of how to choose these parameters, but I could only find explanations of how to implement them, not how to choose them. Most websites recommend searching for specifications describing "common CRC polynomials", because they provide the optimal parameters.
Generally you are trying to match the CRC used in some existing protocol. In that case you need to do the same thing you did for the AUTOSAR CRC: find the specification for the CRC. Or you need to get several examples of messages and correct CRCs and try to reverse-engineer the CRC parameters.
You can find over a hundred CRC definitions here.
If you are creating your own protocol from scratch, then you can select any polynomial, reflection, initial value, and final exclusive-or you like, as well as any byte order of the CRC in the message. I would recommend that the polynomial be chosen with good properties for your message length from Phil's data, and that the initial value of the CRC register, init, not be zero. (If it is zero, then the CRC of any string of zeros will be the same value, that final exclusive-or, regardless of the length.) Also there is no detriment, and it is more aesthetic to pick the initial value and the final exclusive-or to be equal, so that the CRC of an empty sequence is zero.
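To make that advice concrete, here is a minimal sketch of a reflected, table-driven CRC-32 for Koopman's 0xF4ACFB13 polynomial, with the pycrc-style parameters spelled out. The init and xorout values are illustrative choices following the advice above (equal and non-zero); if I recall correctly they also match the AUTOSAR profile for this polynomial:

#include <stdint.h>
#include <stddef.h>

/* Reflect-in/reflect-out is implemented by using the bit-reversed
 * polynomial and shifting right (LSB-first). 0xC8DF352F is the
 * bit-reversed form of 0xF4ACFB13. */
#define POLY_REV 0xC8DF352Fu
#define INIT     0xFFFFFFFFu   /* "XOR in"  */
#define XOROUT   0xFFFFFFFFu   /* "XOR out" */

static uint32_t table[256];

/* Call once before the first checksum. */
static void make_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t crc = n;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? POLY_REV : 0);
        table[n] = crc;
    }
}

static uint32_t crc32_koopman(const uint8_t *data, size_t len)
{
    uint32_t crc = INIT;
    while (len--)
        crc = (crc >> 8) ^ table[(crc ^ *data++) & 0xFFu];
    return crc ^ XOROUT;
}

Because init and xorout are equal, the CRC of an empty sequence comes out as zero, as recommended above.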

Why does the trailer object report a previous value for the "Size" entry?

I'm trying to write code that investigates changes to a PDF document after signing (pointers welcome) and came across this strange issue.
I want to retrieve the number of objects in the PDF file as indexed in the xref tables. It seems that, while all other entries in the trailer dictionary are those of the final trailer, the number for Size is the one from the original trailer. In my particular case there have been 2 updates to the original document (adding 2 xref tables for a total of 3), adding objects up to the number 567, from the original 550.
This is how I get the Size from the trailer dictionary:
private static long getMaxObjId(PDDocument doc) {
    COSDocument cosdoc = doc.getDocument();
    COSDictionary trailer = cosdoc.getTrailer();
    long maxobj = trailer.getLong(COSName.SIZE);
    return maxobj;
}
I'm using PDFBox 2.0.21.
You are right. The Size entry in that trailer contains the lowest (i.e. usually the oldest) Size value of all trailers in the document while all other entries in that trailer contain the newest value of their respective keys.
And the cause for this is even worse than I originally thought: That trailer object you get is not simply the latest (or, considering the Size value, the earliest) trailer dictionary in the document, it is the union of all trailer dictionaries, starting with the earliest trailer in the Prev chain up to the newest one.
So far so good. But shouldn't this mean that all entries in that union trailer have the value from the newest trailer dictionary containing that key? That's what I thought until I saw the COSDictionary.addAll(COSDictionary) code used to create that union:
/**
 * This will add all of the dictionaries keys/values to this dictionary.
 * Only called when adding keys to a trailer that already exists.
 *
 * @param dic The dictionaries to get the keys from.
 */
public void addAll(COSDictionary dic)
{
    dic.forEach((key, value) ->
    {
        /*
         * If we're at a second trailer, we have a linearized pdf file, meaning that
         * the first Size entry represents all of the objects so we don't need to
         * grab the second.
         */
        if (!COSName.SIZE.equals(key) || !items.containsKey(COSName.SIZE))
        {
            setItem(key, value);
        }
    });
}
Here an existing Size entry is explicitly not replaced!
This explains the original observation that the Size entry in that trailer contains the lowest (i.e. usually the oldest) Size value of all trailers in the document while all other entries in that trailer contain the newest value of their respective keys.
The comments give rise to the assumption that this is a relic from the times when PDFBox by default parsed a PDF from the front, ignoring cross-reference tables, and the only relevant test PDFs were ones without normal incremental updates: merely ones without updates at all and ones with linearization, which uses the mechanisms defined for incremental updates in inverse order. Only in the case of such linearized documents might this exception make sense.
But why I consider this worse than originally thought: this addAll method is a public COSDictionary method which by its name parallels the Java Collections Framework addAll. Thus, it makes users think the first JavaDoc line, This will add all of the dictionaries keys/values to this dictionary, is true; so they'll use it for that task, never expecting that Size entries won't be replaced.
Indeed, even in the PDFBox code itself COSDictionary.addAll(COSDictionary) is used in contexts other than trailer unions, in spite of the second JavaDoc line, Only called when adding keys to a trailer that already exists.
This should be inspected and fixed. I created a Jira issue to that effect, PDFBOX-4999.

Erlang binary protocol serialization

I'm currently using Erlang for a big project, but I have a question regarding the proper way to proceed.
I receive bytes over a TCP socket. The bytes follow a fixed protocol; the sender is a Python client. The Python client uses class inheritance to create bytes from the objects.
Now I would like to (in Erlang) take the bytes and convert these to their equivalent messages; they all have a common message header.
How can I do this as generically as possible in Erlang?
Kind Regards,
Me
Pattern matching/binary header consumption using Erlang's binary syntax. But you will need to know either exactly what bytes or bits you are expecting to receive, or the field sizes in bytes or bits.
For example, let's say that you are expecting a string of bytes that will either begin with the equivalent of the ASCII strings "PUSH" or "PULL", followed by some other data you will place somewhere. You can create a function head that matches those, and captures the rest to pass on to a function that does "push()" or "pull()" based on the byte header:
operation_type(<<"PUSH", Rest/binary>>) -> push(Rest);
operation_type(<<"PULL", Rest/binary>>) -> pull(Rest).
The bytes after the first four will now be in Rest, leaving you free to interpret whatever subsequent headers or data remain in turn. You could also match on the whole binary:
operation_type(Bin = <<"PUSH", _/binary>>) -> push(Bin);
operation_type(Bin = <<"PULL", _/binary>>) -> pull(Bin).
In this case the "_" variable works like it always does -- you're just checking for the lead, essentially peeking the buffer and passing the whole thing on based on the initial contents.
You could also skip around in it. Say you knew you were going to receive a binary with 4 bytes of fluff at the front, 6 bytes of type data, and then the rest you want to pass on:
filter_thingy(<<_:4/binary, Type:6/binary, Rest/binary>>) ->
    %% Do stuff with Rest based on Type, e.g. dispatch to a handler:
    handle_type(Type, Rest).
It becomes very natural to split binaries in function headers (whether the data equates to character strings or not), letting the "Rest" fall through to appropriate functions as you go along. If you are receiving Python pickle data or something similar, you would want to write the parsing routine in a recursive way, so that the conclusion of each data type returns you to the top to determine the next type, with an accumulated tree that represents the data read so far.
I only covered 8-bit bytes above, but there is also a pure bitstring syntax, which lets you go as far into the weeds with bits and bytes as you need with the same ease of syntax. Matching is a real lifesaver here.
Hopefully this informed more than confused. Binary syntax in Erlang makes this the most pleasant binary parsing environment in a general programming language I've yet encountered.
http://www.erlang.org/doc/programming_examples/bit_syntax.html

Does the "C" code algorithm in RFC1071 work well on big-endian machine?

As described in RFC 1071, when the checksum is calculated over an odd count of bytes, the last byte should be padded on the right with a zero byte, so the final 16-bit word has the form [Z,0]. But in the "C" code algorithm, the leftover byte is simply added as-is:
if (count > 0)
    sum += * (unsigned char *) addr;
The above code does work on a little-endian machine, where fetching [Z,0] from memory yields Z, but I think there's a problem on a big-endian one, where fetching [Z,0] yields Z*256.
So I wonder whether the example "C" code in RFC 1071 only works on little-endian machines?
--------------- Added later ---------------
There's one more example described in RFC 1071, "breaking the sum into two groups". We can take the data addr[] = {0x00, 0x01, 0xf2} for example. Here, "standard" represents the situation described in formula [2] (zero-padding the odd byte), while "C-code" represents the C code algorithm:

standard, either endianness: 0x0001 + 0xf200 = 0xf201
C-code, little-endian:       0x0100 + 0x00f2 = 0x01f2 (bytes f2 01 in memory, i.e. f201 in network order after the store)
C-code, big-endian:          0x0001 + 0x00f2 = 0x00f3

As we can see, in the "standard" situation the final sum is f201 regardless of endianness, since there is no endianness issue with the abstract form [Z,0] after the swap. But it matters in the "C-code" situation, because f2 always ends up in the low byte, whether on big-endian or little-endian.
Thus, the checksum differs for the same data (addr and count) depending on endianness.
I think you're right. The code in the RFC adds the last byte in as low-order, regardless of whether it is on a little-endian or big-endian machine.
In these examples of code on the web, we see that they have taken special care with the last byte:
https://github.com/sjaeckel/wireshark/blob/master/epan/in_cksum.c
and in
http://www.opensource.apple.com/source/tcpdump/tcpdump-23/tcpdump/print-ip.c
it does this:
if (nleft == 1)
    sum += htons(*(u_char *)w << 8);
Which means that this text in the RFC is incorrect:
Therefore, the sum may be calculated in exactly the same way
regardless of the byte order ("big-endian" or "little-endian")
of the underlaying hardware. For example, assume a "little-
endian" machine summing data that is stored in memory in network
("big-endian") order. Fetching each 16-bit word will swap
bytes, resulting in the sum; however, storing the result
back into memory will swap the sum back into network byte order.
The following code in place of the original odd byte handling is portable (i.e. will work on both big- and little-endian machines), and doesn't depend on an external function:
if (count > 0)
{
    char buf2[2] = {*addr, 0};
    sum += *(unsigned short *)buf2;
}
(Assumes addr is char * or const char *).
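Putting it together, a minimal sketch of the full RFC 1071 checksum with portable odd-byte handling might look like this (the function name is illustrative; memcpy is used instead of pointer casts to sidestep alignment and aliasing concerns):

#include <stdint.h>
#include <string.h>
#include <stddef.h>

uint16_t rfc1071_checksum(const void *data, size_t count)
{
    const uint8_t *addr = data;
    uint32_t sum = 0;
    uint16_t word;

    while (count > 1) {               /* sum 16-bit words in native order */
        memcpy(&word, addr, 2);
        sum += word;
        addr += 2;
        count -= 2;
    }
    if (count > 0) {                  /* odd trailing byte, zero-padded on
                                         the right as the RFC's formula requires,
                                         then fetched like every other word */
        uint8_t buf2[2] = { *addr, 0 };
        memcpy(&word, buf2, 2);
        sum += word;
    }
    while (sum >> 16)                 /* fold the carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}

The result is in the machine's native byte order; as the quoted RFC text explains, storing it straight back into the message buffer swaps it into network byte order on a little-endian machine.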

How are the digits in ObjC method type encoding calculated?

This is a follow-up to my previous question:
What are the digits in an ObjC method type encoding string?
Say there is an encoding:
v24#0:4:8#12B16#20
How are those numbers calculated? B is a char so it should occupy just 1 byte (not 4 bytes). Does it have something to do with "alignment"? What is the size of void?
Is it correct to calculate the numbers as follows? Ask sizeof on every item and round up the result to multiple of 4? And the first number becomes the sum of all the other ones?
The numbers were used in the m68K days to denote stack layout. That is, you could literally decode the method signature and, for just about all types, know exactly which bytes at what offset within the stack frame you could diddle to get/set arguments.
This worked because the m68K's ABI was entirely [IIRC -- been a long long time] stack-based argument/return passing. There wasn't anything shoved into registers across call boundaries.
However, as Objective-C was ported to other platforms, always-on-the-stack was no longer the calling convention. Arguments and return values are often passed in registers.
Thus, those offsets are now useless. As well, the type encoding used by the compiler is no longer complete (because it never was terribly useful) and there will be types that won't be encoded. Not to mention that encoding some C++ templatized types yields method type encoding strings that can be many kilobytes in size (I think the record I ran into was around 30K of type information).
So, no, it isn't correct to use sizeof() to generate the numbers because they are effectively meaningless to everything. The only reason why they still exist is for binary compatibility; there are bits of esoteric code here and there that still parse the type encoding string with the expectation that there will be random numbers sprinkled here and there.
Note that there are vestiges of API in the ObjC runtime that still lead one to believe that it might be possible to encode/decode stack frames on the fly. It really isn't as the C ABI doesn't guarantee that argument registers will be preserved across call boundaries in the face of optimization. You'd have to drop to assembly and things get ugly really really fast (>shudder<).
The full encoding string is constructed (in clang) by the method ASTContext::getObjCEncodingForMethodDecl, which you can find in lib/AST/ASTContext.cpp.
The method that does the size rounding is ASTContext::getObjCEncodingTypeSize, in the same file. It forces each size to be at least the size of an int. On all of Apple's current platforms, an int is 4 bytes.
The stack frame size and argument offsets are calculated by the compiler. I'm actually trying to track this down in the Clang source myself this week; it possibly has something to do with CodeGenTypes::arrangeObjCMessageSendSignature. (Looks like Rob just made my life a lot easier!)
The first number is the sum of the others, yes -- it's the total space occupied by the arguments. To get the size of the type represented by an ObjC type encoding in your code, you should use NSGetSizeAndAlignment().
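For the example encoding, each argument size is rounded up to at least sizeof(int) = 4, so # (an object), : (a SEL), and B (a BOOL) all occupy 4 bytes, giving offsets 0, 4, 8, 12, 16, 20 and the leading total of 24. A minimal sketch in plain C that splits the string into (type, offset) pairs (it only handles single-character type codes, which is enough here; real encodings can contain nested struct and pointer types):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char enc[] = "v24#0:4:8#12B16#20";
    char *p = enc;

    /* Return type code, then the total argument frame size */
    char ret = *p++;
    long frame = strtol(p, &p, 10);
    printf("return '%c', frame size %ld\n", ret, frame);

    /* Remaining pairs: one type code followed by its byte offset */
    while (*p) {
        char type = *p++;
        long offset = strtol(p, &p, 10);
        printf("  arg '%c' at offset %ld\n", type, offset);
    }
    return 0;
}

The first two pairs it prints are the implicit self (#, at offset 0) and _cmd (:, at offset 4) arguments that every ObjC method receives.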