Convert MethodBody to Expression Tree - .net-4.0

Is there a way to convert a MethodBody (or other Reflection technique) into a System.Linq.Expressions.Expression tree?

It is indeed possible; see DelegateDecompiler:
https://github.com/hazzik/DelegateDecompiler
NOTE: I am not affiliated with this project
Edit
Here is the basic approach that the project takes:
Get MethodInfo for the method you want to convert
Use methodInfo.GetMethodBody to get a MethodBody object. This contains, among other things, the MSIL and information about arguments and locals
Go through the instructions, examine the opcodes, and build the appropriate Expressions
Tie it all together and return an optimized Expression
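To make the first two steps concrete, here is a minimal sketch (my own, not code taken from DelegateDecompiler) that pulls out the raw IL bytes and local-variable metadata of a made-up sample method:
using System;
using System.Reflection;

static class IlInspection
{
    // Hypothetical sample method to inspect.
    static int AddOne(int x)
    {
        int result = x + 1;
        return result;
    }

    static void Main()
    {
        MethodInfo method = typeof(IlInspection).GetMethod(
            "AddOne", BindingFlags.NonPublic | BindingFlags.Static);

        // Step 2: MethodBody exposes the raw MSIL plus local-variable metadata.
        MethodBody body = method.GetMethodBody();
        byte[] il = body.GetILAsByteArray();

        Console.WriteLine("IL size: " + il.Length + " bytes");
        foreach (LocalVariableInfo local in body.LocalVariables)
            Console.WriteLine("local " + local.LocalIndex + ": " + local.LocalType);
    }
}
Step 3, walking those bytes opcode by opcode and mapping the stack operations onto Expression nodes, is where all the real work is.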
Here is a code snippet from the project that decompiles a method body:
public class MethodBodyDecompiler
{
    readonly IList<Address> args;
    readonly VariableInfo[] locals;
    readonly MethodInfo method;

    public MethodBodyDecompiler(MethodInfo method)
    {
        this.method = method;

        // Map the parameters (plus "this" for instance methods) to ParameterExpressions.
        var parameters = method.GetParameters();
        if (method.IsStatic)
            args = parameters
                .Select(p => (Address) Expression.Parameter(p.ParameterType, p.Name))
                .ToList();
        else
            args = new[] {(Address) Expression.Parameter(method.DeclaringType, "this")}
                .Union(parameters.Select(p => (Address) Expression.Parameter(p.ParameterType, p.Name)))
                .ToList();

        // Mirror the method body's local variables.
        var body = method.GetMethodBody();
        var addresses = new VariableInfo[body.LocalVariables.Count];
        for (int i = 0; i < addresses.Length; i++)
        {
            addresses[i] = new VariableInfo(body.LocalVariables[i].LocalType);
        }
        locals = addresses.ToArray();
    }

    public LambdaExpression Decompile()
    {
        // Walk the IL instructions and build an expression for the whole method body.
        var instructions = method.GetInstructions();
        var ex = Processor.Process(locals, args, instructions.First(), method.ReturnType);
        return Expression.Lambda(new OptimizeExpressionVisitor().Visit(ex), args.Select(x => (ParameterExpression) x.Expression));
    }
}

No, there isn't.
You're basically asking for a somewhat simpler version of Reflector.

Yes, it is possible... but it hasn't been done yet, as far as I know.
If anyone does know of a library that de-compiles methods to expression trees, please let me know, or edit the above statement.
The most difficult part is writing a CIL decompiler: you would need to translate the fairly low-level CIL instructions (which conceptually target a stack machine) into much higher-level expressions.
Tools such as Redgate's Reflector or Telerik's JustDecompile do just that, but instead of building expression trees, they display source code; you could say they go one step further, since expression trees are basically still language-agnostic.
Some notable cases where this would get especially tricky:
You would have to deal with CIL instructions for which no pre-defined Expression tree node exists, say tail.call or cpblk (I'm guessing a little here). That is, you'd have to create custom expression tree node types. Compiling them back into an executable method when you .Compile() the expression tree might then be an issue, because the expression tree compiler can only handle custom nodes that reduce to standard nodes. If such a reduction is not possible, you cannot compile the expression tree any more; you can only inspect it.
Would you try to recognise certain high-level constructs, such as a C# using block, and try to build a (custom) expression tree node for it? Remember that C# using breaks down to the equivalent of try…finally { someObj.Dispose(); } during compilation, so that is what you might see instead of using if you reflected over the method body's CIL instructions and exception handling clauses.
Thus, in general, expect that you need to be able to "recognise" certain code patterns and summarise them into a higher-level concept.
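To illustrate the using example: the try…finally shape that the compiler emits is itself expressible with standard nodes, and recognising it is the decompiler's job. Here is a hand-built sketch (Connection is a made-up IDisposable used only for this illustration) of roughly what "using (var c = new Connection()) c.DoWork();" corresponds to as an expression tree:
using System;
using System.Linq.Expressions;

class UsingLowering
{
    static void Main()
    {
        var conn = Expression.Variable(typeof(Connection), "c");

        var body = Expression.Block(
            new[] { conn },
            Expression.Assign(conn, Expression.New(typeof(Connection))),
            Expression.TryFinally(
                Expression.Call(conn, typeof(Connection).GetMethod("DoWork")),
                // finally { if (c != null) c.Dispose(); }
                Expression.IfThen(
                    Expression.NotEqual(conn, Expression.Constant(null, typeof(Connection))),
                    Expression.Call(conn, typeof(IDisposable).GetMethod("Dispose")))));

        Expression.Lambda<Action>(body).Compile()();
    }
}

class Connection : IDisposable
{
    public void DoWork() { Console.WriteLine("working"); }
    public void Dispose() { Console.WriteLine("disposed"); }
}
Going the other way, from the raw try/finally/Dispose pattern in CIL back to something resembling using, is exactly the kind of pattern recognition described above.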

Related

MicrosoftAjaxMinifier doesn't seem to remove "unreachable code"

I'm using this with BundleTransformer from NuGet and System.Web.Optimization in an ASP.NET app. According to various docs this minifier is supposed to "remove unreachable code". I know it's not as aggressive as Google Closure (which I can't use at present), but I can't get even the simplest cases to work, e.g.:
function foo() {
}
where foo isn't called from anywhere. I can appreciate the argument that this might be an exported function, but I can't see a way to mark that distinction. All my JS code is concatenated, so the minifier could tell for sure whether the function is needed, if only I could find the right switches.
The only way I've found to omit unnecessary code is to use the debugLookupList property in the web.config for BundleTransformer but that seems like a sledgehammer to crack a nut. It's not very granular.
Does anyone have an example of how to write so-called 'unreachable code' that this minifier will recognise?
Here's a place to test online
I doubt the minifier has any way of knowing if a globally defined function can be removed safely (as it doesn't know the full scope). On the other hand it might not remove any unused functions and might only be interested in unreachable code (i.e. code after a return).
Using the JavaScript Module Pattern, your unused private functions would most likely get hoovered up correctly (although I've not tested this). In the example below, the minifier should only be confident about removing the function called privateFunction. Whether it considers unused functions as unreachable code is another matter.
var AmazingModule = (function() {
    var module = {};

    function privateFunction() {
        // ..
    }

    module.otherFunction = function() {
        // ..
    };

    return module;
}());

function anotherFunction() {
    // ..
}

Serialization of ANTLR ParseTree

I have a generated grammar that does two things:
Check the syntax of a domain specific language
Evaluate input against that domain specific language
These two functions are separate; let's call them validate() and evaluate().
The validate() function builds the tree from a String input while ensuring it meets the requirements of the BNF for the language. The evaluate() function plugs in values to that tree to get a result (usually true or false).
What the code is currently doing is running validate() each time on the input, just to generate the tree that evaluate() uses. Some of the inputs take up to 60 seconds to be checked. What I would LIKE to do is serialize the results of validate() (assuming it meets the syntax requirements), store the serialized form in the backend database, and just load it from the database as part of evaluate().
I noticed that I can execute the method toStringTree() on the parse tree, and retrieve a LISP style tree. However, can I restore a LISP style tree to an ANTLR parse tree? If not, can anyone recommend another way to serialize and store the generated parse tree?
Thanks for any help.
Jason
ANTLR 4's ParserRuleContext data structure (the specific implementation of ParseTree used by generated parsers to represent grammar rules in the parse tree) is not serializable by default. Open issue #233 on the project issue tracker covers the feature request. However, based on my experience with many applications using ANTLR for parsing, I'm not convinced serializing the parse trees would be useful in the long run. For each problem serializing the parse tree is meant to address, a better solution already exists.
Another option is to store a hash of the last known valid file in the database. After you use the parser to create a parse tree, you could skip the validation step if the input file has the same hash as the last time it was validated. This leverages two aspects of ANTLR 4:
For the same input file, running the parser twice will produce the same parse tree.
The ANTLR 4 parser is extremely fast in almost all cases (e.g. the Java grammar can process around 20MB of source per second). The remaining cases tend to be caused by poorly structured grammar rules that the new parser interpreter feature in ANTLRWorks 2.2 can analyze and make suggestions for improvement.
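The hash check itself is independent of the ANTLR target language; here is a rough sketch of the idea (all names are made up, and C# is used purely for illustration):
using System;
using System.Security.Cryptography;
using System.Text;

class ValidationCache
{
    // In the real application this value would live in the backend database.
    string lastValidHash;

    public bool NeedsValidation(string input)
    {
        // Same hash as the last successfully validated input: skip validate().
        return ComputeHash(input) != lastValidHash;
    }

    public void MarkValid(string input)
    {
        lastValidHash = ComputeHash(input);
    }

    static string ComputeHash(string input)
    {
        using (var sha = SHA256.Create())
            return Convert.ToBase64String(sha.ComputeHash(Encoding.UTF8.GetBytes(input)));
    }
}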
If you need performance beyond what you get with this, then a parse tree isn't the data structure you should be using. StringTemplate 4's enormous performance advantage over StringTemplate 3 came primarily from the fact that the interpreter switched from using ASTs (equivalent to parse trees for this reasoning) to a linear bytecode representation/interpreter. The ASTs for ST4 would never need to be serialized for performance reasons because the bytecode would be serialized instead. In fact, the C# port of StringTemplate 4 provides exactly this feature.
If the input data to your grammar is made of several independent blocks, you could try to store the string of each block separately, and run the parsing process again for each block independently, using a ThreadPool for example.
Say for example your input data is a set of method declarations:
int add(int a, int b) {
    return a+b;
}

int mul(int a, int b) {
    return a*b;
}
...
and the grammar is something like:
methodList : methodDeclaration methodList
|
;
methodDeclaration : // your method declaration rules...
The first run of the parser just collects the text of each method and stores it. The parser starts the process at the methodList rule.
void visitMethodList(MethodListContext ctx) {
    if (ctx.methodDeclaration() != null) {
        String methodStr = formatParseTree(ctx.methodDeclaration(), " ");
        // store methodStr for later parsing
    }
    // visit next method list item, if any
    if (ctx.methodList() != null) {
        visit(ctx.methodList());
    }
}
The second run launches the parsing of each method declaration (in a separate thread, for example). For this, the parser starts at the methodDeclaration rule.
void visitMethodDeclaration(MethodDeclarationContext ctx) {
    // parse the method block
}
The reason the text of a methodDeclaration rule is formatted is that calling ctx.methodDeclaration().getText() directly would concatenate the text of all child nodes without any separator (see the ANTLR documentation), possibly making it unusable for parsing again. If whitespace is a token separator in the grammar, then adding one space between tokens should not change the parse tree.
String formatParseTree(ParseTree tree, String separator) {
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < tree.getChildCount(); i++) {
        ParseTree child = tree.getChild(i);
        if (child instanceof TerminalNode) {
            builder.append(child.getText());
            builder.append(separator);
        } else if (child instanceof RuleContext) {
            builder.append(formatParseTree(child, separator));
        }
    }
    return builder.toString();
}

Will code written in this style be optimized out by RVO in C++11?

I grew up in the days when passing around structures was bad mojo because they are often large, so pointers were always the way to go. Now that C++11 has quite good RVO (right value optimization), I'm wondering if code like the following will be efficient.
As you can see, my class has a bunch of vector structures (not pointers to them). The constructor accepts value structures and stores them away.
My -hope- is that the compiler will use move semantics so that there really is no copying of data going on; the constructor will (when possible) just assume ownership of the values passed in.
Does anyone know if this is true, and happens automagically, or do I need a move constructor with the && syntax and so on?
// ParticleVertex
//
// Class that represents the particle vertices
class ParticleVertex : public Vertex
{
public:
    D3DXVECTOR4 _vertexPosition;
    D3DXVECTOR2 _vertexTextureCoordinate;
    D3DXVECTOR3 _vertexDirection;
    D3DXVECTOR3 _vertexColorMultipler;

    ParticleVertex(D3DXVECTOR4 vertexPosition,
                   D3DXVECTOR2 vertexTextureCoordinate,
                   D3DXVECTOR3 vertexDirection,
                   D3DXVECTOR3 vertexColorMultipler)
    {
        _vertexPosition = vertexPosition;
        _vertexTextureCoordinate = vertexTextureCoordinate;
        _vertexDirection = vertexDirection;
        _vertexColorMultipler = vertexColorMultipler;
    }

    virtual const D3DVERTEXELEMENT9 * GetVertexDeclaration() const
    {
        return particleVertexDeclarations;
    }
};
Yes, indeed you should trust the compiler to optimally "move" the structures:
Want Speed? Pass By Value
Guideline: Don’t copy your function arguments. Instead, pass them by value and let the compiler do the copying
In this case, you'd move the arguments into the constructor call:
ParticleVertex myPV(std::move(pos),
                    std::move(textureCoordinate),
                    std::move(direction),
                    std::move(colorMultipler));
In many contexts, the std::move will be implicit, e.g.
D3DXVECTOR4 getFooPosition() {
    D3DXVECTOR4 result;
    // bla
    return result; // NRVO, std::move only required with MSVC
}

ParticleVertex myPV(getFooPosition(),  // implicit rvalue: the returned temporary is moved
                    /* ... */);
RVO means Return Value Optimization, not "right value optimization".
RVO is an optimization performed by the compiler when a function returns by value and it is clear that the code returns a temporary object created in the body, so the copy can be elided: the function constructs the returned object directly in place.
What C++11 introduces is move semantics. Move semantics allows us to "move" the resource from a temporary into a target object.
But a move implies that the object the resource comes from is left in an unusable state afterwards. That is not (I think) what you want in your class, because the vertex data is used by the class whether or not the user calls this function.
So use the usual return by const reference to avoid copies.
On the other hand, DirectX provides handles to the resources (pointers), not the real resources. Pointers are basic types and copying them is cheap, so don't worry about performance. In your case you are using 2D/3D vectors; copying those is cheap too.
Personally, I think that returning a pointer to an internal resource is always a very bad idea. In this case, the best approach is to return by const reference.

std::unique_ptr and pointer-to-pointer

I've been using std::unique_ptr to store some COM resources, with a custom deleter function. However, many of the COM functions want a pointer-to-pointer. Right now I'm using _Myptr, an implementation detail of my compiler's unique_ptr. Is accessing this data member directly going to break unique_ptr, or should I store a gajillion temporary pointers to construct unique_ptr rvalues from?
COM objects are reference-countable by their nature, so you shouldn't use anything except reference-counting smart pointers like ATL::CComPtr or _com_ptr_t even if it seems inappropriate for your usecase (I fully understand your concerns, I just think you assign too much weight to them). Both classes are designed to be used in all valid scenarios that arise when COM objects are used, including obtaining the pointer-to-pointer. Yes, that's a bit too much functionality, but if you don't expect any specific negative consequences you can't tolerate you should just use those classes - they are designed exactly for this purpose.
I've had to tackle the same problem not too long ago, and I came up with two different solutions:
The first was a simple wrapper that encapsulated a 'writeable' pointer and could be std::moved into my smart pointer. This is just a little more convenient than using the temp pointers you mention, since you cannot define the type directly at the call site.
Therefore, I didn't stick with that. What I did instead was a retrieve helper function that invokes the COM function and returns my smart pointer (doing all the temporary-pointer work internally). This trivially works with free functions that only have a single T** parameter. If you want to use it on something more complex, you can just pass in the call via std::bind and leave only the pointer-to-be-returned free.
I know that this is not directly what you're asking, but I think it's a neat solution to the problem you're having.
As a side note, I'd prefer boost's intrusive_ptr instead of std::unique_ptr, but that's a matter of taste, as always.
Edit: Here's some sample code, transferred from my version that uses boost::intrusive_ptr (so it might not work out-of-the-box with unique_ptr):
template <class T, class PtrType, class PtrDel>
HRESULT retrieve(T func, std::unique_ptr<PtrType, PtrDel>& ptr)
{
    PtrType* raw_ptr = nullptr;
    HRESULT result = func(&raw_ptr);   // the COM call fills in the raw pointer
    ptr.reset(raw_ptr);                // hand ownership to the smart pointer
    return result;
}
For example, it can be used like this:
std::unique_ptr<IFileDialog, ComDeleter> FileDialog;
/*...*/
using std::bind;
using namespace std::placeholders;

std::unique_ptr<IShellItem, ComDeleter> ShellItem;
// Bind the raw interface pointer; bind would try to copy the unique_ptr otherwise.
HRESULT status = retrieve(bind(&IFileDialog::GetResult, FileDialog.get(), _1), ShellItem);
For bonus points, you can even let retrieve return the unique_ptr instead of taking it by reference. The functor that bind generates should have signature typedefs to derive the pointer type. You can then throw an exception if you get a bad HRESULT.
C++0x smart pointers have a portable way to get at the raw pointer, .get(), or to release ownership of it entirely, .release(). You could also always use &(*ptr), but that is less idiomatic.
If you want to use smart pointers to manage the lifetime of an object, but still need raw pointers to use a library which doesn't support smart pointers (including the standard C library), you can use those functions to get at the raw pointers most conveniently.
Remember, you still need to keep the smart pointer around for the duration you want the object to live (so be aware of its lifetime).
Something like:
call_com_function( &my_uniq_ptr.get() ); // will work fine
return &my_localscope_uniq_ptr.get(); // will not
return &my_member_uniq_ptr.get(); // might, if *this will be around for the duration, etc..
Note: this is just a general answer to your question. How to best use COM is a separate issue and sharptooth may very well be correct.
Use a helper function like this.
template< class T >
T*& getPointerRef( std::unique_ptr<T>& ptr )
{
    // Relies on the _Mybase/_Myptr internals of MSVC's unique_ptr; not portable.
    struct Twin : public std::unique_ptr<T>::_Mybase {};
    Twin* twin = (Twin*)( &ptr );
    return twin->_Myptr;
}
Check the implementation:
int wmain( int argc, wchar_t* argv[] )
{
    std::unique_ptr<char> charPtr( new char[25] );
    delete[] getPointerRef(charPtr);   // free the array through the exposed reference
    getPointerRef(charPtr) = 0;        // null it so unique_ptr won't double-delete
    return charPtr.get() != 0;
}

Lambdas with captured variables

Consider the following code:
private void DoThis() {
    int i = 5;
    var repo = new ReportsRepository<RptCriteriaHint>();
    // This does NOT work
    var query1 = repo.Find(x => x.CriteriaTypeID == i).ToList<RptCriteriaHint>();
    // This DOES work
    var query2 = repo.Find(x => x.CriteriaTypeID == 5).ToList<RptCriteriaHint>();
}
So when I hardwire an actual number into the lambda function, it works fine. When I use a captured variable in the expression, it comes back with the following error:
No mapping exists from object type ReportBuilder.Reporter+<>c__DisplayClass0 to a known managed provider native type.
Why? How can I fix it?
Technically, the correct way to fix this is for the framework that accepts the expression tree from your lambda to evaluate the i reference; in other words, it's a limitation of the specific LINQ provider. What it is currently trying to do is interpret i as a member access on some type known to it (the provider) from the database. Because of the way lambda variable capture works, the i local variable is actually a field on a hidden compiler-generated class, the one with the funny name, which the provider doesn't recognize.
So, it's a framework problem.
If you really must get by, you could construct the expression manually, like this:
ParameterExpression x = Expression.Parameter(typeof(RptCriteriaHint), "x");
var query = repo.Find(
    Expression.Lambda<Func<RptCriteriaHint, bool>>(
        Expression.Equal(
            Expression.MakeMemberAccess(
                x,
                typeof(RptCriteriaHint).GetProperty("CriteriaTypeID")),
            Expression.Constant(i)),
        x)).ToList();
... but that's just masochism.
Your comment on this entry prompts me to explain further.
Lambdas are convertible into one of two types: a delegate with the correct signature, or an Expression<TDelegate> of the correct signature. LINQ to external databases (as opposed to any kind of in-memory query) works using the second kind of conversion.
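For example (a trivial illustration, not taken from the question's code):
using System;
using System.Linq.Expressions;

class ConversionDemo
{
    static void Main()
    {
        // The same lambda text can be compiled to either form:
        Func<int, bool> asDelegate = x => x > 0;          // compiled straight to IL
        Expression<Func<int, bool>> asTree = x => x > 0;  // compiled to an object model

        Console.WriteLine(asDelegate(5)); // True
        Console.WriteLine(asTree);        // x => (x > 0)
    }
}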
Roughly speaking, the compiler converts lambda expressions into expression trees as follows:
The syntax tree is parsed by the compiler - this happens for all code.
The syntax tree is rewritten after taking into account variable capture. Capturing variables is just like in a normal delegate or lambda - so display classes get created, and captured locals get moved into them (this is the same behaviour as variable capture in C# 2.0 anonymous delegates).
The new syntax tree is converted into a series of calls to the Expression class so that, at runtime, an object tree is created that faithfully represents the parsed text.
LINQ to external data sources is supposed to take this expression tree and interpret it for its semantic content, and interpret symbolic expressions inside the tree as either referring to things specific to its context (e.g. columns in the DB), or immediate values to convert. Usually, System.Reflection is used to look for framework-specific attributes to guide this conversion.
However, it looks like SubSonic is not properly treating symbolic references that it cannot find domain-specific correspondences for; rather than evaluating the symbolic references, it's just punting. Thus, it's a SubSonic problem.
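As a client-side workaround (a sketch of the general idea, not something SubSonic itself provides), you can fold the captured locals into constants yourself before the provider ever sees the tree, for instance with a small ExpressionVisitor:
using System;
using System.Linq.Expressions;

// Minimal sketch: replaces member accesses whose target is a compiler-generated
// closure instance (which appears in the tree as a ConstantExpression) with the
// member's current value, so the provider only ever sees literal constants.
class CaptureFoldingVisitor : ExpressionVisitor
{
    protected override Expression VisitMember(MemberExpression node)
    {
        if (node.Expression is ConstantExpression)
        {
            object value = Expression.Lambda(node).Compile().DynamicInvoke();
            return Expression.Constant(value, node.Type);
        }
        return base.VisitMember(node);
    }
}

class Demo
{
    static void Main()
    {
        int i = 5;
        Expression<Func<int, bool>> original = x => x == i; // captures i via a display class
        var folded = (Expression<Func<int, bool>>)new CaptureFoldingVisitor().Visit(original);

        Console.WriteLine(original); // something like: x => (x == value(Demo+<>c__DisplayClass0).i)
        Console.WriteLine(folded);   // x => (x == 5)
    }
}
Mature providers perform this kind of local evaluation internally, which is why the same lambda works against them; the visitor only papers over a provider that doesn't.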