Debugging an obfuscated .NET core application with DotPeek - asp.net-core

I am hunting for a possible logic bomb in the code deployed to production by our vendor software factory.
For the sake of the curious reader, here is a brief recap. At some point the application stopped working with an infinite wait. Decompiling the obfuscated code, I found an odd Thread.Sleep that should never be in an MVC API, where the sleep amount is computed as the difference between the current ticks and a value computed somehow. I.e.:
private long SomeFunction(long param) {
    if (param > 0)
        Thread.Sleep(param);
    return param;
}

private long GetSomeLongValue() {
    // Simplified. There is a lot of long-to-string conversion and back
    return SomeFunction(Manipulate(DateTime.Now.Ticks - GetMysteryNumber()));
}

private long Manipulate(long param) {
    if (param < 0)
        return param;
    // else: compute a random number of days between 0 and param / 86400000,
    // and return its milliseconds value, always positive
}
And by running experiments with the system clock, I found that there is a magic DateTime.Now value before which the application works and right after which (one second later) it stops. The experiment was consistent and repeatable.
Back to the question
I have done all this work using JetBrains DotPeek. This was done by looking at the code: human static analysis.
The problem is that what I have called GetMysteryNumber above is so well obfuscated that I really can't get any clue about what it does. I have the full code, but I would like to take another approach.
I'd like to exercise that function and see whether it returns consistent values that may equal the guilty timestamp. The function depends on the result of the GetCallingAssembly method, so that will be a pain in the back.
I thought about running some sort of Program.cs or unit test that exercises the obfuscated function by reflection (roughly like the sketch after this list), but I'd like to debug using DotPeek. Why?
Disassembly can be a mess
I tried Telerik's decompiler, but I had a lot more success with DotPeek, which decompiles async methods without leaving them in their state-machine representation
I have never done this in my work experience. I just need to be sure about this being intentional or not.
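For reference, a rough sketch of the reflection harness idea; the assembly path, the type name, and the GetMysteryNumber name (taken from the simplified listing above) are placeholders for the real, mangled identifiers:

using System;
using System.Reflection;

class Harness
{
    static void Main()
    {
        // Placeholders: the real assembly path and member names are obfuscated.
        var asm = Assembly.LoadFrom(@"C:\deploy\Vendor.Api.dll");
        var type = asm.GetType("Some.Obfuscated.Type");
        var method = type.GetMethod("GetMysteryNumber",
            BindingFlags.Instance | BindingFlags.NonPublic);
        var instance = Activator.CreateInstance(type, nonPublic: true);

        // Caveat from above: the function depends on Assembly.GetCallingAssembly(),
        // and under reflection the calling assembly is this harness (or the runtime),
        // not the vendor code, so the result may differ from production.
        Console.WriteLine(method.Invoke(instance, null));
    }
}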
How do I set up a test bed environment so that I can debug into a linked DLL decompiled by DotPeek?

The post In the Jungle of .NET Decompilers compares the .NET decompilers that are worth using.
Definitely the free and open-source tool dnSpy is the one you want for that sort of hacking scenario ("I'd like to exercise that function").

Related

Scripting Language with "edit and continue" or "hot swap" support? (Maybe possible in Lua?)

I am making my existing .Net application scriptable for non-programming users. I added Lua and it works like a charm. Then I added debug functionality (pause/continue/step) via debug.sethook. It also works like a charm.
Now I realize that my application needs an edit-and-continue feature like Visual Studio has: you pause the execution, edit the code, and then continue from the current state with the changes applied. This feature is very important to me. I thought this would be easy to do for scripting languages.
Everywhere I read that scripting languages can do this, but even after hours of searching I haven't found a Lua implementation yet. It doesn't have to be Lua, but hot-swapping code in Lua would be my first choice.
How can I offer the user the ability to pause and edit the script and then continue the execution with the changes applied?
NOTE: It doesn't have to be Lua; any scripting language would be okay.
Update
@Schollii
Here is an example:
function doOnX()
    if getValue() == "200" then
        value = getCalculation()
        doSomething() -- many function calls, each can take about 2s
        doSomething()
        doSomething()
        print(value)
        doX(value)
    end
end

doOnX()
Thank you for your suggestions. This is how it might work:
I will use https://github.com/frabert/NetLua. It's a very cool, well-written, 100% C# Lua interpreter. It generates an AST tree first and then executes it directly.
The parser needs to be modified. In Parser.cs, public Ast.Block ParseString(string Chunk) first generates a parse tree; parseTree.tokens[i].locations contains the exact position of each token. The Irony.Parsing.ParseTree is then walked again and converted to a NetLua.Ast.Block, but the location information is lost. I will need to change that, so that later I know which statement is on which line.
Each statement of the AST tree is then executed directly via EvalBlock. Debug functionality (like I have in my C-binding Lua interpreter DynamicLua via debug.sethook) needs to be added. This can be done in LuaInterpreter.cs, in internal static LuaArguments EvalBlock(...). Pause/continue/step functions should be no problem. I can now also add current-line highlighting, because each statement carries line position information.
When the execution is paused and the code has been edited, the current LuaContext is saved. It contains all variables. The last executed statement, with its line number, is saved as well.
Now the code string is parsed again into a new AST tree and executed, but all statements are skipped until the saved statement at the saved line is reached. Then the saved LuaContext is restored, and execution continues with all changes applied.
New variables could also be added after the last executed line, because a new NetLua.Ast.Assignment statement would just add a new variable to the current LuaContext, and everything should work fine.
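A minimal sketch of that replay loop in C#; Statement, LuaContext, and EvalStatement are placeholders standing in for the corresponding NetLua types and methods, not its actual API:

using System.Collections.Generic;

// Placeholder types standing in for NetLua's AST and interpreter classes.
record Statement(int Line);
class LuaContext { /* variable bindings */ }

static class Replay
{
    static void EvalStatement(Statement stmt, LuaContext ctx)
    {
        // ...interpreter dispatch would go here...
    }

    // Re-run a freshly parsed AST: skip everything up to and including the
    // last executed line, then restore the saved context and continue.
    public static void Resume(IEnumerable<Statement> newAst, LuaContext savedContext, int lastExecutedLine)
    {
        bool replaying = true;
        var context = new LuaContext();      // throwaway context while skipping
        foreach (var stmt in newAst)
        {
            if (replaying && stmt.Line <= lastExecutedLine)
                continue;                    // already executed before the pause
            if (replaying)
            {
                context = savedContext;      // resume point: restore all variables
                replaying = false;
            }
            EvalStatement(stmt, context);
        }
    }
}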
Will this work?
I think this is quite challenging and tricky to do right.
Probably the only way you could do this is to recompile the chunk of code completely. For a function this would mean the whole function, regardless of where in it the edit is, and then call the function again. Clearly the function must be re-entrant, or its side effects (like having incremented a global or an upvalue) would have to be undone, which isn't possible. If it is not re-entrant it will still work, just not give the expected results (for example, if the function increments a global variable by 1, calling it again will leave the global increased by 2 once the function finally returns).
But finding the lines in the script where the chunk starts and ends would be tricky for a truly generic solution. For a specific solution you would have to post specific examples of the scripts you want to run and of the lines you would want to edit. If the whole user script gets recompiled and rerun, that is not a problem, but the side effects are still an issue; examples would help there too.

How to sync the culture of Microsoft.VisualBasic.Compatibility.VB6.Format with the application's CurrentCulture?

Context: a program written in VB.NET, developed/maintained in VisualStudio2012, targeting framework v3.5.
A few years ago, the program was in VB(6) and we "translated" it to VB.NET. As a result of the transformation, which was mostly automated, we still have quite a few places in the code where formatting of doubles (and dates/...) for textual presentation is processed as in:
Dim sValue As String = Microsoft.VisualBasic.Compatibility.VB6.Format(dblValue, "0.00")
Conversely, when we need to extract a Double value from such a string, we use
Dim dblValue As Double = CDbl(sValue)
CDbl "listens to" the System.Globalization.CultureInfo.CurrentCulture of the applications Thread, and this does NOT change when - during the run of the code - you change the Regional Settings through the Control Panel.
However, the VB6.Format as executed in the code starts out conforming to the currentCulture of the application (as you might expect), BUT apparently (I didn't know this, but accidentally found out) listens to CHANGES in the Regional Settings and responds immediately to any changes you make there during the program execution. This implies that the CDbl() and VB6.Format() become mutually inconsistent.
Of course, changing the Regional Settings during program execution is awkward; moreover, if you wish to support it, you can manage it by catching the SystemEvents.UserPreferenceChanged (and -Changing) events and acting upon their occurrences.
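For completeness, catching that event looks roughly like this (C# for brevity; the VB.NET version is analogous). ClearCachedData is what makes subsequent reads of CultureInfo.CurrentCulture pick up the new settings:

using System.Globalization;
using Microsoft.Win32;

class LocaleWatcher
{
    public static void Hook()
    {
        SystemEvents.UserPreferenceChanged += (sender, e) =>
        {
            if (e.Category == UserPreferenceCategory.Locale)
            {
                // Drop the cached culture data so the next read of
                // CultureInfo.CurrentCulture reflects the new Regional Settings.
                CultureInfo.CurrentCulture.ClearCachedData();
                // ...re-format any displayed values here...
            }
        };
    }
}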
However, the "different behaviour" of VB6.FORMAT versus "normal" casts as CDbl(someString) regarding changes in the Culture/Regional Settings, strikes me as undesirable. Preferably you would have VB6.Format to comply ALWAYS with the application/thread-CurrentCulture, and you may THEN choose how you want your code to respond to userpreference changes. Furthermore, I'd like to gain some more insight in the issue.
My question, therefore, is:
Is there a way to compile/arrange/... things such that the (Microsoft.VisualBasic.Compatibility.)VB6.Format listens to the application-CurrentCulture and NOT respond - without "our consent" - to changes in Regional Settings?
Additional information:
The program is compiled with - for the visualbasic stuff - a reference in the project (VisualStudio2012) to:
C:\Windows\Microsoft.Net\Framework\V2.0.50727\Microsoft.VisualBasic.Compatibility.dll (and ...Data.dll).
Any "educational" information or suggestion is welcome. The issue is not causing any real problems in our program, but I feel that we should/might have a better understanding and maybe even methods to make things more robust.
The VB6 Format() function is actually an operating system function under the hood: VarFormat(), a function exported by oleaut32.dll. A backgrounder answer is here; the MSDN library article is here.
As you can tell from the MSDN article, the function doesn't permit specifying a culture or culture-specific settings, other than the day-of-week rules. This function dates from 1996; life was much simpler back then. So what you see is now easy to explain: it cannot know anything about the .NET Thread.CurrentCulture setting.
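One way to sidestep the inconsistency (rather than change VarFormat's behaviour) is to stop depending on the ambient culture on either side: capture one culture and pass it explicitly to both the formatting and the parsing call. A C# sketch; in VB.NET the equivalents are Double.ToString and Double.Parse with the same IFormatProvider:

using System;
using System.Globalization;

class ConsistentFormatting
{
    static void Main()
    {
        double value = 1234.5;

        // One culture object used for BOTH directions, so formatting and
        // parsing cannot disagree even if Regional Settings change mid-run.
        CultureInfo culture = CultureInfo.CurrentCulture;

        string text = value.ToString("0.00", culture);   // instead of VB6.Format(value, "0.00")
        double back = double.Parse(text, culture);       // instead of CDbl(text)

        Console.WriteLine($"{text} -> {back}");
    }
}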

Unreleased DirectShow CSource filter makes program crash at process shutdown

I'm developing a DirectShow CSource capture filter. It works fine, but when I close the program that is using the filter (in this case I'm testing with VLC, but the same happens with other programs), the program crashes (if I'm debugging it in Visual Studio then a breakpoint is triggered).
I've been hunting down this problem for some time now and found that both my source filter and my source stream are not being released; that is, their reference counters are 1 at the end of the program, the DllCanUnloadNow() function reports that there are 2 objects still in use, and, when CoUninitialize() is invoked, the program crashes.
I'm pretty sure that the reference counters are being handled correctly since I'm using the base classes implementation. The only unusual thing in my software that I can think of is that I'm using my own version of DllGetClassObject(): I configured the .DEF file to have MyDllGetClassObject() exported instead of DllGetClassObject() so I could insert some code before invoking the default implementation. I don't think this is a problem since I've checked that the reference counter of all objects I return at the end of MyDllGetClassObject() is 1.
I guess I'm missing something about the lifecycle of the filter, but can't figure out what (this is the very first capture filter I'm developing). Any suggestion?
Thank you in advance,
Guillermo
I finally figured out what was going on. The static method InitializeInstance in my source filter is invoked with bLoading == false and rclsid == <the GUID of my filter> at process shutdown. That turns out to be the appropriate place to release the remaining reference on the filter instance.
I got the key idea, that it is important to release all COM objects before CoUninitialize, some time ago from another StackOverflow post, DirectShow code crashes after exit (PushSourceDesktop sample). All I needed was just a little more knowledge of the DirectShow filter lifecycle.
Anyway, thank you for your efforts, Roman, I know how vague this thread sounded from the beginning :)

Fix bugs in library code, or abandon them?

Assume I've found a bug in a function in a code library:
class Physics
{
    public static Float CalculateDistance(float initialDistance, float initialSpeed, float acceleration, float time)
    {
        //d = d0 + v0t + 1/2*at^2
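        //BUG discussed below: the 1/2 factor on the acceleration term is missing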
        return initialDistance + (initialSpeed * time) + (acceleration * Power(time, 2));
    }
}
Note: The example, and the language, are hypothetical
I cannot guarantee that fixing this code will not break someone.
It is conceivable that there are people who depend on the bug in this code, and that fixing it might cause them to experience an error (I cannot think of a practical way that could happen; perhaps it has something to do with them building a lookup table of distances, or maybe they simply throw an exception if the distance is not the value they expect).
Should I create a second function:
class Physics
{
    public static Float CalculateDistance2(float initialDistance, float initialSpeed, float acceleration, float time) { ... }

    //Deprecated - do not use. Use CalculateDistance2
    public static Float CalculateDistance(float initialDistance, float initialSpeed, float acceleration, float time) { ... }
}
In a language without a way to formally deprecate code, do I just trust everyone to switch over to CalculateDistance2?
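For contrast, in a language that does have formal deprecation, C# for instance, the compiler itself nudges callers over; a sketch of how that would look for this example:

class Physics
{
    public static float CalculateDistance2(float initialDistance, float initialSpeed, float acceleration, float time)
    {
        // corrected formula: d = d0 + v0*t + 1/2*a*t^2
        return initialDistance + initialSpeed * time + 0.5f * acceleration * time * time;
    }

    [System.Obsolete("Keeps the old (buggy) behaviour; use CalculateDistance2 instead.")]
    public static float CalculateDistance(float initialDistance, float initialSpeed, float acceleration, float time)
    {
        // preserved buggy formula, for callers who depend on it
        return initialDistance + initialSpeed * time + acceleration * time * time;
    }
}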
It's also sucky, because now the ideally named function (CalculateDistance) is forever lost to a legacy function that probably nobody needs and nobody wants to be using.
Should I fix bugs, or abandon them?
See also
How to work in untestable legacy code- in bug fixing
Should we fix that bug?
Should this bug be fixed?
Working Effectively With Legacy Code
You'll never succeed at catering to every existing project that uses your library. Attempting to do so may create a welcome sense of predictability, but it will also leave the library bloated and stagnant, and eventually prone to replacement by a much more concise alternative.
Like any other project, it should be expected to go through iterations of change and re-release. Since most of your current user base are programmers, who should be familiar with that, change really shouldn't come as a surprise to them. As long as you identify releases with version numbers and document the changes between them, they should know what to expect when they update, even if they decide to stay with the version they already have.
Also, as a possible new user, finding your library to have an ever-growing amount of legacy code due to a blatant unwillingness to fix known bugs tells me that the maintainability and sustainability of the project are both potentially poor.
So, I would honestly just say to fix it.
Good question. I'm looking forward to some other answers. Here's my 2 cents on the issue:
Generally, if you suspect that many people indeed rely on the bug, then that's an argument for not fixing the bug and instead creating a new function CalculateDistance2.
On the other hand, and I think this is the better option, don't forget that people who rely on the bug can always continue using a specific, older version of your library. You can still document the removal of the bug (and therefore the modified behaviour of your library function) in the release notes.
(If your class happens to be a COM component, the conventional wisdom would be to create a new interface ICalculateDistance2 that makes the original interface obsolete, but preserves it for backwards compatibility.)
Another option is to fix the bug, but leave the old code around as a LegacyCalculateDistance method that's available if anybody really needs it.
You could even implement the ability to select the "legacy" implementation based on (for example) a configuration file or an environment variable setting, if you're concerned about offering a compatibility solution to users who may not be able to make code-level changes.
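A minimal sketch of that configuration switch, using an environment variable (the variable name is made up):

using System;

class Physics
{
    // Made-up opt-in switch: lets users who cannot change code keep the old behaviour.
    static readonly bool UseLegacy =
        Environment.GetEnvironmentVariable("PHYSICS_LEGACY_DISTANCE") == "1";

    public static float CalculateDistance(float d0, float v0, float a, float t)
    {
        return UseLegacy
            ? d0 + v0 * t + a * t * t          // old, buggy formula
            : d0 + v0 * t + 0.5f * a * t * t;  // corrected formula
    }
}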
I once fought for a number of days with some MFC code that was behaving in an entirely unexpected way. When I finally figured out it was an error in the Microsoft supplied library, I checked the knowledge base. It was documented as (approximately) "this is a known bug we found 2 OS versions ago. We aren't fixing it because someone is probably depending on it".
I was a little mad...
I'd say that you should deprecate it. If you're upgrading the library your code depends on, you should test the code with the new library. If it's legacy code, then there's a known configuration it works on. Advise your users and move forward...
As you already described, this is a tradeoff between satisfying two different user groups:
Existing users who have built their software based on a bug in your library
New users who will be using your library in the future
There is no ideal solution, nor do I think that there is a universal answer. I think this depends entirely on the bug and function in question.
I think you have to ask yourself
"Does the function make any sense with the currently existing bug?"
If it does, then I'd leave it in the library. Otherwise, I'd probably toss it out.
Just for the sake of argument, let's assume your hypothetical language was C++ (even though it looks a lot more like Java). In that case, I'd use namespaces to create a fork that was (reasonably) easy to deal with from a viewpoint of both new and legacy code:
namespace legacy {
    class physics {
    public:
        Float CalculateDistance(float initialDistance, float initialSpeed, float acceleration, float time)
        {
            // original code here
        }
    };
}

namespace current {
    class physics {
    public:
        Float CalculateDistance(float initialDistance, float initialSpeed, float acceleration, float time)
        {
            // corrected code here
        }
    };
}
From there you have a couple of options for choosing between the two. For example, existing code could add a using directive: using legacy::physics;, and they'd continue to use the existing code without any further modification. New code could add a using current::physics; instead, to get the current code. When you did this, you'd (probably) deprecate the legacy::physics class, and schedule it for removal after some given period of time, number of revisions, or whatever. This gives your customers a chance to check their code and switch over to the new code in an orderly fashion, while keeping the legacy namespace from getting too polluted with old junk.
If you really want to get elaborate with this, you can even add a version numbering scheme to your namespaces, so instead of just legacy::physics, it might be v2_7::physics. This allows for the possibility that even when you "fix" code, it's remotely possible that there might still be a bug or two left, so you might end up revising it again, and somebody might end up depending on some arbitrary version of it, not necessarily just the original or the current one.
At the same time, this restricts "awareness" of the version to be used to one (fairly) small part of the code (or at least a small part of each module) instead of it being spread throughout all the code. It also gives a fairly painless way for somebody to compile a module using the new code, check for errors, switch back to the old code if needed, etc., without having to deal directly with every individual invocation of the function in question.
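The same idea carries over to .NET-style namespaces; in C#, a using alias plays the role of the using-declaration:

namespace Legacy
{
    public static class Physics { /* original, buggy CalculateDistance */ }
}

namespace Current
{
    public static class Physics { /* corrected CalculateDistance */ }
}

// Call sites then pick a version with a single alias line:
//   using Physics = Legacy.Physics;    // unmigrated code
//   using Physics = Current.Physics;   // new code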

Is use of Mid(), Instr(), LBound(), UBound() etc. in VB.Net not recommended?

I come from a C# background but am now working mostly with VB.Net. It seems to me that the above functions (and others, e.g. UCase, LCase) are carryovers from VB6 and before. Is the use of these functions frowned upon in VB.Net, or does it purely come down to personal preference?
My personal preference is to stay well away from them, but I'm wondering if that is just my C# prejudice.
I've come across a couple of issues, particularly with code converted from VB6 to VB.Net, where the 0-based indexing of collections has meant that bugs were introduced into the code, and I am therefore wary of them.
The reason that those functions are there in the first place is of course that they are part of the VB language, inherited from VB 6.
However, they are not just wrappers for methods in the framework; some of them contain additional logic that makes them behave differently. The Mid function, for example, allows you to specify a range that is outside the string, and it will silently reduce the range and return the part of the string that remains. The String.Substring method instead throws an exception if you specify a range outside the string.
So, the functions are not just wrappers, they represent a different approach to programming that is more in line with Visual Basic, where you can throw just about anything at a function and almost always get something out. In some ways that is easier, as you don't have to think about all the special cases, but on the other hand you might want to get an exception instead of getting a result when you feed something unreasonable to a function. When debugging, it's often easier if you get the exception as early as possible instead of trying to trace back where a faulty value comes from.
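The difference is easy to demonstrate, and the VB functions are callable from C# as well via a reference to Microsoft.VisualBasic.dll:

using System;
using Microsoft.VisualBasic;

class MidVsSubstring
{
    static void Main()
    {
        string s = "ab";

        // Mid silently clamps a range that falls outside the string.
        Console.WriteLine(Strings.Mid(s, 3, 8));   // prints "" (start is past the end)

        // Substring throws for the equivalent out-of-range request.
        try
        {
            Console.WriteLine(s.Substring(2, 8));
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("Substring threw");
        }
    }
}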
Those options are for backward compatibility.
But it will be better for people to use the framework classes/methods to ensure consistency.
Having said that, the VB6 functions are easy to understand, so they should not be an issue for someone with a VB background.
EDIT: Also, some of the overloads available on the framework classes might not be available with the equivalent simple VB6-like statement. I cannot remember any as of now, but this, I think, could be a better reason to use the framework classes/methods.
There will be special cases but, hands down, use the VB6 versions, unless you care about the difference between a string being "" and Nothing.
I was working on a big project where different programmers used both approaches; the code where people used MyString.Substring(1) was blowing up while Mid(MyString, 2) kept working.
The two main errors for this example: (Which apply in various ways to others as well)
(1) A String can be Nothing, and you have to check for that before calling a method on it. It's a limitation of the OO notation: you can't call a member method if the object is Nothing, even if you want Nothing (or an empty object) back. Even if this were solved by using nullable/stub objects for strings (which you kind of can, using "" or String.Empty), you'd still have to ensure they're initialized properly, or, as in our case, convert Nothing to "" when receiving strings from library calls beyond our control.
You are going to have strings that are Nothing, and 90% of the time you'll want them to mean "". With .Substring, you always have to check for Nothing; with the VB versions, only in the 10% of cases where you care.
(2) Specifically with the Mid example: again, 90% of the time, if you ask for chars 3-10 of a 2-char string, you'll want to see "" returned, not have it throw an exception! In fact, you'll rarely want an exception: you'd have to check the length first and decide how the code should behave (there is usually a defined behaviour, at the very least a data-entry error, for which you don't want to throw an exception).
So you're checking 100% of the time with the .Net versions and rarely with the VB versions.
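If you'd rather stay with the framework methods, a small helper that mimics Mid's tolerance removes most of that checking; the helper name and behaviour here are just an illustration:

using System;

static class StringExtensions
{
    // Made-up helper: Substring with Mid-like tolerance, returning ""
    // instead of throwing for null receivers or out-of-range positions.
    public static string SafeSubstring(this string s, int start, int length)
    {
        if (s == null || start < 0 || length <= 0 || start >= s.Length)
            return "";
        return s.Substring(start, Math.Min(length, s.Length - start));
    }
}

// Usage: works even when the variable is Nothing/null, because extension
// methods dispatch statically: ((string)null).SafeSubstring(2, 8) == ""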
.NET wanted to keep everything within the object-oriented philosophy, but strings are a little different from most objects in subtle ways. MS Basic wasn't thinking about this when it created these functions; it just got lucky: one of the strengths of functions is that they can handle null objects.
For our project, one may ask how Nothing strings got into our flow in the first place. But in the end, the decision of some programmers to use the .NET functions meant avoidable service calls, emergency bug fixes, and patches. Save yourself the trouble.
I would avoid them. Since you've mentioned them it sounds as though you've inherited some VB6 code that was possibly converted to a VB.NET project. Otherwise, if it was a new VB.NET project, I see no value in using the VB6 methods.
I've been on a few VB6 to VB.NET conversion projects. While I am familiar with the names and the difference in 0 based indexing, any code I came across got refactored to use their .NET equivalents. This was partially for consistency and to get the VB6 programmers on that project familiar with the framework. However, the other benefit I've found in refactoring is the ability to chain method calls together.
Consider the following:
Dim input As String = "hello world"
Dim result As String = input.ToUpper() ' .NET
Dim result As String = UCase(input) ' VB6
Next thing you know, I need to do more work to satisfy other requirements. Let's say I need to take the substring and get "hello," which results in the code getting updated to:
Dim result As String = input.ToUpper().Substring(0, 5) ' .NET
Dim result As String = Mid(UCase(input), 1, 5) ' VB6
Which is clearer and easier to modify? In .NET I just chain it. In VB6 I had to start at the beginning of the method, then go to the end of it and add the closing parenthesis. If it changes again or I need to remove it, in .NET I just chop off the end, but in VB6 I need to backtrack to the start and end.
I think there's value in using the .NET methods since other .NET developers that join the project later, especially those from a C# background, can easily pick it up.