Proto-buf serialization with Obfuscation - serialization

I am looking for some guidance as to what is going on when using proto-buf net with obfuscation (Dotfuscator). One half of the project is a DLL and the other is an EXE elsewhere and using proto-buf NET they exchange data flawlessly. Until I obfuscate the DLL.
At that point P-BN fails without raising an exception, returning variously a 0 length byte array or a foreshortened one depending on what I have fiddled with. The class is fairly simple (VB):
<ProtoContract(Name:="DMailer")> _
Friend Class DMailer
Private _Lic As Cert
Private _Sys As Sys
Private _LList As List(Of LItem)
..
..
End Class
There are 3 props all decorated with ProtoMember to get/set the constituent class objects. Snipped for brevity.
Again, it works GREAT until I obfuscate the DLL. Then, Dotfuscator renames each of these to null, apparently since they are all Friend, and that seems to choke proto-buff. If I exempt the class from renaming (just the class name, not props/members), it seems to work again. It makes sense that P-BN would only be able to act on objects with a proper name, though when asked to serialize a null named object, it seems like an exception might be in order.
On the other hand, much of the charm of PB-N is supposed to be serialization independent of .NET names working from attributes - at least as I understand it. Yet in this case it only seems to work with classes with names. I tried using the Name qualifier or argument as shown above, to no avail - it apparently doesnt do what I thought it might.
So, I am curious if:
a) ...I have basically surmised the problem correctly
b) ...There is some other attribute or flag that might facilitate serializing
a null named object
c) ...if there are any other insights that would help.
If I exempt all 3 or 4 classes from Dotfuscator renaming (LList is not actually implemented yet, leaving DMailer, Cert and Sys), the DLL seems to work again - at least the output is the correct size. I can live with that, though obscured names would be better: Dotfuscator (CE) either exempts them or sets the names to Null - I cant seem to find a way to force them to be renamed.
Rather than exempt 3 or 4 classes from renaming, one alternative I am considering is to simply store the Serializer output for Cert and Sys as byte arrays or Base64 strings in DMailer instead of classes. Then have the receiver Deserialize each object individually. It is kind of nice to be able to unpack just one thing and have your toys right there as if by magic though.
(many)TIA

Interesting. I confess I have never tried this scenario, but if you can walk me through your process (or better: maybe provide a basic repro example with "run this, then this, then this: boom") I'll happily investigate.
Note: the Name on ProtoContract is mainly intended for GetProto() usage; it is not needed by the core serializer, and can be omitted to reduce your exposure. Also, protobuf-net isn't interested in fields unless those fields are decorated with the attributes, so that shouldn't be an issue.
However! there's probably a workaround here that should work now; you can pre-generate a static serialization dll; for example in a separate console exe (just as a tool; I really need to wrap this in a standalone utility!)
So if you create a console exe that references your unobfuscated library and protobuf-net.dll:
var model = RuntimeTypeModel.Create();
model.Add(typeof(DMailer), true); // true means "use the attributes etc"
// and other types needed, etc
model.Compile("MailSerializer", "MailSerializer.dll");
this should write MailSerializer.dll, which you can then reference from your main code (in addition to protobuf-net), and use:
var ser = new MailSerializer(); // our pre-genereated serializer
ser.Serialize(...); // etc
Then include MailSerializer.dll in your obfuscation payload.
(this is all v2 specific, btw)
If this doesn't work, I'll need to investigate the main issue, but I'm not an obfuscation expert so could do with your repro steps.

Since there were a few upticks of interest, here is what looks like will work:
a) No form of reflection will be able to get the list of properties for an obfuscated type.
I tried walking thru all the types to find the ones with ProtoContract on it, I could find them
but the property names are all changed to a,m, b, j, g.
I also tried Me.GetType.GetProperties with the same result.
You could implement a map from the output to indicate that Employee.FirstName is now a0.j, but distributing this defeats the purpose of obfuscation.
b) What does work to a degree is to exempt the class NAME from obfuscation. Since PB-N looks for the ProtoMember attributes to get the data, you CAN obfuscate the Property/Member names, just not the CLASS/type name. If the name is something like FederalReserveLogIn, your class/type has a bullseye on it.
I have had initial success doing the following:
1) Build a simple class to store a Property Token and value. Store everything as string using ConvertFromInvariantString. Taking a tip from PBN, I used an integer for the token:
<ProtoMember(propIndex.Foo)>
Property Foo As String
An enum helps tie everything together later. Store these in a Dictionary(Of T, NameValuePair)
2) add some accessors. these can perform the type conversions for you:
Public Sub Add(ByVal Key As T, ByVal value As Object)
If _col.ContainsKey(Key) Then
_col.Remove(Key)
End If
_col.Add(Key, New TValue(value))
End Sub
Public Function GetTItem(Of TT)(key As T) As TT
If _col.ContainsKey(key) Then
Return CType(_col(key).TValue, TT)
Else
Return Nothing
End If
End Function
T is whatever key type you wish to use. Integer results in the smallest output and still allows the subscribing code to use an Enum. But it could be String.
TT is the original type:
myFoo = props.GetTItem(Of Long)(propsEnum.Foo)
3) Expose the innerlist (dictionary) to PBN and bingo, all done.
Its also very easy to add converters for Point, Rectangle, Font, Size, Color and even bitmap.
HTH

Related

Do I understand not using getters and setters correctly

After reading this piece by Yegor about not using getters and setters, it sounds like something that makes sense to me.
Please note this question is not about whether doing it is better/worst, only if I am implementing it correctly
I was wondering in the following two examples in VBA, if I understand the concept correctly, and if I am applying it correctly.
The standard way would be:
Private userName As String
Public Property Get Name() As String
Name = userName
End Property
Public Property Let Name(rData As String)
userName = rData
End Property
It looks to me his way would be something like this:
Private userName As String
Public Function returnName() As String
returnName = userName
End Function
Public Function giveNewName(newName As String) As String
userName = newName
End Function
From what I understand from the two examples above is that if I wanted to change the format of userName (lets say return it in all-caps), then I can do this with the second method without changing the name of the method that gives the name through - I can just let returnName point to a userNameCaps property. The rest of my code in my program can still stay the same and point to the method userName.
But if I want to do this with the first example, I can make a new property, but then have to change my code everywhere in the program as well to point to the new property... is that correct?
In other words, in the first example the API gets info from a property, and in the second example the API gets info from a method.
Your 2nd snippet is neither idiomatic nor equivalent. That article you link to, is about Java, a language which has no concept whatsoever of object properties - getFoo/setFoo is a mere convention in Java.
In VBA this:
Private userName As String
Public Property Get Name() As String
Name = userName
End Property
Public Property Let Name(rData As String)
userName = rData
End Property
Is ultimately equivalent to this:
Public UserName As String
Not convinced? Add such a public field to a class module, say, Class1. Then add a new class module and add this:
Implements Class1
The compiler will force you to implement a Property Get and a Property Let member, so that the Class1 interface contract can be fulfilled.
So why bother with properties then? Properties are a tool, to help with encapsulation.
Option Explicit
Private Type TSomething
Foo As Long
End Type
Private this As TSomething
Public Property Get Foo() As Long
Foo = this.Foo
End Property
Public Property Let Foo(ByVal value As Long)
If value <= 0 Then Err.Raise 5
this.Foo = value
End Property
Now if you try to assign Foo with a negative value, you'll get a runtime error: the property is encapsulating an internal state that only the class knows and is able to mutate: calling code doesn't see or know about the encapsulated value - all it knows is that Foo is a read/write property. The validation logic in the "setter" ensures the object is in a consistent state at all times.
If you want to break down a property into methods, then you need a Function for the getter, and assignment would be a Sub not a Function. In fact, Rubberduck would tell you that there's a problem with the return value of giveNewName being never assigned: that's a much worse code smell than "OMG you're using properties!".
Functions return a value. Subs/methods do something - in the case of an object/class, that something might imply mutating internal state.
But by avoiding Property Let just because some Java guy said getters & setters are evil, you're just making your VBA API more cluttered than it needs to be - because VBA understands properties, and Java does not. C# and VB.NET do however, so if anything the principles of these languages would be much more readily applicable to VBA than Java's, at least with regards to properties. See Property vs Method.
FWIW public member names in VB would be PascalCase by convention. camelCase public member names are a Java thing. Notice how everything in the standard libraries starts with a Capital first letter?
It seems to me that you've just given the property accessors new names. They are functionally identical.
I think the idea of not using getters/setters implies that you don't try to externally modify an object's state - because if you do, the object is not much more than a user-defined type, a simple collection of data. Objects/Classes should be defined by their behavior. The data they contain should only be there to enable/support that behavior.
That means you don't tell the object how it has to be or what data you want it to hold. You tell it what you want it to do or what is happening to it. The object itself then decides how to modify its state.
To me it seems your example class is a little too simple to work as an example. It's not clear what the intended purpose is: Currently you'd probably better off just using a variable UserName instead.
Have a look at this answer to a related question - I think it provides a good example.
Regarding your edit:
From what I understand from the two examples above is that if I wanted
to change the format of userName (lets say return it in all-caps),
then I can do this with the second method without changing the name of
the method that gives the name through - I can just let returnName
point to a userNameCaps property. The rest of my code in my program
can still stay the same and point to the method iserName.
But if I want to do this with the first example, I can make a new
property, but then have to change my code everywhere in the program as
well to point to the new property... is that correct?
Actually, what you're describing here, is possible in both approaches. You can have a property
Public Property Get Name() As String
' possibly more code here...
Name = UCase(UserName)
End Property
or an equivalent function
Public Function Name() As String
' possibly more code here...
Name = UCase(UserName)
End Function
As long as you only change the property/function body, no external code needs to be adapted. Keep the property's/function's signature (the first line, including the Public statement, its name, its type and the order and type of its parameters) unchanged and you should not need to change anything outside the class to accommodate.
The Java article is making some sort of philosophic design stance that is not limited to Java: The general advise is to severely limit any details on how a class is implemented to avoid making one's code harder to maintain. Putting such advice into VBA terms isn't irrelevant.
Microsoft popularized the idea of a Property that is in fact a method (or two) which masquerade as a field (i.e. any garden-variety variable). It is a neat-and-tidy way to package up a getter and setter together. Beyond that, really, behind the scenes it's still just a set of functions or subroutines that perform as accessors for your class.
Understand that VBA does not do classes, but it does do interfaces. That's what a "Class Module" is: An interface to an (anonymous) class. When you say Dim o As New MyClassModule, VBA calls some factory function which returns an instance of the class that goes with MyClassModule. From that point, o references the interface (which in turn is wired into the instance). As #Mathieu Guindon has demonstrated, Public UserName As String inside a class module really becomes a Property behind the scenes anyway. Why? Because a Class Module is an interface, and an interface is a set of (pointers to) functions and subroutines.
As for the philosophic design stance, the really big idea here is not to make too many promises. If UserName is a String, it must always remain a String. Furthermore, it must always be available - you cannot remove it from future versions of your class! UserName might not be the best example here (afterall, why wouldn't a String cover all needs? for what reason might UserName become superfluous?). But it does happen that what seemed like a good idea at the time the class was being made turns into a big goof. Imagine a Public TwiddlePuff As Integer (or instead getTwiddlePuff() As Integer and setTwiddlePuff(value As Integer)) only to find out (much later on!) that Integer isn't sufficient anymore, maybe it should have been Long. Or maybe a Double. If you try to change TwiddlePuff now, anything compiled back when it was Integer will likely break. So maybe people making new code will be fine, and maybe it's mostly the folks who still need to use some of the old code who are now stuck with a problem.
And what if TwiddlePuff turned out to be a really big design mistake, that it should not have been there in the first place? Well, removing it brings its own set of headaches. If TwiddlePuff was used at all elsewhere, that means some folks may have a big refactoring job on their hands. And that might not be the worst of it - if your code compiles to native binaries especially, that makes for a really big mess, since an interface is about a set of function pointers layed out and ordered in a very specific way.
Too reiterate, do not make too many promises. Think through on what you will share with others. Properties-getters-setters-accessors are okay, but must be used thoughtfully and sparingly. All of that above is important if what you are making is code that you are going to share with others, and others will take it and use it as part of a larger system of code, and it may be that these others intend to share their larger systems of code with yet even more people who will use that in their even larger systems of code.
That right there is probably why hiding implementation details to the greatest extent possible is regarded as fundamental to object oriented programming.

.NET Obfuscation [SmartAssembly]

Just a quick question in relation to SmartAssembly and .NET applications. - I am experimenting with the software at the moment and it seems to be obfuscating code but My.Settings is still visible in plain text?
So previous to obfucating my code (using .NET reflector) I could literally see almost everything. Including the My.Settings class containing lots of info such as passwords, ip's, MySQL connection strings etc..
So I obfuscated the code using RedGate's SmartAssembly and sure enough all the classes/functions etc appeared with random symbols, however several items (again including My.Settings) remained untouched?
SmartAssembly Screenshot
Obfuscated result in .NET reflector
There are limitations to what most obfuscation tools can do, and this is one of them. Settings values are not stored as string literals or in backing fields, but as an attribute value:
Global.System.Configuration.DefaultSettingValueAttribute("bar")> _
Public Property Foo() As String
Get
Return CType(Me("Foo"), String)
End Get
Set(value As String)
Me("Foo") = value
End Set
End Property
VB/VS generates the Property getter/setter, but as you can see it uses an attribute to store the initial value as opposed to:
Private _foo As String = "bar"
In most cases there is no reason to hide the string content used in Attributes because they are usually instructions to the compiler about the class or property:
<Description("Bar String")>
<DefaultValue("bar")>
<DesignerSerializationVisibility(DesignerSerializationVisibility.Visible)>
Property BarString As String
None of these Attribute literals needs to be hidden because most Attributes contains neither runtime data nor sensitive information. As a result, My.Settings is a fringe case and is the result of how it is implemented. Note that this only applies to the default, initial values you enter in the IDE. If you update them at runtime, they are not written back to the Attributes, but saved to a file.
Since you have a trivial number of settings there, just write a small class to manage them yourself and save to a file in Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData)

How do I set up Protobuf for my VB.net application?

So, this may seem very elementary for you guys but I am officially stumped. I am trying to save some data in my application to a file using protobuf (suggested to me by some peers) but I can't seem to find any documentation for it and what I can find always gives me some weird error. I have an array declared as follows:
Private Terrain(,,) As TiledTerrain
The TiledTerrain class looks like this:
Public Class TiledTerrain
Public X As Integer
Public Y As Integer
Public Texture_X As Integer
Public Texture_Y As Integer
End Class
Pretty dog-on simple right? Well, I can't seem to figure out how to save my Terrain array to a file using Protobuf?
The Terrain array is just a simple 3 dimensional array (about 100x100x2). Each cell of the array may or may not actually contain a value (TiledTerrain) and if it doesn't it will contain "Nothing".
Can anybody explain to me in full on how I should go about doing this? I've currently referenced protobuf-net.dll and protobuf-net.Extensions.dll because I don't really know which to use...
Thanks for any help!
-A Moron Among Geniuses :)
first read Getting Started which describes the simplest scenario, using attributes. VB has slightly different syntax for attributes, which you are probably more familiar with than me - but the concept is the same.
There are alternatives, note:
in v2 the model can be configured entirey at runtime if you want, without the need for any attributes
if the type looks like an obvious "tuple" (including, importantly, a constructor that takes a parameter that matches every public member), it will use the constructor order to infer a contract
There is a problem though; protobuf-net does not currently support multi-dimensional arrays. It can of course be added, but as with all features: it doesn't exist until it gets written. The reason this isn't supported directly is that the underlying protobuf specification (by Google) does not support this. It would work if flattened into a vector (1-dimensional zero-based array). If you want help with an example, let me know.

Newbie question: how do I create a class to hold data in Visual Basic Studio?

I'm really sorry. This must seem like an incredibly stupid question, but unless I ask I'll never figure it out. I'm trying to write a program to read in a csv file in Visual Basic (tried and gave up on C#) and I asked a friend of mine who is much better at programming than I am. He said I should create a class to hold the data that I read in.
The problem is, I've never created a class before, not in VB, Java, or anything. I know all the terms associated with classes, I understand at a high level how classes work no problem. But I suck at the actual details of making one.
So here's what I did:
Public Class TsvData
Property fullDataSet() As Array
Get
Return ?????
End Get
Set(ByVal value As Array)
End Set
End Property
End Class
I got as far as the question marks and I'm stuck.
The class is going to hold a lot of data, so I made it an array. That could be a bad move. I don't know. All i know is that it can't be a String and it certainly can't be an Integer or a Float.
As for the Getter and Setter, the reason I put the question marks in is because I want to return the whole array there. The class will eventually have other properties which are basically permutations of the same data, but this is the full set that I will use when I want to save it out or something. Now I want to return the whole Array, but typing "Return fullDataSet()" doesn't seem like a good idea. I mean, the name of the property is "fullDataSet()." It will just make some kind of loop. But there is no other data to return.
Should I Dim yet another array inside the property, which already is an array, and return that instead?
Instead of writing your own class, you could get yourself familiar with the pre-defined class System.Data.DataTable and then use that for holding CSV data.
In the last few years that I've been programming, I've never actually used a multi-dimensional array, and I'd advise you not to use them, either. There's usually ways of achieving the same with a better data structure. For example, consider creating a class (let's call it CsvRecord) that holds only one record; that is, only one line from the CSV file. Then use any of the standard collection types from the System.Collections.Generic namespace (e.g. List(Of CsvRecord)) to hold the entire data (ie. all lines) in the CSV file. This effectively reduces the problem to, "How do I read in one line of CSV data?"
If you want to take suggestion #2 even further, do as cHao says and don't simply lay out the information you've read as a CsvRecord; instead, create an object that reflects the actual content. For example, if your CSV file contains product–price information, call your CSV record class ProductInfo or something more fitting.
If, however, you want to go on with your current approach, you will need a backing field for the property, as demonstrated by Philipp's answer. Your property then becomes a "façade" that only delegates to this backing field. This is not absolutely necessary: You could simply make the backing field Public and let the user of your class access it directly, though that is not considered a good practice.
Ideally, you ought to have a class representing the specific data you want to read in. Setting an entire array at once is asking for trouble; some programs that read {C,T}SV files will freak out if all rows don't have the same number of columns, which is exceedingly easy to do if you can set the data to be an array of arbitrary length.
If you're trying to represent arbitrary data, frankly, you'd do just as well to use a List(Of String). If it's meant to be a table, you could instead read in the first line and make it a list as above (let's call it "headers"), and then make each row a Dictionary(Of String, String). (Let's call each row "row", and the collection (a list of these dictionary objects) "rows".) Just read in the line, split it like you did the first, and say something like row(headers(column number)) = value for each column, and then stuff it into 'rows'.
Or, you could use the data classes (System.Data.DataTable and System.Data.DataSet would do wonders here).
Usually you use a private member to store the actual data:
Public Class TsvData
Private _fullDataSet As String()
Public Property FullDataSet() As String()
Get
Return _fullDataSet
End Get
Set(ByVal value As String())
_fullDataSet = value
End Set
End Property
Note that this is an instance of bad design since it couples a concept to a concrete representation and allows the clients of the class to modify the internals without any error checking. Returning a ReadOnlyCollection or some dedicated container would be better.

Converting Value Type of a Dictionary in VB.net

Public Class A
Public Class B : Inherits A
Dim DictA As Dictionary(Of Integer, A)
Dim DictB As New Dictionary(Of Integer, B)
DictA = DictB
This doesn't work, as the type can't be converted. Is this somehow possible?
No. You're running into the problem of generic variance, which isn't supported in .NET except in a few very particular ways.
Here's the reason: after your last line, you've got a single dictionary with two "views" onto it, effectively. Now imagine you wrote:
DictA.Add(1, New A())
(Apologies if my VB is slightly off.) That's inserted a "non-B" into the dictionary, but DictB thinks that all the values will be instances of B.
One of the main purposes of generics is to catch potential type failures at compile time rather than execution time - which means this code should fail to compile, which indeed it does precisely because of the lack of variance.
It's a bit counter-intuitive when you're used to normal inheritance, but it does make sense. A bunch of bananas isn't just a collection of fruit - it's a collection of bananas.
Java takes a somewhat different stance on this using wildcarding from the caller's side - it would let you see just the "adding" side of the dictionary for DictB, for instance, as anything you add to DictB would be fine in DictA. Java's generics are very different to .NET generics though...
This is a interesting, and quite complex aspect of Static type systems.
The terms you are looking for is Covariance and Contravariance.
The c# language will get this for certain sorts of operations on Generic types in 4.0 though in a somewhat limited form.
You have it to a certain degree already in the form of
object[] x = new string[1];
Which is allowed because arrays are treated covariantly, which means that you could then do the following:
x[0] = new object();
which would throw an Exception at runtime. Do you see how the same would apply to you Dictionaries...
The c# 4.0 spec is still non final so is subject to change but for some discussions and explanations see this question
Unfortunately not. This is a known absence of covariance/contravariance support. The CLR supports it, but the compilers for C# and VB.Net do not.
The next version of C# (4.0) actually does support it now, but not sure about vb.net.
Also, with System.Linq, one of the extension methods is Cast() which would allow you to cast from a collection of one type to another. However it will still return IEnumerable so you need to manipulate a little bit, but I think key-value types will be harder than collections with only one generic type.