How to get differing value types out of a concrete implementation if only the interface / abstract class is known? - vb.net

what I am using:
VB.NET, NET 3.5, OpenXML SDK 2.0
what I want to do:
I am creating an xlsx reader / writer for my application (based on OpenXML SDK 2.0). I want to read xlsx files and store the data contained in each row in a DTO/PONO. Further I want to read the xlsx file and then modify it and save it.
my thoughts:
Now my problem is not with the OpenXML SDK, I can do what I need to do.
My problem is on how to structure my components. Specifically I have problems with the polymorphism at the lowest level of a Spreadsheet, the cell.
A cell in Excel/OpenXML can have different types of data associated with it, such as a Time, Date, Number, Text or Formula. These different types need to be handled differently when read/written from/to a spreadsheet.
I decided to have a common interface for all subtypes like TextCell, NumberCell, DateCell etc.
Now when I read the cell from the spreadsheet the Method/Factory can decide which type of cell to create.
Now, because the cell is an abstraction of the real implementation, the code holding it does not know (and does not need to know) what type it is. For writing / modifying the cell I solve this problem by calling .write(ICellWriter) on the cell I want to persist. As the cell itself knows what type of data it contains, it knows which method of ICellWriter it needs to call (static polymorphism).
My problem:
Writing to the xlsx file is no problem. My problem is: how do I get the data out of my cell into my DTO/PONO without resorting to type checking (If TypeOf variable Is ClassX Then DoSomething End If)? The methods / properties would need different signatures, and differentiating them only by return type is not allowed.
Edit:
The holder (a collection, in this case a row of a table/spreadsheet) of the objects (referring to the cells) does not know the concrete implementations. So for writing a cell I pass it a CellWriter. This CellWriter has overloaded methods like Write(num As Integer), Write(text As String), Write(datum As Date). The cell object that gets this passed to it then calls the Write() method with the data type it holds. This works, as no return value is passed back.
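As an illustration of the write-side dispatch just described, here is a minimal sketch. It is written in Python rather than VB.NET, so the three Write overloads become separately named methods, and the names CellWriter, NumberCell, TextCell and DateCell are assumptions made for the example, not the actual classes.

```python
from datetime import date


class CellWriter:
    """Hypothetical stand-in for ICellWriter: one method per value type."""

    def __init__(self):
        self.written = []  # records the calls so the effect is visible

    def write_number(self, value: int) -> None:
        self.written.append(("number", value))

    def write_text(self, value: str) -> None:
        self.written.append(("text", value))

    def write_date(self, value: date) -> None:
        self.written.append(("date", value))


class NumberCell:
    def __init__(self, value: int):
        self._value = value

    def write(self, writer: CellWriter) -> None:
        # The cell knows what it holds, so it picks the matching method.
        writer.write_number(self._value)


class TextCell:
    def __init__(self, value: str):
        self._value = value

    def write(self, writer: CellWriter) -> None:
        writer.write_text(self._value)


class DateCell:
    def __init__(self, value: date):
        self._value = value

    def write(self, writer: CellWriter) -> None:
        writer.write_date(self._value)


# The row only relies on write(); it never inspects a cell's concrete type.
row = [NumberCell(42), TextCell("hello"), DateCell(date(2010, 1, 31))]
writer = CellWriter()
for cell in row:
    cell.write(writer)
```

Every write method returns nothing, which is why the row can stay ignorant of the concrete cell types.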
But what do I do when I need the concrete data type returned?
Any ideas, advice, insight?
Thanks
Edit:
Glossary:
DTO: Data Transfer Object
PONO: Plain Old .Net Object
xlsx: referring to file ending of excel workbook files
Edit:
The Cell "subtypes" implement a common interface and do not inherit from a common superclass.
Edit:
After some thinking about the problem I came to realize that it's not possible without reflection or knowledge of what type of cell I am expecting. Basically I was trying to recreate a spreadsheet, or something with similar functionality, in a way that was far too abstract/configurable for my needs. Thanks for the time and effort put into writing the answers. I accepted the answer that was closest to what I realized.

I don't think you can.
If I'm understanding correctly, you have different types of cells (StringCell, IntCell) and each of those concrete classes returns an object of type 'Object'. When you are using the base class 'Cell' and getting its value, it's of type Object.
To work with it as a String, Integer, Date, etc., I think you need to inspect the type of that object, one way or another. You can use TypeOf like you demonstrated; I've also seen things like '.GetValueAsString()/.GetValueAsInteger()' on the base class. But you still need enough knowledge to say 'Dim myInt As Integer = myCell.GetValueAsInteger()'.
Generally speaking, at least if you subscribe to the SOLID principles, you shouldn't care. The relevant one here is the Liskov substitution principle:
It states that, in a computer program, if S is a subtype of T, then objects of type T may be replaced with objects of type S (i.e., objects of type S may be substitutes for objects of type T), without altering any of the desirable properties of that program (correctness, task performed, etc.)
http://en.wikipedia.org/wiki/Liskov_substitution_principle
If you have subtypes of cells, but you can't use them interchangeably, it's a good candidate for not using inheritance.
I don't know what you are intending to do with the values in the cells that would require you to have the concrete class instead of using the base, but it might be possible to expose that functionality in the base itself. I.e., if you need to add two cells, you can perhaps accomplish that by treating them as generic cells (at least provided they are of compatible types) without knowing what subtype they are. You should be able to return the base class in your DTO, regardless.
At least, that's my understanding. I'd certainly wait for more people to chime in before listening to me.

Related

What are the pros and cons of typing interface method parameters generic to make the interface more accessible?

I have an application that creates XML strings from ABAP structures using many different simple transformations and consequently many different ABAP structures.
I want to use a more flexible OO approach with an interface to do this, but due to the different structures the method signature's importing parameter is always different.
What are the pros and cons of typing the importing parameter generic instead so I can implement one method from one interface in each class handling the different transformations?
INTERFACE if_transformer.
  METHODS transform_xml
    IMPORTING isource_structure TYPE REF TO data
    RETURNING VALUE(rxml_string) TYPE string.
ENDINTERFACE.
...
CLASS material_transformer DEFINITION.
  PUBLIC SECTION.
    INTERFACES if_transformer.
ENDCLASS.
CLASS material_transformer IMPLEMENTATION.
  METHOD if_transformer~transform_xml.
    FIELD-SYMBOLS <structure> TYPE concrete_structure.
    ASSIGN isource_structure->* TO <structure>.
    ...
  ENDMETHOD.
ENDCLASS.
The more specific your types, the earlier you notice errors, the easier your interfaces can be understood, and the less overhead you have in validating and casting/converting to the concrete types you want to handle.
For example, consider you type the method add_number as methods add_number importing n type i. The compiler will then reject the wrong statement add_number( 'xyz' ) and this class of errors will never make it into executable code. In turn, anybody reading that method's declaration can easily see that the method accepts only integer numbers, not floats, not packeds, and definitely not strings that contain numbers. Within the method, you can probably directly take the input n and do something with it, such as result = sum_so_far + n, without having to first validate the input or convert it to something else.
In contrast, consider you type the same method add_number as methods add_number importing n type ref to data. The compiler will gladly accept add_number( ref #( 'xyz' ) ), although it is complete nonsense; this class of error will thus only be detected at runtime, with type conversion exceptions that the code around this will have to react to in a meaningful way. People reading the method's declaration have to consult its documentation, unit tests, and/or code to find out what kind of input it accepts; there is no way to guess it from the specification alone. Finally, within the method, you will first have to validate and convert the input before you can process it, such as is_integer( n ), cast, assign, and the like; in case the input is unacceptable, you need to find suitable error handling mechanisms, such as throwing nice exceptions.
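To make the validation overhead concrete, here is a rough sketch in Python rather than ABAP. Python only checks annotations through an external static checker, so this merely approximates ABAP's compile-time rejection; the function names are invented for the example.

```python
def add_number_generic(sum_so_far: int, n: object) -> int:
    """Generic parameter: the input must be validated on every call."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(n, bool) or not isinstance(n, int):
        raise TypeError(f"expected an integer, got {type(n).__name__}")
    return sum_so_far + n


def add_number_typed(sum_so_far: int, n: int) -> int:
    """Specific parameter: the signature states the contract, so the body
    uses n directly and a static checker can flag a call with a string."""
    return sum_so_far + n
```

The generic variant carries the check, the error message and the exception path that the typed variant leaves to the tooling.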
With softly typed languages like JavaScript, using generic types is the default. However, history shows that people often prefer stronger typing, at least on the server side, leading to follow-up evolutions such as TypeScript or Deno. With strongly-typed languages like ABAP, the rule of thumb is to choose a data type as precise as possible.
Note that there are several levels of relaxation in generic types. For example, you should consider resorting to "partially" generic types like simple, which accepts ABAP structures, or standard table, which accepts tables, before resorting to the maximum-generic type data.
Excellent answer by Florian. I might add that an approach I've used in similar scenarios is to use a factory class that inspects the input data and instantiates the appropriate class to deal with it.
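A factory of that kind might look roughly like the sketch below. It is in Python rather than ABAP, and Material, Plant, the transformer classes and transformer_for are all hypothetical names. Each transformer keeps a precisely typed method, and only the factory inspects the input.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Material:  # hypothetical source structure
    number: str
    description: str


@dataclass
class Plant:  # hypothetical source structure
    code: str


class MaterialTransformer:
    def transform_xml(self, source: Material) -> str:
        return (f"<material number={quoteattr(source.number)}>"
                f"{escape(source.description)}</material>")


class PlantTransformer:
    def transform_xml(self, source: Plant) -> str:
        return f"<plant code={quoteattr(source.code)}/>"


# The factory is the only place that looks at the input's type.
_TRANSFORMERS = {Material: MaterialTransformer, Plant: PlantTransformer}


def transformer_for(source):
    """Return a transformer instance matching the type of source."""
    try:
        return _TRANSFORMERS[type(source)]()
    except KeyError:
        raise TypeError(
            f"no transformer registered for {type(source).__name__}") from None


bolt = Material("M-01", "Bolt & nut")
xml_string = transformer_for(bolt).transform_xml(bolt)
```

An unsupported input fails once, in the factory, instead of inside every transformer.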

How do I set up Protobuf for my VB.net application?

So, this may seem very elementary for you guys but I am officially stumped. I am trying to save some data in my application to a file using protobuf (suggested to me by some peers) but I can't seem to find any documentation for it and what I can find always gives me some weird error. I have an array declared as follows:
Private Terrain(,,) As TiledTerrain
The TiledTerrain class looks like this:
Public Class TiledTerrain
    Public X As Integer
    Public Y As Integer
    Public Texture_X As Integer
    Public Texture_Y As Integer
End Class
Pretty doggone simple, right? Well, I can't seem to figure out how to save my Terrain array to a file using Protobuf.
The Terrain array is just a simple 3 dimensional array (about 100x100x2). Each cell of the array may or may not actually contain a value (TiledTerrain) and if it doesn't it will contain "Nothing".
Can anybody explain to me in full on how I should go about doing this? I've currently referenced protobuf-net.dll and protobuf-net.Extensions.dll because I don't really know which to use...
Thanks for any help!
-A Moron Among Geniuses :)
First read Getting Started, which describes the simplest scenario, using attributes. VB has slightly different syntax for attributes, which you are probably more familiar with than I am, but the concept is the same.
There are alternatives; note:
in v2 the model can be configured entirely at runtime if you want, without the need for any attributes
if the type looks like an obvious "tuple" (including, importantly, a constructor that takes a parameter that matches every public member), it will use the constructor order to infer a contract
There is a problem though; protobuf-net does not currently support multi-dimensional arrays. It can of course be added, but as with all features: it doesn't exist until it gets written. The reason this isn't supported directly is that the underlying protobuf specification (by Google) does not support this. It would work if flattened into a vector (1-dimensional zero-based array). If you want help with an example, let me know.
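The flattening itself is plain index arithmetic. The sketch below shows it in Python instead of VB and leaves out the serializer entirely. The helper names are invented for the example, and how empty cells ("Nothing") should be represented in the serialized form is left open.

```python
def flatten(grid):
    """Flatten grid[x][y][z] into one list (x-major) and return its dimensions."""
    size_x, size_y, size_z = len(grid), len(grid[0]), len(grid[0][0])
    flat = [grid[x][y][z]
            for x in range(size_x)
            for y in range(size_y)
            for z in range(size_z)]
    return flat, (size_x, size_y, size_z)


def flat_index(x, y, z, dims):
    """Position of element (x, y, z) inside the flattened list."""
    _, size_y, size_z = dims
    return (x * size_y + y) * size_z + z


def unflatten(flat, dims):
    """Rebuild the nested grid[x][y][z] lists from the flat list."""
    size_x, size_y, size_z = dims
    return [[[flat[flat_index(x, y, z, dims)] for z in range(size_z)]
             for y in range(size_y)]
            for x in range(size_x)]


# A tiny 2 x 3 x 2 grid; None marks an empty cell.
grid = [[[None for _ in range(2)] for _ in range(3)] for _ in range(2)]
grid[1][2][0] = ("tile", 1, 2)  # placeholder for a TiledTerrain instance

flat, dims = flatten(grid)
```

The dimensions have to be stored next to the flat list, otherwise the original shape cannot be rebuilt on load.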

Overextending object design by adding many trivial fields?

I have to add a bunch of trivial or seldom used attributes to an object in my business model.
So, imagine class Foo which has a bunch of standard information such as Price, Color, Weight, Length. Now, I need to add a bunch of attributes to Foo that are rarely deviating from the norm and rarely used (in the scope of the entire domain). So, Foo.DisplayWhenConditionIsX is true for 95% of instances; likewise, Foo.ShowPriceWhenConditionIsY is almost always true, and Foo.PriceWhenViewedByZ has the same value as Foo.Price most of the time.
It just smells wrong to me to add a dozen fields like this to both my class and database table. However, I don't know that wrapping these new fields into their own FooDisplayAttributes class makes sense. That feels like adding complexity to my DAL and BLL for little gain other than a smaller object. Any recommendations?
Try setting up a separate storage class/struct for the rarely used fields and hold it as a single field, say "rarelyUsedFields" (for example, it will be a pointer in C++ and a reference in Java - you don't mention your language.)
Have setters/getters for these fields on your class. Setters will check whether the value differs from the default and, if so, lazily initialize rarelyUsedFields, then set the respective field value (say, rarelyUsedFields.DisplayWhenConditionIsX = false). Getters will check rarelyUsedFields and return default values (true for DisplayWhenConditionIsX and so on) if it is NULL, otherwise return rarelyUsedFields.DisplayWhenConditionIsX.
This approach is used quite often, see WebKit's Node.h as an example (and its focused() method.)
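A minimal sketch of that lazy-initialization idea, written in Python since the question names no language. RareFields and the attribute names are assumptions modelled on the question's examples, and only one rarely used field is wired up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RareFields:
    """Storage for the seldom-changed attributes, with their usual defaults."""
    display_when_condition_is_x: bool = True
    show_price_when_condition_is_y: bool = True


class Foo:
    def __init__(self, price: float):
        self.price = price
        # Stays None until some rarely used field deviates from its default.
        self._rare: Optional[RareFields] = None

    @property
    def display_when_condition_is_x(self) -> bool:
        if self._rare is None:
            return True  # the default; nothing was ever stored
        return self._rare.display_when_condition_is_x

    @display_when_condition_is_x.setter
    def display_when_condition_is_x(self, value: bool) -> None:
        if self._rare is None:
            if value:  # still the default, so no storage is needed
                return
            self._rare = RareFields()
        self._rare.display_when_condition_is_x = value


foo = Foo(9.99)
default_value = foo.display_when_condition_is_x   # read without allocating
foo.display_when_condition_is_x = False           # first deviation allocates
```

Instances that never deviate from the defaults pay for a single empty reference instead of a dozen fields.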
Abstraction makes your question a bit hard to understand, but I would suggest using custom getters such as Foo.getPrice() and Foo.getSpecialPrice().
The first one would simply return the attribute, while the second would perform operations on it first.
This is only possible if there is a way to calculate the "seldom used version" from the original attribute value, but in most common cases this would be possible, providing you can access data from another object storing parameters, such as FooShop.getCurrentDiscount().
The problem I see is more about the Foo object having side effects.
In your example, I see two features : display and price.
I would build one or more Displayer objects (which know how to display) and make the price a component object, with a list of internal price modifiers.
Note all this is relevant only if your Foo objects are called by numerous clients.

Proto-buf serialization with Obfuscation

I am looking for some guidance as to what is going on when using proto-buf net with obfuscation (Dotfuscator). One half of the project is a DLL and the other is an EXE elsewhere and using proto-buf NET they exchange data flawlessly. Until I obfuscate the DLL.
At that point P-BN fails without raising an exception, returning variously a 0 length byte array or a foreshortened one depending on what I have fiddled with. The class is fairly simple (VB):
<ProtoContract(Name:="DMailer")> _
Friend Class DMailer
    Private _Lic As Cert
    Private _Sys As Sys
    Private _LList As List(Of LItem)
    ..
    ..
End Class
There are 3 props all decorated with ProtoMember to get/set the constituent class objects. Snipped for brevity.
Again, it works GREAT until I obfuscate the DLL. Then, Dotfuscator renames each of these to null, apparently since they are all Friend, and that seems to choke protobuf-net. If I exempt the class from renaming (just the class name, not props/members), it seems to work again. It makes sense that PB-N would only be able to act on objects with a proper name, though when asked to serialize a null named object, it seems like an exception might be in order.
On the other hand, much of the charm of PB-N is supposed to be serialization independent of .NET names, working from attributes, at least as I understand it. Yet in this case it only seems to work with classes with names. I tried using the Name qualifier or argument as shown above, to no avail; it apparently doesn't do what I thought it might.
So, I am curious if:
a) ...I have basically surmised the problem correctly
b) ...There is some other attribute or flag that might facilitate serializing a null named object
c) ...if there are any other insights that would help.
If I exempt all 3 or 4 classes from Dotfuscator renaming (LList is not actually implemented yet, leaving DMailer, Cert and Sys), the DLL seems to work again; at least the output is the correct size. I can live with that, though obscured names would be better: Dotfuscator (CE) either exempts them or sets the names to Null, and I can't seem to find a way to force them to be renamed.
Rather than exempt 3 or 4 classes from renaming, one alternative I am considering is to simply store the Serializer output for Cert and Sys as byte arrays or Base64 strings in DMailer instead of classes. Then have the receiver Deserialize each object individually. It is kind of nice to be able to unpack just one thing and have your toys right there as if by magic though.
(many)TIA
Interesting. I confess I have never tried this scenario, but if you can walk me through your process (or better: maybe provide a basic repro example with "run this, then this, then this: boom") I'll happily investigate.
Note: the Name on ProtoContract is mainly intended for GetProto() usage; it is not needed by the core serializer, and can be omitted to reduce your exposure. Also, protobuf-net isn't interested in fields unless those fields are decorated with the attributes, so that shouldn't be an issue.
However! there's probably a workaround here that should work now; you can pre-generate a static serialization dll; for example in a separate console exe (just as a tool; I really need to wrap this in a standalone utility!)
So if you create a console exe that references your unobfuscated library and protobuf-net.dll:
var model = RuntimeTypeModel.Create();
model.Add(typeof(DMailer), true); // true means "use the attributes etc"
// and other types needed, etc
model.Compile("MailSerializer", "MailSerializer.dll");
this should write MailSerializer.dll, which you can then reference from your main code (in addition to protobuf-net), and use:
var ser = new MailSerializer(); // our pre-generated serializer
ser.Serialize(...); // etc
Then include MailSerializer.dll in your obfuscation payload.
(this is all v2 specific, btw)
If this doesn't work, I'll need to investigate the main issue, but I'm not an obfuscation expert so could do with your repro steps.
Since there were a few upticks of interest, here is what looks like will work:
a) No form of reflection will be able to get the list of properties for an obfuscated type.
I tried walking through all the types to find the ones with ProtoContract on them; I could find them, but the property names are all changed to a, m, b, j, g.
I also tried Me.GetType.GetProperties with the same result.
You could implement a map from the output to indicate that Employee.FirstName is now a0.j, but distributing this defeats the purpose of obfuscation.
b) What does work to a degree is to exempt the class NAME from obfuscation. Since PB-N looks for the ProtoMember attributes to get the data, you CAN obfuscate the Property/Member names, just not the CLASS/type name. If the name is something like FederalReserveLogIn, your class/type has a bullseye on it.
I have had initial success doing the following:
1) Build a simple class to store a Property Token and value. Store everything as string using ConvertFromInvariantString. Taking a tip from PBN, I used an integer for the token:
<ProtoMember(propIndex.Foo)>
Property Foo As String
An enum helps tie everything together later. Store these in a Dictionary(Of T, NameValuePair)
2) add some accessors. these can perform the type conversions for you:
Public Sub Add(ByVal Key As T, ByVal value As Object)
    If _col.ContainsKey(Key) Then
        _col.Remove(Key)
    End If
    _col.Add(Key, New TValue(value))
End Sub
Public Function GetTItem(Of TT)(key As T) As TT
    If _col.ContainsKey(key) Then
        Return CType(_col(key).TValue, TT)
    Else
        Return Nothing
    End If
End Function
T is whatever key type you wish to use. Integer results in the smallest output and still allows the subscribing code to use an Enum. But it could be String.
TT is the original type:
myFoo = props.GetTItem(Of Long)(propsEnum.Foo)
3) Expose the innerlist (dictionary) to PBN and bingo, all done.
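For readers who want the shape of that property bag without the VB details, here is a rough Python sketch. PropKey and PropertyBag are invented names, and str() together with a caller-supplied converter stands in for ConvertFromInvariantString and CType. The protobuf-net side is not shown.

```python
from enum import IntEnum


class PropKey(IntEnum):
    """Integer tokens; the enum keeps calling code readable."""
    FOO = 1
    BAR = 2


class PropertyBag:
    def __init__(self):
        # Integer token -> value stored as a string; this dictionary is
        # the only thing that would be handed to the serializer.
        self._items = {}

    def add(self, key, value):
        # Assigning by key replaces any existing entry.
        self._items[int(key)] = str(value)

    def get(self, key, convert, default=None):
        # convert turns the stored string back into the original type.
        raw = self._items.get(int(key))
        return default if raw is None else convert(raw)


props = PropertyBag()
props.add(PropKey.FOO, 123456789012)
my_foo = props.get(PropKey.FOO, int)
```

Only integer tokens and strings end up in the serialized payload, so no property names are needed on the wire.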
It's also very easy to add converters for Point, Rectangle, Font, Size, Color and even bitmap.
HTH

Newbie question: how do I create a class to hold data in Visual Basic Studio?

I'm really sorry. This must seem like an incredibly stupid question, but unless I ask I'll never figure it out. I'm trying to write a program to read in a csv file in Visual Basic (tried and gave up on C#) and I asked a friend of mine who is much better at programming than I am. He said I should create a class to hold the data that I read in.
The problem is, I've never created a class before, not in VB, Java, or anything. I know all the terms associated with classes, I understand at a high level how classes work no problem. But I suck at the actual details of making one.
So here's what I did:
Public Class TsvData
    Property fullDataSet() As Array
        Get
            Return ?????
        End Get
        Set(ByVal value As Array)
        End Set
    End Property
End Class
I got as far as the question marks and I'm stuck.
The class is going to hold a lot of data, so I made it an array. That could be a bad move. I don't know. All I know is that it can't be a String and it certainly can't be an Integer or a Float.
As for the Getter and Setter, the reason I put the question marks in is because I want to return the whole array there. The class will eventually have other properties which are basically permutations of the same data, but this is the full set that I will use when I want to save it out or something. Now I want to return the whole Array, but typing "Return fullDataSet()" doesn't seem like a good idea. I mean, the name of the property is "fullDataSet()." It will just make some kind of loop. But there is no other data to return.
Should I Dim yet another array inside the property, which already is an array, and return that instead?
1) Instead of writing your own class, you could get yourself familiar with the pre-defined class System.Data.DataTable and then use that for holding CSV data.
2) In the last few years that I've been programming, I've never actually used a multi-dimensional array, and I'd advise you not to use them, either. There are usually ways of achieving the same with a better data structure. For example, consider creating a class (let's call it CsvRecord) that holds only one record; that is, only one line from the CSV file. Then use any of the standard collection types from the System.Collections.Generic namespace (e.g. List(Of CsvRecord)) to hold the entire data (i.e. all lines) in the CSV file. This effectively reduces the problem to, "How do I read in one line of CSV data?"
3) If you want to take suggestion #2 even further, do as cHao says and don't simply lay out the information you've read as a CsvRecord; instead, create an object that reflects the actual content. For example, if your CSV file contains product–price information, call your CSV record class ProductInfo or something more fitting.
If, however, you want to go on with your current approach, you will need a backing field for the property, as demonstrated by Philipp's answer. Your property then becomes a "façade" that only delegates to this backing field. This is not absolutely necessary: You could simply make the backing field Public and let the user of your class access it directly, though that is not considered a good practice.
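The "one record per line, kept in a list" idea can be sketched as follows. The example is in Python rather than VB, and the two-column product/price layout and the names ProductInfo and read_products are assumptions chosen to match the product–price example above.

```python
import csv
import io
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class ProductInfo:
    """One line of the file, named after what it actually contains."""
    name: str
    price: Decimal


def read_products(stream):
    """Read every non-empty CSV line from stream into a list of ProductInfo."""
    products = []
    for fields in csv.reader(stream):
        if not fields:
            continue  # skip blank lines
        # Unpacking raises ValueError if a line does not have two columns.
        name, price = fields
        products.append(ProductInfo(name, Decimal(price)))
    return products


sample = io.StringIO('Widget,9.99\n"Gadget, large",24.50\n')
products = read_products(sample)
```

The loop body is the whole answer to "How do I read in one line of CSV data?"; the list takes care of the rest.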
Ideally, you ought to have a class representing the specific data you want to read in. Setting an entire array at once is asking for trouble; some programs that read {C,T}SV files will freak out if all rows don't have the same number of columns, which is exceedingly easy to do if you can set the data to be an array of arbitrary length.
If you're trying to represent arbitrary data, frankly, you'd do just as well to use a List(Of String). If it's meant to be a table, you could instead read in the first line and make it a list as above (let's call it "headers"), and then make each row a Dictionary(Of String, String). (Let's call each row "row", and the collection (a list of these dictionary objects) "rows".) Just read in the line, split it like you did the first, and say something like row(headers(column number)) = value for each column, and then stuff it into 'rows'.
Or, you could use the data classes (System.Data.DataTable and System.Data.DataSet would do wonders here).
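The headers-plus-dictionaries layout described above could look like this in outline. It is a Python sketch, read_table is an invented name, and the naive split ignores quoting, so it only suits simple files.

```python
import io


def read_table(stream, separator=","):
    """Return (headers, rows); each row maps a header name to a string value."""
    lines = [line.rstrip("\r\n") for line in stream if line.strip()]
    if not lines:
        return [], []
    headers = lines[0].split(separator)
    rows = []
    for row_number, line in enumerate(lines[1:], start=1):
        values = line.split(separator)
        # Reject ragged rows instead of silently producing partial records.
        if len(values) != len(headers):
            raise ValueError(
                f"row {row_number}: expected {len(headers)} columns, "
                f"got {len(values)}")
        row = {}
        for column, value in enumerate(values):
            row[headers[column]] = value
        rows.append(row)
    return headers, rows


sample = io.StringIO("name\tprice\nWidget\t9.99\nGadget\t24.50\n")
headers, rows = read_table(sample, separator="\t")
```

Checking the column count on every row addresses the ragged-row problem mentioned earlier in this answer.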
Usually you use a private member to store the actual data:
Public Class TsvData
    ' Backing field that stores the actual data.
    Private _fullDataSet As String()
    Public Property FullDataSet() As String()
        Get
            Return _fullDataSet
        End Get
        Set(ByVal value As String())
            _fullDataSet = value
        End Set
    End Property
End Class
Note that this is an instance of bad design since it couples a concept to a concrete representation and allows the clients of the class to modify the internals without any error checking. Returning a ReadOnlyCollection or some dedicated container would be better.
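As a rough illustration of the read-only alternative, here is a Python sketch in which immutable tuples play the part that ReadOnlyCollection would play in .NET. The constructor signature is an assumption.

```python
class TsvData:
    def __init__(self, rows):
        # Copy into tuples so later changes to the caller's lists
        # cannot alter the stored data.
        self._rows = tuple(tuple(row) for row in rows)

    @property
    def full_data_set(self):
        # No setter is defined, and tuples cannot be modified in place.
        return self._rows


source = [["a", "b"], ["c", "d"]]
data = TsvData(source)
source[0][0] = "changed"  # does not affect the stored copy
```

Clients can read the rows but cannot replace or edit them, so any validation only has to happen in the constructor.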