Best approach to File Conversion problems

Best approach to File Conversion problems - oop

I have a task that i have to complete. I know the solution but i want to make sure that my solution is according to proper OOP/Design Patterns. Here is the scenerio
1- I have 2 files with different formats. (Lets say FormatA and FormatB).
2- I want to convert data in FormatA to FormatB.
3- FormatA is plain text file with keys. Each new line is a new
key/value pair. FormatB is XML.
4- The keys in FormatA file can be same as the keys in FormatB but can also be different. Sometimes we might need to do some calculations to convert the
value to FormatB.
5- There are some chances that more keys will
be added in future to either the old one or the new one.
My solution:
I want the solution to be generic and no hardcoding. So, that if in future some key mapping change then i should not change the code.
1- First i created a "Mapping" XML file that has all the data that which key in FormatA maps to which field in FormatB. The XML structure is something like this
//oldKey = name of the key in the old file format
//newKey = name of the key in the new format
//ignore = optional. set it to true if you want to ignore this field during conversion
//function = optional.Name of a function that will be called. This function will have all the logic to do the calculations
//functionparams = optinal .key names from the old file that need calculation
//defaultvalue = optional parameter. This value will be replaced no matter what if given.
<field oldKey="abc" newKey="def" ignore="false" function="MultiplyBy2" functionparams="abc" defaultvalue="4">
2- I created a class named "TextFileParser" that loads the text file and creates a dictionary with all the keys.
3- I created a class named "MappingXMLParser" which loads the Mapping XML file and populates a dictionary with all the data.
3- I created a class "TextFileToXML" that uses the above 2 classes to write the data in the XML file. No composition is used.
4- I created a class named "Conversion". If the Mapping XML file has declared some function (like "MultiplyBy2") then the definition of functions will be in this class. I will use reflection to call the methods of this class from the class "TextFileToXML".
This is my design but i donot know that it is correct in terms of OOP/Design Patterns/architecture. Can you point out the mistakes ? What can be done better or any good approach ?

As you probably understand, there is no such thing as correct design. I would suggest following standard practices. I would follow these steps:
Translate the key-value file to a simple XML file following the simplest possible XML schema
I would use an XSLT to describe the translation between the simple XML files generated in step 1, and the final result.
Execute the transformation described in XSLT using an XML operations library. I suppose you could find one, on the specific programming language you use.
This way, in case of changing something (adding more keys in future, changing the target XML schema etc), you would only have to change the translation process described in XSLT, which is not application code but an XML file. You have nothing regarding the specific translation process hard-coded in your application.
Regarding the overall design, I would choose to use the Factory pattern. I think it matches perfectly your situation.
Hope I helped!

Related

Modeling archive/current repository in doctrine

Assume there is a abstract doctrine class named "file" and child classes "archiveFile" and "currentFile" which are represented by a discriminator column "file_type" ... in my application, only one "currentFile" can exist and multiple "archiveFile". If new files are imported, the newest of this imported file is now the current file and the previous file should move to an archive file ...
In theory, it works fine. But in practice i cannot change the type neither of the currentFile nor the archiveFile ... is there a better approach for my intention or have i to deal with plain sql to set the discrimination column?

RavenDb GenerateDocumentKey

I need to generate Id for child object of my document. What is the current syntax for generating document key?
session.Advanced.Conventions.GenerateDocumentKey(document) is not there anymore. I've found _documentSession.Advanced.DocumentStore.Conventions.GenerateDocumentKey method but its' signature is weird: I am okay with default key generation algorithm I just want to pass an object and receive an Id.

The default implementation of GenerateDocumentKey is to get the "dynamic tag name" for the class, and append a slash. For example, class Foo would turn into Foos/ which then goes through the HiLoKeyGenerator so that ids can be assigned on the client-side without having to consult the server each time.
If you really want this behavior, you could try to use the HiLoKeyGenerator on your own, but have you considered something simpler? I don't know what your model is, but if the child thing is fully owned by the containing document (which it should be, to be in the same document) have you have several much easier options:
Just use the index within the collection
Keep a int NextChildThingId property on the document and increment that every time you add a ChildThing
Just use a Guid, although those are no fun to read, type, look at, compare, or speak to someone over the phone.

Yii CActiveRecord with Column Named "attributes"

I used the CRUD generator from a legacy database. When searching for a column value I get the following error:
htmlspecialchars() expects parameter 1 to be string, array given (/usr/local/share/yii/framework/web/helpers/CHtml.php:103)
The problem is that the model has an existing column named "attributes" which is creating a conflict. I removed the entry from the _search.php and commented out all instances in the model hoping to at least get it working but no luck. Any suggestions would be appreciated.
Thanks.

Every CActiveRecord instance (or CModel instance for that matter) has a getter/setter named attributes with which all the attributes can be set. This leads to a conflict because the generated crud code uses the attributes attribute expecting it works as described before.
The controller does something like:
$model->attributes=$_POST['ModelClassName'];
// or
$model->attributes=$_GET['ModelClassName'];
This is meant to set al the (safe) attributes of the model at once. Instead this overwrites the database attribute attributes of your legacy DB model.
This in turn leads to the error you describe, because $_GET['ModelClassName'] and $_POST['ModelClassName'] typically contain arrays of data.
I guess the easiest fix would be to directly call the setter function for the "normal" attributes behavior which would lead to replacing the lines mentioned above with something like the following:
// in the controller
$model->setAttributes($_POST['ModelClassName']);
// and
$model->setAttributes($_GET['ModelClassName']);
I think rest of the generated CRUD code (the views) could and should be left untouched to make it work.
If you want to know how and why this works, it's best to do some research into the __get and __set magic functions and how they're used in the yii framework.

Dependency Injection/config object

I have an object that is responsible for exporting a file to csv.
It works well but i am looking at ways to refactor it.
This question pertains to the constructor which takes a number of arguments, relating to how the csv is to be exported:
For example, file name, delimiter, etc. etc.
Also, lately I have been reading about dependency injection but cant decide if this is a case where I should:
A. Leave the constructor as is.
B. Create a new class that gets passed to the constructor that simple holds config values for the file name etc
C. Something else altogether?
Here is the existing constructor (in PHP)
public function __construct($file,$overwriteExistingFile, $enclosure, $delim, $headerRow)
{
//set all properties here
}

Each of those values represents data that is an input to some process. $enclosure, $delimiter, and $headerRow pertain to generating the CSV content, while $file and $overwriteExistingFile pertain to persisting the content to disk.
A hallmark of a DI-style refactoring is to identify the various responsibilities (generate, persist) and encapsulate each of them in its own type. This shifts the refactoring from "how do I best get the values to this class?" to "how do I remove knowledge of these values from this class?"
To answer that, we would define two new concepts, each of which takes one of the responsibilities, and pass those into the existing constructor:
public function __construct($csvGenerator, $csvFileWriter)
{
...save dependencies...
}
...at some point, generate the CSV content and pass it to the file writer...
In this way, the original class becomes the orchestrator of the interaction between the generation and file writing, without having intimate knowledge of either activity. We have elevated the class to a higher level of abstraction, simplifying it as well as isolating its responsibilities into its collaborators.
Now, you would define two new classes, constructing them with the relevant parameters:
Generator
public function __construct($enclosure, $delimiter, $headerRow)
File Writer
public function __construct($file, $overwriteExistingFile)
With these elements in place, you can compose them together by creating the generator, then the file writer, then passing both to the orchestrator.

I would create a CSVFormatter that you can setup the deliminator on and unit test the formatting independently with.
Inject the formatter into a CSVWriter which writes the formatted output to a file.
The reason you would do this is to unit test the formatting logic or if you need to do multiple kinds of formatting or write to different kinds of output streams. If the code is very small and simple then you don't need to break it into multiple classes.

Proto-buf serialization with Obfuscation

I am looking for some guidance as to what is going on when using proto-buf net with obfuscation (Dotfuscator). One half of the project is a DLL and the other is an EXE elsewhere and using proto-buf NET they exchange data flawlessly. Until I obfuscate the DLL.
At that point P-BN fails without raising an exception, returning variously a 0 length byte array or a foreshortened one depending on what I have fiddled with. The class is fairly simple (VB):
<ProtoContract(Name:="DMailer")> _
Friend Class DMailer
Private _Lic As Cert
Private _Sys As Sys
Private _LList As List(Of LItem)
..
..
End Class
There are 3 props all decorated with ProtoMember to get/set the constituent class objects. Snipped for brevity.
Again, it works GREAT until I obfuscate the DLL. Then, Dotfuscator renames each of these to null, apparently since they are all Friend, and that seems to choke proto-buff. If I exempt the class from renaming (just the class name, not props/members), it seems to work again. It makes sense that P-BN would only be able to act on objects with a proper name, though when asked to serialize a null named object, it seems like an exception might be in order.
On the other hand, much of the charm of PB-N is supposed to be serialization independent of .NET names working from attributes - at least as I understand it. Yet in this case it only seems to work with classes with names. I tried using the Name qualifier or argument as shown above, to no avail - it apparently doesnt do what I thought it might.
So, I am curious if:
a) ...I have basically surmised the problem correctly
b) ...There is some other attribute or flag that might facilitate serializing
a null named object
c) ...if there are any other insights that would help.
If I exempt all 3 or 4 classes from Dotfuscator renaming (LList is not actually implemented yet, leaving DMailer, Cert and Sys), the DLL seems to work again - at least the output is the correct size. I can live with that, though obscured names would be better: Dotfuscator (CE) either exempts them or sets the names to Null - I cant seem to find a way to force them to be renamed.
Rather than exempt 3 or 4 classes from renaming, one alternative I am considering is to simply store the Serializer output for Cert and Sys as byte arrays or Base64 strings in DMailer instead of classes. Then have the receiver Deserialize each object individually. It is kind of nice to be able to unpack just one thing and have your toys right there as if by magic though.
(many)TIA

Interesting. I confess I have never tried this scenario, but if you can walk me through your process (or better: maybe provide a basic repro example with "run this, then this, then this: boom") I'll happily investigate.
Note: the Name on ProtoContract is mainly intended for GetProto() usage; it is not needed by the core serializer, and can be omitted to reduce your exposure. Also, protobuf-net isn't interested in fields unless those fields are decorated with the attributes, so that shouldn't be an issue.
However! there's probably a workaround here that should work now; you can pre-generate a static serialization dll; for example in a separate console exe (just as a tool; I really need to wrap this in a standalone utility!)
So if you create a console exe that references your unobfuscated library and protobuf-net.dll:
var model = RuntimeTypeModel.Create();
model.Add(typeof(DMailer), true); // true means "use the attributes etc"
// and other types needed, etc
model.Compile("MailSerializer", "MailSerializer.dll");
this should write MailSerializer.dll, which you can then reference from your main code (in addition to protobuf-net), and use:
var ser = new MailSerializer(); // our pre-genereated serializer
ser.Serialize(...); // etc
Then include MailSerializer.dll in your obfuscation payload.
(this is all v2 specific, btw)
If this doesn't work, I'll need to investigate the main issue, but I'm not an obfuscation expert so could do with your repro steps.

Since there were a few upticks of interest, here is what looks like will work:
a) No form of reflection will be able to get the list of properties for an obfuscated type.
I tried walking thru all the types to find the ones with ProtoContract on it, I could find them
but the property names are all changed to a,m, b, j, g.
I also tried Me.GetType.GetProperties with the same result.
You could implement a map from the output to indicate that Employee.FirstName is now a0.j, but distributing this defeats the purpose of obfuscation.
b) What does work to a degree is to exempt the class NAME from obfuscation. Since PB-N looks for the ProtoMember attributes to get the data, you CAN obfuscate the Property/Member names, just not the CLASS/type name. If the name is something like FederalReserveLogIn, your class/type has a bullseye on it.
I have had initial success doing the following:
1) Build a simple class to store a Property Token and value. Store everything as string using ConvertFromInvariantString. Taking a tip from PBN, I used an integer for the token:
<ProtoMember(propIndex.Foo)>
Property Foo As String
An enum helps tie everything together later. Store these in a Dictionary(Of T, NameValuePair)
2) add some accessors. these can perform the type conversions for you:
Public Sub Add(ByVal Key As T, ByVal value As Object)
If _col.ContainsKey(Key) Then
_col.Remove(Key)
End If
_col.Add(Key, New TValue(value))
End Sub
Public Function GetTItem(Of TT)(key As T) As TT
If _col.ContainsKey(key) Then
Return CType(_col(key).TValue, TT)
Else
Return Nothing
End If
End Function
T is whatever key type you wish to use. Integer results in the smallest output and still allows the subscribing code to use an Enum. But it could be String.
TT is the original type:
myFoo = props.GetTItem(Of Long)(propsEnum.Foo)
3) Expose the innerlist (dictionary) to PBN and bingo, all done.
Its also very easy to add converters for Point, Rectangle, Font, Size, Color and even bitmap.
HTH

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Best approach to File Conversion problems - oop

Related

Modeling archive/current repository in doctrine

RavenDb GenerateDocumentKey

Yii CActiveRecord with Column Named "attributes"

Dependency Injection/config object

Proto-buf serialization with Obfuscation

Categories

Resources