What's the "best practice" for when to create a FileHelpers engine? - filehelpers

I have a class in a Windows Forms application that is processing at peak 10 - 20 records per minute, but at non-peak it might go two or three hours without doing anything.
As I see it, you can structure the FileHelpers engine lifecycle two ways:
Create a new engine within the function where you need it.
Create an engine in the class constructor and use that engine in all the member functions.
Now, if a member function runs as a separate task it makes sense to me that it might not be task safe to share the engine, but let's ignore that and assume it's all running in the UI thread.
So which is better?

I am not sure that this is really a question which fits normal StackOverflow requirements, as this is not really a code problem but a question of preference or needs, and therefore can be subjective. The reasons I would say this is because you have to take into account many things including what your processing and whether your planning for it to be scalable.
For scability, you have to take into account how much memory is utilised, what the initialisation time is, and whether the engine would operate safely across your classes if you are attempting to use a singleton.
The other issues you'd have to cover is whether you are using the generic engine with runtime typing, or you are using the type-based engine.
Personally, for safety, I would initiate a new engine for each processing request if you are using tasks or multi-threading as long as the initialisation time is short. Otherwise, if the initialisation time is long, create a singleton with a queue but then you need to balance how big that queue can get.
Effectively, you are trying to balance it so that you are picking Init+((Process)x#) or Init+((Queue+Process)x#) is the shorter and safer to implement.
Not sure if that truly answers your question but hopefully it's a start.

Related

Ruminations on highly-scalable and modular distributed server side architectures

Mine is not really a question, it's more of a call for opinions - and perhaps this isn't even the right place to post it. Nevertheless, the community here is very informed, and there's no harm in trying...
I was thinking about ways to create a highly scalable and, above all, highly modular back-end architecture. For example, an entire back-end ecosystem for a large site that had the potential for future-proof evolution into a massive site.
This would entail a very high degree of separation of concerns, to the extent that not only could (say) the underling DB be replaced (ie from Oracle to MySQL) but the actual type of database could be replaced (ed SQL to KV, or vice versa).
I envision a situation where each sub-system exposes its own API within the back-end ecosystem. In this way, the API could remain constant, whilst the implementation could change (even radically) over time.
The system must be heterogeneous in that it's not tied to a specific language. It must be able to accommodate modules or entire sub-systems using different languages.
It then occurred to me that what I was imagining was simply the architecture of the web itself.
So here is my discussion point: apart from the overhead of using (mainly) text-based protocols is there any overriding reason why a complex back-end architecture should not be implemented in the manner I describe, or is there some strong rationale I'm missing for using communication protocols such as Twisted, AMQP, Thrift, etc?
UPDATE: Following a comment from #meagar, I should perhaps reformulate the question: are the clear advantages of using a very simple, flexible and well-understood architecture (ie all functionality exposed as a series RESTful APIs) enough to compensate the obvious performance hit incurred when using this architecture in a back-end context?
[code]the actual type of database could be replaced (ed SQL to KV, or vice versa).[/code]
And anyone who wrote a join between two tables will be sad. If you want the "ability" to switch to KV, then you should not expose an API richer than what KV can support.
The answer to your question depends on what it is you're trying to accomplish. You want to keep each module within reasonable reins. Use proper physical layering of code, use defined interfaces with side-effect contracts, use test cases for each success and failure case of each interface. That way, you can depend on things like "when user enters blah page, a user-blah fact is generated so that all registered fact listeners will be invoked." This allows you to extend the system without having direct calls from point A to point B, while still having some kind of control over widely disparate dependencies. (I hate code bases where you can't find-all to find all possible references to a symbol!)
However, the fact that we put lots of code and classes into a single system is because calling between systems is often very, very expensive. You want to think in terms of code modules making requests of each other where you can. The difference in timing between a function call and a REST call is something like one to a million (maybe you can get it as low as one to ten thousand, if you only count cycles, not wallclock time -- but I'm not so sure). Also, anything that goes on a wire in a datacenter may potentially suffer from packet loss, because there is no such thing as a 100% loss-free data center, no matter how hard you try. Packet loss means random latency spikes in the response time for your application.

Compromising design & code quality to integrate with existing modules

Greetings!
I inherited a C#.NET application I have been extending and improving for a while now. Overall it was obviously a rush-job (or whoever wrote it was seemingly less competent than myself). The app pulls some data from an embedded device & displays and manipulates it. At the core is a communications thread in the main application form which executes a 600+ lines of code method which calls functions all over the place, implementing a state machine - lots of if-state-then-do type code. Interaction with the device is done by setting the state/mode globally and letting the thread do it's thing. (This is just one example of the badness of the code - overall it is not very OO-like, it reminds of the style of embedded C code the device firmware is written in).
My problem is that this piece of code is central to the application. The software, communications protocol or device firmware are not documented at all. Obviously to carry on with my work I have to interact with this code.
What I would like some guidance on, is whether it is worth scrapping this code & trying to piece together something more reasonable from the information I can reverse engineer? I can't decide! The reason I don't want to refactor is because the code already works, and changing it will surely be a long, laborious and unpleasant task. On the flip side, not refactoring means I have to sometimes compromise the design of other modules so that I may call my code from this state machine!
I've heard of "If it ain't broke don't fix it!", so I am wondering if it should apply when "it" is influencing the design of future code! Any advice would be appreciated!
Thanks!
Also, the longer you wait, the worse the codebase will smell. My suggestion would be first create a testsuite that you can evaluate your refactoring against. This makes it a lot easier to see if you are refactoring or just plain breaking things :).
I would definitely recommend you to refactor the code if you feel its junky. Yes, during the process of refactoring you may have some inconsistencies/problems at the start. But that is why we have iterations and testing. Since you are going to build up on this core engine in future, why not make the basement as stable as possible.
However, be very sure on what you are going to do. Because at times long lines of code does not necessarily mean evil. On the other hand they may be very efficient in running time. If/else blocks are not bad if you ask me, as they are very intelligent in branching from a microprocessor's perspective. So, you will have to be judgmental and very clear before you touch this.
But once you refactor the code, you will definitely have fine control over it. And don't forget to document it!! Tomorrow, someone might very well come and say about you on whatever you've told about this guy who have written that core code.
This depends on the constraints you are facing, it's a decision to be based on practical basis, not on theoretical ones. You need three things to consider.
Time: you need to have enough time to learn it, implement it, and test it, without too many other tasks interrupting you
Boss #1: if you are working for someone, he needs to know and approve the time and effort you will spend immediately, required to rebuild your solution
Boss #2: your boss also needs to know that the advantage of having new and clean software will come at the price of possible regressions, and therefore at the beginning of the deployment there may be unexpected bugs
If you have those three, then go ahead and refactor it. It will be surely be worth it!
First and foremost, get all the business logic out of the Form. Second, locate all the parts where the code interacts with the global state (e.g. accessing the embedded system). Delegate all this access to methods. Then, move these methods into a new class and create an instance in the class's constructor. Finally, inject an instance for the class to use.
Following these steps, you can move your embedded system logic ("existing module") to a wrapper class you write, so the interface can be nice and clean and more manageable. Then you can better tackle refactoring the monster method because there is less global state to worry about (only local state).
If the code works and you can integrate your part with minimal changes to it then let the code as it is and do your integration.
If the code is simply a big barrier in your way to add new functionality then it is best for you to refactor it.
Talk with other people that are responsible for the project, explain the situation, give an estimation explaining the benefits gained after refactoring the code and I'm sure (I hope) that the best choice will be made. It is best to speak about what you think, don't keep anything inside, especially if this affects your productivity, motivation etc.
NOTE: Usually rewriting code is out of the question but depending on situation and amount of code needed to be rewritten the decision may vary.
You say that this is having an impact on the future design of the system. In this case I would say it is broken and does need fixing.
But you do have to take into account the business requirements. Often reality gets in the way!
Would it be possible to wrap this code up in another class whose interface better suits how you want to take the system forward? (See adapter pattern)
This would allow you to move forward with your requirements without the poor design having an impact.
It gives you an interface that you understand which you could write some unit tests for. These tests can be based on what your design requires from this code. It ensures that your assumptions about what it is doing is correct. If you say that this code works, then any failing tests may be that your assumptions are incorrect.
Once you have these tests you can safely refactor - one step at a time, and when you have some spare time or when it is needed - as per business requirements.
Quite often I find the best way to truly understand a piece of code is to refactor it.
EDIT
On reflection, as this is one big method with multiple calls to the outside world, you are going to need some kind of inverse Adapter class to wrap this method. If you can inject dependencies into the method (see Dependency Inversion such that the method calls methods in your classes then you can route these to the original calls.

Single or multiple process, what criteria decides it?

In embedded systems using linux for an application; under which condition will you divide an application into two/three processes. My main doubt is; is it required to divide a single application component into multiple processes and then run multiple processes to achieve the required application functionality.
By experience I tend to isolate possibly problematic pieces of code. For example if you're depending on a sensor which ships with 3rd party libraries that you do not trust, making it a separate process will make your application more robust and fault-tolerant because you'll be (hopefully) able to restarts only parts of it.
Also for integration purposes it might be easier. Suppose your process A runs fine, then you can plug in the process B easily instead of adding new parts to process A. It might not seem like a big plus right now but it depends a lot on your project.
It does come with some overhead however, as dealing with synchronization and message passing can be more complicated and add to the design.
You don't have to do anything like that however.
You don't tell much about the circumstances that led you to your question, so I can only guess what kind of answer you are interested in.
Linux offers multi-threading functionality since ages, so concurrent programming can be done without multiple processes.
There is rarely a functional reason to divide integral components of an application into processes.
My suggestion is to write a single-process application. Should the requirement arise, that is: a problem can only be solved by managing runtime resources in separate processes, you can still take on the heavy work of solving inter-process communication and resource sharing, without having to change much in your business logic.

Code generators or ORMs?

What do you suggest for Data Access layer? Using ORMs like Entity Framework and Hibernate OR Code Generators like Subsonic, .netTiers, T4, etc.?
For me, this is a no-brainer, you generate the code.
I'm going to go slightly off topic here because there's a bigger underlying fallacy at play. The fallacy is that these ORM frameworks solve the object/relational impedence mismatch. This claim is a barefaced lie.
I find the best way to resolve the object/relational impedance mismatch is to either use OOP exclusively and use an object database or use the idioms of the relational database exclusively and ignore OOP.
The abstraction "everything is a table" is to me, much more powerful than the abstraction "everything is a class." It takes less code, less intellectual effort and leads to faster code when you code to the database rather than to an object model.
To me this seems obvious. If your application is data driven then surely your code should be data driven too? Yet to say this is hugely controversial.
The central problem here is that OOP becomes a really leaky abstraction when used in conjunction with a database. Code that look perfectly sensible when written to the idioms of OOP looks completely insane when you see the traffic that code generates at the database. When that messiness becomes a performance problem, OOP is the first casualty.
There is really no way to resolve this. Databases work with sets of data. OOP focus on instances of classes. Trying to marry the two is always going to end in divorce.
So to answer your question, I believe you should generate your classes and try and make them map the underlying database structure as closely as possible.
Perhaps controversially, I've always felt that using code generators for the ADO.NET plumbing is fundamentally solving the wrong problem.
At some point, hopefully not too long after learning about Connection Strings, SqlCommands, DataAdapters, and all that, we notice that:
Such code is ugly
It is very boring to write
It's very easy to miss something if you're doing it by hand
It has to be repeated every time you want to access the database
So, the problem to solve is "how to do the same thing lots of times fast"?
I say no.
Using code generators to make this process quick still means that you have a ton of code, all the same, all over your business classes (or your data access layer, if you separate that from your business logic).
And then, if you want to do something generically (like track stored procedure usage, for instance), you end up having to customise your code generator if it doesn't already have the feature you want. And even if it does, you still have to regenerate everything all the time.
I like to do things once, not many times, no matter how fast I can do them.
So I rolled my own Data Access class that knows how to add parameters, set up and close connections, manage transactions, and other cool stuff. It only had to be written once, and calling its methods from a Business object that needs some database stuff done consists of one line of code.
When I needed to make the application support multithreaded database accesses, it required a change to the Data Access class only, and all the business classes just do what they already did.
There is no right answer it all depends on your project. As Simon points out if your application is all data driven, then it might make sense depending on the size and complexity of the domain to use non oop paradigm. I had a lot of success building a system using a Transaction Script pattern, which passed XML Messages around the system.
However this system started to break down after five or six years as the application grew in size and complexity (5 or 6 webs, several web services, tons of COM+ components, legacy and .net code, 8+ databases with 800+ tables 4,000+ procedures). No one knew what anything was, and duplication was running rampant.
There are other ways to alleviate the maintance then OOP; however, if you have a very complex domain then hainvg a rich domain model is ideal IMHO, as it allows for the business rules to be expressed in nice encapsulated components.
To answer your question, avoid code generators if you can. Code generators are a recipe for disaster, but if you do go with code generation do not modify the generated code. Also be sure to have a good process in place that is easy for developers to get the new generated code.
I recommend using either the following: ORM or hand roll a lightweight DAL. I am currently transitioning a project over to nHibernate off my hand rolled DAL and am having a lot of success; however, I like having the option of using either option. Further if you properly seperate your concerns (Data Access from Business Layer from Presentation) you can have a single service layer that might talk to a Dao (Data Access Object) that for one object is an ORM but for another is hand rolled). I like this flexibility as it allows to apply the best tool to the job.
I like nHibernate over a hand rolled DAL because while my DAL does abstract away most of the ADO.Net code you still have to write the code that takes a data reader to an object or an object and creates the parameters.
I've always preferred to go the code generator route, especially in C# where you can make use of extended classes to add functionality to the basic data objects.
Hate to say this, but it depends. If you find an ORM tool that fits your needs go for it. We wrote our own system in small steps while developing the application. We are using C++ and there are not that many tools out there anyway. Ours ended up being a XML description of the database, from that the SQL generation script and the basic object layer and metadata were generated.
Do your homework and evaluate theses tools and you will find a good fit for your needs.

How to convince my co-workers not to use datasets for enterprise development (.NET 2.0+)

Everyone I work with is obsessed with the data-centric approach to enterprise development and hates the idea of using custom collections/objects. What is the best way to convince them otherwise?
Do it by example and tread lightly. Anything stronger will just alienate you from the rest of the team.
Remember to consider the possibility that they're onto something you've missed. Being part of a team means taking turns learning & teaching.
No single person has all the answers.
If you are working on legacy code (e.g., apps ported from .NET 1.x to 2.0 or 3.5) then it would be a bad idea to depart from datasets. Why change something that already works?
If you are, however, creating a new apps, there a few things that you can cite:
Appeal to experiencing pain in maintaining apps that stick with DataSets
Cite performance benefits for your new approach
Bait them with a good middle-ground. Move to .NET 3.5, and promote LINQ to SQL, for instance: while still sticking to data-driven architecture, is a huge, huge departure to string-indexed data sets, and enforces... voila! Custom collections -- in a manner that is hidden from them.
What is important is that whatever approach you use you remain consistent, and you are completely honest with the pros and cons of your approaches.
If all else fails (e.g., you have a development team that utterly refuses to budge from old practices and is skeptical of learning new things), this is a very, very clear sign that you've outgrown your team it's time to leave your company!
Remember to consider the possibility that they're onto something you've missed. Being part of a team means taking turns learning & teaching.
Seconded. The whole idea that "enterprise development" is somehow distinct from (and usually the implication is 'more important than') normal development really irks me.
If there really is a benefit for using some technology, then you'll need to come up with a considered list of all the pros and cons that would occur if you switched.
Present this list to your co workers along with explanations and examples for each one.
You have to be realistic when creating this list. You can't just say "Saves us lots of time!!! WIN!!" without addressing the fact that sometimes it is going to take MORE time, will require X months to come up to speed on the new tech, etc. You have to show concrete examples where it will save time, and exactly how.
Likewise you can't just skirt over the cons as if they don't matter, your co-workers will call you on it.
If you don't do these things, or come across as just pushing what you personally like, nobody is going to take you seriously, and you'll just get a reputation for being the guy who's full of enthusiasm and energy but has no idea about anything.
BTW. Look out for this particular con. It will trump everything, unless you have a lot of strong cases for all your other stuff:
Requires 12+ months work porting our existing code. You lose.
Of course, "it depends" on the situation. Sometimes DataSets or DataTables are more suited, like if it really is pretty light business logic, flat hierarchy of entities/records, or featuring some versioning capabilities.
Custom object collections shine when you want to implement a deep hierarchy/graph of objects that cannot be efficiently represented in flat 2D tables. What you can demonstrate is a large graph of objects and getting certain events to propagate down the correct branches without invoking inappropriate objects in other branches. That way it is not necessary to loop or Select through each and every DataTable just to get the child records.
For example, in a project I got involved in two and half years ago, there was a UI module that is supposed to display questions and answer controls in a single WinForms DataGrid (to be more specific, it was Infragistics' UltraGrid). Some more tricky requirements
The answer control for a question can be anything - text box, check box options, radio button options, drop-down lists, or even to pop up a custom dialog box that may pull more data from a web service.
Depending on what the user answered, it can trigger more sub-questions to appear directly under the parent question. If a different answer is given later, it should expose another set of sub-questions (if any) related to that answer.
The original implementation was written entirely in DataSets, DataTables, and arrays. The amount of looping through the hundreds of rows for multiple tables was purely mind-bending. It did not help the programmer came from a C++ background attempting to ref everything (hello, objects living in the heap use reference variables, like pointers!). Nobody, not even the originally programmer, could explain why the code is doing what it does. I came into the scene more than six months after this, and it was stil flooded with bugs. No wonder the 2nd-generation developer I took over from decided to quit.
Two months of tying to fix the chaotic mess, I took it upon myself to redesign the entire module into an object-oriented graph to solve this problem. yeap, complete with abstract classes (to render different answer control on a grid cell depending on question type), delegates and eventing. The end result was a 2D dataGrid binded to a deep hierarchy of questions, naturally sorted according to the parent-child arrangement. When a parent question's answer changed, it would raise an event to the children questions and they would automatically show/hide their rows in the grid according to the parent's answer. Only question objects down that path were affected. The UI responsiveness of this solution compared to the old method was by orders of magnitude.
Ironically, I wanted to post a question that was the exact opposite of this. Most of the programmers I've worked with have gone with the custom data objects/collections approach. It breaks my heart to watch someone with their SQL Server table definition open on one monitor, slowly typing up a matching row-wrapper class in Visual Studio in another monitor (complete with private properties and getters-setters for each column). It's especially painful if they're also prone to creating 60-column tables. I know there are ORM systems that can build these classes automagically, but I've seen the manual approach used much more frequently.
Engineering choices always involve trade-offs between the pros and cons of the available options. The DataSet-centric approach has its advantages (db-table-like in-memory representation of actual db data, classes written by people who know what they're doing, familiar to large pool of developers etc.), as do custom data objects (compile-type checking, users don't need to learn SQL etc.). If everyone else at your company is going the DataSet route, it's at least technically possible that DataSets are the best choice for what they're doing.
Datasets/tables aren't so bad are they?
Best advise I can give is to use it as much as you can in your own code, and hopefully through peer reviews and bugfixes, the other developers will see how code becomes more readable. (make sure to push the point when these occurrences happen).
Ultimately if the code works, then the rest is semantics is my view.
I guess you can trying selling the idea of O/R mapping and mapper tools. The benefit of treating rows as objects is pretty powerful.
I think you should focus on the performance. If you can create an application that shows the performance difference when using DataSets vs Custom Entities. Also, try to show them Domain Driven Design principles and how it fits with entity frameworks.
Don't make it a religion or faith discussion. Those are hard to win (and is not what you want anyway)
Don't frame it the way you just did in your question. The issue is not getting anyone to agree that this way or that way is the general way they should work. You should talk about how each one needs to think in order to make the right choice at any given time. give an example for when to use dataSet, and when not to.
I had developers using dataTables to store data they fetched from the database and then have business logic code using that dataTable... And I showed them how I reduced the time to load a page from taking 7 seconds of 100% CPU (on the web server) to not being able to see the CPU line move at all.. by changing the memory object from dataTable to Hash table.
So take an example or case that you thing is better implemented differently, and win that battle. Don't fight the a high level war...
If Interoperability is/will be a concern down the line, DataSet is definitely not the right direction to go in. You CAN expose DataSets/DataTables over a service but whether you SHOULD or is debatable. If you are talking .NET->.NET you're probably Ok, otherwise you are going to have a very unhappy client developer from the other side of the fence consuming your service
You can't convince them otherwise. Pick a smaller challenge or move to a different organization. If your manager respects you see if you can do a project in the domain-driven style as a sort of technology trial.
If you can profile, just Do it and profile. Datasets are heavier then a simple Collection<T>
DataReaders are faster then using Adapters...
Changing behavior in an objects is much easier than massaging a dataset
Anyway: Just Do It, ask for forgiveness not permission.
Most programmers don't like to stray out of their comfort zones (note that the intersection of the 'most programmers' set and the 'Stack Overflow' set is the probably the empty set). "If it worked before (or even just worked) then keep on doing it". The project I'm currently on required a lot of argument to get the older programmers to use XML/schemas/data sets instead of just CSV files (the previous version of the software used CSV's). It's not perfect, the schemas aren't robust enough at validating the data. But it's a step in the right direction. The code I develop uses OO abstractions on the data sets rather than passing data set objects around. Generally, it's best to teach by example, one small step at a time.
There is already some very good advice here but you'll still have a job to convince your colleagues if all you have to back you up is a few supportive comments on stackoverflow.
And, if they are as sceptical as they sound, you are going to need more ammo.
First, get a copy of Martin Fowler's "Patterns of Enterprise Architecture" which contains a detailed analysis of a variety of data access techniques.
Read it.
Then force them all to read it.
Job done.
data-centric means less code-complexity.
custom objects means potentially hundreds of additional objects to organize, maintain, and generally live with. It's also going to be a bit faster.
I think it's really a code-complexity vs performance question, which can be answered by the needs of your app.
Start small. Is there a utility app you can use to illustrate your point?
For instance, at a place where I worked, the main application had a complicated build process, involving changing config files, installing a service, etc.
So I wrote an app to automate the build process. It had a rudimentary WinForms UI. But since we were moving towards WPF, I changed it to a WPF UI, while keeping the WinForms UI as well, thanks to Model-View-Presenter. For those who weren't familiar with Model-View-Presenter, it was an easily-comprehensible example they could refer to.
Similarly, find something small where you can show them what a non-DataSet app would look like without having to make a major development investment.