Single or multiple process, what criteria decides it? - process

In embedded systems using linux for an application; under which condition will you divide an application into two/three processes. My main doubt is; is it required to divide a single application component into multiple processes and then run multiple processes to achieve the required application functionality.

By experience I tend to isolate possibly problematic pieces of code. For example if you're depending on a sensor which ships with 3rd party libraries that you do not trust, making it a separate process will make your application more robust and fault-tolerant because you'll be (hopefully) able to restarts only parts of it.
Also for integration purposes it might be easier. Suppose your process A runs fine, then you can plug in the process B easily instead of adding new parts to process A. It might not seem like a big plus right now but it depends a lot on your project.
It does come with some overhead however, as dealing with synchronization and message passing can be more complicated and add to the design.
You don't have to do anything like that however.

You don't tell much about the circumstances that led you to your question, so I can only guess what kind of answer you are interested in.
Linux offers multi-threading functionality since ages, so concurrent programming can be done without multiple processes.
There is rarely a functional reason to divide integral components of an application into processes.
My suggestion is to write a single-process application. Should the requirement arise, that is: a problem can only be solved by managing runtime resources in separate processes, you can still take on the heavy work of solving inter-process communication and resource sharing, without having to change much in your business logic.

Related

Automatisation&Piping of diverse tasks

I am looking for recommendations for a very generic automation/task execution tool. The scope is somewhat between a script, a build system like make and orchestration tools like Ansible or Puppet. The best I can do is describe my rather vague 'requirements' and hope for clues how others have solved these problems. Sorry for the long description, I guess I don't really know what exactly I want he solution to do. I profit from programming answers on SO all the time but I am not entirely sure if my open ended question is acceptable here.
--
We work as data analysts/system validators in a corporate setting. We perform a range of diverse tasks and interact with lots of ever changing systems. Each little step we do is arguably mundane/easy, but the bigger picture only forms if lots of iterations with slightly different inputs or combinations are repeated. It is a bit like looking for a needle in a hay stack, but the concrete problem is slightly different every time. This makes it hard to use a normal script or automation tool, which require more structure to work. But doing things semi-manual without a big team does not allow us to cover all the analysis/cases we want/need.
To give an applied example: a typical tasks could involve setting up a big calculation in a vendor system, extracting their ASCII output from a web server and parsing it. Then we would suck raw input data from a set of configuration files and data bases. This is piped into some of our home grown replication tools/models living in C++. Then both the system's results and our replication is scanned for interesting outliers (e.g. regression tested) and only this subset is uploaded for human analysts to investigate, nicely presented in an Excel sheet.
We can do all these things easily by hand for a once-off or maybe using ad-hoc tools/scripts. We just can't do it repeatedly for ever so slightly different settings. We seem to need a library for 'common tasks' that are just specialized by some few inputs (e.g. task it to download a time series and scan for outliers - parameters would be db access/login and maybe parameters defining what an outlier is in that context). And then I need to chain these tasks together to make complex tasks repeatable and simple to build up from atomic steps.
I have not found anything really do something like this. There seems to be specialist scripting or tools for each niche available, but not something combining all the different tasks I need to perform.
I have been so far toying on and off with a minimalist sqlite database which controls a set of python 'scripts'/wrappers. These scripts take input parameters from the data base, and they are chained/piped based on the database. The scripts write their results back to the database, mostly as plain text and floats/ints. This kind of db interface is very error prone and complicated for humans; the idea is to have (template) scripts writing (concrete/parametrised) scripts to the db for execution, like rolling itself out before executing. Not sure if this is a smart idea, but the db is driving the scripts, without much interacting among these building block script; rather than having the conventional bunch of scripts calling each other and dumping some data into db as an after thought. So far we have lots of separate wrappers (scripts) to talk to all the systems and do the work, what is really missing is something tying it all together an controlling it.
I am interested (obviously) more in data/flow transparency, repeatability and chaining mini-programs together to bigger units, rather than speed or scaling to larger data sets. All the heavier lifting is either done in the systems we interact with, or it is delegated to C++ called from these python scripts. This is not a production system with more stability and fixed goals but rather a flexible analysis/investigation helper.
I really hope someone here has previously run into exactly that problem severely limiting our productivity, and we can just piggy back off your solution or ideas.
I would suggest that you consider staf (Software Test Automation Framework). It's open source, distributed, and cross-platform. It will run just about any task on just about any platform. It has a variety of plugin "Services" available for specific purposes, or you can create your own custom Service. You can also extend the functionality through scripting (jython) It's also well documented and reasonably well supported through user forums by IBM.

MPI vs. Microsoft WCF vs. Microsoft TPL

I have a scientific program written in F# which I want to parallelize and run on 1 server with multiple processors (64) and for the future also in the cloud (Windows Azure?). The program will have a simple 1-1 communication between the nodes (no broadcast etc.).
If I used WCF, would it be as fast as MPI? What has MPI that WCF does not? There exists Pure MPI .NET written for WCF which puzzles me even more. I do not know if to use WCF or MPI.NET or Pure Mpi running on WCF.
PS: I guess that TPL is out of the game for 64 processors and more, right?
It is difficult to give a concrete answer, because it all depends on the specific aspects of your application, its current architecture (I suppose you already have some app) etc.
As you mention MPI and WCF, I assume that the application is written as several components that communicate with each other. The best way to structure this kind of application is to use F# agents.
As far as I understand, you want to run the application on a single server first. If you write it using agents, the agents can just communicate directly with each other (so you don't need MPI or WCF).
TPL should work well on a single-server (with lots of CPUs), but it will not scale to the distributed setting - you cannot run Task on another machine. However, you can use it inside individual components (e.g. agents) that will be distributed.
Regarding MPI vs. WCF - I don't have enough experience to answer that. However, if you use agent-based architecture, it should be easy to try various options. You may also check out fracture and related projects, which aims to implement high-performance sockets for F# (and possibly distributed agents in the future).
If you're doing it on 1 server you could just execute one process and execute the code in parallel. That way you could share memory more easily and faster than doing it through messages like MPI and WCF. Although the overhead of communication might not be that much, depending on your problem + solution.
Also the changes to your code would be much less that way, F# can usually be turned into prallel code with little effort. Going to MPI/WCF would require you to rewrite large portions.
Googling for F# + parallel gives plenty useful info that you should read first, like this for a good start:
http://blogs.msdn.com/b/dsyme/archive/2010/01/09/async-and-parallel-design-patterns-in-f-parallelizing-cpu-and-i-o-computations.aspx
So on 1 server, I woudl use the parallel features of F#, it's designed to prallelize easily.
Later when you want to go for cloud, that would be turning it into cleint-server. That's a different problem then parallization. I would treat and solve them seperately.
On the MPI vs WCF. WCF is designed as a RPC technology, i.e. you call remote procedures and get answers. If you want to use it for parallel programming with separate processes, you would have to create the boilerplate code for that. (Keep track of subsribed clients etc.)
MPI was designed to run that kind of architecture and handles it much more easily. (the first process gets number 0 and is the master, the other are slaves get numbered incrementally etc.)
Howver I don't think MPI will be very good to go cloud, since that invloves http, protocols, security etc. Not sure how well MPI works for those kind of things, WCF will handle that very well indeed.
The fact that there is an MPI.NET for WCF is because MPI is about a certain style of parallizing code that a lot of people are familiar with. So you can use the programming concepts and use them on the .NET platform leveraging WCF for the communications.
Something else you might want to look into if you need to exchange a lot of data over the wire is protocol-buffers (see protobuf-net for instance). That can easily be combined with WCF for communication and is very lean in serializing structured data so you can send over the wire efficiently.
Gert-Jan
WCF and MPI are different concepts. WCF is like a person A asks a person B to do something where as MPI is like a person A creates clones of himself (all clone have same ability/logic) and then these clones work on specific parts of the problem to be solved and once done they combine their results.
So choosing between which one fits your specific application depends on the problem your application is trying to solve. It may even be a combination of both WCF and MPI. Where your client application asks the WCF to do some task and the WCF create clones of the "problem solver" using MPI and when the clone are done with solving the problem (in parallel) they return the aggregated result back to the WCF and then that result is sent to client application.
You might also want to take at the 'mbrace' product, which provides a cloud monad (http://blogs.msdn.com/b/dsyme/archive/2011/08/23/m-brace-f-in-the-cloud.aspx). It's still at a fairly early stage though. I'm no expert but it may be that you can run an mbrace-based solution as effectively a private cloud on your 64-processor setup. When you outgrow that, a move to Azure would be seamless.

Why use AMQP/ZeroMQ/RabbitMQ

as opposed to writing your own library.
We're working on a project here that will be a self-dividing server pool, if one section grows too heavy, the manager would divide it and put it on another machine as a separate process. It would also alert all connected clients this affects to connect to the new server.
I am curious about using ZeroMQ for inter-server and inter-process communication. My partner would prefer to roll his own. I'm looking to the community to answer this question.
I'm a fairly novice programmer myself and just learned about messaging queues. As i've googled and read, it seems everyone is using messaging queues for all sorts of things, but why? What makes them better than writing your own library? Why are they so common and why are there so many?
what makes them better than writing your own library?
When rolling out the first version of your app, probably nothing: your needs are well defined and you will develop a messaging system that will fit your needs: small feature list, small source code etc.
Those tools are very useful after the first release, when you actually have to extend your application and add more features to it.
Let me give you a few use cases:
your app will have to talk to a big endian machine (sparc/powerpc) from a little endian machine (x86, intel/amd). Your messaging system had some endian ordering assumption: go and fix it
you designed your app so it is not a binary protocol/messaging system and now it is very slow because you spend most of your time parsing it (the number of messages increased and parsing became a bottleneck): adapt it so it can transport binary/fixed encoding
at the beginning you had 3 machine inside a lan, no noticeable delays everything gets to every machine. your client/boss/pointy-haired-devil-boss shows up and tell you that you will install the app on WAN you do not manage - and then you start having connection failures, bad latency etc. you need to store message and retry sending them later on: go back to the code and plug this stuff in (and enjoy)
messages sent need to have replies, but not all of them: you send some parameters in and expect a spreadsheet as a result instead of just sending and acknowledges, go back to code and plug this stuff in (and enjoy.)
some messages are critical and there reception/sending needs proper backup/persistence/. Why you ask ? auditing purposes
And many other use cases that I forgot ...
You can implement it yourself, but do not spend much time doing so: you will probably replace it later on anyway.
That's very much like asking: why use a database when you can write your own?
The answer is that using a tool that has been around for a while and is well understood in lots of different use cases, pays off more and more over time and as your requirements evolve. This is especially true if more than one developer is involved in a project. Do you want to become support staff for a queueing system if you change to a new project? Using a tool prevents that from happening. It becomes someone else's problem.
Case in point: persistence. Writing a tool to store one message on disk is easy. Writing a persistor that scales and performs well and stably, in many different use cases, and is manageable, and cheap to support, is hard. If you want to see someone complaining about how hard it is then look at this: http://www.lshift.net/blog/2009/12/07/rabbitmq-at-the-skills-matter-functional-programming-exchange
Anyway, I hope this helps. By all means write your own tool. Many many people have done so. Whatever solves your problem, is good.
I'm considering using ZeroMQ myself - hence I stumbled across this question.
Let's assume for the moment that you have the ability to implement a message queuing system that meets all of your requirements. Why would you adopt ZeroMQ (or other third party library) over the roll-your-own approach? Simple - cost.
Let's assume for a moment that ZeroMQ already meets all of your requirements. All that needs to be done is integrating it into your build, read some doco and then start using it. That's got to be far less effort than rolling your own. Plus, the maintenance burden has been shifted to another company. Since ZeroMQ is free, it's like you've just grown your development team to include (part of) the ZeroMQ team.
If you ran a Software Development business, then I think that you would balance the cost/risk of using third party libraries against rolling your own, and in this case, using ZeroMQ would win hands down.
Perhaps you (or rather, your partner) suffer, as so many developers do, from the "Not Invented Here" syndrome? If so, adjust your attitude and reassess the use of ZeroMQ. Personally, I much prefer the benefits of Proudly Found Elsewhere attitude. I'm hoping I can proud of finding ZeroMQ... time will tell.
EDIT: I came across this video from the ZeroMQ developers that talks about why you should use ZeroMQ.
what makes them better than writing your own library?
Message queuing systems are transactional, which is conceptually easy to use as a client, but hard to get right as an implementor, especially considering persistent queues. You might think you can get away with writing a quick messaging library, but without transactions and persistence, you'd not have the full benefits of a messaging system.
Persistence in this context means that the messaging middleware keeps unhandled messages in permanent storage (on disk) in case the server goes down; after a restart, the messages can be handled and no retransmit is necessary (the sender does not even know there was a problem). Transactional means that you can read messages from different queues and write messages to different queues in a transactional manner, meaning that either all reads and writes succeed or (if one or more fail) none succeeds. This is not really much different from the transactionality known from interfacing with databases and has the same benefits (it simplifies error handling; without transactions, you would have to assure that each individual read/write succeeds, and if one or more fail, you have to roll back those changes that did succeed).
Before writing your own library, read the 0MQ Guide here: http://zguide.zeromq.org/page:all
Chances are that you will either decide to install RabbitMQ, or else you will make your library on top of ZeroMQ since they have already done all the hard parts.
If you have a little time give it a try and roll out your own implemntation! The learnings of this excercise will convince you about the wisdom of using an already tested library.

BPMS or just plain programming?

What do you prefer (from your developer's point of view) when it comes to implement a business process?
A Business Process Management System (BPMS) or just your favorite IDE with the needed tools and frameworks (a reporting tool for example)?
What is from your point of view the greatest Benefit of a BPMS compared to an IDE with your personal tools and frameworks?
OK. Maybe I should be more specific... I got to know one specific BPMS which should make it easy to implement a business process by configuring rules. But for me as a developer it is hard to work with the system. I would like to work with text files which I can refactor and I would like to be able to choose the right technology or framework for the job I have to do. Instead the system forces me to configure.
There are rules where I can use java, but even then I have to stick to the systems editor without intellisense etc.
So this leads me to the answer of my own question - I would like to use the tools I am used to instead of having to learn how to work with a BPMS (at least the one I know) because it limits me more than it helps. The BPMS I know is a framework from which it is hard to escape! At this time, I would prefer a framework like Grail over any BPMS I know.
So maybe the more specific question is: do you feel the same or are there BPMSes which support you in beeing a developer and think like a developer or do most of them force you to do your job a different way?
In my experience the development environments provided by BPMS systems are third rate, unproductive, and practically force you to write hard to maintain, poorly designed code (due to their limitations). Almost all the "features" (UI, integrations, etc) provided by the BPMS system I'm familiar with (the one sold by that company named for its database) were not worth the money we paid.
If you're forced to use BPMS, as a developer, my advice would be to build as much of your application in a conventional development environment, such as Java or .Net, build as little as possible in the BPMS environment itself, and integrate the two. The only things that should go in the BPMS is the minimum to make the business process work.
Not sure what exactly you ask, but the choice BPM vs. plain programming will depend on the requirements. A "business process" is a relatively vague term in software engineering.
Here are a few criterion to evaluate your needs:
complexity of the rules - Are the decisions/rules embodied in your process simple, complicated, configurable, hard-coded?
volatility of the process - How frequently does your process change? Who should be able to make the change?
integration need - Is your process realized using multiple heterogenous services, or is all implemented in the same language?
synchronous/asynchrounous - Is your process "long-running" with the need to handle asynchronous actions?
human tasks - Does your process involves human interaction, with task being assigned/routed to people according to their roles/responsibilities?
monitoring of the process - What is the level of control you want on the existing process instances being executed? Do you need to audit the actions, etc. ?
error handling - Depending on the previous points, how do you plan to deal with errors, or retry of faulty process execution?
Depending on the answer to these questions, you may realize that your process is closer to a simple state chart with a few actions and decisions that can be executed in a sequence, or you may realize that you need something more elaborated, and that you don't want to re-implement all that yourself.
Between plain programming and a full-fledge BPM solution (e.g. Oracle BPM suite which contains BPEL, rule engine, etc.), there are intermediate solutions such as jBPM or Windows Workflow Foundation and probably a lot of others. These intermediate solution are frequently good trade-off.
I have worked with Biztalk in the past and more recently with JBPM. My opinion is biased against BPMs for the following reasons:
Steep learning curve : To make a process work, I have to understand how the system and the editor works. It is hard enough for a developer to understand the system, let alone a business user. The drag and drop and visual representation is a great demo tool. It certainly impresses managers (who ultimately pay for it), but a developer's productivity just drops.
Non developers changing the workflow : I haven't seen one BPM solution do it flawlessly. Though it doesn't look like code, right click on the box and you do have to put some code, otherwise it is not going to work. So you definitely need a developer to do it. The best part is that it is neither developer friendly nor business user friendly, just demo user friendly.
Testablity and refactoring : It is virtually impossible to test drive a BPMS. You do have 'unit test frameworks' advertised, but most of them are hacks and hard to use. Recently I tried the JBPM one; I ended up writing a lot of glue code and fake workflow handlers to make it work. The deal breaker for me though is refactoring. If the business radically changes it's mind about how a business process should look, then good luck re-arranging the boxes, because just re-arranging them won't work, all the variables bound to the boxes also need to be re-arranged. I would prefer the power of the IDE and tests to refactor my business process.
If your application has workflow, then you could try a workflow library (with or without persistent state). It will still manage your workflows without all the bloat that comes with a BPM. If a business user needs to understand the code, then let the business prepare good process flowcharts and translate them into good domain driven code. Use cucumber style acceptance tests to make bring the developers and business together. A BPM is just something that tries to do too many things and ends up doing all those things badly.
BPMS-- a lot of common business case, use case are already implemented. So you just have to know how to use it. For common workflow, you don't even need to write a single line of code, though mostly you would have to write some scripts to cover things that are not yet implemented.
Plain programming-- just use the IDE to hack out the code. The positive side: more control. The negative? A lot of times are spent on rewriting boilerplate code. And you have to maintain them.
So in a nutshell, I would prefer a Business Process Management System. One that I would recommend is ProcessMaker. It features an intuitive process designer that allows you to design workflow with drag and drop. And you can always write trigger to extend the process functionalities. It's open source as well.

How do you organize code in embedded projects?

Highly embedded (limited code and ram size) projects pose unique challenges for code organization.
I have seen quite a few projects with no organization at all. (Mostly by hardware engineers who, in my experience are not typically concerned with non-functional aspects of code.)
However, I have been trying to organize my code accordingly:
hardware specific (drivers, initialization)
application specific (not likely to be reused)
reusable, hardware independent
For each module I try to keep the purpose to one of these three types.
Due to limited size of embedded projects and the emphasis on performance, it is often keep this organization.
For some context, my current project is a limited DSP application on a MSP430 with 8k flash and 256 bytes ram.
I've written and maintained multiple embedded products (30+ and counting) on a variety of target micros, including MSP430's. The "rules of thumb" I have been most successful with are:
Try to modularize generic concepts as much as possible (e.g. separate driver code from application code). -- It makes for easier maintenance and reuse/porting of a project to another target micro in the future.
DO NOT start by worrying about optimized code at the very beginning. Try to solve the domain's problem first and optimize second. -- Your target micro can handle a lot more "stuff" than you might expect.
Work to ensure readability. Although most embedded projects seem to have short development-cycles, the projects often live longer than you might expect and another developer will undoubtedly have to work with your code.
I've worked on 8-bit PIC processors with similar limitations.
One restriction you don't have is how many comments you make or what you choose to name your methods, variables, etc.. Take advantage. Speed and size constraints do sometimes trump organization, but you can always explain.
Another tip is to break up a logical source file into even more pieces than you need, then bind them by #includeing them in a compilation unit. This allows you to have lots of reusable code (even one routine per file) but combine in whatever order you need. This is useful e.g. when trying to meet compilation unit size restrictions, or to pick and choose which common subroutines you need on the next project.
I try to organize it as if I had unlimited RAM and ROM, and it usually works out fine. As mentioned elsewhere, do not try to optimize it until you absolutely need to.
If you can get a pin-compatible processor that has more resources, it's better to get it working on that, concentrating on good structure and layout, then optimize for size later when you understand the code better.
Except under exceptional circumstances (see note), the organisation of your code will have no impact on the final product. (contents of the code are obviously a different matter)
So with that in mind you should organise your code as you would any other project.
With that said, the following are fairly typical:
If this is a processor that you've worked on before, or will be working on in the future, you will usually want to keep a dedicated hardware abstraction layer that can be shared between projects in the future. Typically this module would contain items like routines for managing any uarts, timers etc.
Usually it's reasonable to maintain a set of platform specific code for initialisation and setup that performs all of the configuration and initialisation up to the point where your executive takes over and runs your application. It will also include platform specific hal routines.
The executive/application is probably maintained as a separate module. All of the hardware specific code should be hidden in the hal (as mentioned above).
By splitting your code up like this you also have the option of compiling and running your application as a simulation, on a completely different platform, just by replacing the hardware specific code with routines that mimic the hardware.
This can be good for unit testing and debugging and algorithmic problems you might have.
Exceptional circumstances as might be imposed by unusual compiler restrictions. eg. I've come across some compilers that expect all interrupt service routines to be compiled within a single object file.
I've worked with some sensors like the Tmote Sky, I too have seen poor organization, and I have to admit i have contributed to it. Anyway I'd say that some confusion has to be, because loading too much modules or too much part of program will be (imho) resource killing too, so try to be aware of a threshold between organization and usability on the low resources.
Obviously this don't mean let caos begin, but for example try to get a look on the organization of the tinyOS source code and applications, it's an idea on what I'm trying to say.
Although it is a bit painful, one organization technique that is somewhat common with embedded C libraries is to split every single function and variable into a separate C source file, and then aggregate the resulting collection of O files into a library file.
The motivation for doing this is that for most normal linkers the unit of linkage is an object, for every object you either get the whole object or none of it. Since there is a 1-1 relationship between C files and object files, putting each symbol in it's own C file gives each one it's own object. This in turn lets the linker pull in only that subset of functions and variables that are actually used.
This sort of game doesn't help at all for headers they can happily be left as single files.