Merging two processes -> parent adopting a child process

Init can reap orphaned processes, and zombies can cause issues with system resources. These are situations that UNIX/Linux has handled well over the years.
With the advent of Docker, Kubernetes, Podman, and other container-based clusters, managing dependencies between related processes is critical for the horizontal scaling of a system. The logic for each microservice is required to be encapsulated within the service itself. But systems morph over time: functionality, interfaces, and inter-dependencies change, and the microservice loses its encapsulation. Horizontal scaling is severely impacted, if not entirely muted.
What if you could write a plugin, launch it, and have it stand alone or bind to/detach from a parent process in real time as the run-time state requires?
Tightly bound, it could easily extend the parent's capabilities.
What if it could morph the parent too?
Over the life-cycle of the process, could real-time dynamic process adaptation guarantee all-new inter-dependencies would be encapsulated?
If so, existing processes could be extended in novel ways without any new code developed for the parent.
Does anyone know how a parent process can attach/detach a child process using the UNIX/Linux OS?
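As far as attaching goes: POSIX offers no general call for adopting an arbitrary running process; a process becomes a child only by being forked by its parent (Linux additionally offers prctl(PR_SET_CHILD_SUBREAPER), which lets a process adopt its orphaned descendants). A minimal POSIX-only sketch of the fork-and-reap life cycle, written in Python for brevity (the function name is made up for illustration):

```python
import os

def spawn_and_reap():
    """Fork a child, let it exit, then reap it with waitpid (POSIX only)."""
    pid = os.fork()
    if pid == 0:
        # Child: do some work, then exit with a status code.
        os._exit(42)
    # Parent: until waitpid() runs, the exited child sits in the
    # process table as a zombie.
    reaped, status = os.waitpid(pid, 0)
    assert reaped == pid and os.WIFEXITED(status)
    return os.WEXITSTATUS(status)

print(spawn_and_reap())  # prints 42
```

If the parent exits without reaping, init (PID 1, or the nearest subreaper) adopts the orphan and performs the waitpid on its behalf.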

Related

Running Akka.Net on different processes on same node

I am currently evaluating akka.net for a planned (complex) application, which is currently considered to be a desktop application but should later be opened up to offer services and scale horizontally.
The challenge is that the application communicates with and integrates hardware devices whose drivers may crash and corrupt the whole process. Thus, it is a requirement that those drivers run in a different .NET process/VM so that there is no danger that the main application is affected by such crashes.
I see currently the following solutions:
a) Spawn different akka clusters on different .NET VMs on the same machine to get this isolation. That seems doable, but I am unsure about the overhead. My understanding is that within an akka.net cluster there is no way to isolate actors (for instance, an actor with its children) in their own process. Also, would I benefit from any akka.net fault-tolerance features (e.g., could I trigger the recreation of the cluster/VM when it is detected that the hardware drivers have created chaos)?
b) Do not implement the hardware integration module (which communicates with the unreliable drivers) with akka, but provide a classic .NET service that communicates with the akka cluster via WCF or other means. This would be feasible, but would result in a complex and not holistic architecture, which I do not like.
I would much appreciate any guidance here.

Does OS know the process state?

Are all the process states, such as new, ready, running, waiting, and terminated, recognized by the operating system kernel, or are they just for convenience of understanding? If they are recognized by the operating system, how does it do it?
The process state you are talking about (as opposed to the process context, which some literature also calls the process state) is needed solely by the OS itself. It is a bookkeeping instrument. As such, it introduces overhead in the hope of a performance gain elsewhere. E.g., by considering only ready processes, the OS avoids switching to processes that would only yield to the next one (which would generate superfluous context switches).
The implementation of the concept may differ. The PCB does not always have an explicit data field for the process state. Frequently, the state is implemented by different queues into which processes are sorted. Sometimes an OS even has a redundant representation of the process state. The representation is a matter of efficiency: e.g., if the OS looks for some ready process (not caring which), a queue has a complexity of O(1), while a list of PCBs with explicit states would require O(n).
To summarize: if the OS weren't aware of the process states, they would be superfluous. How the state is implemented and how it is used differs from system to system.
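The O(1)-queue-versus-O(n)-scan point can be made concrete with a toy sketch (the PCB class and function names are invented for illustration):

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class PCB:
    """Toy process control block with an explicit state field."""
    pid: int
    state: str  # "new", "ready", "running", "waiting", "terminated"

def next_ready_by_scan(pcb_table):
    """O(n): scan every PCB, checking its explicit state field."""
    for pcb in pcb_table:
        if pcb.state == "ready":
            return pcb
    return None

def next_ready_by_queue(ready_queue):
    """O(1): the ready queue holds only runnable processes, so the
    scheduler just pops the head; no state field needs checking."""
    return ready_queue.popleft() if ready_queue else None

pcb_table = [PCB(1, "waiting"), PCB(2, "ready"), PCB(3, "terminated")]
ready_queue = deque(p for p in pcb_table if p.state == "ready")
```

Both approaches encode the same state; the queue just makes the common scheduler query cheap, which is the redundancy the answer above describes.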
The problem with the question here is that process states are entirely system-specific.
Your first question is largely correct: those named states are mostly pedagogical constructs for "convenience of understanding."
The operating system has to know the state of each process. That is likely to be maintained in a variety of ways, including state variables and queues.

General Techniques for Locking or Synchronizing UI/Model/Disk for Cloud Support

I'm implementing a cloud synchronization method in a shoebox application under Mac OS X Lion, and I'm (as expected) quickly running into threading/synchronization problems.
There are a number of components to the application, but I'm feeling a bit overwhelmed trying to prevent deadlocks while at the same time maintaining good separation between the pieces.
For example, my app has a few critical behaviors:
Saves to disk periodically and when a model change occurs. (To support Automatic/Sudden Termination under OS X Lion.)
Is periodically notified when a new serialized model is available on disk.
Updates the UI when a model on disk is loaded into memory, or when notified that model changes have been made that were not caused by user actions.
Here's a sample use case that causes temporary brain hurt:
User is viewing a list of model items in the application.
Simultaneously, application is notified of a new model on disk. Application loads the model into memory, storing into a temporary variable.
How can the application swap out the old hierarchical data model for the new one without corrupting the UI thread? For example, there's a main RootElement object with many ChildElement elements. If the UI thread is iterating over myRootElement.children, how can the myRootElement object be replaced without interfering? (E.g., mutating a list while it is being enumerated.)
More than a specific solution, I'm looking for any previous resources/techniques/paradigms to help design an app around this kind of concurrency and thread safety.
The Apple dev docs are pretty good on this subject.
http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Multithreading/Introduction/Introduction.html
You will need some NSLock knowledge in particular. NSLock is the nearest equivalent to a Mutex, or CriticalSection, if you know those terms.
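One common answer to the swap question is to protect a single root reference with a lock (NSLock on the Mac; threading.Lock is the closest standard-library stand-in used here): readers take a snapshot of the root and iterate that, while the writer swaps the whole root atomically. The ModelStore class below is a made-up illustration of the pattern, not Cocoa API:

```python
import threading

class ModelStore:
    """Swap-under-lock: readers grab an immutable snapshot reference,
    writers replace the whole root, so nobody mutates a collection
    mid-enumeration."""
    def __init__(self, root):
        self._lock = threading.Lock()  # analogue of NSLock
        self._root = root

    def snapshot(self):
        with self._lock:
            return self._root  # readers iterate this reference safely

    def swap(self, new_root):
        with self._lock:
            self._root = new_root  # old root stays valid for in-flight readers

store = ModelStore(("child-a", "child-b"))
reader_view = store.snapshot()            # UI thread's view of the model
store.swap(("child-a", "child-b", "child-c"))  # new model loaded from disk
```

Because the old root is never mutated, a UI thread still enumerating reader_view is unaffected by the swap; it simply picks up the new root on its next snapshot.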

Why would I consider using an RTOS for my embedded project?

First the background, specifics of my question will follow:
At the company where I work, our current platform is the Microchip PIC32 family, using the MPLAB IDE as our development environment. Previously we've also written firmware for the Microchip dsPIC and TI MSP families for this same application.
The firmware is pretty straightforward in that the code is split into three main modules: device control, data sampling, and user communication (usually with a user PC). Device control is achieved via some combination of GPIO bus lines, with at least one part needing SPI or I2C control. Data sampling is interrupt-driven, using a Timer module to maintain the sample frequency and more SPI/I2C and GPIO bus lines to control the sampling hardware (i.e. the ADC). User communication is currently implemented via USB using the Microchip App Framework.
So now the question: given what I've described above, at what point would I consider employing an RTOS for my project? Currently I'm thinking of these possible trigger points as reasons to use an RTOS:
Code complexity? The code base architecture/organization is still small enough that I can keep all the details in my head.
Multitasking/Threading? Time-slicing the module execution via interrupts suffices for now for multitasking.
Testing? Currently we don't do much formal testing or verification past the HW smoke test (something I hope to rectify in the near future).
Communication? We currently use a custom packet format and a protocol that pretty much only does START, STOP, SEND DATA commands with data being a binary blob.
Project scope? There is a possibility in the near future that we'll be getting a project to integrate our device into a larger system with the goal of taking that system to mass production. Currently all our projects have been experimental prototypes with quick turn-around of about a month, producing one or two units at a time.
What other points do you think I should consider? In your experience what convinced (or forced) you to consider using an RTOS vs just running your code on the base runtime? Pointers to additional resources about designing/programming for an RTOS is also much appreciated.
There are many many reasons you might want to use an RTOS. They are varied & the degree to which they apply to your situation is hard to say. (Note: I tend to think this way: RTOS implies hard real time which implies preemptive kernel...)
Rate Monotonic Analysis (RMA) - if you want to use Rate Monotonic Analysis to ensure your timing deadlines will be met, you must use a pre-emptive scheduler
Meet real-time deadlines - even without using RMA, with a priority-based pre-emptive RTOS, your scheduler can help ensure deadlines are met. Paradoxically, an RTOS will typically increase interrupt latency due to critical sections in the kernel where interrupts are usually masked
Manage complexity -- definitely, an RTOS (or most OS flavors) can help with this. By allowing the project to be decomposed into independent threads or processes, and using OS services such as message queues, mutexes, semaphores, event flags, etc. to communicate & synchronize, your project (in my experience & opinion) becomes more manageable. I tend to work on larger projects, where most people understand the concept of protecting shared resources, so a lot of the rookie mistakes don't happen. But beware, once you go to a multi-threaded approach, things can become more complex until you wrap your head around the issues.
Use of 3rd-party packages - many RTOSs offer other software components, such as protocol stacks, file systems, device drivers, GUI packages, bootloaders, and other middleware that help you build an application faster by becoming almost more of an "integrator" than a DIY shop.
Testing - yes, definitely, you can think of each thread of control as a testable component with a well-defined interface, especially if a consistent approach is used (such as always blocking in a single place on a message queue). Of course, this is not a substitute for unit, integration, system, etc. testing.
Robustness / fault tolerance - an RTOS may also provide support for the processor's MMU (in your PIC case, I don't think that applies). This allows each thread (or process) to run in its own protected space; threads / processes cannot "dip into" each others' memory and stomp on it. Even device regions (MMIO) might be off limits to some (or all) threads. Strictly speaking, you don't need an RTOS to exploit a processor's MMU (or MPU), but the 2 work very well hand-in-hand.
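The "manage complexity" and "testing" points above lean on the message-queue pattern: each task blocks in a single place on its queue. A minimal sketch of that pattern, using Python's standard library as a desktop stand-in for RTOS queue and task primitives (the task and message names are made up; on a real RTOS these would be kernel objects, e.g. FreeRTOS queues and tasks):

```python
import queue
import threading

def sampler_task(inbox: queue.Queue, outbox: queue.Queue):
    """A task that blocks in exactly one place on its message queue,
    mirroring the 'always block on a queue' discipline an RTOS encourages."""
    while True:
        msg = inbox.get()  # the single blocking point
        if msg == "STOP":
            break
        outbox.put(("sample", msg * 2))  # stand-in for reading the hardware

inbox, outbox = queue.Queue(), queue.Queue()
task = threading.Thread(target=sampler_task, args=(inbox, outbox))
task.start()
inbox.put(21)       # command the task
inbox.put("STOP")   # shut it down cleanly
task.join()
```

Because the task's only input is its queue, it can be tested in isolation by feeding it messages and asserting on what comes out the other side, which is exactly the testability argument made above.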
Generally, when I can develop with an RTOS (or some type of preemptive multi-tasker), the result tends to be cleaner, more modular, more well-behaved and more maintainable. When I have the option, I use one.
Be aware that multi-threaded development has a bit of a learning curve. If you're new to RTOS/multithreaded development, you might be interested in some articles on Choosing an RTOS, The Perils of Preemption and An Introduction to Preemptive Multitasking.
Lastly, even though you didn't ask for recommendations... In addition to the numerous commercial RTOSs, there are free offerings (FreeRTOS being one of the most popular), and the Quantum Platform is an event-driven framework based on the concept of active objects which includes a preemptive kernel. There are plenty of choices, but I've found that having the source code (even if the RTOS isn't free) is advantageous, especially when debugging.
RTOS, first and foremost permits you to organize your parallel flows into the set of tasks with well-defined synchronization between them.
IMO, a non-RTOS design is suitable only for a single-flow architecture, where your whole program is one big endless loop. If you need multiple flows (a number of tasks running in parallel), you're better off with an RTOS. Without one, you'll be forced to implement this functionality in-house, re-inventing the wheel.
Code re-use -- if you code drivers/protocol-handlers using an RTOS API they may plug into future projects easier
Debugging -- some IDEs (such as IAR Embedded Workbench) have plugins that show nice live data about your running process such as task CPU utilization and stack utilization
Usually you want to use an RTOS if you have any real-time constraints. If you don't have real-time constraints, a regular OS might suffice. RTOSes/OSes provide a run-time infrastructure like message queues and tasking. If you are just looking for code that can reduce complexity, provide low-level support, and help with testing, some of the following libraries might do:
The standard C/C++ libraries
Boost libraries
Libraries available through the manufacturer of the chip that can provide hardware specific support
Commercial libraries
Open source libraries
In addition to the points mentioned above, using an RTOS may also be useful if you need support for
standard storage devices (SD, Compact Flash, disk drives ...)
standard communication hardware (Ethernet, USB, Firewire, RS232, I2C, SPI, ...)
standard communication protocols (TCP/IP, ...)
Most RTOSes provide these features or can be extended to support them.

Spread vs MPI vs zeromq?

In one of the answers to Broadcast like UDP with the Reliability of TCP, a user mentions the Spread messaging API. I've also run across one called ØMQ. I also have some familiarity with MPI.
So, my main question is: why would I choose one over the other? More specifically, why would I choose to use Spread or ØMQ when there are mature implementations of MPI to be had?
MPI was designed for tightly-coupled compute clusters with fast, reliable networks. Spread and ØMQ are designed for large distributed systems. If you're designing a parallel scientific application, go with MPI, but if you are designing a persistent distributed system that needs to be resilient to faults and network instability, use one of the others.
MPI has very limited facilities for fault tolerance; the default error handling behavior in most implementations is a system-wide fail. Also, the semantics of MPI require that all messages sent eventually be consumed. This makes a lot of sense for simulations on a cluster, but not for a distributed application.
I have not used any of these libraries, but I may be able to give some hints.
MPI is a standardized API specification, while Spread and ØMQ are actual implementations.
MPI comes from "parallel" programming while Spread comes from "distributed" programming.
So, it really depends on whether you are trying to build a parallel system or distributed system. They are related to each other, but the implied connotations/goals are different. Parallel programming deals with increasing computational power by using multiple computers simultaneously. Distributed programming deals with reliable (consistent, fault-tolerant and highly available) group of computers.
The concept of "reliability" here is slightly different from that of TCP. TCP's reliability is "deliver this packet to the end program no matter what." Distributed programming's reliability is "even if some machines die, the system as a whole continues to work in a consistent manner." To really guarantee that all participants got a message, one would need something like two-phase commit or one of its faster alternatives.
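The two-phase commit just mentioned can be sketched in a few lines. The Participant class here is hypothetical, and the sketch deliberately ignores the coordinator logging and failure recovery a real implementation needs:

```python
class Participant:
    """Hypothetical participant: votes in phase 1, obeys the decision in phase 2."""
    def __init__(self, can_commit):
        self.can_commit = can_commit
        self.outcome = None

    def prepare(self):
        return self.can_commit  # phase 1 vote: True = "yes, I can commit"

    def finish(self, decision):
        self.outcome = decision  # phase 2: apply the coordinator's decision

def two_phase_commit(participants):
    """Phase 1 collects votes; phase 2 commits only if every vote was yes,
    otherwise everyone aborts. All participants end in the same state."""
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    for p in participants:
        p.finish(decision)
    return decision
```

The key property is all-or-nothing: one reluctant participant forces every participant to abort, which is what makes the guarantee stronger than per-link TCP delivery.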
You're addressing very different APIs here, with different notions about the kind of services provided and infrastructure for each of them. I don't know enough about MPI and Spread to answer for them, but I can help a little more with ZeroMQ.
ZeroMQ is a simple messaging library. It does little more than send messages to different peers (including local ones) based on a restricted set of common messaging patterns (PUSH/PULL, REQUEST/REPLY, PUB/SUB, etc.). It handles connection, reconnection, and basic congestion strictly based on those patterns, and you have to do the rest yourself.
Although it appears very restricted, this simple behavior is mostly what you need for the communication layer of your application. It lets you scale very quickly from a simple prototype, all in memory, to more complex distributed applications in various environments, using simple proxies and gateways between nodes. However, don't expect it to do node deployment, network discovery, or server monitoring; you will have to do that yourself.
In brief, use zeromq if you have an application that you want to scale from a simple multithreaded process to a distributed and variable environment, or that you want to experiment with and prototype quickly when no existing solution seems to fit your model. Expect, however, to put some effort into the deployment and monitoring of your network if you want to scale to a very large cluster.