I was reading Linux Kernel Development and trying to understand process address space semantics in the case of fork(). I'm reading in the context of kernel v2.6, and I know that in newer versions either the child or the parent may run first, but I am confused by the following:
Back in do_fork(), if copy_process() returns successfully, the new child is woken up
and run. Deliberately, the kernel runs the child process first. In the common case of the
child simply calling exec() immediately, this eliminates any copy-on-write overhead
Based on my understanding of COW, if exec() is used, COW will always happen, whether the child or the parent process runs first. Can someone explain how COW is eliminated when the child runs first? Or does 'overhead' refer to the extra overhead that comes with COW as opposed to 'always copy' semantics?
fork() creates a copy of the parent's memory address space in which all memory pages are initially shared between the parent and the child. All pages are marked read-only, and on the first write to such a page, the page is copied so that parent and child each have their own. (This is what COW is about.)
exec() throws away the entire current address space and creates a new one for the new program.
If the child executes first and calls exec(), then none of the shared pages needs to be unshared.
If the parent executes first and modifies some data, then these pages are unshared. If the child then starts executing and calls exec(), the copied pages will be thrown away, i.e., the unsharing was not actually necessary.
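For concreteness, here is a minimal sketch of the pattern in C (ls -l is only a placeholder for whatever program the child execs):

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();      /* after this, all pages are shared, read-only */
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        /* Child: exec immediately. If the child runs first, nothing has been
         * written since fork(), so exec() just discards the shared mappings;
         * no page is ever copied. */
        execlp("ls", "ls", "-l", (char *)NULL);
        perror("execlp");    /* reached only if exec fails */
        _exit(127);
    }
    /* Parent: had it run first instead, any write it made before the child's
     * exec() would have forced a page copy, only for the child to throw that
     * copy away at exec(). */
    waitpid(pid, NULL, 0);
    return 0;
}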
I was just introduced to the idea of a process.
The book defines a process as "an instance of the running program".
I am still a little confused as to what this means. Is a process a particular instruction that a program is running, or something else?
What is the difference between a function call and a process? For instance let us say we have a function called main, and within it we are calling the printf function. Does printf count as a separate process? Why/why not?
What makes something a child vs. a parent process? I know that one way to create child processes is by calling fork(), and then, based on the integer value fork returns, we are either in the child or in the parent process. But other than that, is there something that makes a process a parent vs. a child?
Also, based on the answer to question 2, would printf count as a child process?
Talking strictly in terms of Linux, processes are "instances" of programs, as the book mentions. That means they contain the information that your program needs to execute.
A process doesn't mean the instruction that the program is running; it means the entire running program. The program you are referring to is, I assume, the code that you write, but that is just one aspect of a process. There are various other attributes, like the stack memory space, heap memory space, process ID, etc., and all these details are stored in a data structure called the process control block (PCB).
Suppose you have a compiled version of your code "Fibonacci.c" called fibonacci: if you run it from two different terminals, it spawns two processes of the same program.
Function calls are something that happens inside a process. printf would happen in the same process; it doesn't count as a separate process, as it executes inside the same entity.
fork can create child processes. As a rule of thumb, I would say that any process created from our current process is a child process, though this may not be a strict definition. What fork does is duplicate the current process: it creates a new entry by creating a new PCB. The child has the same code segment as the process that called fork, but it gets its own memory space, process ID, etc. I will not go deeper into how memory is handled when a fork occurs, but you can read more about it in the man pages.
printf also is not a child process. It resides in the current process itself.
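A minimal C sketch of this, assuming a POSIX system: fork duplicates the current process, and the printf in each branch is an ordinary function call made by whichever process executes it, not a process of its own:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();                 /* duplicates the current process */
    if (pid == 0) {
        /* child branch: this printf is just a function call in the child */
        printf("child:  pid=%d ppid=%d\n", (int)getpid(), (int)getppid());
    } else if (pid > 0) {
        /* parent branch: fork returned the child's PID */
        printf("parent: pid=%d child=%d\n", (int)getpid(), (int)pid);
        waitpid(pid, NULL, 0);          /* reap the child */
    }
    return 0;
}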
I'm trying to de-spaghetti a big UI by creating SubVIs that handle only the controls that are relevant, via control refnums.
Now, when extracting the code from the main VI and re-wiring into the subVIs, things get clutter-y.
To read/write these refnums, I have to do a two-step process: first add a terminal to get the control refnum value, and then another to get the value of the control.
Wiring the refnums everywhere is not really an option as that will create more spaghetti if there are more than two of them. (usually 4-10)
Is there a better way?
UPDATE
Guys, this is a low-level question about the picture above, not really a question about large-scale architecture / design patterns. I'm using QMH, classes, et al. where appropriate.
I just feel there should be a way to get the typed value from a typed control ref in one step. It feels kind of common.
In the caller VI, where the controls/indicators actually live, create all your references, then bundle them into clusters of relevant pieces. Pass the clusters into your subVIs, giving a given subVI only the cluster it needs. This both keeps your connector pane cleaned up and makes clear the interface each subVI is talking to. Instead of a cluster, you may want to create a LV class to further encapsulate and define the sub-UI operations, but that's generally only on larger projects where some components of the UI will be reused in other UIs.
I'm not sure there is a low-touch way to de-spaghetti a UI with lots of controls and indicators.
My suggestion is to rework the top-level VI into a queued message handler, which would allow you to decouple the user interaction from the application's response. In other words, rather than moving both the controls and the code that handles their changes to subVIs (as you're currently doing), this would keep the controls where they are (so you don't need to use ref nums and property nodes) and only move the code to subVIs.
This design pattern is built into recent versions of LabVIEW: navigate to File » Create Project to make LabVIEW generate a project you can evaluate. For more information about how to extend and customize it, see this NI slide deck: Decisions Behind the Design of the Queued Message Handler Template.
In general, reading/writing values through refnums is not best practice from a performance perspective. Each such access requires a thread swap to the UI thread (a heavyweight operation), whereas the FP terminal is privileged to update the panel without switching execution threads and without mutex friction.
Using references to access values:
- requires updating the front panel item every single time they are called;
- they are a pass-by-reference mechanism as opposed to pass-by-value, which means they are essentially pointers to specific memory locations; the pointers must be dereferenced and then the value in memory updated, and this dereferencing makes them slower than controls/indicators or local variables;
- property nodes cause the front panel of a subVI to remain in memory, which increases memory use; if the front panel of a subVI is not displayed, remove property nodes to decrease memory use.
If after all this you still want to use this method, you can use VI scripting to speed up the process: http://sine.ni.com/nips/cds/view/p/lang/en/nid/209110
My understanding is that when a parent forks, the child becomes an exact copy of the parent. In other words, they have the same process control block (PCB). Is this completely correct? I know that the pid will obviously be different but is that it?
Each process has its own process control block. When the parent forks, the child's process control block will normally start as a duplicate of the parent's; however, it is changed immediately (one of the first fields to change is the PID), and as the child does its own thing, its process control block becomes less and less of a duplicate of the parent's.
Here are some slides that describe abstract operating-system process control and the process control block.
The actual specifics will vary depending on the particular operating system.
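As a hedged illustration in C: right after fork the two processes' bookkeeping is nearly identical, but each subsequent change lands only in the process that makes it. Here the child's chdir("/tmp") leaves the parent's working directory (tracked per process) untouched:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char buf[1024];
    pid_t pid = fork();                 /* child's state starts as a copy */
    if (pid == 0) {
        if (chdir("/tmp") != 0)         /* updates only the child's state */
            _exit(1);
        printf("child cwd:  %s\n", getcwd(buf, sizeof buf));
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    /* the parent's working directory is unaffected by the child's chdir */
    printf("parent cwd: %s\n", getcwd(buf, sizeof buf));
    return 0;
}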
Today I attended a lecture about Linux processes. The teacher stated that:
after fork() returns, child process is ready to be executed
because of Copy On Write mechanism, fork-exec sequence is guaranteed to prevent unnecessary copying of parent's memory
By fork-exec sequence I mean something like that:
if(!fork())
{
exec(...);
}
i = 0;
Which, as far as I know, translates into something like this (written in pseudo-asm):
call fork
jnz next
call exec(...)
next:
load 0
store i
Let's assume that the parent has been granted enough CPU time to execute all the lines above in one run:
1. in the parent, fork returns the child's PID (nonzero), so line 3 is skipped
2. when 0 is stored in i, the child hasn't exec'ed yet, so COW kicks in, copying (unnecessarily) the page of the parent's memory that holds i
So how is unnecessary copying prevented in this case?
It looks like it isn't, but I think the Linux developers were smart enough to do it ;)
Possible answer: the child always runs first (the parent is preempted after calling fork()).
1. Is that true?
2. If yes, does that guarantee prevention of unnecessary copying in all cases?
Basically, two people can read the same book, but if one starts writing notes in the margin, then the other person needs a copy of that page before it happens: the person who has not written in the margin does not want to see the other person's notes in the book.
The answer is essentially that necessary copying - of pages hosting any data which gets changed - happens, while unnecessary copying - of pages which have not been changed by either process since the fork - does not.
The latter would typically include not only unmodified data, but also the pages holding the program itself and the shared libraries it has loaded: typically many pages that can be shared, versus just a few that must be duplicated.
Once the child calls an exec function, the sharing (and any need for future copy-on-write) is terminated.
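As a rough demonstration in C (/bin/true is just a stand-in for any exec'ed program; this is a sketch, not a benchmark): the parent's write after fork() copies only the single page it touches, and the child's immediate exec discards its shared mappings without copying the buffer at all:

#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    size_t big = 64 * 1024 * 1024;      /* 64 MiB of parent data */
    char *buf = malloc(big);
    if (buf == NULL)
        return 1;
    memset(buf, 'x', big);              /* fault all the pages in */

    pid_t pid = fork();                 /* pages now shared, read-only */
    if (pid == 0) {
        /* child execs immediately: its shared mappings of buf are simply
         * discarded by exec, so the 64 MiB is never copied */
        execl("/bin/true", "true", (char *)NULL);
        _exit(127);                     /* reached only if exec fails */
    }
    buf[0] = 'y';                       /* parent write: copies only the one
                                           page containing buf[0] */
    waitpid(pid, NULL, 0);
    free(buf);
    return 0;
}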
I have two instances of NSManagedObjectContext: one is used in the main thread and the other is used in a background thread (via an NSOperation). For thread safety, these two contexts share only an NSPersistentStoreCoordinator.
The problem I'm having is that pending changes in the first context (on the main thread) are not available to the second context until a -save is performed. This is understandable, since the shared persistent store won't have copies of the NSManagedObjects tracked by -insertedObjects, -updatedObjects, and -deletedObjects until they are persisted.
Unfortunately, this presents a problem with the user experience: any unsaved changes won't appear in the (time consuming) reports that are generated in the background thread.
The only solution I can think of is nasty: take the inserted, updated, and deleted objects from the first context and graft them onto the object graph of the second context. There are some pretty complex relations in the dataset, so I'm hesitant to go in this direction. I'm hoping someone here has a better solution.
If this is under 10.7 there are some solutions: one is that you can have nested NSManagedObjectContexts, so you can "save" in the one being modified and it won't save all the way to the disk, but it will make the changes available to other children of the master context.
Before 10.7 you will probably have to copy the changes over yourself. This isn't super-hard, since you can just have a single object listen for NSManagedObjectContextObjectsDidChangeNotification and then re-apply the changes exactly from the main context. (Should be about 20 lines of code.) You never have to save this second context, I assume?
Not sure if you have any OS constraints, but in iOS 5 / Mac OS X 10.7 you can use nested managed object contexts to accomplish this. I believe a child context is able to pull in unsaved changes in the parent by simply doing a new fetch.
Edit: Looks like Wil beat me to it, but yeah, prior to iOS 5 / Mac OS X 10.7 you'll have to listen for NSManagedObjectContextDidSaveNotification and take a look at the userInfo dictionary for the added/updated/deleted objects.
An alternate solution might involve using a single managed object context and providing your own thread safety over access to it, or using the context's lock and unlock methods.
I would try to make the main thread do a normal save, so the second context can just merge the changes into its own context. "Fighting" an API's intended use is never a good idea.
You could mark the newly saved record with an attribute as intermediate and delete it later if the user finally cancels the edit.
Solving those problems with attributes in your entities and querying in the background thread with a matching predicate would be easy...
And that would be a stable solution as well. I come from a database-driven world (Oracle), where we often use such patterns (status attributes in records) to make data visible/invisible to other DB sessions (which would equate to threads in a Cocoa app). It always works without problems: other threads/sessions only ever see committed changes; that's how most RDBMSs work.