Visualizing processes created by fork() using a directed graph

01 int main() {
02     int a = 10;
03     fork();
04     fork();
05     printf("%d ", a);
06     return 0;
07 }
I am looking for an elegant way to visualize the spawned processes with the help of a graph. I came up with an idea, but this does not take into account the order of execution of each of the processes. My idea goes like this:
Process states are represented as ProcessName(InstPointer). InstPointer points to the line number to be executed for a process. A parent process always points to its child process.
Program starts, P is the parent process:
P(1)
Line 3 executes, child process A is spawned:
P(4)
|
|
v
A(4)
Line 4 executes in both P and A; P spawns child process C and A spawns child process B:
P(5)
|  \
|   \
v    \
A(5)  \
|      |
|      |
v      v
B(5)  C(5)
It somewhat helps me visualize the spawned processes, but I am still unsure whether this has any serious pitfalls. If anyone has any better ideas, please feel free to share!
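One way to make the tree concrete is to have the program report the edges itself. Below is a minimal sketch (in Python rather than C, and traced_fork is just an illustrative helper name) in which the parent side of every fork() prints a parent -> child edge in Graphviz DOT syntax; the collected edges can then be wrapped in a digraph and rendered as exactly this kind of directed graph. The same idea works in C with getpid().
import os

def traced_fork():
    # fork() and, on the parent side, emit the "parent -> child" edge in DOT syntax
    pid = os.fork()
    if pid > 0:
        print(f'  "{os.getpid()}" -> "{pid}";', flush=True)
    return pid

traced_fork()   # corresponds to line 3 of the C program
traced_fork()   # corresponds to line 4 of the C program
# wrap the collected lines in "digraph forks { ... }" and render with: dot -Tpng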

Equal loading for parallel task distribution

I have a large number of independent tasks I would like to run, and I would like to distribute them on a parallel system such that each processor does the same amount of work, and maximizes my efficiency.
I would like to know if there is a general approach to finding a solution to this problem, or possibly just a good solution to my exact problem.
I have T=150 tasks I would like to run, and the time each task takes equals its index: task 1 takes 1 unit of time, task 2 takes 2 units of time, ..., task 150 takes 150 units of time. Assuming I have n=12 processors, what is the best way to divide the workload between the workers, assuming the time it takes to start and clean up tasks is negligible?
Despite my initial enthusiasm for #HighPerformanceMark's ingenious approach, I decided to actually benchmark this using GNU Parallel with -j 12 to use 12 cores and simulated 1 unit of work with 1 second of sleep.
First I generated a list of the jobs as suggested with:
paste <(seq 1 72) <(seq 150 -1 79)
That looks like this:
1 150
2 149
3 148
...
...
71 80
72 79
Then I pass the list into GNU Parallel and pick up the remaining 6 jobs at the end in parallel:
paste <(seq 1 72) <(seq 150 -1 79) | parallel -k -j 12 --colsep '\t' 'sleep {1} ; sleep {2}'
sleep 73 &
sleep 74 &
sleep 75 &
sleep 76 &
sleep 77 &
sleep 78 &
wait
That runs in 16 minutes 24 seconds.
Then I used my somewhat simpler approach, which is just to run the big jobs first, so you are unlikely to be left with any big ones at the end and thereby get an imbalance in CPU load where just one big job is still running while the rest of your CPUs have nothing to do:
time parallel -j 12 sleep {} ::: $(seq 150 -1 1)
And that runs in 15 minutes 48 seconds, so it is actually faster.
I think the problem with the other approach is that after the first 6 rounds of 12 pairs of jobs there are 6 jobs left, the longest of which takes 78 seconds, so effectively 6 CPUs sit there doing nothing for 78 seconds. If the number of tasks were divisible by the number of CPUs that would not occur, but 150 does not divide evenly by 12.
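For what it's worth, both reported timings are consistent with a quick back-of-the-envelope check (a small sketch, assuming 1 unit of work = 1 second of sleep as in the benchmark):
# 72 pairs over 12 CPUs = 6 rounds; each pair runs 'sleep {1}; sleep {2}', i.e. 151 s per round
paired = 6 * 151 + 78            # plus the leftover jobs 73..78 running concurrently (max 78 s)
print(paired)                    # 984 s, i.e. 16 min 24 s

# lower bound for any schedule: total work divided evenly over 12 CPUs
print(sum(range(1, 151)) / 12)   # 943.75 s, so 948 s (15 min 48 s) is already close to optimal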
The solution I came to was similar to those mentioned above. Here is the pseudo-code if anyone is interested:
N_proc = 12
Jobs = list(range(1, 151))            # job i takes i units of time
SerialTime = sum(Jobs)
AverageTime = SerialTime / N_proc     # each processor's "fair share"

procs = [[] for _ in range(N_proc)]   # the task list assigned to each processor
while Jobs:
    for proc in procs:
        if not Jobs:
            break
        if sum(proc) < AverageTime:
            diff = AverageTime - sum(proc)
            fitting = [j for j in Jobs if j <= diff]
            # grab the largest job that still fits under the fair share,
            # otherwise fall back to the smallest remaining job
            pick = max(fitting) if fitting else min(Jobs)
        else:
            pick = min(Jobs)
        proc.append(pick)
        Jobs.remove(pick)
This seemed to be the best method of those I tried. I tried it on many different distributions of job run-times, and it seems to do a decent job of evenly distributing the work, so long as N_proc << N_jobs.
This is a slight modification of largest-first, in that each processor first tries to avoid doing more than its "fair share". If it must go over its fair share, then it will attempt to stay near the fair answer by grabbing the smallest remaining task from the queue.
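A quick sanity check of the balance this produces (a small follow-on sketch using the procs list from the code above):
loads = sorted(sum(proc) for proc in procs)
print(loads[0], loads[-1])   # min and max per-processor load; both should sit near AverageTime (943.75 for this job set)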

Spin verifying properties related to channels

I created a simple model in Spin in which two processes S send messages to another process R. Process R then sends responses to both processes. I would like to define the property "if process x sends a message, then process y eventually receives it", as shown below. The problem is that although the simulation works as expected, verification does not. The property I defined at line 9 always passes without errors even though I injected a fault at line 17 that should make verification fail. Am I missing something here?
1 byte r1PId;
2 byte s1PId;
3 byte s2PId;
4
5 byte nextM = 1;
6
7 chan toS[2] = [2] of {byte};
8
9 ltl p1 {[] ((s[s1PId]:m > 0) -> <>(s[s1PId]:m == r:m))}
10
11 proctype r(byte id; chan stoR)
12 {
13    do
14    :: true
15       byte c; byte m; byte m2;
16       stoR?c, m2;
17       m = 67; // should be m = m2
18
19       byte response = 50;
20
21       toS[c]!response;
22    od
23 }
24
25 proctype s(byte id; chan rtoS; chan stoR)
26 {
27    byte m;
28    atomic
29    {
30       m = nextM;
31       nextM = nextM + 1;
32    }
33    stoR!id, m;
34    byte response;
35    rtoS?response;
36 }
37
38 init {
39    chan toR = [10] of {byte, byte};
40    r1PId = run r(10, toR);
41    s1PId = run s(0, toS[0], toR);
42    s2PId = run s(1, toS[1], toR);
43 }
There seems to be a scope problem. When the process s terminates, its local variables will be out of scope. In that case, the reference s[s1PId]:m will be 0.
On the other hand, in the process r, the variable m is declared inside a block. It is initialized to 0 every time before stoR?c, m2.
As a result, the reference r:m will always be 0 after receiving messages twice.
So, <>(s[s1PId]:m == r:m) will always be true.
As a quick fix, you can either (i) move the declaration byte m in r outside the loop, or (ii) add an infinite loop at the end of s to prevent its termination.

iSpin: LTL property only evaluated with "assertion violations" activated?

I am trying to get used to iSpin/Promela. I am using:
Spin Version 6.4.3 -- 16 December 2014,
iSpin Version 1.1.4 -- 27 November 2014,
TclTk Version 8.6/8.6,
Windows 8.1.
Here is an example where I try to use LTL. The verification of the LTL property should produce an error if the two steps in the for loop are non-atomic:
#define ten ((n != 10) && (finished == 2))

int n = 0;
int finished = 0;
active [2] proctype P() {
    //assert(_pid == 0 || _pid == 1);

    int t = 0;
    byte j;
    for (j : 1 .. 5) {
        atomic {
            t = n;
            n = t + 1;
        }
    }
    finished = finished + 1;
}

ltl alwaysten { [] !ten }
In the verification tab I just want to test the LTL property, so I disable all safety properties and activate "use claim". The claim name is "alwaysten".
But it seems that the LTL property is only evaluated if I activate "assertion violations". Why? A colleague is using iSpin v1.1.0 and he does not need to activate this. What am I doing wrong? I want to verify assertions and LTL properties independently...
Here is the trace:
pan: elapsed time 0.002 seconds
To replay the error-trail, goto Simulate/Replay and select "Run"
spin -a 1_2_ConcurrentCounters_8.pml
ltl alwaysten: [] (! (((n!=10)) && ((finished==2))))
C:/cygwin/bin/gcc -DMEMLIM=1024 -O2 -DXUSAFE -w -o pan pan.c
./pan -m10000 -E -a -N alwaysten
Pid: 6980
warning: only one claim defined, -N ignored
(Spin Version 6.4.3 -- 16 December 2014)
+ Partial Order Reduction
Full statespace search for:
never claim + (alwaysten)
assertion violations + (if within scope of claim)
acceptance cycles + (fairness disabled)
invalid end states - (disabled by -E flag)
State-vector 36 byte, depth reached 57, errors: 0
475 states, stored
162 states, matched
637 transitions (= stored+matched)
0 atomic steps
hash conflicts: 0 (resolved)
Stats on memory usage (in Megabytes):
0.024 equivalent memory usage for states (stored*(State-vector + overhead))
0.291 actual memory usage for states
64.000 memory used for hash table (-w24)
0.343 memory used for DFS stack (-m10000)
64.539 total actual memory usage
unreached in proctype P
(0 of 13 states)
unreached in claim alwaysten
_spin_nvr.tmp:8, state 10, "-end-"
(1 of 10 states)
pan: elapsed time 0.001 seconds
No errors found -- did you verify all claims?
This is because your LTL property is translated into a claim that contains an assert statement; you can see this by inspecting the generated never claim (in _spin_nvr.tmp).
So, without checking for assertion violations, no error can be found.
(A possible explanation of different behaviors: previous versions of Spin might translate this differently, perhaps using accept instead of assert.)

predefined preemption points among processes

I have forked many child processes and assigned a priority and a core to each of them. Process A executes with a period of 3 seconds and process B with a period of 6 seconds. I want them to execute in such a way that a higher-priority process preempts a lower-priority one only at predefined points, and I tried to achieve this with semaphores. I have used the same code snippet in both processes, with different array values in each.
bubblesort_desc() sorts the array in descending order and prints it; bubblesort_asc() sorts it in ascending order and prints it.
while (a < 3)
{
    printf("time in sort1.c: %d %ld\n", (int)request.tv_sec, (long int)request.tv_nsec);
    int array[SIZE] = {5, 1, 6, 7, 9};
    semaphore_wait(global_sem);
    bubblesort_desc(array, SIZE);
    semaphore_post(global_sem);
    semaphore_wait(global_sem);
    bubblesort_asc(array, SIZE);
    semaphore_post(global_sem);
    semaphore_wait(global_sem);
    a++;
    request.tv_sec = request.tv_sec + 6;
    request.tv_nsec = request.tv_nsec; // if I add 1 ms here as an offset for the lower-priority one, it works
    semaphore_post(global_sem);
    semaphore_close(global_sem); // close the semaphore file
    // sleep for the rest of the time after the process finishes execution, until the period of 6
    clk = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &request, NULL);
    if (clk != 0 && clk != EINTR)
        printf("ERROR: clock_nanosleep\n");
}
I get output like this whenever the two processes get activated at the same time, for example at time units of 6, 12, ...
time in sort1.c: 10207 316296689
time now in sort.c: 10207 316296689
9
99
7
100
131
200
256
6
256
200
5
131
100
99
1
1
5
6
7
9
One process is not supposed to preempt the other while one set of sorted output is being printed, but it works as if there were no semaphores. I defined the semaphores as per this link: http://linux.die.net/man/3/pthread_mutexattr_init
Can anyone tell me what the reason for this could be? Is there a better alternative to semaphores?
It's printf that's causing the ambiguous output. If the results are printed without '\n' we get a more accurate result, but it's always better to avoid printf statements in real-time applications. I used trace-cmd and kernelshark to visualize the behaviour of the processes.

Explain the Differential Evolution method

Can someone please explain the Differential Evolution method? The Wikipedia definition is extremely technical.
A dumbed-down explanation followed by a simple example would be appreciated :)
Here's a simplified description. DE is an optimisation technique which iteratively modifies a population of candidate solutions to make it converge to an optimum of your function.
You first initialise your candidate solutions randomly. Then at each iteration and for each candidate solution x you do the following:
you produce a trial vector: v = a + ( b - c ) / 2, where a, b, c are three distinct candidate solutions picked randomly among your population.
you randomly swap vector components between x and v to produce v'. At least one component from v must be swapped.
you replace x in your population with v' only if it is a better candidate (i.e. it better optimises your function).
(Note that the above algorithm is very simplified; don't code from it, find proper spec. elsewhere instead)
Unfortunately the Wikipedia article lacks illustrations. It is easier to understand with a graphical representation, you'll find some in these slides: http://www-personal.une.edu.au/~jvanderw/DE_1.pdf .
It is similar to a genetic algorithm (GA) except that the candidate solutions are not treated as binary strings (chromosomes) but (usually) as real-valued vectors. One key aspect of DE is that the mutation step size (see step 1 for the mutation) is dynamic, that is, it adapts to the configuration of your population and will tend to zero as it converges. This makes DE less vulnerable to genetic drift than GA.
Answering my own question...
Overview
The principal difference between Genetic Algorithms and Differential Evolution (DE) is that Genetic Algorithms rely on crossover as the primary search mechanism, while DE (like evolution strategies) relies primarily on mutation.
DE generates new candidates by adding a weighted difference between two population members to a third member (more on this below).
If the resulting candidate is superior to the candidate with which it was compared, it replaces it; otherwise, the original candidate remains unchanged.
Definitions
The population is made up of NP candidates.
Xi = A parent candidate at index i (indexes range from 0 to NP-1) from the current generation. Also known as the target vector.
Each candidate contains D parameters.
Xi(j) = The jth parameter in candidate Xi.
Xa, Xb, Xc = three random parent candidates.
Difference vector = (Xb - Xa)
F = A weight that determines the rate of the population's evolution.
Ideal values: [0.5, 1.0]
CR = The probability of crossover taking place.
Range: [0, 1]
Xc` = A mutant vector obtained through the differential mutation operation. Also known as the donor vector.
Xt = The child of Xi and Xc`. Also known as the trial vector.
Algorithm
For each candidate in the population
for (int i = 0; i<NP; ++i)
Choose three distinct parents at random (they must differ from each other and i)
do {
    a = random.nextInt(NP);
} while (a == i);
do {
    b = random.nextInt(NP);
} while (b == i || b == a);
do {
    c = random.nextInt(NP);
} while (c == i || c == b || c == a);
(Mutation step) Add a weighted difference vector between two population members to a third member
Xc` = Xc + F * (Xb - Xa)
(Crossover step) For every variable in Xi, apply uniform crossover with probability CR to inherit from Xc`; otherwise, inherit from Xi. At least one variable must be inherited from Xc`
int R = random.nextInt(D);
for (int j = 0; j < D; ++j)
{
    double probability = random.nextDouble();
    if (probability < CR || j == R)
        Xt[j] = Xc`[j];
    else
        Xt[j] = Xi[j];
}
(Selection step) If Xt is superior to Xi then Xt replaces Xi in the next generation. Otherwise, Xi is kept unmodified.
Resources
See this for an overview of the terminology
See Optimization Using Differential Evolution by Vasan Arunachalam for an explanation of the Differential Evolution algorithm
See Differential Evolution: A Survey of the State-of-the-Art by Swagatam Das and Ponnuthurai Nagaratnam Suganthan for different variants of the Differential Evolution algorithm
See Differential Evolution Optimization from Scratch with Python for a detailed description of an implementation of a DE algorithm in python.
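To tie the pseudocode above together, here is a compact, runnable sketch of classic DE/rand/1/bin in Python. It is only a minimal illustration of the mutation, crossover and selection steps described above; the function name, parameter defaults and the sphere objective are my own choices for the example, not taken from any of the references.
import random

def differential_evolution(objective, bounds, NP=20, F=0.8, CR=0.9, generations=200):
    # minimal DE/rand/1/bin: mutation, binomial crossover, greedy selection
    D = len(bounds)
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(NP)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(NP):
            # choose three distinct parents, all different from the target i
            a, b, c = random.sample([k for k in range(NP) if k != i], 3)
            # mutation: donor vector Xc` = Xc + F * (Xb - Xa)
            donor = [pop[c][j] + F * (pop[b][j] - pop[a][j]) for j in range(D)]
            # crossover: each gene comes from the donor with probability CR,
            # and gene R is always taken from the donor
            R = random.randrange(D)
            trial = [donor[j] if (random.random() < CR or j == R) else pop[i][j]
                     for j in range(D)]
            # selection: keep the trial vector only if it is at least as good
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(NP), key=lambda k: scores[k])
    return pop[best], scores[best]

# usage: minimise the sphere model sum(x_i^2) over [-100, 100]^3
best_x, best_f = differential_evolution(lambda x: sum(v * v for v in x),
                                         [(-100.0, 100.0)] * 3)
print(best_x, best_f)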
The working of the DE algorithm is very simple.
Consider that you need to optimise (minimise, for example) ∑Xi^2 (the sphere model) within a given range, say [-100, 100]. We know that the minimum value is 0. Let's see how DE works.
DE is a population-based algorithm. Each individual in the population has a fixed number of chromosomes (imagine it as a set of human beings, each carrying the same set of chromosomes or genes).
Let me explain DE with respect to the above function.
We need to fix the population size and the number of chromosomes or genes (also called parameters). For instance, let's consider a population of size 4, where each individual has 3 chromosomes (or genes or parameters). Let's call the individuals R1, R2, R3, R4.
Step 1 : Initialize the population
We need to randomly initialise the population within the range [-100,100]
        G1    G2    G3      objective fn value
R1 -> | -90 |  2  |  1    | => 8105
R2 -> |  7  |  9  | -50   | => 2630
R3 -> |  4  |  2  | -9.2  | => 104.64
R4 -> | 8.5 |  7  |  9    | => 202.25
The objective function value is calculated using the given objective function. In this case it is ∑Xi^2, so for R1 the objective function value is (-90)^2 + 2^2 + 1^2 = 8105. Similarly it is found for all of them.
Step 2 : Mutation
Fix a target vector, say R1, then randomly select three other vectors (individuals), say R2, R3 and R4, and perform mutation. Mutation is done as follows:
MutantVector = R2 + F * (R3 - R4)
(The vectors can be chosen randomly; they need not be in any particular order.) F (the scaling factor / mutation constant), within the range [0, 1], is one of the few control parameters DE has. In simple words, it describes how different the mutated vector becomes. Let's keep F = 0.5.
  | 7   | 9 | -50  |
+
0.5 *
( | 4   | 2 | -9.2 |
  -
  | 8.5 | 7 |  9   | )
Performing the mutation gives the following mutant vector:
MV = | 4.75 | 6.5 | -59.1 | => 3557.62
Step 3 : Crossover
Now that we have a target vector (R1) and a mutant vector MV formed from R2, R3 and R4, we need to do a crossover. Consider R1 and MV as two parents; we need a child from these two parents. Crossover determines how much information is taken from each parent. It is controlled by the crossover rate (CR). Every gene/chromosome of the child is determined as follows:
a random number between 0 and 1 is generated; if it is greater than CR, then inherit the gene from the target (R1), else from the mutant (MV).
Let's set CR = 0.9. Since individuals have 3 chromosomes, we need to generate 3 random numbers between 0 and 1. Say, for example, those numbers are 0.21, 0.97 and 0.8 respectively. The first and last are less than the CR value, so those positions in the child's vector will be filled with values from MV, and the second position will be filled with the gene taken from the target (R1).
Target -> | -90 | 2 | 1 |    Mutant -> | 4.75 | 6.5 | -59.1 |
random num - 0.21, => `Child -> | 4.75 | --  | --    |`
random num - 0.97, => `Child -> | 4.75 | 2   | --    |`
random num - 0.80, => `Child -> | 4.75 | 2   | -59.1 |`
Trial vector/child vector -> | 4.75 | 2 | -59.1 | => 3519.37
Step 4 : Selection
Now we have the child and the target. Compare the objective function values of both and see which is smaller (minimisation problem). Select that individual of the two for the next generation:
R1 ->                        | -90  | 2 | 1     | => 8105
Trial vector/child vector -> | 4.75 | 2 | -59.1 | => 3519.37
Clearly, the child is better, so we replace the target (R1) with the child. The new population becomes
        G1     G2    G3      objective fn value
R1 -> | 4.75 |  2  | -59.1 | => 3519.37
R2 -> |  7   |  9  | -50   | => 2630
R3 -> |  4   |  2  | -9.2  | => 104.64
R4 -> | 8.5  |  7  |  9    | => 202.25
This procedure is continued either until the desired number of generations has been reached or until we get the desired value. Hope this gives you some help.
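As a quick check of the arithmetic in the example above, here is a small sketch (the rounding is only for display):
R1, R2, R3, R4 = [-90, 2, 1], [7, 9, -50], [4, 2, -9.2], [8.5, 7, 9]
F = 0.5
sphere = lambda x: sum(v * v for v in x)

mv = [x2 + F * (x3 - x4) for x2, x3, x4 in zip(R2, R3, R4)]
print([round(v, 2) for v in mv], round(sphere(mv), 2))        # [4.75, 6.5, -59.1] 3557.62

child = [mv[0], R1[1], mv[2]]    # genes 1 and 3 from the mutant, gene 2 from the target
print([round(v, 2) for v in child], round(sphere(child), 2))  # [4.75, 2, -59.1] 3519.37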