predefined preemption points among processes - process

I have forked many child processes and assigned priority and core to each of them. Porcess A executes at period of 3 sec and process B at a period of 6 sec. I want them to execute in such a way that the higher priority processes should preempt lower priority ones only at predefined points and tried to acheive it with semaphores. I have used this same code snippets within the 2 processes with different array values in both.
'bubblesort_desc()' sorts the array in descending order and prints it. 'bubblesort_asc()' sorts in ascending order and prints.
while(a<3)
{
printf("time in sort1.c: %d %ld\n", (int)request.tv_sec, (long int)request.tv_nsec);
int array[SIZE] = {5, 1, 6 ,7 ,9};
semaphore_wait(global_sem);
bubblesort_desc(array, SIZE);
semaphore_post(global_sem);
semaphore_wait(global_sem);
bubblesort_asc(array, SIZE);
semaphore_post(global_sem);
semaphore_wait(global_sem);
a++;
request.tv_sec = request.tv_sec + 6;
request.tv_nsec = request.tv_nsec; //if i add 1ms here like an offset to the lower priority one, it works.
semaphore_post(global_sem);
semaphore_close(global_sem); //close the semaphore file
//sleep for the rest of the time after the process finishes execution until the period of 6
clk = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &request, NULL);
if (clk != 0 && clk != EINTR)
printf("ERROR: clock_nanosleep\n");
}
I get the output like this whenever two processes get activated at the same time. For example at time units of 6, 12,..
time in sort1.c: 10207 316296689
time now in sort.c: 10207 316296689
9
99
7
100
131
200
256
6
256
200
5
131
100
99
1
1
5
6
7
9
One process is not supposed to preempt the other while one set of sorted list is printing. But it's working as if there are no semaphores. I defined semaphores as per this link: http://linux.die.net/man/3/pthread_mutexattr_init
Can anyone tell me what can be the reason for that? Is there a better alternative than semaphores?

Its printf that's causing the ambiguous output. If the results are printed without '\n' then we get a more accurate result. But its always better to avoid printf statements for real time applications. I used trace-cmd and kernelshark to visualize the behaviour of the processes.

Related

Redis `SCAN`: how to maintain a balance between newcomming keys that might match and ensure eventual result in a reasonable time?

I am not that familiar with Redis. At the moment I am designing some realtime service and I'd like to rely on it. I expect ~10000-50000 keys per minute to be SET with some reasonable EX and match over them using SCAN rarely enough not to bother with performance bottlenecks.
The thing I doubt is "in/out rate" and possible overflooding with keys that might match some SCAN query and thus it never terminates (i.e. always replies with latest cursor position and forces you to continue; that could happen easily if one consumes x items per second and there are x + y items per second coming in with y > 0).
Obviously, I could set desired SCAN size long enough; but I wonder if there exists a better solution or does Redis itself guarantees that SCAN will grow size automatically in such a case?
First some context, solution at the end:
From SCAN command > Guarantee of termination
The SCAN algorithm is guaranteed to terminate only if the size of the
iterated collection remains bounded to a given maximum size, otherwise
iterating a collection that always grows may result into SCAN to never
terminate a full iteration.
This is easy to see intuitively: if the collection grows there is more
and more work to do in order to visit all the possible elements, and
the ability to terminate the iteration depends on the number of calls
to SCAN and its COUNT option value compared with the rate at which the
collection grows.
But in The COUNT option it says:
Important: there is no need to use the same COUNT value for every
iteration. The caller is free to change the count from one iteration
to the other as required, as long as the cursor passed in the next
call is the one obtained in the previous call to the command.
Important to keep in mind, from Scan guarantees:
A given element may be returned multiple times. It is up to the
application to handle the case of duplicated elements, for example
only using the returned elements in order to perform operations that
are safe when re-applied multiple times.
Elements that were not
constantly present in the collection during a full iteration, may be
returned or not: it is undefined.
The key to a solution is in the cursor itself. See Making sense of Redis’ SCAN cursor. It is possible to deduce the percent of progress of your scan because the cursor is really the bits-reversed of an index to the table size.
Using DBSIZE or INFO keyspace command you can get how many keys you have at any time:
> DBSIZE
(integer) 200032
> info keyspace
# Keyspace
db0:keys=200032,expires=0,avg_ttl=0
Another source of information is the undocumented DEBUG htstats index, just to get a feeling:
> DEBUG htstats 0
[Dictionary HT]
Hash table 0 stats (main hash table):
table size: 262144
number of elements: 200032
different slots: 139805
max chain length: 8
avg chain length (counted): 1.43
avg chain length (computed): 1.43
Chain length distribution:
0: 122339 (46.67%)
1: 93163 (35.54%)
2: 35502 (13.54%)
3: 9071 (3.46%)
4: 1754 (0.67%)
5: 264 (0.10%)
6: 43 (0.02%)
7: 6 (0.00%)
8: 2 (0.00%)
[Expires HT]
No stats available for empty dictionaries
The table size is the power of 2 following your number of keys:
Keys: 200032 => Table size: 262144
The solution:
We will calculate a desired COUNT argument for every scan.
Say you will be calling SCAN with a frequency (F in Hz) of 10 Hz (every 100 ms) and you want it done in 5 seconds (T in s). So you want this finished in N = F*T calls, N = 50 in this example.
Before your first scan, you know your current progress is 0, so your remaining percent is RP = 1 (100%).
Before every SCAN call (or every given number of calls that you want to adjust your COUNT if you want to save the Round Trip Time (RTT) of a DBSIZE call), you call DBSIZE to get the number of keys K.
You will use COUNT = K*RP/N
For the first call, this is COUNT = 200032*1/50 = 4000.
For any other call, you need to calculate RP = 1 - ReversedCursor/NextPowerOfTwo(K).
For example, let say you have done 20 calls already, so now N = 30 (remaining number of calls). You called DBSIZE and got K = 281569. This means NextPowerOfTwo(K) = 524288, this is 2^19.
Your next cursor is 14509 in decimal = 000011100010101101 in binary. As the table size is 2^19, we represent it with 18 bits.
You reverse the bits and get 101101010001110000 in binary = 185456 in decimal. This means we have covered 185456 out of 524288. And:
RP = 1 - ReversedCursor/NextPowerOfTwo(K) = 1 - 185456 / 524288 = 0.65 or 65%
So you have to adjust:
COUNT = K*RP/N = 281569 * 0.65 / 30 = 6100
So in your next SCAN call you use 6100. Makes sense it increased because:
The amount of keys has increased from 200032 to 281569.
Although we have only 60% of our initial estimate of calls remaining, progress is behind as 65% of the keyspace is pending to be scanned.
All this was assuming you are getting all keys. If you're pattern-matching, you need to use the past to estimate the remaining amount of keys to be found. We add as a factor PM (percent of matches) to the COUNT calculation.
COUNT = PM * K*RP/N
PM = keysFound / ( K * ReversedCursor/NextPowerOfTwo(K))
If after 20 calls, you have found only keysFound = 2000 keys, then:
PM = 2000 / ( 281569 * 185456 / 524288) = 0.02
This means only 2% of the keys are matching our pattern so far, so
COUNT = PM * K*RP/N = 0.02 * 6100 = 122
This algorithm can probably be improved, but you get the idea.
Make sure to run some benchmarks on the COUNT number you'll use to start with, to measure how many milliseconds is your SCAN taking, as you may need to moderate your expectations about how many calls you need (N) to do this in a reasonable time without blocking the server, and adjust your F and T accordingly.

Weighted Activity Selection Problem with allowing shifting starting time

I have some activities with weights, and I would like to select non overlapping activities by maximizing the total weight. This is known problem and solution exists.
In my case, I am allowed to shift the start time of activities in some extend while duration remains same. This will give me some flexibility and I might increase my utilization.
Example scenario is something like the following where all activities are supposed to be in interval (0-200):
(start, end, profit)
a1: 10 12 120
a2: 10 13 100
a3: 14 18 150
a4: 14 20 100
a5: 120 125 100
a6: 120 140 150
a7: 126 130 100
Without shifting flexibility, I would choose (a1, a3, a6) and that is it. On the other hand I have shifting flexibility to the left/right by at most t units for any task where t is given. In that case I might come up with this schedule and all tasks can be selected except a7 since conflict cannot be avoided by shift .
t: 5
a1: 8 10 120 (shifted -2 to left)
a2: 10 13 100
a3: 14 18 150
a4: 18 24 100 (shifted +4 to right)
a5: 115 120 100 (shifted -5 to left)
a6: 120 140 150
In my problem, total time I have is very big with respect to activity duration. While activities are like 10sec on average, total time I have would even be 10000sec. However that does not mean all of activities can be selected since shifting flexibility would not be enough for some activities to non-overlap.
Also in my problem, there are clusters of activities which overlaps and very big empty space where no activities and there comes another cluster of overlapping activities i.e a1, a2, a3 and a4 are let say cluster1 and a5, a6 and a7 is cluster2. Each cluster can be expanded in time by shifting some of them to left and right. By doing that, I can select more activities than the original activity selection problem. However, I do not know how to decide which tasks to be shifted to left or right.
My expectation is to find an near-optimal solution where total profit would be somehow local optima. I do not need global optimum value. Also I do not have any criteria about cluster utilization., i.e I do not have a guarantee about a minimum number of activity per cluster etc. Actually, these clusters something I visually describe. There is not defined cluster. However, in time domain, activities are separated as clusters somehow.
Also activity start and end times are all integers since I can dismiss fractions. I would have around 50 activities whose duration would be 10 on average. And time window is like 10000.
Are there any feasible solution to this problem?
You mentioned that you can partition the activities into clusters that don't overlap even if activities within them are shifted to the extent. Each of these clusters can be considered independently, and the optimal results computed for each cluster simply summed up for the final answer. So the first step of the algorithm could be a trial run that extends all activities in both directions, finds which ones form clusters, and process each cluster independently. In the worst case, all of the activities might form a single cluster.
Depending on the maximum size of the remaining clusters, there are several approaches. If it's under 20 (or even 30, depending on whether you want your program to run in seconds or minutes), you could combine search over all subsets of activities in the given cluster with a greedy approach. In other words: if you are processing a subset of N elements, try every one of its 2^N possible subsets (okay, 2^N-1 if we forget the empty subset), check whether the activities in this specific subset can be scheduled in non-overlapping manner, and pick the subset that is eligible and has maximum sum.
How do we check that a given subset of activities can be scheduled in non-overlapping manner? Let's sort them in ascending order of their end and process them from left to right. For every activity, we try to schedule it as early as possible, making sure it does no intersect with activities we already considered. So, the first activity in the cluster is always started time t earlier than originally planned, the second one is started either when the first one ends, or t earlier than originally planned, whichever is larger, and so on. If at any point we can't schedule the next activity in a way that it does not overlap with previous one, then there is no way to schedule the activities in this subset in a non-overlapping manner. This algorithm takes O(NlogN) time, and overall each cluster is processed in O(2^N * NlogN). Once again, note that this function grows very quickly, so if you are dealing with large enough clusters, this approach goes out the window.
===
Another approach is specific to the additional restrictions you provided. If the activities' starts and ends and parameter t are all measured in integer number of seconds, and t is about 2 minutes, then the problem for each cluster is set in a small discrete space. Even though you could position a task to start at a non-integer second value, there always is an optimal solution that uses only integers. (To prove it, consider an optimal solution that does not use integers - since t is integer, you can always shift tasks, starting from the leftmost, to the left a bit so that it starts at an integer value.)
Knowing that the start and end times are discrete, you can build a DP solution: process the activities in the ascending order of their end*, and memoize the maximum possible sum of weights you can obtain from the first 1, 2, ..., N activities for each x from activity_start - t to activity_start + t if a given activity ends at time x. If we denote this memoized function as f[activity][end_time], then the recurrence relation is f[a][e] = weight[a] + max(f[i][j] over all i < a, j <= e - (end[a] - start[a]), which roughly translates to "if activity a ended at time e, the previous activity must have ended at or before start of a - so let's pick the maximum total weight over previous activities and their ends, and add the current activity's weight".
*Again, we can prove that there is at least one optimal answer where this ordering is preserved, even though there might be other optimal answers which do not possess this property
We could go further and eliminate the iteration over previous activities, instead encoding this information in f. Its definition would then change to "f[a][e] is the maximum possible total weight of the first a activities if none of them ends after e", and recurrence relation would become f[a][e] = max(f[a-1][e], weight[a] + max(f[a-1][i] over i <= e - (end[a] - start[a])])), and its computational complexity would be O(X * N), where X is the total span of the discrete space where task starts/ends are placed.
I assume you need to compute not just the maximum possible weight, but also the activities you need to select to obtain it, and possibly even the exact time each of them needs to be started. Thankfully, we can derive all of this from the values of f, or compute it at the same time as we compute f. The latter is easier to reason about, so let's introduce a second function g[activity][end]. g[activity][end] returns a pair (last_activity, last_activity_end), essentially pointing us to the exact activity and its timing that the optimal weight in f[activity][end] uses.
Let's go through the example you provided to illustrate how this works:
(start, end, profit)
a1: 10 12 120
a2: 10 13 100
a3: 14 18 150
a4: 14 20 100
a5: 120 125 100
a6: 120 140 150
a7: 126 130 100
We order the activities by their end time, thereby swapping a7 and a6.
We initialize the values of f and g for the first activity:
f[1][7] = 120, f[1][8] = 120, ..., f[1][17] = 120, meaning that the first activity could end anywhere from 7 to 17, and costs 120. f[1][i] for all other i should be set to 0.
g[1][7] = (1, 7), g[1][8] = (1, 8), ..., g[1][17] = (1, 17), meaning that the last activity that was included in f[1][i] values was a1, and it ended at i. g[1][i] for all i outside [7, 17] is undefined/irrelevant.
That's where something interesting begins. For each i such that a2 cannot end at time i, let's assign f[2][i] = f[1][i], g[2][i] = g[1][i], which essentially means that we wouldn't be using activity a2 in those answers. For all other i, namely, in [8..18] interval, we have:
f[2][8] = max(f[1][8], 100 + max(f[1][0..5])) = f[1][8]
f[2][9] = max(f[1][9], 100 + max(f[1][0..6])) = f[1][9]
f[2][10] = max(f[1][10], 100 + max(f[1][0..7])). This is the first time when the second clause is not just plain 100, as f[1][7]>0. It is, in fact, 100+f[1][7]=220, meaning that we can take activity a2, shift it in a way that puts its end at time 10, and get a total weight of 220. We continue computing f[2][i] this way for all i <= 18.
The values of g are: g[2][8]=g[1][8]=(1, 8), g[2][9]=g[1][9]=(1, 9), g[2][10]=(2, 10), because it was optimal to take activity a2 and end it at time 10 in this case.
I hope the pattern of how this continues is visible - we compute all the values of f and g through the end, and then pick the maximum f[N][e] over all possible end times e of the last activity. Armed with the auxiliary function g, we can traverse the values backwards to figure out the exact activities and times. Namely, the last activity we use and its timing is in g[N][e]. Let's call them A and T. We know that A began at T-(end[A]-start[A]). Then, the previous activity must have ended at that point or before - so let's look at g[A-1][T-(end[A]-start[A]) for it, and so on.
Note that this approach works even if you don't partition anything into clusters, but with the partitioning, the size of the space in which tasks can be scheduled is reduced, and with it the runtime.
You might notice that neither of these solutions is polynomial in the size of input. I have a feeling that your problem doesn't have a general polynomial solution, but I was unable to prove it by reducing another NP-complete problem to it. Would be really curious to read a reduction / better general solution!

Why is the time complexity of Heap Sort, O(nlogn)?

I'm trying to understand time complexity of different data structures and began with heap sort. From what I've read, I think collectively people agree heap sort has a time complexity of O(nlogn); however, I have difficulty understanding how that came to be.
Most people seem to agree that the heapify method takes O(logn) and buildmaxheap method takes O(n) thus O(nlogn) but why does heapify take O(logn)?
From my perspective, it seems heapify is just a method that compares a node's left and right node and properly swaps them depending if it is min or max heap. Why does that take O(logn)?
I think I'm missing something here and would really appreciate if someone could explain this better to me.
Thank you.
It seems that you are confusing about the time complexity about heap sort. It is true that build a maxheap from an unsorted array takes your O(n) time and O(1) for pop one element out. However, after you pop out the top element from the heap, you need to move the last element(A) in your heap to the top and heapy for maintaining heap property. For element A, it at most pop down log(n) times which is the height of your heap. So, you need at most log(n) time to get the next maximum value after you pop maximum value out. Here is an example of the process of heapsort.
18
15 8
7 11 1 2
3 6 4 9
After you pop out the number of 18, you need to put the number 9 on the top and heapify 9.
9
15 8
7 11 1 2
3 6 4
we need to pop 9 down because of 9 < 15
15
9 8
7 11 1 2
3 6 4
we need to pop 9 down because of 9 < 11
15
11 8
7 9 1 2
3 6 4
9 > 4 which means heapify process is done. And now you get the next maximum value 15 safely without broking heap property.
You have n number for every number you need to do the heapify process. So the time complexity of heapsort is O(nlogn)
You are missing the recursive call at the end of the heapify method.
Heapify(A, i) {
le <- left(i)
ri <- right(i)
if (le<=heapsize) and (A[le]>A[i])
largest <- le
else
largest <- i
if (ri<=heapsize) and (A[ri]>A[largest])
largest <- ri
if (largest != i) {
exchange A[i] <-> A[largest]
Heapify(A, largest)
}
}
In the worst case, after each step, largest will be about two times i. For i to reach the end of the heap, it would take O(logn) steps.

labview while loop execution condition

Is there a way to give while loop a condition which makes it give an an output every ten times' executions, however it continues running after this output?
I hope I made myself clear...
Thanks!
Amy
Modulo is useful for this.
As an example; In swift to do modulo you use the % symbol. Essentially modulo outputs the remainder of the given terms.
So;
Value 1 MODULO Value 2 outputs Remainder.
Furthermore;
6 % 2 = 0
6 % 5 = 1
6 % 4.5 = 1.5
Essentially you want every nth element to output a value, with n being the rate. You need to track how many loops of the while you have gone through already.
The code below will run through the while 1000 times, and print out every 10 times ( for a total of 100 prints of output. )
var execution : Int = 0
while ( execution != 1000 ) {
if ( execution % 10 == 0 ) {
print("output")
}
execution = execution + 1
}
Here is the same answer as given by adam but then in Labview.

Equal loading for parallel task distribution

I have a large number of independent tasks I would like to run, and I would like to distribute them on a parallel system such that each processor does the same amount of work, and maximizes my efficiency.
I would like to know if there is a general approach to finding a solution to this problem, or possibly just a good solution to my exact problem.
I have T=150 tasks I would like to run, and the time each task will take is t=T. That is, task1 takes 1 one unit of time, task2 takes 2 units of time... task150 takes 150 units of time. Assuming I have n=12 processors, what is the best way to divide the work load between workers, assuming the time it takes to begin and clean up tasks is negligible?
Despite my initial enthusiasm for #HighPerformanceMark's ingenious approach, I decided to actually benchmark this using GNU Parallel with -j 12 to use 12 cores and simulated 1 unit of work with 1 second of sleep.
First I generated a list of the jobs as suggested with:
paste <(seq 1 72) <(seq 150 -1 79)
That looks like this:
1 150
2 149
3 148
...
...
71 80
72 79
Then I pass the list into GNU Parallel and pick up the remaining 6 jobs at the end in parallel:
paste <(seq 1 72) <(seq 150 -1 79) | parallel -k -j 12 --colsep '\t' 'sleep {1} ; sleep {2}'
sleep 73 &
sleep 74 &
sleep 75 &
sleep 76 &
sleep 77 &
sleep 78 &
wait
That runs in 16 mins 24 seconds.
Then I used my somewhat simpler approach, which is just to run big jobs first so you are unlikely to be left with any big ones at the end and thereby get imbalance in CPU load because just one big job needs to run and the rest of your CPUs have nothing to do:
time parallel -j 12 sleep {} ::: $(seq 150 -1 1)
And that runs in 15 minutes 48 seconds, so it is actually faster.
I think the problem with the other approach is that after the first 6 rounds of 12 pairs of jobs, there are 6 jobs left the longest of which takes 78 seconds, so effectively 6 CPUs sit there doing nothing for 78 seconds. If the number of tasks was divisible by the number of CPUs, that would not occur but 150 doesn't divide by 12.
The solution I came to was similar to those mentioned above. Here is the pseudo-code if anyone is interested:
N_proc = 12.0
Jobs = range(1,151)
SerialTime = sum(Jobs)
AverageTime = SerialTime / N_proc
while Jobs remaining:
for proc in range(0,N_proc):
if sum(proc) < AverageTime:
diff = AverageTime - sum(proc)
proc.append( max( Jobs <= diff ) )
Jobs.pop( max( Jobs <= diff ) )
else:
proc.append( min(Jobs) )
Jobs.pop( min(Jobs) )
This seemed to be the optimal method for me. I tried it on many different distributions of job run-times, and it seems to do a decent job of evenly distributing the work, so long as N_proc << N_jobs.
This is a slight modification from largest first, in that each processor first tries to avoid doing more than it's "fair share". If it must go over it's fair share, then it will attempt to stay near the fair answer by grabbing the smallest remaining task from the queue.