How to make monit start processes in order? - monit

In the monit config file, we have a list of processes we expect monit to check for. Each one looks like:
check process process_name_here
with pidfile /path/to/file.pid
start program = "/bin/bash ..."
stop program = "/bin/bash ..."
if totalmem is greater than X MB for Y cycles then alert
if N restarts within X cycles then alert
group group_name
Since we have about 30-40 processes in this list that we monitor, I have two questions:
1) If we restart the services (kill them all), can we have monit start all processes at the same time instead of the way it is done now (sequentially, one by one)?
2) Can we specify the order in which we would like the processes to start? How is the order determined? Is it the order in which they appear in the conf file? Is it by process name? Something else? This is especially important if #1 above is not possible...

You can use the depends on syntax. I use this for custom Varnish builds.
For example, take process a, process b, and process c. Process a needs to start first, followed by b and then c.
Your first process won't depend on anything. In your check for process b, you'll want:
depends on process a
Then in your process c check, you'll want:
depends on process b
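Putting it together, the three checks might look something like the sketch below (the pid file paths and start/stop commands are placeholders, and note that depends on takes the service name used in the corresponding check statement):
check process a with pidfile /var/run/a.pid
    start program = "/bin/bash -c '...'"
    stop program = "/bin/bash -c '...'"
check process b with pidfile /var/run/b.pid
    start program = "/bin/bash -c '...'"
    stop program = "/bin/bash -c '...'"
    depends on a
check process c with pidfile /var/run/c.pid
    start program = "/bin/bash -c '...'"
    stop program = "/bin/bash -c '...'"
    depends on b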
This should make sure that the processes are started in the correct order. Let me know if this works for you.

Going only by documentation, there is nothing related to point one other than the fact that monit runs single-threaded.
As for point two, under "SERVICE POLL TIME":
Checks are performed in the same order as they are written in the .monitrc file, except if dependencies are setup between services, in which case the services hierarchy may alternate the order of the checks.
Note that if you have an include string that matches multiple files, they are included in no specific order.
If you require a specific order you should use DEPENDS where possible.

How to make one node as owner of particular entry in infinispan distribution mode?

I am currently working on a project with Infinispan 8.1.3. I want to make sure that the node which created an object remains the owner of that entry at all times in distribution mode. Is there any option to meet this requirement? I have heard of the LOCAL_MODE flag, but it stores the entry locally only. If that node goes down, will the local cache entry be shared with another node? Thanks.
Don't use flags unless you know exactly what you're doing. Flag.CACHE_MODE_LOCAL means that you won't execute any RPC when doing that operation, but if the key does not route to this node, a write will result in a no-op and a read will return null.
It's not possible to tie the entry to the node exclusively - what would you do if this node crashes?
However, if the cluster is stable enough, there's the Key Affinity Service that will give you a key that belongs to this node. See the next chapter about grouping, too; it might fit your use case.
EDIT: Instead of moving data to the executing node, you can move the execution towards the data. With the Grouping API you can find the node that owns the data for a group, using:
// find the primary owner of the group
Address owningNode = cache.getAdvancedCache().getDistributionManager()
        .getCacheTopology().getDistributionInfo(group).primary();
// build an executor that targets only that node
ClusterExecutor executor = cache.getCacheManager().executor()
        .filterTargets(Collections.singleton(owningNode));
// submit the task to run where the data lives
executor.submit(...)

Error on Write operation (code 22) after calling Truncate - C# client

When I try to use Aerospike client Write() I obtain this error:
22 AS_PROTO_RESULT_FAIL_FORBIDDEN
The error occurs only when the Write operation is called after a Truncate() and only on specific keys.
I tried to:
change the key type (string, long, small numbers, big numbers)
change the Key type passed (Value, long, string)
change the retries number on WritePolicy
add a delay (200ms, 500ms) before every write
generate completely new keys (GUID.NewGuid().ToString())
None of these solved the case, so I think the only cause is the Truncate operation.
The error is systematic; for the same set of keys, it fails on exactly the same keys.
The error also occurs when, after calling Truncate, I wait X seconds and the Management Console shows the object count on the set as 0.
I have to wait minutes (1 to 5) before rerunning the process to be sure the problem is gone.
The cluster has 3 nodes with a replication factor of 2 and SSD persistence.
I'm using the NuGet C# Aerospike.Client v 3.4.4
Running the process on a single local node (docker, in memory) does not give any error.
How can I know when the Truncate() process (the delete operation behind it) is completely terminated so that I can safely use the set?
[Solution]
As suggested, our devops checked the time synchronization. He found that NTP was not enabled on the machine images (by mistake).
Enabled it. Tested again. No more errors.
Thanks,
Alex
Sounds like a potential issue with time synchronization across nodes; make sure you have NTP set up correctly. That would be my only guess at this point, especially as you mention it does work on a single node. The truncate command captures the current time (if you don't specify a time) and uses it to prevent records written 'prior' to that time from being written. Check under /opt/aerospike/smd/truncate.smd (from the top of my head, sorry if that is not exactly the path) to see the timestamp of the truncate command on each node, and compare the clocks across the different nodes.
[Thanks #kporter for the comment. So the time would be the same in every truncate.smd file, but a clock discrepancy between machines would then still cause writes to fail against some of the nodes.]

MPI RMA: Order of multiple Lock operations

I was wondering what the possible orders of execution are when locking multiple RMA windows with MPI. Here's (Python) pseudocode of what I want to do:
for win in windows:
    win.Lock(0, LOCK_TYPE)
for win in windows:
    win.Put(something)
for win in windows:
    win.Unlock(0)
print 'Done'
My question is: what are the possible orders of execution of these commands?
The desired behavior is that it first acquires all the locks, then puts the something into each window, and finally releases the locks. However, since MPI locking is non-blocking (it only ensures that the RMA operations will be done once the lock is acquired), is it possible that we lock, put, and unlock the second window all before acquiring the lock on the first one?
Finally, is it possible for the print statement to be executed before the Put operations? What if it were moved right after the Put(something) line?
Thank you very much for your help,
Seba-1511
So, after a bit more research it would seem that yes, it is possible that we lock, put, and unlock on the second window, before doing it on the first one. As such, it is possible that process 1 puts in window 3 and that this put is overwritten by process 2 while process 1 is putting in window 2.
What Lock and Unlock do is accumulate all the operations on an RMA window; if you are using an EXCLUSIVE_LOCK, they will be performed atomically on the target. If you are using a SHARED_LOCK, operations from other processes can be executed in between. Note that the operations between a Lock/Unlock pair are non-blocking, and thus their execution order is not guaranteed.
print would always appear last since, as underlined by #Jeff, Unlock can be considered blocking: it does not return until the RMA operations on that window have completed.
Source: https://www.youtube.com/watch?v=HVzVvg__UK4 at around 1:05:00.
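To make the blocking behaviour of Unlock concrete, here is a minimal mpi4py sketch under assumed conditions (three one-element windows, rank 1 writing to rank 0; the buffers, data, and rank roles are illustrative, not taken from the question):
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# every process exposes one small buffer per window
buffers = [np.zeros(1, dtype='i') for _ in range(3)]
windows = [MPI.Win.Create(buf, comm=comm) for buf in buffers]

if rank == 1:
    data = np.ones(1, dtype='i')
    for win in windows:
        win.Lock(0, MPI.LOCK_EXCLUSIVE)   # may return before the lock is actually granted
    for win in windows:
        win.Put(data, 0)                  # non-blocking: the transfer is only queued here
    for win in windows:
        win.Unlock(0)                     # blocks until the Put on this window has completed
    print('Done')                         # safe: every Unlock above has returned

for win in windows:
    win.Free()
Because each Unlock only completes the operations on its own window, the three windows can still be updated at the target in any relative order; only the final print is guaranteed to come after all of them.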

How do I do the Delayed::Job equivalent of Process#waitall?

I have a large task that proceeds in several major steps: Step A must complete before Step B can be started, etc. But each major step can be divided up across multiple processes, in my case, using Delayed::Job.
The question: Is there a simple technique for starting Step B only after all the processes have completed working on Step A?
Note 1: I don't know a priori how many external workers have been spun up, so keeping a reference count of completed workers won't help.
Note 2: I'd prefer not to create a worker whose sole job is to busy wait for the other jobs to complete. Heroku workers cost money!
Note 3: I've considered having each worker examine the Delayed::Job queue in the after callback to decide if it's the last one working on Step A, in which case it could initiate Step B. This could work, but seems potentially fraught with gotchas. (In the absence of better answers, this is the approach I'm going with.)
I think it really depends on the specifics of what you are doing, but you could set priority levels such that any jobs from Step A run first. Depending on the specifics, that might be enough. From the GitHub page:
By default all jobs are scheduled with priority = 0, which is top priority. You can change this by setting Delayed::Worker.default_priority to something else. Lower numbers have higher priority.
So if you set Step A to run at priority = 0, and Step B to run at priority = 100, nothing in Step B will run until Step A is complete.
There are some cases where this will be problematic -- in particular, if you have a lot of jobs and are running a lot of workers, you will probably have some workers running Step B before the work in Step A is finished. Ideally in this setup, Step B has some sort of check to verify whether it can run or not.

Changing real-time process priority in Linux?

My query is regarding engineering the priority value of a process. In my system, process A is running in RR at priority 83. Now I have another process B in RR, and I want B's priority to be higher than A's (i.e. I want B to always be scheduled in preference to A).
To do this, what value should I choose for B? I have read in the code that there is a penalty/bonus of 5 depending upon a process's history.
Also, if I choose the value 84 or 85, is there any chance that in some situations my process is ignored?
Please help in engineering this value.
Now I've got it. Real-time tasks (FF/RR) are not governed by penalty/bonus rules. With the O(1) scheduler, the task with the higher priority will be chosen. In my case, process B will be scheduled if its priority is greater than process A's.
Penalty/bonus is for SCHED_OTHER/SCHED_NORMAL.
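As a minimal sketch (assuming a Linux box and Python 3.3+, run as root or with CAP_SYS_NICE; the value 84 is just an example one step above A's 83), the switch to SCHED_RR can be done through the os module:
import os

# move the calling process to SCHED_RR at real-time priority 84,
# i.e. one step above process A's priority of 83
os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(84))

# confirm the policy and priority actually took effect
assert os.sched_getscheduler(0) == os.SCHED_RR
print(os.sched_getparam(0))   # sched_param(sched_priority=84)
The chrt utility does the same from the command line; the point is simply that for SCHED_RR the numeric priority alone decides which of the two tasks runs.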