Can I create one box job inside another box job in Autosys by JIL - automation

I want to write a JIL file with one major box, another box job nested inside it, and then a command job inside that inner box.
I haven't been able to find the syntax for this, and I also don't know whether it is even possible to define one box job inside another box job.

Consider the following names:
Major_Box - the parent box
Child_Box - the child box
Job_A - CMD job under the child box
Sample JIL:
insert_job: Major_Box
job_type: BOX
owner: xyz
date_conditions: 1
days_of_week: all
start_times: "06:00"
timezone: IST

insert_job: Child_Box
job_type: BOX
box_name: Major_Box
owner: xyz
date_conditions: 1
days_of_week: all
start_times: "06:15"
timezone: IST

insert_job: Job_A
job_type: CMD
box_name: Child_Box
command: bash /absolute_path/script_name.sh
owner: xyz
machine: hostname
date_conditions: 1
days_of_week: all
start_times: "06:30"
timezone: IST
Here,
Major_Box is triggered at 06:00; it activates Child_Box but does not start it yet.
At 06:15 the activated Child_Box starts because its time condition is met, which in turn activates Job_A.
At 06:30 the activated Job_A starts and runs the command as specified.
You can also make a job or box depend on a preceding job/box with the condition attribute:
condition: success(Another_Job)
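A minimal sketch of attaching such a dependency to the existing Job_A definition (Another_Job is a placeholder job name; update_job modifies a job that has already been inserted):

/* Job_A will start only after Another_Job completes successfully */
update_job: Job_A
condition: success(Another_Job)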
Hope this helps. Let me know if you need more.
Good Luck

replace the kscreenlocker_greet

Sorry for my English.
Here is what I need.
If you open the menu Start -> Shutdown -> Lock, you will see the lock screen.
Now press CTRL + ALT + F2 and log in to that terminal as the root user.
Then look at the running processes with ps -eH | less.
You will see the process tree and who started whom, and you can find the following:
ksmserver -> kscreenlocker_greet
If you look at how kscreenlocker_greet was launched, using ps aux | grep greet, you will see the following:
/usr/libexec/kf5/kscreenlocker_greet --immediateLock --ksldfd
Is it possible somewhere (I hope it is not hard-coded) to replace this command with another command or application to launch? If so, where can it be found? This is very important to me!
I would like to replace this command call with another application.
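For reference, the inspection steps described above could be reproduced from that root terminal roughly like this (a sketch assuming a KDE Plasma 5 session; the [k] in the grep pattern only keeps grep from matching itself):

# Show the full process tree to see who started the lock screen
ps -eH | less

# Show the exact command line kscreenlocker_greet was started with
ps aux | grep '[k]screenlocker_greet'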

SLURM releasing resources using scontrol update results in unknown endtime

I have a program that dynamically releases resources during job execution, using the command:
scontrol update JobId=$SLURM_JOB_ID NodeList=${remaininghosts}
However, this sometimes results in some very strange behavior where the job is re-queued. Below is the output of sacct:
sacct -j 1448590
JobID      NNodes  State      Start     End       NodeList
1448590    4       RESIZING   20:47:28  01:04:22  [0812,0827],[0663-0664]
1448590.0  4       COMPLETED  20:47:30  20:47:30  [0812,0827],[0663-0664]
1448590.1  4       RESIZING   20:47:30  01:04:22  [0812,0827],[0663-0664]
1448590    3       RESIZING   01:04:22  01:06:42  [0812,0827],0663
1448590    2       RESIZING   01:06:42  1:12:42   0827,tnxt-0663
1448590    4       COMPLETED  05:33:15  Unknown   0805-0807,0809]
The first lines show that everything works fine and the nodes are being released, but the last line shows a completely different set of nodes with an unknown end time. The Slurm logs show the job got requeued:
requeue JobID=1448590 State=0x8000 NodeCnt=1 due to node failure.
I suspect this might happen because the head node is killed, but the Slurm documentation doesn't say anything about that.
Does anybody have an idea or suggestion?
Thanks
In this post there was a discussion about resizing jobs.
In your particular case, for shrinking I would proceed as follows.
Assuming that j1 has been submitted with:
$ salloc -N4 bash
Update j1 to the new size:
$ scontrol update jobid=$SLURM_JOBID NumNodes=2
$ scontrol update jobid=$SLURM_JOBID NumNodes=ALL
And update the environment variables of j1 (the script is created by the previous commands; the braces are needed so the shell expands $SLURM_JOBID correctly):
$ ./slurm_job_${SLURM_JOBID}_resize.sh
Now, j1 has 2 nodes.
In your example, your "remaininghosts" list, as you say, may exclude the head node, which Slurm needs in order to shrink the job. If you provide a node count instead of a node list, the resize should work.
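A minimal sketch of shrinking by node count instead, inside the running job (the target of 2 nodes is only an example; the resize helper script name follows the pattern used above):

# Ask Slurm to shrink the job to 2 nodes and decide itself which nodes to release
scontrol update JobId=$SLURM_JOB_ID NumNodes=2

# Refresh the SLURM_* environment variables from the helper script Slurm generates
./slurm_job_${SLURM_JOB_ID}_resize.sh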

Monit for "cron-like" tasks

I have some batch-type jobs that I would like to move from cron to Monit, but I am struggling to get them to work properly. These scripts typically run once a day, but occasionally have to be re-run later in the day. The goal is to take advantage of the monit & m/monit front-ends to re-run them, as well as to be alerted on failure in a similar fashion to other things under monit.
The below was my first attempt. I know the docs say to use a range/wildcard for the minute field, but I have my monit daemon set to cycle every 20 seconds, so I thought I would be able to get away with this.
check program test.sh
    with path "/usr/local/bin/test.sh"
    every "0 7 * * *"
    if status != 0 then alert
This does not seem to work, as it appears to pick up the exit status of the program on the NEXT run. So I have a zombie process sitting around until 7am the next day, at which point I will see the status from the previous day's run.
It would be nice if this ran immediately, or if there were a way to schedule something as "batch" that would only run once when started (either from the command line or the web GUI). Example below.
check program test.sh
    with path "/usr/local/bin/test.sh"
    mode batch
    if status != 0 then alert
Is it possible to do what I want? Can a 'check program' be scheduled to run only once when started, or by using the 'every [cron]' type syntax supported by monit?
TIA for any suggestions.
The latest version of monit (5.18) now picks up the exit status on the next daemon cycle, not on the next execution of the program like in the past (which might not be until the next day).
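With 5.18 or later, a cron-style check like the one in the question should therefore behave as intended; a minimal sketch (the path and schedule are simply the ones from the question):

check program test.sh
    with path "/usr/local/bin/test.sh"
    every "0 7 * * *"            # run once a day at 07:00
    if status != 0 then alert    # exit status is evaluated on the following daemon cycle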

Find UID from EUID

I have an AIX 5.3 host that we log in to, and when needed we use the pbrun tool to become root. The question is: from the command line, how do I find which user I logged in as before becoming this privileged/root user? In other words, if I am not wrong, how do I find the UID from my current EUID? I tried whoami and who am i; both give the output root.
"who am i" is coming from utmp. If utmp shows you as root then your pbrun tool must be changing it from what it was when you first logged in.
You could do:
ps l $$
which prints out a line with the PID and PPID. Take the PPID and do that again:
ps l <PPID>
The UID column is your numeric user id. If the PPID shows as 1, then pbrun did an exec rather than a fork / exec (which implies that it is a function or alias within your shell). In that case, you could fall back to "last", which will show who logged in to which tty at what time.
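A minimal sketch of walking up the process tree this way (the POSIX-style ps -o ppid= option is an assumption about what your AIX 5.3 ps supports; check its man page):

# Print each ancestor of the current shell until we reach PID 1;
# the UID column near the top of the chain shows who originally logged in.
pid=$$
while [ "$pid" -gt 1 ]; do
    ps l "$pid"
    pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
done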
======
Another idea. You can get the terminal the program is executing on via ps. This is called the controlling terminal. You can also get it via the "tty" command:
tty
/dev/pts/18
Now, feed that to "last" but remove the leading /dev/ part and take the first hit:
last pts/18 | head -1
myname pts/18 myhost.mydomain.com Nov 14 10:22 still logged in.
That is the last person to log into that particular terminal. Will that work?
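Combining the two steps into one command could look like this (the sed expression is just one way to strip the leading /dev/):

last "$(tty | sed 's|^/dev/||')" | head -1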

Sun Grid Engine resubmit job stuck in 'Rq' state

I have what I hope is a pretty simple question, but I'm not super familiar with Sun Grid Engine, so I've been having trouble finding the answer. I am currently submitting jobs to a grid using a bash submission script that generates a command and then executes it. I have read online that if a Sun Grid Engine job exits with a code of 99, it gets re-submitted to the grid. I have successfully written my bash script to do this:
# [code to generate command, stores in $command]
$command
STATUS=$?
if [[ $STATUS -ne 0 ]]; then
    exit 99
fi
exit 0
When I submit this job to the grid with a command that I know has a non-zero exit status, the job does indeed appear to be resubmitted; however, the scheduler never sends it to another host. Instead, it just remains stuck in the queue with the status "Rq":
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
2150015 0.55500 GridJob.sh my_user Rq 04/08/2013 17:49:00 1
I have a feeling that this is something simple in the config options for the queue, but I haven't been able to find anything by googling. I've tried submitting this job with the qsub -r y option, but that doesn't seem to change anything.
Thanks!
Rescheduled jobs will only get run in queues that have their rerun attribute (FALSE by default) set to TRUE, so check your queue configuration (qconf -mq myqueue). Without this, your job remains in the rescheduled-pending state indefinitely because it has nowhere to go.
IIRC, submitting jobs with qsub -r yes only qualifies them for automatic rescheduling in the event of an exec node crash, and exiting with status 99 should trigger a reschedule regardless.
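A quick way to inspect and change that setting might be the following (all.q is a placeholder; substitute your queue name):

# Show the current rerun setting of the queue
qconf -sq all.q | grep rerun

# Open the queue configuration in an editor and change "rerun FALSE" to "rerun TRUE"
qconf -mq all.q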