What is the difference between a job pid and a process id in *nix?

A job is a pipeline of processes. After I execute a command line, for example sleep 42 &, the terminal gives me some information like this:
[1] 31562
Is this 31562 the "job pid" of this job? Is it the same as the process ID of the sleep command?
And if I have a command with a pipe, more than one process is created. Is the job PID the same as the process ID of the first process in the pipeline?

A job is a pipeline of processes.
Not necessarily. Though most of the time a job is a pipeline of processes, it can also be a single command, or a set of commands separated by &&. For example, this would create a job with multiple processes that are not connected by a pipeline:
cat && ps u && ls -l && pwd &
Now, with that out of the way, let's get to the interesting stuff.
Is this 31562 the "job pid" of this job? Is it the same as the process ID of the sleep command?
The job identifier is given inside the square brackets. In this case, it's 1. That's the ID you'll use to bring it to the foreground and perform other administrative tasks. It's what identifies this job in the shell.
The number 31562 is the process group ID of the process group running the job. UNIX/Linux shells make use of process groups: a process group is a set of processes that are somehow related (often by a linear pipeline, but as mentioned before, not necessarily the case). At any moment, you can have 0 or more background process groups, and at most one foreground process group (the shell controls which groups are in the background and which is in the foreground with tcsetpgrp(3)).
A group of processes is identified by a process group ID, which is the ID of the process group leader. The process group leader is the process that first created and joined the group by calling setpgid(2). The exact process that does this depends on how the shell is implemented, but in bash, IIRC, it is the last process in the pipeline.
In any case, what the shell shows is the process group ID of the process group running the job (which, again, is really just the PID of the group leader).
Note that the group leader may have died in the past; the process group ID won't change. This means that the process group ID does not necessarily correspond to a live process.
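For illustration, a minimal bash sketch for seeing all three numbers at once (the commands are standard; the PIDs you will see are of course different from the ones in the question):
sleep 42 | sleep 43 &    # a background job made of a two-process pipeline
jobs -l                  # -l adds the PID of each process to the job listing
ps -o pid,pgid,comm      # both sleep processes show the same PGID: the PID of the group leader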

Related

TWS Composer command to extract only active jobs

I have a couple of queries regarding the Tivoli Workload Scheduler composer.
The command below extracts all jobstreams from a workstation irrespective of their status, i.e. both active and draft/inactive ones. How could I extract only the active jobstreams?
create jobstreams.txt from sched=WORKSTATION##
Similarly, I would need to extract only the jobs associated with active jobstreams.
create jobs.txt from jobs=WORKSTATION##

Running Jobs in Parallel for the same project and different branches

What do I need to change in order to run these jobs in parallel?
There is one more runner available on the server, but it's not picking up the "pending" job until the "running" one is finished.
UPDATE
The jobs are picked up by different runners (ci-runner-1 and ci-runner-2), but still sequentially, as shown in the screenshots.
The problem was that in config.toml (/etc/gitlab-runner/config.toml in my case) I had:
concurrent = 1
Changing this to 0 or a value greater than 1 and restarting gitlab-runner fixed it.
Reference:
https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-global-section
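For completeness, a minimal sketch of applying that change on the runner host (assuming a typical Linux install where the config lives at the path mentioned above):
sudo vi /etc/gitlab-runner/config.toml   # set the global option, e.g. concurrent = 2
sudo gitlab-runner restart               # restart so the new limit takes effect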

How to execute a Job Executor step X times

Introduction
To keep it simple, let's imagine a simple transformation.
This transformation gets an input of 4 rows from a Data Grid step.
The stream passes through a Job Executor step, referencing a simple job that contains a Write Log component.
Expectations
I would like the simple job to execute 4 times, which means 4 log messages.
Results
It turns out that the Job Executor step launches the simple job only once instead of 4 times: I only get one log message.
Hints
The documentation of the Job Executor component specifies the following:
By default the specified job will be executed once for each input row.
This is parametrized in the "Row grouping" tab, with the following field:
The number of rows to send to the job: after every X rows the job will be executed and these X rows will be passed to the job.
Answer
The step actually works correctly: an input of X rows executes the Job Executor step X times. The fact is that I wasn't able to see it in the logs.
To verify it, I added a simple transformation to the job called by the Job Executor, which writes into a text file. After checking this file, it was clear that the Job Executor had indeed been executed X times.
Research
Trying to understand why I didn't get X log messages after the X executions of the Job Executor, I added a "Wait for" entry to the initial simple job. Adding a two-second wait finally allowed me to see the X log messages appear during the execution.
Hope this helps because it's pretty tricky. Please feel free to provide further details.
A little late to the party, as a side note:
Pentaho is a set of programs (Spoon, Kettle, Chef, Pan, Kitchen). The engine is Kettle, and everything inside a transformation is started in parallel, which makes log retrieval a challenging task for Spoon (the UI). You don't actually need a Wait for entry; try outputting the logs into a file (specifying a log file on the Job Executor entry properties) and you'll see everything in place.
Sometimes we need to give Spoon a little bit of time to get everything in place. Personally, that's why I recommend not relying on Spoon's Execution Results logging tab; it is better to output the logs to a database or to files.
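As a related sketch, the same "send the log to a file" idea applies when the parent transformation is run from the command line with Pan instead of Spoon (the file paths here are placeholders, not from the original post):
sh pan.sh -file=/path/to/parent_transformation.ktr -level=Detailed -logfile=/tmp/parent.log
# the log file then contains the Write Log output of every execution of the inner job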

Calculating time of a job in HPC

I'm starting to use cloud resources.
In my project I need to run a job and then calculate the time between the start of its execution in the queue and its end.
To put the job in the queue, I used the command:
qsub myjob
How can I do this?
Thanks in advance
The simplest (although not the most accurate) way is to use the report from your queue system. If you use PBS (which is my first guess from your qsub command), you can insert these options in your script:
#PBS -m abe
#PBS -M your email
This sends you a notification at start (b), end (e), and abort (a). The end notification should include a resources_used.walltime entry with the wall time spent.
If you use another queue system, there must be some similar option in the manual.
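If you also want the job to record its own execution time, here is a minimal PBS script sketch (the workload line is a placeholder; this measures run time only, not the time spent waiting in the queue):
#!/bin/bash
#PBS -m abe
#PBS -M your_email_address
start=$(date +%s)      # epoch seconds when the job starts running on the node
./my_real_work         # placeholder for the actual workload
end=$(date +%s)
echo "Elapsed seconds: $((end - start))"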

Issue running the same Kettle job from two different scripts

Is it possible to run a Kettle job more than once at the same time?
What I am Trying
Say we are running this script twice at the same time:
sh kitchen.sh -rep="development" -dir="job_directory" -job="job1"
If I run it only once at a time, the data flow is perfectly fine.
But when I run this command twice at the same time, it throws an error like:
ERROR 09-01 13:34:13,295 - job1 - Error in step, asking everyone to stop because of:
ERROR 09-01 13:34:13,295 - job1 - org.pentaho.di.core.exception.KettleException:
java.lang.Exception: Return code 1 received from statement : mkfifo /tmp/fiforeg
Return code 1 received from statement : mkfifo /tmp/fiforeg
at org.pentaho.di.trans.steps.mysqlbulkloader.MySQLBulkLoader.execute(MySQLBulkLoader.java:140)
at org.pentaho.di.trans.steps.mysqlbulkloader.MySQLBulkLoader.processRow(MySQLBulkLoader.java:267)
at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.lang.Exception: Return code 1 received from statement : mkfifo /tmp/fiforeg
at org.pentaho.di.trans.steps.mysqlbulkloader.MySQLBulkLoader.execute(MySQLBulkLoader.java:95)
... 3 more
It's important to be able to run the job twice at the same time. To accomplish this, I could duplicate every job and run the original and the duplicate in parallel, but that's not a good approach in the long run!
Question:
Is Pentaho not maintaining threads?
Am I missing some option, or can I enable some option to make Pentaho create different threads for different job instances?
Of course Kettle maintains threads. A great many of them in fact. It looks like the problem is that the MySQL bulk loader uses a FIFO. You have two instances of a FIFO called /tmp/fiforeg. The first instance to run creates the FIFO just fine; the second then tries to create another instance with the same name and that results in an error.
At the start of the job, you need to generate a unique FIFO name for that instance. I think you can do this by adding a transformation at the start of the job that uses a Generate random value step to generate a random string or even a UUID and store it in a variable in the job via the Set variables step.
Then you can use this variable in the 'Fifo file' field of the MySQL bulk loader.
Hope that works for you. I don't use MySQL, so I have no way to make sure.
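A command-line variant of the same idea, as a sketch: if the job defines a named parameter (here called FIFO_FILE, an assumed name) and the bulk loader's 'Fifo file' field is set to ${FIFO_FILE}, each script can pass its own path:
sh kitchen.sh -rep="development" -dir="job_directory" -job="job1" -param:FIFO_FILE=/tmp/fiforeg_$$
# $$ is the calling shell's PID, so two scripts started at the same time get different FIFO names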