Subsequent jobs in gitlab-ci

Is there any way to run jobs from the same stage in sequential order? I've tried to do it with dependencies:
job1:
  stage: deploy
...
job2:
  stage: deploy
  dependencies:
    - job1
but it gives me an error "dependency job1 is not defined in prior stages".
Is there any workaround?

No. This is not possible by design. You will have to define more stages.
As the stages docs describe:
Jobs of the same stage are run in parallel.

It might become possible at some point in the future (as of January 2021). Progress is being tracked here
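In the meantime, a minimal sketch (job and stage names are hypothetical) of serializing the two deploy jobs by giving each its own stage:
stages:
  - deploy
  - post_deploy

job1:
  stage: deploy
  script:
    - echo "runs first"

job2:
  stage: post_deploy
  script:
    - echo "runs only after job1's stage has finished"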

You might have found the answer by now, but I'm still answering for future readers who run into a similar issue.
The error itself says "dependency job1 is not defined in prior stages"; in your example both jobs are in the same stage, i.e. stage: deploy.
That is why the dependencies rule is not being picked up. Also, with newer GitLab versions, the needs keyword can be used:
Job1:
  stage: A
Job2:
  stage: B
  needs: ["Job1"]
This way, Job2 will depend on Job1.

Related

How can I have my pipeline ready for merge when using rules instead of when: manual?

I have a CI setup that allows me to merge my MR when at least one job in each stage has succeeded, even if all the other manual jobs are not executed.
I can do that by using the when: manual condition on jobs.
I replaced when: manual with a rules keyword that contains when: manual, and now my pipeline is blocked even if one job of each stage has succeeded.
I tried allow_failure: true and it allows me to merge my MR. The problem is that I can then merge without running any job... I want at least one job of each stage to be executed before merging is allowed.
Do you have any idea how to do this using rules?
Thank you!
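For context, here is a minimal sketch of the two variants the question contrasts (job name, stage, and rule are hypothetical). One likely factor is that when: manual inside rules defaults allow_failure to false, whereas a top-level when: manual defaults it to true, which is why the rules version shows the pipeline as blocked:
# Top-level when: manual -- allow_failure defaults to true, pipeline is not blocked
optional_job:
  stage: test
  when: manual
  script:
    - ./run_optional_checks.sh

# Same intent expressed with rules -- when: manual here defaults allow_failure to false,
# so the pipeline stays blocked until the job is run
optional_job:
  stage: test
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: manual
  script:
    - ./run_optional_checks.sh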

Running Jobs in Parallel for same project and different branches

What do I need to change in order to run these jobs in parallel?
There is one more runner available on the server, but it's not picking up the "pending" job until the "running" one is finished.
UPDATE
The jobs are picked up by different runners, but sequentially. See ci-runner-1 and ci-runner-2 in the screenshots.
The problem was that in config.toml (/etc/gitlab-runner/config.toml in my case) I had:
concurrent = 1
Changing this to 0 or a value > 1 and restarting gitlab-runner fixed it.
Reference:
https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-global-section
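For reference, a minimal sketch of the relevant global setting (the value 4 is only an example):
# /etc/gitlab-runner/config.toml
concurrent = 4   # upper limit on jobs running at the same time across all runners on this host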

GitLab-CI: run job only when all conditions are met

In the GitLab CI documentation, I read the following:
In this example, job will run only for refs that are tagged, or if a build is explicitly requested via an API trigger or a Pipeline Schedule:
job:
  # use special keywords
  only:
    - tags
    - triggers
    - schedules
I noticed the document uses or instead of and, which means the job is run when either one condition is met. But what if I want to configure a job to only run when all conditions are met, for instance, in a Pipeline Schedule and on master branch?
If your specific question is "how do I only run a pipeline on master when it was scheduled?", this should work:
job:
  only:
    - master
  except:
    - triggers
    - pushes
    - external
    - api
    - web
In this example you exclude all pipeline sources except schedules and only run for the master branch.
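As a side note beyond the original answer, newer GitLab versions can express the AND condition directly with rules; a minimal sketch:
job:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule" && $CI_COMMIT_BRANCH == "master"'
  script:
    - echo "scheduled pipeline on master"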

Running same Kettle Job from two different scripts Issue

Is it possible to run a Kettle job more than once simultaneously?
What I am Trying
Say we are running this script twice at the same time:
sh kitchen.sh -rep="development" -dir="job_directory" -job="job1"
If I run it only once at a time, the data flow is perfectly fine.
But when I run this command twice at the same time, it throws an error like:
ERROR 09-01 13:34:13,295 - job1 - Error in step, asking everyone to stop because of:
ERROR 09-01 13:34:13,295 - job1 - org.pentaho.di.core.exception.KettleException:
java.lang.Exception: Return code 1 received from statement : mkfifo /tmp/fiforeg
Return code 1 received from statement : mkfifo /tmp/fiforeg
at org.pentaho.di.trans.steps.mysqlbulkloader.MySQLBulkLoader.execute(MySQLBulkLoader.java:140)
at org.pentaho.di.trans.steps.mysqlbulkloader.MySQLBulkLoader.processRow(MySQLBulkLoader.java:267)
at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.lang.Exception: Return code 1 received from statement : mkfifo /tmp/fiforeg
at org.pentaho.di.trans.steps.mysqlbulkloader.MySQLBulkLoader.execute(MySQLBulkLoader.java:95)
... 3 more
It's important to be able to run the job twice simultaneously. To accomplish this, I could duplicate every job and run the original and the duplicate at the same time, but that is not a good approach in the long run!
Question:
Is Pentaho not maintaining threads?
Am I missing some option, or can I enable some option to make pentaho create different threads for different job instances?
Of course Kettle maintains threads. A great many of them in fact. It looks like the problem is that the MySQL bulk loader uses a FIFO. You have two instances of a FIFO called /tmp/fiforeg. The first instance to run creates the FIFO just fine; the second then tries to create another instance with the same name and that results in an error.
At the start of the job, you need to generate a unique FIFO name for that instance. I think you can do this by adding a transformation at the start of the job that uses a Generate random value step to generate a random string or even a UUID and store it in a variable in the job via the Set variables step.
Then you can use this variable in the 'Fifo file' field of the MySQL bulk loader.
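For illustration (the variable name FIFO_SUFFIX is hypothetical), the 'Fifo file' field of the MySQL bulk loader step could then reference that job variable using Kettle's ${...} substitution syntax:
Fifo file: /tmp/fiforeg_${FIFO_SUFFIX}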
Hope that works for you. I don't use MySQL, so I have no way to make sure.

Pig step execution details

I am a newbie to Pig.
I have written a small script in Pig where I first load the data from two different tables and then right-outer-join them; later I also join the result with two other sets of data. It works fine, but I want to see the steps of execution: in which step my data is loaded, so that I can note the time needed for loading, and details of the join step, such as how much time it takes for that many records to be joined.
Basically, I want to know which part of my Pig script is taking longer to run so that I can further optimize it.
Is there any way we could print within the script and find out which steps have been executed and which have just started executing?
Through the JobTracker details link I could not get much info; I could only see that a mapper or a reducer is running, but ideally I would like to know which part of the script that mapper corresponds to, and I could not find that.
For example, for a Hive job we can see in the JobTracker details link which step is currently being executed.
Any information will be really helpful.
Thanks in advance.
I'd suggest you have a look at the following:
Pig's Progress Notification Listener
Penny: this is a monitoring tool, but I'm afraid it hasn't been updated recently (e.g. it won't compile for Pig 0.12.0 unless you make some code changes)
Twitter's Ambrose project: https://github.com/twitter/ambrose
On the other hand, after executing the script you can see detailed statistics about the execution time of each alias (see: Job Stats (time in seconds)).
Have a look at the EXPLAIN operator. This doesn't give you real-time stats as your code is executing, but it should give you enough information about the MapReduce plan your script generates that you'll be able to match up the MR jobs with the steps in your script.
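For illustration, a minimal Pig Latin sketch (relation and file names are hypothetical) of running EXPLAIN on a joined alias like the one described in the question:
a = LOAD 'table_a' USING PigStorage(',') AS (id:int, name:chararray);
b = LOAD 'table_b' USING PigStorage(',') AS (id:int, amount:double);
joined = JOIN a BY id RIGHT OUTER, b BY id;
-- prints the logical, physical and MapReduce plans this alias compiles to
EXPLAIN joined;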
Also, while your script is running you can inspect the configuration of the Hadoop job. Look at the variables "pig.alias" and "pig.job.feature". These tell you, respectively, which of your aliases (tables/relations) is involved in that job and what Pig operations are being used (e.g., HASH_JOIN for a JOIN step, SAMPLER or ORDER BY for an ORDER BY step, and so on). This information is also available in the job stats that are output to the console upon completion.