How can I have my pipeline ready for merge when using rules instead of when: manual? - gitlab-ci

I have a CI configuration which allows me to merge my MR when at least one job in each stage has succeeded, even if all the other manual jobs have not been executed.
I can do that by using the when: manual condition on jobs.
I replaced when: manual with a rules keyword that contains when: manual, and now my pipeline is blocked even if one job in each stage has succeeded.
I tried allow_failure: true and it allows me to merge my MR. The problem is that I can then merge without running any job... I want at least one job in each stage to be executed before merging is allowed.
Do you have any idea how to do this using rules?
Thank you!
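
For reference, here is a minimal sketch of the two configurations being compared (job and stage names are illustrative, not from the original post). Note that a plain when: manual job defaults to allow_failure: true, whereas a manual job defined via rules defaults to allow_failure: false, which matches the blocking behaviour described:

# Variant 1: the pipeline is not blocked by the un-run manual job
test-a:
  stage: test
  script: echo "manual job"
  when: manual            # implies allow_failure: true

# Variant 2: the same job expressed with rules; the pipeline stays blocked
test-b:
  stage: test
  script: echo "manual job"
  rules:
    - when: manual        # implies allow_failure: false unless set explicitly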

Related

How to force Nextflow process to recalculate and ignore cache in resumed workflow

I have a series of processes in a Nextflow pipeline, employing multiple heavy computing steps and database (SQL) insertions/fetches. I need to insert certain (intermediate) process results into the DB and fetch them later for further processing (within the same pipeline). In its most simplified form it will be something like:
process1 (fetch data from DB)
process2 (analyze process1.out)
process3 (inserts process2.out to DB)
The problem is that when any values are changed in the DB, the output from process1 is still cached (when using the -resume flag), so changes in the DB are not reflected here at all.
Is there any way to force process1 to be reprocessed while using -resume, ignoring the cache?
So far, I have been manually deleting the respective work folder, or adding a dummy line to process1, but that is an extremely inefficient solution.
Thanks for any help here.
Result caching is enabled by default, but this feature can be disabled using the cache directive by setting its value to false. For example:
process process1 {
    cache false
    ...
}
Not sure if we have the full picture here, but updating a database with some set of process results just to fetch them again later on seems wasteful. Or maybe I've just misunderstood. I would instead try to separate the heavy computational work (hours) from the database transactions (minutes) if at all possible. Note that if you need to make per process database transactions, you might be able to achieve this using the beforeScript and afterScript directives (which can be enabled/disabled using a nextflow.config profile, for example). For example, a beforeScript could be used to create a database object that is updated (using an afterScript) once the process has completed. Since both of these scripts are run from inside the workDir, you could use the basename of the current/working directory (i.e. the task UUID) as a key.
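
As a rough sketch of that idea (the profile name and the two helper scripts are hypothetical, not part of the original answer), the directives could be wired up in nextflow.config roughly like this:

// nextflow.config -- hypothetical profile that runs DB bookkeeping around process1
profiles {
    with_db {
        process {
            withName: 'process1' {
                // both scripts run inside the task workDir, so its basename can serve as a per-task key
                beforeScript = 'register_task.sh "$(basename "$PWD")"'
                afterScript  = 'finalize_task.sh "$(basename "$PWD")"'
            }
        }
    }
}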

Running Jobs in Parallel for same project and different branches

What do I need to change in order to run these jobs in parallel?
There is one more runner available on the server, but it's not picking up the "pending" job until the "running" one is finished.
UPDATE
The jobs are picked up by different runners, but in sequential mode. See ci-runner-1 and ci-runner-2.
See screenshots
The problem was that in config.toml (/etc/gitlab-runner/config.toml in my case) I had:
concurrent = 1
I changed this to 0 or a value greater than 1, restarted gitlab-runner, and all was good.
Reference:
https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-global-section
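
For illustration, the relevant part of /etc/gitlab-runner/config.toml would look something like this (4 is just an example value):

# /etc/gitlab-runner/config.toml -- global section
concurrent = 4   # maximum number of jobs running concurrently across all runners

After editing, apply the change with gitlab-runner restart.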

Subsequent jobs in gitlab-ci

Is there any way to run jobs from the same stage in sequential order? I've tried to do it with dependencies:
job1:
  stage: deploy
  ...
job2:
  stage: deploy
  dependencies:
    - job1
but it gives me an error "dependency job1 is not defined in prior stages".
Is there any workaround?
No. This is not possible by design. You will have to define more stages.
As the stages docs describe:
Jobs of the same stage are run in parallel.
It might become possible at some point in the future (as of January 2021). Progress is being tracked here
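
A minimal sketch of the workaround this answer suggests, splitting the work across two stages (stage and job names are just illustrative):

stages:
  - deploy-step-1
  - deploy-step-2

job1:
  stage: deploy-step-1
  script: echo "runs first"

job2:
  stage: deploy-step-2
  script: echo "runs after job1 has finished"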
You might have found the answer by now, but I am still answering for future audiences who come to this post when facing a similar issue.
The error itself says "dependency job1 is not defined in prior stages"; in your example both jobs are in the same stage, i.e. stage: deploy.
That is the reason it is not picking up the dependencies rule. Also, with newer GitLab versions, the needs clause can be used now:
Job1:
  stage: A
Job2:
  stage: B
  needs: ["Job1"]
This way, Job2 depends on Job1.

GitLab-CI: run job only when all conditions are met

In the GitLab-CI documentation, I read the following:
In this example, job will run only for refs that are tagged, or if a build is explicitly requested via an API trigger or a Pipeline Schedule:
job:
  # use special keywords
  only:
    - tags
    - triggers
    - schedules
I noticed the documentation uses or instead of and, which means the job runs when any one of the conditions is met. But what if I want to configure a job to run only when all conditions are met, for instance in a Pipeline Schedule and on the master branch?
If your specific question is "how do I only run a pipeline on master when it was scheduled", this should work:
job:
  only:
    - master
  except:
    - triggers
    - pushes
    - external
    - api
    - web
In this example you exclude all trigger sources except schedules, and only run for the master branch.
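
On newer GitLab versions, the same AND condition can also be expressed with rules instead of only/except; a hedged sketch using predefined CI variables:

job:
  script: echo "scheduled run on master"
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule" && $CI_COMMIT_BRANCH == "master"'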

Running same Kettle Job from two different scripts Issue

Is it possible to run a kettle job simultaneously more than once at the same time?
What I am Trying
Say we run this command twice at the same time:
sh kitchen.sh -rep="development" -dir="job_directory" -job="job1"
If I run it only once at a time, the data flow is perfectly fine.
But when I run this command twice at the same time, it throws an error like:
ERROR 09-01 13:34:13,295 - job1 - Error in step, asking everyone to stop because of:
ERROR 09-01 13:34:13,295 - job1 - org.pentaho.di.core.exception.KettleException:
java.lang.Exception: Return code 1 received from statement : mkfifo /tmp/fiforeg
Return code 1 received from statement : mkfifo /tmp/fiforeg
at org.pentaho.di.trans.steps.mysqlbulkloader.MySQLBulkLoader.execute(MySQLBulkLoader.java:140)
at org.pentaho.di.trans.steps.mysqlbulkloader.MySQLBulkLoader.processRow(MySQLBulkLoader.java:267)
at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.lang.Exception: Return code 1 received from statement : mkfifo /tmp/fiforeg
at org.pentaho.di.trans.steps.mysqlbulkloader.MySQLBulkLoader.execute(MySQLBulkLoader.java:95)
... 3 more
It's important to be able to run the job twice simultaneously. To accomplish this, I could duplicate every job and run the original and the duplicate at the same time, but that is not a good approach in the long run!
Question:
Is Pentaho not maintaining threads?
Am I missing some option, or can I enable some option to make Pentaho create different threads for different job instances?
Of course Kettle maintains threads. A great many of them in fact. It looks like the problem is that the MySQL bulk loader uses a FIFO. You have two instances of a FIFO called /tmp/fiforeg. The first instance to run creates the FIFO just fine; the second then tries to create another instance with the same name and that results in an error.
At the start of the job, you need to generate a unique FIFO name for that instance. I think you can do this by adding a transformation at the start of the job that uses a Generate random value step to generate a random string or even a UUID and store it in a variable in the job via the Set variables step.
Then you can use this variable in the 'Fifo file' field of the MySQL bulk loader.
Hope that works for you. I don't use MySQL, so I have no way to make sure.
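
A different, hedged alternative to generating the name inside the job would be to pass a unique path in from the shell, assuming the job defines a named parameter (here called FIFO_FILE, which is hypothetical) and the bulk loader's 'Fifo file' field references ${FIFO_FILE}:

# FIFO_FILE is a hypothetical named parameter defined in job1 and used in the
# MySQL bulk loader's 'Fifo file' field as ${FIFO_FILE}
sh kitchen.sh -rep="development" -dir="job_directory" -job="job1" \
  "-param:FIFO_FILE=/tmp/fiforeg_$$"   # $$ = PID of the calling shell, unique per invocation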