How to run a GitLab pipeline with a desired title? - gitlab-ci

Currently, the GitLab pipelines I execute show the last commit message as their description. But when I need to search for some specific pipeline, it becomes time consuming. Can we pass some title to a pipeline before running it? If so, could you please share a sample example?
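For reference, recent GitLab versions (15.5 and later) support naming pipelines via the workflow:name keyword in .gitlab-ci.yml. A minimal sketch, assuming a PIPELINE_TITLE variable that you supply when starting the pipeline (via the "Run pipeline" UI, the trigger API, or a schedule):

workflow:
  name: '$PIPELINE_TITLE'

variables:
  PIPELINE_TITLE: 'default pipeline title'  # placeholder default, overridden at run time

With this in place, the pipeline list shows the supplied name instead of the commit message, which makes searching for a specific run much easier.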

Related

Get the name of the source branch after an MR is merged in GitlabCI

I have a Pipeline job that needs to run only after an MR has been merged to a certain branch (let’s assume it’s master).
This job is supposed to make an API call to send the name of the merged source branch.
The problem I’m encountering is that CI_MERGE_REQUEST_SOURCE_BRANCH_NAME will not be available on the Pipeline that runs right after the merge (since it’s not a merge request pipeline).
Is there a way (env var) to tell what was the branch that was just merged into master?
Many thanks in advance y’all!
Better to work with hashes than with names.
Anyway, I see two options:
In the pipeline for the merge request, save the hash/name in an artifact. The subsequent pipeline can access this artifact and read the hash/name (see the sketch after this list).
Run an independent pipeline and read the hash/name from the previous pipeline. To make this more secure, you can add tags and read the previous pipeline only if it has the correct tag.
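A minimal sketch of the first option (the job and file names are placeholders):

record-branch:
  stage: build
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
  script:
    - echo "$CI_MERGE_REQUEST_SOURCE_BRANCH_NAME" > source_branch.txt
  artifacts:
    paths:
      - source_branch.txt

The post-merge pipeline can then fetch source_branch.txt through the GitLab job artifacts API and read the merged branch's name from it.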

Log tables for scheduled jobs in dbt

I am super new to dbt and wanted to see if there is a way to log the success/failure of scheduled dbt jobs in a table. I am currently using BigQuery as my data warehouse.
Any help would be greatly appreciated. Thanks!
According to dbt's documentation, events are logged to the logs/dbt.log file in your project folder, along with stdout at your terminal. As you're dealing with scheduled jobs, the file-based option would be more appropriate.
You can pass the --debug and the --log-format json arguments to your dbt jobs for structured logging messages, which would look like:
dbt --debug --log-format json run
Along with that, you can easily parse the success or failure of your jobs by looking, when available, at the node_info complex field and its node_status subfield. This gives you the results of your jobs when they eventually finish. The node_name subfield gives you the corresponding dbt model. A sketch of such a parser follows.
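A minimal sketch, assuming the job was run with --debug --log-format json and that each log line is a JSON event carrying node_info (the exact field layout varies across dbt versions, so treat the lookup paths below as assumptions):

import json

# Scan the structured dbt log and print each model's terminal status.
with open("logs/dbt.log") as f:
    for line in f:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON lines mixed into the log
        data = event.get("data")
        # node_info may sit at the top level or under "data", depending on dbt version
        node_info = event.get("node_info") or (data.get("node_info") if isinstance(data, dict) else None)
        if not node_info:
            continue
        status = node_info.get("node_status")
        # report only terminal states, not "started"/"compiling"/"executing"
        if status and status not in ("started", "compiling", "executing"):
            print(node_info.get("node_name"), status)

From there you could load those rows into a BigQuery log table, for example with the google-cloud-bigquery client.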
For more information, look at the structured logging section of the documentation. For a very detailed view of the jobs' metadata, look at the dbt code that generates it.

Trigger pipeline/job when merge request state changes (WIP to "ready")

I am currently trying to implement a pipeline using GitLab CI. I defined my pipeline in a .gitlab-ci.yml file to run my jobs. I am working on a pipeline whose jobs are triggered by opened merge requests, more specifically, non-WIP/draft merge requests. The most important condition is that I want the job to be triggered and run when a merge request changes state from WIP/draft to "ready".
Below is the closest way I found to do such a thing:
integrationtest:
  stage: integrationtest
  only:
    - merge_requests
  except:
    variables:
      - $CI_MERGE_REQUEST_TITLE =~ /^WIP:.*/
Unfortunately, the only thing still missing is the pipeline being triggered when the WIP state changes.
Any idea to bypass this problem is more than welcome.
Thank you in advance :)
There is an open issue for your exact use case. A workaround with webhook integration is mentioned in the last comment of that issue; maybe this will help you. A sketch of that kind of webhook receiver is below.
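A minimal sketch of such a webhook receiver (not the exact workaround from the issue): it listens for merge request events and triggers a pipeline through the pipeline trigger API when the MR title loses its WIP/Draft prefix. The host, project ID, and trigger token are placeholders:

import re
import requests
from flask import Flask, request

app = Flask(__name__)
GITLAB_URL = "https://gitlab.example.com"  # placeholder: your GitLab host
PROJECT_ID = "1234"                        # placeholder: your project id
TRIGGER_TOKEN = "..."                      # placeholder: a pipeline trigger token
WIP_RE = re.compile(r"^(WIP|Draft):", re.IGNORECASE)

@app.route("/mr-hook", methods=["POST"])
def mr_hook():
    event = request.get_json(silent=True) or {}
    if event.get("object_kind") != "merge_request":
        return "ignored", 200
    title_change = event.get("changes", {}).get("title")
    # Fire only on the WIP/Draft -> ready transition of the title.
    if (title_change
            and WIP_RE.match(title_change.get("previous") or "")
            and not WIP_RE.match(title_change.get("current") or "")):
        branch = event["object_attributes"]["source_branch"]
        requests.post(
            f"{GITLAB_URL}/api/v4/projects/{PROJECT_ID}/trigger/pipeline",
            data={"token": TRIGGER_TOKEN, "ref": branch},
        )
        return "pipeline triggered", 200
    return "no transition", 200

Point a project webhook for merge request events at /mr-hook to wire this up.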

Pass output from one pipeline run and use as parameter in another pipeline

The way my ADF setup currently works is that I have multiple pipelines, each containing at least one activity. Then I have one big pipeline that sort of chains these pipelines together.
However, now in the big "master" pipeline, I would like to use the output of an activity from one pipeline and then pass it to another pipeline. All of this orchestrated from the "master" pipeline.
My "master" pipeline would look something like this:
What I have tried is adding a parameter to "Execute Pipeline2" and then passing:
@activity('Execute Pipeline1').output.pipeline.runId.output.runOutput
@activity('Execute Pipeline1').output.pipelineRunId.output.runOutput
@activity('Execute Pipeline1').output.runOutput
How would one go about doing this?
Unfortunately, we don't have a way to pass the output of an activity across pipelines. Right now pipelines don't have outputs (only activities do).
We have a work item that will allow a user to choose what the output of a pipeline should be (imagine a pipeline with 40 activities, where the user could choose the output of activity 3 as the pipeline output). However, this work item is in very early stages, so don't expect to see it soon.
For now, the only way would be to save the output that you want to storage (blob, for example) and then read it and pass it to the other pipeline (see the sketch below). Another method could be a Web activity that gets the pipeline run (passing the run ID); you get the output using the ADF SDK or REST API, and then you pass that to the next Execute Pipeline activity.
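A minimal sketch of the storage approach, with hypothetical names: have Pipeline1 end by writing its result as a JSON file to a blob, then in the master pipeline add a Lookup activity (say "Lookup Pipeline1 Output") pointed at that blob, and pass its result to the "Execute Pipeline2" parameter with an expression like:
@activity('Lookup Pipeline1 Output').output.firstRow.runOutput
Here runOutput is assumed to be a field in the JSON file that Pipeline1 wrote; firstRow is how a Lookup activity exposes a single-row result.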

Call a pipeline from a pipeline in Amazon Data Pipeline

My team at work is currently looking for a replacement for a rather expensive ETL tool that, at this point, we are using as a glorified scheduler. We have improved on the integrations offered by the ETL tool using our own Python code, so I really just need its scheduling ability. One option we are looking at is Data Pipeline, which I am currently piloting.
My problem is this: imagine we have two datasets to load, products and sales. Each of these datasets requires a number of steps to load (get source data, call a Python script to transform, load to Redshift). However, products need to be loaded before sales runs, as we need product cost, etc. to calculate margin. Is it possible to have a "master" pipeline in Data Pipeline that calls products first, waits for its successful completion, and then calls sales? If so, how? I'm open to other product suggestions as well if Data Pipeline is not well suited to this type of workflow. Appreciate the help.
I think I can relate to this use case. Anyhow, Data Pipeline does not do this kind of dependency management on its own. It can, however, be simulated using file preconditions.
In this example, your child pipelines may depend on a file being present (as a precondition) before starting. A master pipeline would create trigger files based on some logic executed in its activities. A child pipeline may create other trigger files that will start a subsequent pipeline downstream.
Another solution is to use the Simple Workflow service. That has the features you are looking for, but it would need custom coding using the Flow SDK.
This is a basic use case of Data Pipeline and should definitely be possible. You can use the graphical pipeline editor for creating this pipeline. Breaking down the problem:
There are two datasets:
Product
Sales
Steps to load these datasets:
Get source data: say from S3. For this, use an S3DataNode.
Call a Python script to transform: use a ShellCommandActivity with staging. Data Pipeline does data staging implicitly for S3DataNodes attached to a ShellCommandActivity; you can access the staged data through the special environment variables provided (see the documentation for details).
Load output to Redshift: use RedshiftDatabase.
You will need to add the above components for each of the datasets you need to work with (product and sales in this case). For easy management, you can run them on an EC2 instance.
Condition: 'product' needs to be loaded before 'sales' runs.
Add a dependsOn relationship: add this field on the ShellCommandActivity of Sales so that it refers to the ShellCommandActivity of Product. See the dependsOn field in the documentation; it says: 'One or more references to other Activities that must reach the FINISHED state before this activity will start'. A sketch is below.
Tip: in most cases, you would not want your next day's execution to start while the previous day's execution is still active, a.k.a. RUNNING. To avoid such a scenario, use the 'maxActiveInstances' field and set it to '1'.
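A condensed sketch of the relevant pipeline definition objects, with placeholder ids, commands, and resources, and omitting the data nodes, schedule, and other required fields:

{
  "objects": [
    { "id": "ProductLoad", "type": "ShellCommandActivity",
      "command": "python transform_product.py",
      "runsOn": { "ref": "MyEc2Resource" } },
    { "id": "SalesLoad", "type": "ShellCommandActivity",
      "command": "python transform_sales.py",
      "runsOn": { "ref": "MyEc2Resource" },
      "dependsOn": { "ref": "ProductLoad" },
      "maxActiveInstances": "1" }
  ]
}

With dependsOn in place, SalesLoad waits for ProductLoad to reach FINISHED before starting, and maxActiveInstances keeps a new day's run from overlapping a still-RUNNING one.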