What is the difference and relationship between an Azure DevOps Build Definition and a Pipeline?

I am trying to automate a process in Azure DevOps, using the REST API. I think it should go like this (at least, this is the current manual process):
fork a repo
create pipeline(s) based on YAML files in the newly forked repo
run the pipelines in a particular way
I am new to the Azure DevOps REST API and I am struggling to understand what I have done and what I should be doing.
Using the REST API, I seem to be able to create what I would call a pipeline, using the pipeline endpoint; I do notice that if I want to run it, I have to interact with its build definition instead.
Also, looking at code other colleagues have written, it seems (though I may be wrong) like they are able to achieve the same by simply creating a build definition, without explicitly creating a pipeline.
This lack of understanding is driving me bonkers so I am hoping someone can enlighten me!
Question
What is the difference, and relationship, between a Build Definition and a Pipeline?
Additional info: I am not interested in working with the older Release Pipelines, and I have tried to find the answer in the Azure DevOps REST API docs, but to no avail.

If you want to create a pipeline, you can do it using either of these. The difference is really one of concept:
build definitions are part of the original flow, which consisted of builds and releases: a build was responsible for building, testing, and publishing an artifact for releases to deploy later
pipelines are the newer approach, which uses a YAML-defined process for building, testing, and deploying code
You can find more info here: What's the difference between a build pipeline and a release pipeline in Azure DevOps?
For instance, for this pipeline/build:
https://dev.azure.com/thecodemanual/DevOps%20Manual/_build?definitionId=157
where the definition id is 157,
you will get responses from both endpoints:
https://dev.azure.com/{{organization}}/{{project}}/_apis/build/definitions/157?api-version=5.1
and
https://dev.azure.com/{{organization}}/{{project}}/_apis/pipelines/157?api-version=6.0-preview.1
and in that sense, pipeline id = build definition id.
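As a quick illustration (not an official sample; the organization, project and personal access token below are placeholders), a small Python sketch that fetches the same id from both endpoints:

import requests

ORG, PROJECT, DEFINITION_ID = "your-org", "your-project", 157
AUTH = ("", "your-personal-access-token")  # basic auth: empty user + PAT
BASE = f"https://dev.azure.com/{ORG}/{PROJECT}/_apis"

# Same id, two views of the same YAML pipeline.
build_def = requests.get(
    f"{BASE}/build/definitions/{DEFINITION_ID}?api-version=5.1", auth=AUTH
).json()
pipeline = requests.get(
    f"{BASE}/pipelines/{DEFINITION_ID}?api-version=6.0-preview.1", auth=AUTH
).json()

print(build_def["id"], build_def["name"])
print(pipeline["id"], pipeline["name"])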

The pipelines endpoint is not very useful:
https://dev.azure.com/{Organization}/{ProjectName}/_apis/pipelines?api-version=6.0-preview.1
It will only give you a list of pipelines with very basic info such as name, ID, folder, etc.
To create and update YAML pipelines you need to use the Build definitions endpoint. The IDs you use in the endpoint are the same IDs as the Pipelines endpoint uses.
Get definition, Get list, Create, Update:
https://dev.azure.com/{Organization}/{ProjectName}/_apis/build/definitions?api-version=6.0
(To create a working pipeline you must first Get an existing pipeline, modify the JSON you receive, then POST it as a new definition.)
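For example, a rough sketch of that Get-modify-Create round trip (the organization, project, ids, PAT and the fields being tweaked are placeholders; the exact JSON depends on your existing definition):

import requests

ORG, PROJECT = "your-org", "your-project"
AUTH = ("", "your-personal-access-token")  # basic auth: empty user + PAT
URL = f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/build/definitions"

# 1. Get an existing YAML pipeline's definition to use as a template.
template = requests.get(f"{URL}/157?api-version=6.0", auth=AUTH).json()

# 2. Modify the JSON: new name, point it at the forked repo and its YAML file.
template["name"] = "my-new-pipeline"
template["repository"]["id"] = "forked-repo-id"           # placeholder
template["process"]["yamlFilename"] = "azure-pipelines.yml"
template.pop("id", None)        # the service assigns the new definition's id
template.pop("revision", None)

# 3. POST it back as a new definition.
resp = requests.post(f"{URL}?api-version=6.0", json=template, auth=AUTH)
resp.raise_for_status()
print("created definition", resp.json()["id"])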

Related

CI/CD pipeline for BigQuery schema/DDL deployment

I am looking for a CI/CD solution for Google BigQuery scripts.
The requirement is that I have a list of files with DDL scripts; I need to design a CI/CD solution that maintains versions and deploys the scripts to Google BigQuery automatically or on a schedule.
Since you want to use version control to commit the schema, you can use the CI for Data in BigQuery CLI utility GitHub repository, which will help you orchestrate the process. For more information you can check this documentation. For implementing this, you can check this link.
Since you also want CD, Cloud Build can be used with BigQuery, and you can use your own custom builders for your requirements. You can also configure notifications for both BigQuery and GitHub using Cloud Build.
As a product recommendation: for CI, use Cloud Source Repositories, and for CD, use Cloud Build.
There are multiple ways to deploy:
Option 1: specify an inline query in the Cloud Build steps. This does not pick up the latest version of your SQL; see option 2 for that.
Here, $PROJECT_ID and $_DATASET are dynamic variables that you set at run time via substitutions in Cloud Build; you can pass other values to your steps the same way.
steps:
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: 'bq'
  id: 'create entry min day view'
  args:
  - query
  - --use_legacy_sql=false
  - "CREATE OR REPLACE TABLE $PROJECT_ID.$_DATASET.TABLENAME
    AS
    SELECT 1"
Option 2:
There is a post for this here.
Per the last post at the link above, you can use bash as the entrypoint and pass the bq arguments as args, so the step picks up the latest version of your SQL.
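For illustration only (this swaps the bq CLI for the google-cloud-bigquery Python client; the sql/ folder and file-naming scheme are assumptions), a script like the following could be called from such a Cloud Build step to run the newest versioned DDL file:

from pathlib import Path
from google.cloud import bigquery

def deploy_latest_ddl(sql_dir: str = "sql") -> None:
    # Pick the newest file by name; adjust to your own versioning scheme.
    latest = sorted(Path(sql_dir).glob("*.sql"))[-1]
    ddl = latest.read_text()
    client = bigquery.Client()
    client.query(ddl).result()  # wait for the DDL job to finish

if __name__ == "__main__":
    deploy_latest_ddl()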
hope this helps.

Best Practice for deploying ADF pipelines

I am brand new to ADF and am creating my very first data factory. I am using the UI option (if anyone can point me to any documents for using code I'd be most grateful).
I will have 3 different environments: dev/test/prod. Each of these has slightly different configs (yes, I know!). So my datasets and linked services will need to change for each environment. What is the best way to do this? How would you approach this?
(P.S.: We also have BitBucket and Jenkins/Octopus for CI/CD, so ideally I would like to create scripts to automate this if possible.)
Thank you
You can create a data factory using code. You can find code with detailed information here.
There are two approaches to deploying ADF pipelines:
ARM template
Custom approach (JSON files, via REST API): with this approach, you can fully automate the CI/CD process, as the collaboration branch becomes the source for deployment. This is why the approach is also known as (direct) deployment from code (JSON files). A minimal sketch of this approach follows below.
Refer to this blog by Kamil Nowinski.
The scope of the question is broad, but this video by Mohamed Radwan shows in practice how to deploy and manage 3 different environments, i.e. ADF-DEV, ADF-PROD and ADF-UAT.
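As a minimal sketch of the custom (REST API) approach, assuming placeholder names and a token you have already acquired, a pipeline's JSON can be pushed to a target factory like this:

import json
import requests

def deploy_pipeline(token: str, subscription: str, resource_group: str,
                    factory: str, pipeline_name: str, json_path: str) -> None:
    # ADF "Pipelines - Create Or Update" REST endpoint.
    url = (
        f"https://management.azure.com/subscriptions/{subscription}"
        f"/resourceGroups/{resource_group}/providers/Microsoft.DataFactory"
        f"/factories/{factory}/pipelines/{pipeline_name}"
        "?api-version=2018-06-01"
    )
    with open(json_path) as f:
        body = json.load(f)  # pipeline JSON taken from the collaboration branch
    resp = requests.put(url, json=body,
                        headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()

Running the same script against each environment's factory (dev/test/prod) with different parameters is one way to handle the per-environment configs mentioned above.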

Adding auxiliary DB data during deployment

My app consists of two containers: the app itself and a database. I'm planning to wrap the app into a chart, thus paving a way for easy reproducible deployment.
Apart from setting/reading environment variables (which Helm + Kubernetes seem to handle really well), part of the app's configuration is:
making sure the database is pre-filled with special auxiliary data (e.g. the admin user exists, the user role names required to create new users are there, etc.).
I like the idea of having readable YAML files hold the entire configuration in a human-readable format. However, at a glance it doesn't seem that Helm would help in any way with this kind of configuration (DB records).
That being said, what is the best place to put the code/configuration ensuring that the DB contains certain auxiliary records? A config YAML file? A container init script, written in bash?
You are right: Kubernetes and Helm cannot, by themselves, prepare your pre-filled database records/schema.
You should probably have your application initialize those pre-filled data. If you don't want to put this logic into your application, you can ship an initialization script and configure an init container with Kubernetes.
Kubernetes makes sure every time your application container is restarted, the init container runs first. In the init container, you can execute a bash/python/... script that makes sure the records you want are there.
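For example, a hypothetical seed script the init container could run (this assumes a PostgreSQL database, the psycopg2 driver, and a roles table with a unique name column; all of these are placeholders for your own schema):

import os
import psycopg2

SEED_ROLES = ["admin", "editor", "viewer"]

def seed() -> None:
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    with conn, conn.cursor() as cur:
        for role in SEED_ROLES:
            # Idempotent insert: safe to run on every pod restart.
            cur.execute(
                "INSERT INTO roles (name) VALUES (%s) ON CONFLICT (name) DO NOTHING",
                (role,),
            )

if __name__ == "__main__":
    seed()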

Call a pipeline from a pipeline in Amazon Data Pipeline

My team at work is currently looking for a replacement for a rather expensive ETL tool that, at this point, we are using as a glorified scheduler. We have improved on the integrations offered by the ETL tool with our own Python code, so I really just need its scheduling ability. One option we are looking at is Data Pipeline, which I am currently piloting.
My problem is thus: imagine we have two datasets to load - products and sales. Each of these datasets requires a number of steps to load (get source data, call a python script to transform, load to Redshift). However, product needs to be loaded before sales runs, as we need product cost, etc to calculate margin. Is it possible to have a "master" pipeline in Data Pipeline that calls products first, waits for its successful completion, and then calls sales? If so, how? I'm open to other product suggestions as well if Data Pipeline is not well-suited to this type of workflow. Appreciate the help
I think I can relate to this use case. Data Pipeline does not do this kind of dependency management on its own; it can, however, be simulated using file preconditions.
In this example, your child pipelines may depend on a file being present (as a precondition) before starting. A Master pipeline would create trigger files based on some logic executed in its activities. A child pipeline may create other trigger files that will start a subsequent pipeline downstream.
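As an illustration only (ids, names, the bucket path and the pipeline id are hypothetical, and the usual schedule/resource objects are omitted), a child activity gated on a trigger file could be defined with boto3 roughly like this:

import boto3

client = boto3.client("datapipeline")

child_objects = [
    {
        "id": "ProductReadyPrecondition",
        "name": "ProductReadyPrecondition",
        "fields": [
            {"key": "type", "stringValue": "S3KeyExists"},
            {"key": "s3Key", "stringValue": "s3://my-bucket/triggers/product_done"},
        ],
    },
    {
        "id": "LoadSalesActivity",
        "name": "LoadSalesActivity",
        "fields": [
            {"key": "type", "stringValue": "ShellCommandActivity"},
            {"key": "command", "stringValue": "python load_sales.py"},
            # Do not start until the master pipeline has written the trigger file.
            {"key": "precondition", "refValue": "ProductReadyPrecondition"},
        ],
    },
]

client.put_pipeline_definition(
    pipelineId="df-CHILD_PIPELINE_ID",
    pipelineObjects=child_objects,
)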
Another solution is to use the Simple Workflow (SWF) product. That has the features you are looking for, but would need custom coding using the Flow SDK.
This is a basic use case of Data Pipeline and should definitely be possible. You can use its graphical pipeline editor to create this pipeline. Breaking down the problem:
There are two datasets:
Product
Sales
Steps to load these datasets:
Get source data: Say from S3. For this, use S3DataNode
Call a Python script to transform: use ShellCommandActivity with staging. Data Pipeline stages data implicitly for S3DataNodes attached to a ShellCommandActivity; you can access them via the special env variables provided: Details
Load output to Redshift: Use RedshiftDatabase
You will need to add the above components for each of the datasets you need to work with (product and sales in this case). For easy management, you can run these on an EC2 instance.
Condition: 'product' needs to be loaded before 'sales' runs
Add a dependsOn relationship. Add this field on the ShellCommandActivity of Sales so that it refers to the ShellCommandActivity of Product (see the sketch after the tip below). See the dependsOn field in the documentation. It says: 'One or more references to other Activities that must reach the FINISHED state before this activity will start'.
Tip: in most cases, you would not want the next day's execution to start while the previous day's execution is still active (RUNNING). To avoid such a scenario, use the 'maxActiveInstances' field and set it to '1'.
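A minimal sketch of the dependsOn relationship via boto3's put_pipeline_definition (ids, names and commands are placeholders; data nodes, schedule and resources are omitted for brevity):

import boto3

client = boto3.client("datapipeline")

pipeline_objects = [
    {
        "id": "LoadProduct",
        "name": "LoadProduct",
        "fields": [
            {"key": "type", "stringValue": "ShellCommandActivity"},
            {"key": "command", "stringValue": "python load_product.py"},
        ],
    },
    {
        "id": "LoadSales",
        "name": "LoadSales",
        "fields": [
            {"key": "type", "stringValue": "ShellCommandActivity"},
            {"key": "command", "stringValue": "python load_sales.py"},
            # Sales waits for Product to reach FINISHED before it starts.
            {"key": "dependsOn", "refValue": "LoadProduct"},
            # Avoid overlapping daily runs.
            {"key": "maxActiveInstances", "stringValue": "1"},
        ],
    },
]

client.put_pipeline_definition(
    pipelineId="df-EXAMPLE_PIPELINE_ID",
    pipelineObjects=pipeline_objects,
)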

How to setup a project and break it into sub-projects, how to use slick in this setup

This is a brand new project, so I can use the latest version of play.
I am using IntelliJ 13.
So I want to break out the models/db/service layer, because I will also have a job service (reading messages off a queue, for example) that will need this service layer as well.
Since Slick is outside of Play, how do I set up the datasource for this project, keeping in mind I will be connecting to multiple databases?
Do I need to create a custom config file for this?
web-app (play2!)
- service
service (models + dao)
models
dao
jobs (service)
I don't see any examples like this, which I find strange because I think pretty much any real-world project (beyond simple examples) would have to be set up this way.
Can someone show me sample code where things are broken down like this?
This example isn't broken into sub-projects, but it is very split up and would allow you to specify multiple databases.
https://github.com/geigerma/play-cake