Best Practice for deploying ADF pipelines - azure-sql-database

I am brand new to ADF and am creating my very first data factory. I am using the UI option (if anyone can point me to any documents for using code I'd be most grateful).
I will have 3 different environments - dev/test/prod. Each of these has a slightly different config (yes, I know!), so my datasets and linked services will need to change for each environment. What is the best way to do this? How would you approach it?
(P.S. We also have Bitbucket and Jenkins/Octopus for CI/CD, so ideally I would like to create scripts to automate this if possible.)
Thank you

You can create a data factory using code; you can find sample code with detailed information here.
There are two approaches to deploying ADF pipelines:
ARM template
Custom approach (JSON files, via the REST API) - with this approach we can fully automate the CI/CD process, because the collaboration branch becomes the source for deployment. This is why the approach is also known as (direct) deployment from code (JSON files); a minimal sketch is shown below.
Refer to this blog by Kamil Nowinski.
The scope of the question is broad, but this video by Mohamed Radwan practically shows how you can deploy and manage 3 different environments, i.e. ADF-DEV, ADF-PROD and ADF-UAT.
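For the custom (JSON via REST API) approach, deploying a single pipeline boils down to a PUT against the Data Factory management API. A minimal sketch, assuming the Python requests library; the subscription, resource group, factory and pipeline names, the token and the JSON path are all placeholders:

    import json
    import requests

    # All names and the token below are placeholders; the JSON file is a pipeline
    # definition as it lives in the collaboration branch of your repo.
    SUBSCRIPTION = "<subscription-id>"
    RESOURCE_GROUP = "<resource-group>"
    FACTORY = "adf-test"                 # the target environment's factory
    TOKEN = "<azure-ad-bearer-token>"    # e.g. from `az account get-access-token`

    def deploy_pipeline(name, json_path):
        """PUT a single pipeline definition into the target data factory."""
        url = (
            f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
            f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.DataFactory"
            f"/factories/{FACTORY}/pipelines/{name}?api-version=2018-06-01"
        )
        with open(json_path) as f:
            body = json.load(f)
        resp = requests.put(url, json=body, headers={"Authorization": f"Bearer {TOKEN}"})
        resp.raise_for_status()

    deploy_pipeline("CopySalesData", "pipeline/CopySalesData.json")

The same pattern works for linked services and datasets (different resource path, same verb), which is where the per-environment config differences can be injected at deployment time.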

What is the difference and relationship between an Azure DevOps Build Definition, and a Pipeline?

I am trying to automate a process in Azure DevOps, using the REST API. I think it should go like this (at least, this is the current manual process):
fork repo
create pipeline(s) based on YAML files in the newly forked repo
run the pipelines in a particular way
I am new to the Azure DevOps REST API and I am struggling to understand what I have done and what I should be doing.
Using the REST API, I seem to be able to create what I would call a pipeline, using the pipeline endpoint; I do notice that if I want to run it, I have to interact with its build definition instead.
Also, looking at code other colleagues have written, it seems (though I may be wrong) like they are able to achieve the same by simply creating a build definition, and not explicitly creating a pipeline.
This lack of understanding is driving me bonkers so I am hoping someone can enlighten me!
Question
What is the difference, and relationship, between a Build Definition and a Pipeline?
Additional info: I am not interested in working with the older Release Pipelines, and I have tried to find the answer among the Azure DevOps REST API docs, but to no avail.
If you want to create a pipeline you can do it using either of these. However, the difference is really conceptual:
build definitions are part of the original flow, which consisted of build and release: the build was responsible for building, testing and publishing an artifact for later use in a release to deploy
pipelines are the newer approach, which leverages a YAML-defined process for building/testing/deploying code
You can find more info here - What's the difference between a build pipeline and a release pipeline in Azure DevOps?
For instance, for this pipeline/build
https://dev.azure.com/thecodemanual/DevOps%20Manual/_build?definitionId=157
where the definition id is 157,
you will get responses from both endpoints:
https://dev.azure.com/{{organization}}/{{project}}/_apis/build/definitions/157?api-version=5.1
and
https://dev.azure.com/{{organization}}/{{project}}/_apis/pipelines/157?api-version=6.0-preview.1
and in that sense pipeline id = build definition id.
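To see this concretely, you can query both endpoints with the same id. A minimal sketch assuming the Python requests library; the organisation, project and PAT are placeholders, and 157 is the definition id from the URL above:

    import requests

    # Placeholders: fill in your organisation, project and a personal access token.
    ORG, PROJECT, PAT = "<organization>", "<project>", "<personal-access-token>"
    AUTH = ("", PAT)  # basic auth: empty user name, PAT as password

    build_def = requests.get(
        f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/build/definitions/157?api-version=5.1",
        auth=AUTH).json()
    pipeline = requests.get(
        f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/pipelines/157?api-version=6.0-preview.1",
        auth=AUTH).json()

    # Same id, same underlying object, two representations.
    print(build_def["id"], pipeline["id"])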
The pipelines endpoint is not very useful:
https://dev.azure.com/{Organization}/{ProjectName}/_apis/pipelines?api-version=6.0-preview.1
It will only give you a list of pipelines with very basic info such as name, ID, folder etc.
To create and update YAML pipelines you need to use the Build definitions endpoint. The IDs you use in the endpoint are the same IDs as the Pipelines endpoint uses.
Get definition, Get list, Create, Update:
https://dev.azure.com/{Organization}/{ProjectName}/_apis/build/definitions?api-version=6.0
(To create a working pipeline you must first Get an existing pipeline, modify the JSON you receive, then POST it as a new definition.)
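A rough sketch of that get/modify/create round trip, again assuming Python requests and a personal access token; the names, YAML path and the fields you need to strip may differ in your organisation:

    import requests

    # Placeholders: organisation, project and PAT; 157 is an existing definition to clone.
    ORG, PROJECT, PAT = "<organization>", "<project>", "<personal-access-token>"
    BASE = f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/build/definitions"
    AUTH = ("", PAT)

    # 1. Get an existing YAML pipeline definition to use as a template.
    template = requests.get(f"{BASE}/157?api-version=6.0", auth=AUTH).json()

    # 2. Modify the JSON: new name, different YAML file in the repository.
    #    You may also need to repoint template["repository"] at a forked repo.
    template["name"] = "my-cloned-pipeline"
    template["process"]["yamlFilename"] = "azure-pipelines.yml"
    template.pop("id", None)        # the service assigns a new id on creation
    template.pop("revision", None)

    # 3. POST it back as a new definition; the returned id also works on the pipelines endpoint.
    created = requests.post(f"{BASE}?api-version=6.0", json=template, auth=AUTH)
    created.raise_for_status()
    print(created.json()["id"])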

Automating Sequence of Manual Steps

I have a sequence of steps that a user performs, e.g. logging on to a remote UNIX shell, creating files/directories, changing permissions, running remote shell scripts and commands, deleting files, moving files,
running DB queries and, based on the query results, performing certain tasks such as exporting the results to a file or running further shell commands/scripts or DB insert statements, etc.
By performing these steps the user achieves different kinds of data processing and validation.
What is the best way to automate the above scenario? Should we go for a workflow tool like Activiti, or is there a better framework/way to achieve these requirements?
My requirement is to work with open source, and possibly Java based.
I am completely new to this, so any help/pointers would be appreciated.
The scenario you describe is certainly possible with a workflow tool like Activiti. Apache Camel or Spring Integration would be another possibility (as all the steps you mention are automatic system tasks).
A workflow framework would be a good option if you need one of these:
you want to store history data for audit purposes: who did what, when, and how long it took
you want to visually model your steps, perhaps to discuss them with business people
there is a need for human interaction between some of the steps
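If none of those apply, the steps themselves are straightforward to script directly, and such a script is essentially what a service task in Activiti or a Camel route would end up wrapping. A rough Python sketch (not Java, just to illustrate the shape of it) using paramiko for the remote shell work; every host, credential, command and query below is invented:

    import paramiko   # SSH client (pip install paramiko)
    import sqlite3    # stand-in for whatever DB driver you actually use

    # Placeholders throughout: host, user, key, commands, queries.
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect("unix-host.example.com", username="batch",
                key_filename="/home/batch/.ssh/id_rsa")

    # Remote shell work: create directories, change permissions, run a script.
    for cmd in ["mkdir -p /data/incoming", "chmod 750 /data/incoming", "/opt/scripts/load.sh"]:
        _, stdout, stderr = ssh.exec_command(cmd)
        if stdout.channel.recv_exit_status() != 0:
            raise RuntimeError(f"{cmd} failed: {stderr.read().decode()}")

    # DB query, then branch on the result.
    conn = sqlite3.connect("jobs.db")
    (row_count,) = conn.execute("SELECT COUNT(*) FROM staging").fetchone()
    if row_count > 0:
        ssh.exec_command("/opt/scripts/export_results.sh")   # export the results
    else:
        conn.execute("INSERT INTO job_log(status) VALUES ('nothing-to-do')")
        conn.commit()
    ssh.close()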
Your description reminds me of a software/account provisioning process.
There are a large number of provisioning tools on the market, both open source and otherwise (Dell Crowbar is one option).
However, a couple of the comments you made in your response to Joram indicate that a more general-purpose tool such as Activiti may be an option:
"Swivel Chair" tasks - User tasks that may one day be automated
Visual model of process state
Most provisioning tools don't allow for generic user tasks and don't provide a (good) visual model of the process state.
However, they generally include remote script execution, which would need to be cobbled together as a service task if using a BPM tool.
I would certainly expand my research to include provisioning tools, as they sound like a better fit; however, if you can't find anything that works for you, a BPM platform provides a generic framework to build what you need.

How to set up a project and break it into sub-projects, and how to use Slick in this setup

This is a brand new project, so I can use the latest version of play.
I am using IntelliJ 13.
I want to break out the models/db/service layer, because I will also have a job service (reading messages off a queue, for example) that will need this service layer as well.
Since Slick is outside of Play, how do I set up the datasource for this project, keeping in mind that I will be connecting to multiple databases?
Do I need to create a custom config file for this?
web-app (play2!)
  - service
service (models + dao)
  - models
  - dao
jobs
  - service
I don't see any examples like this, which I find strange, because I think pretty much any real-world project (beyond simple examples) would have to be set up this way.
Can someone show me sample code where things are broken down like this?
This example isn't broken into sub-projects, but it is very split up and would allow you to specify multiple databases.
https://github.com/geigerma/play-cake

How to set up a development environment for YQL and Open Data Table development? How to test it locally? (best practice)

From time to time I develop Yahoo Open Data Tables to access different resources on the web. Currently I am using a JavaScript editor and, when I want to test whether my open table works, I upload the XML table description to a server and test it with a YQL client application. However, this approach is quite slow and sometimes I get blocked by Yahoo because of a mistake in my open table description. Therefore I would like to learn about best practices for testing and developing Yahoo open tables locally. What does your setup for open table development look like?
To clarify my question: I am looking for any convenient way (best practice) to develop and test YQL tables, e.g. running part of the JavaScript inside Rhino.
First of all: I agree, I don't see a really convenient way to test YQL data table definitions locally either. Nevertheless, here is how I approach this issue.
Hosting on github
YQL data table definitions are often used in very open scenarios, e.g. when there is an existing API that you want to wrap via YQL. Therefore I normally work on a fork of the YQL community tables and just add my own definitions there. In this case the .xml files are hosted on GitHub: https://github.com/yql/yql-tables
Another advantage of this approach is that it is easy to share my data tables with the community if I feel they might be valuable for others as well.
Hosting privately
A free GitHub account only comes with public repositories though, so everybody would be able to see and use your data tables. If that is not acceptable, you could either buy a GitHub Pro account to get private repositories, or host the data table definitions yourself.
To do that you can upload them to your own server, as you are already doing, or you should also be able to set up a web server like Apache locally on your machine and then get a dynamic hostname from dyndns.com or similar, so that YQL can reach these definitions. I have not tried this because GitHub was working sufficiently well for me, but I am sure it is possible.
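Either way, once the .xml is reachable over a public URL you can exercise it against the public YQL endpoint with a USE statement, instead of importing it into the console or community environment first. A minimal sketch in Python; the table URL and the input key are placeholders for whatever your table actually defines:

    import requests

    # Placeholder: point this at wherever your table definition is hosted
    # (a GitHub fork, your own server, a public folder, ...).
    TABLE_URL = "https://raw.github.com/<user>/yql-tables/master/mytable.xml"
    query = f'use "{TABLE_URL}" as mytable; select * from mytable where input="test"'

    resp = requests.get(
        "https://query.yahooapis.com/v1/public/yql",
        params={"q": query, "format": "json", "diagnostics": "true"},
    )
    print(resp.json())

The diagnostics output shows which URLs YQL fetched and surfaces errors from the table definition, which gives a quicker feedback loop than testing through a full client application.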
Why don't you just put the file you are editing in a public dropbox folder? This is what I do and it works pretty well.

Website data retrieval

A recent article prompted me to pick up a project I have been working on for a while. I want to create a web service front end for a number of sites to allow automated completion of forms, data retrieval from the results, and access to other areas of the sites. I have achieved a degree of success using Selenium and custom code; however, I am looking to extend this to a stage where adding additional sites is a trivial task (maybe even one which doesn't require a developer).
The Kapow web data server looks like it achieves a lot of this; however, I am told it is quite expensive (I am currently awaiting a quote). Has anyone had experience with it, or can anyone suggest alternatives (open source, ideally)?
Disclaimer: I realise the potential legal issues around automating data retrieval from third-party websites - this tool is designed to be used in a price comparison system, and all of the websites will be integrated with the express permission of their owners. Where a site provides an API, that will clearly be the favoured approach.
Thanks
I realise it's been a while since I posted this; however, should anyone come across it, I have had lots of success using the WSO2 framework (particularly the Mashup Server) for this. For data mining tasks I have also used a Java library that it wraps - webharvest - which has achieved everything I needed.