how to use Azure Synapse database templates programmatically - azure-data-lake

I can create an Azure data lake database with pre-built tables using Azure Synapse database templates from the Synapse Studio UI, but is there a way to use these templates programmatically? So far from my research I have not found a command, API, or SDK for this. Perhaps I could create the database and tables via the UI, then generate the associated spark sql creation scripts, but don't see a way how to do that either. Does anyone have any ideas on how to do either of the prior?

You can create the data lake storage, tables and data insertion programmatically using Azure SDKs. But these templates have been made available to overcome these series of manual tasks. Using these templates save your time and efforts to create an environment and sample data for your development.
Therefore, asking to deploy these templates programmatically challenging the complete concept of templates. If you want to deploy these resources manually, you can use Azure SDKs.

Related

Can we use SQL scripts (Develop hub) during pipeline creation (Integrate hub) in azure synapse?

I want to use my SQL script (present under Develop hub) file inside a Pipeline (present under Integrate hub). Currently I do not see any Activities available solving this purpose.
There is one Script activity under General section which only have a Query & NonQuery option, not for referring any SQL script file created earlier.
Is that feature available at all in Azure Synapse Analytics? Can we refer to SQL script by some other means?
If your Synapse workspace is paired with Azure DevOps then I imagine it’s easy to get the file content with a REST API call (eg here). However then you have to parse the file as GO is not supported by the Script activity. ADF / Synapse Pipeline functions do not support a RegEx style split eg word boundary and GO (\bGO\b) so it starts to get kind of fiddly. I had some success with replace and uriComponent functions.
However you would be better of using Stored Procedures and the Stored Proc activity in Synapse Pipelines - much simpler implementation.

Load multiple files using Azure Data factory or Synapse

I am moving from SSIS to Azure.
we have 100's of files and MSSQL tables that we want to push into a Gen2 data lake
using 3 zones then SQL Data Lake
Zones being Raw, Staging & Presentation (Change names as you wish)
What is the best process to automate this as much as possible
for example build a table with files / folders / tables to bring into Raw zone
then have Synapse bring these objects either full or incremental load
then process the them into the next 2 zones I guess more custom code as we progress.
Your requirement can be accomplished using multiple activities in Azure Data Factory.
To migrate SSIS packages, you need to use SSIS Integrated Runtime (IR). ADF supports SSIS Integration which can be configured by creating a new SSIS Integration runtime. To create the same, click on the Configure SSIS Integration, provide the basic details and create a new runtime.
Refer below image to create new SSIS IR.
Refer this third-party tutorial by SQLShack to Move local SSIS packages to Azure Data Factory.
Now, to copy the data to different zones using copy activity. You can make as much copy of your data as your requirement using copy activity. Refer Copy data between Azure data stores using Azure Data Factory.
ADF also supports Incrementally load data using Change Data Capture (CDC).
Note: Both Azure SQL MI and SQL Server support the Change Data Capture technology.
Tumbling window trigger and CDC window parameters need to be configured to make the incremental load automated. Check this official tutorial.
The last part:
then process them into the next 2 zones
This you need to manage programmatically as there is no such feature available in ADF which can update the other copies of the data based on CDC. You need to either create a separate CDC for those zones or do it logically.

Excel into Azure Data Factory into SQL

I read a few threads on this but noticed most are outdated, with excel becoming an integration in 2020.
I have a few excel files stored in Drobox, I would like to automate the extraction of that data into azure data factory, perform some ETL functions with data coming from other sources, and finally push the final, complete table to Azure SQL.
I would like to ask what is the most efficient way of doing so?
Would it be on the basis of automating a logic app to extract the xlsx files into Azure Blob, use data factory for ETL, join with other SQL tables, and finally push the final table to Azure SQL?
Appreciate it!
Before using Logic app to extract excel file Know Issues and Limitations with respect to excel connectors.
If you are importing large files using logic app depending on size of files you are importing consider this thread once - logic apps vs azure functions for large files
Just to summarize approach, I have mentioned below steps:
Step1: Use Azure Logic app to upload excel files from Dropbox to blob storage
Step2: Create data factory pipeline with copy data activity
Step3: Use blob storage service as a source dataset.
Step4: Create SQL database with required schema.
Step5: Do schema mapping
Step6: Finally Use SQL database table as sink

Options for ingesting and processing data in Azure sql

I need expert opinion on a project I am working on. We currently get data files that we load into our Azure sql database using a local script that calls stored procedures. I am planning on replacing the script with ssis jobs to load the data into our Azure Sql but wondering if that's a good option given our needs.I am opened to different suggestions too. The process we go through is to load data file to staging tables and validate before making updates to live tables. The validation and updates are done by calling stored procedures...so the ssis package will just load the data and make calls to those stored procedures. I have looked at ADF IR and Databricks but they seem overkill but am open to hear people with experience using those as well. I am currently running the ssis package locally as well. Any suggestion on better architecture or tools for this scenario? Thanks!
I would definitely have a look at Azure Data Factory Data flows. With this you can easily build your ETL pipelines in the a Azure Data Factory GUI.
In the following example two text files from a Blob Storage are read, joined, a surrogate key is added and finally the data is loaded to Azure Synapse Analytics (would be the same for Azure SQL):
You finally put this Mapping Data Flow into a pipeline and can trigger it, e. g. if new data arrives.
You can just BULK INSERT data from Azure Blob Store:
https://learn.microsoft.com/en-us/sql/relational-databases/import-export/examples-of-bulk-access-to-data-in-azure-blob-storage?view=sql-server-ver15#accessing-data-in-a-csv-file-referencing-an-azure-blob-storage-location
Then you can use ADF (no IR) or Databricks or Azure Batch or Azure Elastic Jobs to schedule the execution.

Deploying USQL project

I am new to data lake analytics and using USQL.
I am currently setting up data factory pipeline which would replace an existing SSIS workflow. The data factory pipeline would essentially
Extract data transactional database into ADLS
Transform raw entities using USQL
Load the data into SSAS using custom activity
Question
I have a USQL project set up and wanted if there was a standard way of deploying them to ADLA other than just uploading the scripts to a folder in the store.
Great question!
I'm not sure about a standard way, or even a way that might be considered best practice yet. But I use all of the tools you mention to perform very similar tasks.
To try and answer your question: What I do is create the U-SQL scripts as stored procedures within the logical ADLA database. In the VS USQL project I have 1 script per stored proc. The ADF activities then call the proc name. This gives you the right level of disconnection between services and also means you don't need additional blob storage for USQL files.
In my VS solution I often also have a PowerShell project to help manage things. Specifically one what takes all my 'usp_' U-SQL scripts to create one big DDL style thing that can be deployed to the logical ADLA database.
The PowerShell then does the deployment for me using the submit job cmdlet. Example below.
Submit-AzureRmDataLakeAnalyticsJob `
-Name $JobName `
-AccountName $DLAnalytics `
–Script $USQLProcDeployAll `
-DegreeOfParallelism $DLAnalyticsDoP
Hope this gives you a steer. I also accept that these tools are still fairly new. So open to other suggestions.
Cheers