Deploying USQL project - azure-data-lake

I am new to Data Lake Analytics and U-SQL.
I am currently setting up a Data Factory pipeline which would replace an existing SSIS workflow. The Data Factory pipeline would essentially:
Extract data from a transactional database into ADLS
Transform the raw entities using U-SQL
Load the data into SSAS using a custom activity
Question
I have a U-SQL project set up and wanted to know if there is a standard way of deploying the scripts to ADLA, other than just uploading them to a folder in the store.

Great question!
I'm not sure about a standard way, or even a way that might be considered best practice yet. But I use all of the tools you mention to perform very similar tasks.
To try and answer your question: what I do is create the U-SQL scripts as stored procedures within the logical ADLA database. In the VS U-SQL project I have one script per stored proc. The ADF activities then call the proc by name. This gives you the right level of decoupling between services and also means you don't need additional blob storage for the U-SQL files.
In my VS solution I often also have a PowerShell project to help manage things. Specifically, one that takes all my 'usp_' U-SQL scripts and concatenates them into one big DDL-style script that can be deployed to the logical ADLA database.
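A minimal sketch of that concatenation step, assuming a hypothetical project layout with one CREATE PROCEDURE per usp_*.usql file:
# Hypothetical layout: each usp_*.usql file contains a single
# CREATE PROCEDURE IF NOT EXISTS dbo.usp_SomeTransform() AS BEGIN ... END; block.
$scripts = Get-ChildItem -Path ".\USQLProject" -Filter "usp_*.usql" | Sort-Object Name
# Join them into one DDL-style script that redeploys every proc in a single ADLA job.
$USQLProcDeployAll = ($scripts | ForEach-Object { Get-Content $_.FullName -Raw }) -join "`n"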
The PowerShell then does the deployment for me using the submit job cmdlet. Example below.
# Submit the combined proc-deployment script as a single ADLA job (AzureRM module)
Submit-AzureRmDataLakeAnalyticsJob `
-Name $JobName `
-AccountName $DLAnalytics `
-Script $USQLProcDeployAll `
-DegreeOfParallelism $DLAnalyticsDoP
Hope this gives you a steer. I also accept that these tools are still fairly new. So open to other suggestions.
Cheers

Related

Can we use SQL scripts (Develop hub) during pipeline creation (Integrate hub) in Azure Synapse?

I want to use a SQL script file (present under the Develop hub) inside a pipeline (present under the Integrate hub). Currently I do not see any activity that serves this purpose.
There is a Script activity under the General section, but it only has Query and NonQuery options and cannot reference a SQL script file created earlier.
Is that feature available at all in Azure Synapse Analytics? Can we refer to a SQL script by some other means?
If your Synapse workspace is paired with Azure DevOps then I imagine it's easy to get the file content with a REST API call (e.g. here). However, you then have to parse the file, as GO is not supported by the Script activity. ADF / Synapse pipeline expression functions do not support a regex-style split, e.g. on a word boundary around GO (\bGO\b), so it starts to get fiddly. I had some success with the replace and uriComponent functions.
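For comparison, that kind of word-boundary split is trivial outside the pipeline; a minimal PowerShell sketch, assuming $sqlScriptContent holds the script body returned by the REST call:
# Split the script into batches on the GO separator and drop empty fragments.
$batches = [regex]::Split($sqlScriptContent, '\bGO\b') | Where-Object { $_.Trim() -ne '' }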
However, you would be better off using stored procedures and the Stored Procedure activity in Synapse pipelines - a much simpler implementation.

How to use Azure Synapse database templates programmatically

I can create an Azure data lake database with pre-built tables using Azure Synapse database templates from the Synapse Studio UI, but is there a way to use these templates programmatically? So far in my research I have not found a command, API, or SDK for this. Perhaps I could create the database and tables via the UI and then generate the associated Spark SQL creation scripts, but I don't see a way to do that either. Does anyone have any ideas on how to do either of these?
You can create the data lake storage, tables, and data inserts programmatically using the Azure SDKs. But these templates were made available precisely to avoid that series of manual tasks; using them saves the time and effort of building an environment and sample data for development.
Asking to deploy the templates programmatically therefore somewhat defeats the purpose of templates. If you want to create these resources yourself, you can use the Azure SDKs.
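As a starting point for the SDK route, a minimal sketch with the Az PowerShell module that creates the underlying ADLS Gen2 account (all names are hypothetical; the lake database tables would still need to be created separately, e.g. via Spark SQL):
# Create a StorageV2 account with the hierarchical namespace enabled (ADLS Gen2).
New-AzStorageAccount -ResourceGroupName "my-rg" `
-Name "mydatalakestore" `
-Location "westeurope" `
-SkuName "Standard_LRS" `
-Kind "StorageV2" `
-EnableHierarchicalNamespace $true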

Excel into Azure Data Factory into SQL

I read a few threads on this but noticed most are outdated, with Excel becoming an integration in 2020.
I have a few Excel files stored in Dropbox. I would like to automate the extraction of that data into Azure Data Factory, perform some ETL functions with data coming from other sources, and finally push the final, complete table to Azure SQL.
What is the most efficient way of doing this?
Would it be to automate a Logic App to extract the xlsx files into Azure Blob storage, use Data Factory for the ETL and joins with other SQL tables, and finally push the final table to Azure SQL?
Appreciate it!
Before using a Logic App to extract the Excel files, review the known issues and limitations of the Excel connectors.
If you are importing large files with a Logic App then, depending on their size, also consider this thread - logic apps vs azure functions for large files.
To summarize the approach, the steps are:
Step 1: Use an Azure Logic App to upload the Excel files from Dropbox to Blob storage.
Step 2: Create a Data Factory pipeline with a Copy data activity.
Step 3: Use the Blob storage service as the source dataset.
Step 4: Create the SQL database with the required schema.
Step 5: Do the schema mapping.
Step 6: Finally, use the SQL database table as the sink.
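If you later want to script the pipeline from Step 2 instead of building it in the portal, the Az.DataFactory cmdlets can publish a pipeline definition; a minimal sketch, where the resource names and the JSON definition file are hypothetical:
# Publish a pipeline whose Copy activity reads the Excel dataset from Blob storage
# and writes to the Azure SQL sink described in the steps above.
Set-AzDataFactoryV2Pipeline -ResourceGroupName "my-rg" `
-DataFactoryName "my-adf" `
-Name "CopyExcelToSql" `
-DefinitionFile ".\pipelines\CopyExcelToSql.json"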

Options for ingesting and processing data in Azure SQL

I need an expert opinion on a project I am working on. We currently get data files that we load into our Azure SQL database using a local script that calls stored procedures. I am planning on replacing the script with SSIS jobs to load the data into Azure SQL, but I am wondering if that's a good option given our needs. I am open to different suggestions too. The process we go through is to load the data files into staging tables and validate them before making updates to the live tables. The validation and updates are done by calling stored procedures, so the SSIS package would just load the data and make calls to those stored procedures. I have looked at ADF IR and Databricks but they seem overkill, though I am open to hearing from people with experience using those as well. I am currently running the SSIS package locally as well. Any suggestions on a better architecture or tools for this scenario? Thanks!
I would definitely have a look at Azure Data Factory Mapping Data Flows. With these you can easily build your ETL pipelines in the Azure Data Factory GUI.
In the following example, two text files from Blob storage are read, joined, a surrogate key is added, and finally the data is loaded into Azure Synapse Analytics (it would be the same for Azure SQL).
You then put this Mapping Data Flow into a pipeline and can trigger it, e.g. when new data arrives.
You can just BULK INSERT data from Azure Blob Storage:
https://learn.microsoft.com/en-us/sql/relational-databases/import-export/examples-of-bulk-access-to-data-in-azure-blob-storage?view=sql-server-ver15#accessing-data-in-a-csv-file-referencing-an-azure-blob-storage-location
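A minimal sketch of what that might look like, run here via Invoke-Sqlcmd from the SqlServer PowerShell module; the table, external data source, file path, and credentials are all hypothetical, and the DATA_SOURCE must already be defined in the database:
# Load a CSV sitting in the blob container behind 'MyAzureBlobStorage' into a staging table.
$bulkInsert = "BULK INSERT dbo.StagingTable FROM 'incoming/data.csv' WITH (DATA_SOURCE = 'MyAzureBlobStorage', FORMAT = 'CSV', FIRSTROW = 2);"
Invoke-Sqlcmd -ServerInstance "myserver.database.windows.net" -Database "mydb" -Username $user -Password $pwd -Query $bulkInsert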
Then you can use ADF (no IR) or Databricks or Azure Batch or Azure Elastic Jobs to schedule the execution.

Azure SQL DB - data file export (.csv) from Azure SQL

I am new to Azure SQL.
We have a client DB which is in Azure SQL. We need to set up process automation which extracts query results to .CSV files and loads them onto our server (on-premise SQL Server 2008 R2).
What is the best method to generate CSV files from Azure SQL and make them accessible to the on-premise server?
Honestly, the most professional approach is to use Azure Data Factory with a self-hosted Integration Runtime installed on premises.
You can of course use BCP, but it will be cumbersome in the long run: a lot of scripts, tables, and maintenance, with no logging, no metrics, no alerts... Honestly, don't do it.
SSIS is another option, but in my opinion it takes more effort than the ADF solution.
Azure Data Factory will allow you to do this in a professional way through a user interface, with no coding. It can also be parameterized, so you just change the table name parameter and suddenly you are exporting 20, 50 or 100 tables with ease.
Here is a video example and intro to Data Factory if you want a quick overview. The overview also includes a demo that imports CSV into Azure SQL; you can change it a little to do Azure SQL -> CSV and CSV -> SQL Server, or just Azure SQL -> SQL Server directly.
https://youtu.be/EpDkxTHAhOs
It really is straightforward.
Consider using a simple bcp from the on-prem environment: save the results to CSV and then load the CSV into the on-prem server.
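A minimal sketch of that export step, run from a PowerShell prompt on the on-prem box; the server, database, table, and output path are hypothetical:
# Export a query result from Azure SQL to a comma-separated file with bcp.
bcp "SELECT * FROM dbo.SomeTable" queryout "C:\exports\SomeTable.csv" -S "yourserver.database.windows.net" -d "clientdb" -U $user -P $pwd -c -t ","
Loading the resulting file into the on-prem server can then be done the same way with bcp in, or with BULK INSERT.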
You can also use SSIS to implement an automated task.
Though I would like to know why you need the intermediate CSV file - you can simply copy data between the databases (cloud -> on-prem) with a scheduled SSIS package.
If you have on-prem SQL access then a simple SSIS package is probably the quickest and easiest way to go. If your source is Azure SQL and the ultimate destination is on-prem SQL, you could use SSIS and skip the CSV altogether.
If you want to stick to an Azure PaaS solution, you could consider using Azure Data Factory. You can set up a gateway to access the on-prem SQL Server directly, or if you really want to stick with a CSV then look into using a Logic App.
Azure Data Factory is surely an option.
A simple solution would be the pyodbc driver with a little bit of Python. https://learn.microsoft.com/en-us/sql/connect/python/python-driver-for-sql-server?view=sql-server-2017
You can also try sqlcmd and a bit of PowerShell or bash on top.
https://learn.microsoft.com/en-us/sql/tools/sqlcmd-utility?view=sql-server-2017
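For the PowerShell route, a minimal sketch using Invoke-Sqlcmd from the SqlServer module instead of raw sqlcmd; the server, database, table, and path are hypothetical:
# Run the query against Azure SQL and write the result set out as a CSV file.
Invoke-Sqlcmd -ServerInstance "yourserver.database.windows.net" -Database "clientdb" -Username $user -Password $pwd -Query "SELECT * FROM dbo.SomeTable" | Export-Csv -Path "C:\exports\SomeTable.csv" -NoTypeInformation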