Can we use SQL scripts (Develop hub) during pipeline creation (Integrate hub) in Azure Synapse?

I want to use my SQL script file (present under the Develop hub) inside a pipeline (present under the Integrate hub). Currently I do not see any activity that serves this purpose.
There is one Script activity under the General section, but it only has Query and NonQuery options; there is no option for referring to a SQL script file created earlier.
Is that feature available at all in Azure Synapse Analytics? Can we refer to a SQL script by some other means?

If your Synapse workspace is paired with Azure DevOps, then I imagine it's easy to get the file content with a REST API call (eg here). However, you then have to parse the file, as GO is not supported by the Script activity. ADF / Synapse pipeline functions do not support a regex-style split, e.g. on a word boundary around GO (\bGO\b), so it starts to get kind of fiddly. I had some success with the replace and uriComponent functions.
However, you would be better off using stored procedures and the Stored procedure activity in Synapse pipelines - a much simpler implementation.
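For illustration, here is a minimal sketch of that approach (all object names below are made up), with the logic of the Develop-hub script wrapped in a procedure and the GO batch separators removed:
CREATE PROCEDURE dbo.usp_LoadDailySales
AS
BEGIN
    -- body of the former SQL script, minus the GO batch separators
    TRUNCATE TABLE dbo.DailySales;

    INSERT INTO dbo.DailySales (SaleDate, Amount)
    SELECT SaleDate, SUM(Amount)
    FROM dbo.StagingSales
    GROUP BY SaleDate;
END;
The Stored procedure activity then only needs the procedure name and any parameters, so there is no file content to fetch or parse.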

Related

How to use Azure Synapse database templates programmatically

I can create an Azure data lake database with pre-built tables using Azure Synapse database templates from the Synapse Studio UI, but is there a way to use these templates programmatically? So far in my research I have not found a command, API, or SDK for this. Perhaps I could create the database and tables via the UI and then generate the associated Spark SQL creation scripts, but I don't see a way to do that either. Does anyone have any ideas on how to do either of these?
You can create the data lake storage, the tables, and the data inserts programmatically using the Azure SDKs, but these templates were made available precisely to remove that series of manual tasks. Using the templates saves you the time and effort of creating an environment and sample data for your development.
Therefore, asking to deploy the templates programmatically somewhat defeats the concept of templates. If you want to create these resources yourself, you can use the Azure SDKs.
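If you do go that route, the tables a template would have given you can also be created with ordinary Spark SQL in a Synapse notebook. A rough sketch (the database, table, column list and ADLS path below are placeholders):
-- Creates a lake database and one table in it
CREATE DATABASE IF NOT EXISTS RetailDemo;

CREATE TABLE IF NOT EXISTS RetailDemo.Customer
(
    CustomerId   INT,
    CustomerName STRING,
    CreatedDate  DATE
)
USING DELTA
LOCATION 'abfss://data@mydatalake.dfs.core.windows.net/RetailDemo/Customer';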

How to schedule a query in Azure Synapse on-demand

How can I schedule a query in Azure Synapse on-demand (serverless) and save the result to Azure Storage every hour?
My idea is to materialize the results into a separate storage location and use Power BI to access the results.
Besides the fact that Power BI can directly access your Synapse instance, if you want to go this route you have several options:
This can be done using a pipeline in the new Synapse Workspace. You should be aware that this technology is still in preview.
Use PolyBase and stored procedures on a job scheduler to INSERT into a Blob Storage location. There is a lot of configuration in this option.
At present, I would recommend Azure Data Factory (ADF) on a Schedule Trigger. This is the simplest and most reliable of the current options. Based on the scenario you described, a single Copy activity could easily perform this task.
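Whichever scheduler you choose, the statement it runs each hour against the on-demand (serverless) endpoint can be a CETAS (CREATE EXTERNAL TABLE AS SELECT), which writes the query result out to storage. A rough sketch with placeholder names, assuming an external data source and file format have already been created in the serverless database:
CREATE EXTERNAL TABLE dbo.HourlySnapshot
WITH (
    LOCATION = 'snapshots/latest/',
    DATA_SOURCE = ResultsLake,   -- external data source pointing at your storage account
    FILE_FORMAT = ParquetFF      -- external file format, e.g. FORMAT_TYPE = PARQUET
)
AS
SELECT col1, col2, COUNT(*) AS cnt
FROM OPENROWSET(
        BULK 'https://mystorage.dfs.core.windows.net/raw/*.parquet',
        FORMAT = 'PARQUET') AS src
GROUP BY col1, col2;
Note that CETAS does not overwrite an existing external table, so a scheduled run would typically drop the table and clear (or version) the target folder first.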

Options for ingesting and processing data in Azure SQL

I need an expert opinion on a project I am working on. We currently get data files that we load into our Azure SQL database using a local script that calls stored procedures. I am planning on replacing the script with SSIS jobs to load the data into Azure SQL, but I am wondering if that's a good option given our needs. I am open to different suggestions too. The process we go through is to load the data files into staging tables and validate them before making updates to the live tables. The validation and updates are done by calling stored procedures, so the SSIS package would just load the data and make calls to those stored procedures. I have looked at ADF IR and Databricks, but they seem like overkill; I am open to hearing from people with experience using those as well. I am currently running the SSIS package locally as well. Any suggestions on a better architecture or tools for this scenario? Thanks!
I would definitely have a look at Azure Data Factory Mapping Data Flows. With these you can easily build your ETL pipelines in the Azure Data Factory GUI.
In the following example, two text files from Blob Storage are read and joined, a surrogate key is added, and finally the data is loaded into Azure Synapse Analytics (it would be the same for Azure SQL).
You then put this Mapping Data Flow into a pipeline and can trigger it, e.g. when new data arrives.
You can just BULK INSERT data from Azure Blob Storage:
https://learn.microsoft.com/en-us/sql/relational-databases/import-export/examples-of-bulk-access-to-data-in-azure-blob-storage?view=sql-server-ver15#accessing-data-in-a-csv-file-referencing-an-azure-blob-storage-location
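A hedged sketch of the pattern from that article (the storage account, container, table and SAS token are placeholders; the database also needs a master key for the credential):
-- One-time setup: credential + external data source pointing at the container
CREATE DATABASE SCOPED CREDENTIAL BlobSasCred
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<sas-token-without-leading-question-mark>';

CREATE EXTERNAL DATA SOURCE StagingBlob
WITH (TYPE = BLOB_STORAGE,
      LOCATION = 'https://mystorageaccount.blob.core.windows.net/staging',
      CREDENTIAL = BlobSasCred);

-- Per load: bulk insert the file into a staging table, then call your existing procs
BULK INSERT dbo.StagingSales
FROM 'incoming/sales.csv'
WITH (DATA_SOURCE = 'StagingBlob',
      FORMAT = 'CSV',
      FIRSTROW = 2);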
Then you can use ADF (no IR) or Databricks or Azure Batch or Azure Elastic Jobs to schedule the execution.

Perform custom SQL query with Google Cloud Data Fusion

I have data pipelines that consist of multiple SQL queries being run against BigQuery tables. I would like to build these in Google Cloud Data Fusion, but I don't see an option to transform/select with custom SQL.
Is this available, or am I misinterpreting the use cases for this tool?
A new Action plugin is being added that will allow you to specify SQL to run in BigQuery. Expect the connectors to be available in the Hub by mid-May.
Nitin
There is now a native BigQuery Execute action that allows SQL queries to run as part of a Data Fusion Pipeline.
This plugin is an action; see the excerpt below from the official documentation:
Action plugins define custom actions that are scheduled to take place during a workflow but don't directly manipulate data in the workflow. For example, using the Database custom action, you can run an arbitrary database command at the end of your pipeline. Alternatively, you can trigger an action to move files within Cloud Storage.
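The statement such an action runs is just ordinary BigQuery SQL, for instance (the dataset and table names below are placeholders):
CREATE OR REPLACE TABLE analytics.daily_summary AS
SELECT order_date, SUM(amount) AS total_amount
FROM analytics.orders
GROUP BY order_date;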

Deploying a U-SQL project

I am new to Data Lake Analytics and to using U-SQL.
I am currently setting up a Data Factory pipeline which would replace an existing SSIS workflow. The Data Factory pipeline would essentially:
Extract data from a transactional database into ADLS
Transform the raw entities using U-SQL
Load the data into SSAS using a custom activity
Question
I have a U-SQL project set up and wanted to know if there is a standard way of deploying the scripts to ADLA other than just uploading them to a folder in the store.
Great question!
I'm not sure about a standard way, or even a way that might be considered best practice yet. But I use all of the tools you mention to perform very similar tasks.
To try and answer your question: what I do is create the U-SQL scripts as stored procedures within the logical ADLA database. In the VS U-SQL project I have one script per stored proc, and the ADF activities then call the proc by name. This gives you the right level of decoupling between services and also means you don't need additional blob storage for the U-SQL files.
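As a rough illustration (the database, procedure, column and path names below are made up, not from a real project), one such script might look like this:
// One script per stored procedure in the logical ADLA database.
CREATE DATABASE IF NOT EXISTS AdlaDb;
USE DATABASE AdlaDb;

DROP PROCEDURE IF EXISTS usp_TransformEntity;
CREATE PROCEDURE usp_TransformEntity()
AS
BEGIN
    @raw =
        EXTRACT Id int,
                Name string
        FROM "/raw/entity.csv"
        USING Extractors.Csv(skipFirstNRows: 1);

    OUTPUT @raw
    TO "/curated/entity.csv"
    USING Outputters.Csv(outputHeader: true);
END;
A later script (or the ADF U-SQL activity) then simply calls the procedure: AdlaDb.dbo.usp_TransformEntity();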
In my VS solution I often also have a PowerShell project to help manage things. Specifically, one that takes all my 'usp_' U-SQL scripts and combines them into one big DDL-style script that can be deployed to the logical ADLA database.
The PowerShell then does the deployment for me using the submit job cmdlet. Example below.
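# Submit the combined 'usp_' deployment script as a single job to the ADLA account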
Submit-AzureRmDataLakeAnalyticsJob `
-Name $JobName `
-AccountName $DLAnalytics `
-Script $USQLProcDeployAll `
-DegreeOfParallelism $DLAnalyticsDoP
Hope this gives you a steer. I also accept that these tools are still fairly new, so I'm open to other suggestions.
Cheers