IF and Switch activities not available in the pipeline Activities pane

In Synapse (ADF-style) pipelines, the If Condition and Switch activities are not available in the pipeline Activities pane, even though Microsoft says they should be there (If Condition activity - Azure Data Factory & Azure Synapse).
Is the list of activities configurable? How do I add activities?

You cannot nest these activities, so If Condition and Switch will not appear in the Activities pane while you are inside an If or a Switch; the same applies to nesting ForEach activities. Simply come back up to the top level of your pipeline and you will see the activities again.
If you have more complex logic, consider the logical functions in the expression language, such as and and or, or move the branching into a Stored Procedure, Databricks Notebook, Azure Synapse Notebook, etc., as sketched below.
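For example, a minimal T-SQL sketch of collapsing nested branch logic into a single stored procedure that one Stored procedure activity can call (all object names here are hypothetical):

```sql
-- Hypothetical procedure: the pipeline passes its parameters in and
-- the nested If/Switch logic lives in T-SQL instead of the pipeline.
CREATE PROCEDURE dbo.ProcessBatch
    @BatchType varchar(20),  -- e.g. bound to a pipeline parameter
    @RowCount  int
AS
BEGIN
    IF @BatchType = 'full' AND @RowCount > 0
        EXEC dbo.LoadFullBatch;           -- assumed helper procedure
    ELSE IF @BatchType = 'incremental' OR @RowCount = 0
        EXEC dbo.LoadIncrementalBatch;    -- assumed helper procedure
    ELSE
        RAISERROR('Unknown batch type', 16, 1);
END;
```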

Related

Usage Tracking in Azure Synapse Analytics

Can anyone share a Kusto query (KQL) that I can use in Log Analytics to return some usage tracking stats?
I am trying to identify which "Views" and "Tables" are used the most. I am also trying to find out who the power users are and which commands/queries are run against the "Tables".
Any insights would be appreciated.
You can use the functions below to gather the usage statistics:
DiagnosticMetricsExpand()
DiagnosticLogsExpand()
ActivityLogRecordsExpand()
Then create target tables to store the output of these functions so you can analyse the usage information.
Refer to the Azure documentation for complete details: https://learn.microsoft.com/en-us/azure/data-explorer/ingest-data-no-code?tabs=activity-logs
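As a rough sketch of the tutorial's pattern (this is KQL run against your Data Explorer database, not Log Analytics; the table names and column list are assumptions taken from that tutorial's examples): create a target table, then attach an update policy that runs one of the expand functions as raw records arrive.

```kusto
// Hypothetical target table for the expanded diagnostic log records.
.create table DiagnosticLogsRecords (
    Timestamp: datetime,
    ResourceId: string,
    OperationName: string,
    Result: string,
    Properties: dynamic
)

// Update policy: run DiagnosticLogsExpand() over new rows in the raw
// table (assumed to be named DiagnosticRawRecords, as in the tutorial).
.alter table DiagnosticLogsRecords policy update
@'[{"IsEnabled": true, "Source": "DiagnosticRawRecords", "Query": "DiagnosticLogsExpand()", "IsTransactional": true}]'
```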

Can we use SQL scripts (Develop hub) during pipeline creation (Integrate hub) in Azure Synapse?

I want to use my SQL script file (present under the Develop hub) inside a pipeline (present under the Integrate hub). Currently I do not see any activity serving this purpose.
There is a Script activity under the General section, but it only has Query and NonQuery options; there is no way to refer to a SQL script file created earlier.
Is that feature available at all in Azure Synapse Analytics? Can we refer to a SQL script by some other means?
If your Synapse workspace is paired with Azure DevOps then I imagine it's easy to get the file content with a REST API call (eg here). However, you then have to parse the file, as the GO batch separator is not supported by the Script activity. ADF / Synapse pipeline functions do not support a regex-style split, eg on the word boundary GO (\bGO\b), so it starts to get fiddly. I had some success with the replace and uriComponent functions.
However, you would be better off using stored procedures and the Stored procedure activity in Synapse pipelines - a much simpler implementation, as sketched below.
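For example, a minimal sketch (object names are hypothetical) of moving the body of a Develop-hub SQL script into a stored procedure, which a single Stored procedure activity can then call. Note that a procedure body needs no GO separators:

```sql
-- Hypothetical: the logic previously kept in a Develop-hub SQL script.
CREATE PROCEDURE dbo.RefreshDailySales
AS
BEGIN
    -- Rebuild the summary table from the source table.
    TRUNCATE TABLE dbo.DailySales;

    INSERT INTO dbo.DailySales (SaleDate, Amount)
    SELECT CAST(OrderDate AS date), SUM(Amount)
    FROM dbo.Orders
    GROUP BY CAST(OrderDate AS date);
END;
```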

Best methods for processing a rule hierarchy/tree

I'm using Azure SQL Pools/Synapse/SQL DW and have a rule hierarchy that I need to process. At each level a parent can specify whether all (AND) or any (OR) of its children are required in order for the rule to be satisfied. Each level in the hierarchy can specify a different condition from its parent (so you could have an AND condition that contains an OR, etc.).
In pure SQL this can be implemented as a loop that starts at leaf level and processes each level by left joining the hierarchy onto the data to be evaluated. Any data that does not match the condition is pruned from the dataset. AND conditions are processed by counting the distinct number of children that exist and the distinct number of children that match.
This creates a lot of complex SQL to maintain, as well as relying on a relatively inefficient loop. I suspect that graph functionality may be a better fit here, but I cannot see any inbuilt functionality that would actually help with the processing. Likewise, hierarchyid sounds appropriate, however I don't believe it exists in Azure Synapse/Pools/DW.
Azure Synapse Analytics dedicated SQL pools support neither the graph tables nor the hierarchyid type available in the SQL Server box product and Azure SQL DB. Therefore your best option is probably to use a nearby Azure SQL DB to do this processing, with Azure Data Factory (ADF) or Synapse pipelines moving data between the two.
Alternately, I've answered a few questions which I think give good coverage of working with graph or hierarchical data in Synapse and some of the approaches, including using Azure SQL DB, using WHILE loops, and using Azure Synapse notebooks with the GraphFrames library:
Recursive Query in Azure Synapse Analytics for Dates - where someone thought they needed a recursive query but did not.
Synapse top level parent hierarchy - coverage and examples of the SQL loop and GraphFrames options: https://stackoverflow.com/a/67065509/1527504
The second question in particular is quite thorough.
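As a rough T-SQL illustration of the WHILE-loop option (the schema and the AND/OR roll-up here are assumptions, not a tested implementation): evaluate the leaves first, then walk up one level at a time, treating AND as "all children satisfied" and OR as "at least one child satisfied".

```sql
-- Assumed working table: #rule(RuleId, ParentRuleId, Operator, Level, Satisfied)
-- where leaf rules already have Satisfied set from matching the data,
-- and Operator ('AND' / 'OR') applies to a rule's children.
DECLARE @level int = (SELECT MAX([Level]) FROM #rule) - 1;

WHILE @level >= 0
BEGIN
    UPDATE p
    SET Satisfied =
        CASE
            WHEN p.Operator = 'AND' AND c.Matched = c.Total THEN 1
            WHEN p.Operator = 'OR'  AND c.Matched > 0       THEN 1
            ELSE 0
        END
    FROM #rule AS p
    JOIN (
        SELECT ParentRuleId,
               COUNT(*) AS Total,
               SUM(CASE WHEN Satisfied = 1 THEN 1 ELSE 0 END) AS Matched
        FROM #rule
        GROUP BY ParentRuleId
    ) AS c
        ON c.ParentRuleId = p.RuleId
    WHERE p.[Level] = @level;

    SET @level = @level - 1;
END;
```

This runs as-is in Azure SQL DB (the option suggested above); a dedicated SQL pool may need the UPDATE ... FROM join rewritten to suit its more limited UPDATE syntax.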

How to schedule a query in Azure Synapse on-demand

How can I schedule a query in Azure Synapse on-demand and save the result to Azure Storage every hour?
My idea is to materialize the results into separate storage and use Power BI to access the results.
Besides the fact that Power BI can directly access your Synapse instance, if you want to go this route you have several options:
This can be done using a pipeline in the new Synapse workspace, though you should be aware that this technology is still in preview.
Use PolyBase and stored procedures on a job scheduler to INSERT to a Blob Storage location; there is a lot of configuration in this option (see the sketch below).
At present, I would recommend Azure Data Factory (ADF) with a Schedule Trigger. This is the simplest and most reliable of the current options, and based on the scenario you described, a single Copy activity could easily perform this task.
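If you take the PolyBase route, a CETAS statement is one way to land query results in storage from the on-demand (serverless) pool. A minimal sketch, assuming an external data source and file format have already been created (all names and paths here are illustrative):

```sql
-- Hypothetical: materialize an aggregate into a storage folder.
CREATE EXTERNAL TABLE dbo.HourlySnapshot
WITH (
    LOCATION = 'snapshots/latest/',   -- target folder in storage
    DATA_SOURCE = MyAzureStorage,     -- assumed external data source
    FILE_FORMAT = ParquetFormat       -- assumed external file format
)
AS
SELECT CustomerId, SUM(Amount) AS TotalAmount
FROM OPENROWSET(
    BULK 'https://myaccount.dfs.core.windows.net/data/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS src
GROUP BY CustomerId;
```

Note that CETAS cannot overwrite an existing table or folder, so an hourly job would need to drop the external table and clear (or rotate) the target folder on each run.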

Perform custom SQL query with Google Cloud Data Fusion

I have data pipelines that consist of multiple SQL queries being run against BigQuery tables. I would like to build these in Google Cloud Data Fusion, but I don't see an option to transform/select with custom SQL.
Is this available, or am I misinterpreting the use cases for this tool?
A new Action plugin is being added that will allow you to specify SQL to run in BigQuery. Expect the connectors to be available in the Hub by mid-May.
There is now a native BigQuery Execute action that allows SQL queries to run as part of a Data Fusion pipeline.
This plugin is an action; from the official documentation:
Action plugins define custom actions that are scheduled to take place during a workflow but don't directly manipulate data in the workflow. For example, using the Database custom action, you can run an arbitrary database command at the end of your pipeline. Alternatively, you can trigger an action to move files within Cloud Storage.
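For instance, this is the kind of statement a BigQuery Execute action could run as a pipeline step (dataset and table names are made up for illustration):

```sql
-- Hypothetical BigQuery SQL: materialize a transformed summary table
-- at the end of a Data Fusion pipeline run.
CREATE OR REPLACE TABLE analytics.daily_orders AS
SELECT
  order_date,
  COUNT(*)    AS order_count,
  SUM(amount) AS total_amount
FROM raw.orders
GROUP BY order_date;
```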