I want to execute a query in Azure Data Lake daily. Can we schedule a U-SQL query in Azure Data Lake?
Currently, there is no built-in way inside Data Lake Analytics to schedule a U-SQL job. Instead, you can use other services or tools to perform the scheduling. A popular one for Azure customers is Azure Data Factory.
Simple scheduling of U-SQL jobs inside Data Lake Analytics is something we are considering adding as a native capability.
There are two ways to execute a query in Azure Data Lake daily:
Use ADF: store the U-SQL script in Blob Storage and reference it via a Blob Storage linked service (a minimal script sketch follows this list).
Create an SSIS package in Visual Studio, then import the package into a SQL Server Agent job. See Schedule U-SQL jobs.
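For the ADF route, here is a minimal sketch of the kind of U-SQL script you might store in Blob Storage and point the ADF U-SQL activity at. The paths, schema, and column names are placeholders, not from the question:

// Hypothetical daily summary job over a CSV file in the default ADLS account.
@events =
    EXTRACT UserId  int,
            EventTs DateTime,
            Amount  decimal
    FROM "/input/events/events.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

@daily =
    SELECT UserId,
           SUM(Amount) AS TotalAmount,
           COUNT(*) AS EventCount
    FROM @events
    WHERE EventTs >= DateTime.Today.AddDays(-1)
    GROUP BY UserId;

OUTPUT @daily
TO "/output/daily_summary.csv"
USING Outputters.Csv(outputHeader: true);

ADF would then submit this script to your Data Lake Analytics account on whatever schedule trigger you configure.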
Related
Can we move data from an on-prem SQL Server to Azure Synapse without using Azure Data Factory?
My main intention is to avoid intermediate storage of the data in a blob or storage container. Bulk insert works in ADF, but do we have any other option here?
If this is a one-time activity, you can use the Database Migration Assistant tool.
Otherwise, you can also leverage SSIS to transfer data between on-prem SQL Server and Synapse.
How can I schedule a query in Azure Synapse On-demand and save the result to Azure Storage every hour?
My idea is to materialize the results into separate storage and use Power BI to access them.
Besides the fact that Power BI can directly access your Synapse instance, if you want to go this route you have several options:
This can be done using a pipeline in the new Synapse Workspace. You should be aware that this technology is still in preview.
Use PolyBase and stored procedures on a job scheduler to INSERT to a Blob Storage location (see the sketch after this list). There is a lot of configuration in this option.
At present, I would recommend Azure Data Factory (ADF) on a Schedule Trigger. This is the simplest and most reliable of the current options. Based on the scenario you described, a single Copy activity could easily perform this task.
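For the PolyBase option above, here is a minimal sketch of materializing a query result to storage with CETAS (CREATE EXTERNAL TABLE AS SELECT). The data source, file format, folder, and table/column names are assumptions for illustration:

-- Assumed one-time setup: an external data source and file format pointing at your storage account.
CREATE EXTERNAL DATA SOURCE ResultsStorage
WITH (LOCATION = 'https://yourstorageaccount.blob.core.windows.net/results');

CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

-- Hourly step: write the query result out as Parquet files that Power BI can read.
-- CETAS fails if the external table already exists, so a real job would first
-- DROP EXTERNAL TABLE and target a fresh folder (dropping does not delete the files).
CREATE EXTERNAL TABLE dbo.HourlySales
WITH (
    LOCATION = 'hourly_sales/',
    DATA_SOURCE = ResultsStorage,
    FILE_FORMAT = ParquetFormat
)
AS
SELECT ProductId, SUM(Amount) AS TotalAmount
FROM dbo.Sales  -- in On-demand/serverless this would typically be OPENROWSET or an external table over files
GROUP BY ProductId;

You would then wrap this in a stored procedure or script that your scheduler (e.g. the ADF Schedule Trigger recommended above) runs every hour.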
I need expert opinion on a project I am working on. We currently get data files that we load into our Azure SQL database using a local script that calls stored procedures. I am planning on replacing the script with SSIS jobs to load the data into Azure SQL, but I'm wondering if that's a good option given our needs. I am open to different suggestions too. The process is to load the data files into staging tables and validate them before making updates to the live tables. The validation and updates are done by calling stored procedures, so the SSIS package would just load the data and make calls to those stored procedures. I have looked at ADF IR and Databricks, but they seem like overkill; I'm open to hearing from people with experience using those as well. I am currently running the SSIS package locally as well. Any suggestion on a better architecture or tools for this scenario? Thanks!
I would definitely have a look at Azure Data Factory Mapping Data Flows. With these you can easily build your ETL pipelines in the Azure Data Factory GUI.
As an example, two text files from Blob Storage could be read and joined, a surrogate key added, and the result loaded into Azure Synapse Analytics (it would be the same for Azure SQL).
You finally put this Mapping Data Flow into a pipeline and can trigger it, e.g. when new data arrives.
You can just BULK INSERT data from Azure Blob Storage (sketched below):
https://learn.microsoft.com/en-us/sql/relational-databases/import-export/examples-of-bulk-access-to-data-in-azure-blob-storage?view=sql-server-ver15#accessing-data-in-a-csv-file-referencing-an-azure-blob-storage-location
Then you can use ADF (no IR) or Databricks or Azure Batch or Azure Elastic Jobs to schedule the execution.
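A minimal T-SQL sketch of that BULK INSERT approach, assuming a SAS-based credential; the container, file, and table names are illustrative, not from the question:

-- Requires a database master key (CREATE MASTER KEY) before the credential can be created.
CREATE DATABASE SCOPED CREDENTIAL BlobSasCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET   = '<sas-token-without-leading-question-mark>';

CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH (TYPE = BLOB_STORAGE,
      LOCATION = 'https://yourstorageaccount.blob.core.windows.net/staging',
      CREDENTIAL = BlobSasCredential);

-- Load a CSV file from the container straight into a staging table,
-- then call your existing stored procedures to validate and merge into the live tables.
BULK INSERT dbo.StagingOrders
FROM 'orders/latest.csv'
WITH (DATA_SOURCE = 'MyAzureBlobStorage',
      FORMAT = 'CSV',
      FIRSTROW = 2);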
Is there a way to output U-SQL results directly to a SQL DB such as Azure SQL DB? Couldn't find much about that.
Thanks!
U-SQL only currently outputs to files or internal tables (i.e. tables within ADLA databases), but you have a couple of options. Azure SQL Database has recently gained the ability to load files from Azure Blob Storage using either BULK INSERT or OPENROWSET, so you could try that. This article shows the syntax and gives a reminder that:
Azure Blob storage containers with public blobs or public containers access permissions are not currently supported.
wasb://<BlobContainerName>@<StorageAccountName>.blob.core.windows.net/yourFolder/yourFile.txt
BULK INSERT and OPENROWSET with Azure Blob Storage is shown here:
https://blogs.msdn.microsoft.com/sqlserverstorageengine/2017/02/23/loading-files-from-azure-blob-storage-into-azure-sql-database/
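A hedged sketch of the OPENROWSET variant described in that post, assuming an external data source of TYPE = BLOB_STORAGE and a BCP format file stored alongside the data; every name below is illustrative:

-- Assumes MyAzureBlobStorage was created as in the linked article (with a SAS credential),
-- and that usql_results.fmt describes the columns of the CSV your U-SQL job produced.
SELECT *
FROM OPENROWSET(
        BULK 'output/usql_results.csv',
        DATA_SOURCE = 'MyAzureBlobStorage',
        FORMAT = 'CSV',
        FORMATFILE = 'output/usql_results.fmt',
        FORMATFILE_DATA_SOURCE = 'MyAzureBlobStorage',
        FIRSTROW = 2
     ) AS results;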
You could also use Azure Data Factory (ADF). Its Copy Activity could load the data from Azure Data Lake Storage (ADLS) to an Azure SQL Database in two steps:
execute a U-SQL script which creates output files in ADLS (internal tables are not currently supported as a source in ADF)
move the data from ADLS to Azure SQL Database
As a final option, if your data is likely to get into larger volumes (i.e. terabytes), then you could use Azure SQL Data Warehouse, which supports PolyBase. PolyBase now supports both Azure Blob Storage and ADLS as a source.
Perhaps if you can tell us a bit more about your process we can refine which of these options is most suitable for you.
Migration of on-premise SSIS packages to Azure SQL Data Warehouse.
Can someone suggest references or ideas/steps involved in modifying existing SSIS packages that load an on-premises SQL data warehouse so that they populate Azure SQL Data Warehouse instead?
Is this possible?
Regards,
KK
If you would like to just use your existing SSIS package without changing much, it can be as simple as re-configuring the OLE DB destination to connect to the Azure SQL Data Warehouse endpoint.
That said, the right way to load data into Azure DW depends on the amount of data involved and the load intervals. If you are exporting large amounts of data at regular intervals, you might want to edit your SSIS package to first stage the data as flat files in Azure Blob Storage. Next, use an Execute SQL Task to create external tables over those files via PolyBase, and then load them with CTAS, e.g. CREATE TABLE dbo.InternalTable WITH (DISTRIBUTION = ROUND_ROBIN) AS SELECT * FROM blob.ExternalTable (a sketch of this pattern follows below).
Please check this guidance from Microsoft
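A minimal sketch of that PolyBase staging pattern in Azure SQL Data Warehouse, assuming a storage-key credential; the container, file format, and table names are illustrative, and the blob schema is assumed to already exist:

-- Assumed: a database master key already exists and you have the storage account key.
CREATE DATABASE SCOPED CREDENTIAL BlobStorageCredential
WITH IDENTITY = 'user', SECRET = '<storage-account-key>';

CREATE EXTERNAL DATA SOURCE StagingBlob
WITH (TYPE = HADOOP,
      LOCATION = 'wasbs://staging@yourstorageaccount.blob.core.windows.net',
      CREDENTIAL = BlobStorageCredential);

CREATE EXTERNAL FILE FORMAT PipeDelimited
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));

-- External table over the flat files your SSIS package staged in Blob Storage.
CREATE EXTERNAL TABLE blob.ExternalTable
(
    OrderId INT,
    Amount  DECIMAL(18, 2)
)
WITH (LOCATION = '/orders/',
      DATA_SOURCE = StagingBlob,
      FILE_FORMAT = PipeDelimited);

-- CTAS pulls the data into an internal, distributed table in parallel.
CREATE TABLE dbo.InternalTable
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT * FROM blob.ExternalTable;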
If you use the latest SSIS Azure Feature Pack, you can load data (via Blob Storage) into the destination table.
https://msdn.microsoft.com/en-US/library/mt146770.aspx