How to schedule a query in Azure Synapse On-demand and save the result to Azure Storage every hour
My idea is to materialize the results into separate storage and use PowerBI to access them.
Besides the fact that PowerBI can directly access your Synapse instance, if you want to go this route you have several options:
This can be done using a pipeline in the new Synapse Workspace. You should be aware that this technology is still in preview.
Use Polybase and Stored Procedures on a Job Scheduler to INSERT to a Blob Storage location. There is a lot of configuration in this option.
At present, I would recommend Azure Data Factory (ADF) on a Schedule Trigger. This is the simplest and most reliable of the current options. Based on the scenario you described, a single Copy activity could easily perform this task.
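As a rough illustration of the Schedule Trigger option, the sketch below builds the JSON definition of an hourly trigger as a Python dictionary. The trigger name, pipeline name, and start date are hypothetical placeholders; the pipeline itself (with its Copy activity) is assumed to exist already.

```python
import json
from datetime import datetime, timezone


def hourly_schedule_trigger(trigger_name, pipeline_name, start):
    """Build an ADF schedule trigger definition that fires once every hour."""
    return {
        "name": trigger_name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Hour",
                    "interval": 1,
                    "startTime": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
                    "timeZone": "UTC",
                }
            },
            # The pipeline referenced here is assumed to contain the Copy activity.
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": pipeline_name,
                        "type": "PipelineReference",
                    }
                }
            ],
        },
    }


# Hypothetical names, for illustration only.
trigger = hourly_schedule_trigger(
    "HourlyExportTrigger",
    "ExportQueryToBlob",
    datetime(2021, 1, 1, tzinfo=timezone.utc),
)
print(json.dumps(trigger, indent=2))
```

The printed JSON could be pasted into the trigger's code view in the ADF authoring UI, or deployed through whichever tooling you already use.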
Related
I am moving from SSIS to Azure.
We have hundreds of files and MSSQL tables that we want to push into a Gen2 data lake
using 3 zones, then SQL Data Lake.
Zones being Raw, Staging & Presentation (change names as you wish).
What is the best process to automate this as much as possible?
For example, build a table with files / folders / tables to bring into the Raw zone,
then have Synapse bring these objects in as either a full or incremental load,
then process them into the next 2 zones, I guess with more custom code as we progress.
Your requirement can be accomplished using multiple activities in Azure Data Factory.
To migrate SSIS packages, you need to use the Azure-SSIS Integration Runtime (IR). ADF supports SSIS integration, which is configured by creating a new Azure-SSIS integration runtime. To create one, click Configure SSIS Integration, provide the basic details, and create the new runtime.
Refer below image to create new SSIS IR.
Refer to this third-party tutorial by SQLShack to move local SSIS packages to Azure Data Factory.
Next, copy the data to the different zones using the Copy activity. You can make as many copies of your data as you require. Refer to Copy data between Azure data stores using Azure Data Factory.
ADF also supports incrementally loading data using Change Data Capture (CDC).
Note: Both Azure SQL MI and SQL Server support the Change Data Capture technology.
A tumbling window trigger and the CDC window parameters need to be configured to automate the incremental load. Check this official tutorial.
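To make the idea of window parameters concrete, here is a minimal sketch of how tumbling windows partition time: contiguous, fixed-size, non-overlapping intervals, each of which would bound one incremental CDC load. The function name and the dates are illustrative assumptions, not anything ADF exposes under these names.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Yield contiguous, non-overlapping (window_start, window_end) pairs.

    Only complete windows that end at or before `end` are produced.
    """
    current = start
    while current + size <= end:
        yield current, current + size
        current += size


# Three hourly windows between 00:00 and 03:00 (illustrative dates).
windows = list(
    tumbling_windows(
        datetime(2021, 1, 1, 0),
        datetime(2021, 1, 1, 3),
        timedelta(hours=1),
    )
)
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each pair plays the role of the start and end values a tumbling window trigger would pass into the pipeline, so every change is picked up by exactly one run.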
The last part:
then process them into the next 2 zones
You need to manage this programmatically, as ADF has no feature that updates the other copies of the data based on CDC. You need to either set up separate CDC for those zones or handle it in your own logic.
We have a requirement to move data from Oracle Cloud storage to Azure cloud storage.
The requirement is basically to move data from an Oracle ADW database (hosted on Oracle Cloud) to a Snowflake database (hosted on Azure).
Since the data volume in the tables is huge (some with 60 million+ records), we do not wish to use any ETL tool and instead want to set up a pipeline as below.
Oracle ADW database -> Store data in Oracle storage -> Move data to Azure cloud storage -> Load into Snowflake using Snowpipe or similar Snowflake utilities.
How should I go about this implementation?
Also, please share your views on whether we can use Oracle FastConnect and Azure ExpressRoute to pull data directly from Oracle Cloud into Snowflake (or into Azure storage).
I am looking for the same thing, with the simplest method, from Oracle (on-prem, but could be cloud) into Snowflake. It looks like the data must be exported or dropped to external tables, shifted to Azure Blob Storage (the equivalent of AWS S3), then pushed into Snowflake using COPY INTO, which basically copies the on-disk external tables. This is what Snowpipe does:
"Snowpipe copies the files into a queue, from which they are loaded into the target table in a continuous, serverless fashion based on parameters defined in a specified pipe object. The following table indicates the cloud storage service support for automated Snowpipe from Snowflake accounts hosted on each cloud platform:"
It's been a while since I have worked with this. The other option is GoldenGate, which was not expensive the last time I looked into it:
https://www.snowflake.com/blog/continuous-data-replication-into-snowflake-with-oracle-goldengate/
Easy, simple, fast. If anyone has better ideas, they would be appreciated.
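The load step of the export, stage, COPY INTO flow described above could look roughly like the sketch below, which only renders the Snowflake SQL as strings. The table, stage, pipe, and notification integration names are hypothetical, and the stage over the Azure Blob Storage container is assumed to exist already.

```python
def copy_into_sql(table, stage, path, file_type="CSV"):
    """Render a Snowflake COPY INTO statement that loads staged files."""
    # Identifiers are interpolated as-is; only pass trusted values.
    return (
        f"COPY INTO {table}\n"
        f"FROM @{stage}/{path}\n"
        f"FILE_FORMAT = (TYPE = {file_type})"
    )


def create_pipe_sql(pipe, integration, copy_statement):
    """Render a CREATE PIPE statement wrapping a COPY INTO for auto-ingest."""
    return (
        f"CREATE PIPE {pipe}\n"
        f"  AUTO_INGEST = TRUE\n"
        f"  INTEGRATION = '{integration}'\n"
        f"AS\n"
        f"{copy_statement}"
    )


# Hypothetical object names, for illustration only.
copy_stmt = copy_into_sql("ORDERS", "AZURE_STAGE", "orders/")
pipe_stmt = create_pipe_sql("ORDERS_PIPE", "AZURE_NOTIFICATION_INT", copy_stmt)
print(copy_stmt + ";")
print(pipe_stmt + ";")
```

Running the COPY INTO on its own gives a one-off bulk load; wrapping it in a pipe is what lets Snowpipe load new files as they land in the container.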
I need an expert opinion on a project I am working on. We currently get data files that we load into our Azure SQL database using a local script that calls stored procedures. I am planning on replacing the script with SSIS jobs to load the data into Azure SQL, but I am wondering if that's a good option given our needs. I am open to different suggestions too. The process we go through is to load each data file into staging tables and validate it before making updates to live tables. The validation and updates are done by calling stored procedures, so the SSIS package will just load the data and make calls to those stored procedures. I have looked at ADF IR and Databricks, but they seem overkill; I am open to hearing from people with experience using those as well. I am currently running the SSIS package locally too. Any suggestions on a better architecture or tools for this scenario? Thanks!
I would definitely have a look at Azure Data Factory data flows. With these you can easily build your ETL pipelines in the Azure Data Factory GUI.
In the following example two text files from a Blob Storage are read, joined, a surrogate key is added and finally the data is loaded to Azure Synapse Analytics (would be the same for Azure SQL):
Finally, you put this Mapping Data Flow into a pipeline and can trigger it, e.g., when new data arrives.
You can just BULK INSERT data from Azure Blob Storage:
https://learn.microsoft.com/en-us/sql/relational-databases/import-export/examples-of-bulk-access-to-data-in-azure-blob-storage?view=sql-server-ver15#accessing-data-in-a-csv-file-referencing-an-azure-blob-storage-location
Then you can use ADF (no IR) or Databricks or Azure Batch or Azure Elastic Jobs to schedule the execution.
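As a hedged sketch of what the scheduled job would execute, the snippet below renders a BULK INSERT statement that reads a CSV file through an external data source pointing at Blob Storage. The table name, file path, and data source name are assumptions; the external data source (and its credential) must already exist in the database.

```python
def bulk_insert_sql(table, blob_path, data_source, first_row=2):
    """Render a T-SQL BULK INSERT that loads a CSV file from Azure Blob Storage."""
    # Values are interpolated as-is; only pass trusted names and paths.
    return (
        f"BULK INSERT {table}\n"
        f"FROM '{blob_path}'\n"
        f"WITH (DATA_SOURCE = '{data_source}', FORMAT = 'CSV', FIRSTROW = {first_row});"
    )


# Hypothetical staging table, file, and external data source.
statement = bulk_insert_sql(
    "dbo.StagingOrders",
    "incoming/orders.csv",
    "MyAzureBlobStorage",
)
print(statement)
```

FIRSTROW = 2 skips a header row; the existing validation and update stored procedures could then be called right after this statement by the same scheduled job.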
I want to execute a query in Azure Data Lake daily. Can we schedule a U-SQL query in Azure Data Lake?
Currently, there is no built-in way inside Data Lake Analytics to schedule a U-SQL job. Instead, you can use other services or tools to perform the scheduling. A popular one for Azure customers is Azure Data Factory.
Simple scheduling of U-SQL jobs inside Data Lake Analytics is something we are considering adding as a native capability.
There are two ways to execute a query in Azure Data Lake daily:
Use ADF: store the U-SQL script in Blob Storage and reference it via a Blob Storage linked service.
Create an SSIS package using Visual Studio, then import the package into a SQL Server Agent job. See Schedule U-SQL jobs.
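For the first option, the ADF activity definition might look like the sketch below, written as a Python dictionary that mirrors the activity JSON. The activity name, both linked service names, and the script path are hypothetical placeholders.

```python
import json


def usql_activity(name, adla_linked_service, script_linked_service, script_path):
    """Build an ADF U-SQL activity that runs a script stored in Blob Storage."""
    return {
        "name": name,
        "type": "DataLakeAnalyticsU-SQL",
        # Linked service for the Data Lake Analytics account that runs the job.
        "linkedServiceName": {
            "referenceName": adla_linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            # Linked service for the storage account that holds the script.
            "scriptLinkedService": {
                "referenceName": script_linked_service,
                "type": "LinkedServiceReference",
            },
            "scriptPath": script_path,
        },
    }


# Hypothetical names and path, for illustration only.
activity = usql_activity(
    "RunDailyUsql",
    "AzureDataLakeAnalyticsLinkedService",
    "BlobStorageLinkedService",
    "scripts/daily_query.usql",
)
print(json.dumps(activity, indent=2))
```

Placing this activity in a pipeline with a daily schedule trigger gives the daily execution asked about in the question.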
I've done a fair bit of reading, and it seems there are a couple of off-the-shelf products that replicate/sync data from an on-premises database to Azure SQL Data Warehouse, but I've found nothing that syncs using an Azure database as the source. Azure Data Factory holds some promise; however, it looks more suited to one-off loads.
Anyone know of a way? (SSIS package not really an option as I want the transfer to occur wholly inside the cloud)
Azure Data Factory can run continuous loads from SQL Database to SQL Data Warehouse. You'll want to look into the frequency and interval parameters for the pipeline.
The documentation is here https://azure.microsoft.com/en-us/documentation/articles/data-factory-create-datasets/.
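As a small sketch of the frequency and interval parameters mentioned above, the snippet below builds the availability section used in the dataset JSON of the original (version 1) Data Factory. The helper name is hypothetical, and the set of accepted frequency values is stated from memory of that documentation, so treat it as an assumption to verify.

```python
def dataset_availability(frequency, interval):
    """Build the 'availability' section of an ADF (version 1) dataset definition."""
    # Assumed set of frequency values; check against the linked documentation.
    allowed = {"Minute", "Hour", "Day", "Week", "Month"}
    if frequency not in allowed:
        raise ValueError(f"frequency must be one of {sorted(allowed)}")
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return {"availability": {"frequency": frequency, "interval": interval}}


# A data slice produced every hour, i.e. an hourly recurring load.
hourly = dataset_availability("Hour", 1)
print(hourly)
```

With an hourly availability on the output dataset, the pipeline produces a new slice every hour instead of running as a one-off load.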