Azure Synapse copy from Azure SQL to Data Lake table - azure-synapse

I want to copy data from an Azure SQL table to a Data Lake Storage account table using Synapse Analytics. In the Data Lake table I want to store the table name and max ID for the incremental load. Is this possible?

If your requirement is only to transfer the data from Azure SQL Database to a Data Lake Storage (ADLS) account and no big data analysis is required, you can simply use the Copy activity in either an Azure Data Factory (ADF) or Synapse pipeline.
ADF also allows you to perform any required transformations on your data before storing it in the destination, using the Data Flow activity.
Refer to the official tutorial Copy data from a SQL Server database to Azure Blob storage.
Now, coming to incremental load: both ADF and Synapse pipelines provide built-in support for it. You need to select a column in your source table as the watermark column.
The watermark column in the source data store is one that can be used to slice the new or updated records for every run. Normally, the data in this selected column (for example, last_modify_time or ID) keeps increasing when rows are created or updated. The maximum value in this column is used as the watermark.
Microsoft provides a complete step-by-step tutorial, Incrementally load data from Azure SQL Database to Azure Blob storage using the Azure portal, which you can follow and adapt to your use case.
Apart from the watermark technique, there are other methods you can choose to manage incremental load. Check here.
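To make the pattern concrete, here is a minimal sketch of the watermark logic in Python, run outside of ADF/Synapse purely for illustration. The connection string, the source table (dbo.Orders with an Id column), the container names and the control file control/watermark.csv that stores the table name and max ID are all placeholder assumptions, not anything prescribed by the tutorial.

    import io

    import pandas as pd
    import pyodbc
    from azure.storage.filedatalake import DataLakeServiceClient

    SQL_CONN = "Driver={ODBC Driver 18 for SQL Server};Server=<server>;Database=<db>;Uid=<user>;Pwd=<password>"
    lake = DataLakeServiceClient("https://<storageaccount>.dfs.core.windows.net", credential="<account-key>")
    fs = lake.get_file_system_client("raw")  # assumes this container already exists

    # 1. Read the last watermark (table name + max id) kept in the lake.
    wm_file = fs.get_file_client("control/watermark.csv")
    watermarks = pd.read_csv(io.BytesIO(wm_file.download_file().readall()))
    last_id = int(watermarks.loc[watermarks.table_name == "dbo.Orders", "max_id"].iloc[0])

    # 2. Pull only the rows added since the previous run.
    with pyodbc.connect(SQL_CONN) as conn:
        delta = pd.read_sql("SELECT * FROM dbo.Orders WHERE Id > ?", conn, params=[last_id])

    # 3. Land the delta in the lake (folder assumed to exist) and advance the watermark.
    if not delta.empty:
        new_max = int(delta.Id.max())
        fs.get_file_client(f"orders/orders_{new_max}.csv").upload_data(delta.to_csv(index=False), overwrite=True)
        watermarks.loc[watermarks.table_name == "dbo.Orders", "max_id"] = new_max
        wm_file.upload_data(watermarks.to_csv(index=False), overwrite=True)

In an actual pipeline, the same three steps map to a Lookup activity (read the old watermark), a Copy activity with a parameterized source query, and a final activity that writes the new watermark back.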

Related

Load multiple files using Azure Data Factory or Synapse

I am moving from SSIS to Azure.
We have hundreds of files and MSSQL tables that we want to push into a Gen2 data lake,
using 3 zones and then a SQL data lake.
The zones being Raw, Staging & Presentation (change the names as you wish).
What is the best process to automate this as much as possible?
For example: build a table listing the files / folders / tables to bring into the Raw zone,
then have Synapse bring in these objects with either a full or incremental load,
then process them into the next 2 zones, with I guess more custom code as we progress.
Your requirement can be accomplished using multiple activities in Azure Data Factory.
To migrate SSIS packages, you need to use the SSIS Integration Runtime (IR). ADF supports SSIS integration, which can be configured by creating a new SSIS Integration Runtime: click Configure SSIS Integration, provide the basic details, and create the runtime.
Refer to this third-party tutorial by SQLShack on how to move local SSIS packages to Azure Data Factory.
Now, to copy the data to the different zones, use the Copy activity; you can make as many copies of your data as you require with it. Refer to Copy data between Azure data stores using Azure Data Factory.
ADF also supports incrementally loading data using Change Data Capture (CDC).
Note: Both Azure SQL MI and SQL Server support the Change Data Capture technology.
A tumbling window trigger and the CDC window parameters need to be configured to automate the incremental load. Check this official tutorial.
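As a rough sketch of what that pipeline reads inside each trigger window, the Python snippet below runs the CDC lookup directly. The connection string and the capture instance name dbo_Orders are placeholder assumptions; in the actual pipeline the window boundaries come from the tumbling window trigger parameters rather than Python arguments.

    from datetime import datetime

    import pandas as pd
    import pyodbc

    SQL_CONN = "Driver={ODBC Driver 18 for SQL Server};Server=<server>;Database=<db>;Uid=<user>;Pwd=<password>"

    def get_cdc_changes(window_start: datetime, window_end: datetime) -> pd.DataFrame:
        """Return all changes captured by CDC within one trigger window."""
        query = """
            SET NOCOUNT ON;
            DECLARE @from_lsn binary(10), @to_lsn binary(10);
            SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', ?);
            SET @to_lsn   = sys.fn_cdc_map_time_to_lsn('largest less than or equal', ?);
            SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, 'all');
        """
        with pyodbc.connect(SQL_CONN) as conn:
            return pd.read_sql(query, conn, params=[window_start, window_end])

    # Example: one hourly window, as the tumbling window trigger would supply in ADF.
    changes = get_cdc_changes(datetime(2023, 1, 1, 0, 0), datetime(2023, 1, 1, 1, 0))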
The last part:
then process them into the next 2 zones
You need to manage this programmatically, as ADF has no built-in feature that can update the other copies of the data based on CDC. You either need to set up a separate CDC-driven copy for those zones or handle it in your own logic.

Can we join tables in an on-premises SQL Server database to Delta tables in an Azure data lake? What are my options?

I am archiving rows that are older than a year into ADLS Gen2 as Delta tables. When there is a need to report on that data, I need to join the archived data with some tables in the on-premises database. Is there a way we can do the join without rehydrating data from, or hydrating data to, the cloud?
Yes, you can achieve this task by using Azure Data Factory.
Azure Data Factory (ADF) is a fully managed, serverless data integration service. Visually integrate data sources with more than 90 built-in, maintenance-free connectors at no added cost. Easily construct ETL and ELT processes code-free in an intuitive environment or write your own code.
Firstly, you need to install the Self-hosted Integration Runtime on your local machine to access the on-premises SQL Server from ADF. To accomplish this, refer to Connect to On-premises Data in Azure Data Factory with the Self-hosted Integration Runtime.
As you have archived the data in ADLS, you need to change the Access tier of that container from Cold -> Hot in order to retrieve the data in ADF.
Next, create a Linked Service using the Self-hosted IR which you created, and a Dataset on top of this Linked Service to access the on-premises database.
Similarly, create a Linked Service using the default Azure IR, and a Dataset on top of it to access the data in ADLS.
You also require a destination where you will store the data after the join. If you are storing it in the same on-premises database, you can reuse the existing Linked Service, but you need to create a new Dataset pointing to the destination table.
Once all this configuration is done, create a pipeline with a Data Flow activity in ADF.
Mapping data flows are visually designed data transformations in Azure Data Factory. Data flows allow data engineers to develop data transformation logic without writing code. The resulting data flows are executed as activities within Azure Data Factory pipelines that use scaled-out Apache Spark clusters.
Learn more about Mapping data flow here.
Finally, in the Data Flow activity, your sources will be the on-premises dataset and the ADLS dataset which you created above. Use the join transformation in the mapping data flow to combine data from the two sources; the output stream will include all columns from both sources, matched based on a join condition.
The sink transformation will take your destination dataset, where the joined data will be stored as the output.
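Since mapping data flows execute on Spark, one way to picture what the join transformation does is the PySpark sketch below. The JDBC connection details, the Delta path and the join key (CustomerId) are placeholder assumptions for illustration only; in ADF the sources and sink are the datasets configured above, not hand-written reads and writes.

    from pyspark.sql import SparkSession

    # Assumes the MSSQL JDBC driver and the Delta Lake package are on the Spark classpath
    # and that ADLS Gen2 credentials are already configured for the session.
    spark = SparkSession.builder.appName("archive-join-sketch").getOrCreate()

    # Source 1: current rows from the on-premises database (reached through the
    # self-hosted IR in ADF; read here over JDBC purely for illustration).
    customers = (
        spark.read.format("jdbc")
        .option("url", "jdbc:sqlserver://<onprem-host>;databaseName=<db>")
        .option("dbtable", "dbo.Customers")
        .option("user", "<user>")
        .option("password", "<password>")
        .load()
    )

    # Source 2: archived rows stored as a Delta table in ADLS Gen2.
    archived_orders = spark.read.format("delta").load(
        "abfss://archive@<storageaccount>.dfs.core.windows.net/orders"
    )

    # Join transformation: combine the two streams on the key column.
    joined = archived_orders.join(customers, on="CustomerId", how="inner")

    # Sink: write the combined result to the destination (here, back to the lake).
    joined.write.mode("overwrite").parquet(
        "abfss://reports@<storageaccount>.dfs.core.windows.net/orders_joined"
    )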

Azure Data Factory Copy Activity to Copy to Azure Data Lake Table

I need to copy data incrementally from an on-premises SQL Server into a table in Azure Data Lake Store.
But when creating the Copy activity using the Azure portal, in the destination I only see folders (no option for tables).
How can I do scheduled on-premises table to Data Lake table syncs?
Data Lake Store does not have a notion of tables. It is a file storage system (like HDFS). You can, however, use capabilities such as Hive or Data Lake Analytics on top of the data stored in Data Lake Store to conform your data to a schema. In Hive, you can do that using external tables, while in Data Lake Analytics you can run a simple extract script.
I hope this helps!
Azure Data Lake Analytics (ADLA) does have the concept of databases, which have tables. However, they are not currently exposed as a target in Data Factory. I believe it's on the backlog, although I can't find the reference right now.
What you could do is use Data Factory to copy data into Data Lake Store and then run a U-SQL script which imports it into the ADLA database.
If you do feel this is an important feature, you can create a request here and vote for it:
https://feedback.azure.com/forums/327234-data-lake

How to move a SharePoint list or Excel file to Azure SQL DW?

I want to copy data from SharePoint to Microsoft Azure SQL DW using Azure Data Factory or an alternative service. Can I do this? Please, can anyone help me with this?
You can do this by setting up a data pipeline with Azure Data Factory that lands the data in Azure Blob storage. Afterwards you can use Azure's fast PolyBase technology to load the data from blob storage into your SQL Data Warehouse instance.
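As a rough, hedged sketch of that second step, assuming the SharePoint/Excel data has already been exported as a CSV into a blob container and that a database scoped credential already exists, a PolyBase external-table load run through pyodbc could look something like this (every object name, path and column list below is a placeholder):

    import pyodbc

    DW_CONN = (
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=<server>.database.windows.net;Database=<sql-dw>;"
        "Uid=<user>;Pwd=<password>"
    )

    # Each statement is run as its own batch; names and the column list are illustrative.
    BATCHES = [
        """CREATE EXTERNAL DATA SOURCE SharePointStaging
           WITH (TYPE = HADOOP,
                 LOCATION = 'wasbs://staging@<storageaccount>.blob.core.windows.net',
                 CREDENTIAL = BlobStorageCredential)""",
        """CREATE EXTERNAL FILE FORMAT CsvFormat
           WITH (FORMAT_TYPE = DELIMITEDTEXT,
                 FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2))""",
        """CREATE EXTERNAL TABLE dbo.SharePointList_ext
               (Id INT, Title NVARCHAR(255), Modified DATETIME2)
           WITH (LOCATION = '/sharepoint/list.csv',
                 DATA_SOURCE = SharePointStaging,
                 FILE_FORMAT = CsvFormat)""",
        # PolyBase reads the external file in parallel and lands the rows in the
        # warehouse table with a single set-based insert.
        "INSERT INTO dbo.SharePointList SELECT * FROM dbo.SharePointList_ext",
    ]

    with pyodbc.connect(DW_CONN, autocommit=True) as conn:
        for batch in BATCHES:
            conn.execute(batch)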
Can I ask how much data you intend on loading into the DW? Azure SQL Data Warehouse is intended for use with at least terabyte-level data, up to petabyte-scale compute and storage. I only ask because each SharePoint list or Excel file has a maximum size of 2 GB.

Error trying to move data from Azure Table storage to Data Lake Store with Data Factory

I've been building a Data Factory pipeline to move data from my Azure Table storage to a Data Lake Store, but the tasks fail with an exception that I can't find any information on. The error is:
Copy activity encountered a user error: ErrorCode=UserErrorTabularCopyBehaviorNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=CopyBehavior property is not supported if the source is tabular data source.,Source=Microsoft.DataTransfer.ClientLibrary,'.
I don't know where the problem lies, whether in the datasets, the linked services, or the pipeline, and I can't seem to find any info at all on the error I'm seeing in the console.
Since copying directly from Azure Table Storage to Azure Data Lake Store is not currently supported, as a temporary workaround you could go from Azure Table Storage to Azure Blob Storage, and then from Azure Blob Storage to Azure Data Lake Store.
Azure Table Storage to Azure Blob Storage
Azure Blob Storage to Azure Data Lake Store
I know this is not an ideal solution, but if you are under time constraints it is just an intermediate step to get the data into the data lake.
HTH
The 'CopyBehavior' property is not supported when Table storage (which is not a file-based store) is used as the source in an ADF Copy activity. That is why you are seeing this error message.