How to increase performance of the built-in serverless SQL pool in Azure Synapse

We are currently extracting multiple tables from the Azure serverless SQL pool in Synapse. With a regular Azure SQL Database it is very easy to increase performance by moving from Basic all the way up to Premium or Business Critical, but the serverless pool does not seem to offer anything similar.
Can someone let me know how to go about increasing the performance of the Azure serverless SQL pool in Synapse?

Serverless SQL pool is a distributed data processing system and it doesn't have any built-in storage to store data. It uses external tables to query the data in Azure Data Lake Storage (ADLS). Therefore, data cannot be copied to the serverless SQL pool. If data needs to be extracted from serverless SQL pool, you can extract it directly from the underlying external storage. If the target datastore supports PolyBase data loading, use that to load the target table from ADLS.
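To illustrate the point that the serverless pool reads files in place rather than holding them, here is a minimal Python sketch that assembles a T-SQL OPENROWSET query over Parquet files in the lake. The storage account, container, and folder in the URL are hypothetical, and the helper name is made up for this example; only the OPENROWSET(BULK ..., FORMAT = 'PARQUET') form is taken from the serverless SQL pool syntax.

```python
def serverless_parquet_query(files_url: str, top: int = 100) -> str:
    """Return T-SQL that reads Parquet files in place from the data lake."""
    # Escape single quotes so the URL stays a valid T-SQL string literal.
    safe_url = files_url.replace("'", "''")
    return (
        f"SELECT TOP {int(top)} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{safe_url}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result];"
    )


# Hypothetical storage account, container, and folder.
print(serverless_parquet_query(
    "https://examplelake.dfs.core.windows.net/raw/sales/*.parquet"
))
```

Because the same files are reachable directly in ADLS, an extraction job can read them from storage without going through the serverless pool at all.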

Related

Where is data physically stored in Azure Synapse Dedicated SQL Pool?

Documentation from Microsoft and others strongly emphasizes the separation between storage and compute in Azure Synapse Analytics.
In the case of a Serverless SQL pool, it is clearly explained that the data is stored in Azure Data Lake Storage Gen2 (ADLS Gen2).
However, in the case of a Dedicated SQL Pool, the documentation is not explicit enough on data storage.
In a book that deals with Azure Synapse, it is stated that in the case of Dedicated SQL Pool, data is stored in Storage Nodes which are completely separate from Compute Nodes.
Since this claim is not in Microsoft's documentation, I dare not trust it.
So, is there an official resource that sheds light on this question?
This is a question that has been on my mind for a long time as well. However, I have come to the conclusion that data is actually stored in Dedicated SQL Pools.
Let me explain why I believe this.
Take a look at the documentation given here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/quickstart-copy-activity-load-sql-pool
Notice that it is about loading data into a Dedicated SQL Pool. Further, to quote part of the documentation,
"A dedicated SQL pool offers T-SQL based compute and storage capabilities. After creating a dedicated SQL pool in your Synapse workspace, data can be loaded, modeled, processed, and delivered for faster analytic insight."
It is said that Dedicated SQL Pools provide both compute and storage capabilities.
Furthermore, with Dedicated SQL Pools, you may already know that it is possible to create traditional tables. We can organize these tables into something along the lines of a star or snowflake schema to model our data warehouses.
Creation of such tables, however, is not possible with Serverless SQL Pools. Only the creation of metadata objects, such as views or external tables, is allowed. This is explained here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/on-demand-workspace-overview
To quote the relevant passage of the article,
"Serverless SQL pool has no local storage, only metadata objects are stored in databases. Therefore, T-SQL related to the following concepts isn't supported: Tables; Triggers; Materialized views; DDL statements other than ones related to views and security; DML statements."
To me, the fact that tables can actually be created in Dedicated SQL Pools is further proof that the data is physically stored in them.
My final argument is around the idea of distributions. The concept is explained here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/massively-parallel-processing-mpp-architecture
This talks about how data is divided up among the compute nodes and how queries are executed in parallel on the distributions in these nodes. It would not be possible to implement this if the data were not actually stored in these nodes.
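The idea of distributions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the figure of 60 distributions comes from the MPP architecture article linked above, but SHA-256 is a stand-in for whatever hash the service uses internally, the even split of distributions across nodes is a simplification, and the function names are hypothetical.

```python
import hashlib
from collections import Counter

# A dedicated SQL pool shards table data into 60 distributions.
DISTRIBUTIONS = 60


def distribution_for(key: str) -> int:
    """Map a distribution-column value to one of the 60 distributions.

    SHA-256 is a stand-in here; the real internal hash is not assumed.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


def node_for(distribution: int, compute_nodes: int) -> int:
    """Assign a distribution to a compute node, splitting them evenly."""
    if compute_nodes < 1 or DISTRIBUTIONS % compute_nodes != 0:
        raise ValueError("compute_nodes must evenly divide 60")
    return distribution // (DISTRIBUTIONS // compute_nodes)


# Hypothetical distribution-column values.
keys = [f"customer-{i}" for i in range(6000)]
rows_per_distribution = Counter(distribution_for(k) for k in keys)
rows_per_node = Counter(node_for(distribution_for(k), 4) for k in keys)
print(len(rows_per_distribution), "distributions used")
print(dict(sorted(rows_per_node.items())))
```

The same key always lands in the same distribution, which is what lets each node work on its own slice of a query in parallel.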
In my humble opinion, Azure Storage comes into the picture (at least when it comes to Dedicated SQL Pools) when data is stored as files in a data lake and then ingested into the pool for analysis.
An explanation can be found here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/overview-architecture
Yet another quote,
"Serverless SQL pool allows you to query your data lake files, while dedicated SQL pool allows you to query and ingest data from your data lake files. When data is ingested into dedicated SQL pool, the data is sharded into distributions to optimize the performance of the system."
This is where PolyBase comes into play. You can define various data loading patterns (into Dedicated SQL Pools) using PolyBase as explained here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/load-data-overview
The Microsoft documentation on Design tables using dedicated SQL pool in Azure Synapse Analytics, found at https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview, states the following:
"Table persistence: Tables store data either permanently in Azure Storage, temporarily in Azure Storage, or in a data store external to dedicated SQL pool."
"Regular table: A regular table stores data in Azure Storage as part of dedicated SQL pool..."

Azure SQL Pool: what is it really, and can it be used for a Postgres database?

I have a question regarding SQL Pool, as I'm not sure I understand what it is. Is SQL Pool a service for SQL Server type databases only? I have a Postgres database and am considering moving it to Azure. Is there any use for the SQL Pool service with Azure Postgres, or is it only for Azure SQL Server databases? Lastly, is SQL Pool also used by Synapse ETL?
Azure SQL Pool is used with Azure Synapse Analytics to query Big Data. You can consider it as a Data Warehouse. Once your dedicated SQL pool is created, you can import big data with simple PolyBase T-SQL queries, and then use the power of the distributed query engine to run high-performance analytics.
How does SQL Pool work? In a cloud data solution, data is ingested into big data stores from a variety of sources. Once in a big data store, Hadoop, Spark, and machine learning algorithms prepare and train the data. When the data is ready for complex analysis, dedicated SQL pool uses PolyBase to query the big data stores. PolyBase uses standard T-SQL queries to bring the data into dedicated SQL pool tables.
No, PostgreSQL can't be used in SQL Pool. There is actually no link between these two services. If you want to migrate the on-premises PostgreSQL to Azure, you can use Azure Database for PostgreSQL. Check Tutorial: Migrate PostgreSQL to Azure DB for PostgreSQL online using DMS via the Azure CLI.

Oracle Cloud to Azure Cloud storage

We have a requirement to move data from Oracle Cloud storage to Azure Cloud storage.
The requirement is basically to move data from an Oracle ADW database (hosted on Oracle Cloud) to a Snowflake database (hosted on Azure).
Since the data volume in the tables is huge (some with 60 million+ records), we do not wish to use any ETL tool and instead want to set up a pipeline as below.
Oracle ADW database -> Store data in Oracle storage -> Move data to Azure Cloud storage -> Load into Snowflake using Snowpipe or similar Snowflake utilities.
How should I go about this implementation?
Also, please share your views on whether we can use Oracle FastConnect and Azure ExpressRoute to pull data directly from Oracle Cloud into Snowflake (or into Azure storage).
I am looking for the same thing: the simplest method to get from Oracle (on-prem, but could be cloud) into Snowflake. It looks like the data must be exported or dropped to external tables, shifted to Azure Blob storage (like AWS S3), then pushed into Snowflake using COPY INTO, which is basically copying the external tables on disk. This is what Snowpipe does:
"Snowpipe copies the files into a queue, from which they are loaded into the target table in a continuous, serverless fashion based on parameters defined in a specified pipe object. The following table indicates the cloud storage service support for automated Snowpipe from Snowflake accounts hosted on each cloud platform:"
It's been a while since I have worked with this. The other option is GoldenGate, which was not expensive the last time I looked into it:
https://www.snowflake.com/blog/continuous-data-replication-into-snowflake-with-oracle-goldengate/
Easy, simple, fast. Any better ideas would be appreciated.
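The last step of the export, stage, and COPY INTO path described above could look roughly like the following. This is a hedged Python sketch that only builds the Snowflake statement text; the table name, the stage name (assumed to be an external stage that already points at the Azure Blob container), and the CSV options are all hypothetical choices for illustration.

```python
def copy_into_sql(table: str, stage_path: str, skip_header: int = 1) -> str:
    """Return a Snowflake COPY INTO statement for CSV files on a stage."""
    return (
        f"COPY INTO {table}\n"
        f"FROM @{stage_path}\n"
        f"FILE_FORMAT = (TYPE = CSV SKIP_HEADER = {int(skip_header)} "
        "FIELD_OPTIONALLY_ENCLOSED_BY = '\"');"
    )


# Hypothetical target table and external stage over Azure Blob storage.
print(copy_into_sql("analytics.orders", "azure_export_stage/orders/"))
```

A Snowpipe pipe object wraps a COPY INTO statement of this kind, so the same statement is a reasonable starting point for either a one-off bulk load or continuous loading.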

U-SQL: direct output to SQL DB

Is there a way to output U-SQL results directly to a SQL DB such as Azure SQL DB? Couldn't find much about that.
Thanks!
U-SQL currently only outputs to files or internal tables (i.e. tables within ADLA databases), but you have a couple of options. Azure SQL Database has recently gained the ability to load files from Azure Blob Storage using either BULK INSERT or OPENROWSET, so you could try that. This article shows the syntax and gives a reminder that:
"Azure Blob storage containers with public blobs or public containers access permissions are not currently supported."
wasb://<BlobContainerName>@<StorageAccountName>.blob.core.windows.net/yourFolder/yourFile.txt
BULK INSERT and OPENROWSET with Azure Blob Storage is shown here:
https://blogs.msdn.microsoft.com/sqlserverstorageengine/2017/02/23/loading-files-from-azure-blob-storage-into-azure-sql-database/
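As a rough illustration of the BULK INSERT route, the Python sketch below assembles the T-SQL statement. It assumes an external data source of type BLOB_STORAGE has already been created in the Azure SQL Database and points at the container; the data source name, table name, and CSV options are hypothetical, and the file path reuses the placeholder from above.

```python
def bulk_insert_sql(
    table: str,
    blob_path: str,
    data_source: str,
    first_row: int = 2,
    field_terminator: str = ",",
) -> str:
    """Return T-SQL that loads a delimited file from Azure Blob Storage."""

    def quote(value: str) -> str:
        # Double single quotes so the value is a valid T-SQL literal.
        return value.replace("'", "''")

    return (
        f"BULK INSERT {table}\n"
        f"FROM '{quote(blob_path)}'\n"
        "WITH (\n"
        f"    DATA_SOURCE = '{quote(data_source)}',\n"
        f"    FIELDTERMINATOR = '{quote(field_terminator)}',\n"
        f"    FIRSTROW = {int(first_row)}\n"
        ");"
    )


# Hypothetical table and external data source names.
print(bulk_insert_sql(
    "dbo.UsqlOutput", "yourFolder/yourFile.txt", "MyAzureBlobStorage"
))
```

The path is relative to the container that the external data source points at, so the U-SQL job only needs to write its output file somewhere under that container.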
You could also use Azure Data Factory (ADF). Its Copy Activity could load the data from Azure Data Lake Storage (ADLS) to an Azure SQL Database in two steps:
1. Execute a U-SQL script which creates output files in ADLS (internal tables are not currently supported as a source in ADF).
2. Move the data from ADLS to Azure SQL Database.
As a final option, if your data is likely to grow to larger volumes (i.e. terabytes (TB)), then you could use Azure SQL Data Warehouse, which supports PolyBase. PolyBase now supports both Azure Blob Storage and ADLS as a source.
Perhaps if you can tell us a bit more about your process we can refine which of these options is most suitable for you.

How to move a SharePoint list or Excel file to Azure SQL DW?

I want to copy data from SharePoint to Microsoft Azure SQL DW using Azure Data Factory or an alternative service. Can I do this? Could anyone please help me with this?
You can do this by setting up a data pipeline using Azure Data Factory to Azure blob storage. Afterwards you can use Azure's fast PolyBase technology to load the data from blob to your SQL Data Warehouse instance.
Can I ask how much data you intend to load into the DW? Azure SQL Data Warehouse is intended for use with at least terabyte-level data, up to petabyte-scale compute and storage. I only ask because each SharePoint list or Excel file has a maximum of 2GB per file.
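For the second half of this approach (blob storage into the warehouse), a common PolyBase pattern is CREATE TABLE AS SELECT from an external table. The Python sketch below only builds that statement; it assumes the external table over the blob files has already been defined, and the table names and the default ROUND_ROBIN distribution are hypothetical choices for illustration.

```python
SIMPLE_DISTRIBUTIONS = ("ROUND_ROBIN", "REPLICATE")


def ctas_from_external(
    target: str, external_table: str, distribution: str = "ROUND_ROBIN"
) -> str:
    """Return a CTAS statement that loads a warehouse table via PolyBase."""
    is_hash = distribution.startswith("HASH(") and distribution.endswith(")")
    if distribution not in SIMPLE_DISTRIBUTIONS and not is_hash:
        raise ValueError(f"unsupported distribution: {distribution}")
    return (
        f"CREATE TABLE {target}\n"
        f"WITH (DISTRIBUTION = {distribution}, CLUSTERED COLUMNSTORE INDEX)\n"
        f"AS SELECT * FROM {external_table};"
    )


# Hypothetical target table and external table over the blob files.
print(ctas_from_external("dbo.SharePointList", "ext.SharePointList"))
```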