Azure Synapse AI using a dedicated SQL Server VM

Is it possible to add a dedicated SQL Server VM to an Azure Synapse SQL pool for machine learning purposes?

No. You can only create a new dedicated SQL pool.
Moreover, the official guidance is to use the Azure Machine Learning (AML) service for ML-related experiments, both training and inference. Here is a guide: https://learn.microsoft.com/en-us/azure/synapse-analytics/machine-learning/tutorial-sql-pool-model-scoring-wizard
If you want to use your dedicated SQL Server VM, it can be registered under Integration -> Integration runtimes -> Self-hosted. However, that is for data flows/pipelines, which is a completely different purpose (see: Azure Data Factory).

Related

Can we use a Spark pool to process data from a dedicated SQL pool, and is that a good architecture?

Is it a good design to use a Spark pool to process data that arrives in a dedicated SQL pool and then write it back to the dedicated SQL pool and to ADLS?
As of now we are doing everything with the dedicated SQL pool, so if we add a Spark pool, will it be more efficient or will it just be a burden on the existing dedicated SQL pool?
Yes, you can use a Spark pool to process data from a dedicated SQL pool, and it is a good architecture; it is recommended and directly supported by Microsoft.
The Synapse Dedicated SQL Pool Connector is an API that efficiently moves data between the Apache Spark runtime and the dedicated SQL pool in Azure Synapse Analytics. This connector is available in Scala.
If your project requires large-scale streaming, you can definitely go for Apache Spark. There won't be any burden on the existing architecture, and you will get the expected results.
Refer: Azure Synapse Dedicated SQL Pool connector for Apache Spark
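For illustration, here is a minimal sketch of that read/transform/write-back pattern from a Synapse notebook. The connector's documented API is Scala; newer Spark 3 runtimes also expose it to PySpark, but treat the Python form, the option names, and the server/table names below as assumptions to verify against the linked connector documentation.

```python
# Hypothetical sketch: move data between the Spark pool and the dedicated SQL
# pool with the Synapse Dedicated SQL Pool connector. All names are placeholders;
# verify the exact API surface against the connector docs for your runtime.
import com.microsoft.spark.sqlanalytics                      # registers synapsesql() (assumed Python API)
from com.microsoft.spark.sqlanalytics.Constants import Constants

# Read a table from the dedicated SQL pool (workspace AAD identity assumed).
df = (spark.read
      .option(Constants.SERVER, "<workspace>.sql.azuresynapse.net")
      .synapsesql("<database>.<schema>.<source_table>"))

# Do the heavy processing in the Spark pool.
result = df.groupBy("customer_id").count()

# Write the result back to an internal table in the dedicated SQL pool.
(result.write
    .option(Constants.SERVER, "<workspace>.sql.azuresynapse.net")
    .mode("overwrite")
    .synapsesql("<database>.<schema>.<target_table>", Constants.INTERNAL))

# Writing the same result to ADLS Gen2 is a plain Spark write.
result.write.mode("overwrite").parquet(
    "abfss://<container>@<account>.dfs.core.windows.net/curated/customer_counts")
```

The point of this split is that the expensive transformation runs on the Spark pool's compute, while the dedicated SQL pool only serves as the source and the final landing zone.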

Copy Data From On-Premise SQL Server To Azure SQL - Azure Private Network

Requirement: I want to copy data from a specific table/view residing on an on-premises SQL Server to an Azure SQL DB.
Infrastructure: As depicted in the picture below. Essentially, the Azure network is directly connected to the corporate network over ExpressRoute, so it is a purely private network connection, as good as the corporate network itself.
Issue/Question: I know there are multiple approaches to get this done, and I am not restricted to the ADF Copy Data tool only. But for all of them I see some caveats or extra steps needed, as below:
1. ADF Copy Data tool: needs a self-hosted integration runtime (SHIR), and a small MSI package must be installed on the on-premises machine that hosts the SQL Server for registration purposes.
2. Logic Apps: needs a virtual gateway or an ASE.
3. App Service: if the operation is wrapped in a C# application and deployed to an Azure Web App, then in order to connect to the on-premises SQL Server we need to set up the Hybrid Connection Manager, and as in #1 something must be installed on the on-premises machine.
In my case, none of these extra steps can be done. Essentially, the on-premises SQL Server belongs to a different BU, so I don't have any permissions there beyond the grant they have given on a table/view. Thus, none of these extra steps are an option.
Moreover, as mentioned above, since everything is connected over ExpressRoute as a direct connection, both the on-premises and Azure SQL databases are essentially inside the same corporate network, as the picture shows. Thus, I should be able to access them directly without configuring any of the extra steps mentioned above.
Please confirm and provide a suggestion.
Thank you.
You can still go with the ADF scenario without a SHIR by creating the data factory in a managed virtual network and using a private endpoint. Since you already have an ExpressRoute circuit and the flexibility to configure the Azure side, you can do this with the Azure IR: Access on-premises SQL Server from Data Factory Managed VNet using Private Endpoint - Azure Data Factory | Microsoft Docs
There are two solutions that could work for your scenario, but even for them to work you would need at least some one-time access to the on-premises SQL Server machine for configuration, and the Azure SQL DB should be reachable from SSMS installed on the on-premises machine.
Using a linked server
You can create a linked server (process explained here: https://www.sqlshack.com/create-linked-server-azure-sql-database/) on the on-premises server and create a SQL Server Agent job to insert data into the Azure SQL DB table.
Via a Python script
This would need a Python installation on the on-premises machine. Once installed, you can write a script to transfer data between the on-premises SQL Server and the Azure SQL DB, and schedule it with a SQL Server Agent job, as sketched below.
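For illustration, here is a minimal sketch of such a transfer script using pyodbc; the ODBC driver version, server names, credentials, table/view names and batch size are placeholders and assumptions, so adapt them to your environment.

```python
# Hypothetical transfer script: copy rows from an on-premises SQL Server
# table/view to an Azure SQL DB table. All connection details are placeholders.
import pyodbc

SOURCE_CONN = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=onprem-sql.corp.local;DATABASE=SourceDb;"
    "Trusted_Connection=yes;"                      # Windows auth on-premises (assumed)
)
TARGET_CONN = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<server>.database.windows.net;DATABASE=TargetDb;"
    "UID=<loader>;PWD=<password>;"                 # SQL auth to Azure SQL DB (assumed)
)

def copy_table(batch_size=10_000):
    with pyodbc.connect(SOURCE_CONN) as src, pyodbc.connect(TARGET_CONN) as dst:
        src_cur = src.cursor()
        dst_cur = dst.cursor()
        dst_cur.fast_executemany = True            # speeds up the bulk inserts
        src_cur.execute("SELECT col1, col2, col3 FROM dbo.SourceView")
        while True:
            rows = src_cur.fetchmany(batch_size)
            if not rows:
                break
            dst_cur.executemany(
                "INSERT INTO dbo.TargetTable (col1, col2, col3) VALUES (?, ?, ?)",
                [tuple(r) for r in rows],
            )
            dst.commit()

if __name__ == "__main__":
    copy_table()
```

A SQL Server Agent job can then run this script on a schedule (for example via an operating-system/CmdExec step that calls python.exe), keeping the whole flow inside the on-premises tooling you already control.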

Connect to Azure Synapse Spark Pool from outside

Is there a possibility to connect to the Azure Synapse Spark pool from "outside", meaning with RStudio or Spark SQL?
Unfortunately, R support isn't available in Synapse yet, so connecting it to RStudio isn't useful.
Spark pools in Azure Synapse include the following components that are available on the pools by default: Spark Core, which includes Spark Core, Spark SQL, GraphX, and MLlib.
As per the above statement from the official documentation, Spark SQL is already available by default in Azure Synapse, so there is no need to connect from outside (see the small example below).
Apart from this, you can consider wBob's inputs shared in the comment section, based on your requirements.
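To illustrate the point, Spark SQL runs directly inside a Synapse notebook attached to the Spark pool, with no external client needed; the database and table names below are placeholders.

```python
# Minimal sketch: Spark SQL from a Synapse notebook attached to the Spark pool.
# The `spark` session is pre-created by the notebook; table names are placeholders.
df = spark.sql("""
    SELECT customer_id, COUNT(*) AS order_count
    FROM demo_db.orders
    GROUP BY customer_id
""")
df.show()
```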

Why does external I.P. need access to on-prem sql database when moving data with ADF to Azure SQL?

Why does an external IP need access to the on-premises SQL database when copying data with ADF to Azure SQL?
It looks like the on-premises SQL Server makes a direct connection to Azure SQL (bypassing ADF). Is this by design, or am I following the wrong workflow?
Data Factory uses an integration runtime to create the connection to the source/sink dataset: the Azure integration runtime for cloud datasets, and the self-hosted integration runtime for on-premises source/sink datasets.
The integration runtime (IR) is the compute infrastructure that Azure Data Factory uses to provide data-integration capabilities across different network environments. For details about IR, see Integration runtime overview.
A self-hosted integration runtime can run copy activities between a cloud data store and a data store in a private network. It can also dispatch transform activities against compute resources in an on-premises network or an Azure virtual network. The installation of a self-hosted integration runtime needs an on-premises machine or a virtual machine inside a private network.
The Azure integration runtime is provided by ADF by default. The self-hosted integration runtime must be created manually.
That means Data Factory cannot access the on-premises SQL database directly; it needs the self-hosted integration runtime to connect to it.
It also means that the on-premises SQL Server does not make a direct connection to Azure SQL (bypassing ADF). That is why an external IP needs access to the on-premises SQL database when copying data with ADF to Azure SQL.
HTH.

SQL server to Azure process workflow migration

We are supporting a legacy system for our organisation. In the current scenario, we receive SQL Server backups (.bak files) from the application vendor at an FTP location. Every Sunday it is a full backup, and every other day it is a differential one.
On our side, we have a SQL Server instance running with custom stored procedures that are scheduled to check the location every morning and restore the backups each day. These restored backups are then used by the organisation for internal reporting. There are hundreds of other stored procedures written for different reports in different databases on the same instance.
Since SQL Server 2008 is now out of support, and to save the cost of running an on-premises system, my team has been tasked with looking into migrating this whole system to Azure SQL Database.
My question is: what is the most effective way to move this workflow to the cloud? I have an Azure trial account set up to experiment with, but I haven't been successful in restoring the .bak files on an Azure SQL instance.
Thanks.
You essentially have two options in Azure: either perform a fairly linear lift-and-shift to SQL Server on an Azure VM, or go with the more advanced Azure PaaS offering, Azure SQL Database Managed Instance. The specific deployment option Azure SQL Database (Single Instance) will not support what your current solution requires with regard to .bak files; I have detailed that below. For further details on the difference between Azure SQL Database Single Instance and Managed Instance, please see: Features comparison: Azure SQL Database and Azure SQL Managed Instance
The second option is to leverage the Azure Enterprise Ready Analytics Architecture (AERAA) (link) of Azure (PaaS) analytics services. With Azure SQL Database (PaaS) services, as opposed to on-premises SQL Server or SQL Server on an Azure VM, there is no Integration Runtime or Analysis Services as a bundled service component. These services are separate PaaS offerings, and with the help of the linked AERAA blog you can gain a better understanding of the Azure analytics services.
The .bak versus .bacpac support dilemma:
Since the main requirement for your solution is support for .bak files, you need to understand where .bak and where .bacpac files are supported. The term Azure SQL Database applies both to a specific deployment option of the Azure SQL Database (PaaS) service and as a general term for Azure SQL cloud databases. As for the specific deployment option, neither Azure SQL Database (Single Instance) nor Elastic Pools will support your scenario with .bak files. That deployment option supports export/import functionality via the .bacpac file format; it does not support full or partial restore functionality. The backup/restore functionality, although configurable, is only in scope for the specific databases hosted by an Azure SQL (logical) server instance. Basically, you cannot restore an external file; you can only import, which is always a full copy. So, for that reason, for an Azure PaaS database service you will need Azure SQL Database Managed Instance for .bak file support (a rough restore sketch follows below), or deploy a SQL Server VM image to an Azure VM and migrate your objects via the Azure Database Migration Service.
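If the Managed Instance route is chosen, the restore can be driven by plain T-SQL from any client once the vendor's .bak has been copied from the FTP location to Azure Blob Storage. The sketch below (kept in Python for consistency with the other examples here) issues the native RESTORE ... FROM URL against a Managed Instance; the server name, credentials, container URL and SAS token are placeholders, not values from the question.

```python
# Hypothetical sketch: restore a .bak that has been uploaded to Blob Storage
# onto an Azure SQL Managed Instance. Connection string, container URL and
# SAS token are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<managed-instance>.database.windows.net;DATABASE=master;"
    "UID=<admin>;PWD=<password>;",
    autocommit=True,                      # RESTORE cannot run inside a transaction
)
cur = conn.cursor()

# Credential named after the container URL, secured with a SAS token (no leading '?').
cur.execute("""
    CREATE CREDENTIAL [https://<account>.blob.core.windows.net/<container>]
    WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
         SECRET   = '<sas-token>'
""")

# Kick off the native restore of the full backup from Blob Storage.
cur.execute("""
    RESTORE DATABASE [VendorDb]
    FROM URL = 'https://<account>.blob.core.windows.net/<container>/full_sunday.bak'
""")
```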
Regards,
Mike