Connect to Azure Synapse Spark Pool from outside - apache-spark-sql

Is it possible to connect to an Azure Synapse Spark pool from "outside", i.e., from RStudio or Spark SQL?

Unfortunately, R isn't supported in Synapse yet, so connecting it to RStudio isn't useful.
Spark pools in Azure Synapse include the following components that are available on the pools by default. Spark Core. Includes Spark Core, Spark SQL, GraphX, and MLlib.
As this statement from the official documentation shows, Spark SQL is available in Azure Synapse by default, so there is no need to connect to it from outside.
Apart from this, depending on your requirements, you can consider the input @wBob shared in the comments section.

Related

Azure Synapse Dedicated SQL Pool Connector for Apache Spark

I am trying to write data into an Azure Synapse dedicated SQL pool using Python from a Spark pool.
https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/synapse-spark-sql-pool-import-export?tabs=scala%2Cscala1%2Cscala2%2Cscala3%2Cscala4%2Cscala5
It seems that Scala has an option to provide a callback and receive error message details, but I don't see that option in Python. Are there any alternatives?
With JDBC, we have the option "schemaCheckEnabled = false" to skip the nullability check against the target table. How can we enable the same thing in the Synapse pool connector?

Can we use spark pool to process data from dedicated SQL pool and is that a good architecture?

Is it good design to use a Spark pool to process data that lands in a dedicated SQL pool, and then write it back to the dedicated SQL pool and to ADLS?
Right now we do everything in the dedicated SQL pool. If we add a Spark pool, will it be more efficient, or will it just be a burden on the existing dedicated SQL pool?
Yes, you can use a Spark pool to process data from a dedicated SQL pool, and it is a good architecture: Microsoft recommends this scenario and supports it directly.
The Synapse Dedicated SQL Pool Connector is an API that efficiently moves data between Apache Spark runtime and Dedicated SQL pool in Azure Synapse Analytics. This connector is available in Scala.
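If your project requires large-scale streaming, you can definitely go for Apache Spark. Spark pools run on their own compute, separate from the dedicated SQL pool, so adding one should not be a burden on the existing architecture, and you should get the expected results.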
Refer: Azure Synapse Dedicated SQL Pool connector for Apache Spark

Select from MySQL AWS into Azure SQL

I have a MySQL DB on AWS.
I want to run a few simple SQL statements that select data from MySQL and insert to Azure DB.
Something like:
    SELECT *
    INTO Azure_Table
    FROM MySQL_Table
I also want to schedule this on a daily basis.
How can I do this directly from Azure SQL without having to use Data Factory / SSIS?
Thank you
You can use Data Ingestion in Azure Data Factory (ADF).
Select the source and the sink, then schedule the pipeline as needed.
Note: Since your source is MySQL on AWS, i.e., outside the Azure cloud, you have to set up a self-hosted integration runtime for the source linked service. Follow the official Microsoft documentation on setting up a self-hosted integration runtime using the UI.
You can migrate Amazon RDS for MySQL to Azure Database for MySQL using MySQL Workbench.
The official documentation below gives a step-by-step explanation:
Migrate Amazon RDS for MySQL to Azure Database for MySQL using MySQL Workbench.
Workaround: There is no direct way to query a third-party database from Azure, but you can migrate it to Azure and then perform your operations there.
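Because Azure SQL cannot query the MySQL server directly, any copy, whether run by an ADF Copy activity or by your own scheduled job, comes down to reading rows over one connection and inserting them over another. The sketch below shows that pattern with Python's DB-API. It is a minimal illustration under stated assumptions, not a replacement for ADF: two in-memory SQLite databases stand in for MySQL on AWS and Azure SQL so the example is self-contained, and the `copy_table` helper plus the `id`/`name` columns are hypothetical. With real servers you would open the two connections with the appropriate drivers (whose placeholder style may differ from SQLite's `?`).

```python
import sqlite3


def copy_table(src_conn, dst_conn, src_table, dst_table, columns, batch_size=1000):
    """Copy every row of src_table into dst_table in batches; return the row count.

    Hypothetical helper. Table and column names are interpolated into the SQL
    text, so they must come from trusted configuration, never from user input.
    """
    col_list = ", ".join(columns)
    # sqlite3 uses "?" placeholders; other DB-API drivers may expect "%s".
    placeholders = ", ".join("?" for _ in columns)
    insert_sql = f"INSERT INTO {dst_table} ({col_list}) VALUES ({placeholders})"

    read_cur = src_conn.cursor()
    write_cur = dst_conn.cursor()
    read_cur.execute(f"SELECT {col_list} FROM {src_table}")
    copied = 0
    while True:
        rows = read_cur.fetchmany(batch_size)  # stream instead of loading everything
        if not rows:
            break
        write_cur.executemany(insert_sql, rows)
        copied += len(rows)
    dst_conn.commit()  # single commit: a run that fails midway commits nothing
    return copied


# Hypothetical stand-ins: in-memory SQLite instead of MySQL on AWS and Azure SQL.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute("CREATE TABLE MySQL_Table (id INTEGER, name TEXT)")
source.executemany("INSERT INTO MySQL_Table VALUES (?, ?)",
                   [(1, "a"), (2, "b"), (3, "c")])
source.commit()
target.execute("CREATE TABLE Azure_Table (id INTEGER, name TEXT)")

n = copy_table(source, target, "MySQL_Table", "Azure_Table", ["id", "name"], batch_size=2)
print(n)  # 3
```

Running it daily would still need a scheduler, such as an ADF schedule trigger; that part is outside this sketch.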

Can't Access Azure Synapse Spark Pool Databases on SSMS

Since I started using Azure Synapse Analytics, I created a Spark pool cluster, and on that cluster I created databases and tables with PySpark on top of Parquet files in Azure Data Lake Storage Gen2.
I used to be able to access my Spark databases / Parquet tables through SSMS using the serverless SQL endpoint, but now I can no longer see my Spark databases through the serverless SQL endpoint in SSMS. My Spark databases are still accessible through Azure Data Studio, but not through SSMS. Nothing has been deployed or altered on my side. Can you help resolve the issue? I would like to be able to access my Spark databases through SSMS.
[Screenshots: SQL serverless endpoint; Azure Synapse database]
If your Spark database is built on top of Parquet files, as you said, it should sync to external tables in the serverless SQL pool without problems, and you should be able to see the synced external tables in SSMS as well. Check this link for more information about metadata synchronization.
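To check outside SSMS whether the Spark databases still appear on the serverless endpoint, a small script can list the databases that endpoint exposes. This is a minimal sketch, not an official tool. It assumes the `pyodbc` package and Microsoft ODBC Driver 18 for SQL Server are installed, uses the `<workspace>-ondemand.sql.azuresynapse.net` serverless endpoint name, and assumes interactive Microsoft Entra ID (Azure AD) sign-in; adjust the `Authentication` keyword to match how you sign in. The helper names are hypothetical.

```python
def serverless_conn_str(workspace, database="master",
                        driver="ODBC Driver 18 for SQL Server"):
    """Build an ODBC connection string for a workspace's serverless SQL endpoint.

    Assumption: interactive Entra ID sign-in; change the Authentication
    keyword to match your environment.
    """
    server = f"tcp:{workspace}-ondemand.sql.azuresynapse.net,1433"
    return (f"Driver={{{driver}}};Server={server};Database={database};"
            "Encrypt=yes;Authentication=ActiveDirectoryInteractive;")


# Synced Spark databases should be listed next to SQL-created databases.
LIST_DATABASES_SQL = "SELECT name FROM sys.databases ORDER BY name"


def list_serverless_databases(workspace):
    """Return the database names visible on the serverless SQL endpoint."""
    import pyodbc  # third-party: pip install pyodbc (needs the ODBC driver too)

    conn = pyodbc.connect(serverless_conn_str(workspace))
    try:
        return [row.name for row in conn.cursor().execute(LIST_DATABASES_SQL)]
    finally:
        conn.close()
```

Calling `list_serverless_databases("my-workspace")` (a hypothetical workspace name) should include your Spark database names if synchronization is working. If they are missing here as well, the problem is more likely in the workspace than in SSMS.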
If you have checked everything mentioned above, I'd suggest going to Help + Support in the Azure portal and filing a support request with the details of your problem, so the engineering team can check whether there is an issue with your workspace.

Azure Synapse AI using dedicated SQL server VM

Is it possible to add a dedicated SQL Server VM to an Azure Synapse SQL pool for machine learning purposes?
No. You can only create a new dedicated SQL pool.
Moreover, the official guidance is to use the Azure Machine Learning (AML) service for ML experiments, both training and inference. Here is a guide: https://learn.microsoft.com/en-us/azure/synapse-analytics/machine-learning/tutorial-sql-pool-model-scoring-wizard
If you want to use your dedicated SQL Server VM, it can be used as a self-hosted integration runtime (Integration -> Integration runtimes -> Self-hosted). But that is for data flows/pipelines, which is a completely different purpose (ref: Azure Data Factory).