Azure Synapse Dedicated SQL Pool Connector for Apache Spark - azure-synapse

I am trying to write data into an Azure Synapse dedicated SQL pool using Python from a Spark pool.
https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/synapse-spark-sql-pool-import-export?tabs=scala%2Cscala1%2Cscala2%2Cscala3%2Cscala4%2Cscala5
It seems Scala has an option to register a callback and receive error message details. I don't see that option in Python. Are there any alternatives?
In JDBC we have the option to set "schemaCheckEnabled = false" to skip the nullability check against the target table. How can we enable the same behaviour in the Synapse pool connector?
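The callback handle isn't exposed in the Python API, but since the connector surfaces write failures as Python exceptions (via Py4J), one workaround is to wrap the write yourself and capture the message. A minimal sketch; the notebook usage in the comment assumes the documented `synapsesql` write path, and the server and table names are placeholders:

```python
def write_with_feedback(write_fn):
    """Run a connector write and report success plus any error message.

    A Python stand-in for the Scala connector's callback handle: wrap
    the write in try/except and capture the exception text.
    """
    try:
        write_fn()
        return True, None
    except Exception as exc:  # Py4J surfaces connector failures as exceptions
        return False, str(exc)


# Hypothetical usage inside a Synapse notebook (df, Constants, and the
# server/table names are placeholders, not verified values):
#
#   ok, error = write_with_feedback(lambda: (
#       df.write
#         .option(Constants.SERVER, "myworkspace.sql.azuresynapse.net")
#         .mode("overwrite")
#         .synapsesql("mydb.dbo.mytable")))
#   if not ok:
#       print("write failed:", error)
```

This doesn't give you the connector's structured feedback object, but it does recover the error text that the Scala callback would have delivered.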

Related

Connecting Synapse Workspace to existing Dedicated SQL Pool

We have a development environment Synapse Workspace connected to a development environment Dedicated SQL Pool. We want another development environment Synapse Workspace but we want to connect it to the existing development environment dedicated SQL Pool. So basically, both Synapse workspaces should connect to the same dedicated SQL Pool.
Apparently, I was told this is not possible. Is there any way or a workaround? We want users to be able to query the data using the Synapse workspace in the dedicated SQL pool.
That is not possible. What you can do is create another pool in the existing workspace so you have two development environments. If this is about letting users query the data, your best bet would be copying the data from one pool to another using the built-in copy task in Synapse:
Built-in copy task in Synapse
You can also schedule the copy to run on a recurring basis.

Can we use spark pool to process data from dedicated SQL pool and is that a good architecture?

Is it a good design to use a Spark pool for processing data that lands in the dedicated SQL pool, writing the results back to the dedicated SQL pool and to ADLS?
As of now we are doing everything with the dedicated SQL pool, so if we add a Spark pool, will it be more efficient, or will it just be a burden on the existing dedicated SQL pool?
Yes, you can use a Spark pool to process data from the dedicated SQL pool, and it is a good architecture: it is recommended and directly supported by Microsoft.
The Synapse Dedicated SQL Pool Connector is an API that efficiently
moves data between Apache Spark runtime and Dedicated SQL pool in
Azure Synapse Analytics. This connector is available in Scala.
If your project requires large-scale processing or streaming, you can definitely go for Apache Spark. It won't be a burden on the existing architecture, and you should get the expected results.
Refer: Azure Synapse Dedicated SQL Pool connector for Apache Spark
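The connector addresses dedicated-pool tables by a three-part `<database>.<schema>.<table>` name. A small helper to build and sanity-check that identifier; a sketch only, where the usage comment assumes the connector's `synapsesql` read/write methods and the table names are placeholders:

```python
def three_part_name(database, schema, table):
    """Build the <database>.<schema>.<table> identifier that the
    connector's synapsesql() method expects, rejecting empty parts
    or parts that already contain a dot."""
    parts = (database, schema, table)
    for part in parts:
        if not part or "." in part:
            raise ValueError(f"invalid identifier part: {part!r}")
    return ".".join(parts)


# Hypothetical usage in a Synapse notebook (the spark session is
# provided by the notebook; names are placeholders):
#
#   df = spark.read.synapsesql(three_part_name("salesdb", "dbo", "sales"))
#   ... transform df with Spark ...
#   (df.write.mode("overwrite")
#      .synapsesql(three_part_name("salesdb", "dbo", "sales_clean")))
```

Validating the name up front gives a clearer error than letting a malformed identifier fail inside the connector.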

Connect to Azure Synapse Spark Pool from outside

Is there a possibility to connect to the Azure Synapse Spark pool from "outside", meaning with RStudio or Spark SQL?
Unfortunately, R support isn't available for Synapse yet, so connecting it to RStudio isn't possible.
Spark pools in Azure Synapse include the following components that are
available on the pools by default. Spark Core. Includes Spark Core,
Spark SQL, GraphX, and MLlib.
As per the above statement from the official documentation, Spark SQL is already available by default in Azure Synapse, so there is no need to connect from outside.
Apart from this, you can consider #wBob's inputs shared in the comment section, based on your requirement.

Can't Access Azure Synapse Spark Pool Databases on SSMS

Since I started using Azure Synapse Analytics, I created a Spark pool cluster, then on the Spark pool cluster I created databases and tables using PySpark on top of Parquet files in Azure Data Lake Storage Gen2.
I used to be able to access my Spark databases/Parquet tables through SSMS using the serverless SQL endpoint, but now I can no longer see my Spark databases through the serverless SQL endpoint in SSMS. My Spark databases are still accessible through Azure Data Studio, but not through SSMS. Nothing has been deployed or altered on my side. Can you help resolve the issue? I would like to be able to access my Spark databases through SSMS.
Sql Serverless Endpoint
Azure Synapse Database
If your Spark DB is built on top of Parquet files, as you said, databases should sync to external tables in Serverless SQL pool just fine and you should be able to see synced SQL external tables in SSMS as well. Check this link for more info about metadata synchronization.
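For reference, SSMS and other clients reach the synced tables through the workspace's serverless endpoint, which follows the `<workspace>-ondemand.sql.azuresynapse.net` pattern. A small sketch that builds an ODBC connection string for it; the driver name and authentication keyword are assumptions, so adjust them to match your client setup:

```python
def serverless_odbc_string(workspace, database="default"):
    """ODBC connection string for a Synapse serverless SQL endpoint,
    using Azure AD interactive auth (an assumption; swap in whatever
    auth method your environment uses)."""
    server = f"{workspace}-ondemand.sql.azuresynapse.net"
    return (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server={server};"
        f"Database={database};"
        "Authentication=ActiveDirectoryInteractive;"
    )
```

If a query tool can connect with a string like this but SSMS still shows no Spark databases, that points at the SSMS connection rather than the metadata sync itself.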
If everything mentioned above checks out, then I'd suggest you navigate to Help + Support in the Azure Portal and fill in a support ticket request with the details of your problem, so the engineering team can take a look and see whether there is an issue with your workspace.

Azure Synapse AI using dedicated SQL server VM

Is it possible to add dedicated SQL server VM to azure synapse sql pool for machine learning purposes?
No. You can only create a new dedicated SQL pool.
Moreover, the official guidance is to use the Azure Machine Learning (AML) service for ML-related experiments, both training and inference. Here is a guide: https://learn.microsoft.com/en-us/azure/synapse-analytics/machine-learning/tutorial-sql-pool-model-scoring-wizard
If you want to use your dedicated SQL Server VM, it can be registered under Integration -> Integration runtimes -> Self-hosted. But that serves data flows/pipelines, so it's a totally different objective (see Azure Data Factory).