Synapse Lake database view not available in SQL Pool? - apache-spark-sql

Currently exploring using Spark notebooks in Synapse for data transformation instead of data flows but the lake db capabilities are a little confusing.
I created a lake db, an external table (catalog?) and a view using a notebook in Synapse Workspace. The view is visible in the Synapse UI and I can query it.
But the view is not available when connecting via the SQL pool using SQL Server Management Studio or Azure Data Studio, for example. Is only table metadata shared, or am I missing something? I am having trouble finding documentation on this.

But the view is not available when connecting via the SQL pool using SQL Server Management Studio or Azure Data Studio, for example. Is this intended, or am I missing something?
The serverless SQL pool and the Spark pool share a catalog, but the dedicated SQL pool has its own.
Spark temporary views are session-scoped and global temporary views are application-scoped, so neither is stored in the shared catalog; the metadata sync exposes tables, not Spark views. That is why you don't see the view.
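One possible workaround, sketched here under assumptions, is to materialize the view's result as a Parquet-backed table in the lake database, since tables are what the shared catalog exposes to the serverless SQL pool. The helper name and the database, view, and table names are hypothetical; in a Synapse notebook the generated statement would be passed to spark.sql(...).

```python
def materialize_view_sql(database: str, view: str, table: str) -> str:
    """Build a Spark SQL CTAS statement that copies a view into a Parquet table.

    All names are hypothetical placeholders for illustration.
    """
    def quote(name: str) -> str:
        # Spark SQL quotes identifiers with backticks; a literal backtick is doubled.
        return "`" + name.replace("`", "``") + "`"

    return (
        f"CREATE TABLE {quote(database)}.{quote(table)} USING PARQUET "
        f"AS SELECT * FROM {quote(database)}.{quote(view)}"
    )


statement = materialize_view_sql("lakedb", "v_sales", "sales_snapshot")
print(statement)
# In a Synapse Spark notebook (assumed to provide a `spark` session):
# spark.sql(statement)
```

The table is a snapshot, so it has to be rebuilt whenever the underlying data changes; that is the trade-off against a real view.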

Related

Lake Database converts into a SQL Database in Azure Synapse Analytics

Forgive me, I am a newbie here and cannot post images yet.
Lately I have been having a few issues with a Lake Database that was created in Azure Synapse Analytics using Azure Synapse Link for Dataverse from PowerApps.
Dynamics 365 developers have been adding new columns to the Dataverse database, and those columns are not showing up or working when I execute queries in SSMS or Synapse Studio.
Therefore I unlinked the Synapse Link in PowerApps and relinked it with some tables.
When I unlinked, the container and the Lake database were deleted correctly, but the same database now appears in the SQL databases section in Azure Synapse Studio. I tried to delete it, but I get the error "Operation DROP DATABASE is not allowed for a replicated database".
I have created the Lake database again using Synapse Link from PowerApps, but it seems the table metadata is not updating.
Can anyone help me with the above issues, please?
an error "Operation DROP DATABASE is not allowed for a replicated database".
This error is returned when you try to create objects in a database that is shared with a Spark pool. Databases replicated from Apache Spark pools are read-only, so you can't create new objects in a replicated database by using T-SQL.
Instead, create a separate database and reference the synchronized tables by using three-part names and cross-database queries.
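As a rough sketch of that suggestion, the snippet below generates the T-SQL for a view in a separate database that reads a synchronized table through its three-part name. The database, view, and table names are hypothetical, and the dbo schema is an assumption about where the synchronized table is exposed.

```python
def quote_name(identifier: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any closing bracket."""
    return "[" + identifier.replace("]", "]]") + "]"


def cross_database_view_sql(view: str, lake_db: str, table: str, schema: str = "dbo") -> str:
    """Build a CREATE VIEW statement that reads a synchronized table
    through its three-part name (database.schema.table)."""
    three_part = ".".join(quote_name(part) for part in (lake_db, schema, table))
    return f"CREATE VIEW {quote_name('dbo')}.{quote_name(view)} AS SELECT * FROM {three_part};"


# Hypothetical names: run the first statement, switch to the new database,
# then run the second one there.
print("CREATE DATABASE [reporting];")
print(cross_database_view_sql("v_account", "dataverse_lake", "account"))
```

Custom objects then live in the separate database, while the replicated database stays untouched.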
Dynamics365 developers were adding new columns to Dataverse database
and they are not displaying or working when executing queries in SSMS
or Synapse Studio.
This issue could be because of many reasons. Without knowing the actual reason, we can't troubleshoot this issue.
According to Microsoft's official documentation on known limitations and issues with Azure Synapse Link for SQL, a column whose data type isn't supported by Azure Synapse Analytics won't be replicated.

Azure Synapse Lake Database Not Appearing in Built-in Serverless Pool List

I have created a new Azure Lake Database using the following procedure
The Lake Database is named TestLakeDB.
However, when I check the list of databases available under "Use database", TestLakeDB doesn't appear.
Any thoughts?
Thanks for the valuable discussion. Posting your conversation as an answer to help other community members who face similar issues.
When we create a Lake database after connecting to GitHub, it doesn't show up under "Use database" because it was created in GitHub mode.
To make the Lake Database show up, create the database in Synapse live mode and then connect to GitHub. Now our database named Lake_Database1, which was created in Synapse live mode, appears under "Use database".

Where is data physically stored in Azure Synapse Dedicated SQL Pool?

Documentation from Microsoft and others strongly emphasizes the separation between storage and compute in Azure Synapse Analytics.
In the case of a Serverless SQL pool, it is clearly explained that the data is stored in Azure Data Lake Storage Gen2.
However, in the case of a Dedicated SQL Pool, the documentation is not explicit enough on data storage.
In a book that deals with Azure Synapse, it is stated that in the case of Dedicated SQL Pool, data is stored in Storage Nodes which are completely separate from Compute Nodes.
Since this claim is not in Microsoft's documentation, I dare not trust it.
So, is there an official resource that sheds light on this question?
This is a question that has been on my mind for a long time as well. However, I have come to the conclusion that data is actually stored in Dedicated SQL Pools.
Let me explain why I believe this.
Take a look at the documentation given here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/quickstart-copy-activity-load-sql-pool
Notice that it is about loading data into a Dedicated SQL Pool. Further, to quote part of the documentation,
A dedicated SQL pool offers T-SQL based compute and storage
capabilities. After creating a dedicated SQL pool in your Synapse
workspace, data can be loaded, modeled, processed, and delivered for
faster analytic insight.
It is said that Dedicated SQL Pools provide both compute and storage capabilities.
Furthermore, with Dedicated SQL Pools, you may already know that it is possible to create traditional tables. We can organize these tables into something along the lines of a star or snowflake schema to model our data warehouses.
Creation of such tables, however, is not possible with Serverless SQL Pools. Only the creation of metadata objects, i.e. views or external tables are allowed. This is explained here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/on-demand-workspace-overview
To quote the relevant passage of the article,
Serverless SQL pool has no local storage, only metadata objects are
stored in databases. Therefore, T-SQL related to the following
concepts isn't supported:
Tables
Triggers
Materialized views
DDL statements other than ones related to views and security
DML statements
To me, the fact that tables can actually be created in Dedicated SQL Pools is further proof that the data is physically stored in them.
My final argument is around the idea of distributions. The concept is explained here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/massively-parallel-processing-mpp-architecture
This talks about how data is divided up among the compute nodes and how queries are executed in parallel on the distributions in these nodes. It would not be possible to implement this if the data was not actually stored in these nodes.
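To make the idea of distributions concrete, here is a small illustration of hash distribution: each row's distribution-column value is hashed and mapped to one of a fixed number of distributions. The count of 60 comes from the MPP architecture documentation; SHA-256 is only a stand-in, since the hash function the service uses is internal, and the customer keys are made up.

```python
import hashlib
from collections import Counter

DISTRIBUTION_COUNT = 60  # a dedicated SQL pool spreads each table over 60 distributions


def assign_distribution(key: str) -> int:
    """Map a distribution-column value to a distribution number.

    SHA-256 is an illustrative stand-in for the service's internal hash.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTION_COUNT


# Rows with the same key always land in the same distribution.
customer_ids = [f"customer-{i}" for i in range(10_000)]
placement = Counter(assign_distribution(c) for c in customer_ids)
print(len(placement), "distributions used")
print("rows per distribution: min", min(placement.values()), "max", max(placement.values()))
```

Because the mapping is deterministic, a query that filters or joins on the distribution column can be split into per-distribution pieces that run in parallel.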
In my humble opinion, how I believe Azure Storage comes into the picture (at least, when it comes to Dedicated SQL Pools) is with regards to storing data as files in a data lake and then ingesting them into the pool for analysis.
An explanation can be found here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/overview-architecture
Yet another quote,
Serverless SQL pool allows you to query your data lake files, while
dedicated SQL pool allows you to query and ingest data from your data
lake files. When data is ingested into dedicated SQL pool, the data is
sharded into distributions to optimize the performance of the system.
This is where PolyBase comes into play. You can define various data loading patterns (into Dedicated SQL Pools) using PolyBase, as explained here,
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/load-data-overview
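One such pattern, sketched here under assumptions, is to define an external table over the lake files and then ingest it with CREATE TABLE AS SELECT (CTAS). The helper below only builds the statement; the table and column names are hypothetical, and the external table is assumed to exist already.

```python
def ctas_load_sql(target: str, external_source: str, hash_column: str) -> str:
    """Build a CTAS statement that ingests an external table into a
    hash-distributed table in a dedicated SQL pool.

    Names are inserted as-is, so they are assumed to be trusted identifiers.
    """
    return (
        f"CREATE TABLE {target}\n"
        f"WITH (DISTRIBUTION = HASH({hash_column}), CLUSTERED COLUMNSTORE INDEX)\n"
        f"AS SELECT * FROM {external_source};"
    )


# Hypothetical names: ext.SalesStaging is an external table over lake files.
print(ctas_load_sql("dbo.FactSales", "ext.SalesStaging", "CustomerKey"))
```

After the statement runs, the rows live in the pool's own table and no longer depend on the source files.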
The Microsoft documentation on Design tables using dedicated SQL pool in Azure Synapse Analytics, found at https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview, states the following:
Table persistence: Tables store data either permanently in Azure
Storage, temporarily in Azure Storage, or in a data store external to
dedicated SQL pool.
Regular table: A regular table stores data in Azure Storage as part of
dedicated SQL pool...

azure synapse - serverless sql pool databases not visible in data tab

I cannot see the serverless SQL pool databases or tables on the Data tab in Synapse.
Dedicated SQL pool and Spark pool databases are visible.
Is this by design, or am I missing something?
Yes, it's by design.
Under Data Tab you can browse Dedicated SQL pools and Spark pools.

Is it possible to export data from MS Azure SQL directly Into the Azure Table Storage?

Is there any direct way within the Azure MSSQL ecosystem to export SQL returned data set into the Azure table storage?
Something like BCP but with the Table Storage connection string on the -Output end?
There is a service named Azure Data Factory which can copy data directly from Azure SQL Database to Azure Table Storage, as well as between other supported data stores. See the section Supported data stores of the article "Data movement and the Copy Activity: migrating data to the cloud and between cloud stores" for the full list. Note that it is a web-based service, not a command-line tool like BCP.
You can refer to the tutorial Build your first Azure data factory using Azure Portal/Data Factory Editor to learn how to use it.
For reference, the articles Move data to and from Azure SQL Database using Azure Data Factory and Move data to and from Azure Table using Azure Data Factory explain how it works.
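For orientation, a Copy activity for this scenario might look roughly like the definition below, written as a Python dictionary and printed as JSON. The property names follow the current Azure SQL Database and Azure Table connector documentation as I understand it, so verify them against the connector reference; the activity name, dataset names, query, and key columns are hypothetical.

```python
import json

# Hypothetical Copy activity: Azure SQL Database source, Azure Table Storage sink.
copy_activity = {
    "name": "CopySqlToTable",
    "type": "Copy",
    "inputs": [{"referenceName": "AzureSqlDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "AzureTableDataset", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {
            "type": "AzureSqlSource",
            # Only the rows returned by this query are copied.
            "sqlReaderQuery": "SELECT Id, Region, Amount FROM dbo.Orders",
        },
        "sink": {
            "type": "AzureTableSink",
            # Source columns used as the table entity's partition and row keys.
            "azureTablePartitionKeyName": "Region",
            "azureTableRowKeyName": "Id",
            "azureTableInsertType": "merge",
        },
    },
}

print(json.dumps(copy_activity, indent=2))
```

The two datasets and their linked services still have to be defined separately in the data factory.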