Azure Synapse Spark pool creating Delta table at Azure storage account

Are there any examples on the web of creating a Delta table with the SQL API from a Synapse Spark pool, e.g.
CREATE TABLE default.testing (col1 ...) USING DELTA LOCATION 'abfss:....'
I see an authentication error during the creation.
My questions are:
When using the SQL API to create a Delta table, which credential is actually used?
Can a linked service be used to create the Delta table?
How can I permanently mount a storage account container?
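A minimal sketch of the kind of statement being attempted, run from a Synapse Spark pool notebook; the column list, container, account, and path below are illustrative placeholders, not values from the question:
-- Spark SQL: create a Delta table whose files live in an ADLS Gen2 container
CREATE TABLE default.testing (
    col1 INT,
    col2 STRING
)
USING DELTA
LOCATION 'abfss://<container>@<account>.dfs.core.windows.net/delta/testing';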

Add a column to a delta table in Azure Synapse

I have a delta table that I created in Azure Synapse using a mapping data flow. The data flow reads append-only changes from Dataverse, finds the latest value, and upserts them to the table.
Now, I'd like to add a column to the Delta table. When you select Upsert in a mapping data flow, the Merge schema option is disabled, so it doesn't appear I can use that.
I tried creating a notebook and executing the following SQL, but I get an error.
ALTER TABLE delta.`https://xxxx.dfs.core.windows.net/path/to/table` ADD COLUMNS (mytest STRING)
Error: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: null path
The path provided is not in the default Synapse container.
How can I alter the table and add the column?
The issue I was running into was that the default Synapse container wasn't present in the storage account. After creating the container, the command executed successfully.
To find the name of that container:
Open the Azure Portal
Navigate to the Synapse workspace
Click Properties
Note the value of Primary ADLS Gen2 file system
This helped me track down the issue. https://learn.microsoft.com/en-us/answers/questions/706694/unable-to-run-sql-queries-in-azure-synapse-error-o.html
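Once that container exists, the ALTER succeeds; as a minimal sketch of the same statement using the abfss form of the path that the Spark ABFS driver expects (the account, container, and path here are placeholders, not the asker's actual values):
ALTER TABLE delta.`abfss://<container>@<account>.dfs.core.windows.net/path/to/table`
ADD COLUMNS (mytest STRING);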

How to create a blank "Delta" Lake table schema in Azure Data Lake Gen2 using Azure Synapse Serverless SQL Pool?

I have a file with data integrated from two different sources using an Azure mapping data flow and loaded into an ADLS Gen2 data lake container/folder, for example /staging/EDW/Current/products.parquet.
I now need to process this staging file with a mapping data flow and load it into its corresponding dimension table using the SCD Type 2 method to maintain history.
However, I want to try creating and processing this dimension table as a "Delta" table in Azure Data Lake using a mapping data flow only. SCD Type 2 requires a source lookup to check whether any records/rows already exist: if not, insert them all; if records have changed, do updates; and so on (say, during the first load).
For that, I first need to create a default/blank "Delta" table in an Azure Data Lake folder, for example /curated/Delta/Dimension/Products/, just as we would in Azure SQL DW (dedicated pool), where we could first create a blank dbo.dim_products table with just the schema/structure and no rows.
I am trying to implement a data lakehouse architecture by utilizing and evaluating the best features of both Delta Lake and the Azure Synapse serverless SQL pool with mapping data flows - for performance, cost savings, ease of development (low code), and understanding. At the same time, I want to avoid a logical data warehouse (LDW) kind of architecture for now.
For this, I tried creating a new database under the built-in Azure Synapse serverless SQL pool and defining a data source, file format, and a blank Delta table/schema structure (without any rows), but with no luck:
create database delta_dwh;

create external data source deltalakestorage
with (location = 'https://aaaaaaaa.dfs.core.windows.net/curated/Delta/');

create external file format deltalakeformat
with (format_type = delta);

drop external table products;

create external table dbo.products
(
    product_skey int,
    product_id int,
    product_name nvarchar(max),
    product_category nvarchar(max),
    product_price decimal(38,18),
    valid_from date,
    valid_to date,
    is_active char(1)
)
with
(
    location = 'https://aaaaaaaa.dfs.core.windows.net/curated/Delta/Dimensions/Products',
    data_source = deltalakestorage,
    file_format = deltalakeformat
);
However, this fails since a Delta table/file requires a _delta_log/*.json folder/file to be present, which maintains the transaction log. That means I would first have to write a few (dummy) rows in Delta format to the target folder, and only then could I read it and run queries like the following for the SCD Type 2 implementation:
SELECT ISNULL(MAX(product_skey), 0)
FROM OPENROWSET(
    BULK 'https://aaaaaaaa.dfs.core.windows.net/curated/Delta/Dimensions/Products/',
    FORMAT = 'DELTA'
) AS rows;
Any thoughts, inputs, or suggestions?
Thanks!
You may try creating an initial/dummy data flow and pipeline to create these empty Delta files.
It's only a simple workaround:
Create a CSV with your sample table data.
Create a data flow named initDelta.
Use this CSV as the source in the data flow.
In the projection panel, set up the correct data types.
Add a filter after the source and set up a dummy condition such as 1=2.
Add a sink with Delta output.
Put your initDelta data flow into a dummy pipeline and run it.
The folder structure for Delta should be created.
You mentioned that your initial data is in a Parquet file. You can use that file instead: the table schema (columns and data types) will be imported from the file. Filter out all rows and save the result as Delta.
I think this should work, unless I've missed something in your problem.
I don't think you can use a serverless SQL pool to create a Delta table yet. I think it is coming soon though.
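A hedged alternative sketch, not taken from the answers above: a Synapse Spark pool notebook can create the empty Delta table (including its _delta_log folder) directly, and serverless SQL can then query it. The column list mirrors the question's DDL, and the abfss path is an assumed mapping of the https location above (treating curated as the container name):
-- Spark SQL, run in a Synapse Spark pool notebook
CREATE TABLE IF NOT EXISTS dim_products (
    product_skey     INT,
    product_id       INT,
    product_name     STRING,
    product_category STRING,
    product_price    DECIMAL(38,18),
    valid_from       DATE,
    valid_to         DATE,
    is_active        STRING   -- CHAR(1) in the original DDL
)
USING DELTA
LOCATION 'abfss://curated@aaaaaaaa.dfs.core.windows.net/Delta/Dimensions/Products';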

Azure SQL External table of Azure Table storage data

Is it possible to create an external table in Azure SQL over data residing in Azure Table storage?
The answer is no.
I am currently facing a similar issue and this is my research so far:
Azure SQL Database doesn't allow Azure Table storage as an external data source.
Sources:
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
Reason:
The possible data source scenarios are copying from Hadoop (Data Lake/Hive, ...), Blob (text files, CSV), or an RDBMS (another SQL Server). Azure Table storage is not listed.
The possible external file formats are only variations of text files/Hadoop: delimited text, Hive RCFile, Hive ORC, Parquet.
Note: even copying from Blob in JSON format requires implementing a custom format.
Workaround:
Create a copy pipeline with Azure Data Factory.
Create a copy function/script with Azure Functions using C# and manually transfer the data.
Yes, there are a couple of options. Please see the following:
CREATE EXTERNAL TABLE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016), Azure SQL Database, Azure SQL Data Warehouse, Parallel Data Warehouse
Creates an external table for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external table created for PolyBase cannot be used for Elastic Database queries. Similarly, an external table created for Elastic Database queries cannot be used for PolyBase, etc.
CREATE EXTERNAL DATA SOURCE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016), Azure SQL Database, Azure SQL Data Warehouse, Parallel Data Warehouse
Creates an external data source for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external data source created for PolyBase cannot be used for Elastic Database queries. Similarly, an external data source created for Elastic Database queries cannot be used for PolyBase, etc.
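As a hedged illustration of that distinction (the data source names, server, database, and credentials below are made-up placeholders, not from the question): a PolyBase external data source points at Hadoop or Blob storage, while an Elastic Database query data source points at another Azure SQL database.
-- PolyBase scenario (SQL Server 2016+ with PolyBase, Azure SQL Data Warehouse)
CREATE EXTERNAL DATA SOURCE HadoopBlobSource
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://<container>@<account>.blob.core.windows.net',
    CREDENTIAL = BlobCredential
);

-- Elastic Database query scenario (Azure SQL Database)
CREATE EXTERNAL DATA SOURCE RemoteSqlDbSource
WITH (
    TYPE = RDBMS,
    LOCATION = '<server>.database.windows.net',
    DATABASE_NAME = 'RemoteDb',
    CREDENTIAL = RemoteDbCredential
);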
What is your use case?

Can't CREATE EXTERNAL DATA SOURCE in SQL

I'm trying to create an external data source to access Azure Blob Storage. However, I'm having issues with creating the actual data source.
I've followed the instructions located here: Examples of bulk access to data in Azure Blob Storage and Create external data source - Transact-SQL. I'm using SQL Server 2016 on a VM, accessed via SSMS on a client machine using Windows Authentication, with no issues. The instructions say creating this external data source works for SQL Server 2016 and Azure Blob Storage.
I have created the Master Key:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = <password>
and, the database scoped credential
CREATE DATABASE SCOPED CREDENTIAL UploadCountries
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = <key>;
I have verified both of these exist in the database by querying sys.symmetric_keys and sys.database_scoped_credentials.
However, when I try executing the following code, it says "Incorrect syntax near 'EXTERNAL'":
CREATE EXTERNAL DATA SOURCE BlobCountries
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://<somewhere>.table.core.windows.net/<somewhere>',
    CREDENTIAL = UploadCountries
);
Your thoughts and help are appreciated!
Steve.
In “Examples of Bulk Access to Data in Azure Blob Storage”, we can find:
Bulk access to Azure blob storage from SQL Server, requires at least SQL Server 2017 CTP 1.1.
And in the Arguments section of “CREATE EXTERNAL DATA SOURCE (Transact-SQL)”, we can find similar information:
Use BLOB_STORAGE when performing bulk operations using BULK INSERT or OPENROWSET with SQL Server 2017
You are using SQL Server 2016, so you get the Incorrect syntax near 'EXTERNAL' error when you create an external data source for Azure Blob storage.
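As a hedged follow-up sketch of what a BLOB_STORAGE data source is intended for once you are on SQL Server 2017 or later (the target table and file path are illustrative placeholders, not from the question):
-- Bulk load a CSV from the container behind the BlobCountries data source
BULK INSERT dbo.Countries
FROM 'data/countries.csv'
WITH (
    DATA_SOURCE = 'BlobCountries',
    FORMAT = 'CSV',
    FIRSTROW = 2
);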

External table in Blob Storage in Azure SQL(Not Azure SQL DW)

Here is my script which I am trying to run in Azure SQL Database:
CREATE DATABASE SCOPED CREDENTIAL some_cred
WITH IDENTITY = user1,
SECRET = '<Key of Blob Storage container>';

CREATE EXTERNAL DATA SOURCE TEST
WITH
(
    TYPE = BLOB_STORAGE,
    LOCATION = 'wasbs://<containername>#accountname.blob.core.windows.net',
    CREDENTIAL = <somecred>
);

CREATE EXTERNAL TABLE dbo.test
(
    val VARCHAR(255)
)
WITH
(
    DATA_SOURCE = TEST
)
I am getting the following error:
External tables are not supported with the provided data source type.
My goal is to create an external table in Blob storage so that a Hive query in HDInsight references the same blob. The table needs to be managed through Azure SQL. What's wrong with this script?
Azure SQL Database does have the ability to load files stored in Blob storage, but only via the BULK INSERT and OPENROWSET language features. See here for more information.
BULK INSERT dbo.test
FROM 'data/yourFile.txt'
WITH ( DATA_SOURCE = 'YourAzureBlobStorageAccount');
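A hedged companion sketch of the OPENROWSET variant mentioned above, reading the same file through the BLOB_STORAGE data source (the file and data source names are the illustrative ones from the example):
SELECT BulkColumn
FROM OPENROWSET(
    BULK 'data/yourFile.txt',
    DATA_SOURCE = 'YourAzureBlobStorageAccount',
    SINGLE_CLOB
) AS fileContents;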
The way you have scripted it is more like an external table using PolyBase, which is only available in SQL Server 2016 and Azure SQL Data Warehouse at this time.
I'm thinking external tables in Azure SQL Database can only be used for cross-database querying (elastic queries), so they can't use an external data source of type BLOB_STORAGE.