Azure Blob Cannot Bulk Load - sql

CREATE PROCEDURE LoadData
AS
BEGIN
    DELETE FROM [dbo].[File];

    BULK INSERT [dbo].[File]
    FROM 'File.csv'
    WITH (
        DATA_SOURCE = 'AzureBlob',
        FORMAT = 'CSV',
        FIRSTROW = 2
    );
END
---------------------
CREATE EXTERNAL DATA SOURCE AzureBlob
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://marczakiocsvstorage.blob.core.windows.net/input',
CREDENTIAL = BlobCredential
);
-----------------------------
CREATE DATABASE SCOPED CREDENTIAL BlobCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=SAS_TOKEN_HERE';
Following this guide (https://marczak.io/posts/azure-loading-csv-to-sql/), I am attempting to load data from an Azure Blob into an Azure SQL table.
After creating the external data source and running the stored procedure I am getting the following error:
"Cannot bulk load because the file "File.csv" could not be opened. Operating system error code 5(Access is denied.)."
I made sure to double-check my SAS token and to exclude the question mark when creating the credential. I also double-checked the Container URL. All seems okay. What could I be missing here that is preventing the blob from being read?

Please make sure the "input" container exists inside the blob storage account.
Please verify the Shared Access Signature start and expiry date and time, check its time zone, and verify that "Allowed IP addresses" is blank.
Try using OPENROWSET instead.
SELECT * INTO TempFile FROM OPENROWSET(BULK 'input/File.csv', DATA_SOURCE = 'AzureBlob', SINGLE_CLOB) AS DataFile;

It appears that you are using SQL Authentication, and Azure SQL is not allowed to access the blob storage for the bulk load.
If your Azure Blob storage account is not public, you need to generate a shared access signature (SAS) key for the account by using the Azure portal. Put the SAS key in the CREDENTIAL, and create an EXTERNAL DATA SOURCE with that CREDENTIAL, as shown in the following example:
CREATE DATABASE SCOPED CREDENTIAL MyAzureBlobStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2015-12-11&ss=b&srt=sco&sp=rwac&se=2017-02-01T00:55:34Z&st=2016-12-29T16:55:34Z&spr=https&sig=copyFromAzurePortal';
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH ( TYPE = BLOB_STORAGE,
LOCATION = 'https://myazureblobstorage.blob.core.windows.net',
CREDENTIAL= MyAzureBlobStorageCredential);
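To then consume this data source from the question's stored procedure, the BULK INSERT simply references it by name. A minimal sketch, reusing the table and file from the question; note that because this example's LOCATION stops at the storage account, the container name ('input' in the original question) goes into the file path:
BULK INSERT [dbo].[File]
FROM 'input/File.csv'  -- container included because LOCATION does not name one
WITH (
    DATA_SOURCE = 'MyAzureBlobStorage',
    FORMAT = 'CSV',
    FIRSTROW = 2  -- skip the header row
);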
For more details, you could refer to this article.

Related

Databricks online store - Login to Azure SQL Database with Service Principal

I want to use Databricks Online Store with Azure SQL Database, however I am unable to authenticate through the Databricks Feature Store API. I need to use Service Principal credentials.
I tried using the Application ID as the username and the Secret as the password (com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user '[REDACTED]'. ClientConnectionId:some-id-x-x-), but no luck.
I also tried to generate an AAD access token and use it as the password, however I am getting "password exceeds maximum length of 128 characters"...
When I use the same credentials to test it via JayDeBeApi everything works...
Code I am using:
from databricks.feature_store.online_store_spec import AzureSqlServerSpec
from databricks.feature_store import FeatureStoreClient

# Service principal credentials: application (client) ID as the username,
# client secret as the password
username = "application-id"
password = "application-secret"
tenantId = "tenant-id"
server_name = "server-name.database.windows.net"
port = "1433"
db_name = "database-name"

fs = FeatureStoreClient()

online_store = AzureSqlServerSpec(
    hostname=server_name,
    port=1433,
    database_name=db_name,
    user=username,
    password=password,
    table_name="TableName",
)

fs.publish_table(
    name='feature_store.TableName',
    online_store=online_store,
    mode='merge'
)

Access SQL DB Managed Identity in Data Factory using Key Vault

I'm trying to connect to Azure SQL DB using AD Authentication (Managed Identity) in Data Factory by saving the connection string in Azure Key Vault. I've set up the Managed Identity access in Azure SQL DB by granting access to the ADF (by its name). I've stored the connection string in Key Vault in the following formats, but I was not successful.
Tried following formats of connection strings:
Server=tcp:xxxxxxxxxx.database.windows.net;Initial Catalog=xxxxxxx;Authentication = 'Active Directory Interactive';
Server=tcp:xxxxxxxxxxxx.database.windows.net;Initial Catalog=xxxxxxxxxxx;User ID=DatafactoryName;Authentication = 'Active Directory Interactive'; -- Actual DatafactoryName
Server=tcp:xxxxxxxxxxxxxx.windows.net;Initial Catalog=xxxxxxxxx;User ID=MSI_ID;Authentication = 'Active Directory Interactive'; -- Actual MSI ID for the DataFactory
Server=tcp:xxxxxxxxxxxxxx.windows.net;Initial Catalog=xxxxxxxxx;User ID=a;Authentication = 'Active Directory Interactive'; -- Tried arbitrary value
I'm getting the following error
The connection string should be:
Data Source=tcp:<servername>.database.windows.net,1433;Initial Catalog=<databasename>;Connection Timeout=30
Ref: Managed identities for Azure resources authentication and Reference secret stored in key vault
You can try
Integrated Security=False;Encrypt=True;Connection Timeout=30;Data Source=xxxxxxxxxx.database.windows.net;Initial Catalog=xxxxxxx
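As a side note, the "Managed Identity access in Azure SQL DB" mentioned in the question is normally granted by creating a contained database user for the data factory and adding it to the required roles. A minimal sketch, assuming the data factory is named MyDataFactory (hypothetical name):
-- Run in the target database while connected as an Azure AD admin
CREATE USER [MyDataFactory] FROM EXTERNAL PROVIDER;  -- hypothetical data factory name
ALTER ROLE db_datareader ADD MEMBER [MyDataFactory];
ALTER ROLE db_datawriter ADD MEMBER [MyDataFactory];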

Is there a way BULK INSERT from local Azure Blob Storage?

TL;DR I am trying to point SQL to BULK INSERT from Local Azure Blob Storage
The problem:
Hi all,
I'm trying to connect my local SQL Server database instance to the blob storage emulator as an external connection; however, I'm getting a "Bad or inaccessible location specified" error. Here are the steps I'm taking:
I have created the following MasterDatabaseKey and CREDENTIALS as follows:
IF EXISTS (SELECT * FROM sys.symmetric_keys WHERE name = '##MS_DatabaseMasterKey##')
DROP MASTER KEY;
--Create Master Key
CREATE MASTER KEY
ENCRYPTION BY PASSWORD='MyStrongPassword';
and database credentials:
-- DROP DB Credentials If Exist
IF EXISTS (SELECT * FROM sys.database_scoped_credentials WHERE name = 'credentials')
    DROP DATABASE SCOPED CREDENTIAL credentials;
--Create scoped credentials to connect to Blob
CREATE DATABASE SCOPED CREDENTIAL credentials
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET =
'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=='; --local storage key
GO
then I created the following External Data Source:
CREATE EXTERNAL DATA SOURCE external_source
WITH
(
TYPE = BLOB_STORAGE,
LOCATION = 'http://127.0.0.1:10000/devstoreaccount1/container/some_folder/',
CREDENTIAL = credentials
)
But when I run the BULK INSERT command:
BULK INSERT [dbo].[StagingTable] FROM 'some_file_on_blob_storage.csv' WITH (DATA_SOURCE = 'external_source', FIRSTROW = 1, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')
it fails and returns:
Bad or inaccessible location specified in external data source "external_source".
How can I load a file from Local Blob Storage into SQL Server?
Nick.McDermaid has pointed out the error correctly. From your code and the error message, the error is caused by the wrong LOCATION syntax:
Do not add a trailing /, file name, or shared access signature
parameters at the end of the LOCATION URL when configuring an
external data source for bulk operations.
Ref here: https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-ver15&tabs=dedicated#examples-bulk-operations
Change the value to LOCATION = 'http://127.0.0.1:10000/devstoreaccount1/container/some_folder' and the error should be solved. I tested it and it works well.
As for your other question, we cannot answer it directly here. I suggest you post another question with your detailed code. We're all glad to help.
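For reference, recreating the data source with the corrected LOCATION would look roughly like this (a sketch reusing the credential name from the question):
IF EXISTS (SELECT * FROM sys.external_data_sources WHERE name = 'external_source')
    DROP EXTERNAL DATA SOURCE external_source;

CREATE EXTERNAL DATA SOURCE external_source
WITH
(
    TYPE = BLOB_STORAGE,
    LOCATION = 'http://127.0.0.1:10000/devstoreaccount1/container/some_folder',  -- no trailing slash
    CREDENTIAL = credentials
);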
Update:
About your other question: I tested and found that we must set the Shared Access Signature (SAS) "Allowed resource types" to include Object; then we can access the container, its child folders, and the files in the container.
In my test, both statements work well.
HTH.

Access Azure Blob Storage via Azure SQL Database through Managed Identity

I am trying to connect to Azure Blob storage via Azure SQL Database through Managed Identity, based on the below set of steps:
Assigned an Identity to the Server
Gave the Server access to the Blob storage as a Contributor
Executed the below queries
Create Master Key
CREATE DATABASE SCOPED CREDENTIAL MSI WITH IDENTITY = 'Managed Service Identity';
CREATE EXTERNAL DATA SOURCE [BlobStorage] WITH
(
TYPE = BLOB_STORAGE,
LOCATION = 'https://<<blobnm>>.blob.core.windows.net/<<containerNm>>',
CREDENTIAL = MSI
)
create table test
(
c1 varchar(5),
c2 varchar(4)
)
BULK INSERT test from 'poly.csv' WITH ( DATA_SOURCE = 'BlobStorage',FORMAT='csv',FIRSTROW = 2 );
But I am getting the below error :
Cannot bulk load because the file "msi/poly.csv" could not be opened. Operating system error code 86(The specified network password is not correct.)
So can anyone tell me what I am missing?
There are many reasons for this error. I have listed some of them below:
Check whether the SAS key has expired, and please check the allowed permissions.
Did you delete the question mark when you created the SECRET?
CREATE DATABASE SCOPED CREDENTIAL UploadInvoices
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2019-12-12******2FspTCY%3D'
I also tried the following test and it works well. My CSV file has no headers.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '***';
go
CREATE DATABASE SCOPED CREDENTIAL UploadInvoices
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2019-12-12&ss=bfqt&srt=sco&sp******%2FspTCY%3D'; -- dl
CREATE EXTERNAL DATA SOURCE MyAzureInvoices
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://***.blob.core.windows.net/<container_name>',
CREDENTIAL = UploadInvoices
);
BULK INSERT production.customer
FROM 'bs140513_032310-demo.csv'
WITH
(
DATA_SOURCE = 'MyAzureInvoices',
FORMAT = 'CSV',
ERRORFILE = 'load_errors_TABLE_B',
ERRORFILE_DATA_SOURCE = 'MyAzureInvoices',
FIRSTROW = 2
)
GO
I think there is a mistake in the command that you use to create the CREDENTIAL in SQL. It has to be
CREATE CREDENTIAL ServiceIdentity WITH IDENTITY = 'Managed Identity';
and not 'Managed Service Identity'
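Putting that together with the question's setup, the corrected definitions would look roughly like this (a sketch that keeps the database scoped credential pattern and names from the question and only swaps the identity string; drop the existing credential and data source first if they already exist):
CREATE DATABASE SCOPED CREDENTIAL MSI WITH IDENTITY = 'Managed Identity';  -- not 'Managed Service Identity'

CREATE EXTERNAL DATA SOURCE [BlobStorage] WITH
(
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://<<blobnm>>.blob.core.windows.net/<<containerNm>>',
    CREDENTIAL = MSI
);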
Refer https://learn.microsoft.com/en-us/sql/t-sql/statements/create-credential-transact-sql?view=sql-server-ver15

Add custom S3 endpoint for Vertica backup

I am trying to backup the Vertica cluster to a S3 like data store (supports S3 protocol) internal to my enterprise network. We have similar credentials (ACCESS KEY and SECRET KEY).
Here's what my .ini file looks like:
[S3]
s3_backup_path = s3://vertica_backups
s3_backup_file_system_path = []:/vertica/backups
s3_concurrency_backup = 10
s3_concurrency_restore = 10
[Transmission]
hardLinkLocal = True
[Database]
dbName = production
dbUser = dbadmin
dbPromptForPassword = False
[Misc]
snapshotName = fullbak1
restorePointLimit = 3
objectRestoreMode = createOrReplace
passwordFile = pwdfile
enableFreeSpaceCheck = True
Where can I supply my specific endpoint? For instance, my S3 store is available on a.b.c.d:80. I have tried changing s3_backup_path = a.b.c.d:80://wms_vertica_backups but I get the error Error: Error in VBR config: Invalid s3_backup_path. Also, I have the ACCESS KEY and SECRET KEY in ~/.aws/credentials.
After going through more resources, I have exported the following environment variables: VBR_BACKUP_STORAGE_ENDPOINT_URL, VBR_BACKUP_STORAGE_ACCESS_KEY_ID, VBR_BACKUP_STORAGE_SECRET_ACCESS_KEY. vbr init throws the error "Error: Unable to locate credentials Init FAILED." I'm guessing it is still trying to connect to the AWS S3 servers. (I have now removed the credentials from ~/.aws/credentials.)
It's worth adding that I'm running Vertica 8.1.1 in Enterprise mode.
For anyone looking for something similar, the question was answered in the Vertica forum here