I use Azure Data Lake Gen 2, I transform the data with Databricks, and I have Delta tables that are fed into Power BI. But clients also need to be able to query my tables with SQL.
What is the best practice? Is this possible with Databricks, or do I have to use something else?
Thank you in advance for your help!
With a premium workspace, you can let users' credentials pass through to the storage account from within Azure Databricks.
Go to Compute --> Cluster --> Advanced options, and you'll see a checkbox Enable credential passthrough for user-level data access.
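Once passthrough is enabled, clients can query the Delta tables with plain SQL on that cluster. A minimal sketch, assuming hypothetical storage account, container, and table names:

-- Hypothetical storage account, container, and table names.
-- Requires a cluster with credential passthrough enabled, so the signed-in
-- user's Azure AD identity is used to read the ADLS Gen2 path.
CREATE TABLE IF NOT EXISTS sales_gold
USING DELTA
LOCATION 'abfss://gold@mydatalake.dfs.core.windows.net/sales';

SELECT region, SUM(amount) AS total_sales
FROM sales_gold
GROUP BY region;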
I am importing data from SQL DW to Power BI using SQL Server authentication credentials.
I read in this Microsoft doc that VNets can be used as data gateways for various Power BI data sources. Can this be applied here? Will the transfer of data from Synapse SQL DW to the Power BI service always happen over the public internet, or can it also happen through VNets?
I am new to these services, so my question could be silly!
Yes, you can connect over the public internet as well as from a private VNet (via a data gateway).
Virtual network data gateways allow import or DirectQuery datasets to connect to data services within an Azure VNet without the need for an on-premises data gateway.
As per the doc you are following, VNet data gateways will support connectivity to the following Azure data services:
1. Azure SQL
2. Azure Synapse Analytics
3. Azure Data Explorer (Kusto)
4. Azure Table Storage
5. Azure Blob Storage
6. Azure HDInsight (Spark)
7. Azure Data Lake (Gen2)
8. Cosmos DB
Note: The virtual network (VNet) data gateway is still in preview. It is a premium-only feature and will be available only in Power BI Premium workspaces and with Premium Per User (PPU) during the public preview. However, licensing requirements might change when VNet data gateways become generally available.
Reference: Create virtual network data gateways
There is an option to connect a Cloud SQL MySQL instance from BigQuery. I just wanted to know how we can connect a Cloud SQL Server instance to BigQuery.
SQL Server:
There are a bunch of third-party extensions/tools that provide this service. One of them is SSIS Data Flow Source & Destination for Google BigQuery, a Visual Studio extension that connects SQL Server with Google BigQuery data through SSIS workflows:
https://www.cdata.com/drivers/bigquery/ssis/
https://marketplace.visualstudio.com/items?itemName=CDATASOFTWARE.SSISDataFlowSourceDestinationforGoogleBigQuery
Regarding using SQL Server Integration Services to load the data from the on-premises SQL Server to BigQuery, you can take a look at this site. You can also perform ETL from a relational database into BigQuery using Cloud Dataflow; the official documentation details how it can be done, and you might need to use Cloud Storage as an intermediate data sink.
Cloud SQL:
BigQuery allows you to query data from Cloud SQL by using a federated query. The connection must be created within the same project where your Cloud SQL instance is located. If you want to query data stored in your Cloud SQL instance from BigQuery in another project, please follow the steps listed below:
1. Enable the BigQuery API and the BigQuery Connection API within your project.
2. Create a connection to your Cloud SQL instance within the project by following this documentation.
3. Once you have created the connection, locate and select it within BigQuery.
4. Click the SHARE CONNECTION button and grant permissions to the users that will use that connection. Please note that the BigQuery Connection User role is the only one needed to use a shared connection. A sketch of a federated query against such a connection follows below.
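As a rough illustration only, a shared connection can then be used from BigQuery with EXTERNAL_QUERY; the project, region, connection, and table names here are hypothetical:

-- Hypothetical project, region, connection, and table names.
-- The inner query runs on the Cloud SQL instance; the outer query runs in BigQuery.
SELECT customer_id, order_total
FROM EXTERNAL_QUERY(
  'my-project.us.my-cloudsql-connection',
  'SELECT customer_id, order_total FROM orders;'
);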
Additionally, please notice that the "Cloud SQL federated queries" feature is in a Beta stage and might change or have limited support (it is not available in certain regions, in which case it is required to use one of the supported options mentioned here). Please remember that, to use Cloud SQL federated queries in BigQuery, the instances need to have a public IP.
If you are limited, e.g. by region, one good option might be exporting the data from Cloud SQL to Cloud Storage as a CSV and then loading it into BigQuery. If you need to, it is possible to automate this process using Cloud Composer; refer to this article.
Another approach is to extract information from Cloud SQL (with exports) and import it into BigQuery through load jobs or streaming inserts.
I hope you find the above pieces of information useful.
It is possible, but be warned that the feature is currently in Beta:
https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries
Has anyone used SSMS v18.2 or Azure Data Studio to connect to a Databricks cluster and query Databricks tables and/or the Databricks File System (DBFS)?
I would like to know how to set this up so that a Databricks server shows up in the connections list, and how to use PolyBase to connect to DBFS.
I can connect to ADLS using PolyBase commands like the following:
-- Scoped Credential
CREATE DATABASE SCOPED CREDENTIAL myScopedCredential
WITH
IDENTITY = '<MyId>#https://login.microsoftonline.com/<Id2>/oauth2/token',
SECRET = '<MySecret>';
-- External Data Source
CREATE EXTERNAL DATA SOURCE myDataSource
WITH
(
TYPE = HADOOP,
LOCATION = 'adl://mydatalakeserver.azuredatalakestore.net',
CREDENTIAL = myScopedCredential
)
-- Something similar to setup for dbfs?
-- What IDENTITY used for Scoped Credential?
As per my knowledge, Azure Databricks cannot be connected to from SQL Server 2019 using SSMS or Azure Data Studio.
Azure Databricks works with a number of Azure data sources; for a complete list of data sources that can be used with Azure Databricks, see Data sources for Azure Databricks.
The Spark connector for Microsoft SQL Server and Azure SQL Database enables Microsoft SQL Server and Azure SQL Database to act as input data sources and output data sinks for Spark jobs. It allows you to use real-time transactional data in big data analytics and persist results for ad-hoc queries or reporting.
For more details, refer "Connecting to Microsoft SQL Server and Azure SQL database with Spark connector".
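For the reverse direction (querying SQL Server from Databricks), a table can also be exposed to Spark SQL through the generic JDBC data source. A minimal sketch, with hypothetical server, database, table, and credential values (the Spark connector above offers a similar but optimized path):

-- Hypothetical server, database, table, and credential values.
-- Uses Spark SQL's built-in JDBC data source from a Databricks notebook.
CREATE TABLE customers_from_sqlserver
USING jdbc
OPTIONS (
  url 'jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb',
  dbtable 'dbo.Customers',
  user 'myuser',
  password 'mypassword'
);

SELECT COUNT(*) AS row_count FROM customers_from_sqlserver;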
Hope this helps.
This doesn't seem possible without the use of 3rd party tools or custom applications. Databricks SQL just doesn't expose the protocols necessary.
There are 3rd party tools (e.g. from CData) that can help you here. See this article: https://www.cdata.com/kb/tech/databricks-odbc-linked-server.rst
I'd like to allow a Power BI report to access a single Azure SQL database in a way that allows for cleaner deployment/replication across multiple products. As of now, I manually provide the reports with a read-only SQL login, but having to do this each time a new report is created is sub-optimal.
Is there any way to integrate Power BI with Azure's MSI, or anything of the sort to allow for smoother deployment?
You can connect to the Azure SQL database through the Power BI online service and then publish this as a 'content pack'.
Then you can use the Power BI Service Connector to access the dataset without credentials.
I want to copy data from SharePoint to Microsoft Azure SQL DW using Azure Data Factory or an alternative service. Can I do this? Please, can anyone help me with this?
You can do this by setting up an Azure Data Factory pipeline that copies the data to Azure Blob storage. Afterwards you can use Azure's fast PolyBase technology to load the data from blob storage into your SQL Data Warehouse instance.
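A minimal sketch of that PolyBase load step, assuming hypothetical storage account, container, credential, and table names (and a database master key already in place):

-- Hypothetical storage account, container, credential, and table names.
-- Stages the exported SharePoint data (CSV files in blob storage) as an external
-- table, then loads it into a distributed table with CTAS.
CREATE DATABASE SCOPED CREDENTIAL BlobCredential
WITH IDENTITY = 'user', SECRET = '<storage-account-key>';

CREATE EXTERNAL DATA SOURCE BlobStage
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://staging@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = BlobCredential
);

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

CREATE EXTERNAL TABLE dbo.SharePointList_ext
(
    Id INT,
    Title NVARCHAR(255)
)
WITH (LOCATION = '/sharepoint-list/', DATA_SOURCE = BlobStage, FILE_FORMAT = CsvFormat);

CREATE TABLE dbo.SharePointList
WITH (DISTRIBUTION = ROUND_ROBIN)
AS SELECT * FROM dbo.SharePointList_ext;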
Can I ask how much data you intend on loading into the DW? Azure SQL Data Warehouse is intended for use with at least terabyte-level data, up to petabyte-scale compute and storage. I only ask because each SharePoint list or Excel file has a maximum of 2 GB per file.