Performance enhancement when using DirectQuery to get data from SQL Server in Power BI

I am using Power BI Desktop to create PBIX files which I later upload to Azure Power BI Embedded (a PaaS solution). The PBIX gets data from Azure SQL Server in DirectQuery mode. I want to increase the efficiency of the queries that Power BI Embedded sends to SQL Server to get my data.
My PBIX contains relationships between many tables, has RLS (Row-Level Security) configured, and takes a long time to load. Please advise whether the following options will help me increase the efficiency of the queries, thus reducing the time the PBIX takes to load:
Using Advanced Options in the Get Data dialog box: inserting a SQL statement here will get only specific data instead of the entire table. This will reduce the data I see in Power BI Desktop, but will it really increase the efficiency of the queries sent to SQL Server for the creation of charts? E.g., say the PBIX needs to create a join between two tables. If I use the advanced options, will the join be done on the reduced data? (See the sketch after these options.)
Using Filters to filter out unwanted rows of the table: as with the option above, this will reduce the data I see in Power BI Desktop, but will it really increase the efficiency of the queries sent to SQL Server for the creation of charts? E.g., if I use filters, will the join be done on the reduced data?
[EDIT - I have added one more option below]
Are the queries for charts on different pages of a PBIX file sent to SQL Server only when the page is loaded? If so, I can spread my charts across different pages to reduce the number of queries sent to SQL Server at once.
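For context, here is the kind of statement I mean in option 1. My understanding is that Power BI would wrap something like this as an inner query in whatever it generates per visual; the table and column names below are made up:

-- Hypothetical statement pasted into the Advanced Options box:
-- only recent orders are pulled, instead of the whole table.
SELECT o.OrderID, o.OrderDate, o.Amount, c.Region
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= '2017-01-01';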

Related

Power BI relationships behind the scenes with SQL Server

I am using import mode with an on-premises SQL Server, bringing in all the required tables and creating relationships. Does indexing the columns involved in building relationships in Power BI have any benefit in terms of report or data refresh performance? What happens behind the scenes in Power BI?
In import mode, data is read from the underlying SQL Server database and loaded into a cache. Indexes in the data source will give very good read performance, so data refresh times will be faster.
But the indexes will not be used while you are working with a report in Power BI, because the data comes from the cache (the dataset). The dataset holds the refreshed data and is what the reports use.
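For example, a minimal sketch of such an index, assuming a hypothetical Sales table whose CustomerID column participates in a relationship:

-- Hypothetical: index the key column that a relationship joins on,
-- so reads from the source during refresh are faster.
CREATE NONCLUSTERED INDEX IX_Sales_CustomerID
ON dbo.Sales (CustomerID);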
Refer to the Power BI documentation on data refresh for more details.

Power BI maxing connections to DB :( Can we populate multiple tables with a single Sql.Database call?

I am helping my team troubleshoot an issue with a Power BI report we are developing. We have a rather complex data model in the source SQL database, so we have created 5-6 views to better manage the data. We have a requirement to use DirectQuery, as one key requirement for the report is that the most up-to-date data in the database is visible, rather than having a delay from loading/caching the data. We also have a single data source, just the one database.
When we run the report, we see a spike of 200-500 connections to the database from the specific user for the report data source, and those connections don't close. This is clearly an issue and unsustainable for any product. We have a ticket open with Microsoft premium support to address the connections not closing, but in the meantime, I'm wondering if we're doing something wrong inside the report?
When I view the queries in the Query Editor, we basically have one query for each view, and each is simply:
let
    Source = Sql.Database(Server, Database),
    query_view_name = Source{[Schema = "...", Item = "..."]}[Data]
in
    query_view_name
(I don't have the raw code in front of me, but that's the gist of it.)
It seems to me, based on analytics in the database, that Sql.Database is opening a new connection every time a view is called. With 5-6 views, that's 5-6 connections at a minimum; then each time a filter is changed there are more connections, and it compounds from there until the database connection pool is maxed out.
Is there a way to populate all the tables using a single connection to the database? Why would Power BI be using so many connections? Can we populate multiple tables in the advanced query editor? Using DirectQuery, are there any suggestions for what we can look at/troubleshoot/change in the report?
Thanks!
Power BI establishes multiple connections to the database in order to load multiple tables in parallel. If you don't want this, you can turn it off under Options -> Current File -> Data Load -> Enable parallel loading of tables.
Keep in mind that turning this option off will most likely increase the model loading time.
You may also want to look at the Maximum connections per data source option under Options -> Current File -> DirectQuery, and at the whole Query reduction section beneath it. Turning on Slicer selection and Filter selection on that page is highly recommended for cases like yours, but you will need to train your users to click Apply to see the results.
Ok.
We have a rather complex data model in the source SQL database, so we have created 5-6 views to better manage the data.
That's fine.
We have a requirement to use DirectQuery,
But now you're going to have a bad time. DirectQuery plus complex views is a recipe for poor performance. Queries against your views will add joins, potentially across the whole model for filter context, as well as measure and calculated-column expressions. And these queries change dynamically, based on the user's interaction with the report, so it's very difficult to see and test all the possible queries.
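For illustration only (the view, table, and column names are invented, and actual generated SQL will differ), a visual's query can take roughly this shape, wrapping each view as a derived table:

-- Illustrative shape of a DirectQuery-generated statement, not actual output.
SELECT TOP (1000001)
    [t0].[Region],
    SUM([t1].[Amount]) AS [Total]
FROM (SELECT * FROM [dbo].[vSalesDetail]) AS [t1]
INNER JOIN (SELECT * FROM [dbo].[vCustomer]) AS [t0]
    ON [t1].[CustomerID] = [t0].[CustomerID]
WHERE [t0].[Country] = N'US'
GROUP BY [t0].[Region];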
Basic guidance is to use import mode against views, and to use DirectQuery only against properly indexed tables. To address data freshness, you can replace the views with tables that you load and keep up to date from your application, or perhaps use an indexed view, etc.
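As a rough sketch of the indexed-view idea, assuming a hypothetical dbo.Sales table with a non-nullable Amount column (indexed views require SCHEMABINDING, and COUNT_BIG(*) when grouping):

CREATE VIEW dbo.vSalesByCustomer
WITH SCHEMABINDING
AS
SELECT CustomerID,
       SUM(Amount)  AS TotalAmount,  -- Amount must be NOT NULL for an indexed view
       COUNT_BIG(*) AS RowCnt        -- required when the view uses GROUP BY
FROM dbo.Sales
GROUP BY CustomerID;
GO

-- The unique clustered index is what materializes the view on disk.
CREATE UNIQUE CLUSTERED INDEX IX_vSalesByCustomer
ON dbo.vSalesByCustomer (CustomerID);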

Querying PowerBI data from MS-SQL

I finally decided to ask this (after a lot of Google searching):
We use Power BI for data visualization, and in it are some calculated dashboards / data outputs which are used to monitor data quality, etc. I want to be able to log these results historically so that over time we can monitor progress, i.e. whether data quality has improved. That is the end of the initial problem.
One approach to this problem was to connect to Power BI from the MS-SQL side, hoping we can then set timed triggers to do the logging by reading the Power BI dashboards. So how do I query that? (I have already developed a method to determine the connection using the Power BI port, as described in "Exporting data from Power BI Desktop to MS-SQL".)
[Screenshot of one of my MS-SQL connections through "Analysis Services" omitted.]
I am assuming the objects named like "LocalDateTable_" are the actual BI analyses I want to query. "New Query" is an MDX type of query. Should I go this route for my problem (logging Power BI analyses)?
At first this sounds crazy, but on reflection I guess it was only a matter of time, and a sign of the maturity of Power BI solutions...
I would use the SQL Server Profiler to capture the queries generated while you use your dashboard & report.
https://insightsquest.com/2017/05/07/profiler-trace-for-power-bi-desktop/
Then I would build an SSIS package to run the MDX queries and deliver the datasets to SQL Server, with extra columns, e.g. StartTime.
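For example, a minimal sketch of a landing table for those datasets (all names are illustrative):

-- Hypothetical landing table for the SSIS output.
CREATE TABLE dbo.DashboardSnapshot (
    SnapshotID  INT IDENTITY(1,1) PRIMARY KEY,
    MetricName  NVARCHAR(200) NOT NULL,
    MetricValue DECIMAL(18,4) NULL,
    StartTime   DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);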

Directly query databases with 1b rows of data using Tableau or PowerBI

I occasionally see people or companies showcasing queries against a DB/cube/etc. from Tableau or Power BI with under 5 seconds of response time, sometimes even under 1 second. How do they do this? Is the data optimized to the gills? Are they using a massive DB?
On a related note, I've been experimenting with analysing a much smaller dataset (100m rows) with Tableau against SQL DW, and it still takes nearly a minute to calculate. Should I try some other tech? Perhaps Analysis Services or a big-data technology?
These are usually one-off data analysis assignments, so I don't have to worry about data growth.
Live connections in Tableau will only be as fast as the underlying data source. If you look at your log (C:\Users\username\Documents\My Tableau Repository\Logs\log.txt), you will see the SQL Tableau issued to the database. Run that query on the server itself; it should take about the same amount of time. Side note: Tableau has a new data engine coming with the next release, called Hyper. This should allow you to create an extract from 2b rows with very good performance. You can download the beta now; more info here.

Do SQL-targeted BI solutions like Looker and Chart.io use OLAP?

As far as I know, OLAP is used in Power Pivot to speed up interacting with data.
But I know that big-data databases like Google BigQuery and Amazon Redshift have appeared in the last few years. Do SQL-targeted BI solutions like Looker and Chart.io use OLAP, or do they rely on the speed of the databases?
Looker relies on the speed of the database but does model the data to help with speed. Mode and Periscope are similar to this. Not sure about Chartio.
OLAP was created to organize data to help with query speed. While it is used by many BI products like Power Pivot and Pentaho, several companies have built their own ways of organizing data to improve query speed, sometimes by storing data in their own data structures. Many cloud BI companies, like Birst, Domo and GoodData, do this.
Looker created a modeling language called LookML to model data stored in a data store. As databases are now faster than they were when OLAP was created, Looker took the approach of connecting directly to the data store (Redshift, BigQuery, Snowflake, MySQL, etc.) to query the data. The LookML model allows the user to interface with the data and then run the query to get results in a table or visualization.
That depends. I have some experience with BI solutions (for example, we worked with Tableau), and they can operate in two main modes: they can execute the query against your server, or they can collect the relevant data and store it on the user's machine (or on the server where the app is installed). When working with large volumes, we used to make Tableau query the SQL Server itself, because our SQL Server machine was very strong compared to the other machines we had.
Either way, even if you store the data locally and want to "refresh" it, updating the data means retrieving it from the database, which can sometimes be an expensive operation too (depending on how your data is built and organized).
You should also note that you are comparing two different families of products: while Google BigQuery and Amazon's Redshift are actual database engines that are used to store the data and also query it, most BI and reporting solutions are more concerned with querying the data and visualizing it, and therefore (generally speaking) are less focused on having smart internal databases (at least in my experience).