Analysis Services Power Query Editor performance

Currently I am building a model in Visual Studio for Azure Analysis Services, but I am experiencing very slow performance of the Power Query editor.
I am trying to do a left join on a table of about 1.6 million rows. The table I am joining with has around 50 million rows. The merge step works, but when I try to expand the columns it downloads all 50 million rows for some reason; at least the status bar at the bottom indicates this.
This is quite annoying, as it happens every time I try to edit the query sequence.
I have already tried setting several indexes on the SQL table.
The Azure SQL server does not show usage peaks of 100%; at most around 80% sometimes.
Any ideas how to solve this?

I noticed in SSMS that the Power Query editor creates so-called folded queries and also introduces sorting (ORDER BY) statements that I did not set up in the editor.
So I fixed my performance issues by enabling legacy data sources in the Visual Studio options. With this I can write my own SQL statements, which are many times faster.
Does anyone know why this is happening in the Power Query editor, and whether this legacy way of working has drawbacks compared to the editor?
@alejandro: I need Analysis Services mainly for the fast cache it provides. I tried to load the tables into Power BI directly, but it became totally unresponsive.
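For illustration, the hand-written statement mentioned above could look roughly like the sketch below: the left join and the column selection are pushed to Azure SQL, so only the joined result comes back instead of the full 50M-row table. All table and column names here are made up; substitute your own.

```sql
-- Hypothetical table/column names; the point is that the LEFT JOIN and the
-- column selection run on the server, so only the ~1.6M joined rows are
-- downloaded into the model rather than the ~50M-row lookup table.
SELECT  s.OrderId,
        s.OrderDate,
        s.Amount,
        c.CustomerName,
        c.Region
FROM    dbo.Sales      AS s   -- ~1.6M rows
LEFT JOIN dbo.Customers AS c  -- ~50M rows, never materialized client-side
       ON c.CustomerId = s.CustomerId;
```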

Related

Question on best practice for creating views that are consumed by visualization tools like PowerBI or Tableau

I've tried searching around to see what the best practices are when designing a view that will be consumed directly by Power BI or Tableau.
I don't know the best way to ask this, but is there an issue with creating one big query (30+ columns, multiple joins) in the DB for export into the visualization platform? I've seen some posts regarding size and about breaking it up into multiple queries, but those are in reference to bringing the data into some program and writing logic in that program to do the joins, etc.
I have tried both ways so far: smaller views for which I then create relationships in Power BI, or larger views where I'm dealing with just one flat table. I realize that in most respects Power BI can handle a star schema with the data brought in, but I've also run into weird filtering issues within Power BI itself that I've been able to alleviate and speed up by doing that work in the DB instead.
Database is a Snowflake warehouse.
Wherever possible, you should use the underlying database to do the work that databases are good at, i.e. selecting, filtering, and aggregating data. Your BI tools should then query those tables or views rather than bringing all the data into the BI tool as one big dataset and letting the BI tool process it.
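As a rough sketch of that idea in Snowflake, the joins, filters, and aggregation can live in a view that the BI tool then queries directly. All object names below are invented for illustration.

```sql
-- Push joins/filters/aggregation into the warehouse instead of the BI tool
-- (hypothetical schema and table names).
CREATE OR REPLACE VIEW reporting.v_sales_summary AS
SELECT  d.calendar_month,
        s.store_region,
        SUM(f.sales_amount) AS total_sales,
        COUNT(*)            AS order_count
FROM    warehouse.fact_sales f
JOIN    warehouse.dim_date   d ON d.date_key  = f.date_key
JOIN    warehouse.dim_store  s ON s.store_key = f.store_key
WHERE   d.calendar_year >= 2020        -- filter in the DB, not in the BI tool
GROUP BY d.calendar_month, s.store_region;
```

Power BI or Tableau then selects from reporting.v_sales_summary, so the heavy lifting stays in Snowflake.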

SSAS Development Using Top n In Queries

I'm fairly new to SSAS development, and when the existing team gave me a run-through of the existing SSAS project, they mentioned that every query has a SELECT TOP *n* in it, which they manually comment out in the XML file when they are ready to migrate to production (and you have to make sure you pick an n that no one else is using).
This was done because it takes too long to import the data into Visual Studio without the TOP n in the queries.
Is this really best practice, or is there a better way to set up the development environment so that you don't have to comment out code before a deployment?
I assume you are talking about Analysis Services Tabular, which does load the data at design time into memory in your "workspace database", usually a local Analysis Services Tabular instance.
You may consider creating a SQL view layer and building the Analysis Services model on top of the views. This recommendation is mentioned here with reasons:
http://www.jamesserra.com/archive/2013/03/benefits-of-using-views-in-a-bi-solution/
But SELECT TOP X may not be enough. For example, if SELECT TOP 100 * FROM FactSales only returns fact rows for stores in the southwest, but SELECT TOP 100 * FROM DimStore only returns stores in the northeast, then it will be challenging to develop your model and calculations because everything will roll up to the blank store. So consider putting some more intelligent filter logic into the views.
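A minimal sketch of that kind of filter logic, assuming hypothetical dev views over FactSales and DimStore: both views apply the same predicate, so the sampled facts still roll up to real stores instead of the blank member.

```sql
-- Dev-only views with a consistent filter across fact and dimension
-- (object names are illustrative).
CREATE VIEW dev.DimStore AS
SELECT *
FROM   dbo.DimStore
WHERE  Region = 'Southwest';
GO

CREATE VIEW dev.FactSales AS
SELECT f.*
FROM   dbo.FactSales AS f
JOIN   dev.DimStore  AS s
  ON   s.StoreKey = f.StoreKey            -- only facts for the filtered stores
WHERE  f.OrderDateKey >= 20230101;        -- optionally restrict the date range too
GO
```

The model imports from the dev.* views during development and from the full views (or tables) in production, so nothing has to be commented out in the XML.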
It sounds like you want to change the MDX statements inside SSRS reports. If that is the case, I would not suggest doing that. You need to see the performance of your reports in the development environment as well. Of course, as suggested, you can reduce the size of the data, but that has big drawbacks since you can no longer predict the real performance in production.
By the way, TOP n queries are generally very expensive, since the results have to be calculated for the whole set and then mostly discarded. So the advantage of having "top n" inside MDX to improve performance is pretty limited.

Slow SQL query processing 5000 rows, which SQL Server tool can help me identify the issue?

I'm writing a heavy SQL query using SQL Server 2008. When the query processes 100 rows, it finishes instantly. When it processes 5,000 rows, it takes about 1.1 minutes.
I used the actual execution plan to check its performance while processing 5,000 rows. The query contains 18 sub-queries,
and no significantly higher query cost shows up anywhere in the plan; the percentages are all around 0%, 2%, 5%, 7%, with the highest at 11%.
The screenshot below shows the most expensive operator in the query (94% of that 11%).
I also used the Client Statistics tool; Trial 10 shows the run with 5,000 rows and Trial 9 shows the run with 100 rows.
Can anybody tell me where (or with which SQL Server tool) I can find the data/detail that indicates what is slow when the query processes 5,000 rows?
Added:
Indexes and keys are in place.
The actual execution plan shows no warnings and no high percentage on any individual sub-query.
I just found that 'Activity Monitor' shows one sub-query's 'Average Duration' as 40,000 ms under 'Recent Expensive Queries', while the actual plan shows this query taking only 5% of the total cost.
Thanks
For looking at performance, using the Database Engine Tuning Advisor and/or the missing-index DMVs, and then examining the execution plan either in Management Studio or with something like SQL Sentry Plan Explorer, should be enough to give you an indication of where you need to make modifications.
Understanding the execution plan and how the physical operators relate to your logical operations is the biggest key to performance tuning; that, and a good understanding of indexes and statistics.
I don't think there is any tool that will just automagically fix performance for you.
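For reference, the missing-index DMVs mentioned above can be queried along these lines; treat the output as suggestions to evaluate, not indexes to create blindly.

```sql
-- Rough sketch of a missing-index DMV query (SQL Server).
SELECT TOP (20)
       d.statement            AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_total_user_cost,
       s.avg_user_impact
FROM   sys.dm_db_missing_index_details     AS d
JOIN   sys.dm_db_missing_index_groups      AS g ON g.index_handle = d.index_handle
JOIN   sys.dm_db_missing_index_group_stats AS s ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks * s.avg_total_user_cost * s.avg_user_impact DESC;
```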
While I do believe that learning the underpinnings of the execution plan and how the SQL Server query optimizer operates is an essential requirement for being a good database developer, and humans are still far better than most tools, native or third-party, at diagnosing and fiddling with SQL to get it right, there is in fact a tool in SQL Server Management Studio that can (sometimes) "automagically" fix performance for you:
Database Engine Tuning Advisor
Which you can access via the ribbon menu under Query -> Analyze Query Using Database Engine Tuning Advisor OR (more helpfully) by selecting your query, right-clicking on the selection, and choosing Analyze Query using Database Engine Tuning Advisor, which gives the added bonus of automatically filtering down to only the database objects being used by your query.
All the Tuning Advisor actually does is investigate whether there are any indexes or statistics that could be added to your objects. It then "recommends" them, and you can apply none, some, or all of them as you choose.
Caveat emptor alert! All of its recommendations are geared towards making that particular query run faster, so what it definitely does not do is help you make good decisions about the consequences of adding an index that is only used by maybe one or two queries but has to be updated constantly when you add data to your database. This is a SQL anti-pattern known as "index shotgunning" and is generally frowned upon by DBAs, who would rather see a query rewritten to take advantage of more useful existing indexes.

CPU Usage to 100%

I have Oracle 11g R2 running on an M4000 machine (supposedly a powerful machine). Recently I noticed that my application has become slow and is taking a lot of time querying the database. To my shock, when I looked at the statistics of the DB machine, I found the CPU usage at 100%.
Here is the ASH report.
Can someone advise what I should be doing to avoid such a situation?
Those queries that are doing a 'table access full' may be your problem... any full table scan will kill a query and can usually be resolved by adding a simple index. You can profile your queries, and tools will recommend indexes to add in order to improve execution of certain queries. I think I did this with Squirrel on an Oracle DB.
Also, your IDs seem to be strings and you're doing a 'lower(id) like :3'. This should be changed to use integers, or at the very least get rid of the lower() and match on the :3 bind variable directly.
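A minimal sketch of both options, assuming a hypothetical orders table with an id column (adjust names to your schema):

```sql
-- Option 1: keep the case-insensitive match but back it with a
-- function-based index so Oracle can avoid the full table scan.
CREATE INDEX idx_orders_lower_id ON orders (LOWER(id));

-- Option 2 (usually better): compare on the column directly so a plain
-- (or primary key) index can be used, and skip LOWER() entirely.
SELECT *
  FROM orders
 WHERE id = :3;   -- exact match on the bind variable
```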

I have statistical data on my SQL queries, what can I do with that to make my app faster?

I have a J2EE application built on EclipseLink and running under Glassfish on Postgres. We're doing some performance analysis now.
I turned on pg logging on our build server and analyzed the output with pgfouine. Now that I have these charts and data from pgfouine, how should I interpret that to actually improve performance?
I think I want to find the most frequently used, but slower queries to get the most benefit. Reducing the number of frequently run queries (perhaps through caching) also seems like a sound approach.
Properly done indexing helps a lot. If a column appears in many WHERE clauses, consider indexing it.
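For example, in Postgres that could be as simple as the sketch below, then re-checking the plan for one of the slow queries pgfouine flagged (the orders table and customer_id column are hypothetical):

```sql
-- Index a column that appears in many WHERE clauses.
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- Verify the planner actually uses it for the frequent/slow query.
EXPLAIN ANALYZE
SELECT *
FROM   orders
WHERE  customer_id = 42;
```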