When I'm creating an SSRS report, I always have a dilemma about "how to create the report with the least generation time possible".
In general the generation time (or performance time) is divided into two main parts:
The SQL Query.
The Report components (expressions, groups etc.).
As you know some of the things that are being performed in SSRS can be done in the SQL query and vice versa.
For example:
I can use Group by clause in SQL, but can do the same when using a Table with Groups definition.
I can use Casting in order to compare two values in SQL and also directly inside an expression.
and many more...
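To make the second example concrete, here is a hypothetical sketch (invented table and column names) of doing the cast-and-compare in the dataset query instead of in a report expression such as =CInt(Fields!OrderCode.Value) = Fields!ProductCode.Value:

```sql
-- Hypothetical example: cast-and-compare in the dataset query
-- rather than in an SSRS expression on each row.
SELECT OrderId, OrderCode, ProductCode
FROM dbo.Orders
WHERE CAST(OrderCode AS int) = ProductCode;
```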
My questions are:
A. Which part (SQL query or SSRS) costs more time (assuming the task can be done in both SSRS and SQL)?
B. What are the guidelines, if any, on which I should base the decision about where to perform a given operation when facing such a dilemma?
As always with performance issues:
Don't prematurely optimize. If something's simpler in SSRS then do it there. Only when a problem arises consider trading clarity for performance (possibly by moving code to the SQL side).
Measure. Use the ExecutionLog2 view to get a general idea about where your bottlenecks are. Do more measuring and testing so you're sure you're investing time in improving the performance of the bits that matter.
Bottom line: let clarity of code guide where you solve a particular problem, and optimize selectively when performance becomes an issue.
Eric Lippert wrote a nice blog post about when and how to worry about performance. The context is C#, but the basic idea holds for other situations such as SSRS/SQL as well.
By the way, if you have a look at mentioned ExecutionLog2 view, you'll notice that there's in fact three components in performance you should know about:
Data retrieval (SQL)
Report model (transforming the dataset to an internal model)
Rendering (transforming the model into the output format: Excel, PDF, etc.)
Knowing in which part a bottleneck lies is key to knowing how to solve a performance problem.
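For example, a quick way to see those three components per report run is to query the view directly. A minimal sketch, run against the ReportServer catalog database (column names can differ slightly between SSRS versions):

```sql
-- Times are in milliseconds; run against the ReportServer catalog database.
SELECT TOP (20)
    ReportPath,
    TimeStart,
    TimeDataRetrieval,   -- data retrieval (SQL)
    TimeProcessing,      -- report model / processing
    TimeRendering,       -- rendering to the output format
    [RowCount]
FROM dbo.ExecutionLog2
ORDER BY TimeDataRetrieval + TimeProcessing + TimeRendering DESC;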
To end with a suggestion based on my experience:
As a rule of thumb, prefer SQL over SSRS if you're worried about performance, especially for aggregation. Also consider tuning your database (indexes and such) if needed.
This rule of thumb would be best if I could back it up by facts and research. Alas, I don't have any. I can say that in my own experience, most often when I had performance problems with reports moving aggregation and calculation from SSRS to SQL would help in solving this issue.
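To illustrate what that usually looks like in practice, here is a hypothetical sketch (invented table names) of pushing an aggregation into the dataset query so SSRS only has to display the result instead of grouping and summing detail rows itself:

```sql
-- Instead of returning detail rows and letting a tablix group/SUM them,
-- return already-aggregated rows from the dataset query.
SELECT
    c.Region,
    YEAR(o.OrderDate) AS OrderYear,
    SUM(o.Amount)     AS TotalAmount
FROM dbo.Orders o
JOIN dbo.Customers c ON c.CustomerId = o.CustomerId
GROUP BY c.Region, YEAR(o.OrderDate);
```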
It is important to remember that SSRS is smart and defers execution until you need it. If you export a report it will retrieve all of the data, but if you are viewing it online and have expand/collapse rows, those rows will not be processed until you expand them. On that basis a simple SQL query is preferred.
It is often best to leave aggregation to SSRS, as the report will attempt to aggregate in the tablix in any case. As for charts, it's best to aggregate in the query, unless you have a tablix as well.
Simple calculations, such as comparisons, should be done in the SQL.
Remember that SSRS is smarter than you ;) the simpler the SQL, the better the service can do its job, and the service is primarily for display.
If you use a MS SQL Server database for the dataset, the service will work at its best.
I am trying to connect Tableau to a SQL view I made in PostgreSQL.
This view returns ~80k rows with 12 fields. On my local PostgreSQL database it takes 7 seconds to execute. But when I try to create a chart in a worksheet using this view, it takes forever to display anything (more than 2 minutes just to add a field).
The view is complex and involves many joins, coalesces and cases due to business specificities.
Do you guys have any ideas for improving this?
Thank you very much for your help ! :-)
Best,
Max
Tableau documentation has helpful info for performance optimization
https://help.tableau.com/current/pro/desktop/en-us/performance_tips.htm
I highly recommend the whitepaper on designing efficient dashboards mentioned on that site - a bit dated, but timeless advice
For starters, learn to use the Performance Recorder in Tableau to find out what tasks are causing delays, and if they involve queries, to capture the SQL that Tableau emits.
With Tableau, and many other client tools, the standard first approach is to see what SQL the client tool generates, then execute that SQL without the client tool, say just in psql in your case (a sketch follows below). If you can reproduce the slow query in plain SQL, then you are better positioned to either
Optimize your database, say with indices or by restructuring your schema, OR
Understand why your client tool, Tableau in this case, generated that inefficient query, and reason about what you could do differently in Tableau that would cause it to generate different SQL.
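For example, once you've captured the SQL from the Performance Recorder, you can paste it into psql and ask PostgreSQL for the actual execution plan. The query below is just a hypothetical stand-in for whatever Tableau emitted:

```sql
-- In psql, against the same PostgreSQL database Tableau connects to.
-- Replace the SELECT with the exact SQL captured from Tableau's Performance Recorder.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM my_reporting_view          -- hypothetical name of the view behind the worksheet
WHERE some_filter_column = 42;  -- whatever filters Tableau added
```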
The whitepaper I mentioned should be helpful
Does using 'custom SQL' instead of joins in Tableau increase the performance of extract refresh on the server? Can someone explain it briefly?
The answer to almost every performance question is first, "it depends" and second, test and understand the measurement results. Real results carry more weight than advice from anyone on the Internet (from me or anyone else)
Still, custom SQL is usually not helpful for increasing performance in Tableau, and often hurts. It is usually much better to define your relationships in Tableau and let Tableau then generate optimized SQL for each view -- just as you let a compiler generate optimized machine code.
When you use custom SQL, you prevent Tableau from optimizing the SQL it generates. It has to run the SQL you provide in a subquery.
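Roughly speaking (this is an illustration, not the exact SQL Tableau produces), a viz built on relationships can be compiled into one lean query, whereas with custom SQL your whole statement is wrapped as a derived table and the viz's aggregation is layered on top of it:

```sql
-- With relationships defined in Tableau, the generated query can be as lean as:
SELECT region, SUM(amount) AS sum_amount
FROM orders
GROUP BY region;

-- With custom SQL, Tableau has to treat your statement as an opaque derived table:
SELECT region, SUM(amount) AS sum_amount
FROM (
    -- your entire custom SQL runs here, joins and all,
    -- even if the viz only needs two of its columns
    SELECT o.*, c.region, c.segment
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
) AS custom_sql_query
GROUP BY region;
```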
The best use case for custom SQL in Tableau is to access database specific features, or possibly windowing queries. Most other SQL functionality is available by using the corresponding Tableau features.
If you do have a complex slow custom SQL query that you must use, it is usually a good idea to make an extract so you only pay the performance cost during extract refresh.
So in your case, I'd focus effort on streamlining or eliminating the custom SQL, monitoring the query plan for the generated SQL, and indexing your database to best support that query.
I have a query in Cognos which fetches a huge volume of data. Since the execution time is high, I'd like to fine-tune my query. Everyone knows that the WHERE clause in the query gets executed first.
My doubt is: what happens first when a query is executed?
Is the JOIN in the query established first, or is the WHERE clause executed first?
If the JOIN is established first, I should specify the filters on the DIMENSION first; otherwise I should specify the filters on the FACT first.
Please explain.
Thanks in advance.
The idea of SQL is that it is a high level declarative language, meaning you tell it what results you want rather than how to get them. There are exceptions to this in various SQL implementations such as hints in Oracle to use a specific index etc, but as a general rule this holds true.
Behind the scenes, the optimiser for your RDBMS uses relational algebra to make a cost-based estimate of the different potential execution plans and selects the one that it predicts will be the most efficient. The great thing about this is that you do not need to worry about the order in which you write your WHERE clauses; so long as all of the information is there, the optimiser should pick the most efficient plan.
That being said, there are often things that you can do on the database to improve query performance, such as building indexes on columns in large tables that are often used in filtering criteria or joins.
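For example (hypothetical table and column names), indexing the join key and a commonly filtered column on a large fact table:

```sql
-- Support the join from the fact table to the product dimension
CREATE INDEX ix_sales_fact_product_id ON sales_fact (product_id);

-- Support a commonly used filter on the fact table
CREATE INDEX ix_sales_fact_sale_date ON sales_fact (sale_date);
```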
Another consideration is whether you can use parallel hints to speed up your run time but this will depend on your query, the execution plan that is being used, the RDBMS you are using and the hardware it is running on.
If you post the query syntax and what RDBMS you are using we can check if there is anything obvious that could be amended in this case.
The order of filters definitely does not matter. The optimizer will take care of that.
As for filtering on the fact or dimension table - do you mean you are exposing the same field in your Cognos model for each (ex ProductID from both fact and Product dimension)? If so, that is not recommended. Generally speaking, you should expose the dimension field only.
This is more of a question about your SQL environment, though. I would export the SQL generated by Cognos from within Report Studio (Tools -> Show Generated SQL). From there, hopefully you are able to work with a good DBA to see if there are any obvious missing indexes, etc in your tables.
There's not a whole lot of options within Cognos to change the SQL generation. The prior poster mentions hints, which could work if writing native SQL, but that is a concept not known to Cognos. Really, all you can do is change the implicit/explicit join syntax, which just controls whether the join happens in an ON clause or in the WHERE clause. Although the WHERE side is pretty ugly, it generally compiles the same as ON.
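For reference, the two join styles that setting toggles between look like this (generic example, invented table names); most optimizers produce the same plan for both:

```sql
-- Explicit join syntax: the join condition lives in the ON clause
SELECT f.sale_id, d.product_name
FROM sales_fact f
JOIN product_dim d ON d.product_id = f.product_id
WHERE d.category = 'Hardware';

-- Implicit join syntax: the join condition is pushed into the WHERE clause
SELECT f.sale_id, d.product_name
FROM sales_fact f, product_dim d
WHERE d.product_id = f.product_id
  AND d.category = 'Hardware';
```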
I'm writing many reporting queries for my current employer utilizing Oracle's WITH clause to allow myself to create simple steps, each of which is a data-oriented transformation, that build upon each other to perform a complex task.
It was brought to my attention today that overuse of the WITH clause could have negative side effects on the Oracle server's resources.
Can anyone explain why over use of the Oracle WITH clause may cause a server to crash? Or point me to some articles where I can research appropriate use cases? I started using the WITH clause heavily to add structure to my code and make it easier to understand. I hope with some informative responses here I can continue to use it efficiently.
If an example query would be helpful I'll try to post one later today.
Thanks!
Based on this: http://www.dba-oracle.com/t_with_clause.htm it looks like this is a way to avoid using temporary tables. However, as others will note, this may actually mean heavier, more expensive queries that will put an additional drain on the database server.
It may not 'crash'. That's a bit dramatic. More likely it will just be slower, use more memory, etc. How that affects your company will depend on the amount of data, the number of processors, and the amount of processing (whether using WITH or not).
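For context, a typical use of the WITH clause chains small named steps like this (a hypothetical sketch, invented tables). Each step reads well on its own, but Oracle may inline or materialize each block, which is where extra memory and temp space can be consumed:

```sql
WITH monthly_sales AS (
    -- step 1: aggregate raw sales to month level
    SELECT product_id, TRUNC(sale_date, 'MM') AS sale_month, SUM(amount) AS month_amount
    FROM sales
    GROUP BY product_id, TRUNC(sale_date, 'MM')
),
ranked_products AS (
    -- step 2: rank products within each month
    SELECT product_id, sale_month, month_amount,
           RANK() OVER (PARTITION BY sale_month ORDER BY month_amount DESC) AS rnk
    FROM monthly_sales
)
-- step 3: keep only the top 3 products per month
SELECT sale_month, product_id, month_amount
FROM ranked_products
WHERE rnk <= 3
ORDER BY sale_month, rnk;
```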
I am building queries for a database in MS Access 2007 and I am wondering if my current design practices are up to par. Basically, the database was configured before I came, but I have been given the responsibility of building efficient queries to extract the data.
My current queries are small and simple, each accomplishing 2-3 tasks (sometimes only 1) at a time. The reason I am taking this approach is because I am completely new to SQL, and I find it easier to work with many, simple queries and use reports to consolidate the data, as opposed to building extremely complex queries which are 1) hard to build (for me, anyways) and 2) hard to maintain.
I was just curious if anyone had any best practices for query design, and if you could give me some specific feed back for the approach listed above, and whether or not I should start making complex queries, or just stick to simple queries and reports to consolidate the relevant data.
Thanks.
The people answering this question are not coming to it from an Access point of view, so I'll offer some observations as somebody who has been creating Access applications professionally full-time since 1996.
First off, there are several places where you'll have SQL in an Access application:
stored queries.
stored properties of forms, reports, combo boxes and list boxes.
in VBA code where you are writing SQL on the fly.
Managing all of these SQL statements in an organized fashion is difficult, if not impossible. But I'm not sure it's worth it!
First off, consider just stored queries. If you follow the advice of saving a query for every individual task so that each SQL statement is used in only one place, you'll soon have a mess in the list of queries, and you'll be forced into some kind of naming convention to keep track of what's what. Because of this, I generally don't save queries EXCEPT where they MUST be saved, or where the optimization that comes with a saved query is going to be helpful (i.e., large dataset or complex joins/filtering).
For example, when I first started programming in Access, I'd save all the rowsources of my combo boxes as saved queries. I developed a naming convention so they wouldn't be mixed in with the other queries in the list of queries, so it wasn't too hard to manage. At first, I thought I'd be re-using the saved queries, but it quickly became clear that I needed to make changes for individual circumstances, and changing a query that was used elsewhere might alter its results in other contexts, so really, there was no "shared code" benefit to the saved queries (as I thought there would be). The only place where it was helpful was where I had the same combo box on multiple forms; then I could save the rowsource as a saved query, and if I needed to alter it, I could do so in just one place. However, that was really only an advantage for a relatively complex rowsource -- a simple SELECT on a couple of fields doesn't really benefit from that kind of sharing, particularly when it's used in only a couple of different places.
In short, I quickly concluded that it was just easier to save the SQL statements where they were used -- since there was very little re-use in the first place (once I gained enough experience to realize the pitfalls of trying to re-use them), this worked much better, and it kept the SQL close to where it was being used.
For forms and reports, I do some of the same things, but in general, use saved queries for the purpose of avoiding having to write too many complex subselects for use as derived tables. Where I needed those it was always easier to write it and save it and then use it with a JOIN in another SQL statement than it was to try to use the subselect inline as a derived table (which just makes for complicated SQL that's hard to read -- particularly when you can't comment or format your SQL, as is the case with saved Access queries).
In general, I don't save the recordsources of forms or reports except where there is real re-use going on (a report will often use the same recordsource as a form, so in that case, it's useful to save it, so that when you change the SQL of the form, the report that goes with it inherits the alteration).
That all leaves dynamic SQL assembled in VBA code. I use lots of this, from dynamically setting the rowsources of combo/listboxes, to setting the recordsources of subforms for filtering purposes. This is harder to manage, and sometimes I use string constants in the module to make that easier. For instance, in a case where you're writing dynamic SQL where everything remains the same except the WHERE clause, a constant with the SELECT and a second constant with the ORDER BY makes it a lot easier to assemble the complete SQL statement.
I don't know if this really answers your questions, but I have learned over the years that the benefits of re-using SQL statements are vastly outweighed by the uncertainty that comes from the inability to track easily where that SQL statement may be used. I find that storing the SQL statement as close to where it is used as possible is the best practice, as that is a form of "self-documentation" (though not a great one!).
I do make many exceptions and save queries when there is a real and demonstrable benefit in terms of performance or managing what would otherwise become much more complex SQL. However, I would also note that one should not go too far in the other direction, using tons of nested saved queries, because then you run into other problems (i.e., the "too many databases" problem, which is actually caused by using up the 2048 table handles available at one time -- it's done more easily than you might think).
In my humble opinion, it doesn't matter whether the DB engine is as big and monstrous as MSSQL or Oracle, or as tiny and simple as SQLite: every query (or stored procedure, or any other unit of data processing) should be responsible for only one function. I use this principle everywhere (not only in DB development) and I can say it works.
If you are not sure, try reading books about refactoring, Fowler for example. I suppose his principles are applicable to any area of development.
If you are storing your data in MS Access then your database cannot be very large, and any optimization you do is limited by the constraints MS Access imposes. If better (more optimized) queries are a goal, then migrating the data out of Access and into SQL Server may give you more flexibility in development going forward. You can leverage cached execution plans, stored procedures, and views.
This may mean that you will need to enhance your T-SQL skills to accomplish this.
So weigh up the options you propose in your question:
1. Keep code simple (comfortable at your current skill level)
2. Meet the responsibility to create efficient queries for data extraction.
SQL Server Express could be a good starting point (it's free).