I apologize if this has already been answered, but did a search and could not find anything.
So, I've been working in SSRS and have made some parameters that display a different tablix depending on the options selected; for example, if the user does not want to display both the original AND actual expiration date, I have a parameter to display one of the two tablixes. I may need to add one or two more tablixes though because I wanted to implement an "order by" parameter (I couldn't get it to work in my actual SQL code).
Does having numerous tablixes affect performance? I do not want my reports to be bogged down! Thank you.
The number of tablixes is not a significant factor in performance, imo. The main causes of performance issues would be
1) SQL performance
2) slow-performing library functions or procedures
I have to make a report with 168 rows. Most of them are sequential data, but there are summation rows for which I need to build helper tables.
Therefore I need to build like 45-50 queries, most of them Append Queries.
Is there a way to minimize the number of queries and develop a large report with 168 rows?
Should I use code?
Just this last year I created a complicated, multi-part and multi-page report with graphs, summations, running averages, trends, "pivot-tables", etc. I did not count how many "rows" of data, but here are some things I did to manage the many queries:
Most important lesson learned: After much optimization and attempts to consolidate and reuse queries and temporary tables, it still turns out that there is no set of "magic few" queries that will return the data you need. Even if you reduce the number of SQL queries from 45 to 35 (which would be impressive in many cases), there are still many queries that you need to manage in an intelligent way. The point is to worry more about writing manageable queries and good infrastructure, rather than focusing on reducing the number. (If your process is similar, you'll inevitably have to add more queries and more details later anyway.)
Union queries indeed have their place and are sometimes necessary, but simply combining queries to "reduce the number" can have negative consequences. 1) Union queries cannot be built or visualized using the Design View. I consider myself a "real coder", but I still appreciate the ability to use UI components when I can. Design View offers various useful syntax and datatype checks. 2) It is often useful in debugging and optimization to be able to run queries individually. 3) Unions do not improve efficiency and might actually slow down queries when duplicate removal and sorting are not necessary. 4) I have experienced certain perfectly correct queries that result in errors when combined in Unions. I haven't learned how to predict this behavior so it's almost not worth mentioning... except to not be fooled into thinking that the individual queries are somehow flawed. (There are usually workarounds.)
Create all related report queries and temporary tables in a separate Access database and link to the main database. In other words, create a separate reporting front-end if possible. Not only can this keep the source database cleaner, it can make it more efficient (highly dependent on number of users and how they're sharing the database).
Name queries using a consistent pattern. I tried using numbered queries with some success. I personally find that descriptive names are more useful than short, cryptic names. Much cutting and pasting becomes necessary, however.
VBA code or macros can be better than individual saved queries.
I rarely use complicated macros, so most of these tips are relevant to VBA code, but I won't argue against macros because they offer at least some similar benefits. It's also possible without much work to create a useful "dashboard" form that makes VBA code click-n-run similar to macros.
Comments can be included adjacent to the SQL. This can be invaluable in outlining ugly SQL. For example, it can be worth explaining why you chose a LEFT JOIN with extra WHERE criteria instead of an INNER JOIN, especially to prevent "helpful" coworkers (or yourself) from rewriting a query only to find they failed to consider all the original contexts. (A small example follows this list.)
An entire sequence of query texts and execution can be traced and debugged live. If error handling is coded appropriately, you can create custom logs and specialized handling of errors. SQL text can be edited and rerun without stopping the entire process.
Query parameters can be passed to queries without awkward UI prompts. Parameters can be used to code proper queries with user input (i.e. avoid SQL injection) and can reduce number of similar queries that simply have different input or criteria but are otherwise identical.
Multiple queries can be wrapped in a transaction and all committed or rolled back together! Not sure whether macros support this.
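To illustrate the earlier point about comments adjacent to the SQL, here is the kind of thing I mean. The table and column names are invented purely for illustration; in Access the comment can simply sit in the VBA next to the SQL string.

    -- Orders joined to promotions with a LEFT JOIN plus extra WHERE criteria,
    -- NOT an INNER JOIN: orders with no promotion at all must still appear
    -- on the report with a blank promo column, and an INNER JOIN (or a bare
    -- filter on the promo side) would silently drop them.
    SELECT o.OrderID,
           o.OrderDate,
           p.PromoCode
    FROM Orders AS o
    LEFT JOIN Promotions AS p
           ON p.OrderID = o.OrderID
    WHERE (p.PromoCode IS NULL OR p.ExpiresOn >= o.OrderDate);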
You can either move the SQL to VBA, to a macro, or if they're all appending to one table, make a large union subquery. All will reach that goal. For usability, I often go for the macro, since it's click to run. Just SetWarnings first, and then chain RunSQL statements.
A UNION query is also an elegant solution, if applicable.
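For instance, if several append queries all load the same helper table, one INSERT fed by a UNION ALL can replace them. The names below are invented; if your Access version balks at the direct form, save the UNION as its own query and append from that.

    -- One append query instead of three: each SELECT below used to be its own
    -- append query loading the same helper table.  UNION ALL also skips the
    -- duplicate-removal and sorting cost of a plain UNION.
    INSERT INTO tblReportRows (RowNo, Category, Amount)
    SELECT RowNo, Category, Amount FROM qrySalesDetail
    UNION ALL
    SELECT RowNo, Category, Amount FROM qryReturnsDetail
    UNION ALL
    SELECT RowNo, Category, Amount FROM qryAdjustments;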
I'm looking for some ideas managing very large SQL queries in Oracle.
My employer is looking to build very wide reports (150-200 columns of data per report).
Each item is a sub-query or an element from a view. The data has to be real time, so DW-style batch processing is not an option. We also don't use any BI tools, just a Java app that generates Excel (it's a requirement to output data in Excel).
The query also contains unions as feeds from other systems.
The queries result in very large SQL (about 1500 lines) that is very difficult to manage.
What strategies can I employ to make the work more manageable?
It is also not a performance problem. I was able to optimize the query to be very efficient; it's mostly the width of the query, since managing 200 columns is a challenge in itself.
I deal with queries this length daily and here is some of what helps me in maintaining them:
First, alias every single one of those columns. When you are building the query you may know where each one came from, but when it is time to make a change, it is really helpful to know exactly where each column came from. This applies to join conditions, group by and where conditions as well as the select columns.
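A trivial sketch of the aliasing point, with invented names; even for a handful of columns, prefixing and aliasing everything makes later change requests far less painful.

    SELECT cust.customer_id     AS customer_id,
           cust.customer_name   AS customer_name,
           addr.postal_code     AS billing_postal_code,
           bal.total_due        AS open_balance
      FROM customers cust
      JOIN addresses addr
        ON addr.address_id = cust.billing_address_id
      LEFT JOIN v_customer_balances bal
        ON bal.customer_id = cust.customer_id
     WHERE cust.status = 'ACTIVE';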
Organize in easily understandable and testable chunks. I use temp tables to pull things that make sense together and so I can see the results before the final query while in test mode.
This brings me to test mode. If I have chunks of data, I design the proc with a test mode and then query individual temp tables when in test mode, so I can see where the data went wrong if there is a bug. Not sure how Oracle works but in SQL Server, I make this the last parameter and give it a default value, so that it doesn't need to be passed in by the application.
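Here is a skeleton of what I mean, in SQL Server syntax; the procedure, table, and parameter names are placeholders, not anything from the question.

    CREATE PROCEDURE dbo.usp_WideReport
        @StartDate date,
        @EndDate   date,
        @TestMode  bit = 0   -- last parameter, defaulted, so the app never has to pass it
    AS
    BEGIN
        -- Chunk 1: pull the pieces that belong together into a temp table.
        SELECT o.order_id, o.customer_id, o.order_total
        INTO   #orders
        FROM   dbo.orders o
        WHERE  o.order_date BETWEEN @StartDate AND @EndDate;

        -- Chunk 2: another understandable, testable piece.
        SELECT c.customer_id, c.region
        INTO   #customers
        FROM   dbo.customers c;

        -- In test mode, show the intermediate chunks so you can see where the
        -- data went wrong before it reaches the final query.
        IF @TestMode = 1
        BEGIN
            SELECT * FROM #orders;
            SELECT * FROM #customers;
        END

        -- Final query built from the chunks.
        SELECT c.region, SUM(o.order_total) AS region_total
        FROM   #orders o
        JOIN   #customers c ON c.customer_id = o.customer_id
        GROUP BY c.region;
    END;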
Consider logging the execution details and the values of passed in parameters and certainly log any error messages. This will help tremendously when you have to troubleshoot why this report that has functioned perfectly for six years doesn't work for this one user.
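A minimal sketch of that logging idea, again in SQL Server syntax with made-up table and procedure names.

    -- Assumed log table; adjust names and columns to your environment.
    CREATE TABLE dbo.ReportExecutionLog (
        LogID      int IDENTITY(1,1) PRIMARY KEY,
        ProcName   sysname       NOT NULL,
        Parameters nvarchar(max) NULL,
        RunAt      datetime2     NOT NULL DEFAULT SYSDATETIME(),
        ErrorMsg   nvarchar(max) NULL
    );
    GO

    CREATE PROCEDURE dbo.usp_WideReport_Logged
        @StartDate date,
        @EndDate   date
    AS
    BEGIN
        -- Record the call and its parameter values up front.
        DECLARE @params nvarchar(max) =
            CONCAT('@StartDate=', @StartDate, ', @EndDate=', @EndDate);

        BEGIN TRY
            INSERT INTO dbo.ReportExecutionLog (ProcName, Parameters)
            VALUES (OBJECT_NAME(@@PROCID), @params);

            -- ... the actual report queries go here ...
        END TRY
        BEGIN CATCH
            -- Record the error message, then re-raise so the caller still sees the failure.
            INSERT INTO dbo.ReportExecutionLog (ProcName, Parameters, ErrorMsg)
            VALUES (OBJECT_NAME(@@PROCID), @params, ERROR_MESSAGE());
            THROW;
        END CATCH;
    END;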
Put each column on a separate line and do the same for where clauses. At times you may have to troubleshoot by commenting out joins until you find the one that is causing the problem. It is easier if you can easily comment out the associated fields as well.
If you don't have a technical design document, then at least use comments to explain your thought process. You want to understand the whys not the hows in any comments. This stuff is hard to come back to later and understand even when you wrote it. Give your future self some help.
In developing from scratch, I put the select list in and then comment all but the first item. Then I build the query only until I get that value - testing until I am sure what I got was correct. Then I add the next one and whatever joins or where conditions I might need to get it. Test again making sure it is right. (Oops why did that go from 1000 records to 20000 when I added that? Hmm maybe there is something I need to handle there or is that right?) By adding only one thing at a time, you will find an error in the logic much faster and be much more confident of your results. It will also take you less time than trying to build a massive query in one go.
Finally, there is no substitute for understanding your data. There are plenty of complex queries that work but do not give the correct answer. Know if you need an inner join or a left join. Know what where conditions you need to get the records you want. Know how to handle the records when you have a one-to-many relationship (this may require push back on the requirements): should you have 3 lines (one for each child record), should you put that data in a comma-delimited list, or should you pick only one of the many records and have one line using aggregation? If the latter, what is the criteria for choosing the record you want to keep?
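To make the one-to-many choices concrete, here is a hedged sketch in Oracle syntax with invented tables; one version collapses the children into a delimited list, the other keeps exactly one child per parent under an explicit rule.

    -- Option A: collapse the many side into a comma-delimited list.
    SELECT e.employee_id,
           LISTAGG(p.phone_number, ', ') WITHIN GROUP (ORDER BY p.phone_type) AS phone_list
      FROM employees e
      LEFT JOIN phones p ON p.employee_id = e.employee_id
     GROUP BY e.employee_id;

    -- Option B: keep exactly one child row per parent, with an explicit rule
    -- (here: the most recently updated phone) for which one survives.
    SELECT employee_id, phone_number
      FROM (SELECT p.employee_id,
                   p.phone_number,
                   ROW_NUMBER() OVER (PARTITION BY p.employee_id
                                      ORDER BY p.updated_date DESC) AS rn
              FROM phones p)
     WHERE rn = 1;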
Without seeing the specifics of your problem, here are a couple of ideas that immediately come to mind:
If you are looking purely for management, I might suggest organizing your subqueries as a number of views and then referencing those views in your final query.
For performance, on the other hand, you may want to consider creating temp tables or even materialized views (views whose results are physically stored) to break up the heavier parts of your process.
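A sketch of both ideas in Oracle syntax (names invented): a plain view per logical chunk for manageability, and a materialized view where a chunk is heavy enough to be worth storing. Note that keeping a materialized view truly real-time (REFRESH FAST ON COMMIT) requires materialized view logs and a fast-refresh-compatible query, so check that against your "no batch" requirement.

    -- A plain view per logical chunk keeps the final report query short and
    -- lets each piece be run and tested on its own.
    CREATE OR REPLACE VIEW v_customer_balances AS
    SELECT c.customer_id,
           SUM(i.amount_due) AS total_due
      FROM customers c
      JOIN invoices  i ON i.customer_id = c.customer_id
     GROUP BY c.customer_id;

    -- For a genuinely heavy chunk, a materialized view physically stores the
    -- result.  COMPLETE/ON DEMAND is shown because it always works; for
    -- near-real-time data you would need REFRESH FAST ON COMMIT plus
    -- materialized view logs on the base tables.
    CREATE MATERIALIZED VIEW mv_customer_balances
      BUILD IMMEDIATE
      REFRESH COMPLETE ON DEMAND
    AS
    SELECT c.customer_id,
           SUM(i.amount_due) AS total_due
      FROM customers c
      JOIN invoices  i ON i.customer_id = c.customer_id
     GROUP BY c.customer_id;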
If your queries require an enormous amount of subquerying in order to gain usable data, you might need to rethink your database design and possibly create a number of datamarts to easily access reporting data. Think of these as mini-warehouses sans the multi-year trended data.
Finally, I know you said you don't use any BI tools, but this problem certainly seems like one that could be addressed by organizing your data into "cubes" or Business Objects "universes". It might be worthwhile to at least weigh the cost of bringing on a BI tool against the programming hours to support the current setup.
When I'm creating an SSRS report, I always have a dilemma about "how to create the report with the least generation time possible".
In general the generation time (or performance time) is divided into two main parts:
The SQL Query.
The Report components (expressions, groups etc.).
As you know some of the things that are being performed in SSRS can be done in the SQL query and vice versa.
For example:
I can use a GROUP BY clause in SQL, but I can do the same using a table with group definitions.
I can use Casting in order to compare two values in SQL and also directly inside an expression.
and many more...
My questions are:
A. Which part (SQL query or SSRS) costs more time (assuming that the task can be done in both SSRS and SQL)?
B. What are the guidelines, if any, on which I should base a decision when facing a dilemma about where to perform a given operation?
As always with performance issues:
Don't prematurely optimize. If something's simpler in SSRS then do it there. Only when a problem arises consider trading clarity for performance (possibly by moving code to the SQL side).
Measure. Use the ExecutionLog2 view to get a general idea about where your bottlenecks are. Do more measuring and testing so you're sure you're investing time in improving the performance of the bits that matter.
Bottom line: let clarity of code guide where you solve a particular problem, and optimize selectively when performance becomes an issue.
Eric Lippert wrote a nice blog post about when and how to worry about performance. The context is C#, but the basic idea holds for other situations such as SSRS/SQL as well.
By the way, if you have a look at the mentioned ExecutionLog2 view, you'll notice that there are in fact three components of performance you should know about:
Data retrieval (SQL)
Report model (transforming the dataset to an internal model)
Rendering (transforming the model into an XLS, PDF, etc)
Knowing in which part a bottleneck lies is key to knowing how to solve a performance problem.
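For example, a query along these lines against the report server catalog (assuming the default ReportServer database name and the standard ExecutionLog2 columns) shows which of the three stages dominates for each report:

    SELECT TOP (20)
           ReportPath,
           TimeStart,
           TimeDataRetrieval,   -- the SQL / dataset part
           TimeProcessing,      -- building the internal report model
           TimeRendering        -- producing HTML, Excel, PDF, ...
    FROM ReportServer.dbo.ExecutionLog2
    ORDER BY TimeDataRetrieval + TimeProcessing + TimeRendering DESC;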
To end with a suggestion based on my experience:
As a rule of thumb, prefer SQL over SSRS if you're worried about performance, especially for aggregation. Also consider tuning your database (indexes and such) if needed.
This rule of thumb would be better if I could back it up with facts and research. Alas, I don't have any. I can say that in my own experience, when I had performance problems with reports, moving aggregation and calculation from SSRS to SQL would most often help solve the issue.
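As a made-up example of what I mean by moving aggregation to SQL: the dataset query below returns one row per group, so the tablix only has to display values it already has rather than summing thousands of detail rows.

    -- Dataset query returns one row per region/year instead of raw detail rows;
    -- the tablix then just displays totals it already has.
    SELECT SalesRegion,
           SalesYear,
           SUM(SaleAmount) AS TotalSales
    FROM dbo.Sales
    GROUP BY SalesRegion, SalesYear;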
It is important to remember that SSRS is smart and defers work until you need it. If you are exporting, it will extract all the data; but if you are viewing online and have expand and collapse rows, those sections will not be processed until you expand them. On that theory, a basic SQL query is preferred.
It would be best to leave the aggregation to SSRS, as the tablix will attempt to aggregate in any case. For charts it's best to aggregate in SQL, unless you have a tablix as well.
Simple calculations, such as comparisons, should be done in the SQL.
Remember that SSRS is smarter than you ;) and the simplest SQL lets this service work best, since the service is primarily for display.
If you use a Microsoft SQL Server source for the dataset, the service will work at its best.
Part of my work involves creating reports and data from SQL Server to be used as information for decisions. The majority of the data is aggregated, like inventory, sales, and cost totals by department and other dimensions.
When I am creating the reports, and more specifically when I am developing the SELECTs to extract the aggregated data from the OLTP database, I worry about getting a JOIN or a GROUP BY wrong, for example, and returning incorrect results.
I try to use some "best practices" to prevent me from "generating" wrong numbers:
When creating an aggregated data set, always explode this data set without the aggregation and look for any obvious error.
Export the exploded data set to Excel and compare the SUM(), AVG(), etc., from SQL Server and Excel (a sketch of this kind of check follows this list).
Involve the people who would use the information and ask for some validation (ask people to help to identify mistakes on the numbers).
Never deploy those things in the afternoon - when possible, try to take a look at the T-SQL on the next morning with a refreshed mind. I had many bugs corrected using this simple procedure.
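The kind of check I mean for the first two points looks roughly like this (invented table, SQL Server syntax); the two totals should match.

    -- The grand total of the aggregated result set should equal the grand
    -- total of the underlying detail rows; if they differ, a JOIN or GROUP BY
    -- is probably dropping or duplicating rows.
    SELECT (SELECT SUM(dept_total)
              FROM (SELECT department_id, SUM(sale_amount) AS dept_total
                      FROM dbo.Sales
                     GROUP BY department_id) AS agg) AS total_from_aggregate,
           (SELECT SUM(sale_amount) FROM dbo.Sales)  AS total_from_detail;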
Even with those procedures, I always worry about the numbers.
What are your best practices for ensuring the correctness of the reports?
Have you considered filling your tables with test data that produces known results, and comparing your query results with your expected results?
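A minimal sketch of that suggestion (invented table, SQL Server syntax): load a handful of rows whose totals you can compute by hand, then check that the report query returns exactly those totals.

    -- Known inputs: three rows whose totals can be computed by hand.
    INSERT INTO dbo.Sales (department_id, sale_amount) VALUES
        (1, 100.00),
        (1,  50.00),
        (2,  25.00);

    -- The report's aggregation query should now return exactly:
    --   department_id 1 -> 150.00
    --   department_id 2 ->  25.00
    SELECT department_id, SUM(sale_amount) AS dept_total
    FROM dbo.Sales
    GROUP BY department_id;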
Signed, in writing
I've found that one of the best practices is that both the reader/client and the developers are on the same (documented) page. That way, when mysterious numbers appear (and they do), I can point to the specification in writing and say, "This is why you see this number. Would you like it to be different?".
Test, test, test
For seriously complicated reports, we went through test data up and down with the client, until all the numbers were correct and the client was satisfied.
Edge Cases
We discovered a seriously complicated case in our reporting system that turned everything upside down (on our end). What if the user generates a report (say Year-End 2009), enters data for the new year, and then comes back to generate the same report? The data has changed, but that report should not. Thinking and working these cases out can save a lot of heartache.
Write some automated tests.
We have quite a lot of reporting services reports - we test them using Selenium. We use a test data page to squirt some known data into an empty database, then run the report and assert that the numbers are as expected.
The builds run every time we check in, so we know we haven't done anything too stupid.
I understand that, in the interest of efficiency, when you query the database you should only return the columns that are needed and no more.
But given that I like to use objects to store the query result, this leaves me in a dilemma:
If I only retrieve the column values that I need in a particular situation, I can only partially populate the object. This, I feel, leaves my object in a non-ideal state where only some of the properties and methods are available. Later, if a situation arises where I would like to reuse the object but find that the new situation requires a different but overlapping set of columns to be returned, I am faced with a choice.
Should I reuse the existing SQL and add to the list of selected columns the additional fields that are required by the new situation so that the same query and object mapping logic can be reused for both? Or should I create another method that results in the execution of a slightly different SQL which results in the populating of only those object properties that were returned by the 2nd query?
I strongly suspect that there is no magic answer and that the answer really "depends" on the situation, but I guess I am looking for general advice. In general, my approach has been to either return all columns from the queried table or to add to the query the additional columns as they are needed, but to reuse the same SQL (and mapping code), that is, until performance becomes a concern. In general, I find that unless you are retrieving a large number of rows - and I usually am not - the cost of adding additional columns to the output does not have a noticeable effect on performance, and that the savings in development time and the simplified API that result are a good trade-off.
But how do you deal with this situation when performance does become a factor? Do you create methods like
Employees.GetPersonalInfo
Employees.GetLittleMorePersonlInfoButMinusSalary
etc, etc etc
Or do you somehow end up creating an API where the user of your API has to specify which columns/properties he wants populated/returned, thereby adding complexity and making your API less friendly/easy to use?
Let's say you want to get Employee info. How many objects would typically be involved?
1) an Employee object
2) An Employees collection object containing one Employee object for each Employee row returned
3) An object, such as EmployeeQueries, that contains methods such as "GetHiredThisWeek", which returns an Employees collection of 0 or more records.
I realize all of this is very subjective, but I am looking for suggestions on what you have found works best for you.
I would say make your application correct first, then worry about performance in this case.
You could be optimizing away your queries only to realize you won't use that query anyway. Create the most generalized queries that your entire app can use, and then as you are confident things are working properly, look for problem areas if needed.
It is likely that you won't have a great need for huge performance up front. Some people say the lazy programmers are the best programmers. Don't over-complicate things up front, make a single Employee object.
If you find a need to optimize, you'll create a method/class, or however your ORM library does it. This should be an exception to the rule; only do it if you have reason to do so.
...the cost of adding additional columns to the output does not have a noticeable effect on performance...
Right. I don't quite understand what "new situation" could arise, but either way, it would be a much better idea (IMO) to get all the columns rather than run multiple queries. There isn't much of a performance penalty at all for getting more columns than you need (the queries will take more RAM, but that shouldn't be a big issue; besides, hardware is cheap). Also, you'd save yourself quite a bit of development time.
As for the second part of your question, it's really up to you. As an example, Rails takes more of a "usability first, performance last" approach, but that may not be what you want. It just depends on your needs. If you're willing to sacrifice a little usability for performance, by all means, go for it. I would.
If you are using your objects in a "row at a time" CRUD-type application, then by all means copy all the columns into your object; the extra overhead is minimal, and your object becomes truly reusable for any program wanting row access to the table.
However, if your SQL is doing a complex join or returning a large set of rows, then request precisely and only what you want. You get two performance penalties otherwise: one, handling each column every time eats up CPU for no benefit; and two, most DBMS systems have a bag of tricks for optimising queries (such as index-only access) which can only be used if you specify precisely which columns you want.
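To illustrate the second point, here is a hedged SQL Server example with invented names; the first query can be answered from the index alone, while SELECT * forces a lookup into the base table for every matching row.

    -- Covering index on just the columns the search needs.
    CREATE INDEX IX_Employees_LastName
        ON dbo.Employees (LastName)
        INCLUDE (FirstName);

    -- Can be answered entirely from the index (index-only access),
    -- because it asks only for columns the index covers.
    SELECT LastName, FirstName
    FROM dbo.Employees
    WHERE LastName = 'Smith';

    -- Forces an extra lookup into the base table for every matching row,
    -- because it asks for columns the index does not cover.
    SELECT *
    FROM dbo.Employees
    WHERE LastName = 'Smith';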
There is no reuse issue in most of these cases, as scan/search processes tend to be very specific to a particular use case.