Tuning OBIEE-generated SELECT queries

We have our data marts/warehouse on Oracle 11g, implemented as a star schema. Business reports are designed using OBIEE. I come from an ETL background and have very little knowledge of OBIEE.
Once the OBIEE RPD is designed, OBIEE starts generating SELECT queries in the background to feed data into the reports. On many occasions, I have noticed that these SELECT queries are not optimized (the big fact table is fully scanned more than once in separate WITH clauses).
When report performance is bad, the OBIEE queries are sent to the ETL team for performance tuning. I'm confused about how I can tune them, because they are auto-generated. I know there is an option to write custom SQL in OBIEE (without going via the RPD) for each report, but our standards do not allow that, and I also think it forfeits the benefits of OBIEE.
Has anyone faced a problem like this? How do you tune such queries?

Firstly, you're right that custom SQL (known as a direct database query) is not a good idea in principle, though it is useful on occasion. But it's not the solution to your problem.
Tuning the generated OBI queries is an OBI RPD task, for the OBI developer; tuning the database for the OBI queries generated is a database/ETL task. But you can't really do one without the other: OBI needs to be designed so as to generate suitable queries, and the database needs to be designed in such a way that good queries can be generated to answer the question being asked.
OBI is basically a SQL generator, and if the RPD model is suboptimal, then the resulting query will be suboptimal. OBI generates SQL based on the information it has in the RPD about the layout and structure of the data and the database.
You're obviously coming at it from the database side, and so to you the SQL is bad because it isn't what you'd write. It's also possible that the database design is a poor fit for the question that OBI is being asked.

As jackohug says, OBIEE is a SQL generator, and the general approach is to optimize the database for the queries OBIEE generates rather than trying to change those queries. Still, depending on the performance problem, there are a few tricks you can try.
First of all, is your fact table partitioned, and can your reports benefit from the partitioning?
Second, add indexes on the fact table so that filters on the dimensions can speed up access to the fact table.
Third, build aggregate tables that summarize the fact table, so that when reports don't show much detail they hit the aggregate table first, which contains far less data. Only as users drill down through the structure (applying filters to the data they are interested in as they go) do they reach the detailed fact table, and by then those filters help avoid full scans.
You could also tell OBIEE to use hints when accessing the tables, although, as with Direct Database Query, I wouldn't recommend it; I would first try the three approaches above, sketched below.
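To make those three suggestions concrete, here is a minimal Oracle sketch. All object names (sales_fact, date_key, store_key, sales_fact_month_agg) are hypothetical and not from the original question; treat it as an illustration of the technique rather than a drop-in script.

    -- 1. Range-partition the fact table by date so that date filters prune partitions.
    CREATE TABLE sales_fact (
      date_key   NUMBER NOT NULL,   -- e.g. 20240115
      store_key  NUMBER NOT NULL,
      amount     NUMBER
    )
    PARTITION BY RANGE (date_key) (
      PARTITION p2023 VALUES LESS THAN (20240101),
      PARTITION p2024 VALUES LESS THAN (20250101)
    );

    -- 2. Bitmap indexes on the fact table's dimension keys help Oracle's star transformation.
    CREATE BITMAP INDEX sales_fact_date_bix  ON sales_fact (date_key)  LOCAL;
    CREATE BITMAP INDEX sales_fact_store_bix ON sales_fact (store_key) LOCAL;

    -- 3. An aggregate table summarising the fact at month level for less detailed reports.
    CREATE TABLE sales_fact_month_agg AS
    SELECT TRUNC(date_key / 100) AS month_key,   -- YYYYMM
           store_key,
           SUM(amount)           AS total_amount
    FROM   sales_fact
    GROUP  BY TRUNC(date_key / 100), store_key;

In OBIEE, the aggregate table would then be mapped in the RPD as an additional logical table source, so the BI Server can pick it automatically when a report only asks for monthly detail.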
Regards

If you have Diagnostics and Tuning Pack licenses, you can run the SQL Tuning Advisor. The SQL Tuning Advisor runs the optimizer in tuning mode and may be able to generate a SQL Profile with a better execution plan. Sometimes the advisor recommends indexes as well. Neither SQL Profiles nor indexes require a change to the application.
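For reference, a tuning task can be created and run with the DBMS_SQLTUNE package. This is only a sketch; the SQL_ID and task name below are placeholders.

    -- Create and run a tuning task for a statement already in the cursor cache.
    -- 'abcd1234efgh5' is a placeholder SQL_ID; substitute the real one.
    DECLARE
      l_task VARCHAR2(64);
    BEGIN
      l_task := DBMS_SQLTUNE.CREATE_TUNING_TASK(
                  sql_id     => 'abcd1234efgh5',
                  time_limit => 600,
                  task_name  => 'obiee_report_tuning');
      DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => 'obiee_report_tuning');
    END;
    /

    -- Review the findings (possible SQL Profile, index recommendations, and so on).
    SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('obiee_report_tuning') FROM dual;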

I've yet to have much success with the SQL Tuning Advisor. Some experience in SQL tuning and a bit of research can typically produce a far better plan.
If all the layers are built well and all you need is a final tweak, then add a hidden column at the start of the report (Answers/Analysis) with a SQL hint.
I'd be very careful about adding hints through the RPD layers because of the many different and unexpected ways that others will join and use the tables.

Related

Tableau takes forever to use a PostgreSQL view

I am trying to connect Tableau to a SQL view I made in PostgreSQL.
This view returns ~80k rows with 12 fields. On my local PostgreSQL database, it takes 7 seconds to execute. But when I try to create a chart in a worksheet using this view, it takes forever to display anything (more than 2 minutes just to add a field).
This view is complex and involves many joins, COALESCEs, and CASE expressions due to business specifics.
Do you guys have any ideas for improving this?
Thank you very much for your help ! :-)
Best,
Max
Tableau documentation has helpful info for performance optimization
https://help.tableau.com/current/pro/desktop/en-us/performance_tips.htm
I highly recommend the whitepaper on designing efficient dashboards mentioned on that site - a bit dated, but timeless advice
For starters, learn to use the Performance Recorder in Tableau to find out which tasks are causing delays and, if they involve queries, to capture the SQL that Tableau emits.
With Tableau, and many other client tools, the standard first approach is to see what SQL the client tool generates, then execute that SQL without the client tool, say just in psql in your case. If you can reproduce the slow query in plain SQL (see the EXPLAIN sketch below), then you are better positioned to either
Optimize your database, say with indices or by restructuring your schema, OR
Understand why your client tool, Tableau in this case, generated that inefficient query, and reason about what you could do differently in Tableau that would cause it to generate different SQL.
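For example, once you've captured the query from the Performance Recorder, you can paste it into psql wrapped in EXPLAIN ANALYZE to see where the time goes. The view name and filter below are placeholders for whatever Tableau actually emitted.

    -- Placeholder: substitute the SQL captured from Tableau's Performance Recorder.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM   my_view
    WHERE  some_field = 'some value';

Sequential scans of large underlying tables or expensive joins inside the view are the usual suspects in the resulting plan.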
The whitepaper I mentioned should be helpful

Custom SQL Query in Tableau

Does using 'custom SQL' instead of joins in Tableau increase the performance of extract refreshes on the server? Can someone explain briefly?
The answer to almost every performance question is first, "it depends" and second, test and understand the measurement results. Real results carry more weight than advice from anyone on the Internet (from me or anyone else)
Still, custom SQL is usually not helpful for increasing performance in Tableau, and often hurts. It is usually much better to define your relationships in Tableau and let Tableau then generate optimized SQL for each view -- just as you let a compiler generate optimized machine code.
When you use custom SQL, you prevent Tableau from optimizing the SQL it generates. It has to run the SQL you provide in a subquery.
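Roughly speaking (this is an illustration, not the exact text Tableau emits), a view built on custom SQL turns into something like the following, with your query pasted verbatim as a derived table. The orders/regions tables are made-up names.

    -- Simplified illustration of the query the database ends up running.
    SELECT "Custom SQL Query"."region"      AS "region",
           SUM("Custom SQL Query"."amount") AS "sum_amount"
    FROM (
        -- your custom SQL, inlined as a subquery
        SELECT r.region, o.amount
        FROM   orders o
        JOIN   regions r ON r.region_id = o.region_id
    ) "Custom SQL Query"
    GROUP BY "Custom SQL Query"."region";

Because the whole custom query sits inside the derived table, the database often cannot push filters down into it or skip joins that a particular view doesn't actually need.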
The best use case for custom SQL in Tableau is to access database specific features, or possibly windowing queries. Most other SQL functionality is available by using the corresponding Tableau features.
If you do have a complex slow custom SQL query that you must use, it is usually a good idea to make an extract so you only pay the performance cost during extract refresh.
So in your case, I'd focus effort on streamlining or eliminating the custom SQL, monitoring the query plan for the generated SQL, and indexing your database to best support that query.

When a query is executed, what happens first in the back-end?

I have a query in Cognos that fetches a huge volume of data. Since the execution time is high, I'd like to fine-tune my query. Everyone says that the WHERE clause in the query gets executed first.
My doubt is: what happens first when a query is executed?
Is the JOIN in the query established first, or is the WHERE clause applied first?
If the JOIN is established first, I should specify the filters on the DIMENSION first; otherwise I should specify the filters on the FACT first.
Please explain.
Thanks in advance.
The idea of SQL is that it is a high-level declarative language, meaning you tell it what results you want rather than how to get them. There are exceptions to this in various SQL implementations, such as hints in Oracle to use a specific index, but as a general rule this holds true.
Behind the scenes, the optimiser for your RDBMS uses relational algebra to make a cost-based estimate of the different potential execution plans and selects the one it predicts will be the most efficient. The great thing about this is that you do not need to worry about the order you write your WHERE clauses in; as long as all of the information is there, the optimiser should pick the most efficient plan.
That being said, there are often things you can do on the database to improve query performance, such as building indexes on columns in large tables that are often used in filtering criteria or joins.
Another consideration is whether you can use parallel hints to speed up your run time, but this will depend on your query, the execution plan being used, the RDBMS you are using, and the hardware it is running on.
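As an illustration of those two levers (all names made up), an index and an Oracle-style parallel hint look like this:

    -- Index a column that is frequently used for filtering or joining.
    CREATE INDEX idx_order_fact_customer ON order_fact (customer_id);

    -- Parallel hint; whether it helps depends on the plan, the RDBMS, and the hardware.
    SELECT /*+ PARALLEL(f, 4) */
           d.customer_name,
           SUM(f.amount) AS total_amount
    FROM   order_fact   f
    JOIN   customer_dim d ON d.customer_id = f.customer_id
    WHERE  d.region = 'EMEA'
    GROUP  BY d.customer_name;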
If you post the query syntax and what RDBMS you are using we can check if there is anything obvious that could be amended in this case.
The order of filters definitely does not matter. The optimizer will take care of that.
As for filtering on the fact or the dimension table - do you mean you are exposing the same field in your Cognos model for each (e.g. ProductID from both the fact and the Product dimension)? If so, that is not recommended. Generally speaking, you should expose the dimension field only.
This is more of a question about your SQL environment, though. I would export the SQL generated by Cognos from within Report Studio (Tools -> Show Generated SQL). From there, hopefully you are able to work with a good DBA to see if there are any obvious missing indexes, etc in your tables.
There aren't a whole lot of options within Cognos to change the SQL generation. The prior poster mentions hints, which could work when writing native SQL, but that is a concept not known to Cognos. Really, all you can do is change the implicit/explicit join syntax, which just controls whether the join happens in an ON clause or in the WHERE clause. Although the WHERE style is pretty ugly, it generally compiles to the same plan as ON.
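For clarity, the two join styles referred to above look like this (tables are made-up examples); most optimizers produce the same plan for both:

    -- Explicit join: the join condition lives in the ON clause.
    SELECT f.sales_amount, p.product_name
    FROM   sales_fact  f
    JOIN   product_dim p ON p.product_id = f.product_id
    WHERE  p.product_line = 'Widgets';

    -- Implicit join: the join condition is mixed into the WHERE clause.
    SELECT f.sales_amount, p.product_name
    FROM   sales_fact f, product_dim p
    WHERE  p.product_id   = f.product_id
    AND    p.product_line = 'Widgets';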

Powerful tools for creating SQL queries

I'm looking for a tool that would help with creating complex SQL queries. Sometimes it's difficult even to verify whether the results of a query are correct. It's especially easy to get queries joining several tables to return too little or too much data.
The tool should enable at least the creation of test tables, some kind of visualization of how the queries gather their data, and hopefully give better parsing of error cases than, for example, Oracle does.
Are there tools like this, or do I have to stick with creating test tables manually, filling them with test data, and submitting all kinds of queries with SQuirreL SQL?
When you have a very complex query it is usually easiest to validate by breaking it up into multiple queries that populate temp tables. These intermediary results can be individually verified and then you bring them together to produce the final result set. Depending on performance needs you can stick with the temp table approach or you can then rewrite to a single statement. Typically when I have a huge query it is for background processing so I stick with the temp table approach.
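A minimal sketch of the temp-table approach, in SQL Server syntax with made-up table names:

    -- Step 1: materialise an intermediate result that can be checked on its own.
    SELECT o.customer_id, SUM(o.amount) AS total_amount
    INTO   #customer_totals
    FROM   orders o
    WHERE  o.order_date >= '2024-01-01'
    GROUP  BY o.customer_id;

    -- Sanity-check the intermediate result before going further.
    SELECT COUNT(*) AS row_count, SUM(total_amount) AS grand_total
    FROM   #customer_totals;

    -- Step 2: join the verified intermediate result to produce the final set.
    SELECT c.customer_name, t.total_amount
    FROM   #customer_totals t
    JOIN   customers c ON c.customer_id = t.customer_id;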
What RDBMS are you using? All of the major ones have some type of console available (e.g. SSMS in SQL Server, Toad in Oracle, MySQL Query Browser/Administrator for MySQL, etc.), and they all have Query Execution Plans where you can see how the query will actually run. So, the answer to your question is that it's entirely dependent on what RDBMS you're using, but the safe bet answer is: Yes.
I recommend trying SQL Server 2008 Management Studio Express (SSMSE) if you are working with SQL Server. I have used it at work and I believe it does everything you are looking for.
You can get it and SQL Server (express editions) here.
Certainly not a free, open-source solution, but I believe Quest Software's TOAD will fit your requirements. Quest seems to offer a lot of tools in that space... they have tools for modeling and analysis; however, I've never used the modeler or analyzer.
I personally have experience with the commercial version of TOAD for Oracle. Its GUI is overwhelming at first, but after you mentally filter out all of the extra buttons that you'll never use, it's manageable.

What tools are available to test SQL statement performance?

In the never-ending search for performance (and my own burgeoning experience), I've learnt a few things that can drag down the performance of a SQL statement.
Obsessive Compulsive Subqueries Disorder
Doing crazy type conversions (and nest those into oblivion)
Group By on aggregate functions of said crazy type conversions
Where fldID in (select EVERYTHING from my 5mil record table)
I typically work with MSSQL. What tools are available to test the performance of a SQL statement? Are these tools built in and specific to each type of DB server? Or are there general tools available?
SQL Profiler (built-in): Monitoring with SQL Profiler
SQL Benchmark Pro (Commercial)
SQL Server 2008 has the new Data Collector
SQL Server 2005 (onwards) has a missing indexes Dynamic Management View (DMV) which can be quite useful (but only for query plans currently in the plan cache): About the Missing Indexes Feature. A sample query against these DMVs is sketched below.
There is also the SQL Server Database Engine Tuning Advisor which does a reasonable job (just don't implement everything it suggests!)
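For what it's worth, a typical query against those missing-index DMVs looks roughly like this; treat the output as suggestions to evaluate, not commands to follow blindly.

    -- Missing-index suggestions for plans currently in the plan cache (SQL Server 2005+).
    SELECT d.statement AS table_name,
           d.equality_columns,
           d.inequality_columns,
           d.included_columns,
           s.user_seeks,
           s.avg_user_impact
    FROM   sys.dm_db_missing_index_details d
    JOIN   sys.dm_db_missing_index_groups g
           ON g.index_handle = d.index_handle
    JOIN   sys.dm_db_missing_index_group_stats s
           ON s.group_handle = g.index_group_handle
    ORDER  BY s.user_seeks * s.avg_user_impact DESC;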
I mostly just use Profiler and the execution plan viewer
Execution Plans are one of the first things to look at when debugging query performance problems. An execution plan will tell you how much time is roughly spent in each portion of your query, and can be used to quickly identify if you are missing indexes or have expensive joins or loops.
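In SQL Server, for instance, you can enable the actual execution plan in SSMS and pair it with SET STATISTICS to get per-statement time and I/O figures; the query below is just a placeholder.

    -- Show elapsed time and logical/physical reads for each statement that follows.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT c.customer_name, SUM(o.amount) AS total_amount
    FROM   orders o
    JOIN   customers c ON c.customer_id = o.customer_id
    GROUP  BY c.customer_name;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;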
MSSQL has a Database Engine Tuning Advisor that will often recommend indexes for tables based upon common queries run during the tuning period; however, it won't rewrite a query for you.
In my opinion, experience and experimentation are the best tools for writing good SQL queries.
In MySQL (and maybe in other databases too) you can EXPLAIN your query to see what the database server thinks about it. This is usually used to decide which indexes should be created. And it is built in, so you can use it without installing additional software.
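For example (the table and filter are placeholders):

    -- Ask MySQL how it plans to execute the query.
    EXPLAIN
    SELECT o.order_id, o.amount
    FROM   orders o
    WHERE  o.customer_id = 42;

The key and rows columns of the output show which index, if any, will be used and roughly how many rows will be examined.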
Adam Machanic has a simple tool called SqlQueryStress that might be of use. It is designed to be used to "run a quick performance test against a single query, in order to test ideas or validate changes".