Custom SQL Query in Tableau

Does using 'custom SQL' instead of joins in Tableau increase the performance of extract refresh on the server? Can someone explain it briefly?

The answer to almost every performance question is, first, "it depends" and, second, test and understand the measurement results. Real results carry more weight than advice from anyone on the Internet (from me or anyone else).
Still, custom SQL is usually not helpful for increasing performance in Tableau, and often hurts. It is usually much better to define your relationships in Tableau and then let Tableau generate optimized SQL for each view, just as you let a compiler generate optimized machine code.
When you use custom SQL, you prevent Tableau from optimizing the SQL it generates. It has to run the SQL you provide in a subquery.
The best use cases for custom SQL in Tableau are accessing database-specific features or, possibly, windowing queries. Most other SQL functionality is available by using the corresponding Tableau features.
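To make that concrete, here is a rough sketch (the table and column names are invented) of the kind of windowing query that might justify custom SQL, and of how Tableau treats whatever you paste in as a derived table:

    -- Hypothetical custom SQL: a per-customer running total via a window function,
    -- which is awkward to express with Tableau's relationship/join model alone.
    SELECT
        customer_id,
        order_date,
        amount,
        SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total
    FROM orders

    -- Tableau then treats that text as a derived table for every view it builds,
    -- roughly: SELECT ... FROM ( <your custom SQL> ) "Custom SQL Query" GROUP BY ...
    -- so the database can no longer optimize across the subquery boundary.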
If you do have a complex slow custom SQL query that you must use, it is usually a good idea to make an extract so you only pay the performance cost during extract refresh.
So in your case, I'd focus effort on streamlining or eliminating the custom SQL, monitoring the query plan for the generated SQL, and indexing your database to best support that query.

Related

Tableau takes forever to use a PostgreSQL view

I am trying to connect Tableau to a SQL view I made in PostgreSQL.
This view returns ~80k rows with 12 fields. On my local PostgreSQL database, it takes 7 seconds to execute. But when I try to create a chart in a worksheet using this view, it takes forever to display something (more than 2 minutes just to add a field).
The view is complex and involves many joins, coalesces, and case expressions due to business specificities.
Do you have any ideas for improving this?
Thank you very much for your help ! :-)
Best,
Max
The Tableau documentation has helpful info on performance optimization:
https://help.tableau.com/current/pro/desktop/en-us/performance_tips.htm
I highly recommend the whitepaper on designing efficient dashboards mentioned on that site; it's a bit dated, but the advice is timeless.
For starters, learn to use the Performance Recorder in Tableau to find out what tasks are causing delays, and if they involve queries, to capture the SQL that Tableau emits.
With Tableau, and many other client tools, the standard first approach is to see what SQL the client tool generates, then execute that SQL without using the client tool, say just in psql in your case. If you can reproduce the slow query just in SQL, then you are better positioned to either
Optimize your database, say with indices or by restructuring your schema, OR
Understand why your client tool, Tableau in this case, generated that inefficient query, and reason about what you could do differently in Tableau to cause it to generate different SQL.
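For example, a minimal sketch of that first step in psql, assuming you paste in the SQL captured by the Performance Recorder (the view and column names below are just placeholders):

    -- psql: time the statement and inspect the plan PostgreSQL actually chooses.
    \timing on

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT region, SUM(order_amount)
    FROM my_business_view            -- placeholder for your slow view
    WHERE order_date >= DATE '2020-01-01'
    GROUP BY region;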
The whitepaper I mentioned should be helpful

Oracle vs SQL Server query optimization main difference

I am doing a paper about query optimization in different DBMSs, and I am trying to find the core differences between them.
Both use CBO (cost-based optimization) in the same way: parse the query -> generate plans -> pick the best one given statistics about the database.
I'm still researching information on those two engines, but if someone knows how they differ (or not), it will be appreciated.
Not a comprehensive answer at all, but I wanted to give you my insight. In short, Oracle has a much more developed SQL optimizer.
For starters, Oracle has many more algorithms to choose from. This means that sometimes Oracle distinguishes between subtle differences and offers, say, three algorithms, while MySQL (under the same circumstances) only has one to choose from. Therefore, Oracle has better options for particular cases.
Another difference is that MySQL's execution plans are not very readable. I'm not saying they are bad internally, just that the EXPLAIN EXTENDED output doesn't tell you many specifics. Oracle makes a very clear distinction between access and filter predicates, while in MySQL you don't really know what's going on.
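To illustrate the comparison (the table names here are made up), the Oracle plan output lists access and filter predicates per step, while the MySQL plan is a flat row per table:

    -- Oracle: generate and display a plan with access/filter predicate sections.
    EXPLAIN PLAN FOR
    SELECT o.order_id, c.customer_name
    FROM   orders o
    JOIN   customers c ON c.customer_id = o.customer_id
    WHERE  o.order_date >= DATE '2020-01-01';

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

    -- MySQL: the closest equivalent, with far less detail per step.
    EXPLAIN EXTENDED
    SELECT o.order_id, c.customer_name
    FROM   orders o
    JOIN   customers c ON c.customer_id = o.customer_id
    WHERE  o.order_date >= '2020-01-01';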
Oracle has many algorithms suitable for parallel processing across multiple servers, while MySQL is limited to multiple threads on the same machine. This can make a difference for highly parallelisable queries that benefit from multiple servers.
Oracle still has an RBO (rule-based optimizer) that can be useful on some occasions; MySQL doesn't. Anyway, Oracle recommends not using it, but it's still there if you need it.
Oracle offers a myriad of "hints" to the optimizer in the form of comments (/*+ ... */) where you can tweak the execution plan to suit your needs. MySQL has far fewer clauses for this.
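A small sketch of what such a hint looks like (the table and index names are invented):

    -- Oracle: hints are embedded in a /*+ ... */ comment right after the keyword.
    SELECT /*+ INDEX(o orders_date_idx) LEADING(o c) USE_NL(c) */
           o.order_id, c.customer_name
    FROM   orders o
    JOIN   customers c ON c.customer_id = o.customer_id
    WHERE  o.order_date >= DATE '2020-01-01';

    -- MySQL has fewer levers, e.g. an index hint on a single table:
    -- SELECT ... FROM orders FORCE INDEX (orders_date_idx) WHERE ...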

Tuning OBIEE generated SELECT queries

We have our data marts/warehouse on Oracle 11g implemented as a star schema. Business reports are designed using OBIEE. I come from an ETL background and have very little knowledge of OBIEE.
Once the OBIEE RPD is designed, I see that OBIEE starts generating SELECT queries in the background to feed data into the reports. On many occasions, I have noticed that the SELECT queries are not optimized (the big fact table is fully scanned more than once in separate WITH clauses).
When report performance is bad, the OBIEE queries are sent to the ETL team for performance tuning. I'm confused about how I can tune them because they are auto-generated. I know there is an option to write custom SQL in OBIEE (without going via the RPD) for each report, but our standards do not allow that, and I also think it does not leverage the benefits of OBIEE.
Has anyone faced a problem like the above? How do you tune such queries?
Firstly, you're right that custom SQL (known as direct database query) is not a good idea in principle, though it is useful on occasion. But it's not the solution to your problem.
Tuning the OBI queries generated is an OBI RPD task, for the OBI developer; tuning the database for the OBI queries generated is a database/ETL task. But you can't really do one without the other: OBI needs to be designed so as to generate suitable queries, and the database needs to be designed in such a way that good queries can be generated to answer the question being asked.
OBI is basically a SQL generator, and if the RPD model is suboptimal, then the resulting query will be suboptimal. OBI will generate SQL based on the information it has in the RPD about the layout and structure of the data and database.
You're obviously coming at it from the database side, and so to you the SQL is bad because it isn't what you'd write. It's also possible that the database design is bad for getting an answer to the question that OBI is being asked.
As jackohug says, OBIEE is a SQL generator, and the general approach is to optimize the database for the query generated by OBIEE, not to try to change that query. Depending on the performance problem, you can try some tricks.
First of all, is your fact table partitioned, and can your reports benefit from the partitioning?
Second, add indexes on the fact table so that any filter on the dimensions can speed up access to the fact table.
Third, build aggregate tables that summarize the fact table, so that when reports don't show much detail they first hit the aggregate table, which has far less data; only as users drill down through the structure (and, while doing so, apply filters to the data they are interested in) do they access the detailed fact table, with filters that avoid full scans. A sketch of this follows below.
You could also tell OBIEE to use hints when accessing the tables, although, as with direct database queries, I wouldn't recommend it; I would first try optimizing using the three approaches above.
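As an illustration of the third approach on Oracle (all table, column, and view names below are hypothetical), an aggregate table could be built as a materialized view; whether OBIEE reaches it through its own aggregate mappings or through Oracle query rewrite is a separate design choice:

    -- Aggregate table summarizing the fact table at month/product grain.
    -- ENABLE QUERY REWRITE lets the optimizer transparently redirect matching
    -- queries from the detailed fact table to this much smaller summary.
    CREATE MATERIALIZED VIEW sales_month_mv
      BUILD IMMEDIATE
      REFRESH COMPLETE ON DEMAND
      ENABLE QUERY REWRITE
    AS
    SELECT d.month_key,
           p.product_key,
           SUM(f.sales_amount) AS sales_amount,
           COUNT(*)            AS row_cnt
    FROM   sales_fact f
    JOIN   date_dim    d ON d.date_key    = f.date_key
    JOIN   product_dim p ON p.product_key = f.product_key
    GROUP BY d.month_key, p.product_key;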
Regards
If you have Diagnostics and Tuning Pack licenses, you can run the SQL Tuning Advisor. The SQL Tuning Advisor runs the optimizer in tuning mode, and it may be able to generate a SQL Profile with a better execution plan. Sometimes the advisor recommends indexes as well. Neither SQL Profiles nor indexes require a change to the application.
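A minimal sketch of driving the advisor for one captured statement (the sql_id and task name are placeholders):

    -- Create, execute, and report a SQL Tuning Advisor task for a given sql_id.
    DECLARE
      l_task VARCHAR2(64);
    BEGIN
      l_task := DBMS_SQLTUNE.CREATE_TUNING_TASK(
                  sql_id      => '7abc123xyz456',          -- placeholder sql_id from V$SQL
                  time_limit  => 300,
                  task_name   => 'obiee_report_tuning',
                  description => 'Tune a slow OBIEE-generated SELECT');
      DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => l_task);
    END;
    /

    -- Review the recommendations (SQL Profile, indexes, statistics, restructuring).
    SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('obiee_report_tuning') FROM dual;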
I've yet to have much success with the SQL tuning advisor. Some experience in SQL tuning and a bit of research can typically produce a far better plan.
If all the layers are built well and all you need is a final tweak then add a hidden column to the start of the report (Answer/Analysis) with a SQL hint.
I'd be very careful about adding hints through the RPD layers because of the many different and unexpected ways that others will join and use the tables.

Where to Execute? SSRS OR SQL

When I'm creating an SSRS report, I always have a dilemma about "how to create the report with the least generation time possible".
In general the generation time (or performance time) is divided into two main parts:
The SQL Query.
The Report components (expressions, groups etc.).
As you know, some of the things that are performed in SSRS can also be done in the SQL query, and vice versa.
For example:
I can use a GROUP BY clause in SQL, but I can do the same using a table with group definitions.
I can use casting in order to compare two values in SQL, or do it directly inside an expression.
and many more...
My questions are:
A. Which part (SQL query or SSRS) costs more time (assuming that the task can be done in both SSRS and SQL)?
B. What are the guidelines, if any, on which I should base the decision of where to perform a given operation?
As always with performance issues:
Don't prematurely optimize. If something's simpler in SSRS then do it there. Only when a problem arises consider trading clarity for performance (possibly by moving code to the SQL side).
Measure. Use the ExecutionLog2 view to get a general idea about where your bottlenecks are. Do more measuring and testing so you're sure you're investing time in improving the performance of the bits that matter.
Bottom line: let clarity of code guide where you solve a particular problem, and optimize selectively when performance becomes an issue.
Eric Lippert wrote a nice blog post about when and how to worry about performance. The context is C#, but the basic idea holds for other situations such as SSRS/SQL as well.
By the way, if you have a look at the ExecutionLog2 view mentioned above, you'll notice that there are in fact three components of performance you should know about:
Data retrieval (SQL)
Report model (transforming the dataset to an internal model)
Rendering (transforming the model into XLS, PDF, etc.)
Knowing in which part a bottleneck lies is key to knowing how to solve a performance problem.
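A quick sketch of pulling those three timings out of the catalog (the ReportServer database name may differ on your instance):

    -- SSRS: recent executions with the three timing components, slowest first.
    SELECT TOP 20
           ItemPath,
           TimeStart,
           TimeDataRetrieval,   -- ms spent running the dataset queries (SQL)
           TimeProcessing,      -- ms spent building the internal report model
           TimeRendering,       -- ms spent rendering to HTML/XLS/PDF/etc.
           [Status]
    FROM   ReportServer.dbo.ExecutionLog2
    ORDER BY TimeDataRetrieval + TimeProcessing + TimeRendering DESC;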
To end with a suggestion based on my experience:
As a rule of thumb, prefer SQL over SSRS if you're worried about performance, especially for aggregation. Also consider tuning your database (indexes and such) if needed.
This rule of thumb would be best if I could back it up with facts and research. Alas, I don't have any. I can say that in my own experience, most often when I had performance problems with reports, moving aggregation and calculation from SSRS to SQL helped solve the issue.
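To make that move concrete, here is a rough sketch (the table and column names are invented) of pushing an aggregation out of the tablix and into the dataset query:

    -- Instead of returning detail rows and letting the tablix group and sum them,
    -- return pre-aggregated rows so SSRS only has to display the result.
    SELECT   region,
             YEAR(order_date) AS order_year,
             SUM(amount)      AS total_amount,
             COUNT(*)         AS order_count
    FROM     dbo.Orders
    GROUP BY region, YEAR(order_date);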
It is important to remember that SSRS is smart and defers execution until you need it. If you are exporting, it will extract all the data. Also, if you are viewing online and you have expand/collapse rows, they will not be evaluated until you want to view them. On that theory, basic SQL is preferred.
It would be best to leave the aggregation to SSRS, as the report will attempt to aggregate in the tablix in any case. As for charts, it's best to aggregate unless you have a tablix as well.
As for simple calculations, such as comparisons, these should be done in the SQL.
Remember that SSRS is smarter than you ;) and that the simplest SQL enables the service to work best; the service is primarily for display.
If you use an MS SQL Server source for the dataset, the service will work at its best.

Powerful tools for creating SQL queries

I'm looking for a tool that would help in creating complex SQL queries. Sometimes it's difficult even to verify whether the results of a query are correct. It's especially easy to get queries joining several tables to return too little or too much data.
The tool should enable at least the creation of test tables, some kind of visualization of how the queries gather their data, and hopefully better parsing of error cases than, for example, Oracle provides.
Are there tools like this, or do I have to stick with creating test tables manually, filling them with test data, and running all kinds of queries in SQuirrel SQL?
When you have a very complex query it is usually easiest to validate by breaking it up into multiple queries that populate temp tables. These intermediary results can be individually verified and then you bring them together to produce the final result set. Depending on performance needs you can stick with the temp table approach or you can then rewrite to a single statement. Typically when I have a huge query it is for background processing so I stick with the temp table approach.
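A rough sketch of that approach in Oracle-style SQL (the table and column names are made up; temp tables or CTEs work just as well in other engines):

    -- Step 1: materialize an intermediate result so it can be checked on its own.
    CREATE TABLE tmp_customer_totals AS
    SELECT customer_id, SUM(amount) AS total_amount
    FROM   orders
    GROUP BY customer_id;

    -- Sanity-check the intermediate result in isolation.
    SELECT COUNT(*), SUM(total_amount) FROM tmp_customer_totals;

    -- Step 2: join the verified intermediate result to produce the final set.
    SELECT c.customer_name, t.total_amount
    FROM   tmp_customer_totals t
    JOIN   customers c ON c.customer_id = t.customer_id;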
What RDBMS are you using? All of the major ones have some type of console available (e.g. SSMS for SQL Server, Toad for Oracle, MySQL Query Browser/Administrator for MySQL, etc.), and they all have query execution plans where you can see how the query will actually run. So the answer to your question is that it's entirely dependent on which RDBMS you're using, but the safe bet answer is: yes.
I recommend trying SQL Server 2008 Management Studio Express (SSMSE) if you are working with SQL Server. I have used it at work and I believe it does everything you are looking for.
You can get it and SQL Server (express editions) here.
Certainly not a free, open-source solution, but I believe Quest Software's TOAD will fit your requirements. Quest seems to offer a lot of tools in that space... they have tools for modeling and analysis; however, I've never used the modeler or analyzer.
I personally have experience with the commercial version of TOAD for Oracle. Its GUI is overwhelming at first, but after you mentally filter out all of the extra buttons that you'll never use, it's manageable.