Is there any performance gain using a CTE over a derived table?
I've used CTEs a lot, and they do actually appear to run faster in some scenarios. The server was fairly heavily loaded and the run times varied significantly, and I can't believe the execution plans were that different, but the queries with the CTE still seemed to perform better.
From what I have read and my limited use of them, no, they are just easier to read and can reference themselves.
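As a rough illustration of that point, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine (the `sales` table and its data are made up for the example). It shows a derived table and a CTE returning the same rows, plus the one thing only a CTE can do, referencing itself. It says nothing about timing on SQL Server or any other server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('north', 10), ('north', 30), ('south', 5), ('south', 50);
""")

# Derived table: the aggregate is written inline in the FROM clause.
derived_sql = """
    SELECT region, total
    FROM (SELECT region, SUM(amount) AS total FROM sales GROUP BY region) AS t
    WHERE total > 40
    ORDER BY region
"""

# CTE: the same aggregate, named up front with WITH.
cte_sql = """
    WITH t AS (SELECT region, SUM(amount) AS total FROM sales GROUP BY region)
    SELECT region, total FROM t WHERE total > 40 ORDER BY region
"""

derived_rows = conn.execute(derived_sql).fetchall()
cte_rows = conn.execute(cte_sql).fetchall()

# Self-reference: a recursive CTE that counts from 1 to 5.
recursive_sql = """
    WITH RECURSIVE counter(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM counter WHERE n < 5
    )
    SELECT n FROM counter
"""
counted = [row[0] for row in conn.execute(recursive_sql)]

print(derived_rows, cte_rows, counted)
```

Whether the two forms get the same execution plan depends on the engine, so comparing the actual plans on your own server is the only way to know.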
So my question is rather simple: what's better and faster, doing calculations in code (let's say Java) or doing complex database queries (assuming the same action can be done either way)? Which approach is better in general, and why?
I'd do it in the code.
Doing business calculations in database queries spreads the logic of the app around and makes it harder to understand. You also often get bound to a specific storage engine (e.g. SQL Server, Oracle, MySQL), which makes it harder to change the storage paradigm later (e.g. to a NoSQL DB).
In the code you can also apply dependency injection to change the behavior easily, which makes it more manageable.
I generally find it faster (in development time) to write a query to do what I need. The first round focuses on logical correctness, and I'll complete the rest of the functionality using that query. I try to avoid doing queries in loops in this phase, but otherwise I don't worry too much about performance.
Then, I'll look at the performance of the whole feature. If the query is too slow, I'll EXPLAIN it and look at the server's statistics. My first focus is on indexes; if that doesn't work, I'll try restructuring the query. Sometimes correlated subqueries are faster than joins, sometimes unions are faster than disjunctions in the WHERE clause, and sometimes it's the other way around.
If I can't get satisfactory performance using a single query, I may break it up into multiple queries and/or do some of the work in code. Such code tends to be more complicated and longer than an equivalent query, so I try to avoid it unless necessary.
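The "EXPLAIN it, then look at indexes" step can be sketched as follows. This is only an illustration on SQLite through Python's sqlite3 module, where the statement is `EXPLAIN QUERY PLAN`; the `orders` table, the index name, and the helper `plan` are all hypothetical, and other servers use their own EXPLAIN syntax and output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

QUERY = "SELECT id, total FROM orders WHERE customer_id = ?"


def plan(connection, sql, params):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column


# Without an index the filter needs a full table scan.
plan_before = plan(conn, QUERY, (42,))

# After adding an index on the filtered column, the plan should switch to a search.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(conn, QUERY, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan lines varies between SQLite versions, so the useful signal is the change from a scan to a search that names the index.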
I have a query in Cognos that fetches a huge volume of data. Since the execution time would be high, I'd like to fine-tune my query. Everyone knows that the WHERE clause in a query gets executed first.
My doubt is: which happens first when a query is executed?
Is the JOIN established first, or is the WHERE clause executed first?
If the JOIN is established first, I should specify the filters on the DIMENSION first; otherwise I should specify the filters on the FACT first.
Please explain.
Thanks in advance.
The idea of SQL is that it is a high level declarative language, meaning you tell it what results you want rather than how to get them. There are exceptions to this in various SQL implementations such as hints in Oracle to use a specific index etc, but as a general rule this holds true.
Behind the scenes, the optimiser for your RDBMS uses relational algebra to produce a cost-based estimate of the different potential execution plans and selects the one that it predicts will be the most efficient. The great thing about this is that you do not need to worry about what order you write your WHERE clauses in; so long as all of the information is there, the optimiser should pick the most efficient plan.
That being said, there are often things that you can do on the database to improve query performance, such as building indexes on columns in large tables that are often used in filtering criteria or joins.
Another consideration is whether you can use parallel hints to speed up your run time but this will depend on your query, the execution plan that is being used, the RDBMS you are using and the hardware it is running on.
If you post the query syntax and what RDBMS you are using we can check if there is anything obvious that could be amended in this case.
The order of filters definitely does not matter. The optimizer will take care of that.
As for filtering on the fact or dimension table: do you mean you are exposing the same field in your Cognos model for each (e.g. ProductID from both the fact and the Product dimension)? If so, that is not recommended. Generally speaking, you should expose the dimension field only.
This is more of a question about your SQL environment, though. I would export the SQL generated by Cognos from within Report Studio (Tools -> Show Generated SQL). From there, hopefully you are able to work with a good DBA to see if there are any obvious missing indexes, etc in your tables.
There aren't a whole lot of options within Cognos to change the SQL generation. The prior poster mentions hints, which could work if writing native SQL, but that is a concept not known to Cognos. Really, all you can do is change the implicit/explicit join syntax, which just controls whether the join condition goes in an ON clause or in the WHERE clause. Although the WHERE form is pretty ugly, it generally compiles the same as ON.
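To make the implicit versus explicit join point concrete, here is a small sketch on SQLite via Python's sqlite3 module. The `product` and `sales_fact` tables are invented for the example and are not the poster's Cognos model. The two queries differ in join syntax and in the order of the filters, and return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE sales_fact (product_id INTEGER, qty INTEGER);
    INSERT INTO product VALUES (1, 'toys'), (2, 'books'), (3, 'toys');
    INSERT INTO sales_fact VALUES (1, 5), (2, 7), (3, 2), (1, 20);
""")

# Explicit syntax: join condition in ON, dimension filter listed first.
explicit_join = """
    SELECT f.product_id, f.qty
    FROM sales_fact AS f
    JOIN product AS p ON p.product_id = f.product_id
    WHERE p.category = 'toys' AND f.qty > 3
    ORDER BY f.product_id, f.qty
"""

# Implicit syntax: join condition in WHERE, fact filter listed first.
implicit_join = """
    SELECT f.product_id, f.qty
    FROM sales_fact AS f, product AS p
    WHERE f.qty > 3 AND p.category = 'toys' AND p.product_id = f.product_id
    ORDER BY f.product_id, f.qty
"""

explicit_rows = conn.execute(explicit_join).fetchall()
implicit_rows = conn.execute(implicit_join).fetchall()
print(explicit_rows, implicit_rows)
```

Identical results do not prove identical plans on every engine, so it is still worth checking the generated SQL's plan with your DBA.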
I'm writing many reporting queries for my current employer utilizing Oracle's WITH clause to allow myself to create simple steps, each of which is a data-oriented transformation, that build upon each other to perform a complex task.
It was brought to my attention today that overuse of the WITH clause could have negative side effects on the Oracle server's resources.
Can anyone explain why over use of the Oracle WITH clause may cause a server to crash? Or point me to some articles where I can research appropriate use cases? I started using the WITH clause heavily to add structure to my code and make it easier to understand. I hope with some informative responses here I can continue to use it efficiently.
If an example query would be helpful I'll try to post one later today.
Thanks!
Based on this: http://www.dba-oracle.com/t_with_clause.htm it looks like this is a way to avoid using temporary tables. However, as others will note, this may actually mean heavier, more expensive queries that will put an additional drain on the database server.
It may not 'crash'. That's a bit dramatic. More likely it will just be slower, use more memory, etc. How that affects your company will depend on the amount of data, the number of processors, and the amount of processing involved (whether you use WITH or not).
I come across several instances where I can write a query using either joins or subqueries. I usually use joins but sometimes use subqueries (without any particular reason). I have read in several places (including Stack Overflow) that joins are faster than subqueries in many instances, but sometimes subqueries are faster. Right now the queries I am writing do not deal with large amounts of data, so I guess speed isn't much of a concern. But for the future, I'm curious about the following.
a.) Why are joins faster than subqueries (in general)?
b.) What are the instances when subqueries are faster? How will I know?
c.) If I'm writing a query, how should I judge whether I should use a subquery or a join? I would appreciate it if someone could explain this with an example.
Saying that joins are 'mostly faster' than sub-queries is not true. This depends entirely on the DBMS used.
For Microsoft SQL Server I know that this is not true. Usually, the performance is the same, not only in theory but also in practice.
For MySQL I have heard that sub-queries are problematic. I don't have personal evidence.
Oracle seems to be about the same as SQL Server.
The answers to your questions.
a) Joins aren't faster than subqueries (in general). But DBMSs often produce a much smarter execution plan if you use joins. This is related to the way queries are transformed into execution plans.
b) c) In general there are no rules for writing fast queries. There is only one way to choose the correct query for your task: you have to benchmark the different versions. So if you have to decide how to formulate a certain query, benchmark the first version, and if it performs well, stop. Otherwise change something and benchmark it again, and if it is fine, stop. Use an environment that is close to your production environment: use realistic datasets, because a query might perform well with thousands of records but not with millions, and use the same hardware as in production. Consider benchmarking the query in the context of your application, since the application's other queries may influence its performance.
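A minimal benchmarking harness along those lines might look like the sketch below. It uses SQLite through Python's sqlite3 and timeit modules purely for convenience; the `customer` and `orders` tables and their synthetic data are assumptions for the example, and the numbers it prints say nothing about how SQL Server, Oracle, or MySQL would behave. The point is the method: check that both variants return the same answer, then time each one repeatedly.

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    CREATE INDEX idx_orders_customer ON orders (customer_id);
""")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(i, "DE" if i % 10 == 0 else "US") for i in range(1, 1001)],
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000 + 1, i) for i in range(20000)],
)

# Two formulations of the same question.
variants = {
    "subquery": """
        SELECT COUNT(*), SUM(total) FROM orders
        WHERE customer_id IN (SELECT id FROM customer WHERE country = 'DE')
    """,
    "join": """
        SELECT COUNT(*), SUM(o.total) FROM orders AS o
        JOIN customer AS c ON c.id = o.customer_id
        WHERE c.country = 'DE'
    """,
}

# First make sure the variants agree, then time them.
results = {name: conn.execute(sql).fetchone() for name, sql in variants.items()}

timings = {
    name: min(
        timeit.repeat(lambda sql=sql: conn.execute(sql).fetchall(), number=20, repeat=3)
    )
    for name, sql in variants.items()
}

for name in variants:
    print(f"{name}: {timings[name]:.4f}s for 20 runs, result={results[name]}")
```

Taking the minimum of several repeats reduces noise from other activity on the machine, which matters on a loaded server.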
The main reason, from the research I've done, is that the compiler more directly utilizes the proper indexes when you explicitly state how to do the join (i.e. left join, inner join, etc.). If you use a sub-query, you are leaving it a bit more up to the optimizer, and it does not always choose the fastest way (which is ironic, as it's called an 'optimizer').
Anyway, it may be easier to write your sub-query, but if you are building a query for speed and long-term use, it's clear that you should write out the explicit joins.
Here are some links with some views and examples:
Join vs. Subquery
Another link: this one gives some details on why joins are faster (in most cases) than sub-queries.
more examples
How important is it to avoid nested queries?
I have always been taught to avoid them like the plague. But they are the most natural thing to me. When I am designing a query, the first thing I write is a nested query. Then I convert it to joins, which sometimes takes a lot of time to get right and rarely gives a big performance improvement (though sometimes it does).
So are they really so bad? Is there a way to use nested queries without temp tables and filesort?
It really depends, I had situations where I improved some queries by using subqueries.
The factors that I am aware of are:
if the subquery uses fields from outer query for comparison or not (correlated or not)
if the relation between the outer query and sub query is covered by indexes
if there are no usable indexes on the joins and the subquery is not correlated and returns a small result it might be faster to use it
I have also run into situations in MySQL where performance improved after removing the ORDER BY from a query, wrapping the result as a simple subquery, and then sorting that
Anyway, it is always good to test different variants (with SQL_NO_CACHE please), and turning correlated queries into joins is a good practice.
I would even go so far as to call it a very useful practice.
If correlated queries are the first thing that comes to your mind, it may be that you are thinking primarily in terms of procedural operations rather than set operations. When dealing with relational databases, it is very useful to fully adopt the set perspective on the data model and the transformations on it.
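Turning a correlated subquery into a join could look like the sketch below, run on SQLite through Python's sqlite3 module rather than MySQL. The `emp` table and its rows are invented for the example. The correlated form reads like a per-row procedure, while the join form computes the per-department averages once as a set and joins against it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO emp VALUES
        ('ann', 'eng', 100), ('bob', 'eng', 80),
        ('cy', 'ops', 50), ('di', 'ops', 70), ('ed', 'ops', 60);
""")

# Correlated: the inner query refers to e.dept from the outer row.
correlated_sql = """
    SELECT e.name
    FROM emp AS e
    WHERE e.salary > (SELECT AVG(e2.salary) FROM emp AS e2 WHERE e2.dept = e.dept)
    ORDER BY e.name
"""

# Join: averages are computed per department, then joined back to the rows.
join_sql = """
    SELECT e.name
    FROM emp AS e
    JOIN (SELECT dept, AVG(salary) AS avg_salary FROM emp GROUP BY dept) AS d
      ON d.dept = e.dept
    WHERE e.salary > d.avg_salary
    ORDER BY e.name
"""

correlated_names = [row[0] for row in conn.execute(correlated_sql)]
join_names = [row[0] for row in conn.execute(join_sql)]
print(correlated_names, join_names)
```

Both queries find the employees paid above their department's average; which one runs faster depends on the engine and the indexes, so test both.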
EDIT:
Procedural vs Relational
Thinking in terms of set operations vs procedural boils down to equivalence in some set algebra expressions, for example selection on a union is equivalent to union of selections. There is no difference between the two.
But when you compare the two procedures, "apply the selection criteria to every element of a union" versus "make a union and then apply the selection", they are distinctly different procedures, which might have very different properties (for example utilization of CPU, I/O, and memory).
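The equivalence itself is easy to check on a toy example. The sketch below uses SQLite through Python's sqlite3 module, with two made-up single-column tables `a` and `b`. It only demonstrates that both forms return the same set, not how any engine chooses to execute them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (5), (9);
    INSERT INTO b VALUES (5), (7), (12);
""")

# Selection applied to the union.
select_after_union = """
    SELECT x FROM (SELECT x FROM a UNION SELECT x FROM b)
    WHERE x > 5
    ORDER BY x
"""

# Union of the two selections.
union_of_selects = """
    SELECT x FROM a WHERE x > 5
    UNION
    SELECT x FROM b WHERE x > 5
    ORDER BY x
"""

rows_select_after_union = [r[0] for r in conn.execute(select_after_union)]
rows_union_of_selects = [r[0] for r in conn.execute(union_of_selects)]
print(rows_select_after_union, rows_union_of_selects)
```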
The idea behind relational databases is that you do not try to describe how to get the result (the procedure), but only what you want, and that the database management system will decide on the best path (procedure) to fulfil your request. This is why SQL is called a fourth-generation language (4GL).
One of the tricks that help you do that is to remind yourself that tuples have no inherent order (set elements are unordered).
Another is realizing that relational algebra is quite comprehensive and allows requests (requirements) to be translated directly into SQL, provided the semantics of your model represent the problem space well. In other words, the meaning attached to the names of your tables and relationships must be right; in other words, your database must be designed well.
Therefore, you do not have to think how, only what.
In your case, it was just preference over correlated queries, so it might be that I am not telling you anything new, but you emphasized that point, hence the comment.
I think that if you were completely comfortable with all the rules that transform queries from one form into another (rules such as distributivity), you would not prefer correlated subqueries; you would see all forms as equal.
(Note: the above discusses theoretical background, which is important for database design. In practice things deviate from these concepts: not all equivalent rewrites of a query necessarily execute equally fast, clustered primary keys do give tables an inherent order on disk, and so on. But these are only deviations; the fact that not all equivalent queries execute equally fast is an imperfection of the actual DBMS and not of the concepts behind it.)
Personally I prefer to avoid nested queries until they are necessary for the simple reason that nested queries can make the code less human readable and make debugging and collaboration more painful. I think nesting is acceptable if the nested query is something trivial or if temporary storage of large tables becomes an issue. But too many times I've seen complex nested queries within nested queries and it makes debugging painful.
I'm not sure what it looks like in MySQL 5.1 or 5.5, but in 5.0.x nested queries usually have horrible performance, because MySQL performs the subquery for each row fetched from the main query.
This probably isn't the case for more mature databases like MsSQL, which internally can rewrite nested queries to joins, but I've never used MsSQL so I don't know for sure.
http://dev.mysql.com/doc/refman/5.0/en/rewriting-subqueries.html
"It is also true that on some occasions, it is not only possible to rewrite a query without a subquery, but it can be more efficient to make use of some of these techniques rather than to use subqueries." This is a rather funny statement, given that for me so far all subqueries have made the database crawl.
Subqueries vs joins
I try to avoid nested queries because they're less readable.
But I do agree that they are easier to write. I mean, it's just easier to conceptualize when writing the code, IMO. But then, the deep nesting makes reading the code very difficult. Make sure you add a comment to tell the reader what the subquery is doing, so they don't have to read it unless they need to.
Also, as soon as it starts getting difficult to read, you might want to consider converting the subquery into a Common Table Expression. The conversion is easy to do, and also makes it much easier to read, since each CTE has a specific purpose.
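As a sketch of that conversion, the example below runs on SQLite through Python's sqlite3 module, with a hypothetical `payments` table. It writes the same logic twice: once as subqueries nested inside each other, and once as a chain of CTEs where each step has a name and a one-line comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (customer_id INTEGER, amount INTEGER, status TEXT);
    INSERT INTO payments VALUES
        (1, 60, 'paid'), (1, 50, 'paid'),
        (2, 200, 'failed'), (2, 30, 'paid'),
        (3, 100, 'paid');
""")

# Nested form: has to be read from the inside out.
nested_sql = """
    SELECT customer_id, total
    FROM (
        SELECT customer_id, SUM(amount) AS total
        FROM (SELECT customer_id, amount FROM payments WHERE status = 'paid') AS paid
        GROUP BY customer_id
    ) AS totals
    WHERE total >= 100
    ORDER BY customer_id
"""

# CTE form: same steps, read top to bottom.
cte_sql = """
    WITH paid AS (      -- step 1: keep only settled payments
        SELECT customer_id, amount FROM payments WHERE status = 'paid'
    ),
    totals AS (         -- step 2: total paid amount per customer
        SELECT customer_id, SUM(amount) AS total FROM paid GROUP BY customer_id
    )
    SELECT customer_id, total FROM totals WHERE total >= 100 ORDER BY customer_id
"""

nested_rows = conn.execute(nested_sql).fetchall()
cte_rows = conn.execute(cte_sql).fetchall()
print(nested_rows, cte_rows)
```

The rewrite is about readability only; whether it changes performance depends on how the particular engine handles CTEs.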