I have a question regarding SQL joins.
Whenever we join two different tables on some fields, what exactly happens inside Oracle to produce the query output?
Does Oracle create or use a temporary table just for presenting the query output?
Here is an overview of the join mechanisms used by Oracle, along with a couple of Oracle wiki pages about joins:
Cluster join
Hash join
Nested loops join
Sort merge join
The Cost-Based Optimizer documentation gives plenty of detail on access paths, how blocks of data are read, which scans are used, etc.:
http://download.oracle.com/docs/cd/B10501_01/server.920/a96533/optimops.htm#35891
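To see which of these join methods the optimizer actually chose for a given query, you can inspect the execution plan. A minimal sketch, using the classic EMP/DEPT demo tables (which may not exist in your schema):

EXPLAIN PLAN FOR
  SELECT e.ename, d.dname
  FROM emp e
  JOIN dept d ON d.deptno = e.deptno;

-- Display the plan; the operations column shows HASH JOIN, NESTED LOOPS, MERGE JOIN, etc.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);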
I don't think it will be a temporary table; I would guess the work is done in memory to speed up the operation.
If by "temporary table" you mean an Oracle global temporary table (GTT), the answer is No, Oracle never uses a GTT just for presenting the query output, but on the other hand, Yes, it might use a GTT for storing intermediate results depending on the query plan.
Related
I have written a very complicated query in Amazon Redshift that comprises 3-4 temporary tables along with sub-queries. Since the query is slow to execute, I tried to replace it with another query that uses derived tables instead of temporary tables.
I just want to ask: is there any way to compare the EXPLAIN output of both queries, so that we can conclude which query performs better (in both space and time)?
Also, how helpful is replacing temporary tables with derived tables in Redshift?
When Redshift generates its own temporary tables (visible in the plan), you may be able to tune the query by creating them as temporary tables yourself, specifying compression and adding distribution and sort keys that help with the joins done on the table.
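For example, a hand-built temporary table with explicit distribution and sort keys might look like this (a sketch only; the table and column names are made up for illustration):

CREATE TEMP TABLE tmp_customer_totals
  DISTKEY (customer_id)
  SORTKEY (customer_id)
AS
SELECT customer_id, SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id;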
Very slow queries typically use a nested loop join. The fastest join type is a merge join. If possible, rewrite the query or modify the tables to use a merge join, or at least a hash join. Details here: https://docs.aws.amazon.com/redshift/latest/dg/query-performance-improvement-opportunities.html
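To compare the two versions of your query, you can run EXPLAIN on each and compare the relative costs and the join operators (Nested Loop, Hash Join, Merge Join) that appear in the output. A minimal sketch with placeholder table names:

EXPLAIN
SELECT c.customer_id, SUM(o.amount) AS total_amount
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id;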
Resources to better understand Redshift query planning and execution:
Query Planning And Execution Workflow:
https://docs.aws.amazon.com/redshift/latest/dg/c-query-planning.html
Reviewing Query Plan Steps:
https://docs.aws.amazon.com/redshift/latest/dg/reviewing-query-plan-steps.html
Mapping the Query Plan to the Query Summary:
https://docs.aws.amazon.com/redshift/latest/dg/query-plan-summary-map.html
Diagnostic Queries for Query Tuning:
https://docs.aws.amazon.com/redshift/latest/dg/diagnostic-queries-for-query-tuning.html
I have two tables: one is small and the other is large. When joining the two tables, which table should I put on the left and which on the right so that the query optimizer searches quicker, or does it not matter?
For example:
--1
SELECT smalltable.column1,
largetable.column1
FROM smalltable
INNER JOIN largetable
ON smalltable.column1 = largetable.column1 ;
--2
SELECT smalltable.column1,
largetable.column1
FROM smalltable
INNER JOIN largetable
ON largetable.column1 = smalltable.column1 ;
Which query will be faster, or does it not matter?
If you're talking about Microsoft SQL Server, both queries are equivalent to the query optimizer. In fact, to almost any cost-based query optimizer they'll be equivalent. You can try it by looking at the execution plan (see http://www.simple-talk.com/sql/performance/execution-plan-basics/ for details).
The query optimizer in most decent SQL Server variants will solve that. Some primitive ones don't have a query optimizer at all (older MySQL and Access come to mind). Some may get overloaded by complex decisions (this one is simple).
But in general - trust the query optimizer first.
It should not matter which order you use, as your SQL Server should optimise the query execution for you. However, (if you are using Microsoft SQL Server) you could use SQL Server Profiler (found under the Tools menu of SQL Server Management Studio) to check the execution plans of both options.
If one of the tables is smaller than the other, place the smaller table first and then the larger table: the optimizer will have less work to do, and this helps it choose a plan that uses a hash join.
Then run the query profiler and check that a hash join is used, because that is the best and fastest option in this scenario.
If there are no indexes on the joined tables, the optimizer will select a hash join.
You can force a hash join by adding OPTION (HASH JOIN) at the end of the query.
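For example, applied to the queries above (a sketch; use hints with care, since they override the optimizer):

SELECT smalltable.column1,
largetable.column1
FROM smalltable
INNER JOIN largetable
ON smalltable.column1 = largetable.column1
OPTION (HASH JOIN);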
From MSDN, http://blogs.msdn.com/b/irenak/archive/2006/03/24/559855.aspx:
The column name that joins the table is called a hash key. In the example above, it’ll be au_id. SQL Server examines the two tables being joined, chooses the smaller table (so called build input), and builds a hash table applying a hash algorithm to the values of a hash key. Each row is inserted into a hash bucket depending on the hash value computed for the hash key. If build input is done completely in-memory, the hash join is called an “in-memory hash join”. If SQL Server doesn’t have enough memory to hold the entire build input, the process will be done in chunks, and is called “grace hash join”.
Before running both queries, select 'Include Actual Execution Plan' from the menu and then run them. SQL Server will show the execution plan, which is the best tool for creating optimized queries. See more about execution plans here.
The order of the join columns does matter. See this post for more detail. Also there has been no discussion of indexing in this thread. It is the combination of optimal join table order AND useful indexing that results in the fastest executing queries.
What are the main purposes for which temporary tables are used? I want to know the practical and commercial uses of temporary tables in real software at small and large companies.
In my experience, temporary tables are often used to store intermediate calculations in a complex series of CREATE or UPDATE queries that produce some sort of analysis result. An example might be the creation of summary tables for an OLAP database.
Temporary tables are also sometimes used to increase performance, in certain situations.
I tend to prefer CTEs (common table expressions) over temporary tables. There is a good discussion in the answers to a similar question.
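For example, the intermediate result that might otherwise go into a temporary table can be expressed inline as a CTE (a minimal sketch with made-up table names):

WITH recent_orders AS (
    SELECT customer_id, order_id, amount
    FROM orders
    WHERE order_date >= '2020-01-01'
)
SELECT c.name, SUM(ro.amount) AS total_amount
FROM recent_orders ro
JOIN customers c ON c.customer_id = ro.customer_id
GROUP BY c.name;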
In general, temporary tables are a way to store data sets (generally the result of a complex query) that need to be used further. Many times you can achieve the same thing with a subquery, but subqueries drag down performance. Temp tables help improve performance by narrowing down the data set.
Suppose you have a query that needs to join 7-8 tables. Then it's advisable to first join the tables with the fewest rows, store their output in a temp table, and then join that temp table to the other, larger tables.
It's a good idea to break your bigger joins into smaller ones. In one of the dev projects where I faced these scenarios, temp tables came to my rescue.
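A minimal sketch of that pattern in SQL Server-style syntax (the table names are made up):

-- Join the smaller tables first and materialize the narrowed-down result
SELECT c.customer_id, o.order_id
INTO #filtered_orders
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_date >= '2020-01-01';

-- Then join the temp table to the larger table
SELECT f.customer_id, f.order_id, d.product_id, d.quantity
FROM #filtered_orders f
JOIN order_details d ON d.order_id = f.order_id;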
Contrary to your assumption, they are not used as a draft for a table.
What's the most efficient query to count table rows in SQL Server 2008?
For an accurate answer, the sane way to get the exact number of rows in a table in a normalized database is to execute:
select count(*) from table;
On a table without any indexes, the database will perform a full table scan. If you have indexed a non-null column, the database may use the (potentially much smaller) index to resolve your query.
If you need a faster solution, you have to keep track of the number of rows in the table without resorting to counting the rows themselves. For example, you can do this yourself in another table (as part of your carefully designed API, or by using triggers, etc.) or by creating an *indexed view for this purpose.
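As a rough illustration of the trigger-based "track it yourself" approach (a sketch only, not production code; the table names are made up, and the RowCounts row must be seeded with the current count beforehand):

CREATE TABLE dbo.RowCounts (
    table_name sysname PRIMARY KEY,
    row_count  bigint NOT NULL
);
GO
CREATE TRIGGER trg_MyTable_RowCount
ON dbo.MyTable
AFTER INSERT, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Adjust the stored count by the number of rows inserted/deleted
    UPDATE dbo.RowCounts
    SET row_count = row_count
                  + (SELECT COUNT(*) FROM inserted)
                  - (SELECT COUNT(*) FROM deleted)
    WHERE table_name = 'MyTable';
END;
GO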
However, if you only need an approximation, and it is critical that you get the answer fast, others have shown ways to do that by using the data dictionary (John, Martin).
*Disclaimer: I haven't actually used indexed views in SQL server, I'm just reading the documentation and assume they can do the same thing as a materialized view in Oracle.
SELECT COUNT(*) FROM TableName
Does that work for you?
SELECT COUNT(*) FROM table_name
will have a trivial plan. There is no scope to optimise this.
You can use WITH NOLOCK to reduce locking overhead, though at the risk of some inaccuracy.
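For example (a sketch; with NOLOCK the count can include or miss rows touched by in-flight transactions):

SELECT COUNT(*) FROM table_name WITH (NOLOCK);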
Or use
SELECT SUM(rows)
FROM sys.partitions
WHERE object_id = OBJECT_ID('foo') AND index_id < 2
However this also will not give you a transactionally consistent answer.
My question is not how to use an inner join in SQL. I know how it matches rows between table A and table B.
I'd like to ask how an inner join works internally. What algorithm does it use? What happens internally when joining multiple tables?
There are different algorithms, depending on the DB server, the indexes and data order (clustered PK), whether calculated values are joined, etc.
Have a look at the query plan, which most SQL systems can create for a query; it should give you an idea of what the engine does.
In MS SQL Server, different join algorithms are used in different situations, depending on the tables (their size, what indexes are available, etc.). I imagine other DB engines also use a variety of algorithms.
The main join types used by MS SQL Server are:
- Nested loops joins
- Merge joins
- Hash joins
You can read more about them on this page: MSDN - Advanced Query Tuning Concepts
If you get SQL Server to display the execution plan for your queries, you will be able to see what type of join is being used in different situations.
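In SQL Server you can do that from Management Studio ('Include Actual Execution Plan') or, as a quick text-based sketch, with SHOWPLAN (the query and table names here are placeholders):

SET SHOWPLAN_TEXT ON;
GO
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id;
GO
SET SHOWPLAN_TEXT OFF;
GO

The plan output shows the physical join operator chosen (Nested Loops, Hash Match, Merge Join).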
It depends on what database you're using, what you're joining (large/small, in sequence/random, indexed/non-indexed etc).
For example, SQL Server has several different join algorithms; loop joins, merge joins, hash joins. Which one is used is determined by the optimizer when it is working out an execution plan. Sometimes it makes a misjudgement and you can then force a specific join algorithm by using join hints.
You may find the following MSDN pages interesting:
http://msdn.microsoft.com/en-us/library/ms191318.aspx (loop)
http://msdn.microsoft.com/en-us/library/ms189313.aspx (hash)
http://msdn.microsoft.com/en-us/library/ms190967.aspx (merge)
http://msdn.microsoft.com/en-us/library/ms173815.aspx (hints)
In this case you should look at how the data is stored in a B-tree; after that, I think you will understand the JOIN algorithms.
It's all based on set theory, which has been around for a while.
Try not to link too many tables at any one time; it seems to exhaust database resources with all the scanning. Indexes help with performance; look at some SQL sites and search for optimising SQL queries to get some insight. SQL Server Management Studio has a built-in execution plan utility that is often interesting, especially for large, complex queries.
The optimizer will (or should) choose the fastest join algorithm.
However, there are two different ways of determining what is fast:
You measure the time that it takes to return all the joined rows.
You measure the time that it takes to return the first joined rows.
If you want to return all the rows as fast as possible, the optimizer will often choose a hash join or a merge join. If you want to return the first few rows as fast as possible, the optimizer will choose a nested loops join.
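In SQL Server, for instance, you can nudge the optimizer toward a "first rows fast" plan with the FAST query hint (a sketch with made-up table names):

SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
OPTION (FAST 10);  -- optimize the plan for returning the first 10 rows quickly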
Conceptually, it creates a Cartesian product of the two tables and then selects the matching rows out of it. See the Korth book on database systems for more on this.