I have an SQL query that hangs when I change the select list from * to a single column. Where could it possibly hang? Shouldn't it be faster, since I request only 1 column instead of 50?
select *
from table1 t1, table2 t2
where t1.id1 = t2.id2 and t2.columnX = :x
select t1.column1
from table1 t1, table2 t2
where t1.id1 = t2.id2 and t2.columnX = :x
P.S. The columns have indexes.
Regards
On the surface, there should be no difference in how these two queries perform. Start by comparing the EXPLAIN PLAN output for each query. If the cost is the same, then something beyond the queries themselves is at issue. As @tbone states in the comment, it could be something as simple as caching.
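A minimal way to try this comparison without an Oracle instance (the :x bind variable suggests Oracle, where you would run EXPLAIN PLAN FOR followed by SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)) is Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN. The schema and the index names idx_t1_id1 and idx_t2_colx are hypothetical stand-ins for "the columns have indexes"; SQLite's plan text is not what Oracle prints, but the procedure is the same.

```python
import sqlite3

# Hypothetical stand-ins for table1/table2 and "the columns have indexes".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id1 INTEGER, column1 TEXT, column2 TEXT);
    CREATE TABLE table2 (id2 INTEGER, columnX INTEGER);
    CREATE INDEX idx_t1_id1 ON table1 (id1);
    CREATE INDEX idx_t2_colx ON table2 (columnX);
""")

QUERY_STAR = """
    select *
    from table1 t1, table2 t2
    where t1.id1 = t2.id2 and t2.columnX = :x
"""
QUERY_ONE = """
    select t1.column1
    from table1 t1, table2 t2
    where t1.id1 = t2.id2 and t2.columnX = :x
"""


def plan(sql: str) -> list[str]:
    """Return the detail text (fourth column) of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, {"x": 42}).fetchall()
    return [row[3] for row in rows]


plan_star = plan(QUERY_STAR)
plan_one = plan(QUERY_ONE)
for label, steps in (("select *", plan_star), ("select t1.column1", plan_one)):
    print(label)
    for step in steps:
        print("   ", step)
```

With this schema, both plans should start from the index on columnX and then look up table1 by id1. If your real plans match but one query still hangs, the cause likely lies outside the query text, as the answer says.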
Calling all SQL experts. I have the following SELECT statement:
SELECT 1
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id
WHERE t1.field = xyz
I'm a little bit worried about the performance here. Is the WHERE clause evaluated before or after the join? If it's evaluated after, is there a way to evaluate the WHERE clause first?
The whole table could easily contain more than a million entries, but after the WHERE clause only 1-10 entries may be left, so in my opinion it makes a big performance difference when the WHERE clause is evaluated.
Thanks in advance.
Dimi
You could rewrite your query like this:
SELECT 1
FROM (SELECT * FROM table1 WHERE field = xyz) t1
JOIN table2 t2 ON t1.id = t2.id
But depending on the database product, the optimiser might still decide that the best way to do this is to JOIN table1 to table2 and then apply the constraint.
For this query:
SELECT 1
FROM table1 t1 JOIN
table2 t2
ON t1.id = t2.id
WHERE t1.field = xyz;
The optimal indexes are table1(field, id), table2(id).
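The reasoning behind these two indexes: with table1(field, id), both the t1.field = xyz filter and the id needed for the join can be read from the index, and table2(id) turns each join probe into an index lookup. The sketch below checks this in SQLite through Python's sqlite3 module; the payload and other columns and the index names are assumptions made for illustration.

```python
import sqlite3

# Hypothetical tables; only field and id come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, field TEXT, payload TEXT);
    CREATE TABLE table2 (id INTEGER, other TEXT);
    -- The two indexes suggested in the answer.
    CREATE INDEX idx_t1_field_id ON table1 (field, id);
    CREATE INDEX idx_t2_id ON table2 (id);
""")

QUERY = """
    SELECT 1
    FROM table1 t1 JOIN
         table2 t2
         ON t1.id = t2.id
    WHERE t1.field = ?
"""

# The fourth column of each plan row is the human-readable detail.
details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("xyz",))]
for detail in details:
    print(detail)
```

SQLite should report both indexes, typically as covering indexes because SELECT 1 needs no other columns; other products word their plans differently.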
How the query is executed depends on the optimizer. It is tasked with choosing the best execution plan, given the table statistics and environment.
Each DBMS has its own query optimizer. So, logically, in a case like yours the WHERE filter will typically be applied first and then the JOIN part of the query.
As mentioned in the comments and other answers, with performance the answer is always "it depends". Depending on your DBMS and the indexing of the base tables, the query may be fine as is, and the optimizer may evaluate the WHERE first. Or the join may be efficient anyway if the indexes cover the join requirements.
Alternatively, you can try to force the behavior you require by reducing the t1 dataset before the join, either with a nested select, as Richard suggested, or by adding t1.field = xyz to the join condition, for example:
ON t1.field = xyz AND t1.id = t2.id
Personally, if I needed to reduce the dataset before the join, I would use a CTE:
WITH T1 AS
(
SELECT * FROM table1
WHERE Field = 'xyz'
)
SELECT 1
FROM T1
JOIN Table2 T2
ON T1.Id = T2.Id
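The CTE can be checked against the plain join. This sketch uses SQLite through Python's sqlite3 module with made-up rows, and selects T1.Id instead of the constant 1 so the results can be compared. Keep in mind that many optimizers, including SQLite's and PostgreSQL 12 and later, may inline a single-use CTE like this one, so writing it as a CTE does not by itself guarantee that the filter runs before the join.

```python
import sqlite3

# Invented rows; column names follow the CTE in the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Id INTEGER, Field TEXT);
    CREATE TABLE table2 (Id INTEGER);
    INSERT INTO table1 VALUES (1, 'xyz'), (2, 'abc'), (3, 'xyz');
    INSERT INTO table2 VALUES (1), (2), (4);
""")

CTE_QUERY = """
    WITH T1 AS (
        SELECT * FROM table1
        WHERE Field = 'xyz'  -- reduce table1 before the join
    )
    SELECT T1.Id
    FROM T1
    JOIN table2 T2 ON T1.Id = T2.Id
"""
PLAIN_QUERY = """
    SELECT t1.Id
    FROM table1 t1
    JOIN table2 t2 ON t1.Id = t2.Id
    WHERE t1.Field = 'xyz'
"""

cte_rows = sorted(conn.execute(CTE_QUERY).fetchall())
plain_rows = sorted(conn.execute(PLAIN_QUERY).fetchall())
print(cte_rows, plain_rows)
```

Both forms return the same single row here, since Id 1 is the only 'xyz' row with a partner in table2.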
For example, is the first query processed differently from the second one?
Query 1
SELECT t1.var1, t2.var2 FROM table1 t1
INNER JOIN table2 t2
ON t1.key = t2.key
WHERE t2.ID = 'ABCD'
Query 2
SELECT t1.var1, t2.var2 FROM table1 t1
INNER JOIN (
SELECT var2, key from table2
WHERE ID = 'ABCD'
) t2
ON t1.key = t2.key
At a glance, it seems as if the second query would be more efficient: table2 is reduced before the join begins, whereas the first query appears to join the tables first and reduce later. I'm using Teradata, if it matters.
Depends on vendor, version and configuration.
An older Teradata version or a legacy configuration might spool the sub-query as a first stage for Query 2, leading to reduced performance compared with Query 1, depending on the tables' primary indexes and the join algorithm.
I would suggest avoiding this kind of "optimization".
P.S.
Check whether you get the same execution plan for both queries or different ones.
Check the query log for AMPCPUTime (for a start).
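To illustrate the kind of comparison the P.S. suggests, the sketch below runs both queries in SQLite through Python's sqlite3 module, checks that they return the same rows, and prints each plan. The data is invented, and Query 2 is written without the outer WHERE t2.ID = 'ABCD', since the derived table t2 only exposes var2 and key. SQLite is not Teradata: there you would prefix each query with EXPLAIN and look at AMPCPUTime in the query log instead.

```python
import sqlite3

# Invented data for the two tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (key INTEGER, var1 TEXT);
    CREATE TABLE table2 (key INTEGER, ID TEXT, var2 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (1, 'ABCD', 'x'), (2, 'ZZZZ', 'y'), (3, 'ABCD', 'z');
""")

QUERY_1 = """
    SELECT t1.var1, t2.var2 FROM table1 t1
    INNER JOIN table2 t2
    ON t1.key = t2.key
    WHERE t2.ID = 'ABCD'
"""
QUERY_2 = """
    SELECT t1.var1, t2.var2 FROM table1 t1
    INNER JOIN (
        SELECT var2, key FROM table2
        WHERE ID = 'ABCD'
    ) t2
    ON t1.key = t2.key
"""

rows_1 = sorted(conn.execute(QUERY_1).fetchall())
rows_2 = sorted(conn.execute(QUERY_2).fetchall())

# Print the plan detail text of each query for side-by-side inspection.
for name, sql in (("Query 1", QUERY_1), ("Query 2", QUERY_2)):
    print(name)
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("   ", row[3])
```

SQLite normally flattens a simple subquery like this into the outer query, so its two plans are likely to look alike; whether Teradata does the same depends on version and configuration, as the answer notes.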
I want to query a table with two parameters in the WHERE condition, where both parameters are returned by another query.
Let us say table t1 has column1 and column2. If I run the query select column1, column2 from t1, suppose it returns 10 records.
I want to query something like this:
for(int j=0;j<10;j++)
{
select *
from t2
where t2.column1=t1.column1(jth position)
and t2.column2=t1.column2(jth position)
}
There are lots of ways to accomplish this. I like using embedded queries (SQL Server). I am not sure whether you can do this in other SQL dialects (I've never tried).
select *
from t2
where
t2.column1 = (select column1 from t1 where t1.id = j)
and t2.column2 = (select column2 from t1 where t1.id = j)
Please treat this as pseudocode. I made up an id column for the example, but the point is that if you do something like this, each embedded select statement must return only one value, not a set of results.
If you want to work with larger conditional data sets, you can use joins to accomplish the same thing:
select *
from t2
inner join t1 on t1.column1 = t2.column1
and t1.column2 = t2.column2
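The join replaces the loop from the question with a single set-based statement. The sketch below, using SQLite through Python's sqlite3 module and invented data, runs the row-by-row loop and the join side by side and checks that they return the same rows. It selects t2.* rather than *, because select * over the join would also return t1's columns.

```python
import sqlite3

# Invented data for t1 and t2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (column1 INTEGER, column2 TEXT);
    CREATE TABLE t2 (column1 INTEGER, column2 TEXT, info TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1, 'a', 'match-1'), (2, 'x', 'no-match'),
                          (3, 'c', 'match-3'), (4, 'd', 'no-match');
""")

# Row-by-row version of the question's loop: one query per t1 row.
loop_rows = []
for column1, column2 in conn.execute("SELECT column1, column2 FROM t1").fetchall():
    loop_rows += conn.execute(
        "SELECT * FROM t2 WHERE t2.column1 = ? AND t2.column2 = ?",
        (column1, column2),
    ).fetchall()

# Set-based version: a single join does the same work in one statement.
join_rows = conn.execute("""
    SELECT t2.*
    FROM t2
    INNER JOIN t1 ON t1.column1 = t2.column1
                 AND t1.column2 = t2.column2
""").fetchall()

print(sorted(loop_rows))
print(sorted(join_rows))
```

The join lets the database pick the access path once, instead of the application issuing one round trip per row.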
Hope this helps.
I have two tables, Table1 and Table2 and am trying to select values from Table1 based on values in Table2. I am currently writing my query as follows:
SELECT Value From Table1
WHERE
(Key1 in
(SELECT KEY1 FROM Table2 WHERE Foo = Bar))
AND
(Key2 in
(SELECT KEY2 FROM Table2 WHERE Foo = Bar))
This seems a very inefficient way to code the query. Is there a better way to write it?
It depends on how the table(s) are indexed. And it depends on what SQL implementation you're using (SQL Server? MySQL? Oracle? MS Access? something else?). It also depends on table size (if the table(s) are small, a table scan may be faster than something more advanced). It matters, too, whether or not the indices are covering indices (meaning that the test can be satisfied with data in the index itself, rather than requiring an additional look-aside to fetch the corresponding data page). Unless you look at the execution plan, you can't really say that technique X is "better" than technique Y.
However, in general, for this case, you're better off using correlated subqueries, thus:
select *
from table1 t1
where exists( select *
from table2 t2
where t2.key1 = t1.key1
and Foo = Bar
)
and exists( select *
from table2 t2
where t2.key2 = t1.key2
and Foo = Bar
)
A join is a possibility, too:
select t1.*
from table1 t1
join table2 t2a on t2a.key1 = t1.key1 ...
join table2 t2b on t2b.key2 = t1.key2 ...
That will, however, give you 1 row for every matching combination, which can be alleviated by using the DISTINCT keyword. It should be noted that a join is not necessarily more efficient than other techniques, especially if you have to use DISTINCT, as that requires additional work to ensure distinctness.
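Assuming Foo and Bar are both columns of Table2 (the question does not say), the three variants can be compared on a small invented data set in SQLite through Python's sqlite3 module. The IN and EXISTS forms return the same values; the join returns one row per matching combination until DISTINCT is added, which illustrates the point above.

```python
import sqlite3

# Invented data; Foo and Bar are assumed to be columns of Table2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Key1 INTEGER, Key2 INTEGER, Value TEXT);
    CREATE TABLE Table2 (KEY1 INTEGER, KEY2 INTEGER, Foo TEXT, Bar TEXT);
    INSERT INTO Table1 VALUES (1, 20, 'v1'), (2, 10, 'v2'), (3, 30, 'v3'), (1, 30, 'v4');
    INSERT INTO Table2 VALUES (1, 10, 'f', 'f'), (2, 20, 'f', 'f'),
                              (3, 30, 'f', 'g'), (1, 10, 'h', 'h');
""")

IN_QUERY = """
    SELECT Value FROM Table1
    WHERE Key1 IN (SELECT KEY1 FROM Table2 WHERE Foo = Bar)
      AND Key2 IN (SELECT KEY2 FROM Table2 WHERE Foo = Bar)
"""
EXISTS_QUERY = """
    SELECT t1.Value FROM Table1 t1
    WHERE EXISTS (SELECT * FROM Table2 t2
                  WHERE t2.KEY1 = t1.Key1 AND t2.Foo = t2.Bar)
      AND EXISTS (SELECT * FROM Table2 t2
                  WHERE t2.KEY2 = t1.Key2 AND t2.Foo = t2.Bar)
"""
# {distinct} is filled with "" or "DISTINCT" to compare both join variants.
JOIN_QUERY = """
    SELECT {distinct} t1.Value FROM Table1 t1
    JOIN Table2 t2a ON t2a.KEY1 = t1.Key1 AND t2a.Foo = t2a.Bar
    JOIN Table2 t2b ON t2b.KEY2 = t1.Key2 AND t2b.Foo = t2b.Bar
"""


def values(sql: str) -> list[str]:
    """Run a query and return its single column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


in_values = values(IN_QUERY)
exists_values = values(EXISTS_QUERY)
join_values = values(JOIN_QUERY.format(distinct=""))
distinct_values = values(JOIN_QUERY.format(distinct="DISTINCT"))
print(in_values, exists_values, join_values, distinct_values)
```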
I have a table T1 with 60 rows and 5 columns: ID1, ID2, info1, info2, info3.
I have a table T2 with 1.2 million rows and another 5 columns: ID3, ID2, info4, info5, info6.
I want to get (ID1, ID2, info4, info5, info6) from all the rows where the ID2s match up. Currently my query looks like this:
SELECT T1.ID1, T2.ID2,
T2.info4, T2.info5, T2.info6
FROM T1, T2
WHERE T1.ID2 = T2.ID2;
This takes about 15 seconds to run. My question is: should it take that long, and if not, how can I speed it up? I figure it shouldn't, since T1 is so small.
I asked PostgreSQL to EXPLAIN the query, and it says that it hashes T2, then hash joins that hash with T1. It seems hashing T2 is what takes so long. Is there any way to write the query so it doesn't have to hash T2? Or, is there a way to have it cache the hash of T2 so it doesn't re-do it? The tables will only be updated every few days.
If it makes a difference, T1 is a temporary table created earlier in the session.
It should not take that long :)
Creating an index on T2( ID2 ) should improve the performance of your query:
CREATE INDEX idx_t2_id2 ON t2 (id2);
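A minimal sketch of the effect of that index, using SQLite through Python's sqlite3 module instead of PostgreSQL, with invented data (60 rows in T1 and 20,000 in T2 rather than 1.2 million). Once the index exists and ANALYZE has recorded table sizes, SQLite's plan should scan the small T1 and look up T2 through idx_t2_id2. In PostgreSQL you would compare EXPLAIN (or EXPLAIN ANALYZE) output before and after creating the index; its planner may still prefer a hash join if it estimates that to be cheaper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID1 INTEGER, ID2 INTEGER, info1 TEXT, info2 TEXT, info3 TEXT);
    CREATE TABLE T2 (ID3 INTEGER, ID2 INTEGER, info4 TEXT, info5 TEXT, info6 TEXT);
""")
# Invented data: 60 rows in T1, 20,000 rows in T2 (each ID2 value 20 times).
conn.executemany("INSERT INTO T1 VALUES (?, ?, 'a', 'b', 'c')",
                 [(i, i * 10) for i in range(60)])
conn.executemany("INSERT INTO T2 VALUES (?, ?, 'd', 'e', 'f')",
                 [(i, i % 1000) for i in range(20_000)])

QUERY = """
    SELECT T1.ID1, T2.ID2, T2.info4, T2.info5, T2.info6
    FROM T1, T2
    WHERE T1.ID2 = T2.ID2
"""


def plan() -> list[str]:
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]


print("before:", plan())
conn.execute("CREATE INDEX idx_t2_id2 ON t2 (id2)")
conn.execute("ANALYZE")  # record row counts so the planner knows T1 is small
plan_after = plan()
print("after: ", plan_after)
row_count = len(conn.execute(QUERY).fetchall())
```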
Maybe using an explicit JOIN will increase the speed of the query:
SELECT T1.ID1, T2.ID2,
T2.info4, T2.info5, T2.info6
FROM T1
JOIN T2 ON T2.ID2 = T1.ID2;
I don't know exactly, but maybe your query first joins all rows in both tables and only then applies the WHERE conditions, and that's the problem.
And of course, as Peter Lang pointed out, you should create an index.
First, make a join.
SELECT T1.ID1, T2.ID2,
T2.info4, T2.info5, T2.info6
FROM T1
JOIN T2 ON T1.ID2 = T2.ID2;
Then try creating an index on T2.ID2.
If that doesn't help, and if possible, you can add an ID1 column to T2 and update it accordingly every few days, when you say the tables change. Then it's just a simple query on T2 with no joins.
SELECT T2.ID1, T2.ID2,
T2.info4, T2.info5, T2.info6
FROM T2
WHERE T2.ID2 = A_VALUE;
Again, an index on T2.ID2 is recommended.
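A sketch of the denormalization step, in SQLite through Python's sqlite3 module with invented rows. The refresh_id1 helper is hypothetical and assumes each ID2 appears at most once in T1; A_VALUE from the answer is bound as a query parameter. Rows in T2 without a match in T1 end up with ID1 set to NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID1 INTEGER, ID2 INTEGER);
    CREATE TABLE T2 (ID3 INTEGER, ID2 INTEGER, info4 TEXT, info5 TEXT, info6 TEXT);
    INSERT INTO T1 VALUES (100, 1), (200, 2);
    INSERT INTO T2 VALUES (1, 1, 'a', 'b', 'c'), (2, 2, 'd', 'e', 'f'), (3, 3, 'g', 'h', 'i');
    CREATE INDEX idx_t2_id2 ON T2 (ID2);
""")

# One-time schema change: store ID1 directly on T2.
conn.execute("ALTER TABLE T2 ADD COLUMN ID1 INTEGER")


def refresh_id1() -> None:
    """Hypothetical helper: re-copy ID1 from T1 into T2 after each periodic update."""
    conn.execute("""
        UPDATE T2
        SET ID1 = (SELECT T1.ID1 FROM T1 WHERE T1.ID2 = T2.ID2)
    """)


refresh_id1()

# The join-free lookup from the answer, with A_VALUE bound as a parameter.
a_value = 2
rows = conn.execute("""
    SELECT T2.ID1, T2.ID2, T2.info4, T2.info5, T2.info6
    FROM T2
    WHERE T2.ID2 = ?
""", (a_value,)).fetchall()
print(rows)
```

The trade-off is that ID1 in T2 is only as fresh as the last refresh, which fits the "updated every few days" schedule in the question.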