How can I improve the performance of a query that joins two big tables and sorts by the first table's unique index?
I need only the first table's data, sorted. Without the ORDER BY, the query is very fast.
Here are the example queries:
select a.* from T1 a, T2 b where a.c1 = b.c2;
select a.* from T1 a, T2 b where a.c1 = b.c2 order by a.id;
Just FYI, T1 and T2 have the proper indexes.
T1 has 54,483,938 rows and T2 has 54,483,820 rows.
What I actually need is the T1 rows that have a matching record in T2, sorted by id.
I tried a query using the IN operator, and it took 300 seconds.
You can try the three forms of the query:
join (which you have)
in (which you claim to have run)
exists
The exists version is:
select a.*
from T1 a
where exists (select 1 from T2 b where a.c1 = b.c2)
order by a.id;
For this query, I would recommend indexes on T1(id, c1) and T2(c2).
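The small sketch below shows how the three forms differ in what they return. It does not measure speed. It assumes Python's built-in sqlite3 module as a stand-in for your database and uses a few made-up rows in place of the real 54M-row tables. The plain join repeats a T1 row once for each matching T2 row. IN and EXISTS return each T1 row at most once, which matters if c2 is not unique in T2.

```python
import sqlite3

# Hypothetical miniature versions of T1 and T2 (the real tables have ~54M rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (id INTEGER PRIMARY KEY, c1 INTEGER);
    CREATE TABLE T2 (c2 INTEGER);
    -- Indexes recommended in the answer.
    CREATE INDEX t1_id_c1 ON T1 (id, c1);
    CREATE INDEX t2_c2 ON T2 (c2);
    INSERT INTO T1 (id, c1) VALUES (1, 10), (2, 20), (3, 30), (4, 40);
    -- c2 = 20 appears twice to show how the plain join duplicates rows.
    INSERT INTO T2 (c2) VALUES (10), (20), (20), (40);
""")

# 1. join: one output row per matching (T1, T2) pair
join_rows = conn.execute(
    "SELECT a.* FROM T1 a, T2 b WHERE a.c1 = b.c2 ORDER BY a.id"
).fetchall()

# 2. in: each T1 row at most once
in_rows = conn.execute(
    "SELECT a.* FROM T1 a WHERE a.c1 IN (SELECT b.c2 FROM T2 b) ORDER BY a.id"
).fetchall()

# 3. exists: each T1 row at most once
exists_rows = conn.execute(
    "SELECT a.* FROM T1 a "
    "WHERE EXISTS (SELECT 1 FROM T2 b WHERE a.c1 = b.c2) ORDER BY a.id"
).fetchall()

print(join_rows)    # T1 row with id 2 appears twice
print(in_rows)
print(exists_rows)
```

On the real tables, comparing the execution plans of the three forms will tell you more than this toy data can.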
I want to fetch all records, with all columns, from a table whose records are not in two other tables. Please help.
I have tried the query below. It works fine for comparing one column, but I want to compare 5 columns.
select * from A
WHERE NOT EXISTS
(select * from B b where b.id=a.id) AND
NOT EXISTS
(select * from C c where c.id=a.id)
A general solution might look like:
SELECT t1.*
FROM table1 t1
WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.t2_id) AND
NOT EXISTS (SELECT 1 FROM table3 t3 WHERE t3.id = t1.t3_id);
This assumes that you want to return the records of table1 for which no match can be found in either table2 or table3.
I prefer this approach:
SELECT t1.*
FROM table1 AS t1
LEFT JOIN table2 AS t2
ON t1.id = t2.t1_id
LEFT JOIN table3 AS t3
ON t1.id = t3.t1_id
WHERE t2.id IS NULL
AND t3.id IS NULL;
While this might be a bit less intuitive than using subqueries, I think it makes alias mistakes less likely. The example below shows that kind of mistake, which happens more often than you might expect:
SELECT *
FROM table1
WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE id = id) AND
NOT EXISTS (SELECT 1 FROM table3 WHERE id = id);
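The sketch below shows why that example is wrong. It assumes SQLite through Python's sqlite3 and a simplified, hypothetical schema in which table2 and table3 reference table1 directly by id. Inside each subquery, the unqualified id resolves to the subquery's own table. The condition is therefore never correlated with table1.

```python
import sqlite3

# Hypothetical simplified schema: table2/table3 reference table1 by id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (id INTEGER);
    CREATE TABLE table3 (id INTEGER);
    INSERT INTO table1 (id) VALUES (1), (2), (3);
    INSERT INTO table2 (id) VALUES (1);
    INSERT INTO table3 (id) VALUES (2);
""")

# Buggy: "id = id" compares table2.id with itself, which is true for any
# non-NULL row, so NOT EXISTS is false as soon as table2 has a row.
buggy = conn.execute("""
    SELECT * FROM table1
    WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE id = id)
      AND NOT EXISTS (SELECT 1 FROM table3 WHERE id = id)
""").fetchall()

# Fixed: qualify both sides with aliases so each subquery is correlated.
fixed = conn.execute("""
    SELECT t1.* FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id)
      AND NOT EXISTS (SELECT 1 FROM table3 t3 WHERE t3.id = t1.id)
""").fetchall()

print(buggy)  # []
print(fixed)  # [(3,)]
```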
As for your question about checking 5 columns, either method still works: add the extra conditions to the ON clause of each LEFT JOIN, or to the WHERE clause of each subquery.
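Here is what both methods look like with three hypothetical columns, col1, col2 and col3. Five columns follow the same pattern. The sketch assumes SQLite through Python's sqlite3 as a stand-in for your database.

```python
import sqlite3

# Hypothetical tables A, B, C sharing three compared columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (col1 INTEGER, col2 TEXT, col3 INTEGER);
    CREATE TABLE B (col1 INTEGER, col2 TEXT, col3 INTEGER);
    CREATE TABLE C (col1 INTEGER, col2 TEXT, col3 INTEGER);
    INSERT INTO A VALUES (1, 'x', 10), (2, 'y', 20), (3, 'z', 30);
    -- the second B row differs from A's second row only in col3
    INSERT INTO B VALUES (1, 'x', 10), (2, 'y', 99);
    INSERT INTO C VALUES (3, 'z', 30);
""")

# NOT EXISTS: every compared column goes into each subquery's WHERE clause.
not_exists = conn.execute("""
    SELECT a.* FROM A a
    WHERE NOT EXISTS (SELECT 1 FROM B b
                      WHERE b.col1 = a.col1 AND b.col2 = a.col2 AND b.col3 = a.col3)
      AND NOT EXISTS (SELECT 1 FROM C c
                      WHERE c.col1 = a.col1 AND c.col2 = a.col2 AND c.col3 = a.col3)
    ORDER BY a.col1
""").fetchall()

# LEFT JOIN: the same conditions go into each ON clause; a matched row always
# has a non-NULL join column, so testing it for NULL finds the misses.
left_join = conn.execute("""
    SELECT a.* FROM A a
    LEFT JOIN B b ON b.col1 = a.col1 AND b.col2 = a.col2 AND b.col3 = a.col3
    LEFT JOIN C c ON c.col1 = a.col1 AND c.col2 = a.col2 AND c.col3 = a.col3
    WHERE b.col1 IS NULL AND c.col1 IS NULL
    ORDER BY a.col1
""").fetchall()

print(not_exists)  # [(2, 'y', 20)]
print(left_join)   # [(2, 'y', 20)]
```

Keep in mind that = never matches NULL. If a row of A has NULL in any compared column, both methods always report it as missing.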
I have two tables, t1 and t2, with identical columns (id, desc) and data. However, the desc column might hold different data for the same primary key, id.
I want to select all the rows from these two tables where t1.desc != t2.desc.
select a.id, b.desc
FROM (SELECT * FROM t1 AS a
UNION ALL
SELECT * FROM t2 AS b)
WHERE a.desc != b.desc
For example, if t1 has (1,'aaa') and (2,'bbb') and t2 has (1,'aaa') and (2,'bbb1'), then the new table should have (2,'bbb') and (2,'bbb1').
However, this does not seem to work. Please let me know where I am going wrong and what the right way to do it is.
UNION is not going to compare the data. You need a JOIN here:
SELECT *
FROM t1 AS a
inner join t2 AS b
on a.id =b.id
and a.desc != b.desc
UNION ALL dumps all rows of the second part of the query after the rows produced by the first part of the query. You cannot compare a's fields to b's, because they belong to different rows.
You are probably trying to locate the records of t1 whose ids match those of t2 but whose descriptions differ. You can do this with a JOIN:
SELECT a.id, b.desc
FROM t1 AS a
JOIN t2 AS b ON a.id = b.id
WHERE a.desc != b.desc
This way, each record of t1 ends up on the same row of the joined data as the t2 record with the same ID, which lets you compare the descriptions for inequality.
I want both rows to be selected if the descriptions are not equal.
You can use UNION ALL to combine two sets of rows obtained through joins, with the tables switching places, like this:
SELECT a.id, b.desc -- t1 is a, t2 is b
FROM t1 AS a
JOIN t2 AS b ON a.id = b.id
WHERE a.desc != b.desc
UNION ALL
SELECT a.id, b.desc -- t1 is b, t2 is a
FROM t2 AS a
JOIN t1 AS b ON a.id = b.id
WHERE a.desc != b.desc
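Here is that query run against the sample data from the question, as a sketch assuming SQLite through Python's sqlite3. The column name desc doubles as the DESC keyword, so it is quoted here to be safe. Other engines use their own quoting, for example brackets in SQL Server or backticks in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, "desc" TEXT);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, "desc" TEXT);
    INSERT INTO t1 VALUES (1, 'aaa'), (2, 'bbb');
    INSERT INTO t2 VALUES (1, 'aaa'), (2, 'bbb1');
""")

rows = conn.execute("""
    SELECT a.id, b."desc"   -- t1 is a, t2 is b: returns t2's description
    FROM t1 AS a
    JOIN t2 AS b ON a.id = b.id
    WHERE a."desc" != b."desc"
    UNION ALL
    SELECT a.id, b."desc"   -- t2 is a, t1 is b: returns t1's description
    FROM t2 AS a
    JOIN t1 AS b ON a.id = b.id
    WHERE a."desc" != b."desc"
""").fetchall()

print(sorted(rows))  # [(2, 'bbb'), (2, 'bbb1')]
```

If desc can be NULL, != will not flag a difference between NULL and a value. Standard SQL's IS DISTINCT FROM handles that case on engines that support it.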
The UNION operator combines the result sets of two or more SELECT statements.
Note that each SELECT statement within the UNION must have the same number of columns, and the columns must have compatible data types.
So, if the queries have the same number of columns and compatible data types, UNION can be used; otherwise, only a JOIN can be used.
SELECT *
FROM t1 AS a
inner join t2 AS b
on a.id =b.id
and a.desc != b.desc
I have a table t1. It has columns [id] and [id2].
Select count(*) from t1 where id=1;
returns 31,189 records
Select count(*) from t1 where id=2;
returns 31,173 records
I want to know the records whose id2 appears with id=1 but not with id=2.
So, I use the following:
Select * from t1 a left join t1 b on a.id2=b.id2
Where a.id=2 And b.id=1
And b.id2 Is Null;
It returns zero records.
Using an inner join to see how many records have id2 in common, I do...
Select * from t1 a inner join t1 b on a.id2=b.id2
Where a.id=2 And b.id=1;
And that returns 31,060. So where are the extra records in my first query that don't match?
I am sure I must be missing something obvious.
Sample Data
id id2
1 101
1 102
1 103
2 101
2 102
My expected result is the record with id2 = 103, the only id2 value that is not shared.
Thanks for any help.
Jeff
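You are attempting what is generally called an exclude join. You do a LEFT JOIN between two tables, then use a WHERE clause to select only the rows where the right table's columns are NULL, i.e. there was no record to join. In this way, you select everything from the left table except what exists in the right table. Your query returns nothing because its WHERE clause also requires b.id=1. On a row where the LEFT JOIN found no match, all of b's columns are NULL, so b.id=1 can never be true there. The filter on the right side therefore has to be applied before the join. Note also that the roles are swapped in your query: to find id2 values of id=1 that are missing from id=2, the left side should be the id=1 rows.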
With this data, it would look something like this:
SELECT
t1.id,
t1.id2
FROM test_table t1
LEFT JOIN
(SELECT
id,
id2
FROM test_table
WHERE id = 2) t2
ON t2.id2 = t1.id2
WHERE t1.id = 1
AND t2.id IS NULL --This is what makes the exclude join happen
And here is a SQLFiddle demonstrating this in MySQL 5.7 with the sample data you provided.
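Here is a similar sketch using SQLite through Python's sqlite3 and the sample data from the question. It keeps the question's table name t1 and uses the aliases a and b in place of the answer's t1 and t2. It also runs the original query to show that it returns nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, id2 INTEGER);
    INSERT INTO t1 VALUES (1, 101), (1, 102), (1, 103), (2, 101), (2, 102);
""")

# Original query: "b.id = 1" can never hold on a NULL-extended row,
# so it contradicts "b.id2 IS NULL" and nothing comes back.
original = conn.execute("""
    SELECT * FROM t1 a LEFT JOIN t1 b ON a.id2 = b.id2
    WHERE a.id = 2 AND b.id = 1 AND b.id2 IS NULL
""").fetchall()

# Exclude join: restrict the right side to id = 2 before the join,
# then keep only the left rows that found no partner.
exclude = conn.execute("""
    SELECT a.id, a.id2
    FROM t1 a
    LEFT JOIN (SELECT id, id2 FROM t1 WHERE id = 2) b
      ON b.id2 = a.id2
    WHERE a.id = 1
      AND b.id IS NULL
""").fetchall()

print(original)  # []
print(exclude)   # [(1, 103)]
```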
I think Access may effectively turn the left join into an inner join when the WHERE clause filters on the right-hand table's columns (SQL Server does this too). If you do the filtering in derived tables instead, it should work:
select
a.*
from
(select * from t1 where id = 1) a
left join
(select * from t1 where id = 2) b
on a.id2 = b.id2
where b.id2 is null
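The sketch below shows the logical effect behind this. It assumes SQLite through Python's sqlite3 rather than Access. A WHERE filter on the right-hand table's column discards the NULL-extended rows, so the LEFT JOIN returns the same rows as an INNER JOIN. Filtering inside derived tables keeps the unmatched row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, id2 INTEGER);
    INSERT INTO t1 VALUES (1, 101), (1, 102), (1, 103), (2, 101), (2, 102);
""")

# "b.id = 2" in WHERE is never true for the NULL-extended row (id2 = 103),
# so this LEFT JOIN returns exactly what the INNER JOIN returns.
left_with_where = conn.execute("""
    SELECT a.id2 FROM t1 a LEFT JOIN t1 b ON a.id2 = b.id2
    WHERE a.id = 1 AND b.id = 2 ORDER BY a.id2
""").fetchall()
inner = conn.execute("""
    SELECT a.id2 FROM t1 a INNER JOIN t1 b ON a.id2 = b.id2
    WHERE a.id = 1 AND b.id = 2 ORDER BY a.id2
""").fetchall()

# Filtering inside derived tables leaves the LEFT JOIN's NULL rows intact.
derived = conn.execute("""
    SELECT a.*
    FROM (SELECT * FROM t1 WHERE id = 1) a
    LEFT JOIN (SELECT * FROM t1 WHERE id = 2) b
      ON a.id2 = b.id2
    WHERE b.id2 IS NULL
""").fetchall()

print(left_with_where, inner)  # [(101,), (102,)] [(101,), (102,)]
print(derived)                 # [(1, 103)]
```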
I have searched but have not found a definitive answer. Which of these is better for performance in SQL Server:
SELECT T.*
FROM dbo.Table1 T
LEFT JOIN Table2 T2 ON T.ID = T2.Table1ID
LEFT JOIN Table3 T3 ON T.ID = T3.Table1ID
WHERE T2.Table1ID IS NOT NULL
OR T3.Table1ID IS NOT NULL
or...
SELECT T.*
FROM dbo.Table1 T
JOIN Table2 T2 ON T.ID = T2.Table1ID
UNION
SELECT T.*
FROM dbo.Table1 T
JOIN Table3 T3 ON T.ID = T3.Table1ID
I have tried running both but it's hard to tell for sure. I'd appreciate an explanation of why one is faster than the other, or if it depends on the situation.
Your two queries do not do the same thing. In particular, the first will return duplicate rows if a Table1 ID appears more than once in Table2 or in Table3.
If you are looking for rows in Table1 that are in either of the other two tables, I would suggest using exists:
select t1.*
from Table1 t1
where exists (select 1 from Table2 t2 where t2.Table1Id = t1.id) or
exists (select 1 from Table3 t3 where t3.Table1Id = t1.id);
And, create indexes on Table1Id in both Table2 and Table3.
Which of your original queries is faster depends a lot on the data. The second has an extra step to remove duplicates (UNION versus UNION ALL). On the other hand, the first might end up creating many duplicate rows.
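The difference in results shows up on a tiny, hypothetical data set. This sketch assumes SQLite through Python's sqlite3, so the dbo. schema prefix is dropped. One Table1 ID appears twice in Table2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Table2 (Table1ID INTEGER);
    CREATE TABLE Table3 (Table1ID INTEGER);
    CREATE INDEX ix_table2 ON Table2 (Table1ID);
    CREATE INDEX ix_table3 ON Table3 (Table1ID);
    INSERT INTO Table1 VALUES (1, 'one'), (2, 'two'), (3, 'three');
    -- ID 1 appears twice in Table2 and once in Table3; ID 2 only in Table3.
    INSERT INTO Table2 VALUES (1), (1);
    INSERT INTO Table3 VALUES (1), (2);
""")

# First original query: one row per combination of matches, hence duplicates.
left_or = conn.execute("""
    SELECT T.* FROM Table1 T
    LEFT JOIN Table2 T2 ON T.ID = T2.Table1ID
    LEFT JOIN Table3 T3 ON T.ID = T3.Table1ID
    WHERE T2.Table1ID IS NOT NULL OR T3.Table1ID IS NOT NULL
    ORDER BY T.ID
""").fetchall()

# Second original query: UNION removes the duplicates (extra dedup step).
union = conn.execute("""
    SELECT T.* FROM Table1 T JOIN Table2 T2 ON T.ID = T2.Table1ID
    UNION
    SELECT T.* FROM Table1 T JOIN Table3 T3 ON T.ID = T3.Table1ID
""").fetchall()

# EXISTS version: each Table1 row at most once, no dedup step needed.
exists_rows = conn.execute("""
    SELECT t1.* FROM Table1 t1
    WHERE EXISTS (SELECT 1 FROM Table2 t2 WHERE t2.Table1ID = t1.ID)
       OR EXISTS (SELECT 1 FROM Table3 t3 WHERE t3.Table1ID = t1.ID)
    ORDER BY t1.ID
""").fetchall()

print(left_or)        # ID 1 appears twice
print(sorted(union))
print(exists_rows)
```

Timings on SQL Server will still depend on your data and indexes, so compare the actual execution plans there.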
Take example1:
select T1.*, T2.*
from TABLE1 T1, TABLE2 T2
where T1.id = T2.id
and T1.name = 'foo'
and T2.name = 'bar';
Will that first join T1 and T2 by id and then select the records that satisfy the name conditions?
Or will it first select the records that satisfy the name conditions in T1 and T2, and then join those together?
And is there a difference in performance between example1 and example2 (DB2)?
example2:
select *
from
(
select * from TABLE1 T1 where T1.name = 'foo'
) A,
(
select * from TABLE2 T2 where T2.name = 'bar'
) B
where A.id = B.id;
How the query will be executed depends on what the query planner does with it. Depending on the available indexes and how much data is in the tables the query plan may look different. The planner tries to do the work in the order that it thinks is most efficient.
If the planner does a good job, the plan for both queries should be the same. Otherwise, the first query is likely to be faster, because the second would create two intermediate results that don't have any indexes.
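One way to check this on a given engine is to confirm that the two queries return the same rows, then compare the plans the optimizer chose. Here is a sketch using SQLite through Python's sqlite3 with made-up rows. It is not DB2, whose EXPLAIN tooling differs. SQLite usually flattens subqueries in FROM like these into the outer query, but trust the printed plan rather than that assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (id INTEGER, name TEXT);
    CREATE TABLE TABLE2 (id INTEGER, name TEXT);
    INSERT INTO TABLE1 VALUES (1, 'foo'), (2, 'foo'), (3, 'baz');
    INSERT INTO TABLE2 VALUES (1, 'bar'), (2, 'qux'), (3, 'bar');
""")

example1 = """
    SELECT T1.*, T2.* FROM TABLE1 T1, TABLE2 T2
    WHERE T1.id = T2.id AND T1.name = 'foo' AND T2.name = 'bar'
"""
example2 = """
    SELECT * FROM
      (SELECT * FROM TABLE1 T1 WHERE T1.name = 'foo') A,
      (SELECT * FROM TABLE2 T2 WHERE T2.name = 'bar') B
    WHERE A.id = B.id
"""

# The two forms are logically equivalent, so they must return the same rows.
rows1 = conn.execute(example1).fetchall()
rows2 = conn.execute(example2).fetchall()
print(rows1, rows2)

# Print the plan SQLite chose for each; the row format varies by version.
for label, sql in (("example1", example1), ("example2", example2)):
    print(label)
    for step in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("   ", step)
```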
Example 1 is more efficient because it has no embedded queries. As for how the result set is built, I have no idea; I don't know DB2.