SQL conditional INNER JOIN with two different tables - sql

I have the following simplified subquery, which should be executed first:
(SELECT app.recordId, left(userid,1) userid, count(*) FROM
app group by userid,recordId) y
Based on the result of this subquery, if userid='A' I would like to INNER JOIN with tableA; if userid='B', I would like to INNER JOIN with tableB.
I also want the results of both inner joins to appear. Is there a way to do this without re-executing the subquery, to minimize the execution time?

One approach is to use left join twice, probably with coalesce():
SELECT y.*, COALESCE(a.col1, b.col1) as col1
FROM (SELECT app.recordId, left(userid, 1) as userid, count(*) as cnt
FROM app
GROUP BY recordId, left(userid, 1)
) y LEFT JOIN
tableA a
ON y.userId = 'A' and . . . LEFT JOIN
tableB b
ON y.userId = 'B' and . . .
WHERE y.userId IN ('A', 'B');
The . . . stands for the additional join conditions, which the question does not specify.
EDIT:
If you want to filter out rows that have no matches (à la an "inner join" approach):
WHERE y.userId IN ('A', 'B') and not (a.col is null and b.col is null)
where a.col and b.col are the columns used in the respective join conditions.
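A minimal runnable sketch of the double LEFT JOIN approach, using Python's built-in sqlite3 module. The table contents and the join on recordId are assumptions made for illustration, since the question does not give the real join conditions. SQLite has no left() function, so substr(userid, 1, 1) stands in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data; only the table names come from the question.
    CREATE TABLE app    (recordId INTEGER, userid TEXT);
    CREATE TABLE tableA (recordId INTEGER, col1 TEXT);
    CREATE TABLE tableB (recordId INTEGER, col1 TEXT);
    INSERT INTO app VALUES (1, 'A100'), (1, 'A200'), (2, 'B100'),
                           (3, 'C100'), (4, 'A300');
    INSERT INTO tableA VALUES (1, 'from A');
    INSERT INTO tableB VALUES (2, 'from B');
""")

# The aggregation appears once; each LEFT JOIN only applies to "its" userid.
rows = conn.execute("""
    SELECT y.recordId, y.userid, y.cnt, COALESCE(a.col1, b.col1) AS col1
    FROM (SELECT recordId, substr(userid, 1, 1) AS userid, COUNT(*) AS cnt
          FROM app
          GROUP BY recordId, substr(userid, 1, 1)) y
    LEFT JOIN tableA a ON y.userid = 'A' AND a.recordId = y.recordId
    LEFT JOIN tableB b ON y.userid = 'B' AND b.recordId = y.recordId
    WHERE y.userid IN ('A', 'B')
      -- Drop rows that matched neither table (the "inner join" behaviour).
      AND NOT (a.recordId IS NULL AND b.recordId IS NULL)
    ORDER BY y.recordId
""").fetchall()

print(rows)
```

With this sample data, record 4 (an 'A' row with no match in tableA) and record 3 (userid 'C') are both filtered out.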

You didn't say how the columns from tables A and B should be handled, and we don't know how the tables are related, so all I can give you is an example.
Let's say you want a description that either comes from table A or B:
select agg.userid, a_and_b.description, agg.recordid, agg.rec_count
from
(
select left(userid,1) as userid, recordid, count(*) as rec_count
from app
group by left(userid,1), recordid
) agg
join
(
select 'A' as userid, recordid, description from a
union all
select 'B' as userid, recordid, description from b
) a_and_b on a_and_b.userid = agg.userid and a_and_b.recordid = agg.recordid;
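The UNION ALL variant can be tried out the same way. This is a sketch with Python's sqlite3 module; the sample rows are invented, and substr(userid, 1, 1) replaces left(userid, 1) because SQLite lacks left().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data; the column names follow the example query above.
    CREATE TABLE app (recordid INTEGER, userid TEXT);
    CREATE TABLE a   (recordid INTEGER, description TEXT);
    CREATE TABLE b   (recordid INTEGER, description TEXT);
    INSERT INTO app VALUES (1, 'A100'), (1, 'A200'), (2, 'B100'), (3, 'A300');
    INSERT INTO a VALUES (1, 'desc from a'), (2, 'wrong table for B');
    INSERT INTO b VALUES (2, 'desc from b');
""")

# Tag each source table with the userid it serves, then join once.
rows = conn.execute("""
    SELECT agg.userid, a_and_b.description, agg.recordid, agg.rec_count
    FROM (SELECT substr(userid, 1, 1) AS userid, recordid, COUNT(*) AS rec_count
          FROM app
          GROUP BY substr(userid, 1, 1), recordid) agg
    JOIN (SELECT 'A' AS userid, recordid, description FROM a
          UNION ALL
          SELECT 'B' AS userid, recordid, description FROM b) a_and_b
      ON a_and_b.userid = agg.userid AND a_and_b.recordid = agg.recordid
    ORDER BY agg.recordid
""").fetchall()

print(rows)
```

The row in table a with recordid 2 is ignored for the 'B' group because the join also compares the tagged userid.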

Related

oracle12c,sql,difference between count(*) and sum()

Tell me the difference between sql1 and sql2:
sql1:
select count(1)
from table_1 a
inner join table_2 b on a.key = b.key where a.id in (
select id from table_1 group by id having count(1) > 1
)
sql2:
select sum(a) from (
select count(1) as a
from table_1 a
inner join table_2 b on a.key = b.key group by a.id having count(1) > 1
)
Why is the output not the same?
The queries are not even similar. They are very different. Let's check the first one:
select count(1)
from table_1 a
inner join table_2 b
on a.key = b.key
where a.id in (
select id from table_1 group by id having count(1) > 1
) ;
You are first making an inner join:
select count(1)
from table_1 a
inner join table_2 b
on a.key = b.key
In this case count(1) and count(*) are equivalent, and count(id) gives the same result as long as id is never NULL. You are counting the joined rows: the pairs of rows from the two tables that have the same value in the key field.
After that, you are enforcing this:
where a.id in (
select id from table_1 group by id having count(1) > 1
)
In other words, only rows whose id appears at least twice in table_1 are kept.
And lastly, you are doing this:
select count(1)
In other words, counting those rows. So, translated into English, you have done this:
take every record of table_1 and pair it with the records of table_2 on the key field, keeping only the pairs that match
from that result, keep only the rows whose id appears more than once in table_1
count that result
Let's see what happens with the second query:
select sum(a) from (
select count(1) as a
from table_1 a
inner join table_2 b
on a.key = b.key
group by a.id
having count(1) > 1
);
You are making the same inner join:
select count(1) as a
from table_1 a
inner join table_2 b
on a.key = b.key
but, you are grouping it by the id of the table:
group by a.id
and then keeping only those groups that contain more than one row:
having count(1) > 1
The result so far is the set of joined rows (those whose key exists in both tables), grouped by id. The HAVING clause keeps only the ids that have more than one joined row, and count(1) returns the number of joined rows for each of those ids. This is a different filter from the first query: there, an id qualifies if it appears more than once in table_1, regardless of how many of its rows find a match in table_2. Depending on the data, few records may match this criterion.
And lastly, you sum those per-id counts.
When you use count(*) you count ALL the rows. The SUM() function is an aggregate function that returns the sum of all or distinct values in a set of values.
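To make the difference concrete, here is a small sketch using Python's sqlite3 module. The rows are made up so that the two filters disagree, and the inner count is aliased as cnt instead of a to keep it distinct from the table alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data chosen so that the two queries give different answers.
    CREATE TABLE table_1 (id INTEGER, key TEXT);
    CREATE TABLE table_2 (key TEXT);
    -- id 1 appears twice in table_1, but only one of its keys exists in table_2.
    -- id 2 appears once in table_1, but its key matches two rows in table_2.
    INSERT INTO table_1 VALUES (1, 'k1'), (1, 'k2'), (2, 'k3');
    INSERT INTO table_2 VALUES ('k1'), ('k3'), ('k3');
""")

# sql1: filter on how often the id occurs in table_1, then count joined rows.
sql1 = conn.execute("""
    SELECT COUNT(1)
    FROM table_1 a
    INNER JOIN table_2 b ON a.key = b.key
    WHERE a.id IN (SELECT id FROM table_1 GROUP BY id HAVING COUNT(1) > 1)
""").fetchone()[0]

# sql2: filter on how many joined rows each id has, then sum those counts.
sql2 = conn.execute("""
    SELECT SUM(cnt) FROM (
        SELECT COUNT(1) AS cnt
        FROM table_1 a
        INNER JOIN table_2 b ON a.key = b.key
        GROUP BY a.id
        HAVING COUNT(1) > 1
    ) t
""").fetchone()[0]

print(sql1, sql2)  # prints: 1 2
```

sql1 counts the single joined row of id 1, while sql2 counts the two joined rows of id 2, so the results differ even though both queries use the same join.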

SUB query execution strategy in Oracle on a table with huge number of tuple

We have a table with a good number of rows (150,000+), and each row has to be selected based on a subquery on another table. The values returned by the subquery are independent of the columns in this table. So, will Oracle run the subquery for every tuple?
Example
TableZ
id,
location
TableA (150K+)
name,
id,
type
TableB
type,
color
Query
select * from TableZ
join
(select name, id, type from TableA where type is null or type in
(select type from TableB where color='red')
) tblA_RED on TableZ.id=tblA_RED.id
My question is: how many times will the subquery select type from TableB where color='red' execute?
Typically a DB engine would process query (select type from TableB where color='red') only once and use the result to create an inline view equivalent for (select name, id, type from TableA where type is null or type in (select type from TableB where color='red') ) and finally execute the outer select by joining with TableZ.
You may want to add distinct to the query that picks up type from TableB, like so:
(select distinct type from TableB where color='red')
This may give slightly better performance.
The specific answer to your question is that Oracle should evaluate the subquery only once.
However, your query is phrased with unnecessary subqueries. You can start with:
select z.*, a.name, a.id, a.type
from TableZ z join
TableA a
on z.id = a.id
where a.type is null or a.type in (select b.type from TableB b where b.color = 'red');
This is unlikely to affect performance, but it simplifies what you are doing. Next, TableB does not appear to have duplicate values, so I would suggest:
select z.*, a.name, a.id, a.type
from TableZ z join
TableA a
on z.id = a.id left join
TableB b
on b.type = a.type
where b.color = 'red' or a.type is null;
Phrasing the query as a join often gives the optimizer more choice -- and more choice often means faster queries.
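A quick way to check that such a rewrite preserves the results is to run both forms on a small data set. The sketch below uses Python's sqlite3 module with invented rows; it says nothing about how Oracle would plan or time either query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows; the table and column names follow the question.
    CREATE TABLE TableZ (id INTEGER, location TEXT);
    CREATE TABLE TableA (name TEXT, id INTEGER, type TEXT);
    CREATE TABLE TableB (type TEXT, color TEXT);
    INSERT INTO TableZ VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO TableA VALUES ('n1', 1, 't1'), ('n2', 2, 't2'), ('n3', 3, NULL);
    INSERT INTO TableB VALUES ('t1', 'red'), ('t2', 'blue');
""")

# Original form: nested subqueries.
nested = conn.execute("""
    SELECT tblA_RED.name
    FROM TableZ
    JOIN (SELECT name, id, type FROM TableA
          WHERE type IS NULL
             OR type IN (SELECT type FROM TableB WHERE color = 'red')
         ) tblA_RED ON TableZ.id = tblA_RED.id
    ORDER BY tblA_RED.name
""").fetchall()

# Join form: relies on TableB having at most one row per type.
joined = conn.execute("""
    SELECT a.name
    FROM TableZ z
    JOIN TableA a ON z.id = a.id
    LEFT JOIN TableB b ON b.type = a.type
    WHERE b.color = 'red' OR a.type IS NULL
    ORDER BY a.name
""").fetchall()

print(nested, joined)
```

Both forms return n1 (a red type) and n3 (a NULL type) here. If TableB could contain the same type more than once, the join form would return duplicate rows, which is why the uniqueness assumption matters.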

MSSQL 2012 - Returning multiple columns in a subquery

I'd like to return multiple columns with a sub query.
E.G,
select a.name, a.age
from table1 a, ( select b.race, b.weight from table2 b where dateDiff(dd, b.date1, b.date2 ) < 30 )
where a.age > 24
Some of you have said "Just use a join" - I do not want the dateDiff in the subquery affecting the results of the parent query. Again, my real query is more complex than this, but this should be sufficient to explain my issue.
Use a LEFT JOIN to do this. Put each filter inside its own derived table; rows of the first table without a match then get NULL values for the second table's columns:
SELECT a.name, b.score, ...
FROM (select id, name, ... from table1 where ???) a
LEFT JOIN (select id, score, ... from table2 where ???) b on (a.id = b.id)
WHERE clause
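The pattern can be sketched with Python's sqlite3 module as follows. The id join column and the sample rows are assumptions, since the question shows no join key, and julianday() arithmetic stands in for SQL Server's dateDiff(dd, ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: the id column used for joining is an assumption.
    CREATE TABLE table1 (id INTEGER, name TEXT, age INTEGER);
    CREATE TABLE table2 (id INTEGER, race TEXT, weight INTEGER,
                         date1 TEXT, date2 TEXT);
    INSERT INTO table1 VALUES (1, 'Ann', 30), (2, 'Bob', 40), (3, 'Cy', 20);
    INSERT INTO table2 VALUES (1, 'r1', 70, '2020-01-01', '2020-01-10'),
                              (2, 'r2', 80, '2020-01-01', '2020-06-01');
""")

# The date filter lives inside the right-hand derived table, so it can only
# blank out race/weight; it never removes a row of table1.
rows = conn.execute("""
    SELECT a.name, a.age, b.race, b.weight
    FROM (SELECT id, name, age FROM table1 WHERE age > 24) a
    LEFT JOIN (SELECT id, race, weight
               FROM table2
               WHERE julianday(date2) - julianday(date1) < 30) b
           ON a.id = b.id
    ORDER BY a.id
""").fetchall()

print(rows)
```

Bob is still returned even though his table2 row fails the 30-day filter; his race and weight simply come back as NULL.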

LEFT JOIN - How to join tables and include extra row even if you have right match

I have two tables
Table A
-------
ID
ProductName
Table B
-------
ID
ProductID
Size
I want to join these two tables
SELECT * FROM
(SELECT * FROM A)
LEFT JOIN
(SELECT * FROM B)
ON A.ID = B.ProductID
This is easy, I will get all rows from A multiplied by rows matched in B, and NULL fields if there is no match.
But here comes the tricky question, how can I get all rows from A with NULL fields for table B, even if there is a match, so I get an extra line with NULL values plus all the matches?
SELECT A.*
, B3.ID
, B3.ProductID
, B3.Size
FROM A
LEFT JOIN
(
SELECT ProductID as MatchID
, ID
, ProductID
, Size
FROM B
UNION ALL
SELECT ID
, null
, null
, null
FROM A A2
) B3
ON A.ID = B3.MatchID
Instead of using UNION ALL in a subquery as suggested by others, you could also (and I would) use UNION ALL at the outer level, which keeps the query simpler:
SELECT A.ID, A.ProductName, B.ID, B.Size
FROM A
INNER JOIN B
ON B.ProductID = A.ID
UNION ALL
SELECT A.ID, A.ProductName, NULL, NULL
FROM A
Since every join is going to be successful, we can switch to a full/inner join:
SELECT
*
FROM
A
INNER JOIN
(SELECT ID,ProductID,Size FROM B
UNION ALL
SELECT NULL,ID,NULL FROM A) B
ON
A.ID = B.ProductID
Now would be a very good time to switch to naming columns explicitly, rather than using SELECT *
Or, if, as per @Andomar's comment, you need all of the B columns to be NULL:
SELECT
A.ID,A.ProductName,
B.ID,B.ProductID,B.Size
FROM
A
INNER JOIN
(SELECT ID,ProductID,Size,ProductID as MatchID FROM B
UNION ALL
SELECT NULL,NULL,NULL,ID FROM A) B
ON
A.ID = B.MatchID
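For a concrete picture of the expected output, here is the outer-level UNION ALL variant run through Python's sqlite3 module on a few invented rows. The column aliases and the ORDER BY are added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows; the table layout follows the question.
    CREATE TABLE A (ID INTEGER, ProductName TEXT);
    CREATE TABLE B (ID INTEGER, ProductID INTEGER, Size TEXT);
    INSERT INTO A VALUES (1, 'Shirt'), (2, 'Hat');
    INSERT INTO B VALUES (10, 1, 'S'), (11, 1, 'M');
""")

# First branch: the real matches. Second branch: one extra all-NULL row per A.
rows = conn.execute("""
    SELECT A.ID AS a_id, A.ProductName, B.ID AS b_id, B.Size
    FROM A
    INNER JOIN B ON B.ProductID = A.ID
    UNION ALL
    SELECT A.ID, A.ProductName, NULL, NULL
    FROM A
    ORDER BY 1, 3
""").fetchall()

for row in rows:
    print(row)
```

The shirt, which has two sizes, yields three rows (one NULL row plus two matches), and the hat, which has no sizes, yields only its NULL row.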

SQL query to find record with ID not in another table

I have two tables in a database that share a primary key, and I want to find the rows of one that have no counterpart in the other. For example,
Table1 has columns (ID, Name) and sample data: (1 ,John), (2, Peter), (3, Mary)
Table2 has columns (ID, Address) and sample data: (1, address2), (2, address2)
So how do I create a SQL query that fetches the rows from Table1 whose ID is not in Table2? In this case, (3, Mary) should be returned.
PS: The ID is the primary key for those two tables.
Try this
SELECT ID, Name
FROM Table1
WHERE ID NOT IN (SELECT ID FROM Table2)
Use LEFT JOIN
SELECT a.*
FROM table1 a
LEFT JOIN table2 b
on a.ID = b.ID
WHERE b.id IS NULL
There are basically 3 approaches to that: not exists, not in and left join / is null.
LEFT JOIN with IS NULL
SELECT l.*
FROM t_left l
LEFT JOIN
t_right r
ON r.value = l.value
WHERE r.value IS NULL
NOT IN
SELECT l.*
FROM t_left l
WHERE l.value NOT IN
(
SELECT value
FROM t_right r
)
NOT EXISTS
SELECT l.*
FROM t_left l
WHERE NOT EXISTS
(
SELECT NULL
FROM t_right r
WHERE r.value = l.value
)
Which one is better? The answer is best broken down by RDBMS vendor. Generally speaking, one should be cautious with select ... where ... in (select ...) when the number of records in the subquery is unknown. Some vendors limit the size of an IN list; Oracle, for example, limits a literal list of expressions to 1,000 entries, although that limit does not apply to a subquery. The best thing to do is to try all three and compare the execution plans.
Specifically for PostgreSQL, the execution plans of NOT EXISTS and LEFT JOIN / IS NULL are the same. I personally prefer the NOT EXISTS option because it shows the intent better. After all, the semantics are that you want to find the records in A whose primary key does not exist in B.
Old but still gold, specific to PostgreSQL though: https://explainextended.com/2009/09/16/not-in-vs-not-exists-vs-left-join-is-null-postgresql/
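Besides performance, the three forms differ in how they treat NULL. The sketch below, using Python's sqlite3 module and made-up values, runs all three twice: once on clean data and once after a NULL is added to t_right. With a non-nullable primary key, as in the question, the NULL case cannot arise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data; table and column names follow the answer above.
    CREATE TABLE t_left  (value INTEGER);
    CREATE TABLE t_right (value INTEGER);
    INSERT INTO t_left  VALUES (1), (2), (3);
    INSERT INTO t_right VALUES (1), (2);
""")

queries = {
    "left_join": """SELECT l.value FROM t_left l
                    LEFT JOIN t_right r ON r.value = l.value
                    WHERE r.value IS NULL ORDER BY l.value""",
    "not_in": """SELECT l.value FROM t_left l
                 WHERE l.value NOT IN (SELECT value FROM t_right)
                 ORDER BY l.value""",
    "not_exists": """SELECT l.value FROM t_left l
                     WHERE NOT EXISTS (SELECT NULL FROM t_right r
                                       WHERE r.value = l.value)
                     ORDER BY l.value""",
}

def run_all():
    """Run every anti-join variant and collect its result rows by name."""
    return {name: conn.execute(sql).fetchall() for name, sql in queries.items()}

without_null = run_all()

# A single NULL on the right-hand side changes what NOT IN returns.
conn.execute("INSERT INTO t_right VALUES (NULL)")
with_null = run_all()

print(without_null)
print(with_null)
```

On clean data all three return the value 3. Once t_right contains a NULL, NOT IN returns no rows at all, because 3 NOT IN (1, 2, NULL) evaluates to unknown rather than true, while the other two forms still return 3.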
Fast Alternative
I ran some tests (on Postgres 9.5) using two tables with ~2M rows each. The query below performed at least 5 times better than the other queries proposed:
-- Count
SELECT count(*) FROM (
(SELECT id FROM table1) EXCEPT (SELECT id FROM table2)
) t1_not_in_t2;
-- Get full row
SELECT table1.* FROM (
(SELECT id FROM table1) EXCEPT (SELECT id FROM table2)
) t1_not_in_t2 JOIN table1 ON t1_not_in_t2.id=table1.id;
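The EXCEPT form can be checked on the question's sample data with Python's sqlite3 module, as sketched below. SQLite does not accept parentheses around the operands of EXCEPT, so they are dropped here, and the example only checks the result, not the timing claim above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample data from the question; lower-case names follow the EXCEPT answer.
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, address TEXT);
    INSERT INTO table1 VALUES (1, 'John'), (2, 'Peter'), (3, 'Mary');
    INSERT INTO table2 VALUES (1, 'address2'), (2, 'address2');
""")

# Count of ids that exist only in table1.
missing_count = conn.execute("""
    SELECT count(*) FROM (
        SELECT id FROM table1 EXCEPT SELECT id FROM table2
    ) t1_not_in_t2
""").fetchone()[0]

# Join the id set back to table1 to recover the full rows.
missing_rows = conn.execute("""
    SELECT table1.* FROM (
        SELECT id FROM table1 EXCEPT SELECT id FROM table2
    ) t1_not_in_t2
    JOIN table1 ON t1_not_in_t2.id = table1.id
""").fetchall()

print(missing_count, missing_rows)  # prints: 1 [(3, 'Mary')]
```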
Keeping in mind the points made in @John Woo's comment/link above, this is how I typically would handle it:
SELECT t1.ID, t1.Name
FROM Table1 t1
WHERE NOT EXISTS (
SELECT TOP 1 NULL
FROM Table2 t2
WHERE t1.ID = t2.ID
)
-- For count
SELECT COUNT(ID) FROM tblA a
WHERE a.ID NOT IN (SELECT b.ID FROM tblB b);
-- For results
SELECT ID FROM tblA a
WHERE a.ID NOT IN (SELECT b.ID FROM tblB b);