SQL query: inner join and WHERE on the second table

I have an Oracle database and I'm trying to query data in table1 and inner join it with table2, where one column of table2 (date) must equal the most recent date and another column of table2 (built) must equal 'yes'. The query below does not seem to apply the WHERE clause, and I can't pinpoint why:
SELECT a.id, b, c, d
FROM table1 a
INNER JOIN table2 b on b.id = a.id
WHERE b.date = (SELECT MAX(date) FROM table2) AND b.built = 'yes'
Actual query:
SELECT m_tp_str, m_tp_trn, m_tp_dte, m_tp_buy, m_tp_qtyeq, m_tp_nom, m_instr,
m_tp_p, m_tp_status2
FROM HA_PRD_DM.TP_ALL_REP a INNER JOIN HA_PRD_DM.UDF_CURR_REP b
ON a.m_udf_ref2 = b.m_nb
WHERE b.m_rep_date2 = (SELECT MAX(c.m_rep_date2) FROM HA_PRD_DM.UDF_CURR_REP c)
AND b.m_purpose = 'yes'

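The subquery in your WHERE clause computes one MAX(date) over all of table2, ignoring both built and id. Only rows carrying that single global date can pass, and if the row holding it is not built = 'yes', nothing passes at all. You can compute the latest built date per id with analytic functions instead:
SELECT a.id, b, c, d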
FROM table1 a INNER JOIN
(SELECT b.*, MAX(date) OVER (PARTITION BY b.id) as max_date
FROM table2 b
WHERE built = 'yes'
) b
ON b.id = a.id AND b.max_date = b.date;
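To see why the global MAX filters everything out while the analytic version does not, here is a minimal sketch using Python's built-in sqlite3 module, with an in-memory SQLite database standing in for Oracle (window functions need SQLite 3.25 or later). The tables, rows, and the column name dt (used instead of the reserved word date) are hypothetical.

```python
import sqlite3

# Hypothetical toy data: several dated rows per id in table2, only some built.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, c TEXT);
CREATE TABLE table2 (id INTEGER, dt TEXT, built TEXT);
INSERT INTO table1 VALUES (1, 'x'), (2, 'y');
INSERT INTO table2 VALUES
  (1, '2024-01-01', 'yes'),
  (1, '2024-02-01', 'yes'),
  (2, '2024-01-15', 'yes'),
  (2, '2024-03-01', 'no');  -- globally latest row, but not built
""")

# Original approach: a single MAX over the whole of table2.
global_max = conn.execute("""
SELECT a.id, b.dt
FROM table1 a INNER JOIN table2 b ON b.id = a.id
WHERE b.dt = (SELECT MAX(dt) FROM table2) AND b.built = 'yes'
""").fetchall()

# Analytic approach: WHERE runs first, then MAX is taken per id over built rows.
per_id_max = conn.execute("""
SELECT a.id, b.dt
FROM table1 a INNER JOIN
  (SELECT b.*, MAX(dt) OVER (PARTITION BY b.id) AS max_dt
   FROM table2 b
   WHERE built = 'yes') b
  ON b.id = a.id AND b.max_dt = b.dt
ORDER BY a.id
""").fetchall()

print(global_max)  # []
print(per_id_max)  # [(1, '2024-02-01'), (2, '2024-01-15')]
```

The global date here is 2024-03-01, which only exists on an unbuilt row, so the first query returns nothing.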

Related

SQL: Modifying Inner Join to Select One Row

I have two tables, A and B that I want to inner join on location. However, for each row in A, there are many rows in B whose location matches. I want to end up with at most the same number of rows as in A. Specifically, I want to take the row in B where date is earliest. Here's what I have so far:
SELECT *
FROM A
INNER JOIN B ON A.location = B.location
How would I modify this so that each row in A only gets joined with a single row in B (using the earliest date)?
Attempt:
SELECT *
FROM A
INNER JOIN B ON A.location = B.location
AND B.date = (SELECT MIN(date) FROM B)
Is that the right approach?
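Not quite: (SELECT MIN(date) FROM B) is the earliest date in all of B, not per location, so most rows of A will find no match. You can use the ANSI/ISO standard ROW_NUMBER() function instead: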
SELECT *
FROM A INNER JOIN
(SELECT B.*, ROW_NUMBER() OVER (PARTITION BY B.location ORDER BY B.date) as seqnum
FROM B
) B
ON A.location = B.location AND seqnum = 1;
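A minimal sketch contrasting the attempt (global MIN) with the ROW_NUMBER() answer, assuming Python's sqlite3 module with SQLite 3.25+ for window functions. The data is hypothetical, and the date column is called dt here.

```python
import sqlite3

# Hypothetical data: one A row per location, several dated B rows per location.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (location TEXT, name TEXT);
CREATE TABLE B (location TEXT, dt TEXT, val INTEGER);
INSERT INTO A VALUES ('north', 'n'), ('south', 's');
INSERT INTO B VALUES
  ('north', '2024-05-01', 10),
  ('north', '2024-01-01', 11),  -- earliest for north (and overall)
  ('south', '2024-03-01', 20),  -- earliest for south
  ('south', '2024-04-01', 21);
""")

# The attempt: MIN is computed over all of B, so only 'north' can match.
attempt = conn.execute("""
SELECT A.location, B.dt
FROM A INNER JOIN B
  ON A.location = B.location AND B.dt = (SELECT MIN(dt) FROM B)
""").fetchall()

# ROW_NUMBER restarts at 1 for each location, ordered by date.
rows = conn.execute("""
SELECT A.location, B.dt, B.val
FROM A INNER JOIN
  (SELECT B.*, ROW_NUMBER() OVER (PARTITION BY B.location ORDER BY B.dt) AS seqnum
   FROM B) B
  ON A.location = B.location AND seqnum = 1
ORDER BY A.location
""").fetchall()

print(attempt)  # [('north', '2024-01-01')]
print(rows)     # [('north', '2024-01-01', 11), ('south', '2024-03-01', 20)]
```

If two B rows tie on the earliest date, ROW_NUMBER() keeps one of them arbitrarily; add more ORDER BY columns if the choice matters.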
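On SQL Server you could also use TOP(1), but note that this returns a single row in total (the earliest match across all locations), not one row per row of A:
SELECT TOP(1) * FROM A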
INNER JOIN B ON
A.LOCATION=B.LOCATION
ORDER BY B.DATE

SQL summations with multiple outer joins

I have tables a, b, c, and d whereby:
There are 0 or more b rows for each a row
There are 0 or more c rows for each a row
There are 0 or more d rows for each a row
If I try a query like the following:
SELECT a.id, SUM(b.debit), SUM(c.credit), SUM(d.other)
FROM a
LEFT JOIN b on a.id = b.a_id
LEFT JOIN c on a.id = c.a_id
LEFT JOIN d on a.id = d.a_id
GROUP BY a.id
I notice that I have created a cartesian product and therefore my sums are incorrect (much too large).
I see that there are other SO questions and answers, however I'm still not grasping how I can accomplish what I want to do in a single query. Is it possible in SQL to write a query which aggregates all of the following data:
SELECT a.id, SUM(b.debit)
FROM a
LEFT JOIN b on a.id = b.a_id
GROUP BY a.id
SELECT a.id, SUM(c.credit)
FROM a
LEFT JOIN c on a.id = c.a_id
GROUP BY a.id
SELECT a.id, SUM(d.other)
FROM a
LEFT JOIN d on a.id = d.a_id
GROUP BY a.id
in a single query?
Your analysis is correct. Joining unrelated tables creates Cartesian products.
You have to do the sums separately and then do a final addition. This is doable in one query and you have several options for that:
- Subqueries in your SELECT: SELECT a.id, (SELECT SUM(b.debit) FROM b WHERE b.a_id = a.id) + ...
- CROSS APPLY with a query similar to the first bullet, then SELECT a.id, b_sum + c_sum + d_sum
- UNION ALL as you suggested, with an outer SUM and GROUP BY on top of that.
- LEFT JOIN to similar subqueries as above.
And probably more... The performance of the various solutions might be slightly different depending on how many rows in A you want to select.
SELECT a.ID, debit, credit, other
FROM a
LEFT JOIN (SELECT a_id, SUM(b.debit) as debit
FROM b
GROUP BY a_id) b ON a.ID = b.a_id
LEFT JOIN (SELECT a_id, SUM(c.credit) as credit
FROM c
GROUP BY a_id) c ON a.ID = c.a_id
LEFT JOIN (SELECT a_id, SUM(d.other) as other
FROM d
GROUP BY a_id) d ON a.ID = d.a_id
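A minimal sketch of the fan-out problem and the pre-aggregated fix, using Python's sqlite3 module and hypothetical rows. With 2 b rows, 3 c rows and 1 d row for the same a.id, the naive join produces 2 × 3 × 1 = 6 rows for that id, which inflates every sum.

```python
import sqlite3

# Hypothetical data: a.id = 1 has 2 b rows, 3 c rows, 1 d row; a.id = 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY);
CREATE TABLE b (a_id INTEGER, debit INTEGER);
CREATE TABLE c (a_id INTEGER, credit INTEGER);
CREATE TABLE d (a_id INTEGER, other INTEGER);
INSERT INTO a VALUES (1), (2);
INSERT INTO b VALUES (1, 10), (1, 20);
INSERT INTO c VALUES (1, 5), (1, 5), (1, 5);
INSERT INTO d VALUES (1, 7);
""")

# Joining all child tables at once repeats each child row for every combination.
naive = conn.execute("""
SELECT a.id, SUM(b.debit), SUM(c.credit), SUM(d.other)
FROM a
LEFT JOIN b ON a.id = b.a_id
LEFT JOIN c ON a.id = c.a_id
LEFT JOIN d ON a.id = d.a_id
GROUP BY a.id ORDER BY a.id
""").fetchall()

# Aggregating each child table first leaves at most one row per a_id to join.
pre_aggregated = conn.execute("""
SELECT a.id, debit, credit, other
FROM a
LEFT JOIN (SELECT a_id, SUM(debit) AS debit FROM b GROUP BY a_id) b ON a.id = b.a_id
LEFT JOIN (SELECT a_id, SUM(credit) AS credit FROM c GROUP BY a_id) c ON a.id = c.a_id
LEFT JOIN (SELECT a_id, SUM(other) AS other FROM d GROUP BY a_id) d ON a.id = d.a_id
ORDER BY a.id
""").fetchall()

print(naive)           # [(1, 90, 30, 42), (2, None, None, None)]
print(pre_aggregated)  # [(1, 30, 15, 7), (2, None, None, None)]
```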
Can also be done with correlated subqueries:
SELECT a.id
, (SELECT SUM(debit) FROM b WHERE a.id = b.a_id)
, (SELECT SUM(credit) FROM c WHERE a.id = c.a_id)
, (SELECT SUM(other) FROM d WHERE a.id = d.a_id)
FROM a
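The correlated-subquery version avoids fan-out because each scalar subquery only ever sees its own table. A minimal sketch with Python's sqlite3 module and hypothetical data; the COALESCE wrapper is an extra, not part of the answer above, that reports 0 instead of NULL when an a row has no child rows.

```python
import sqlite3

# Hypothetical data: a.id = 1 has child rows in b, c and d; a.id = 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY);
CREATE TABLE b (a_id INTEGER, debit INTEGER);
CREATE TABLE c (a_id INTEGER, credit INTEGER);
CREATE TABLE d (a_id INTEGER, other INTEGER);
INSERT INTO a VALUES (1), (2);
INSERT INTO b VALUES (1, 10), (1, 20);
INSERT INTO c VALUES (1, 5), (1, 5), (1, 5);
INSERT INTO d VALUES (1, 7);
""")

# One scalar subquery per child table, each correlated on a.id.
rows = conn.execute("""
SELECT a.id,
       COALESCE((SELECT SUM(debit)  FROM b WHERE a.id = b.a_id), 0) AS debit,
       COALESCE((SELECT SUM(credit) FROM c WHERE a.id = c.a_id), 0) AS credit,
       COALESCE((SELECT SUM(other)  FROM d WHERE a.id = d.a_id), 0) AS other
FROM a
ORDER BY a.id
""").fetchall()

print(rows)  # [(1, 30, 15, 7), (2, 0, 0, 0)]
```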

sql - multiple layers of correlated subqueries

I have tables A, B and C.
I want to return all entries in table A that do not exist in table B and, of that list, do not exist in table C.
select * from table_A as a
where not exists (select 1 from table_B as b
where a.id = b.id)
This gives me the entries in A that are not in B. But now I want only those entries of this result that are also not in C.
I tried flavours of:
select * from table_A as a
where not exists (select 1 from table_B as b
where a.id = b.id)
AND
where not exists (select 1 from table_C as c
where a.id = c.id)
But that isn't the correct logic. Is there a way to store the results of the first query and then select * from that result where the entries do not exist in table C? I'm not sure how to do that. I appreciate the help.
Try this:
select * from (
select a.*, b.id as b_id, c.id as c_id
from table_A as a
left outer join table_B as b on a.id = b.id
left outer join table_C as c on c.id = a.id
) T
where b_id is null
and c_id is null
Another implementation is this:
select a1.*
from table_A as a1
inner join (
select id from table_A
except
select id from table_B
except
select id from table_C
) as a2 on a1.id = a2.id
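A minimal sketch of the EXCEPT chain, run through Python's sqlite3 module on hypothetical rows. EXCEPT is evaluated left to right, so the inner query yields the ids in A, minus those in B, minus those in C; each branch selects its own unqualified id column.

```python
import sqlite3

# Hypothetical data: id 1 exists in B, id 3 exists in C, ids 2 and 4 are in neither.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_A (id INTEGER, name TEXT);
CREATE TABLE table_B (id INTEGER);
CREATE TABLE table_C (id INTEGER);
INSERT INTO table_A VALUES (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four');
INSERT INTO table_B VALUES (1);
INSERT INTO table_C VALUES (3);
""")

rows = conn.execute("""
SELECT a1.*
FROM table_A AS a1
INNER JOIN (
    SELECT id FROM table_A
    EXCEPT
    SELECT id FROM table_B
    EXCEPT
    SELECT id FROM table_C
) AS a2 ON a1.id = a2.id
ORDER BY a1.id
""").fetchall()

print(rows)  # [(2, 'two'), (4, 'four')]
```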
Note the restrictions on the form of the sub-query as described here. The second implementation, by most succinctly and clearly describing the desired operation to SQL Server, is likely to be the most efficient.
You have two WHERE clauses in (the outer part of) your second query. That is not valid SQL. If you remove the second WHERE, it should work as expected:
select * from table_A as a
where not exists (select 1 from table_B as b
where a.id = b.id)
AND
not exists (select 1 from table_C as c -- WHERE removed
where a.id = c.id) ;
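A minimal sketch, assuming Python's sqlite3 module and hypothetical rows, showing that the question's `AND WHERE` form is rejected by the parser, while the corrected double NOT EXISTS and the LEFT JOIN ... IS NULL form return the same rows.

```python
import sqlite3

# Hypothetical data: id 1 exists in B, id 3 exists in C.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_A (id INTEGER, name TEXT);
CREATE TABLE table_B (id INTEGER);
CREATE TABLE table_C (id INTEGER);
INSERT INTO table_A VALUES (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four');
INSERT INTO table_B VALUES (1);
INSERT INTO table_C VALUES (3);
""")

# The question's version: a second WHERE after AND is a syntax error.
invalid_sql = """
SELECT * FROM table_A AS a
WHERE NOT EXISTS (SELECT 1 FROM table_B AS b WHERE a.id = b.id)
AND
WHERE NOT EXISTS (SELECT 1 FROM table_C AS c WHERE a.id = c.id)
"""
try:
    conn.execute(invalid_sql)
    rejected = False
except sqlite3.OperationalError:
    rejected = True  # SQLite reports a syntax error near "WHERE"

# Corrected: two NOT EXISTS predicates joined by AND under one WHERE.
not_exists = conn.execute("""
SELECT * FROM table_A AS a
WHERE NOT EXISTS (SELECT 1 FROM table_B AS b WHERE a.id = b.id)
  AND NOT EXISTS (SELECT 1 FROM table_C AS c WHERE a.id = c.id)
ORDER BY a.id
""").fetchall()

# Anti-join form: keep A rows that found no partner in either table.
left_join = conn.execute("""
SELECT a.*
FROM table_A a
LEFT JOIN table_B b ON a.id = b.id
LEFT JOIN table_C c ON a.id = c.id
WHERE b.id IS NULL AND c.id IS NULL
ORDER BY a.id
""").fetchall()

print(rejected)    # True
print(not_exists)  # [(2, 'two'), (4, 'four')]
print(left_join)   # [(2, 'two'), (4, 'four')]
```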
Tested in SQL-Fiddle (thanks @Alexander)
How about using LEFT JOIN:
SELECT a.*
FROM TableA a
LEFT JOIN TableB b
ON a.ID = b.ID
LEFT JOIN TableC c
ON a.ID = c.ID
WHERE b.ID IS NULL AND
c.ID IS NULL
One more option, using the NOT EXISTS operator:
SELECT *
FROM dbo.test71 a
WHERE NOT EXISTS(
SELECT 1
FROM (SELECT b.ID
FROM dbo.test72 b
UNION ALL
SELECT c.ID
FROM dbo.test73 c) x
WHERE a.ID = x.ID
)
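A minimal sketch of the NOT EXISTS over a UNION ALL derived table, using Python's sqlite3 module. The table names test71, test72 and test73 come from the answer; their rows, and dropping the dbo schema prefix (SQLite has no schemas of that kind), are assumptions for the demo.

```python
import sqlite3

# Hypothetical data: IDs 1 and 3 appear in test72/test73, IDs 2 and 4 do not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test71 (ID INTEGER);
CREATE TABLE test72 (ID INTEGER);
CREATE TABLE test73 (ID INTEGER);
INSERT INTO test71 VALUES (1), (2), (3), (4);
INSERT INTO test72 VALUES (1);
INSERT INTO test73 VALUES (3), (1);  -- duplicates across tables are harmless here
""")

# The derived table x stacks every ID from both tables; one NOT EXISTS checks both.
rows = conn.execute("""
SELECT *
FROM test71 a
WHERE NOT EXISTS (
    SELECT 1
    FROM (SELECT b.ID FROM test72 b
          UNION ALL
          SELECT c.ID FROM test73 c) x
    WHERE a.ID = x.ID
)
ORDER BY a.ID
""").fetchall()

print(rows)  # [(2,), (4,)]
```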
Option from @ypercube. Thanks for the present ;)
SELECT *
FROM dbo.test71 a
WHERE NOT EXISTS(
SELECT 1
FROM dbo.test72 b
WHERE a.ID = b.ID
UNION ALL
SELECT 1
FROM dbo.test73 c
WHERE a.ID = c.ID
);
I do not like NOT EXISTS, but if for some reason it seems more logical to you, you can alias your first query as a derived table and then apply another NOT EXISTS clause to it. Something like:
SELECT * FROM
( select * from tableA as a
where not exists (select 1 from tableB as b
where a.id = b.id) )
AS A_NOT_IN_B
WHERE NOT EXISTS (
SELECT 1 FROM tableC as c
WHERE c.id = A_NOT_IN_B.id
)

Getting MIN date

I have a table(A) that looks something like:
ID Date
1 2012/01/12
2 2012/01/01
3 2012/01/03
4 2012/03/12
If I wanted to grab the MIN date for this query, would I just group by?
select
a.ID,
MIN(a.DATE),
b.name,
c.price
FROM
tablea a inner join tableb b on a.ID = b.ID
inner join tablec c on b.ID = c.ID
You want a window function. The correct expression is:
select a.id,
min(a.date) over () as mindate,
b.name, c.price
. . .
This computes the minimum date over the whole result set: with no PARTITION BY inside OVER (), the window is all rows, so every row gets the same mindate.
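A minimal sketch of the difference between the window aggregate MIN() OVER () and a GROUP BY minimum, using Python's sqlite3 module (SQLite 3.25+) and the sample dates from the question. The joins to tableb and tablec are omitted, and the column is named dt here as an assumption.

```python
import sqlite3

# Sample rows from the question; YYYY/MM/DD text sorts chronologically.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablea (ID INTEGER, dt TEXT);
INSERT INTO tablea VALUES
  (1, '2012/01/12'), (2, '2012/01/01'), (3, '2012/01/03'), (4, '2012/03/12');
""")

# Empty OVER (): one window spanning all rows, so every row sees the global minimum.
windowed = conn.execute("""
SELECT a.ID, a.dt, MIN(a.dt) OVER () AS mindate
FROM tablea a
ORDER BY a.ID
""").fetchall()

# GROUP BY ID: one minimum per ID (here each ID has only its own date).
grouped = conn.execute("""
SELECT ID, MIN(dt) FROM tablea GROUP BY ID ORDER BY ID
""").fetchall()

print(windowed[0])  # (1, '2012/01/12', '2012/01/01')
print(grouped)      # [(1, '2012/01/12'), (2, '2012/01/01'), (3, '2012/01/03'), (4, '2012/03/12')]
```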
If you are looking for those that had the minimum date, then you can do this:
select
a.ID,
a.DATE,
b.name,
c.price
FROM tablea a
INNER JOIN
(
SELECT Id, MIN(Date) AS MinDate
FROM tablea
GROUP BY Id
) As minA ON a.date = mina.mindate AND a.id = mina.id
inner join tableb b on a.ID = b.ID
inner join tablec c on b.ID = c.ID
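Alternatively, rank the rows per ID with DENSE_RANK() in a CTE (rows that tie on the earliest date for an ID are all kept):
WITH recordList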
as
(
select a.ID,
a.DATE,
b.name,
c.price,
DENSE_RANK() OVER (PARTITION BY a.ID
ORDER BY a.Date ASC) rn
FROM tablea a
inner join tableb b on a.ID = b.ID
inner join tablec c on b.ID = c.ID
)
SELECT ID, DATE, name, Price
FROM recordList
WHERE rn = 1
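A minimal sketch of the DENSE_RANK() CTE, using Python's sqlite3 module (SQLite 3.25+) and hypothetical rows in which ID 1 has two rows on its earliest date. It also counts what ROW_NUMBER() would keep, to show the tie handling; the joins to tableb and tablec and the note column are simplifications for the demo.

```python
import sqlite3

# Hypothetical rows: ID 1 has a tie on its earliest date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablea (ID INTEGER, dt TEXT, note TEXT);
INSERT INTO tablea VALUES
  (1, '2012/01/12', 'late'),
  (1, '2012/01/02', 'early-x'),
  (1, '2012/01/02', 'early-y'),
  (2, '2012/03/12', 'only');
""")

# DENSE_RANK gives rank 1 to every row sharing the earliest date of its ID.
dense = conn.execute("""
WITH recordList AS (
  SELECT ID, dt, note,
         DENSE_RANK() OVER (PARTITION BY ID ORDER BY dt ASC) AS rn
  FROM tablea
)
SELECT ID, dt, note FROM recordList WHERE rn = 1 ORDER BY ID, note
""").fetchall()

# ROW_NUMBER numbers tied rows 1, 2, ... so exactly one row per ID survives.
row_number_count = conn.execute("""
SELECT COUNT(*) FROM (
  SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY dt) AS rn FROM tablea
) WHERE rn = 1
""").fetchone()[0]

print(dense)             # [(1, '2012/01/02', 'early-x'), (1, '2012/01/02', 'early-y'), (2, '2012/03/12', 'only')]
print(row_number_count)  # 2
```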

Aliasing derived table which is a union of two selects

I can't get the syntax right for aliasing the derived table correctly:
SELECT * FROM
(SELECT a.*, b.*
FROM a INNER JOIN b ON a.B_id = b.B_id
WHERE a.flag IS NULL AND b.date < NOW()
UNION
SELECT a.*, b.*
FROM a INNER JOIN b ON a.B_id = b.B_id
INNER JOIN c ON a.C_id = c.C_id
WHERE a.flag IS NOT NULL AND c.date < NOW())
AS t1
ORDER BY RAND() LIMIT 1
I'm getting a "Duplicate column name 'B_id'" error. Any suggestions?
The problem isn't the union, it's the select a.*, b.* in each of the inner select statements - since a and b both have B_id columns, that means you have two B_id cols in the result.
You can fix that by changing the selects to something like:
select a.*, b.col_1, b.col_2 -- repeat for columns of b you need
In general, I'd avoid using select table1.* in queries you're using from code (rather than just interactive queries). If someone adds a column to the table, various queries can suddenly stop working.
In your derived table, you are retrieving the column B_id, which exists in both table a and table b, so you need to choose one of them or give them aliases:
SELECT * FROM
(SELECT a.*, b.[all columns except B_id]
FROM a INNER JOIN b ON a.B_id = b.B_id
WHERE a.flag IS NULL AND b.date < NOW()
UNION
SELECT a.*, b.[all columns except B_id]
FROM a INNER JOIN b ON a.B_id = b.B_id
INNER JOIN c ON a.C_id = c.C_id
WHERE a.flag IS NOT NULL AND c.date < NOW())
AS t1
ORDER BY RAND() LIMIT 1
First, you could use UNION ALL instead of UNION. The two subqueries will have no common rows because of the mutually exclusive conditions on a.flag.
Another way to write it is:
SELECT a.*, b.*
FROM a
INNER JOIN b
ON a.B_id = b.B_id
WHERE ( a.flag IS NULL
AND b.date < NOW()
)
OR
( a.flag IS NOT NULL
AND EXISTS
( SELECT *
FROM c
WHERE a.C_id = c.C_id
AND c.date < NOW()
)
)
ORDER BY RAND()
LIMIT 1
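A minimal sketch checking that the UNION ALL form and the OR/EXISTS rewrite select the same rows, using Python's sqlite3 module. Assumptions: SQLite has no NOW(), so a fixed hypothetical :cutoff parameter stands in for it; ORDER BY RAND() LIMIT 1 is dropped so the full result can be compared; date columns are named dt; and only a.a_id is selected, which also sidesteps the duplicate B_id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE b (B_id INTEGER PRIMARY KEY, dt TEXT);
CREATE TABLE c (C_id INTEGER PRIMARY KEY, dt TEXT);
CREATE TABLE a (a_id INTEGER PRIMARY KEY, B_id INTEGER, C_id INTEGER, flag TEXT);
INSERT INTO b VALUES (10, '2024-01-01'), (20, '2024-12-01');
INSERT INTO c VALUES (100, '2024-01-01'), (200, '2024-12-01');
INSERT INTO a VALUES
  (1, 10, 100, NULL),  -- flag NULL, b date before cutoff: kept
  (2, 20, 100, NULL),  -- flag NULL, b date after cutoff: dropped
  (3, 20, 100, 'x'),   -- flag set, c date before cutoff: kept
  (4, 10, 200, 'x'),   -- flag set, c date after cutoff: dropped
  (5, 10, NULL, 'x');  -- flag set, no matching c row: dropped
""")
params = {"cutoff": "2024-06-01"}  # hypothetical stand-in for NOW()

union_version = conn.execute("""
SELECT a.a_id FROM a INNER JOIN b ON a.B_id = b.B_id
WHERE a.flag IS NULL AND b.dt < :cutoff
UNION ALL
SELECT a.a_id FROM a INNER JOIN b ON a.B_id = b.B_id
                     INNER JOIN c ON a.C_id = c.C_id
WHERE a.flag IS NOT NULL AND c.dt < :cutoff
ORDER BY 1
""", params).fetchall()

# One pass over a JOIN b; the c condition becomes a correlated EXISTS.
exists_version = conn.execute("""
SELECT a.a_id
FROM a INNER JOIN b ON a.B_id = b.B_id
WHERE (a.flag IS NULL AND b.dt < :cutoff)
   OR (a.flag IS NOT NULL
       AND EXISTS (SELECT * FROM c WHERE a.C_id = c.C_id AND c.dt < :cutoff))
ORDER BY a.a_id
""", params).fetchall()

print(union_version)   # [(1,), (3,)]
print(exists_version)  # [(1,), (3,)]
```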