I have two tables, A and B that I want to inner join on location. However, for each row in A, there are many rows in B whose location matches. I want to end up with at most the same number of rows as in A. Specifically, I want to take the row in B where date is earliest. Here's what I have so far:
SELECT *
FROM A
INNER JOIN B ON A.location = B.location
How would I modify this so that each row in A only gets joined with a single row in B (using the earliest date)?
Attempt:
SELECT *
FROM A
INNER JOIN B ON A.location = B.location
AND B.date = (SELECT MIN(date) FROM B)
Is that the right approach?
You can use the ANSI/ISO standard row_number() function:
SELECT *
FROM A INNER JOIN
(SELECT B.*, ROW_NUMBER() OVER (PARTITION BY B.location ORDER BY B.date) as seqnum
FROM B
) B
ON A.location = B.location AND seqnum = 1;
SELECT TOP(1) * FROM A
INNER JOIN B ON
A.LOCATION=B.LOCATION
ORDER BY B.DATE
Related
I have two tables A (group_id, id, subject) and B (id, date). Below is the joint table of tables A and B on id. I have tried using distinct and partition to remove the duplicates in group_id(field) only, but no luck:
My code:
select
a.group_id, a.id, a.subject, b.date
from
A a
inner join
(select
b.*,
row_number() over (partition by group_id order by date asc) as seqnum
from
B b) b on a.id = b.id and seqnum = 1
order by
date desc;
I got this error when I ran the code:
Partitioning can not be used stand-alone in query near 'partition by group_id order by date asc) as seqnum from B' at line 1
This is my expected result:
Thank you in advance!
It looks like you want the earliest date for each row in the table you show. Your question mentions two tables, but you only show one.
I recommend a correlated subquery in most databases:
select b.*
from b
where b.date = (select min(b2.date)
from b b2
where b2.group_id = b.group_id
);
I see. You need to join first and then use row_number():
select ab.*
from (select a.group_id, a.id, a.subject, b.date,
row_number() over (partition by a.group_id order by b.date) as seqnum
from A a join
B b
on a.id = b.id
) ab
where seqnum = 1
order by date desc;
You are almost there. But the column that you try to use to partition (ie group_id) comes from table a, which is not available in the subquery.
You would need to JOIN and assign the row number in a subquery, and then filter in the outer query.
select *
from (
select
a.group_id,
a.id,
a.subject,
b.date,
row_number() over (partition by a.group_id order by b.date asc) as seqnum
from a
inner join b on ON a.id = b.id
)
where seqnum = 1
ORDER BY date desc;
Another way to achieve your goal though it may not be the efficient one
SELECT
A.group_id, A.id, B.Date, A.subject
FROM A
INNER JOIN B
ON A.Id = B.Id
INNER JOIN
(
SELECT
A.Group_id, MIN(B.Date) AS Date
FROM A
INNER JOIN B
ON A.Id = B.Id
GROUP BY A.group_id
) AS supportTable
ON A.group_id = supportTable.group_id
AND B.Date = supportTable.Date
I have a oracle database and I'm trying to query data in table1 and inner join with another table2 where one of the columns(date) is equal to the most recent date and another column in table2(built) is equal to 'yes'. This query below is not picking up the where function and can't pinpoint why
SELECT id, b, c, d
FROM table1 a
INNER JOIN table2 b on b.id = a.id
WHERE b.date =(SELECT MAX(date) FROM table2) AND b.built = 'yes'
Actual query
SELECT m_tp_str, m_tp_trn, m_tp_dte, m_tp_buy, m_tp_qtyeq, m_tp_nom, m_instr,
m_tp_p, m_tp_status2
FROM HA_PRD_DM.TP_ALL_REP a INNER JOIN HA_PRD_DM.UDF_CURR_REP b
ON a.m_udf_ref2 = b.m_nb
WHERE b.m_rep_date2 = (SELECT MAX(c.m_rep_date2) FROM HA_PRD_DM.UDF_CURR_REP c)
AND b.m_purpose = 'yes'
You can do this using analytic functions:
SELECT id, b, c, d
FROM table1 a INNER JOIN
(SELECT b.*, MAX(date) OVER (PARTITION BY b.id) as max_date
FROM table2 b
WHERE built = 'yes'
) b
ON b.id = a.id AND b.max_date = b.date;
I have 3 tables like
A. (Aid,person)
B. (Bid,event,InsertDate)
C. (Cid,Aid,Bid)
now I need to get last recent event base on B.InsertDate desc from joined B.event and A.Person (last event of each person)
I tried join but that make multiple A.Person and B.event.
Can you guide me please?
Update:
for now I just add another lastUpdate column to person table and update that for each insert in Event table and made that equal to InsertDate. so my query is like:
SELECT
A.person, B.event
from tableA A
join tableC C
on A.Aid = C.Aid
join tableB B
on B.Bid = C.Bid and B.InsertDate = A.lastUpdate
Try this:
SELECT
A.person,
B.event,
MAX(B.InsertDate)
FROM
A
JOIN C ON A.Aid = C.Aid
JOIN B ON B.Bid = C.Bid
GROUP BY
A.person,
B.event
May be something lyk this
WITH cte
AS (SELECT Row_number() OVER (partition BY person ORDER BY B.InsertDate DESC) rn,
A.person,
B.event,
B.InsertDate AS LastEventDate
FROM B
JOIN (SELECT B.event,
Max(B.InsertDate) InsertDate
FROM B
GROUP BY B.event) sub
ON sub.event = B.event
AND sub.InsertDate = b.InsertDate
JOIN A
ON A.Bid = C.Bid
JOIN C
ON B.Bid = C.Bid)
SELECT *
FROM cte
WHERE rn = 1
I have tables a, b, c, and d whereby:
There are 0 or more b rows for each a row
There are 0 or more c rows for each a row
There are 0 or more d rows for each a row
If I try a query like the following:
SELECT a.id, SUM(b.debit), SUM(c.credit), SUM(d.other)
FROM a
LEFT JOIN b on a.id = b.a_id
LEFT JOIN c on a.id = c.a_id
LEFT JOIN d on a.id = d.a_id
GROUP BY a.id
I notice that I have created a cartesian product and therefore my sums are incorrect (much too large).
I see that there are other SO questions and answers, however I'm still not grasping how I can accomplish what I want to do in a single query. Is it possible in SQL to write a query which aggregates all of the following data:
SELECT a.id, SUM(b.debit)
FROM a
LEFT JOIN b on a.id = b.a_id
GROUP BY a.id
SELECT a.id, SUM(c.credit)
FROM a
LEFT JOIN c on a.id = c.a_id
GROUP BY a.id
SELECT a.id, SUM(d.other)
FROM a
LEFT JOIN d on a.id = d.a_id
GROUP BY a.id
in a single query?
Your analysis is correct. Unrelated JOIN create cartesian products.
You have to do the sums separately and then do a final addition. This is doable in one query and you have several options for that:
Sub-requests in your SELECT: SELECT a.id, (SELECT SUM(b.debit) FROM b WHERE b.a_id = a.id) + ...
CROSS APPLY with a similar query as the first bullet then SELECT a.id, b_sum + c_sum + d_sum
UNION ALL as you suggested with an outer SUM and GROUP BY on top of that.
LEFT JOIN to similar subqueries as above.
And probably more... The performance of the various solutions might be slightly different depending on how many rows in A you want to select.
SELECT a.ID, debit, credit, other
FROM a
LEFT JOIN (SELECT a_id, SUM(b.debit) as debit
FROM b
GROUP BY a_id) b ON a.ID = b.a_id
LEFT JOIN (SELECT a_id, SUM(b.credit) as credit
FROM c
GROUP BY a_id) c ON a.ID = c.a_id
LEFT JOIN (SELECT a_id, SUM(b.other) as other
FROM d
GROUP BY a_id) d ON a.ID = d.a_id
Can also be done with correlated subqueries:
SELECT a.id
, (SELECT SUM(debit) FROM b WHERE a.id = b.a_id)
, (SELECT SUM(credit) FROM c WHERE a.id = c.a_id)
, (SELECT SUM(other) FROM d WHERE a.id = d.a_id)
FROM a
I can't get the syntax right for aliasing the derived table correctly:
SELECT * FROM
(SELECT a.*, b.*
FROM a INNER JOIN b ON a.B_id = b.B_id
WHERE a.flag IS NULL AND b.date < NOW()
UNION
SELECT a.*, b.*
FROM a INNER JOIN b ON a.B_id = b.B_id
INNER JOIN c ON a.C_id = c.C_id
WHERE a.flag IS NOT NULL AND c.date < NOW())
AS t1
ORDER BY RAND() LIMIT 1
I'm getting a Duplicate column name of B_id. Any suggestions?
The problem isn't the union, it's the select a.*, b.* in each of the inner select statements - since a and b both have B_id columns, that means you have two B_id cols in the result.
You can fix that by changing the selects to something like:
select a.*, b.col_1, b.col_2 -- repeat for columns of b you need
In general, I'd avoid using select table1.* in queries you're using from code (rather than just interactive queries). If someone adds a column to the table, various queries can suddenly stop working.
In your derived table, you are retrieving the column id that exists in table a and table b, so you need to choose one of them or give an alias to them:
SELECT * FROM
(SELECT a.*, b.[all columns except id]
FROM a INNER JOIN b ON a.B_id = b.B_id
WHERE a.flag IS NULL AND b.date < NOW()
UNION
SELECT a.*, b.[all columns except id]
FROM a INNER JOIN b ON a.B_id = b.B_id
INNER JOIN c ON a.C_id = c.C_id
WHERE a.flag IS NOT NULL AND c.date < NOW())
AS t1
ORDER BY RAND() LIMIT 1
First, you could use UNION ALL instead of UNION. The two subqueries will have no common rows because of the excluding condtion on a.flag.
Another way you could write it, is:
SELECT a.*, b.*
FROM a
INNER JOIN b
ON a.B_id = b.B_id
WHERE ( a.flag IS NULL
AND b.date < NOW()
)
OR
( a.flag IS NOT NULL
AND EXISTS
( SELECT *
FROM c
WHERE a.C_id = c.C_id
AND c.date < NOW()
)
)
ORDER BY RAND()
LIMIT 1