MAX plus other data with JOIN - sql

This should be a simple one I reckon (at least I thought it would be when I started doing it a couple hours ago).
I am trying to select the MAX value from one table and join it with another to get the pertinent data.
I have two tables: ACCOUNTS & ACCOUNT_BALANCES
Here they are:
ACCOUNTS
ACC_ID | NAME | IMG_LOCATION
------------------------------------
0 | Cash | images/cash.png
500 | MyBank | images/mybank.png
and
ACCOUNT_BALANCES
ACC_ID | BALANCE | UPDATE_DATE
-------------------------------
500 | 100 | 2017-11-10
500 | 250 | 2018-01-11
0 | 100 | 2018-01-05
I would like the end result to look like:
ACC_ID | NAME | IMG_LOCATION | BALANCE | UPDATE_DATE
----------------------------------------------------------------
0 | Cash | images/cash.png | 100 | 2018-01-05
500 | MyBank | images/mybank.png | 250 | 2018-01-11
I thought I could select the MAX(UPDATE_DATE) from the ACCOUNT_BALANCES table, and join with the ACCOUNTS table to get the account name (as displayed above), but having to group by means my end result includes all records from the ACCOUNT_BALANCES table.
I can use this query to select only the records from ACCOUNT_BALANCES with the max UPDATE_DATE, but I can't include the balance.
SELECT
a.ACC_ID,
a.IMG_LOCATION,
a.NAME,
x.UDATE
FROM
ACCOUNTS a
RIGHT JOIN
(
SELECT
b.ACC_ID,
MAX(b.UPDATE_DATE) as UDATE
FROM
ACCOUNT_BALANCES b
GROUP BY
b.ACC_ID
) x
ON
a.ACC_ID = x.ACC_ID
If I include ACCOUNT_BALANCES.BALANCE in the above query (like so):
SELECT
a.ACC_ID,
a.IMG_LOCATION,
a.NAME,
x.BALANCE,
x.UDATE
FROM
ACCOUNTS a
RIGHT JOIN
(
SELECT
b.ACC_ID,
b.BALANCE,
MAX(b.UPDATE_DATE) as UDATE
FROM
ACCOUNT_BALANCES b
GROUP BY
b.ACC_ID, b.BALANCE
) x
ON
a.ACC_ID = x.ACC_ID
The results look like this:
ACC_ID | NAME | IMG_LOCATION | BALANCE | UPDATE_DATE
----------------------------------------------------------------
0 | Cash | images/cash.png | 100 | 2018-01-05
500 | MyBank | images/mybank.png | 100 | 2017-11-10
500 | MyBank | images/mybank.png | 250 | 2018-01-11
Which makes sense, since I'm grouping by BALANCE in the subquery.
I'm super hesitant to post this, as it seems like exactly the kind of question that's been answered n times, but I've searched around a lot and couldn't find anything that really helped me.
This one has a really good answer, but isn't exactly what I'm looking for
I'm obviously missing something really simple and even pointers in the right direction will help. Thank you.

Try this:
;WITH CTE
AS
(
SELECT
RN = ROW_NUMBER() OVER(PARTITION BY AC.ACC_ID ORDER BY AB.UPDATE_DATE DESC),
AC.ACC_ID,
NAME,
IMG_LOCATION,
BALANCE,
UPDATE_DATE
FROM ACCOUNTS AC
INNER JOIN ACCOUNT_BALANCES AB
ON AC.ACC_ID = AB.ACC_ID
)
SELECT
*
FROM CTE
WHERE RN = 1
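Here is a runnable sketch of the ROW_NUMBER() approach above. It uses Python's built-in sqlite3 module as a stand-in for SQL Server and assumes SQLite 3.25 or later, which is needed for window functions. The T-SQL `RN = ...` alias is rewritten as the standard `... AS RN`, and the leading `;` is dropped. The data is the sample data from the question.

```python
import sqlite3

# In-memory database loaded with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ACCOUNTS (ACC_ID INTEGER PRIMARY KEY, NAME TEXT, IMG_LOCATION TEXT);
CREATE TABLE ACCOUNT_BALANCES (ACC_ID INTEGER, BALANCE INTEGER, UPDATE_DATE TEXT);
INSERT INTO ACCOUNTS VALUES (0, 'Cash', 'images/cash.png'), (500, 'MyBank', 'images/mybank.png');
INSERT INTO ACCOUNT_BALANCES VALUES
  (500, 100, '2017-11-10'), (500, 250, '2018-01-11'), (0, 100, '2018-01-05');
""")

# Number each account's balances from newest (RN = 1) to oldest, then keep RN = 1.
rows = conn.execute("""
WITH CTE AS (
  SELECT ROW_NUMBER() OVER (PARTITION BY AC.ACC_ID ORDER BY AB.UPDATE_DATE DESC) AS RN,
         AC.ACC_ID, NAME, IMG_LOCATION, BALANCE, UPDATE_DATE
  FROM ACCOUNTS AC
  INNER JOIN ACCOUNT_BALANCES AB ON AC.ACC_ID = AB.ACC_ID
)
SELECT ACC_ID, NAME, IMG_LOCATION, BALANCE, UPDATE_DATE
FROM CTE
WHERE RN = 1
ORDER BY ACC_ID
""").fetchall()

for row in rows:
    print(row)
```

ORDER BY UPDATE_DATE DESC works on a text column here only because ISO-8601 date strings sort chronologically. Because of the INNER JOIN, accounts with no balance rows are left out.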

I think the simplest method is outer apply:
select a.*, ab.*
from accounts a outer apply
(select top 1 ab.*
from account_balances ab
where ab.acc_id = a.acc_id
order by ab.update_date desc
) ab;
apply implements what is technically known as a "lateral join". This is a very powerful type of join -- something like a generalization of correlated subqueries.
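SQLite has no APPLY or LATERAL. The sketch below, again using Python's sqlite3, emulates the OUTER APPLY (SELECT TOP 1 ...) pattern instead. It joins each account to the rowid of its newest balance row, and a correlated subquery finds that rowid. The account 600 'Savings' is hypothetical. It is included to show that an account with no balances still comes back, with NULLs, as it would with OUTER APPLY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ACCOUNTS (ACC_ID INTEGER PRIMARY KEY, NAME TEXT, IMG_LOCATION TEXT);
CREATE TABLE ACCOUNT_BALANCES (ACC_ID INTEGER, BALANCE INTEGER, UPDATE_DATE TEXT);
INSERT INTO ACCOUNTS VALUES
  (0, 'Cash', 'images/cash.png'), (500, 'MyBank', 'images/mybank.png'),
  (600, 'Savings', 'images/savings.png');  -- hypothetical account with no balances
INSERT INTO ACCOUNT_BALANCES VALUES
  (500, 100, '2017-11-10'), (500, 250, '2018-01-11'), (0, 100, '2018-01-05');
""")

# The correlated subquery returns the rowid of each account's newest balance row
# (like TOP 1 ... ORDER BY update_date DESC). The LEFT JOIN keeps accounts for
# which the subquery finds nothing, as OUTER APPLY would.
rows = conn.execute("""
SELECT a.ACC_ID, a.NAME, ab.BALANCE, ab.UPDATE_DATE
FROM ACCOUNTS a
LEFT JOIN ACCOUNT_BALANCES ab
  ON ab.rowid = (SELECT b.rowid
                 FROM ACCOUNT_BALANCES b
                 WHERE b.ACC_ID = a.ACC_ID
                 ORDER BY b.UPDATE_DATE DESC
                 LIMIT 1)
ORDER BY a.ACC_ID
""").fetchall()

for row in rows:
    print(row)
```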

The accepted answer will work. However, for large data sets try to avoid using the row_number() function. Instead, you can try something like this (assuming update_date is part of a unique constraint):
SELECT
a.Acc_Id,
a.Name,
a.Img_Location,
bDetail.Balance,
bDetail.Update_Date
FROM
#accounts AS a LEFT JOIN
(
SELECT Acc_Id, MAX(Update_Date) AS Update_Date
FROM #account_balances AS b
GROUP BY Acc_Id
) AS maxDate ON a.Acc_Id = maxDate.Acc_Id
LEFT JOIN #account_balances AS bDetail ON maxDate.Acc_Id = bDetail.Acc_Id AND
maxDate.Update_Date = bDetail.Update_Date
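Below is a sketch of the MAX()-then-join-back approach using Python's sqlite3. The T-SQL temp tables #accounts and #account_balances become ordinary tables. The sketch also shows why the answer assumes update_date is unique per account. A hypothetical second balance is inserted on account 0's latest date, and after that the account comes back twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (acc_id INTEGER PRIMARY KEY, name TEXT, img_location TEXT);
CREATE TABLE account_balances (acc_id INTEGER, balance INTEGER, update_date TEXT);
INSERT INTO accounts VALUES (0, 'Cash', 'images/cash.png'), (500, 'MyBank', 'images/mybank.png');
INSERT INTO account_balances VALUES
  (500, 100, '2017-11-10'), (500, 250, '2018-01-11'), (0, 100, '2018-01-05');
""")

# Find the latest date per account, then join back to fetch the balance on that date.
QUERY = """
SELECT a.acc_id, a.name, bDetail.balance, bDetail.update_date
FROM accounts AS a
LEFT JOIN (
  SELECT acc_id, MAX(update_date) AS update_date
  FROM account_balances
  GROUP BY acc_id
) AS maxDate ON a.acc_id = maxDate.acc_id
LEFT JOIN account_balances AS bDetail
  ON maxDate.acc_id = bDetail.acc_id
 AND maxDate.update_date = bDetail.update_date
ORDER BY a.acc_id, bDetail.balance
"""
before = conn.execute(QUERY).fetchall()

# Hypothetical duplicate latest date for account 0: the uniqueness assumption no longer holds.
conn.execute("INSERT INTO account_balances VALUES (0, 120, '2018-01-05')")
after = conn.execute(QUERY).fetchall()

print(before)
print(after)
```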

Related

Running Total OVER clause, but Select Distinct instead of sum?

I have the following data set:
| EMAIL | SIGNUP_DATE |
| A#ABC.COM | 1/1/2021 |
| B#ABC.COM | 1/2/2021 |
| C#ABC.COM | 1/3/2021 |
In order to find the running total of email signups as of a certain day, I ran the following sql query:
select
signup_date,
count(email) OVER (order by signup_date ASC) as running_total_signups
I got the following results:
| SIGNUP_DATE | RUNNING_TOTAL_SIGNUPS |
| 1/1/21 | 1 |
| 1/2/21 | 2 |
| 1/3/21 | 3 |
However for my next step, I want to be able to see not just the running total signups, but the actual signup names themselves. Therefore I want to run the same window function (count(email) OVER (order by signup_date ASC)) but instead of a count(email) just a select distinct email. This would hopefully result in the following output:
| SIGNUP_DATE | RUNNING_TOTAL_SIGNUPS |
| 1/1/21 | a#abc.com |
| 1/2/21 | a#abc.com |
| 1/2/21 | b#abc.com |
| 1/3/21 | a#abc.com |
| 1/3/21 | b#abc.com |
| 1/3/21 | c#abc.com |
How would I do this? I'm getting an error on this code:
select
signup_date,
distinct email OVER (order by signup_date ASC) as running_total_signups
One way would be to cross-join the results and keep only the joined rows whose running total is less than or equal to the current row's running total:
with counts as (
select *,
Count(*) over (order by SIGNUP_DATE asc) as tot
from t
)
select c1.SIGNUP_DATE, c2.EMAIL
from counts c1
cross join counts c2
where c2.tot <= c1.tot
order by c1.SIGNUP_DATE, c2.EMAIL
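Here is a runnable sketch of the cross-join idea in Python's sqlite3. It takes the date from c1 (the "as of" row) and the email from c2 (every row whose running total is not larger). The table name t comes from the answer. Dates are stored as ISO strings, which is an assumption so that the text column sorts chronologically. SQLite 3.25 or later is assumed for the window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (EMAIL TEXT, SIGNUP_DATE TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    ("a#abc.com", "2021-01-01"),
    ("b#abc.com", "2021-01-02"),
    ("c#abc.com", "2021-01-03"),
])

# tot is the running count of signups; pair each row with every row at or below its total.
rows = conn.execute("""
WITH counts AS (
  SELECT *, COUNT(*) OVER (ORDER BY SIGNUP_DATE) AS tot FROM t
)
SELECT c1.SIGNUP_DATE, c2.EMAIL
FROM counts c1
CROSS JOIN counts c2
WHERE c2.tot <= c1.tot
ORDER BY c1.SIGNUP_DATE, c2.EMAIL
""").fetchall()

for row in rows:
    print(row)
```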
I want to run the same window function (count(email) OVER (order by
signup_date ASC)) but instead of a count(email) just a select distinct
email
Why do you want the COUNT() window function?
It has nothing to do with your requirement.
All you need is a simple self join:
SELECT t1.SIGNUP_DATE, t2.EMAIL
FROM tablename t1 INNER JOIN tablename t2
ON t2.SIGNUP_DATE <= t1.SIGNUP_DATE
ORDER BY t1.SIGNUP_DATE, t2.EMAIL;
This works for your sample data, but in case there can be more than one row per day in your table, you should use:
SELECT t1.SIGNUP_DATE, t2.EMAIL
FROM (SELECT DISTINCT SIGNUP_DATE FROM tablename) t1 INNER JOIN tablename t2
ON t2.SIGNUP_DATE <= t1.SIGNUP_DATE
ORDER BY t1.SIGNUP_DATE, t2.EMAIL;
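The sketch below compares the two self joins in Python's sqlite3. The fourth signup, d#abc.com on the same day as c#abc.com, is hypothetical. It exists only to show why the DISTINCT-dates version is needed: without it, every email for 2021-01-03 is listed twice. Dates are stored as ISO strings so that they compare chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (EMAIL TEXT, SIGNUP_DATE TEXT)")
conn.executemany("INSERT INTO tablename VALUES (?, ?)", [
    ("a#abc.com", "2021-01-01"),
    ("b#abc.com", "2021-01-02"),
    ("c#abc.com", "2021-01-03"),
    ("d#abc.com", "2021-01-03"),  # hypothetical second signup on the same day
])

# Plain self join: each of the two 2021-01-03 rows repeats the whole email list.
plain = conn.execute("""
SELECT t1.SIGNUP_DATE, t2.EMAIL
FROM tablename t1 INNER JOIN tablename t2 ON t2.SIGNUP_DATE <= t1.SIGNUP_DATE
ORDER BY t1.SIGNUP_DATE, t2.EMAIL
""").fetchall()

# Join from the distinct dates instead, so each date appears once on the left side.
dedup = conn.execute("""
SELECT t1.SIGNUP_DATE, t2.EMAIL
FROM (SELECT DISTINCT SIGNUP_DATE FROM tablename) t1
INNER JOIN tablename t2 ON t2.SIGNUP_DATE <= t1.SIGNUP_DATE
ORDER BY t1.SIGNUP_DATE, t2.EMAIL
""").fetchall()

print(len(plain), len(dedup))
```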
It's actually slightly simpler than Stu proposed:
select
x2.signup_date,
x1.email
from
signups x1
INNER JOIN signups x2 ON x1.signup_date <= x2.signup_date
order by x2.signup_date, x1.email
If you join the table to itself on every date that is less than or equal to its own, you get a half-Cartesian explosion. The earliest-dated row matches only itself. The next one matches itself and the earlier one, so one of the table aliases has its data repeated. This keeps adding more rows to the explosion as the dates increase.
In the joined result set, we want the emails from x1 and the dates from x2.

Join the records with sql?

I have a table with these six rows:
code | type_id | status
-----+--------+--------
123 | 123456 | DONE
123 | 456789 | DONE
321 | 654321 | DONE
321 | 897321 | DONE
456 | 999888 | DONE
456 | 777666 | FAIL
I want to turn it into the following, using only the DONE rows:
code | type_id1 | type_id2
-----+----------+---------
123 | 123456 | 456789
321 | 654321 | 897321
456 | 999888 | null
How can I join them to show the result?
You can use aggregation:
select
code,
min(type_id) type1,
case when count(*) > 1 then max(type_id) end type2
from mytable
where status = 'DONE'
group by code
Note that this only works as expected if a code has 1 or 2 types.
If I understand correctly that you want one row per code, you can use aggregation:
select code,
min(type_id) as type_id1,
(case when min(type_id) <> max(type_id) then max(type_id) end) as type_id2
from t
where status = 'DONE'
group by code;
Note that SQL tables represent unordered sets. With your sample data, there is no way to preserve "the original" order of the values, because that is undefined -- unless another column specifies that ordering.
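The following quick check runs the aggregation query against the question's sample data using Python's sqlite3 module. As the answers note, it assumes at most two DONE type_ids per code and orders them by value rather than by any original row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (code INTEGER, type_id INTEGER, status TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (123, 123456, "DONE"), (123, 456789, "DONE"),
    (321, 654321, "DONE"), (321, 897321, "DONE"),
    (456, 999888, "DONE"), (456, 777666, "FAIL"),
])

# Keep only DONE rows, then collapse each code to its smallest and largest
# type_id. The CASE yields NULL when a code has just one DONE type_id.
rows = conn.execute("""
SELECT code,
       MIN(type_id) AS type_id1,
       CASE WHEN MIN(type_id) <> MAX(type_id) THEN MAX(type_id) END AS type_id2
FROM t
WHERE status = 'DONE'
GROUP BY code
ORDER BY code
""").fetchall()

for row in rows:
    print(row)
```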
You can use a LEFT JOIN, for example (note that each pair of DONE type_ids comes back twice, once in each order):
SELECT
A.code,
A.type_id,
B.type_id
FROM table A
LEFT JOIN
table B ON A.code = B.code AND A.type_id <> B.type_id AND A.status = B.status
WHERE A.status = 'DONE'
You can use a CTE:
with cte as (select code, type_id, status, row_number() over(partition by code order by code) as rank from table)
select * from cte where rank = 1 and status = 'DONE'

How do I join to another table and return only the most recent matching row?

I have a table that stores the lines on a contract. Each contract line has its own unique ID, and it also has the ID of its parent contract. Example:
+-------------+---------+
| contract_id | line_id |
+-------------+---------+
| 1111 | 100 |
| 1111 | 101 |
| 1111 | 102 |
+-------------+---------+
I have another table that stores the historical changes to contract lines. For example, every time the number of units on a contract line is changed a new row is added to the table. Example:
+-------------+---------+--------------+-------+
| contract_id | line_id | date_changed | units |
+-------------+---------+--------------+-------+
| 1111 | 100 | 2016-01-01 | 1 |
| 1111 | 100 | 2016-02-01 | 2 |
| 1111 | 100 | 2016-03-01 | 3 |
+-------------+---------+--------------+-------+
As you can see the contract line with ID 100 belonging to the contract with ID 1111 has been edited 3 times over 3 months. The current value is 3 units.
I'm running a query against the contract lines table to select all data. I want to join to the historical data table and select the most recent row for each contract line and show the units in my results. How do I do this?
Expected results (there would be single results for 101 and 102 as well):
+-------------+---------+-------+
| contract_id | line_id | units |
+-------------+---------+-------+
| 1111 | 100 | 3 |
+-------------+---------+-------+
I've tried the query below with a left join but it returns 3 rows instead of 1.
Query:
SELECT *, T1.units
FROM contract_lines
LEFT JOIN (
SELECT contract_id, line_id, units, MAX(date_changed) AS maxdate
FROM contract_history
GROUP BY contract_id, line_id, units) AS T1
ON contract_lines.contract_id = T1.contract_id
AND contract_lines.line_id = T1.line_id
Actual results:
+-------------+---------+-------+
| contract_id | line_id | units |
+-------------+---------+-------+
| 1111 | 100 | 1 |
| 1111 | 100 | 2 |
| 1111 | 100 | 3 |
+-------------+---------+-------+
An extra join to contract_history along with maxdate will work
SELECT contract_lines.*,T2.units
FROM contract_lines
LEFT JOIN (
SELECT contract_id, line_id, MAX(date_changed) AS maxdate
FROM contract_history
GROUP BY contract_id, line_id) AS T1
JOIN contract_history T2 ON
T1.contract_id=T2.contract_id and
T1.line_id= T2.line_id and
T1.maxdate=T2.date_changed
ON contract_lines.contract_id = T1.contract_id
AND contract_lines.line_id = T1.line_id
This is my preferred style because it doesn't require self joining and cleanly expresses your intent. Also, it competes very well with the ROW_NUMBER() method in terms of performance.
select a.*
, b.units
from contract_lines as a
join (
select a.contract_id
, a.line_id
, a.units
, a.date_changed
, Max(a.date_changed) over(partition by a.contract_id, a.line_id) as max_date_changed
from contract_history as a
) as b
on a.contract_id = b.contract_id
and a.line_id = b.line_id
and b.date_changed = b.max_date_changed;
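Below is a sketch of the windowed MAX() approach in Python's sqlite3, assuming SQLite 3.25 or later. The derived table has to expose date_changed so the final comparison can use it. The history rows for lines 101 and 102 are hypothetical, since the question only shows line 100's history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contract_lines (contract_id INTEGER, line_id INTEGER);
CREATE TABLE contract_history (contract_id INTEGER, line_id INTEGER,
                               date_changed TEXT, units INTEGER);
INSERT INTO contract_lines VALUES (1111, 100), (1111, 101), (1111, 102);
INSERT INTO contract_history VALUES
  (1111, 100, '2016-01-01', 1), (1111, 100, '2016-02-01', 2), (1111, 100, '2016-03-01', 3),
  -- hypothetical history rows for lines 101 and 102
  (1111, 101, '2016-01-15', 5), (1111, 102, '2016-02-10', 7);
""")

# Tag every history row with its line's latest date, then keep the row on that date.
rows = conn.execute("""
SELECT a.contract_id, a.line_id, b.units
FROM contract_lines AS a
JOIN (
  SELECT h.contract_id, h.line_id, h.units, h.date_changed,
         MAX(h.date_changed) OVER (PARTITION BY h.contract_id, h.line_id) AS max_date_changed
  FROM contract_history AS h
) AS b
  ON a.contract_id = b.contract_id
 AND a.line_id = b.line_id
 AND b.date_changed = b.max_date_changed
ORDER BY a.line_id
""").fetchall()

for row in rows:
    print(row)
```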
Another possible solution to this. This uses RANK to sort/filter the rows. Similar to what you did, just a different tack.
SELECT contract_lines.*, T1.units
FROM contract_lines
LEFT JOIN (
SELECT contract_id, line_id, units,
RANK() OVER (PARTITION BY contract_id, line_id ORDER BY date_changed DESC) AS [rank]
FROM contract_history) AS T1
ON contract_lines.contract_id = T1.contract_id
AND contract_lines.line_id = T1.line_id
AND T1.rank = 1
WHERE T1.units IS NOT NULL
You could change this to an INNER JOIN and remove the IS NOT NULL check from the WHERE clause if you expect data to be present all the time.
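The following sketches the RANK() variant in Python's sqlite3, assuming SQLite 3.25 or later. The bracketed [rank] alias is renamed rnk here. Unlike ROW_NUMBER(), RANK() gives tied rows the same rank. So if a line had two changes on the same latest date (the second one below is hypothetical), both come back. Lines 101 and 102 have no history in this sketch, so the IS NOT NULL filter drops them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contract_lines (contract_id INTEGER, line_id INTEGER);
CREATE TABLE contract_history (contract_id INTEGER, line_id INTEGER,
                               date_changed TEXT, units INTEGER);
INSERT INTO contract_lines VALUES (1111, 100), (1111, 101), (1111, 102);
INSERT INTO contract_history VALUES
  (1111, 100, '2016-01-01', 1), (1111, 100, '2016-02-01', 2), (1111, 100, '2016-03-01', 3);
""")

QUERY = """
SELECT cl.contract_id, cl.line_id, t1.units
FROM contract_lines cl
LEFT JOIN (
  SELECT contract_id, line_id, units,
         RANK() OVER (PARTITION BY contract_id, line_id ORDER BY date_changed DESC) AS rnk
  FROM contract_history
) AS t1
  ON cl.contract_id = t1.contract_id
 AND cl.line_id = t1.line_id
 AND t1.rnk = 1
WHERE t1.units IS NOT NULL
ORDER BY cl.line_id, t1.units
"""
before = conn.execute(QUERY).fetchall()

# Hypothetical second change on the same latest date: both rows get rank 1.
conn.execute("INSERT INTO contract_history VALUES (1111, 100, '2016-03-01', 4)")
after = conn.execute(QUERY).fetchall()

print(before)
print(after)
```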
Glad you figured it out!
Try this simple query:
SELECT TOP 1 T1.*
FROM contract_lines T0
INNER JOIN contract_history T1
ON T0.contract_id = T1.contract_id and
T0.line_id = T1.line_id
ORDER BY date_changed DESC
As always seems to be the way, after spending an hour looking at it and shouting at StackOverflow for having a rare period of maintenance, I solved my own problem not long after posting the question.
In an effort to help anyone else who's stuck I'll show what I found. It might not be an efficient way to achieve this so if someone has a better suggestion I'm all ears.
I adapted the answer from here: T-SQL Subquery Max(Date) and Joins
SELECT *,
Units = (SELECT TOP 1 units
FROM contract_history
WHERE contract_lines.contract_id = contract_history.contract_id
AND contract_lines.line_id = contract_history.line_id
ORDER BY date_changed DESC
)
FROM ....
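Here is the correlated-subquery version sketched in Python's sqlite3, with LIMIT 1 in place of T-SQL's TOP 1. The outer FROM is assumed to be contract_lines, as the column references in the subquery suggest. Only line 100's history from the question is loaded, so lines 101 and 102 get NULL units, much as they would with a LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contract_lines (contract_id INTEGER, line_id INTEGER);
CREATE TABLE contract_history (contract_id INTEGER, line_id INTEGER,
                               date_changed TEXT, units INTEGER);
INSERT INTO contract_lines VALUES (1111, 100), (1111, 101), (1111, 102);
INSERT INTO contract_history VALUES
  (1111, 100, '2016-01-01', 1), (1111, 100, '2016-02-01', 2), (1111, 100, '2016-03-01', 3);
""")

# For each contract line, the scalar subquery picks the units from its newest history row.
rows = conn.execute("""
SELECT cl.contract_id, cl.line_id,
       (SELECT ch.units
        FROM contract_history ch
        WHERE ch.contract_id = cl.contract_id
          AND ch.line_id = cl.line_id
        ORDER BY ch.date_changed DESC
        LIMIT 1) AS units
FROM contract_lines cl
ORDER BY cl.line_id
""").fetchall()

for row in rows:
    print(row)
```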

How do I select records by comparing values in columns B, C among records with same value in column A?

I want to select rows from my Postgres database that meet the following criteria:
There are other rows with the same value in column A
Those other rows have a specific value in column B
Those other rows have a larger value in column C
So if I had a table like this:
User | Item | Date
-----+------+------
Fred | Ball | 5/1/2015
Jane | Pen | 5/7/2015
Fred | Cup | 5/11/2015
Mike | Ball | 5/13/2015
Jane | Ball | 5/18/2015
Fred | Pen | 5/20/2015
Jane | Bat | 5/22/2015
The search might be "what did people buy after they bought a ball?" The output I would want would be:
User | Item | Date
-----+------+------
Fred | Cup | 5/11/2015
Fred | Pen | 5/20/2015
Jane | Bat | 5/22/2015
I've gotten as far as SELECT * FROM orders AS or WHERE or.user IN (SELECT or2.second_id FROM orders AS or2 GROUP BY or2.user HAVING count(*) > 1);, which gives me all of Fred's and Jane's orders (since they ordered multiple things). But when I try to put additional limitations on the WHERE clause (e.g. SELECT * FROM orders AS or WHERE or.item = 'Ball' AND or.user IN (SELECT or2.second_id FROM orders AS or2 GROUP BY or2.user HAVING count(*) > 1);, I get something that isn't what I expect at all -- a list of records where item = 'Ball' that seems to have ignored the second part of the query.
Thanks for any help you can provide.
Edit: Sorry, I misled some people at the end by describing the bad approach I was taking. (I was working on getting a list of the Ball purchases, which I could use as a subquery in a next step, but people correctly noted that this is an unnecessarily complex/expensive approach.)
I think this might give the result you are looking for:
SELECT orders.user, orders.item, orders.date
FROM orders, (SELECT * FROM orders WHERE item = 'Ball') ball_table
WHERE orders.user = ball_table.user AND orders.date > ball_table.date;
select b.*
from user_table a
join user_table b
on b.user = a.user
and b.date > a.date
and a.item = 'Ball'
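Running a self join of this kind (alias a for the ball purchase, b for a later purchase) against the question's data in Python's sqlite3 reproduces the desired output. The table is named orders, as in the question. "user" is double-quoted because it is a reserved word in Postgres. Dates are rewritten in ISO form (YYYY-MM-DD). That is an assumption, needed because M/D/YYYY strings do not compare chronologically as text. In Postgres you would use a real date column. If a user can buy more than one ball, add DISTINCT to avoid repeated rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE orders ("user" TEXT, item TEXT, date TEXT)')
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("Fred", "Ball", "2015-05-01"),
    ("Jane", "Pen", "2015-05-07"),
    ("Fred", "Cup", "2015-05-11"),
    ("Mike", "Ball", "2015-05-13"),
    ("Jane", "Ball", "2015-05-18"),
    ("Fred", "Pen", "2015-05-20"),
    ("Jane", "Bat", "2015-05-22"),
])

# a = a ball purchase, b = a later purchase by the same user.
rows = conn.execute("""
SELECT b."user", b.item, b.date
FROM orders a
JOIN orders b
  ON b."user" = a."user"
 AND b.date > a.date
 AND a.item = 'Ball'
ORDER BY b.date
""").fetchall()

for row in rows:
    print(row)
```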
SELECT DISTINCT t3.*
FROM mytable t1
INNER JOIN mytable t2
ON t1.Item = t2.Item AND t1.User <> t2.User
INNER JOIN mytable t3
ON t2.User = t3.User AND t2.Date <= t3.Date
WHERE t1.Item = 'Ball'

Oracle join on first row of a subquery

This may seem simple, but somehow it isn't. I have a table of historical rate data called TBL_A that looks like this:
| id | rate | added_date |
|--------|--------|--------------|
| bill | 7.50 | 1/24/2011 |
| joe | 8.50 | 5/3/2011 |
| ted | 8.50 | 4/17/2011 |
| bill | 9.00 | 9/29/2011 |
In TBL_B, I have hours that need to be joined to a single row of TBL_A in order to get costing info:
| id | hours | added_date |
|--------|---------|--------------|
| bill | 10 | 2/26/2011 |
| ted | 4 | 7/4/2011 |
| bill | 9 | 10/14/2011 |
As you can see, Bill has two rates in TBL_A, with different dates. To get Bill's cost for a period of time, you have to join each row of TBL_B to the row in TBL_A that is appropriate for its date.
I figured this would be easy. Because this didn't have to be an exceptionally fast query, I could just do a separate subquery for each row of costing info. However, joined subqueries apparently cannot "see" the other tables they are joined to. This query throws an invalid identifier error (ORA-00904) on anything in the subquery that has the "h" alias:
SELECT h.id, r.rate * h.hours as "COST", h.added_date
FROM TBL_B h
JOIN (SELECT * FROM (
SELECT i.id, i.rate
FROM TBL_A i
WHERE i.id = h.id and i.added_date < h.added_date
ORDER BY i.added_date DESC)
WHERE rownum = 1) r
ON h.id = r.id
If the problem is simply scoping, I don't know if the approach I took can ever work. But all I'm trying to do here is get a single row based on some criteria, so I'm definitely open to other methods.
EDIT: The desired output would be this:
| id | cost | added_date |
|--------|---------|--------------|
| bill | 75 | 2/26/2011 |
| ted | 34 | 7/4/2011 |
| bill | 81 | 10/14/2011 |
Note that Bill has two different rates in the two entries in the table. The first row is 10 * 7.50 = 75 and the second row is 9 * 9.00 = 81.
Try using not exists:
select
b.id,
a.rate,
b.hours,
a.rate*b.hours as "COST",
b.added_date,
a.added_date
from
tbl_b b
inner join tbl_a a on
b.id = a.id
where
a.added_date < b.added_date
and not exists (
select
1
from
tbl_a a2
where
a2.added_date > a.added_date
and a2.added_date < b.added_date
and a2.id = a.id
)
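To check the NOT EXISTS query against the expected output, here it is in Python's sqlite3 with the question's data. Dates are rewritten as ISO strings, which is an assumption so that text comparison is chronological. In Oracle these would be DATE columns. Only the columns needed for the cost are selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_a (id TEXT, rate REAL, added_date TEXT);
CREATE TABLE tbl_b (id TEXT, hours INTEGER, added_date TEXT);
INSERT INTO tbl_a VALUES
  ('bill', 7.50, '2011-01-24'), ('joe', 8.50, '2011-05-03'),
  ('ted', 8.50, '2011-04-17'), ('bill', 9.00, '2011-09-29');
INSERT INTO tbl_b VALUES
  ('bill', 10, '2011-02-26'), ('ted', 4, '2011-07-04'), ('bill', 9, '2011-10-14');
""")

# Keep rate row a only if no other rate for the same id falls strictly between
# a's date and the hours row's date, i.e. a is the latest rate before the hours.
rows = conn.execute("""
SELECT b.id, a.rate * b.hours AS cost, b.added_date
FROM tbl_b b
INNER JOIN tbl_a a ON b.id = a.id
WHERE a.added_date < b.added_date
  AND NOT EXISTS (
    SELECT 1 FROM tbl_a a2
    WHERE a2.id = a.id
      AND a2.added_date > a.added_date
      AND a2.added_date < b.added_date
  )
ORDER BY b.added_date
""").fetchall()

for row in rows:
    print(row)
```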
As an explanation of why this is happening: only correlated subqueries are aware of the context in which they're being run, since they're run for each row. A joined subquery is actually executed prior to the join, so it has no knowledge of the surrounding tables. You need to return all the identifying information from it and make the join at the top level of the query, rather than trying to do it within the subquery.
select id, cost, added_date from (
select
h.id,
r.rate * h.hours as "COST",
h.added_date,
-- For each record, assign r=1 for 'newest' rate
row_number() over (partition by h.id, h.added_date order by r.added_date desc) r
from
tbl_b h,
tbl_a r
where
r.id = h.id and
-- Date of rate must be entered before
-- hours billed:
r.added_date < h.added_date
)
where r = 1
;