SQL get closest value by date - sql

I can't wrap my mind around the following problem.
I have a table with historical data TableA:
uniq_id item_id item_clust date
11111 1 a 2020-02-12
11112 1 a 2020-01-13
11113 1 b 2020-02-01
11114 2 b 2020-01-01
I also have a table with historical data for clusters TableB:
item_id item_clust item_pos date
1 a 1 2020-01-01
1 a 2 2020-02-01
1 a 3 2020-03-01
1 b 1 2020-01-10
For every row in TableA, I would like to get the latest position for its item_id + item_clust as of that row's date, based on the dates in TableB.
If no rows are found, I would like item_pos = 0.
Desired result:
uniq_id item_id item_clust date item_pos
11111 1 a 2020-02-12 2
11112 1 a 2020-01-13 1
11113 1 b 2020-02-01 1
11114 2 b 2020-01-01 0
So, for item 1 in cluster a on 2020-02-12 the latest position is at 2020-02-01 = 2.

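This looks like a left join (distinct on is Postgres syntax; this picks the latest row in b per item and cluster, regardless of the date in a):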
select a.*, coalesce(b.item_pos, 0) as item_pos
from a left join
(select distinct on (b.item_id, b.item_clust) b.*
from b
order by b.item_id, b.item_clust, b.date desc
) b
using (item_id, item_clust);
Or a lateral join:
select a.*, coalesce(b.item_pos, 0) as item_pos
from a left join lateral
(select b.*
from b
where b.item_id = a.item_id and
b.item_clust = a.item_clust
order by b.date desc
limit 1
) b
on true; -- always do the left join even when there are no matches
EDIT:
If you want the most recent position "as of" the date in A, then use the lateral join:
select a.*, coalesce(b.item_pos, 0) as item_pos
from a left join lateral
(select b.*
from b
where b.item_id = a.item_id and
b.item_clust = a.item_clust and
b.date <= a.date
order by b.date desc
limit 1
) b
on true; -- always do the left join even when there are no matches
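To check the "as of" logic against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no lateral joins, so the sketch swaps in a correlated scalar subquery, which should be equivalent when only one column is needed. The short table names a and b follow the answer, and the ISO date strings are assumed to compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (uniq_id INTEGER, item_id INTEGER, item_clust TEXT, date TEXT);
    CREATE TABLE b (item_id INTEGER, item_clust TEXT, item_pos INTEGER, date TEXT);
    INSERT INTO a VALUES
        (11111, 1, 'a', '2020-02-12'),
        (11112, 1, 'a', '2020-01-13'),
        (11113, 1, 'b', '2020-02-01'),
        (11114, 2, 'b', '2020-01-01');
    INSERT INTO b VALUES
        (1, 'a', 1, '2020-01-01'),
        (1, 'a', 2, '2020-02-01'),
        (1, 'a', 3, '2020-03-01'),
        (1, 'b', 1, '2020-01-10');
""")

# Correlated subquery standing in for the lateral join:
# for each row of a, take the newest b row dated on or before a.date.
AS_OF_QUERY = """
    SELECT a.uniq_id, a.item_id, a.item_clust, a.date,
           COALESCE((SELECT b.item_pos
                     FROM b
                     WHERE b.item_id = a.item_id
                       AND b.item_clust = a.item_clust
                       AND b.date <= a.date
                     ORDER BY b.date DESC
                     LIMIT 1), 0) AS item_pos
    FROM a
    ORDER BY a.uniq_id
"""

rows = conn.execute(AS_OF_QUERY).fetchall()
for row in rows:
    print(row)
```

With the sample data this reproduces the desired result from the question, including item_pos = 0 for the row that has no match.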

Related

SQL, left join table, How to keep only one value?

I would like to join two tables.
The table 1 is like this:
id Date
a 01.01.2021
a 02.01.2021
a 03.01.2021
b 01.01.2021
b 02.01.2021
b 03.01.2021
c 01.01.2021
c 02.01.2021
c 03.01.2021
The table 2 is like this:
id value
a 12
a 8
b 50
As final result, I would like to have a table like this:
For an existing id from table 2, the sum of the values for that id should be stored on the last date in table 1:
id Date Value_FINAL
a 01.01.2021 0
a 02.01.2021 0
a 03.01.2021 20
b 01.01.2021 0
b 02.01.2021 0
b 03.01.2021 50
c 01.01.2021 0
c 02.01.2021 0
c 03.01.2021 0
I tried to use left join to join these two tables at first,
with t3 as ( select id, sum(value) Value_FINAL from t2 group by id)
select t1.*, t3.value_FINAL from t1 left join t3 on t1.id = t3.id;
After this, I can get this:
id Date Value_FINAL
a 01.01.2021 20
a 02.01.2021 20
a 03.01.2021 20
b 01.01.2021 50
b 02.01.2021 50
b 03.01.2021 50
c 01.01.2021 0
c 02.01.2021 0
c 03.01.2021 0
But this is not what I want. Can someone help with this? How can I keep the value only on the last Date in the column 'value_FINAL'?
I am also thinking about using last_value(value) over (partition by id order by date), but then I need to create an extra table or column.
Maybe someone has a good idea how to deal with this problem?
A join-based alternative:
select a.*, coalesce(c.value,0)
from t1 a
left join (select id, max(date) date from t1 group by id) b on a.id = b.id and a.date = b.date
left join (select id, sum(value) value from t2 group by id) c on b.id = c.id
Use row_number(). One method is:
select t1.*,
(case when row_number() over (partition by id order by date desc) = 1
then (select coalesce(sum(t2.value), 0) from table2 t2 where t2.id = t1.id)
else 0
end) as value_final
from table1 t1;
You can use ROW_NUMBER to identify the row where you want to place the total value.
For example:
select
t1.id, t1.date, b.total,
case when
row_number() over(partition by t1.id order by t1.date desc) = 1
then coalesce(b.total, 0) -- ids with no rows in t2 get 0 instead of NULL
else 0 end as value_final
from t1
left join (select id, sum(value) as total from t2 group by id) b
on b.id = t1.id
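The ROW_NUMBER approach can be tried end to end with a small sketch in Python's sqlite3 (window functions need SQLite 3.25 or newer). The dates are rewritten as ISO strings on the assumption that the original format is day.month.year, so that sorting as text matches sorting by date; the coalesce makes ids without rows in t2 come out as 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id TEXT, date TEXT);
    CREATE TABLE t2 (id TEXT, value INTEGER);
    INSERT INTO t1 VALUES
        ('a', '2021-01-01'), ('a', '2021-01-02'), ('a', '2021-01-03'),
        ('b', '2021-01-01'), ('b', '2021-01-02'), ('b', '2021-01-03'),
        ('c', '2021-01-01'), ('c', '2021-01-02'), ('c', '2021-01-03');
    INSERT INTO t2 VALUES ('a', 12), ('a', 8), ('b', 50);
""")

# Put the per-id total on the latest date only; every other row gets 0.
LAST_DATE_TOTAL_QUERY = """
    SELECT t1.id, t1.date,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY t1.id
                                        ORDER BY t1.date DESC) = 1
                THEN COALESCE(b.total, 0)
                ELSE 0
           END AS value_final
    FROM t1
    LEFT JOIN (SELECT id, SUM(value) AS total FROM t2 GROUP BY id) b
           ON b.id = t1.id
    ORDER BY t1.id, t1.date
"""

rows = conn.execute(LAST_DATE_TOTAL_QUERY).fetchall()
for row in rows:
    print(row)
```

Only the latest date of a and b carries a total (20 and 50); all rows for c are 0.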

Having problem with joining condition while joining 3 tables

I am having following structure of the tables:
Table A:
SSN a_id b_id Date Sent
123 1 2 12/11/2020 1
Table B:
SSN a_id b_id Date Open
123 1 2 13/11/2020 1
123 1 2 14/11/2020 1
Table C:
SSN a_id b_id Date Clicks
123 1 2 13/11/2020 1
123 1 2 14/11/2020 1
123 1 2 14/11/2020 1
123 1 2 14/11/2020 1
123 1 2 15/11/2020 1
I am using:
select *
from Table A
left join Table B on A.SSN = B.SSN and A.a_id = B.a_id and A.b_id = B.b_id
left join Table C on A.SSN = C.SSN and A.a_id = C.a_id and A.b_id = C.b_id
I want the following output:
Table Ans
SSN a_id b_id Date Sent Open Clicks
123 1 2 12/11/2020 1 0 0
123 1 2 12/11/2020 0 1 1
123 1 2 12/11/2020 0 1 1
123 1 2 12/11/2020 0 0 1
123 1 2 12/11/2020 0 0 1
123 1 2 12/11/2020 0 0 1
The order of the 1s and 0s in each column doesn't matter, but the count of each should be the same as in the original tables. How can I achieve this?
I assume this is the result set you actually wanted, or something similar to it:
with a as (select * from (values (123,1,2,'2020-11-12',1)) a(ssn,a_id,b_id,"date",sent))
,b as(
select * from (values (123,1,2,'2020-11-13',1)
,(123,1,2,'2020-11-14',1)
) a(ssn,a_id,b_id,"date",open))
,c as
(select * from (values (123,1,2,'2020-11-13',1)
,(123,1,2,'2020-11-14',1)
,(123,1,2,'2020-11-14',1)
,(123,1,2,'2020-11-14',1)
,(123,1,2,'2020-11-15',1)
) a(ssn,a_id,b_id,"date",clicks)
)
select ssn, a_id, b_id,"date", sum(sent) as sent, sum(open) as open, sum(clicks) as clicks
from (
select ssn, a_id, b_id,"date",sent,0 as open,0 as clicks,"date" as hidendate from a
union all
select a.ssn,a.a_id,a.b_id,a."date",0 as sent,open,0 as clicks,b."date" as hidendate from a,b where a.a_id = b.a_id and a.b_id = b.b_id
union all
select a.ssn,a.a_id,a.b_id,a."date",0 as sent,0 as open,clicks,c."date" as hidendate from a,c where a.a_id = c.a_id and a.b_id = c.b_id
) q1
group by ssn,a_id,b_id,"date",hidendate
order by date
I think you want full join:
select *
from a full join
b
using (ssn, a_id, b_id, date) full join
c
using (ssn, a_id, b_id, date);
This returns the 0s as NULLs.
If you want 0s, use:
select ssn, a_id, b_id, date,
coalesce(a.sent, 0) as sent,
coalesce(b.open, 0) as open,
coalesce(c.clicks, 0) as clicks
from a full join
b
using (ssn, a_id, b_id, date) full join
c
using (ssn, a_id, b_id, date);

SQL Server : smallest ROW_NUM in where condition, with subgroup pre-condition

Thanks all in advance! I am trying to describe this as clearly as I can.
I have two subqueries: the first retrieves Comfirmed_Date and the second retrieves Mail_Date, with the condition Mail_Date >= Comfirmed_Date.
select
a.ID
,g.ROWNUM
,f.CORM_DT
,g.MAIL_DT
from
SOURCE_U a
left join
(select
a.SOURCE_ID
, Max(Cast(b.ATUF_DATE3 as date)) as [CORM_DT]
from
ATTACH_U a
inner join
USERFLD_D b on a.DEST_CK = b.DEST_CK
group by
a.SOURCE_ID) f on f.SOURCE_ID = a.SOURCE_ID
left join
(select
a.SOURCE_ID
, cast(b.MAILED_DT as date) as MAIL_DT
, row_number() over (partition by SOURCE_ID order by CREATE_DT) as ROWNUM
from
ATTACH_U a
left join
LETTER_D b on b.DEST_CK = a.DEST_CK) g on g.SOURCE_ID = a.SOURCE_ID
and g.MAIL_DT >= f.CORM_DT
I need the first line (smallest ROWNUM) for each ID. How can I achieve that?
Originally I thought I could add a condition like
where g.ROWNUM = 1
but because the condition is on the joined table, it does not work in situations like the one below.
ID gROWNUM CORM_DT MAIL_DT
1001 3 2020-10-20 2020-10-22
1001 4 2020-10-20 2020-10-30
1002 2 2020-10-20 2020-10-21
1002 3 2020-10-20 2020-10-23
1002 4 2020-10-20 2020-10-28
1003 1 2020-10-20 2020-10-30
1004 1 2020-10-20 2020-10-21
1004 2 2020-10-20 2020-10-23
1005 4 2020-10-20 2020-10-28
1006 1 2020-10-20 2020-10-30
I only want one line for each ID here.
Try this. TOP 1 WITH TIES combined with ROW_NUMBER() in the ORDER BY keeps the row with the smallest ROWNUM for each ID (a plain TOP 1 would return a single row overall):
SELECT TOP 1 WITH TIES
a.ID
, g.ROWNUM
, f.CORM_DT
, g.MAIL_DT
FROM SOURCE_U a
LEFT JOIN (
SELECT
a.SOURCE_ID
, Max(Cast(b.ATUF_DATE3 as date)) as [CORM_DT]
FROM ATTACH_U a
INNER JOIN USERFLD_D b
ON a.DEST_CK = b.DEST_CK
GROUP BY a.SOURCE_ID
) f
ON f.SOURCE_ID = a.SOURCE_ID
LEFT JOIN (
SELECT
a.SOURCE_ID
, CAST( b.MAILED_DT AS date) AS MAIL_DT
, ROW_NUMBER() OVER( PARTITION BY SOURCE_ID ORDER BY CREATE_DT ) AS ROWNUM
FROM ATTACH_U a
LEFT JOIN LETTER_D b
ON b.DEST_CK = a.DEST_CK
) g
ON g.SOURCE_ID = a.SOURCE_ID
AND g.MAIL_DT >= f.CORM_DT
ORDER BY
ROW_NUMBER() OVER( PARTITION BY a.ID ORDER BY g.ROWNUM );
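All you need is a window function in your select. Note that dense_rank() can return several rows per ID when MAIL_DT ties; use row_number() if exactly one row per ID is required.
select columns... from (
select dense_rank() over ( partition by a.ID order by MAIL_DT) as rn, columns...
...
) t -- SQL Server requires an alias on the derived table
where rn = 1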
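As a rough illustration of the "rank within each ID, then keep rank 1" idea, the sketch below loads the sample result rows from the question into a hypothetical table named joined (standing in for the output of the original three-way join) and keeps the row with the smallest ROWNUM per ID. It runs on SQLite through Python's sqlite3 (3.25 or newer for window functions) rather than SQL Server, and uses row_number() so that ties cannot produce extra rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'joined' is a hypothetical stand-in for the already-joined result set
    CREATE TABLE joined (id INTEGER, rownum INTEGER, corm_dt TEXT, mail_dt TEXT);
    INSERT INTO joined VALUES
        (1001, 3, '2020-10-20', '2020-10-22'),
        (1001, 4, '2020-10-20', '2020-10-30'),
        (1002, 2, '2020-10-20', '2020-10-21'),
        (1002, 3, '2020-10-20', '2020-10-23'),
        (1002, 4, '2020-10-20', '2020-10-28'),
        (1003, 1, '2020-10-20', '2020-10-30'),
        (1004, 1, '2020-10-20', '2020-10-21'),
        (1004, 2, '2020-10-20', '2020-10-23'),
        (1005, 4, '2020-10-20', '2020-10-28'),
        (1006, 1, '2020-10-20', '2020-10-30');
""")

# Re-rank the surviving rows per id, then keep only the first one.
FIRST_PER_ID_QUERY = """
    SELECT id, rownum, corm_dt, mail_dt
    FROM (SELECT j.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY rownum) AS rn
          FROM joined j) t
    WHERE rn = 1
    ORDER BY id
"""

rows = conn.execute(FIRST_PER_ID_QUERY).fetchall()
for row in rows:
    print(row)
```

The key point is that the ranking is computed after the date condition has been applied, so an ID whose smallest surviving ROWNUM is 3 or 4 still gets exactly one row.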

How to get the value from table B whose max date is less than the date in table A

I have two tables, Table A and Table B. Table A has StoreNumber, MatNumber and Date. Table B has StoreNumber, MatNumber, Date and ShipmentValue. For each row in Table A, I need the ShipmentValue from the Table B row with the same StoreNumber and MatNumber whose Date is the latest one that is still less than the Date in Table A. Please see the output table.
Table A:
StoreNumber MatNumber Date
A 9 3/30/2020
A 9 3/30/2020
B 10 3/18/2020
B 10 3/18/2020
A 9 3/13/2020
Table B:
StoreNumber MatNumber Date ShipmentValue
A 9 3/10/2020 2
A 9 3/12/2020 3
A 9 3/18/2020 4
B 10 3/4/2020 7
B 10 3/7/2020 9
B 10 3/16/2020 10
Output:
StoreNumber MatNumber A.Date B.Date ShipmentValue
A 9 3/30/2020 3/18/2020 4
A 9 3/30/2020 3/18/2020 4
B 10 3/18/2020 3/16/2020 10
B 10 3/18/2020 3/16/2020 10
A 9 3/13/2020 3/12/2020 3
I tried ROW_NUMBER, selecting the first row after ordering by date descending:
SELECT A.StoreNumber
,A.MatNumber
,A.Date
,B.Date AS B_Date
,B.ShipmentValue
FROM TableA A
LEFT JOIN
(
SELECT StoreNumber ,MatNumber , Date , ShipmentValue
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY StoreNumber, MatNumber ORDER BY DATE DESC ) AS ID,*
FROM TableB
) A
WHERE ID = 1
) B
ON A.StoreNumber = B.StoreNumber
AND A.MatNumber = B.MatNumber
This is a place where a lateral join is handy:
select a.*, b.date, b.shipmentvalue
from a left join lateral
(select b.*
from b
where b.storenumber = a.storenumber and
b.matnumber = a.matnumber and
b.date < a.date -- strictly earlier, as the question asks
order by b.date desc
fetch first 1 row only
) b
on 1=1; -- returns rows in a even when there are no matches
EDIT:
Wow. Snowflake implements lateral joins and then limits them in a fundamental way. Another method is more expensive but should work. Note that the group by collapses identical rows in a:
select ab.*, b.shipmentvalue
from (select a.StoreNumber, a.MatNumber, a.Date, max(b.date) as b_date
from a left join
b
on b.storenumber = a.storenumber and
b.matnumber = a.matnumber and
b.date < a.date
group by a.StoreNumber, a.MatNumber, a.Date
) ab left join
b
on b.storenumber = ab.storenumber and
b.matnumber = ab.matnumber and
b.date = ab.b_date;
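A different way to get the same result without a lateral join is to join on the date condition and rank the matches per row of a. The sketch below is one possible version in Python's sqlite3 (window functions need SQLite 3.25 or newer). It relies on two assumptions not present in the question: SQLite's implicit rowid serves as a unique key for a, so the duplicate rows survive, and the dates are stored as ISO strings so they compare correctly as text. In another database a real unique key on a would be needed instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (storenumber TEXT, matnumber INTEGER, date TEXT);
    CREATE TABLE b (storenumber TEXT, matnumber INTEGER, date TEXT,
                    shipmentvalue INTEGER);
    INSERT INTO a VALUES
        ('A', 9, '2020-03-30'), ('A', 9, '2020-03-30'),
        ('B', 10, '2020-03-18'), ('B', 10, '2020-03-18'),
        ('A', 9, '2020-03-13');
    INSERT INTO b VALUES
        ('A', 9, '2020-03-10', 2), ('A', 9, '2020-03-12', 3),
        ('A', 9, '2020-03-18', 4), ('B', 10, '2020-03-04', 7),
        ('B', 10, '2020-03-07', 9), ('B', 10, '2020-03-16', 10);
""")

# Rank every earlier b row per a row (newest first) and keep rank 1.
# a.rowid is SQLite-specific and used here only as a row identifier.
LATEST_EARLIER_QUERY = """
    SELECT storenumber, matnumber, a_date, b_date, shipmentvalue
    FROM (SELECT a.rowid AS a_row, a.storenumber, a.matnumber,
                 a.date AS a_date, b.date AS b_date, b.shipmentvalue,
                 ROW_NUMBER() OVER (PARTITION BY a.rowid
                                    ORDER BY b.date DESC) AS rn
          FROM a
          LEFT JOIN b
                 ON b.storenumber = a.storenumber
                AND b.matnumber = a.matnumber
                AND b.date < a.date) ranked
    WHERE rn = 1
    ORDER BY a_row
"""

rows = conn.execute(LATEST_EARLIER_QUERY).fetchall()
for row in rows:
    print(row)
```

On the sample data this returns the five rows of the Output table, duplicates included.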

How to query to get only rows where a change took place? (changes can go back and forth)

I'm working with a table that has dozens of rows per customer, each with a date and several columns representing various statuses. I'm only interested in pulling the rows where a change took place in one particular column (specifically 0 to 1 or 1 to 0, see status column below).
I can't simply use row_number() over (partition by customer_id, status order by date) because the status can go back and forth between 0 and 1.
Here's a sample of what I'm trying to do (note that there are two different Customer IDs in this example):
Original Table
Row CustomerID Status Date
1 ABC 0 3/12/2013
2 ABC 0 3/31/2013
3 ABC 1 4/13/2013
4 ABC 1 4/15/2013
5 ABC 1 5/17/2013
6 ABC 0 6/25/2013
7 ABC 0 6/28/2013
8 XYZ 0 8/2/2013
9 XYZ 1 5/10/2013
10 XYZ 0 5/18/2013
11 XYZ 1 8/23/2013
12 XYZ 1 9/7/2013
Desired Query Output
Customer ID Status Date
ABC 1 4/13/2013
ABC 0 6/25/2013
XYZ 1 5/10/2013
XYZ 0 5/18/2013
XYZ 1 8/23/2013
You were on the right track with ROW_NUMBER. It can be especially helpful in joining the table to itself in cases such as yours.
The following should get you what you're looking for:
WITH CTE AS (
SELECT Row,
CustomerID,
Status,
Date,
ROW_NUMBER() OVER(PARTITION BY CustomerID ORDER BY Row) AS N
FROM OriginalTable
)
SELECT A.CustomerID,
A.Status,
A.Date
FROM CTE A
JOIN CTE B
ON A.N = B.N+1
AND A.CustomerID = B.CustomerID
WHERE A.Status <> B.Status
ORDER BY
A.Row
Alternatively, a self-join that compares rows by date:
select distinct b.CustomerID, b.status, min(b.date)
From customer a, customer b
where a.CustomerID = b.CustomerID and a.status <> b.status and a.date < b.date
group by b.CustomerID, b.status, a.date;
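Where window functions are available, LAG offers another way to compare each row with the previous one for the same customer, without a self-join. This is a swapped-in technique rather than what the answers above use. The sketch below runs on SQLite through Python's sqlite3 (3.25 or newer); the names original_table, row_no and customer_id are illustrative, and rows are ordered by the Row column, as in the ROW_NUMBER answer, because the sample dates are not in order for XYZ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE original_table (row_no INTEGER, customer_id TEXT,
                                 status INTEGER, date TEXT);
    INSERT INTO original_table VALUES
        (1, 'ABC', 0, '3/12/2013'), (2, 'ABC', 0, '3/31/2013'),
        (3, 'ABC', 1, '4/13/2013'), (4, 'ABC', 1, '4/15/2013'),
        (5, 'ABC', 1, '5/17/2013'), (6, 'ABC', 0, '6/25/2013'),
        (7, 'ABC', 0, '6/28/2013'), (8, 'XYZ', 0, '8/2/2013'),
        (9, 'XYZ', 1, '5/10/2013'), (10, 'XYZ', 0, '5/18/2013'),
        (11, 'XYZ', 1, '8/23/2013'), (12, 'XYZ', 1, '9/7/2013');
""")

# Keep a row only when its status differs from the previous row's status
# for the same customer; the first row per customer has no predecessor.
CHANGES_QUERY = """
    SELECT customer_id, status, date
    FROM (SELECT t.*,
                 LAG(status) OVER (PARTITION BY customer_id
                                   ORDER BY row_no) AS prev_status
          FROM original_table t) s
    WHERE prev_status IS NOT NULL
      AND status <> prev_status
    ORDER BY row_no
"""

rows = conn.execute(CHANGES_QUERY).fetchall()
for row in rows:
    print(row)
```

This yields the five rows of the desired output, and it handles a status that flips back and forth because each row is compared only with its immediate predecessor.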