SQL: max value by 2 columns in another table

I have 2 tables, and for every row in the first table I need to find the maximum value in the date_2 column (for the same id) that is lower than the value in the date_1 column.
Tables:
table 1
id date_1
1 01.01.2020
1 11.01.2020
2 02.11.2020
2 02.12.2020
3 12.12.2020
3 31.01.2021
table 2
id date_2
1 30.12.2019
1 05.01.2020
2 01.11.2020
2 30.10.2020
3 10.11.2020
3 31.12.2020
Outcome needed:
id date_1 max(date_2) within id,date_1
1 01.01.2020 30.12.2019
1 11.01.2020 05.01.2020
2 02.11.2020 01.11.2020
2 02.12.2020 01.11.2020
3 12.12.2020 10.11.2020
3 31.01.2021 31.12.2020
I'd appreciate your help with this!

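You could rank each row (here with the row_number() function) and then match on the id and the rank.
with t1 as (select id, date_1,
row_number() over (partition by id order by date_1) as rn
from table1),
t2 as (select id, date_2,
row_number() over (partition by id order by date_2) as rn
from table2)
select t1.id, t1.date_1, t2.date_2
from t1 inner join t2 on t1.id = t2.id and t1.rn = t2.rn
Note that this pairs rows by position rather than by comparing the dates, so it only gives the right answer when the n-th date_2 of an id is always the one just before its n-th date_1. For id 2 in the sample data it pairs 02.11.2020 with 30.10.2020 instead of 01.11.2020.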
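A minimal sketch of the rank-and-match query, run in SQLite through Python's sqlite3 module (an assumption: the question does not name a database, and window functions need SQLite 3.25 or later). Dates are stored as ISO-8601 text so that text ordering matches date ordering; table1 and table2 follow the answer's names.

```python
import sqlite3

# Sample data from the question; dates rewritten as ISO-8601 text so that
# string comparison matches date comparison (SQLite has no DATE type).
TABLE1 = [(1, "2020-01-01"), (1, "2020-01-11"), (2, "2020-11-02"),
          (2, "2020-12-02"), (3, "2020-12-12"), (3, "2021-01-31")]
TABLE2 = [(1, "2019-12-30"), (1, "2020-01-05"), (2, "2020-11-01"),
          (2, "2020-10-30"), (3, "2020-11-10"), (3, "2020-12-31")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, date_1 TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER, date_2 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", TABLE1)
conn.executemany("INSERT INTO table2 VALUES (?, ?)", TABLE2)

# Number the rows of each table per id, then pair rows with equal numbers.
paired = conn.execute("""
    WITH t1 AS (SELECT id, date_1,
                       ROW_NUMBER() OVER (PARTITION BY id ORDER BY date_1) AS rn
                  FROM table1),
         t2 AS (SELECT id, date_2,
                       ROW_NUMBER() OVER (PARTITION BY id ORDER BY date_2) AS rn
                  FROM table2)
    SELECT t1.id, t1.date_1, t2.date_2
      FROM t1 JOIN t2 ON t1.id = t2.id AND t1.rn = t2.rn
     ORDER BY t1.id, t1.date_1
""").fetchall()

for row in paired:
    print(row)
# The pairing is purely positional: (2, '2020-11-02') gets '2020-10-30'.
```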

You can write a simple correlated subquery that mirrors the English description:
select id, date_1, (
select max(date_2) /* find max value in the date_2 column */
from t2
where t2.id = t1.id /* for every id in the first table */
and t2.date_2 < t1.date_1 /* lower than a value in the date_1 column */
) as "max(date_2) within id,date_1"
from t1;
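The correlated subquery can be checked against the expected outcome with a small sketch in SQLite via Python's sqlite3 module (an assumption, since no database is named). The tables are called t1 and t2 as in the answer, and the dates are ISO-8601 text so that the `<` comparison works on strings.

```python
import sqlite3

# Sample data from the question, dates rewritten as ISO-8601 text.
T1 = [(1, "2020-01-01"), (1, "2020-01-11"), (2, "2020-11-02"),
      (2, "2020-12-02"), (3, "2020-12-12"), (3, "2021-01-31")]
T2 = [(1, "2019-12-30"), (1, "2020-01-05"), (2, "2020-11-01"),
      (2, "2020-10-30"), (3, "2020-11-10"), (3, "2020-12-31")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, date_1 TEXT)")
conn.execute("CREATE TABLE t2 (id INTEGER, date_2 TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", T1)
conn.executemany("INSERT INTO t2 VALUES (?, ?)", T2)

# For each t1 row: the largest date_2 of the same id that is strictly
# earlier than date_1 (NULL if there is none).
result = conn.execute("""
    SELECT id, date_1,
           (SELECT MAX(date_2)
              FROM t2
             WHERE t2.id = t1.id
               AND t2.date_2 < t1.date_1) AS max_date_2
      FROM t1
     ORDER BY id, date_1
""").fetchall()

for row in result:
    print(row)
```

Unlike the rank-based pairing, this compares the dates directly, so every row of the expected outcome is reproduced.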

Related

Match by id and date between 2 tables, OR last known match id

I'm trying to make the following work:
T1: take the id per dt where name = 'A', using the most recent load_id.
Notice 2 records on 5-Jan-23, with load_id 2 and 3 => take load_id = 3
T2: display the corresponding id per dt for each param row, using the most recent load_id.
Notice only load_id = 13 is kept on 05-Jan-23
T2: if a date is not available in T1, keep the T2 rows matching the last known id.
Fiddle: https://dbfiddle.uk/-JO16GSj
My SQL seems a bit wild. Can it be simplified?
SELECT t2.dt, t2.param, t2.load_id, t2.id FROM
(SELECT
dt,
param,
load_id,
MAX(load_id) OVER (PARTITION BY dt, param) AS max_load_id,
id
FROM table2) t2
LEFT JOIN
(SELECT * FROM
(SELECT
dt,
id,
load_id,
MAX(load_id) OVER (PARTITION BY dt) AS max_load_id
FROM table1
WHERE name = 'A') t1_prep
WHERE t1_prep.load_id = t1_prep.max_load_id) t1
ON t1.dt = t2.dt and t1.id = t2.id
WHERE t2.load_id = t2.max_load_id
ORDER BY 1, 2
Your query can be rewritten as:
SELECT t2.*
FROM ( SELECT *
FROM table2
ORDER BY RANK() OVER (PARTITION BY dt, param ORDER BY load_id DESC)
FETCH FIRST ROW WITH TIES
) t2
LEFT OUTER JOIN
( SELECT *
FROM table1
WHERE name = 'A'
ORDER BY RANK() OVER (PARTITION BY dt ORDER BY load_id DESC)
FETCH FIRST ROW WITH TIES
) t1
ON t1.dt = t2.dt and t1.id = t2.id
ORDER BY t2.dt, t2.param
However, the columns from t1 are never output, t1 is joined with a LEFT OUTER JOIN, and it contributes at most one row per dt. Whether or not a match is found in t1 therefore makes no difference to the result, so that table can be eliminated, simplifying the query to:
SELECT *
FROM (
SELECT *
FROM table2
ORDER BY RANK() OVER (PARTITION BY dt, param ORDER BY load_id DESC)
FETCH FIRST ROW WITH TIES
)
ORDER BY dt, param;
Or, keeping the MAX() OVER approach from your query:
SELECT dt, param, load_id, id
FROM (
SELECT dt, param, load_id, id,
MAX(load_id) OVER (PARTITION BY dt, param) AS max_load_id
FROM table2
)
WHERE load_id = max_load_id
ORDER BY dt, param
For the sample data, all of these output:
DT PARAM LOAD_ID ID
04-JAN-23 0 11 1
04-JAN-23 1 11 1
05-JAN-23 0 13 3
05-JAN-23 1 13 3
06-JAN-23 0 14 3
06-JAN-23 1 14 3
07-JAN-23 1 14 3
08-JAN-23 1 15 3
09-JAN-23 0 16 3
09-JAN-23 1 16 3
10-JAN-23 0 17 3
10-JAN-23 1 17 3
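The claim that dropping the LEFT JOIN to t1 leaves the result unchanged can be spot-checked in SQLite through Python's sqlite3 module (assumes SQLite 3.25+ for window functions). SQLite has no FETCH FIRST ... WITH TIES, so this compares your original query with the MAX() OVER version. The rows are hypothetical, because the real data lives only in the fiddle, and the dates are ISO-8601 text. One dataset agreeing is a sanity check, not a proof.

```python
import sqlite3

# Hypothetical sample rows (the question's real data is only in the fiddle).
TABLE1 = [  # (dt, name, id, load_id)
    ("2023-01-04", "A", 1, 1),
    ("2023-01-05", "A", 2, 2),
    ("2023-01-05", "A", 3, 3),
    ("2023-01-05", "B", 9, 4),
]
TABLE2 = [  # (dt, param, load_id, id)
    ("2023-01-04", 0, 11, 1),
    ("2023-01-05", 0, 12, 2),
    ("2023-01-05", 0, 13, 3),
    ("2023-01-05", 1, 13, 3),
    ("2023-01-06", 0, 14, 3),  # no matching dt in table1
]

# The question's query, reformatted but otherwise unchanged.
ORIGINAL = """
SELECT t2.dt, t2.param, t2.load_id, t2.id
  FROM (SELECT dt, param, load_id, id,
               MAX(load_id) OVER (PARTITION BY dt, param) AS max_load_id
          FROM table2) t2
  LEFT JOIN (SELECT *
               FROM (SELECT dt, id, load_id,
                            MAX(load_id) OVER (PARTITION BY dt) AS max_load_id
                       FROM table1
                      WHERE name = 'A') t1_prep
              WHERE t1_prep.load_id = t1_prep.max_load_id) t1
    ON t1.dt = t2.dt AND t1.id = t2.id
 WHERE t2.load_id = t2.max_load_id
 ORDER BY 1, 2
"""

# Simplified version: table1 is not referenced at all.
SIMPLIFIED = """
SELECT dt, param, load_id, id
  FROM (SELECT dt, param, load_id, id,
               MAX(load_id) OVER (PARTITION BY dt, param) AS max_load_id
          FROM table2)
 WHERE load_id = max_load_id
 ORDER BY dt, param
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (dt TEXT, name TEXT, id INTEGER, load_id INTEGER)")
conn.execute("CREATE TABLE table2 (dt TEXT, param INTEGER, load_id INTEGER, id INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", TABLE1)
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?)", TABLE2)

original_rows = conn.execute(ORIGINAL).fetchall()
simplified_rows = conn.execute(SIMPLIFIED).fetchall()
print(original_rows == simplified_rows)
```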

How to differentiate the continuous and non-continuous date ranges based on ID column

ID STRT_DT ENT_DT
1 9/14/2020 10/5/2020
1 10/6/2020 10/8/2020
1 10/9/2020 12/31/2199
2 7/14/2020 11/5/2020
2 11/21/2020 11/22/2020
2 11/23/2020 12/31/2199
Looking at the data above, the date ranges for ID 1 are continuous and those for ID 2 are not. I need to pull the IDs whose ranges are continuous in SQL.
Expected output: if any of an ID's date ranges is not continuous with the previous one, that ID should not be returned. So the query should return only ID = 1.
Query I'm using:
SELECT tab.ID,TAB.STRT_DT,TAB.ENT_DT,
STRT_DT - MIN(ENT_DT) OVER (PARTITION BY ID ORDER BY ENT_DT ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS diff,
ENT_DT - MAX(STRT_DT) OVER (PARTITION BY ID ORDER BY ENT_DT ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS diff2
FROM tabLE QUALIFY diff <> 1 OR diff2 <> -1
select ID
from
(
select
ID,
-- flag non-continuous ranges, i.e. previous end is not equal to the day before current start
case when STRT_DT - 1
<> LAG(ENT_DT) OVER (PARTITION BY ID ORDER BY STRT_DT)
then 1
else 0
end as flag
from table
) as dt
group by ID
having sum(flag) = 0 -- only continuous ranges exist
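A minimal sketch of the LAG-based flag query, run in SQLite through Python's sqlite3 module (an assumption; the question looks like Teradata). SQLite cannot subtract 1 from a date directly, so the sketch swaps in julianday() for the date arithmetic, stores the question's M/D/YYYY dates as ISO-8601 text, and uses the hypothetical table name ranges in place of the placeholder table.

```python
import sqlite3

# The question's rows, dates rewritten as ISO-8601 text.
RANGES = [
    (1, "2020-09-14", "2020-10-05"),
    (1, "2020-10-06", "2020-10-08"),
    (1, "2020-10-09", "2199-12-31"),
    (2, "2020-07-14", "2020-11-05"),
    (2, "2020-11-21", "2020-11-22"),
    (2, "2020-11-23", "2199-12-31"),
]

conn = sqlite3.connect(":memory:")
# "ranges" is a hypothetical stand-in for the answer's placeholder table name.
conn.execute("CREATE TABLE ranges (id INTEGER, strt_dt TEXT, ent_dt TEXT)")
conn.executemany("INSERT INTO ranges VALUES (?, ?, ?)", RANGES)

continuous_ids = conn.execute("""
    SELECT id
      FROM (SELECT id,
                   -- 1 when the previous range does not end the day before this one starts;
                   -- the first range of each id compares with NULL and falls through to 0
                   CASE WHEN julianday(strt_dt) - 1
                             <> julianday(LAG(ent_dt) OVER (PARTITION BY id ORDER BY strt_dt))
                        THEN 1 ELSE 0 END AS flag
              FROM ranges)
     GROUP BY id
    HAVING SUM(flag) = 0
     ORDER BY id
""").fetchall()
print(continuous_ids)
```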
You can also use a self join instead of a window function:
select t1.id,
t2.start_dt prev_start_dt, t2.end_dt prev_end_dt,
t1.start_dt, t1.end_dt,
to_date(t1.start_dt, 'MM/DD/YYYY') - to_date(t2.end_dt, 'MM/DD/YYYY') diff
from t t1 inner join t t2 on t1.id = t2.id
where to_date(t1.start_dt, 'MM/DD/YYYY') - to_date(t2.end_dt, 'MM/DD/YYYY') = 1
order by t1.id, t1.start_dt
Result:
ID PREV_START_DT PREV_END_DT START_DT END_DT DIFF
1 9/14/2020 10/5/2020 10/6/2020 10/8/2020 1
1 10/6/2020 10/8/2020 10/9/2020 12/31/2199 1
2 11/21/2020 11/22/2020 11/23/2020 12/31/2199 1
Demo: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=e20d48300f81e746826e44d8ee6982be
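If you only want the IDs whose rows are all continuous, use a left join so that rows without a matching previous range are kept as well; an ID is continuous when exactly one of its rows (the first range) has no match.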
select t1.id
from t t1 left join t t2
on t1.id = t2.id
and to_date(t1.start_dt, 'MM/DD/YYYY') - to_date(t2.end_dt, 'MM/DD/YYYY') = 1
group by t1.id
having count(t1.id) - 1 = count(t2.id)
Result:
ID
1
Demo: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=85679be30f170e3728d7e8b0b0da533e
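The left-join counting trick can be sketched in SQLite through Python's sqlite3 module (an assumption; the demo above is Oracle). Oracle's to_date() subtraction is swapped for julianday() differences, the dates are stored as ISO-8601 text, and the table t with start_dt and end_dt follows the answer's naming.

```python
import sqlite3

# The question's rows, dates rewritten as ISO-8601 text.
RANGES = [
    (1, "2020-09-14", "2020-10-05"),
    (1, "2020-10-06", "2020-10-08"),
    (1, "2020-10-09", "2199-12-31"),
    (2, "2020-07-14", "2020-11-05"),
    (2, "2020-11-21", "2020-11-22"),
    (2, "2020-11-23", "2199-12-31"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, start_dt TEXT, end_dt TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", RANGES)

# Each row is matched with a row of the same id that ends exactly one day
# before it starts; rows without such a predecessor get NULLs from t2.
continuous_ids = conn.execute("""
    SELECT t1.id
      FROM t t1
      LEFT JOIN t t2
        ON t1.id = t2.id
       AND julianday(t1.start_dt) - julianday(t2.end_dt) = 1
     GROUP BY t1.id
    -- only the first range of a continuous id lacks a predecessor
    HAVING COUNT(t1.id) - 1 = COUNT(t2.id)
     ORDER BY t1.id
""").fetchall()
print(continuous_ids)
```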

SQL(BigQuery) Join two tables with lag() function

Consider that I have two tables.
First one:
user_id name timestamp1
1 purchase 12
1 purchase 14
2 purchase 22
2 purchase 14
Second one:
user_id event_name timestamp2
1 event1 10
1 event2 11
2 event12 20
2 event10 12
I want to add some fields (event_name, timestamp2) from table two to table one: for every row in table one, the closest previous event for the same user_id, ordered by timestamp.
The desired table should look like this:
user_id name timestamp1 event_name timestamp2
1 purchase 12 event2 11
1 purchase 14 event2 11
2 purchase 22 event12 20
2 purchase 14 event10 12
Please help me with the SQL query!
Thanks.
You can join on user_id, keep only the events that happened before the row from table1, and then use row_number() ordered by the distance between timestamp1 and timestamp2 to get the closest earlier row from table2:
SELECT user_id, name, timestamp1, event_name, timestamp2
FROM (
SELECT t1.*, t2.event_name, t2.timestamp2,
ROW_NUMBER() OVER(PARTITION BY t1.user_id, t1.timestamp1 ORDER BY ABS(t1.timestamp1 - t2.timestamp2)) AS rn
FROM table1 t1
INNER JOIN table2 t2
ON t1.user_id = t2.user_id
AND t2.timestamp2 < t1.timestamp1 -- only previous events
)
WHERE rn = 1
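Another option in BigQuery Standard SQL is to aggregate the earlier events for each row of table1 and keep only the latest one:
select any_value(t1).*,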
array_agg(struct(event_name,timestamp2) order by timestamp2 desc limit 1)[offset(0)].*
from `project.dataset.table1` t1
cross join `project.dataset.table2` t2
where t2.user_id = t1.user_id and timestamp2 < timestamp1
group by format('%t', t1)
Applied to the sample data in your question, this returns the desired table.
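The ROW_NUMBER approach from the first answer, with the previous-events-only join condition, can be checked in SQLite through Python's sqlite3 module. This is a stand-in for BigQuery (an assumption): only standard window-function syntax is used, so it should carry over, but the BigQuery-specific ARRAY_AGG variant is not exercised here.

```python
import sqlite3

# Sample data from the question.
TABLE1 = [(1, "purchase", 12), (1, "purchase", 14),
          (2, "purchase", 22), (2, "purchase", 14)]
TABLE2 = [(1, "event1", 10), (1, "event2", 11),
          (2, "event12", 20), (2, "event10", 12)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (user_id INTEGER, name TEXT, timestamp1 INTEGER)")
conn.execute("CREATE TABLE table2 (user_id INTEGER, event_name TEXT, timestamp2 INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", TABLE1)
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", TABLE2)

# Only earlier events survive the join; among them the smallest distance
# (i.e. the latest earlier event) gets rn = 1.
closest = conn.execute("""
    SELECT user_id, name, timestamp1, event_name, timestamp2
      FROM (SELECT t1.*, t2.event_name, t2.timestamp2,
                   ROW_NUMBER() OVER (PARTITION BY t1.user_id, t1.timestamp1
                                      ORDER BY ABS(t1.timestamp1 - t2.timestamp2)) AS rn
              FROM table1 t1
              JOIN table2 t2
                ON t1.user_id = t2.user_id
               AND t2.timestamp2 < t1.timestamp1)
     WHERE rn = 1
     ORDER BY user_id, timestamp1
""").fetchall()

for row in closest:
    print(row)
```

Partitioning by (user_id, timestamp1) assumes a user never has two rows in table1 with the same timestamp1.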

SQL select columns group by

Suppose I have a table in the following format:
ID NAME NUM TIMESTAMP BOOL
1 A 5 09:50 TRUE
1 B 6 13:01 TRUE
1 A 1 10:18 FALSE
2 A 3 12:20 FALSE
1 A 1 05:30 TRUE
1 A 12 06:00 TRUE
How can I get the ID, NAME and NUM for each unique (ID, NAME) pair, taken from the row with the latest TIMESTAMP among the rows where BOOL = TRUE?
So for the above table the output should be:
ID NAME NUM
1 A 5
1 B 6
I tried GROUP BY, but I can't get around the fact that I either need to wrap NUM in an aggregate function (MAX and MIN don't work for this example) or add it to the GROUP BY (which then groups on ID, NAME and NUM combined). As far as I can tell, both break in some cases.
PS: I am using SQL Developer (which I think means Oracle; sorry, I'm a newbie at this)
ROW_NUMBER is available in SQL Server 2005 and later as well as in Oracle, so you can use:
WITH CTE AS
(
SELECT ID, NAME, NUM,
ROW_NUMBER() OVER (PARTITION BY ID, NAME ORDER BY TIMESTAMP DESC) AS RN
FROM Table
WHERE BOOL='TRUE'
)
SELECT ID, NAME, NUM FROM CTE
WHERE RN = 1
Result:
ID NAME NUM
1 A 5
1 B 6
Here's the fiddle: http://sqlfiddle.com/#!3/a1dc9/10/0
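A minimal sketch of the ROW_NUMBER query in SQLite through Python's sqlite3 module (an assumption; the asker uses Oracle). To avoid keyword clashes the table is called t and the TIMESTAMP and BOOL columns are renamed to the hypothetical ts and bool_flag; the zero-padded HH:MM strings sort correctly as text.

```python
import sqlite3

# Sample data from the question (TIMESTAMP -> ts, BOOL -> bool_flag).
ROWS = [
    (1, "A", 5, "09:50", "TRUE"),
    (1, "B", 6, "13:01", "TRUE"),
    (1, "A", 1, "10:18", "FALSE"),
    (2, "A", 3, "12:20", "FALSE"),
    (1, "A", 1, "05:30", "TRUE"),
    (1, "A", 12, "06:00", "TRUE"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT, num INTEGER, ts TEXT, bool_flag TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", ROWS)

# Number the TRUE rows of each (id, name) group from latest to earliest
# and keep the first one.
latest = conn.execute("""
    WITH cte AS (
        SELECT id, name, num,
               ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY ts DESC) AS rn
          FROM t
         WHERE bool_flag = 'TRUE')
    SELECT id, name, num
      FROM cte
     WHERE rn = 1
     ORDER BY id, name
""").fetchall()
print(latest)
```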
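Alternatively, join each row to the latest TRUE timestamp of its (ID, NAME) group:
select t1.ID, t1.NAME, t1.NUM from table t1 inner join
(
select ID, NAME, max(TIMESTAMP) as TIMESTAMP from table
where BOOL='TRUE'
group by ID, NAME
) t2
on t1.ID=t2.ID and t1.NAME=t2.NAME and t1.TIMESTAMP=t2.TIMESTAMP
where t1.BOOL='TRUE'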
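Or, without a subquery, keep each TRUE row for which no later TRUE row exists with the same ID and NAME:
select t1.*
from TABLE1 t1
left join
TABLE1 t2
on t1.ID=t2.ID and t1.NAME=t2.NAME and t2.BOOL='TRUE' and t2.TIMESTAMP>t1.TIMESTAMP
where t1.BOOL='TRUE' and t2.ID is null
This should do it for you.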
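Both the join-on-MAX and the anti-join (LEFT JOIN ... IS NULL) variants can be checked against the expected output with a short SQLite sketch through Python's sqlite3 module (an assumption; the asker uses Oracle). As before, the table is t and the hypothetical columns ts and bool_flag replace TIMESTAMP and BOOL.

```python
import sqlite3

ROWS = [
    (1, "A", 5, "09:50", "TRUE"),
    (1, "B", 6, "13:01", "TRUE"),
    (1, "A", 1, "10:18", "FALSE"),
    (2, "A", 3, "12:20", "FALSE"),
    (1, "A", 1, "05:30", "TRUE"),
    (1, "A", 12, "06:00", "TRUE"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT, num INTEGER, ts TEXT, bool_flag TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", ROWS)

# Join each TRUE row to the latest TRUE timestamp of its (id, name) group.
via_max = conn.execute("""
    SELECT t1.id, t1.name, t1.num
      FROM t t1
      JOIN (SELECT id, name, MAX(ts) AS ts
              FROM t
             WHERE bool_flag = 'TRUE'
             GROUP BY id, name) t2
        ON t1.id = t2.id AND t1.name = t2.name AND t1.ts = t2.ts
     WHERE t1.bool_flag = 'TRUE'
     ORDER BY t1.id, t1.name
""").fetchall()

# Keep TRUE rows for which no later TRUE row exists in the same group.
via_anti_join = conn.execute("""
    SELECT t1.id, t1.name, t1.num
      FROM t t1
      LEFT JOIN t t2
        ON t1.id = t2.id AND t1.name = t2.name
       AND t2.bool_flag = 'TRUE' AND t2.ts > t1.ts
     WHERE t1.bool_flag = 'TRUE' AND t2.id IS NULL
     ORDER BY t1.id, t1.name
""").fetchall()

print(via_max, via_anti_join)
```

If two TRUE rows of a group share the latest timestamp, both variants return both rows, whereas ROW_NUMBER picks one arbitrarily.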

Group by with 2 distinct columns in SQL Server

I have data like this:
ID Stat Date
1 1 2009-06-01
2 1 2009-06-20
3 1 2009-06-10
4 2 2009-06-10
The output should look like this:
ID Stat CDate
2 1 2009-06-20
4 2 2009-06-10
I tried the query below without success; please suggest a fix.
Select Distinct stat,MAX(Cdate) dt,id From testtable
Group By stat,id
Got the solution:
Select f1.id,f1.stat,f1.cdate From testtable as F1
Join(Select stat,MAX(cdate) as dt from testtable group by stat) as F2
On f2.stat=F1.stat and f2.dt=f1.cdate
SELECT t1.id, t1.stat, t1.date
FROM testtable t1
JOIN (SELECT stat, MAX(date) date FROM testtable GROUP BY stat) t2 ON t1.stat = t2.stat AND t1.date = t2.date
I'm assuming you want, for each stat, the row with the maximum date, right?
select t1.id, t1.stat, t1.cdate
from testtable t1,
(select stat, max(cdate) max_date from testtable
group by stat) t2
where t1.stat = t2.stat and t1.cdate = t2.max_date
You cannot add id to the GROUP BY here: id is unique, so grouping by it returns every row, which is not the desired result.
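Both points can be seen in a small SQLite sketch through Python's sqlite3 module (an assumption; the question targets SQL Server): grouping by the unique id returns one group per row, while joining back to the per-stat maximum returns the desired two rows. The cdate column name follows the answers.

```python
import sqlite3

# Sample data from the question.
ROWS = [(1, 1, "2009-06-01"), (2, 1, "2009-06-20"),
        (3, 1, "2009-06-10"), (4, 2, "2009-06-10")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testtable (id INTEGER, stat INTEGER, cdate TEXT)")
conn.executemany("INSERT INTO testtable VALUES (?, ?, ?)", ROWS)

# The question's attempt: id is unique, so every row forms its own group.
attempt = conn.execute("""
    SELECT DISTINCT stat, MAX(cdate) AS dt, id
      FROM testtable
     GROUP BY stat, id
""").fetchall()

# Join each row to the maximum date of its stat.
latest_per_stat = conn.execute("""
    SELECT t1.id, t1.stat, t1.cdate
      FROM testtable t1
      JOIN (SELECT stat, MAX(cdate) AS max_date
              FROM testtable
             GROUP BY stat) t2
        ON t1.stat = t2.stat AND t1.cdate = t2.max_date
     ORDER BY t1.stat
""").fetchall()

print(len(attempt))
print(latest_per_stat)
```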
;with CTE AS
(
Select stat, MAX(Cdate) Over (Partition by stat) as dt, id
From testtable
)
Select CTE.id, CTE.stat, CTE.dt
From CTE
Inner Join testtable On testtable.id = CTE.id and testtable.Cdate = CTE.dt
I liked the solution by nicktrs, though. If you are using SQL Server 2005 or later, this might work for you:
select k.id, k.stat, k.cdate from (
select id, stat, cdate, row_num = row_number()
over (partition by stat order by cdate desc) from testtable ) as k
where k.row_num = 1;
Output of the inner query:
ID Stat Date Row_num
2 1 2009-06-20 1
3 1 2009-06-10 2
1 1 2009-06-01 3
4 2 2009-06-10 1
Output after full query is executed:
ID Stat Date
2 1 2009-06-20
4 2 2009-06-10
Hope this helps. Adieu.
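The inner query's row numbering and the final filter can be reproduced in SQLite through Python's sqlite3 module (an assumption; the answer targets SQL Server). SQLite does not accept the SQL Server-only `row_num = row_number()` alias form, so the sketch uses `AS row_num` instead.

```python
import sqlite3

# Sample data from the question.
ROWS = [(1, 1, "2009-06-01"), (2, 1, "2009-06-20"),
        (3, 1, "2009-06-10"), (4, 2, "2009-06-10")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testtable (id INTEGER, stat INTEGER, cdate TEXT)")
conn.executemany("INSERT INTO testtable VALUES (?, ?, ?)", ROWS)

# Inner query: number the rows of each stat from newest to oldest.
INNER = """
    SELECT id, stat, cdate,
           ROW_NUMBER() OVER (PARTITION BY stat ORDER BY cdate DESC) AS row_num
      FROM testtable
"""
ranked = conn.execute(INNER + " ORDER BY stat, row_num").fetchall()

# Full query: keep only the newest row of each stat.
top = conn.execute(
    "SELECT k.id, k.stat, k.cdate FROM (" + INNER + ") AS k "
    "WHERE k.row_num = 1 ORDER BY k.stat"
).fetchall()

print(ranked)
print(top)
```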