Choosing the most recent when joining in Snowflake (SQL)


So I have a list as follows:
Table 1
ID TIMESTAMP GROUP
001 2021-04-01 12:51:12.063 A
001 2021-04-04 12:51:12.063 G
001 2021-04-14 10:47:03.022 B
002 2021-01-13 09:46:23.012 C
003 2021-09-10 03:32:53.043 D
004 2021-04-13 01:12:54.056 D
004 2021-04-13 11:12:26.054 A
004 2021-04-13 21:53:36.023 D
005 2021-04-01 13:53:13.023 F
005 2021-04-11 13:53:13.023 J
003 2022-04-13 20:32:11.011 G
006 2021-08-13 20:32:11.011 G
And I also have a list of events:
TABLE 2
EVENT ID TIMESTAMP
eventA 001 2021-04-02 12:51:12.063
eventB 001 2021-04-13 12:51:12.063
eventA 002 2021-04-01 12:51:12.063
eventA 002 2021-04-13 12:51:12.063
eventA 002 2021-04-14 12:51:12.063
eventA 003 2021-10-17 12:51:12.063
eventB 005 2021-04-10 12:51:12.063
eventB 005 2021-04-21 12:51:12.063
eventA 006 2021-05-01 20:32:11.011
My goal is, for every event in TABLE 2, to join the most recent preceding entry from Table 1 based on ID. If an ID has entries in Table 1 but none of them precede the event, the joined group should be NULL.
So in short, for every row in Table 2, we need to find the most recent group for that ID based on timestamp.
Final Result
EVENT ID TIMESTAMP group
eventA 001 2021-04-02 12:51:12.063 A
eventB 001 2021-04-13 12:51:12.063 G
eventA 002 2021-04-01 12:51:12.063 NULL
eventA 002 2021-04-13 12:51:12.063 C
eventA 002 2021-04-14 12:51:12.063 C
eventA 003 2021-10-17 12:51:12.063 D
eventB 005 2021-04-10 12:51:12.063 F
eventB 005 2021-04-21 12:51:12.063 J
eventA 006 2021-05-01 20:32:11.011 NULL

If you LEFT JOIN on prior (or equal) timestamps and then prune the extra matches down to the most recent one with QUALIFY, this can be done with:
SELECT t2.event,
    t2.id,
    t2.timestamp,
    t1.group
FROM table2 AS t2
LEFT JOIN table1 AS t1
    ON t2.id = t1.id AND t2.timestamp >= t1.timestamp
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY t2.id, t2.timestamp
    ORDER BY t1.timestamp DESC NULLS LAST
) = 1
ORDER BY 1,2,3;
This works as long as Table 2 has no duplicate (ID, TIMESTAMP) pairs; duplicates would fall into the same partition and collapse into a single row.
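
For anyone who wants to try the pattern without a Snowflake account, here is a minimal sketch run through Python's built-in sqlite3 module. It is not Snowflake code: SQLite has no QUALIFY, so the ROW_NUMBER() filter moves into an outer WHERE (this needs SQLite 3.25 or newer for window functions), and the column names ts and grp are stand-ins I chose for TIMESTAMP and GROUP. The rows are a subset of the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id TEXT, ts TEXT, grp TEXT);
CREATE TABLE table2 (event TEXT, id TEXT, ts TEXT);
INSERT INTO table1 VALUES
  ('001', '2021-04-01 12:51:12.063', 'A'),
  ('001', '2021-04-04 12:51:12.063', 'G'),
  ('001', '2021-04-14 10:47:03.022', 'B'),
  ('005', '2021-04-01 13:53:13.023', 'F'),
  ('005', '2021-04-11 13:53:13.023', 'J'),
  ('006', '2021-08-13 20:32:11.011', 'G');
INSERT INTO table2 VALUES
  ('eventA', '001', '2021-04-02 12:51:12.063'),
  ('eventB', '001', '2021-04-13 12:51:12.063'),
  ('eventB', '005', '2021-04-10 12:51:12.063'),
  ('eventB', '005', '2021-04-21 12:51:12.063'),
  ('eventA', '006', '2021-05-01 20:32:11.011');
""")

# ISO-formatted timestamps stored as text compare correctly as strings.
query = """
SELECT event, id, ts, grp
FROM (
    SELECT t2.event, t2.id, t2.ts, t1.grp,
           ROW_NUMBER() OVER (
               PARTITION BY t2.id, t2.ts   -- one partition per event row
               ORDER BY t1.ts DESC         -- newest table1 match first
           ) AS rn
    FROM table2 AS t2
    LEFT JOIN table1 AS t1
        ON t1.id = t2.id AND t1.ts <= t2.ts
)
WHERE rn = 1   -- plays the role of QUALIFY ... = 1
ORDER BY id, ts
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Event 006 has a Table 1 entry, but it is later than the event, so the LEFT JOIN finds no match and grp comes back as None (NULL).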

Window functions with QUALIFY ROW_NUMBER() work to get the latest row, as Simeon shows. For this type of join (often called an as-of join), I've found that when the tables are very large, a join, find-the-max-timestamp, and rejoin approach usually completes faster than using a window function:
select J."EVENT", J.ID, J."TIMESTAMP", "GROUP" from
    (select * from T2,
    lateral (select max(T1."TIMESTAMP") TS from T1 where T1.ID = T2.ID and T1."TIMESTAMP" < T2."TIMESTAMP")) J
    left join T1 on J.ID = T1.ID and J.TS = T1."TIMESTAMP"
;
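
Here is a small sketch of the same find-the-max-then-rejoin idea, again using Python's sqlite3 rather than Snowflake. SQLite has no LATERAL, so a correlated scalar subquery stands in for it, and the data and the names ts, grp and prev_ts are made up for illustration. It also shows why the rejoin should probably match on the ID as well as the timestamp: if two IDs share a timestamp, joining on the timestamp alone returns the other ID's row too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id TEXT, ts TEXT, grp TEXT);
CREATE TABLE table2 (event TEXT, id TEXT, ts TEXT);
-- Hypothetical data: IDs 001 and 002 share the same timestamp.
INSERT INTO table1 VALUES
  ('001', '2021-04-01 12:00:00', 'A'),
  ('002', '2021-04-01 12:00:00', 'Z');
INSERT INTO table2 VALUES
  ('eventA', '001', '2021-04-02 12:00:00'),
  ('eventB', '003', '2021-04-02 12:00:00');
""")

template = """
SELECT j.event, j.id, j.ts, t1.grp
FROM (
    SELECT t2.event, t2.id, t2.ts,
           -- step 1: latest earlier timestamp for this ID (NULL if none)
           (SELECT MAX(p.ts) FROM table1 AS p
             WHERE p.id = t2.id AND p.ts < t2.ts) AS prev_ts
    FROM table2 AS t2
) AS j
-- step 2: rejoin to fetch the rest of the matching row
LEFT JOIN table1 AS t1 ON {rejoin}
ORDER BY j.id, j.ts, t1.grp
"""

strict_rows = conn.execute(
    template.format(rejoin="t1.id = j.id AND t1.ts = j.prev_ts")).fetchall()
loose_rows = conn.execute(
    template.format(rejoin="t1.ts = j.prev_ts")).fetchall()

print(strict_rows)      # one row per event
print(len(loose_rows))  # the timestamp-only rejoin duplicates eventA
```

Whether this beats the window-function version on large tables is something to measure on your own data; the sketch only illustrates the shape of the query.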

Related

Finding duplicate lines

I am looking for the best way to find duplicated rows that have both a NULL value and some INTEGER value in a specific column, as shown below.
Result
This is what I expect to get from the query:
TREATY_NUMBER SECTION_NUMBER DT_PERIOD_START INVOLVEMENT
1 001 20190101 NULL
1 001 20190101 58
1 001 20200101 NULL
1 001 20200101 58
2 001 20200101 NULL
2 001 20200101 77
2 001 20200101 NULL
2 001 20210101 77
I was trying to do something like this: find all TREATY_NUMBERs that have an INTEGER value, and then join them back to the same table to get all the data.
select distinct v.*
from STREATY v
join
(select TREATY_NUMBER, SECTION_NUMBER, DT_PERIOD_START, max(INVOLVEMENT) INV
from TREATY group by TREATY_NUMBER, SECTION_NUMBER, DT_PERIOD_START
having count(*) >1) a
on a.TREATY_NUMBER=v.TREATY_NUMBER and a.DT_PERIOD_START=v.DT_PERIOD_START
where a.INV is not null
But in this case I also get rows that have only INTEGER values and no NULL value.
This is what I get from the query now:
TREATY_NUMBER SECTION_NUMBER DT_PERIOD_START INVOLVEMENT
1 001 20190101 NULL
1 001 20190101 58
1 001 20200101 NULL
1 001 20200101 58
2 001 20200101 NULL
2 001 20200101 77
2 001 20200101 NULL
2 001 20210101 77
6038 001 20200101 6
6038 001 20200101 7
6038 001 20200101 8

Counting distinct ID based on date

So I have a table as follows:
ID create_date
001 01/01/2021
002 02/04/2021
003 07/22/2021
004 01/29/2021
005 03/01/2021
ID is unique for the table.
I have another table (below) where these IDs appear multiple times alongside another variable, titled code_id.
ID code_id date data
001 A 01/01/2021 xxx
002 W 02/08/2021 xxx
002 B 03/06/2021 xxx
001 A 01/19/2021 xxx
002 C 05/01/2021 xxx
004 D 12/01/2021 xxx
001 K 01/02/2021 xxx
001 J 01/15/2021 xxx
005 A 03/01/2021 xxx
005 A 03/01/2021 xxx
005 B 03/05/2021 xxx
005 B 03/30/2021 xxx
005 C 03/30/2021 xxx
005 D 04/01/2021 xxx
What I want to do is create a new table (preferably via a CTE, but I'm open to join options) that shows the distinct count of code_id within both 5 and 30 days of table1.create_date.
In other words, how many different code_ids appear for each ID within x days of create_date, where x is 5 and 30 respectively.
Here is the resulting table I seek:
ID distinct_code_id_5_day distinct_code_id_30_day distinct_code_id_total
001 2 3 3
002 1 2 3
003 0 0 0
004 0 0 1
005 2 3 4
In the case of ID = 001, distinct_code_id_5_day counts all code_ids that appeared from 01/01/2021 to 01/05/2021 inclusive, and distinct_code_id_30_day counts those from 01/01/2021 to 01/30/2021 inclusive.

You should be able to solve this with a join and a couple of iff() calls with date math:
with ids as (
select split(value, ' ') x, x[0] id, x[1]::date create_date
from table(split_to_table('001 01/01/2021
002 02/04/2021
003 07/22/2021
004 01/29/2021
005 03/01/2021', '\n'))
), data as(
select split(value, ' ') x, x[0] id, x[1] code_id, x[2]::date date, x[3] data
from table(split_to_table('001 A 01/01/2021 xxx
002 W 02/08/2021 xxx
002 B 03/06/2021 xxx
001 A 01/19/2021 xxx
002 C 05/01/2021 xxx
004 D 12/01/2021 xxx
001 K 01/02/2021 xxx
001 J 01/15/2021 xxx
005 A 03/01/2021 xxx
005 A 03/01/2021 xxx
005 B 03/05/2021 xxx
005 B 03/30/2021 xxx
005 C 03/30/2021 xxx
005 D 04/01/2021 xxx', '\n')))
select id, count(distinct code5), count(distinct code30), count(distinct code_id)
from (
select a.id, iff(a.create_date + 5 >= b.date, b.code_id, null) code5
, iff(a.create_date + 30 >= b.date, b.code_id, null) code30
, b.code_id
from ids a
left outer join data b
on a.id=b.id
)
group by 1
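
To check the logic against the expected table in the question, here is a rough equivalent run through Python's sqlite3. It is a sketch, not Snowflake code: SQLite has no iff(), so CASE WHEN takes its place, dates are stored as ISO text so that date(..., '+5 days') works, and the column name event_date and the output aliases are my own. Note that the join condition sits in ON rather than WHERE, which is what keeps ID 003 in the output with zero counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ids (id TEXT PRIMARY KEY, create_date TEXT);
CREATE TABLE data (id TEXT, code_id TEXT, event_date TEXT);
INSERT INTO ids VALUES
  ('001', '2021-01-01'), ('002', '2021-02-04'), ('003', '2021-07-22'),
  ('004', '2021-01-29'), ('005', '2021-03-01');
INSERT INTO data VALUES
  ('001', 'A', '2021-01-01'), ('002', 'W', '2021-02-08'),
  ('002', 'B', '2021-03-06'), ('001', 'A', '2021-01-19'),
  ('002', 'C', '2021-05-01'), ('004', 'D', '2021-12-01'),
  ('001', 'K', '2021-01-02'), ('001', 'J', '2021-01-15'),
  ('005', 'A', '2021-03-01'), ('005', 'A', '2021-03-01'),
  ('005', 'B', '2021-03-05'), ('005', 'B', '2021-03-30'),
  ('005', 'C', '2021-03-30'), ('005', 'D', '2021-04-01');
""")

query = """
SELECT a.id,
       -- CASE yields NULL outside the window, and COUNT ignores NULLs
       COUNT(DISTINCT CASE WHEN b.event_date <= date(a.create_date, '+5 days')
                           THEN b.code_id END) AS distinct_code_id_5_day,
       COUNT(DISTINCT CASE WHEN b.event_date <= date(a.create_date, '+30 days')
                           THEN b.code_id END) AS distinct_code_id_30_day,
       COUNT(DISTINCT b.code_id) AS distinct_code_id_total
FROM ids AS a
LEFT JOIN data AS b ON a.id = b.id   -- ON, not WHERE, so unmatched IDs survive
GROUP BY a.id
ORDER BY a.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On this data the counts match the table in the question, including the 0, 0, 0 row for ID 003.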

postgreSQL query - find next lesson

I'm trying to write a PostgreSQL query that lists the two instruments with the lowest monthly rental fee and also shows when the next lesson for each listed instrument is scheduled. I have these two tables:
//Table lesson
lesson_id | instrument_id | start
001 | 01 | 2021-01-01 10:00:00
002 | 01 | 2021-01-02 10:00:00
003 | 02 | 2021-01-04 10:00:00
004 | 02 | 2021-01-05 10:00:00
//Table instrument
instrument_id | fee_per_month
01 | 300
02 | 400
03 | 500
And I want:
instrument_id | fee_per_month | lesson_id | start
01 | 300 | 001 | 2021-01-01 10:00:00
02 | 400 | 003 | 2021-01-04 10:00:00
Getting the two instruments with the lowest fee has been solved. How do I get the next lesson for these two instruments?

One option uses a lateral join:
select i.*, l.lesson_id, l.start
from instrument i
left join lateral (
select l.*
from lesson l
where l.instrument_id = i.instrument_id and l.start >= current_date
order by l.start
limit 1
) l on true
This returns, for each instrument, the first lesson on or after today's date (if any).
You could also use distinct on:
select distinct on (i.instrument_id) i.*, l.lesson_id, l.start
from instrument i
left join lesson l on l.instrument_id = i.instrument_id and l.start >= current_date
order by i.instrument_id, l.start
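
If you want to experiment without a PostgreSQL server, the sketch below runs a similar query through Python's sqlite3. It is an approximation, not the queries above: SQLite supports neither LATERAL nor DISTINCT ON, so a correlated subquery with ORDER BY ... LIMIT 1 picks the next lesson instead. The :today parameter (a fixed date replacing current_date so the result is reproducible), the column name start_time, and the final ORDER BY fee_per_month LIMIT 2 are assumptions added for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instrument (instrument_id TEXT PRIMARY KEY, fee_per_month INTEGER);
CREATE TABLE lesson (lesson_id TEXT PRIMARY KEY, instrument_id TEXT, start_time TEXT);
INSERT INTO instrument VALUES ('01', 300), ('02', 400), ('03', 500);
INSERT INTO lesson VALUES
  ('001', '01', '2021-01-01 10:00:00'),
  ('002', '01', '2021-01-02 10:00:00'),
  ('003', '02', '2021-01-04 10:00:00'),
  ('004', '02', '2021-01-05 10:00:00');
""")

query = """
SELECT i.instrument_id, i.fee_per_month, l.lesson_id, l.start_time
FROM instrument AS i
LEFT JOIN lesson AS l
    ON l.lesson_id = (
        -- earliest lesson for this instrument on or after the reference date
        SELECT l2.lesson_id
        FROM lesson AS l2
        WHERE l2.instrument_id = i.instrument_id
          AND l2.start_time >= :today
        ORDER BY l2.start_time
        LIMIT 1
    )
ORDER BY i.fee_per_month
LIMIT 2   -- the two cheapest instruments
"""
rows = conn.execute(query, {"today": "2021-01-01"}).fetchall()
for row in rows:
    print(row)
```

With the reference date set to 2021-01-01, this returns lessons 001 and 003 for instruments 01 and 02, the same rows the question asks for.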

select data from another table with max date in hive

I have one table t1 like this
A B
1 2020-05-01
1 2020-05-04
1 2020-05-05
1 2020-05-06
2 2020-04-10
and another table t2
A C
1 2020-04-30
5 2020-04-08
and I need output like this:
A B c
1 2020-05-01 2020-04-30
1 2020-05-04 2020-04-30
1 2020-05-05 2020-04-30
1 2020-05-06 2020-04-30
2 2020-04-10 2020-04-08
As you can see, c is the latest date from table t2 that is earlier than B.
Here 2020-04-30 is the latest date earlier than 2020-05-01, 2020-05-04, 2020-05-05 and 2020-05-06, and for 2020-04-10 the date is 2020-04-08.
I am trying it like this but getting the wrong answer:
select t1.*,t2.C, max(C) over (partition by t2.A ) from t1 inner join t2 on t1.A=t2.A and t2.C<t1.B

You could try this approach.
I use a CTE (Common Table Expression) and query it with MAX and GROUP BY:
WITH t AS(
SELECT t1.a, t1.b, t2.c
FROM t1, t2
WHERE t1.b > t2.c)
SELECT a, b, MAX(c) AS c
FROM t
GROUP BY a,b;
expected output
+----+-------------+-------------+--+
| a | b | c |
+----+-------------+-------------+--+
| 1 | 2020-05-01 | 2020-04-30 |
| 1 | 2020-05-04 | 2020-04-30 |
| 1 | 2020-05-05 | 2020-04-30 |
| 1 | 2020-05-06 | 2020-04-30 |
| 2 | 2020-04-10 | 2020-04-08 |
+----+-------------+-------------+--+

You can try this:
Select t1.A,t1.B,MAX(t2.C) from t1 t1 join t2 t2 on t1.A=t2.A group by t1.A,t1.B;

how to interchange date oracle

Please, I have a table like this:
customer_no product_code
1345 001
1345 002
1345 003
I want a new table that shows these details:
customer_no product_code, product_code
1345 001 002
1345 001 003
1345 002 001
1345 002 003
1345 003 001
1345 003 002

This will give you the desired output.
create table yourNewTableName as (
select t1.customer_no,
t1.product_code,
t2.product_code as product_code_2 -- alias needed: a table cannot have two columns with the same name
from yourOldTableName t1
inner join yourOldTableName t2
on t1.customer_no = t2.customer_no
where t1.product_code != t2.product_code
);
