Counting distinct ID based on date - sql

So I have a table as follows:
ID create_date
001 01/01/2021
002 02/04/2021
003 07/22/2021
004 01/29/2021
005 03/01/2021
ID is unique for the table.
I have another table (below) where these IDs appear multiple times alongside another variable, titled code_id.
ID code_id date data
001 A 01/01/2021 xxx
002 W 02/08/2021 xxx
002 B 03/06/2021 xxx
001 A 01/19/2021 xxx
002 C 05/01/2021 xxx
004 D 12/01/2021 xxx
001 K 01/02/2021 xxx
001 J 01/15/2021 xxx
005 A 03/01/2021 xxx
005 A 03/01/2021 xxx
005 B 03/05/2021 xxx
005 B 03/30/2021 xxx
005 C 03/30/2021 xxx
005 D 04/01/2021 xxx
What I want to do is create a new table (preferably via a CTE, but I'm open to join options) which shows the distinct count of code_id within both 5 and 30 days of table1.create_date.
In other words, how many different code_ids appear for each ID within x days of create_date, where x is 5 and 30 respectively.
Here is the resulting table I seek:
ID distinct_code_id_5_day distinct_code_id_30_day distinct_code_id_total
001 2 3 3
002 1 2 3
003 0 0 0
004 0 0 1
005 2 3 4
In the case of ID = 001, we count all code_ids that appeared from 01/01/2021 to 01/05/2021, inclusive, for distinct_code_id_5_day, and from 01/01/2021 to 01/30/2021, inclusive, for distinct_code_id_30_day.

You should be able to solve this with a join and a couple of iff() expressions with date math:
with ids as (
select split(value, ' ') x, x[0] id, x[1]::date create_date
from table(split_to_table('001 01/01/2021
002 02/04/2021
003 07/22/2021
004 01/29/2021
005 03/01/2021', '\n'))
), data as(
select split(value, ' ') x, x[0] id, x[1] code_id, x[2]::date date, x[3] data
from table(split_to_table('001 A 01/01/2021 xxx
002 W 02/08/2021 xxx
002 B 03/06/2021 xxx
001 A 01/19/2021 xxx
002 C 05/01/2021 xxx
004 D 12/01/2021 xxx
001 K 01/02/2021 xxx
001 J 01/15/2021 xxx
005 A 03/01/2021 xxx
005 A 03/01/2021 xxx
005 B 03/05/2021 xxx
005 B 03/30/2021 xxx
005 C 03/30/2021 xxx
005 D 04/01/2021 xxx', '\n')))
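select id
    , count(distinct code5) distinct_code_id_5_day
    , count(distinct code30) distinct_code_id_30_day
    , count(distinct code_id) distinct_code_id_total
from (
    select a.id
        , iff(a.create_date + 5 >= b.date, b.code_id, null) code5
        , iff(a.create_date + 30 >= b.date, b.code_id, null) code30
        , b.code_id
    from ids a
    -- the join condition belongs in ON, not WHERE, so that IDs with no rows
    -- in data (such as 003) still appear with zero counts
    left outer join data b
        on a.id = b.id
)
group by 1
order by 1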
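To make the conditional-count idea easy to try without a Snowflake account, here is a minimal sketch that runs the same logic in SQLite through Python's sqlite3 module. This is an adaptation, not the answer's exact query: SQLite has no iff(), so CASE WHEN stands in for it, julianday() does the date math, and the dates are rewritten in ISO format (all three are assumptions of this sketch).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ids  (id TEXT PRIMARY KEY, create_date TEXT);
    CREATE TABLE data (id TEXT, code_id TEXT, date TEXT);
""")
conn.executemany("INSERT INTO ids VALUES (?, ?)", [
    ("001", "2021-01-01"), ("002", "2021-02-04"), ("003", "2021-07-22"),
    ("004", "2021-01-29"), ("005", "2021-03-01"),
])
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", [
    ("001", "A", "2021-01-01"), ("002", "W", "2021-02-08"),
    ("002", "B", "2021-03-06"), ("001", "A", "2021-01-19"),
    ("002", "C", "2021-05-01"), ("004", "D", "2021-12-01"),
    ("001", "K", "2021-01-02"), ("001", "J", "2021-01-15"),
    ("005", "A", "2021-03-01"), ("005", "A", "2021-03-01"),
    ("005", "B", "2021-03-05"), ("005", "B", "2021-03-30"),
    ("005", "C", "2021-03-30"), ("005", "D", "2021-04-01"),
])

# CASE yields NULL outside the window, and COUNT(DISTINCT ...) ignores NULLs,
# so each column only counts codes seen within its own window.
QUERY = """
SELECT a.id,
       COUNT(DISTINCT CASE WHEN julianday(b.date) - julianday(a.create_date) <= 5
                           THEN b.code_id END) AS distinct_code_id_5_day,
       COUNT(DISTINCT CASE WHEN julianday(b.date) - julianday(a.create_date) <= 30
                           THEN b.code_id END) AS distinct_code_id_30_day,
       COUNT(DISTINCT b.code_id)               AS distinct_code_id_total
FROM ids a
LEFT JOIN data b ON a.id = b.id   -- LEFT JOIN keeps IDs with no data rows
GROUP BY a.id
ORDER BY a.id
"""
counts = conn.execute(QUERY).fetchall()
for row in counts:
    print(row)
```

On the sample data this produces the rows requested in the question, including the all-zero row for ID 003, which only survives because the join condition sits in ON rather than WHERE.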

Related

Finding duplicate lines

I am looking for the best solution to find duplicated rows that have a NULL value in a specific column in one row and some INTEGER value in another, as shown below.
Result
This is what I expect to get from query
TREATY_NUMBER SECTION_NUMBER DT_PERIOD_START INVOLVEMENT
1 001 20190101 NULL
1 001 20190101 58
1 001 20200101 NULL
1 001 20200101 58
2 001 20200101 NULL
2 001 20200101 77
2 001 20200101 NULL
2 001 20210101 77
I was trying to do something like this to find all TREATY_NUMBERs that have an INTEGER value and then join them to the same table to get all the data.
select distinct v.*
from STREATY v
join
(select TREATY_NUMBER, SECTION_NUMBER, DT_PERIOD_START, max(INVOLVEMENT) INV
from TREATY group by TREATY_NUMBER, SECTION_NUMBER, DT_PERIOD_START
having count(*) >1) a
on a.TREATY_NUMBER=v.TREATY_NUMBER and a.DT_PERIOD_START=v.DT_PERIOD_START
where a.INV is not null
But in this case I also get rows that have only INTEGER values and no NULL value.
This is what I get now from query
TREATY_NUMBER SECTION_NUMBER DT_PERIOD_START INVOLVEMENT
1 001 20190101 NULL
1 001 20190101 58
1 001 20200101 NULL
1 001 20200101 58
2 001 20200101 NULL
2 001 20200101 77
2 001 20200101 NULL
2 001 20210101 77
6038 001 20200101 6
6038 001 20200101 7
6038 001 20200101 8

Choosing the most recent when joining in Snowflake (SQL)

So I have a list as follows:
Table 1
ID TIMESTAMP GROUP
001 2021-04-01 12:51:12.063 A
001 2021-04-04 12:51:12.063 G
001 2021-04-14 10:47:03.022 B
002 2021-01-13 09:46:23.012 C
003 2021-09-10 03:32:53.043 D
004 2021-04-13 01:12:54.056 D
004 2021-04-13 11:12:26.054 A
004 2021-04-13 21:53:36.023 D
005 2021-04-01 13:53:13.023 F
005 2021-04-11 13:53:13.023 J
003 2022-04-13 20:32:11.011 G
006 2021-08-13 20:32:11.011 G
And I also have a list of events:
TABLE 2
EVENT ID TIMESTAMP
eventA 001 2021-04-02 12:51:12.063
eventB 001 2021-04-13 12:51:12.063
eventA 002 2021-04-01 12:51:12.063
eventA 002 2021-04-13 12:51:12.063
eventA 002 2021-04-14 12:51:12.063
eventA 003 2021-10-17 12:51:12.063
eventB 005 2021-04-10 12:51:12.063
eventB 005 2021-04-21 12:51:12.063
eventA 006 2021-05-01 20:32:11.011
My goal is, for every event in TABLE 2, to join the most recent entry from Table 1 based on ID. If an ID has entries in Table 1 but none of them precede the event, the group should be NULL in the join.
So in short, for every row in Table 2, we need to find the most recent group for that ID based on timestamp.
Final Result
EVENT ID TIMESTAMP group
eventA 001 2021-04-02 12:51:12.063 A
eventB 001 2021-04-13 12:51:12.063 G
eventA 002 2021-04-01 12:51:12.063 NULL
eventA 002 2021-04-13 12:51:12.063 C
eventA 002 2021-04-14 12:51:12.063 C
eventA 003 2021-10-17 12:51:12.063 D
eventB 005 2021-04-10 12:51:12.063 F
eventB 005 2021-04-21 12:51:12.063 J
eventA 006 2021-05-01 20:32:11.011 NULL
If you do a LEFT JOIN on prior (or equal) timestamps and then prune the over-matches down to just the most recent one with a QUALIFY, this can be done with:
SELECT t2.event,
    t2.id,
    t2.timestamp,
    t1."GROUP"  -- GROUP is a reserved word, so the column name is quoted
FROM table2 AS t2
LEFT JOIN table1 AS t1
    ON t2.id = t1.id AND t2.timestamp >= t1.timestamp
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY t2.id, t2.timestamp
    ORDER BY t1.timestamp DESC NULLS LAST
) = 1
ORDER BY 1, 2, 3;
This will work as long as Table2 has no duplicate (ID, Timestamp) values.
Window functions with QUALIFY ROW_NUMBER() work to get the latest row, as Simeon shows. I've found that for this type of join (often called an AsOf join), if the tables are very large, this approach of joining, finding the max timestamp, and rejoining usually completes faster than using a window function:
select J."EVENT", J.ID, J."TIMESTAMP", T1."GROUP" from
(select * from T2,
lateral (select max(T1."TIMESTAMP") TS from T1 where T1.ID = T2.ID and T1."TIMESTAMP" < T2."TIMESTAMP")) J
-- rejoin on ID as well as timestamp, so a row from another ID that happens
-- to share the same timestamp cannot match
left join T1 on J.ID = T1.ID and J.TS = T1."TIMESTAMP"
;
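For experimenting with the AsOf idea locally, here is a small sketch in SQLite via Python's sqlite3 module. SQLite has neither QUALIFY nor LATERAL, so this swaps in a correlated subquery with ORDER BY ... LIMIT 1, which is a different technique from both answers above but expresses the same rule: for each event, take the latest group at or before the event's timestamp. The table and column names (groups, events, ts, grp) are made up for the sketch, and only a few of the question's rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE groups (id TEXT, ts TEXT, grp TEXT);
    CREATE TABLE events (event TEXT, id TEXT, ts TEXT);
""")
conn.executemany("INSERT INTO groups VALUES (?, ?, ?)", [
    ("001", "2021-04-01 12:51:12.063", "A"),
    ("001", "2021-04-04 12:51:12.063", "G"),
    ("001", "2021-04-14 10:47:03.022", "B"),
    ("006", "2021-08-13 20:32:11.011", "G"),
])
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("eventA", "001", "2021-04-02 12:51:12.063"),
    ("eventB", "001", "2021-04-13 12:51:12.063"),
    ("eventA", "006", "2021-05-01 20:32:11.011"),
])

# ISO-formatted timestamp strings sort chronologically, so plain string
# comparison is enough here. The subquery returns NULL when no group row
# precedes the event.
QUERY = """
SELECT e.event, e.id, e.ts,
       (SELECT g.grp
        FROM groups g
        WHERE g.id = e.id AND g.ts <= e.ts
        ORDER BY g.ts DESC
        LIMIT 1) AS grp
FROM events e
ORDER BY e.id, e.ts
"""
matched = conn.execute(QUERY).fetchall()
for row in matched:
    print(row)
```

The event for ID 006 comes back with a NULL group (None in Python) because its only group row is later than the event.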

SQL- selecting two id's who share something in common

I have a table that has player_id, team_id
I want to find all players who played on the same 3 or more teams.
The expected output would be :
player1, player2, number_of_teams
so far i have something like
SELECT player_id as player1, player_id as player2, count(team_id) as number_of_teams
FROM player_history
WHERE ....
Sample Data:
player_id | team_id
--------------------
001 | 23
001 | 15
001 | 21
002 | 23
002 | 21
002 | 15
002 | 34
003 | 23
003 | 15
003 | 34
003 | 21
004 | 12
004 | 11
004 | 23
should return:
player1 | player2 | number_of_teams
-----------------------------------
001 | 002 | 3
001 | 003 | 3
002 | 003 | 4
Join the table with itself on the same team but different players, then group the result and count the shared teams.
Since I assume there are more than 2 players on each team, and that you're looking for players who were on the same team in the same year, as implied (but not really specified) in your question, I took the liberty of adding year to the join conditions.
You can, of course, remove it.
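SELECT
    p1,
    p2,
    COUNT(team_id) AS total
FROM
(
    -- one row per (team, player pair); the GROUP BY collapses repeats across years
    SELECT
        h1.team_id,
        h1.player_id AS p1,
        h2.player_id AS p2
    FROM
        player_history h1
        -- "<" rather than "!=" so each pair is listed once, not in both orders
        INNER JOIN player_history h2 ON h1.team_id = h2.team_id AND h1.player_id < h2.player_id AND h1.year = h2.year
    GROUP BY
        h1.team_id,
        h1.player_id,
        h2.player_id
) sameteam
GROUP BY
    p1,
    p2
HAVING
    COUNT(team_id) >= 3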
Notice that your example result doesn't fit the example data. Player 4 should not be on the list.
hope it helps
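As a quick check of the self-join approach against the question's sample data, here is a minimal sketch in SQLite via Python's sqlite3 module. It assumes the two-column table from the question (no year column, so that join condition is left out) and uses COUNT(DISTINCT team_id) in place of the inner GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player_history (player_id TEXT, team_id INTEGER)")
conn.executemany("INSERT INTO player_history VALUES (?, ?)", [
    ("001", 23), ("001", 15), ("001", 21),
    ("002", 23), ("002", 21), ("002", 15), ("002", 34),
    ("003", 23), ("003", 15), ("003", 34), ("003", 21),
    ("004", 12), ("004", 11), ("004", 23),
])

# Pair every player with every other player on the same team.
# h1.player_id < h2.player_id keeps each pair once and drops self-pairs.
QUERY = """
SELECT h1.player_id AS player1,
       h2.player_id AS player2,
       COUNT(DISTINCT h1.team_id) AS number_of_teams
FROM player_history h1
JOIN player_history h2
  ON h1.team_id = h2.team_id
 AND h1.player_id < h2.player_id
GROUP BY h1.player_id, h2.player_id
HAVING COUNT(DISTINCT h1.team_id) >= 3
ORDER BY player1, player2
"""
pairs = conn.execute(QUERY).fetchall()
for row in pairs:
    print(row)
```

Player 004 shares only team 23 with the others, so no pair containing 004 reaches the threshold of 3.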

how to interchange date oracle

Please, I have a table like this:
customer_no product_code
1345 001
1345 002
1345 003
I want a new table that will show me these details:
customer_no product_code, product_code
1345 001 002
1345 001 003
1345 002 001
1345 002 003
1345 003 001
1345 003 002
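This will give you the desired output.
-- the two product_code columns get distinct aliases,
-- because column names within a table must be unique
create table yourNewTableName as
select t1.customer_no,
       t1.product_code as product_code_1,
       t2.product_code as product_code_2
from yourOldTableName t1
inner join yourOldTableName t2
    on t1.customer_no = t2.customer_no
where t1.product_code != t2.product_code;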
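To see what the self-join returns before creating a table from it, here is a small sketch that runs the SELECT part in SQLite via Python's sqlite3 module. The table name products and the aliases product_code_1 and product_code_2 are placeholders chosen for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (customer_no INTEGER, product_code TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [
    (1345, "001"), (1345, "002"), (1345, "003"),
])

# Each product is paired with every other product of the same customer;
# the != condition removes pairs of a product with itself.
QUERY = """
SELECT t1.customer_no,
       t1.product_code AS product_code_1,
       t2.product_code AS product_code_2
FROM products t1
JOIN products t2 ON t1.customer_no = t2.customer_no
WHERE t1.product_code != t2.product_code
ORDER BY t1.product_code, t2.product_code
"""
combos = conn.execute(QUERY).fetchall()
for row in combos:
    print(row)
```

Three products give 3 x 2 = 6 ordered pairs, the same six rows listed in the question.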

How to get difference of price of today and yesterday price for a product and also find out whether product was available yesterday or not?

I have a table "price_hist" in Amazon Redshift (PostgreSQL) which holds product and price data for 10 countries, loaded twice a day. I want only the latest data for each day for each product.
For Example below is the table
Country Product Price(string) Created_On
US 001 $2,300 2015/02/16 00:46:20
US 001 $2,300 2015/02/16 13:27:12
DK 006 kr1,700 2015/02/16 00:46:20
DK 006 kr1,700 2015/02/16 13:27:12
US 002 $5,300 2015/02/15 00:46:20
US 002 $5,300 2015/02/15 13:27:12
US 001 $2,200 2015/02/15 00:46:20
US 001 $2,200 2015/02/15 13:27:12
DK 007 kr28 2015/02/15 00:46:20
DK 007 kr28 2015/02/15 13:27:12
US 001 $2,100 2015/02/14 00:46:20
US 002 $5,200 2015/02/14 13:27:12
DK 007 kr9,100 2015/02/14 00:46:20
DK 007 kr9,100 2015/02/14 13:27:12
Now I want a query which should show always data for today and yesterday with price difference and with a flag for product whether it was available yesterday or not.
Required Output :
Country Product P_today p_yesterday p_change flag created_on
US 001 2300 2200 100 Both 2015/02/16 13:27:12
US 002 0 5300 -5300 Removed 2015/02/15 13:27:12
DK 006 1700 0 1700 Added 2015/02/16 13:27:12
DK 007 0 9100 -9100 Removed 2015/02/15 13:27:12
where column P_Change - Show price changes between today's and yesterday's products.
flag - Create a column to reflect new products added in Today's data and the ones which got removed.
You can do it with something like this (the table here is named product, and price is assumed to be stored as a number rather than as a string such as '$2,300'):
select country, product, P_today, P_yesterday, (P_today - P_yesterday) as P_change,
    CASE
        WHEN P_today > 0 and P_yesterday > 0 then 'both'
        WHEN P_today = 0 and P_yesterday > 0 then 'removed'
        WHEN P_today > 0 and P_yesterday = 0 then 'added'
    END
from
    (select
        coalesce(q1.country, q2.country) as country, coalesce(q1.product, q2.product) as product, coalesce(q1.price, 0) as P_today, coalesce(q2.price, 0) as P_yesterday
    from
        -- latest row per product and country for today
        (select * from product where created_on in (select max(created_on) from product where date_trunc('day', created_on) = '2015-02-16 00:00:00+00' group by product, country)) as q1
    full outer join
        -- latest row per product and country for yesterday
        (select * from product where created_on in (select max(created_on) from product where date_trunc('day', created_on) = '2015-02-15 00:00:00+00' group by product, country)) as q2
    on q1.country = q2.country and q1.product = q2.product
    ) as q  -- a derived table in FROM needs an alias
I tested it and it gave me something similar to what you are looking for, see below:
country | product | p_today | p_yesterday | p_change | case
---------+---------+---------+-------------+----------+---------
US | 001 | 2300 | 2300 | 0 | both
US | 002 | 0 | 2300 | -2300 | removed
DK | 006 | 700 | 0 | 700 | added
DK | 007 | 0 | 2300 | -2300 | removed
Hope that helps.
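Since the question stores prices as strings like $2,300, here is a rough end-to-end sketch in SQLite via Python's sqlite3 module that parses the price first and then builds the today/yesterday comparison. Several things here are assumptions of the sketch rather than part of the answer above: parse_price is a hypothetical helper, the dates are rewritten in ISO format, only a handful of the question's rows are loaded, and conditional aggregation is swapped in for the FULL OUTER JOIN (older SQLite versions do not support it).

```python
import re
import sqlite3

def parse_price(text):
    """Hypothetical helper: drop currency symbols and separators, '$2,300' -> 2300."""
    return int(re.sub(r"[^0-9]", "", text))

raw_rows = [
    ("US", "001", "$2,300", "2015-02-16 00:46:20"),
    ("US", "001", "$2,300", "2015-02-16 13:27:12"),
    ("DK", "006", "kr1,700", "2015-02-16 00:46:20"),
    ("DK", "006", "kr1,700", "2015-02-16 13:27:12"),
    ("US", "002", "$5,300", "2015-02-15 00:46:20"),
    ("US", "002", "$5,300", "2015-02-15 13:27:12"),
    ("US", "001", "$2,200", "2015-02-15 00:46:20"),
    ("US", "001", "$2,200", "2015-02-15 13:27:12"),
    ("US", "001", "$2,100", "2015-02-14 00:46:20"),  # older day, filtered out
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price_hist (country TEXT, product TEXT, price INTEGER, created_on TEXT)"
)
conn.executemany(
    "INSERT INTO price_hist VALUES (?, ?, ?, ?)",
    [(c, p, parse_price(price), ts) for c, p, price, ts in raw_rows],
)

QUERY = """
WITH latest AS (            -- latest load per country, product and day
    SELECT p.country, p.product, date(p.created_on) AS day, p.price
    FROM price_hist AS p
    WHERE p.created_on = (
        SELECT MAX(p2.created_on) FROM price_hist AS p2
        WHERE p2.country = p.country AND p2.product = p.product
          AND date(p2.created_on) = date(p.created_on))
),
pivoted AS (                -- one row per product, a missing day counts as 0
    SELECT country, product,
           SUM(CASE WHEN day = :today     THEN price ELSE 0 END) AS p_today,
           SUM(CASE WHEN day = :yesterday THEN price ELSE 0 END) AS p_yesterday
    FROM latest
    WHERE day IN (:today, :yesterday)
    GROUP BY country, product
)
SELECT country, product, p_today, p_yesterday,
       p_today - p_yesterday AS p_change,
       CASE WHEN p_today > 0 AND p_yesterday > 0 THEN 'Both'
            WHEN p_today > 0 THEN 'Added'
            ELSE 'Removed' END AS flag
FROM pivoted
ORDER BY country, product
"""
report = conn.execute(
    QUERY, {"today": "2015-02-16", "yesterday": "2015-02-15"}
).fetchall()
for row in report:
    print(row)
```

Like the answer's query, this treats a price of 0 as "not available", so a product that is genuinely priced at 0 would be misclassified; a separate presence flag would be needed if that case matters.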