create a new id, by groups - sql

I want to insert into my table a new ID, which makes it possible to cluster lines into one group. My data contain authors who published together, author 1 (auid1) published with author 2 (auid2). I would like to find out if there are groups of authors in my data who published together and build a network. So every group_id will mark one network.
There is an additional condition: authors belong to the same group if every author published with everyone else in his group. That means, one auid can be in more than one group.
Here is an example of my data:
auid_1 auid_2
--------------------
001 002
008 002
010 007
001 008
007 005
005 010
008 003
007 012
004 005
006 005
004 006
004 009
The result should look like this:
auid_1 auid_2 group_id
---------------------------------
001 002 1
008 002 1
010 007 2
001 008 1
007 005 2
005 010 2
008 003 3
007 012 4
004 005 5
006 005 5
004 006 5
004 009 6
Additional information:
I use Oracle 11g, Enterprise Edition.
We have pairs of IDs, examples:
ID1 ID2
--------
1 2
3 2
1 3
4 5
...
We want to allocate a group ID to all pairs that are related to each other. In my example, IDs 1, 2 and 3 (every ID is paired with each of the others) form one cluster. The next cluster would be 4, 5, ....
We need a SQL query that does this clustering for us. I think we need recursion? We don't know the number of IDs per cluster.
Is it understandable now?
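The condition that every author in a group published with every other author in it describes a maximal clique of the co-author graph. The following is a minimal sketch in Python rather than Oracle SQL, using the Bron-Kerbosch algorithm without pivoting. The function names maximal_cliques and assign_group_ids are hypothetical, and the sketch assumes each pair belongs to exactly one maximal clique, which holds for the sample data but not in general.

```python
from collections import defaultdict


def maximal_cliques(pairs):
    """Return every maximal clique of the undirected graph as a sorted tuple."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: vertices that can still extend it,
        # x: vertices already tried (used to reject non-maximal cliques)
        if not p and not x:
            cliques.append(tuple(sorted(r)))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def assign_group_ids(pairs):
    """Label each pair with the id of the first maximal clique containing it."""
    cliques = maximal_cliques(pairs)
    group_of = {}  # clique -> group id, numbered by first appearance
    labelled = []
    for a, b in pairs:
        # Assumption: if a pair lies in several cliques, the first one wins.
        clique = next(c for c in cliques if a in c and b in c)
        gid = group_of.setdefault(clique, len(group_of) + 1)
        labelled.append((a, b, gid))
    return labelled


pairs = [("001", "002"), ("008", "002"), ("010", "007"), ("001", "008"),
         ("007", "005"), ("005", "010"), ("008", "003"), ("007", "012"),
         ("004", "005"), ("006", "005"), ("004", "006"), ("004", "009")]
for row in assign_group_ids(pairs):
    print(*row)
```

If a pair can belong to several groups in the real data, the labelling step would need to emit one row per clique instead of picking the first.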

Related

Counting distinct ID based on date

So I have a table as follows:
ID create_date
001 01/01/2021
002 02/04/2021
003 07/22/2021
004 01/29/2021
005 03/01/2021
ID is unique for the table.
I have another table (below) where these IDs appear multiple times alongside another variable, titled code_id.
ID code_id date data
001 A 01/01/2021 xxx
002 W 02/08/2021 xxx
002 B 03/06/2021 xxx
001 A 01/19/2021 xxx
002 C 05/01/2021 xxx
004 D 12/01/2021 xxx
001 K 01/02/2021 xxx
001 J 01/15/2021 xxx
005 A 03/01/2021 xxx
005 A 03/01/2021 xxx
005 B 03/05/2021 xxx
005 B 03/30/2021 xxx
005 C 03/30/2021 xxx
005 D 04/01/2021 xxx
What I want to do is create a new table (preferably via a CTE, but I'm open to join options) that shows the distinct count of code_id within both 5 and 30 days of table1.create_date.
In other words, how many different code_id's appear for each ID within x days of create_date, where x is 5 and 30 respectively.
Here is the resulting table I seek:
ID distinct_code_id_5_day distinct_code_id_30_day distinct_code_id_total
001 2 3 3
002 1 2 3
003 0 0 0
004 0 0 1
005 2 3 4
In the case of ID = 001, we show all code_id's that appeared from 01/01/2021 - 01/05/2021, inclusive, for distinct_code_id_5_day and from 01/01/2021 - 01/30/2021, inclusive, for distinct_code_id_30_day.
You should be able to solve this with a join and a couple of iff() calls with date math:
with ids as (
select split(value, ' ') x, x[0] id, x[1]::date create_date
from table(split_to_table('001 01/01/2021
002 02/04/2021
003 07/22/2021
004 01/29/2021
005 03/01/2021', '\n'))
), data as(
select split(value, ' ') x, x[0] id, x[1] code_id, x[2]::date date, x[3] data
from table(split_to_table('001 A 01/01/2021 xxx
002 W 02/08/2021 xxx
002 B 03/06/2021 xxx
001 A 01/19/2021 xxx
002 C 05/01/2021 xxx
004 D 12/01/2021 xxx
001 K 01/02/2021 xxx
001 J 01/15/2021 xxx
005 A 03/01/2021 xxx
005 A 03/01/2021 xxx
005 B 03/05/2021 xxx
005 B 03/30/2021 xxx
005 C 03/30/2021 xxx
005 D 04/01/2021 xxx', '\n')))
select id, count(distinct code5), count(distinct code30), count(distinct code_id)
from (
select a.id, iff(a.create_date + 5 >= b.date, b.code_id, null) code5
, iff(a.create_date + 30 >= b.date, b.code_id, null) code30
, b.code_id
from ids a
left outer join data b
on a.id=b.id
)
group by 1
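The same conditional distinct count can be checked outside Snowflake. This is a minimal sketch using Python's built-in sqlite3 module, with CASE expressions standing in for iff() and SQLite's date() modifier for the date arithmetic. The table names ids and data follow the CTE names above, and storing the dates as ISO strings is an assumption made so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ids (id TEXT, create_date TEXT);
CREATE TABLE data (id TEXT, code_id TEXT, date TEXT);
""")
conn.executemany("INSERT INTO ids VALUES (?, ?)", [
    ("001", "2021-01-01"), ("002", "2021-02-04"), ("003", "2021-07-22"),
    ("004", "2021-01-29"), ("005", "2021-03-01"),
])
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", [
    ("001", "A", "2021-01-01"), ("002", "W", "2021-02-08"),
    ("002", "B", "2021-03-06"), ("001", "A", "2021-01-19"),
    ("002", "C", "2021-05-01"), ("004", "D", "2021-12-01"),
    ("001", "K", "2021-01-02"), ("001", "J", "2021-01-15"),
    ("005", "A", "2021-03-01"), ("005", "A", "2021-03-01"),
    ("005", "B", "2021-03-05"), ("005", "B", "2021-03-30"),
    ("005", "C", "2021-03-30"), ("005", "D", "2021-04-01"),
])

# A CASE without ELSE yields NULL, and COUNT(DISTINCT ...) ignores NULLs,
# so only codes inside each window are counted.
query = """
SELECT a.id,
       COUNT(DISTINCT CASE WHEN b.date <= date(a.create_date, '+5 days')
                           THEN b.code_id END) AS distinct_code_id_5_day,
       COUNT(DISTINCT CASE WHEN b.date <= date(a.create_date, '+30 days')
                           THEN b.code_id END) AS distinct_code_id_30_day,
       COUNT(DISTINCT b.code_id) AS distinct_code_id_total
FROM ids a
LEFT JOIN data b ON a.id = b.id
GROUP BY a.id
ORDER BY a.id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The LEFT JOIN keeps ID 003, which has no rows in the second table, so it appears with zero counts.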

SQL- selecting two id's who share something in common

I have a table that has player_id, team_id
I want to find all players who played on the same 3 or more teams.
The expected output would be :
player1, player2, number_of_teams
So far I have something like:
SELECT player_id as player1, player_id as player2, count(team_id) as number_of_teams
FROM player_history
WHERE ....
Sample Data:
player_id | team_id
--------------------
001 | 23
001 | 15
001 | 21
002 | 23
002 | 21
002 | 15
002 | 34
003 | 23
003 | 15
003 | 34
003 | 21
004 | 12
004 | 11
004 | 23
should return:
player1 | player2 | number_of_teams
-----------------------------------
001 | 002 | 3
001 | 003 | 3
002 | 003 | 4
Join the table with itself on the same team but different players, then group the joined rows by player pair and count them.
Since I assume there are more than 2 players on each team, and that you're looking for different players in the same year (implied, but not really specified, in your question), I took the liberty of adding the year to the join conditions.
You can, of course, remove it.
SELECT
p1,
p2,
COUNT(team_id) as total
FROM
(
SELECT
h1.team_id,
h1.player_id as p1,
h2.player_id as p2
FROM
player_history h1
-- "<" keeps each pair once; drop the year condition if the table has no year column
INNER JOIN player_history h2 ON h1.team_id = h2.team_id AND h1.player_id < h2.player_id AND h1.year = h2.year
) sameteam
GROUP BY
p1,
p2
HAVING
COUNT(team_id) >= 3
Notice that your example result doesn't fit the example data: player 4 should not be on the list.
SQLFiddle here
hope it helps
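As a quick check of the self-join idea against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the table has only the player_id and team_id columns shown in the question (no year column), and it compares player ids with "<" so that each pair is reported once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player_history (player_id TEXT, team_id INTEGER)")
conn.executemany("INSERT INTO player_history VALUES (?, ?)", [
    ("001", 23), ("001", 15), ("001", 21),
    ("002", 23), ("002", 21), ("002", 15), ("002", 34),
    ("003", 23), ("003", 15), ("003", 34), ("003", 21),
    ("004", 12), ("004", 11), ("004", 23),
])

# Each joined row is one team shared by the pair (player1, player2).
query = """
SELECT h1.player_id AS player1,
       h2.player_id AS player2,
       COUNT(*)     AS number_of_teams
FROM player_history h1
JOIN player_history h2
  ON h1.team_id = h2.team_id
 AND h1.player_id < h2.player_id
GROUP BY h1.player_id, h2.player_id
HAVING COUNT(*) >= 3
ORDER BY player1, player2
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Player 004 shares only team 23 with the others, so no pair involving 004 reaches the threshold of 3.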

SQL query to retrieve minimum date for a record with multiple rows

I need help with an Oracle query to retrieve each account number with its minimum effective date.
For example, I have a table like this;
TABLE A
Account_Number Transaction Number Effective_Date
1111 001 01-Jan-2016
1111 002 01-Feb-2016
1111 003 01-Mar-2016
2222 001 01-Jun-2016
2222 002 01-Jul-2016
2222 003 01-Aug-2016
3333 001 01-Dec-2016
3333 002 01-Jan-2017
4444 001 01-May-2014
4444 002 01-Jun-2014
4444 003 01-Jul-2014
Output should be:
1111 01-Jan-2016
2222 01-Jun-2016
3333 01-Dec-2016
4444 01-May-2014
Sounds like you just need the MIN() function. If your question is more complicated than this, please clarify.
select Acct_No, min(Eff_Date)
from The_Table
group by Acct_No

how to interchange date oracle

Please, I have a table like this:
customer_no product_code
1345 001
1345 002
1345 003
I want a new table that will show me these details:
customer_no product_code, product_code
1345 001 002
1345 001 003
1345 002 001
1345 002 003
1345 003 001
1345 003 002
This will give you the desired output.
create table yourNewTableName as (
select t1.customer_no,
t1.product_code as product_code_1,
t2.product_code as product_code_2
from yourOldTableName t1
inner join yourOldTableName t2
on t1.customer_no = t2.customer_no
where t1.product_code != t2.product_code
);
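To see what the self-join produces, here is a minimal sketch using Python's built-in sqlite3 module instead of Oracle. The table names customer_products and product_pairs are hypothetical, and the two product columns are aliased as product_code_1 and product_code_2 because a created table cannot hold two columns with the same name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_products (customer_no TEXT, product_code TEXT)")
conn.executemany("INSERT INTO customer_products VALUES (?, ?)", [
    ("1345", "001"), ("1345", "002"), ("1345", "003"),
])

# Pair every product of a customer with every other product of that customer.
conn.execute("""
CREATE TABLE product_pairs AS
SELECT t1.customer_no,
       t1.product_code AS product_code_1,
       t2.product_code AS product_code_2
FROM customer_products t1
JOIN customer_products t2
  ON t1.customer_no = t2.customer_no
WHERE t1.product_code != t2.product_code
""")

result = conn.execute("""
SELECT customer_no, product_code_1, product_code_2
FROM product_pairs
ORDER BY product_code_1, product_code_2
""").fetchall()
for row in result:
    print(row)
```

With three products per customer, the join yields 3 x 2 = 6 ordered pairs, matching the table in the question.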

How to get difference of price of today and yesterday price for a product and also find out whether product was available yesterday or not?

I have a table "price_hist" in Amazon Redshift (PostgreSQL) which holds product and price data for 10 countries, loaded twice a day. I want only the latest data for each day for each product.
For example, below is the table:
Country Product Price(string) Created_On
US 001 $2,300 2015/02/16 00:46:20
US 001 $2,300 2015/02/16 13:27:12
DK 006 kr1,700 2015/02/16 00:46:20
DK 006 kr1,700 2015/02/16 13:27:12
US 002 $5,300 2015/02/15 00:46:20
US 002 $5,300 2015/02/15 13:27:12
US 001 $2,200 2015/02/15 00:46:20
US 001 $2,200 2015/02/15 13:27:12
DK 007 kr28 2015/02/15 00:46:20
DK 007 kr28 2015/02/15 13:27:12
US 001 $2,100 2015/02/14 00:46:20
US 002 $5,200 2015/02/14 13:27:12
DK 007 kr9,100 2015/02/14 00:46:20
DK 007 kr9,100 2015/02/14 13:27:12
Now I want a query which should show always data for today and yesterday with price difference and with a flag for product whether it was available yesterday or not.
Required Output :
Country Product P_today p_yesterday p_change flag created_on
US 001 2300 2200 100 Both 2015/02/16 13:27:12
US 002 0 5300 -5300 Removed 2015/02/15 13:27:12
DK 006 1700 0 1700 Added 2015/02/16 13:27:12
DK 007 0 9100 -9100 Removed 2015/02/15 13:27:12
where:
P_Change - shows the price change between today's and yesterday's products.
flag - a column that marks new products added in today's data and the ones that were removed.
You can do it with something like this:
select country,product,P_today,P_yesterday, (P_today - P_yesterday) as P_change ,
CASE
WHEN P_today > 0 and P_yesterday > 0 then 'both'
WHEN P_today = 0 and P_yesterday > 0 then 'removed'
WHEN P_today > 0 and P_yesterday = 0 then 'added'
END
from
(select
isnull(q1.country,q2.country) as country,isnull(q1.product, q2.product) as product ,isnull(q1.price, 0) as P_today, isnull(q2.price,0) as P_yesterday
from
(select * from product where created_on in (select max(created_on) from product where date_trunc('day', created_on) = '2015-02-16 00:00:00+00' group by product,country)) as q1
full outer join
(select * from product where created_on in (select max(created_on) from product where date_trunc('day', created_on) = '2015-02-15 00:00:00+00' group by product,country)) as q2
on q1.country = q2.country and q1.product = q2.product )
I tested it and it gave me something similar to what you are looking for, see below:
country | product | p_today | p_yesterday | p_change | case
---------+---------+---------+-------------+----------+---------
US | 001 | 2300 | 2300 | 0 | both
US | 002 | 0 | 2300 | -2300 | removed
DK | 006 | 700 | 0 | 700 | added
DK | 007 | 0 | 2300 | -2300 | removed
Hope that helps.
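For comparison, here is a minimal sketch of the same today-versus-yesterday report using Python's built-in sqlite3 module. It swaps the full outer join for conditional aggregation, which is a different technique from the query above. The helper parse_price is hypothetical, prices are converted to integers before loading, and, as in the query above, a price of 0 is assumed to mean "not available". With the sample rows as posted, the latest DK 007 price on 2015/02/15 is 28.

```python
import re
import sqlite3


def parse_price(text):
    """Strip currency symbols and thousands separators: '$2,300' -> 2300."""
    return int(re.sub(r"\D", "", text))


rows = [
    ("US", "001", "$2,300", "2015/02/16 00:46:20"), ("US", "001", "$2,300", "2015/02/16 13:27:12"),
    ("DK", "006", "kr1,700", "2015/02/16 00:46:20"), ("DK", "006", "kr1,700", "2015/02/16 13:27:12"),
    ("US", "002", "$5,300", "2015/02/15 00:46:20"), ("US", "002", "$5,300", "2015/02/15 13:27:12"),
    ("US", "001", "$2,200", "2015/02/15 00:46:20"), ("US", "001", "$2,200", "2015/02/15 13:27:12"),
    ("DK", "007", "kr28", "2015/02/15 00:46:20"), ("DK", "007", "kr28", "2015/02/15 13:27:12"),
    ("US", "001", "$2,100", "2015/02/14 00:46:20"), ("US", "002", "$5,200", "2015/02/14 13:27:12"),
    ("DK", "007", "kr9,100", "2015/02/14 00:46:20"), ("DK", "007", "kr9,100", "2015/02/14 13:27:12"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE price_hist (country TEXT, product TEXT, price INTEGER, created_on TEXT)")
conn.executemany("INSERT INTO price_hist VALUES (?, ?, ?, ?)",
                 [(c, p, parse_price(price), ts) for c, p, price, ts in rows])

# Inner query: keep the latest row per country/product/day, then pivot the
# two days into columns. Outer query: derive the change and the flag.
query = """
SELECT country, product, p_today, p_yesterday,
       p_today - p_yesterday AS p_change,
       CASE WHEN p_today > 0 AND p_yesterday > 0 THEN 'both'
            WHEN p_today > 0 THEN 'added'
            ELSE 'removed' END AS flag
FROM (
    SELECT country, product,
           COALESCE(MAX(CASE WHEN substr(created_on, 1, 10) = :today
                             THEN price END), 0) AS p_today,
           COALESCE(MAX(CASE WHEN substr(created_on, 1, 10) = :yesterday
                             THEN price END), 0) AS p_yesterday
    FROM price_hist p
    WHERE substr(created_on, 1, 10) IN (:today, :yesterday)
      AND created_on = (SELECT MAX(q.created_on) FROM price_hist q
                        WHERE q.country = p.country AND q.product = p.product
                          AND substr(q.created_on, 1, 10) = substr(p.created_on, 1, 10))
    GROUP BY country, product
)
ORDER BY country, product
"""
result = conn.execute(query, {"today": "2015/02/16", "yesterday": "2015/02/15"}).fetchall()
for row in result:
    print(row)
```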