I have a many-to-many relationship table, and I want to find the overlapping groups and merge them into one.
In the example below, user 2 is in groups 7 and 8, so groups 7 and 8 should be merged into one group that contains users 1, 2, and 4. The merged group id can be either 7 or 8; it doesn't matter.
user_id  group
1        7
2        7
2        8
4        8
5        9
6        9
I wish to see output like this:
user_id  group
1        7
2        7
4        7
5        9
6        9
Answering my own question here: below is the SQL I built that fits my needs. It is inspired by @pankaj's answer.
with data(user_id,group_id) as (
    select * from values
        (1,7),(2,7),(2,8),(4,9),(5,9),(5,8),
        (6,9),(70,8),(21,51),(22,51),(23,52),
        (24,51),(24,52),(25,26)
), group_members as (
    select
        group_id, array_agg(user_id) users
    from data
    group by group_id
), overlapped_group as (
    select
        c1.group_id g1,
        c2.group_id g2,
        -- c1.users,
        -- c2.users,
        least(g1, coalesce(g2, g1)) as min_group,
        min(min_group) over (partition by g2) as merge_to
    from group_members c1
    left join
        group_members c2 on arrays_overlap(c1.users, c2.users)
        and g1 <> g2
), merge_mapping as (
    select distinct
        g1 as group_id,
        iff(g2 is null, g1, min(merge_to) over (partition by g1)) as merge_to
    from overlapped_group
)
select
    user_id,
    m.merge_to as group_id
from data
left join merge_mapping m using(group_id);
This is similar to a question asked earlier, in which the grouping has to be resolved up to the top level of a hierarchy.
The query below aggregates user_id by group_id into an array and then compares those arrays with each other.
When two arrays overlap, both groups get the same group id.
Once the overlapping arrays have been assigned a parent group id based on the minimum group value, we need to find the top of the hierarchy.
There can be multiple hierarchies in the data set, so the starting point of each hierarchy is set to NULL.
Lastly, we use a hierarchical query to get the final grouping.
with data(user_id,group_id) as (
    select * from values
        (1,7),(2,7),(2,8),(4,9),(5,9),(5,8),
        (6,9),(70,8),(21,51),(22,51),(23,52),
        (24,51),(24,52),(25,26)
), cte_1 as (
    -- one array of users per group
    select group_id, array_agg(user_id) arr
    from data
    group by group_id
), cte_2 as (
    -- compare every pair of groups; grp is the smallest overlapping group
    select c1.group_id g1, c2.group_id g2,
           c1.arr arr1, c2.arr arr2,
           case when arrays_overlap(arr1, arr2) then g1 end flag,
           min(flag) over (partition by g2) grp,
           case when g2 <> grp then grp end final_grp
    from cte_1 c1, cte_1 c2
), cte_3 as (
    -- walk each hierarchy down from its root (final_grp is null)
    select distinct g2, connect_by_root g2 as parent from cte_2
    start with final_grp is null
    connect by final_grp = prior g2
    order by g2
), cte_4 as (
    -- join condition moved into ON, which the left join needs
    select c3.parent, c1.arr
    from cte_1 c1 left join cte_3 c3
      on c1.group_id = c3.g2
) select distinct value, parent as final_group
from cte_4,
lateral flatten(input=>arr)
order by value;
VALUE  FINAL_GROUP
1      7
2      7
4      7
5      7
6      7
21     51
22     51
23     51
24     51
25     26
70     7
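As a cross-check of the output above, the same merge can be written outside SQL as a union-find over group ids. This is only a sketch in Python, not what the queries in this thread do internally; `merge_groups` and its helpers are names made up for the illustration, and the smallest group id in each connected set is kept as the merged id.

```python
from collections import defaultdict


def merge_groups(pairs):
    """Map each user_id to the smallest group_id among its connected groups."""
    parent = {}

    def find(g):
        # Walk up to the representative, shortening the path on the way.
        parent.setdefault(g, g)
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # The smaller root stays the representative of the merged set.
            parent[max(ra, rb)] = min(ra, rb)

    groups_by_user = defaultdict(list)
    for user_id, group_id in pairs:
        find(group_id)  # register the group even if it never merges
        groups_by_user[user_id].append(group_id)

    # Two groups that share a user belong to the same merged group.
    for groups in groups_by_user.values():
        for g in groups[1:]:
            union(groups[0], g)

    return {user: find(groups[0]) for user, groups in groups_by_user.items()}


pairs = [(1, 7), (2, 7), (2, 8), (4, 9), (5, 9), (5, 8), (6, 9), (70, 8),
         (21, 51), (22, 51), (23, 52), (24, 51), (24, 52), (25, 26)]

for user, group in sorted(merge_groups(pairs).items()):
    print(user, group)
```

On the sample data this gives the same user-to-group mapping as the table above (groups 7, 8, 9 collapse to 7, and 51, 52 collapse to 51).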
Adding another query that is simpler.
with data(user_id,group_id) as (
    select * from values
        (1,7),(2,7),(2,8),(4,9),(5,9),(5,8),
        (6,9),(70,8),(21,51),(22,51),(22,52),
        (22,53),(23,52),(25,26)
), cte_1 as (
    select a.group_id grp1, b.group_id grp2
    from data a, data b
    where a.user_id = b.user_id
      and a.group_id < b.group_id
), cte_2 as (
    select grp2, connect_by_root grp1 as parent
    from cte_1
    start with grp1 not in (select grp2 from cte_1)
    connect by grp1 = prior grp2
) select a.user_id,
         coalesce(b.parent, a.group_id) final_grp
from data a left join cte_2 b
  on a.group_id = b.grp2;
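For engines without CONNECT BY, the same idea can be sketched with a standard recursive CTE. The example below runs it through Python's built-in sqlite3 module purely so that it is self-contained; it is not Snowflake syntax, the CTE names `edges` and `reach` are made up for the illustration, and it takes the minimum group id reachable from each group rather than walking from a root.

```python
import sqlite3

ROWS = [(1, 7), (2, 7), (2, 8), (4, 9), (5, 9), (5, 8), (6, 9), (70, 8),
        (21, 51), (22, 51), (22, 52), (22, 53), (23, 52), (25, 26)]

QUERY = """
WITH RECURSIVE
edges(a, b) AS (
    -- two groups are linked when they share a user (this includes a = b)
    SELECT DISTINCT x.group_id, y.group_id
    FROM data x JOIN data y ON x.user_id = y.user_id
),
reach(root, grp) AS (
    SELECT a, a FROM edges
    UNION   -- UNION (not UNION ALL) drops repeats, so cycles terminate
    SELECT r.root, e.b
    FROM reach r JOIN edges e ON e.a = r.grp
)
SELECT d.user_id, MIN(r.root) AS final_grp
FROM data d JOIN reach r ON r.grp = d.group_id
GROUP BY d.user_id
ORDER BY d.user_id
"""


def merged_groups(rows):
    """Return (user_id, final_grp) pairs for the given (user_id, group_id) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (user_id INTEGER, group_id INTEGER)")
    con.executemany("INSERT INTO data VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for user_id, final_grp in merged_groups(ROWS):
    print(user_id, final_grp)
```

Grouping by user_id in the last step returns one row per user even when a group can be reached along several paths, as 53 can here through 51 and through 52.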
One way:
select user_id, STRTOK(listagg(group, ', ') within group (ORDER BY user_id ),',',1)
from <table>
GROUP BY user_id ORDER BY user_id;
Given the following table with 2 columns:
c1 c2
------------
a1 | b1
a1 | b1
a2 | b2
a2 | b3
a3 | b3
I want to return the values from column c2 that appear multiple times for the same c1 value. I am using the following SQL query to return the required result:
SELECT DISTINCT ( c2 ) AS c
FROM ( SELECT c1 , c2 , COUNT (*) AS rowcount
FROM table
GROUP BY c1 , c2 HAVING rowcount > 1 )
Result:
c
---
b1
Is there an alternative SQL statement to the above query?
Based on your description, you can use:
select distinct c1
from (select t.*, count(*) over (partition by c2) as cnt
from t
) t
where cnt >= 2;
Based on your sample results:
select c1
from t
group by c1
having count(*) >= 2;
And based on the revised question:
select c2
from t
group by c2
having count(*) >= 2;
Use COUNT in the HAVING clause instead of a subquery:
select c1
from table
group by c1
having count(c2) > 1
Most answers above will work if you want all the values in c1 that appear more than once in the table (even with the same value in c2).
If you want only the values of c1 that have multiple DISTINCT values in c2, you can use:
SELECT c1
FROM table
GROUP BY c1
HAVING COUNT(DISTINCT c2) > 1
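The variants in this thread answer slightly different questions, which is easiest to see by running them side by side on the sample table. The sketch below uses Python's sqlite3 module only as a convenient engine; the table name `t` follows the earlier answer, and the helper `column` is made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (c1 TEXT, c2 TEXT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a1", "b1"), ("a1", "b1"), ("a2", "b2"), ("a2", "b3"), ("a3", "b3")],
)


def column(sql):
    """Run a single-column query and return its values as a list."""
    return [row[0] for row in con.execute(sql)]


# c2 values repeated within the same c1 (what the question asks for)
dup_pairs = column(
    "SELECT DISTINCT c2 FROM t GROUP BY c1, c2 HAVING COUNT(*) > 1 ORDER BY c2")

# c1 values with more than one row, duplicates included
repeated_c1 = column(
    "SELECT c1 FROM t GROUP BY c1 HAVING COUNT(c2) > 1 ORDER BY c1")

# c1 values with more than one distinct c2
varied_c1 = column(
    "SELECT c1 FROM t GROUP BY c1 HAVING COUNT(DISTINCT c2) > 1 ORDER BY c1")

# c2 values with more than one row, regardless of c1
repeated_c2 = column(
    "SELECT c2 FROM t GROUP BY c2 HAVING COUNT(*) >= 2 ORDER BY c2")

print(dup_pairs, repeated_c1, varied_c1, repeated_c2)
```

Only the first query returns just b1; grouping by c2 alone also returns b3, because b3 occurs under two different c1 values.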
Please consider a Hive table, TableA, as shown below.
The basic SQL syntax works fine when we want to get all the rows that match the condition in the WHERE clause. I want to limit the returned rows to a number, say N, for each value matched by the WHERE clause.
Let me explain with an example:
(1) Consider this table:
TableA
c1  c2
1   a
1   b
1   c
2   d
2   e
2   f
(2) Consider this query:
SELECT c1, c2
FROM TableA
WHERE c1 in (1,2)
(3) As you can imagine, it would produce this result:
Actual Results:
c1  c2
1   a
1   b
1   c
2   d
2   e
2   f
(4) Desired Result:
c1  c2
1   a
1   b
2   d
2   e
Question: How do I modify the query in (2) to get the desired output shown in (4)?
You can use the row_number function to do this:
select c1,c2
from (SELECT c1, c2, row_number() over(partition by c1 order by c2) as rnum
FROM TableA
--add a where clause as needed
) t
where rnum <= 2
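If c1 has only two values, each branch can be limited in its own subquery (Hive needs ORDER BY and LIMIT for a single branch of a UNION to sit inside a subquery):
SELECT c1, c2 FROM (SELECT c1, c2 FROM TableA WHERE c1 = 1 ORDER BY c2 LIMIT 2) a
UNION ALL
SELECT c1, c2 FROM (SELECT c1, c2 FROM TableA WHERE c1 = 2 ORDER BY c2 LIMIT 2) b
For more than two values, use rank(). Note that rank() gives tied rows the same number, so a c1 value can return more than two rows when c2 repeats: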
select c1,c2 from
(
select c1,c2,rank() over (partition by c1 order by c2) as rank
from TableA
) t
where rank < 3;
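The difference between row_number() and rank() only shows up when the ordering column has ties. Here is a small sketch using Python's sqlite3 module as a stand-in for Hive (it assumes SQLite 3.25 or newer for window functions); the second (1, 'b') row is not in the question and is added only to create a tie.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TableA (c1 INTEGER, c2 TEXT)")
con.executemany(
    "INSERT INTO TableA VALUES (?, ?)",
    # The second (1, 'b') row is hypothetical, added only to create a tie.
    [(1, "a"), (1, "b"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (2, "f")],
)

# Same query shape as the answers above; {fn} is swapped between the two
# window functions being compared.
TOP_N = """
SELECT c1, c2
FROM (SELECT c1, c2, {fn}() OVER (PARTITION BY c1 ORDER BY c2) AS rnum
      FROM TableA
      WHERE c1 IN (1, 2)) t
WHERE rnum <= 2
ORDER BY c1, c2
"""

with_row_number = con.execute(TOP_N.format(fn="row_number")).fetchall()
with_rank = con.execute(TOP_N.format(fn="rank")).fetchall()

print(with_row_number)  # exactly two rows per c1
print(with_rank)        # both tied 'b' rows are kept for c1 = 1
```

So row_number() is the safer choice when the requirement is "at most N rows per value", while rank() fits "everything tied for the top N positions".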
I need to populate a column (C3) with auto-increment IDs based on the values of two other columns (a unique ID for each unique C1-C2 pair).
Current
C1   C2   C3
------------------------------
X    A    null
X    A    null
Y    A    null
Z    B    null
Z    B    null
Z    B    null
Desired
C1   C2   C3
------------------------------
X    A    1
X    A    1
Y    A    2
Z    B    3
Z    B    3
Z    B    3
Your result is described by:
select c1, c2, dense_rank() over (order by c1)
from t;
You might intend:
select c1, c2, dense_rank() over (order by c1, c2)
from t;
(But this is more complicated than needed for your sample data.)
This depends on the ordering of the value columns themselves. I am guessing that you have some sort of id and want the rows numbered in id order. The same idea still holds, but you use the minimum id:
select c1, c2,
dense_rank() over (order by minid)
from (select t.*, min(id) over (partition by c1, c2) as minid
from t
) t;
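The two orderings only differ when the pairs do not first appear in sorted order. A quick sketch with Python's sqlite3 module (assuming SQLite 3.25 or newer for window functions); the `id` column and the four rows are hypothetical, chosen so that (Z, B) appears first.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, c1 TEXT, c2 TEXT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    # Hypothetical rows: (Z, B) is inserted first but sorts last.
    [(1, "Z", "B"), (2, "X", "A"), (3, "Z", "B"), (4, "Y", "A")],
)

# Number the pairs by their sorted (c1, c2) value.
by_value = con.execute("""
    SELECT id, c1, c2, DENSE_RANK() OVER (ORDER BY c1, c2) AS c3
    FROM t
    ORDER BY id
""").fetchall()

# Number the pairs by the smallest id at which each pair appears.
by_first_seen = con.execute("""
    SELECT id, c1, c2, DENSE_RANK() OVER (ORDER BY minid) AS c3
    FROM (SELECT t.*, MIN(id) OVER (PARTITION BY c1, c2) AS minid
          FROM t) AS s
    ORDER BY id
""").fetchall()

print(by_value)
print(by_first_seen)
```

With the value ordering (Z, B) gets 3; with the minimum-id ordering it gets 1, because it is the first pair seen.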
Hi, I am new to SQL and stuck on a problem.
Below is a sample of my table. This is not the exact table, but a sample of what I am trying to achieve.
Name  Classification  Hits
A     A1              2
A     A2              3
A     A3              4
A     A4              8
A     A5              9
B     B1              9
B     B2              3
B     B3              4
B     B4              8
B     B5              9
c     c1              8
c     c2              9
c     c3              4
c     c4              8
c     c5              9
...
I am looking for the top two rows by Hits for each Name. For example:
Name  Classification  Hits
A     A4              8
A     A5              9
B     B1              9
B     B5              9
c     c2              9
c     c5              9
I have tried this query:
SELECT TOP (2) Name , Classification , Hits
FROM Table4
Group By Name , Classification , Hits
Order By Hits
But I am only getting two rows in total. What am I doing wrong here? Any suggestions?
You can use a CTE with the Row_Number() function
;WITH CTE AS(
SELECT Name,
Classification,
Hits,
Row_Number() OVER(Partition by name ORDER BY Hits DESC) AS RowNum
FROM Table4
)
SELECT Name,
Classification,
Hits
FROM CTE
WHERE RowNum <= 2
ORDER BY Name, Hits
This will also work, without using ROW_NUMBER():
Select a.* from MyTable as M1
Cross apply
(
Select top 2 * from Mytable m2
where m1.name = m2.name
order by m2.Hits desc
)as a
where a.Classification = m1.Classification
But I don't know about performance.
I'm working from memory so I'm not sure of the syntax, and there's probably a more efficient way to do this, but you'd want to do something like
;with rawdata as (
    select Name, Classification, Hits,
           Row_number() over (partition by Name order by Hits desc) as x
    from Table4  -- the FROM clause was missing
)
select Name, Classification, Hits from rawdata where x < 3
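The ROW_NUMBER() approach from the answers above can be checked against the sample data. The sketch below uses Python's sqlite3 module instead of SQL Server (assuming SQLite 3.25 or newer for window functions), so the leading semicolon and TOP are not needed; the extra Classification sort key is added only to make the printed order deterministic.

```python
import sqlite3

ROWS = [
    ("A", "A1", 2), ("A", "A2", 3), ("A", "A3", 4), ("A", "A4", 8), ("A", "A5", 9),
    ("B", "B1", 9), ("B", "B2", 3), ("B", "B3", 4), ("B", "B4", 8), ("B", "B5", 9),
    ("c", "c1", 8), ("c", "c2", 9), ("c", "c3", 4), ("c", "c4", 8), ("c", "c5", 9),
]

QUERY = """
WITH cte AS (
    SELECT Name, Classification, Hits,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Hits DESC) AS RowNum
    FROM Table4
)
SELECT Name, Classification, Hits
FROM cte
WHERE RowNum <= 2
ORDER BY Name, Hits, Classification
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Table4 (Name TEXT, Classification TEXT, Hits INTEGER)")
con.executemany("INSERT INTO Table4 VALUES (?, ?, ?)", ROWS)

top_two = con.execute(QUERY).fetchall()
for row in top_two:
    print(*row)
```

On this data the result matches the desired output in the question. If three or more rows of one Name tied on Hits, ROW_NUMBER() would keep an arbitrary two of them unless a tie-breaking column is added to the ORDER BY.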