SQL: getting the top 2 results in each classification - sql

Hi, I am new to SQL and stuck on a problem.
Below is a sample of my table. This is not the exact table, but a sample of what I am trying to achieve:
Name Classification Hits
A A1 2
A A2 3
A A3 4
A A4 8
A A5 9
B B1 9
B B2 3
B B3 4
B B4 8
B B5 9
c c1 8
c c2 9
c c3 4
c c4 8
c c5 9
...
I am looking for the two rows with the highest Hits for each Name. For example:
Name Classification Hits
A A4 8
A A5 9
B B1 9
B B5 9
c c2 9
c c5 9
I have tried this query:
SELECT TOP (2) Name , Classification , Hits
FROM Table4
Group By Name , Classification , Hits
Order By Hits
But I am only getting two rows. What am I doing wrong here? Any suggestions?

You can use a CTE with the Row_Number() function
;WITH CTE AS(
SELECT Name,
Classification,
Hits,
Row_Number() OVER(Partition by name ORDER BY Hits DESC) AS RowNum
FROM Table4
)
SELECT Name,
Classification,
Hits
FROM CTE
WHERE RowNum <= 2
ORDER BY Name, Hits
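For context, TOP (2) in the original query limits the whole result set to two rows, not two rows per Name, which is why a per-partition row number is needed. The sketch below replays the ROW_NUMBER() approach on the sample data using Python's built-in sqlite3 module. It is an illustration under assumptions, not the SQL Server setup from the question: it assumes a SQLite build with window functions (3.25 or later), and the tie-breaker on Classification and the helper name top_hits_per_name are additions made here so the output is deterministic.

```python
import sqlite3

# Sample rows from the question: (Name, Classification, Hits).
ROWS = [
    ("A", "A1", 2), ("A", "A2", 3), ("A", "A3", 4), ("A", "A4", 8), ("A", "A5", 9),
    ("B", "B1", 9), ("B", "B2", 3), ("B", "B3", 4), ("B", "B4", 8), ("B", "B5", 9),
    ("c", "c1", 8), ("c", "c2", 9), ("c", "c3", 4), ("c", "c4", 8), ("c", "c5", 9),
]

# Number the rows inside each Name, highest Hits first, then keep the first n.
# The Classification tie-breaker is an assumption added for deterministic output.
TOP_N_PER_NAME = """
WITH ranked AS (
    SELECT Name, Classification, Hits,
           ROW_NUMBER() OVER (PARTITION BY Name
                              ORDER BY Hits DESC, Classification) AS RowNum
    FROM Table4
)
SELECT Name, Classification, Hits
FROM ranked
WHERE RowNum <= ?
ORDER BY Name, Hits, Classification
"""


def top_hits_per_name(n=2):
    """Return the n highest-Hits rows for every Name (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table4 (Name TEXT, Classification TEXT, Hits INTEGER)")
    conn.executemany("INSERT INTO Table4 VALUES (?, ?, ?)", ROWS)
    result = conn.execute(TOP_N_PER_NAME, (n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_hits_per_name():
        print(row)
```

With n=2 this yields the six rows listed as the desired result in the question.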

This will also work, without using ROW_NUMBER():
Select a.* from MyTable as M1
Cross apply
(
Select top 2 * from Mytable m2
where m1.name = m2.name
order by m2.Hits desc
)as a
where a.Classification = m1.Classification
But I don't know about performance.

I'm working from memory, so I'm not sure of the syntax, and there's probably a more efficient way to do this, but you'd want to do something like:
;with rawdata as (
select Name, Classification, Hits,
Row_number() over (partition by Name order by Hits desc) as x
from Table4
)
select Name, Classification, Hits from rawdata where x < 3

Related

How to select first 5 records and group rest others records in sql?

Suppose I have 2 columns NAME and COUNT.
NAME COUNT
a1 2
a2 4
a3 5
a4 1
a5 6
a6 2
a7 4
a8 6
a9 7
a10 4
a11 1
I want to select the first 5 records and group all the remaining records into one record (named others).
The output I need is:
NAME COUNT
a1 2
a2 4
a3 5
a4 1
a5 6
others 24
For others, I need the sum of all the COUNT values excluding the first 5 records.
We can use a union approach with the help of ROW_NUMBER():
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (ORDER BY NAME) rn
FROM yourTable t
)
SELECT NAME, COUNT
FROM
(
SELECT NAME, COUNT, 1 AS pos FROM cte WHERE rn <= 5
UNION ALL
SELECT 'others', SUM(COUNT), 2 FROM cte WHERE rn > 5
) t
ORDER BY pos, NAME;
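One detail worth checking with the union approach is the sort key: NAME is text, so a plain ORDER BY NAME places a10 and a11 before a2, and the "first 5" would then not be a1 to a5. The sketch below runs the same idea in SQLite through Python's sqlite3 module and orders by the numeric suffix instead. The column name cnt, the CAST/SUBSTR sort key, and the helper first_five_plus_others are assumptions made for this illustration; a real table would ideally have its own ordering column.

```python
import sqlite3

# Sample rows from the question: (NAME, COUNT); the column is called cnt here.
ROWS = [("a1", 2), ("a2", 4), ("a3", 5), ("a4", 1), ("a5", 6), ("a6", 2),
        ("a7", 4), ("a8", 6), ("a9", 7), ("a10", 4), ("a11", 1)]

# Rows 1..5 are kept as they are; everything after row 5 is summed into 'others'.
# Assumption: names look like 'a<number>', so the suffix gives the intended order.
QUERY = """
WITH cte AS (
    SELECT name, cnt,
           ROW_NUMBER() OVER (ORDER BY CAST(SUBSTR(name, 2) AS INTEGER)) AS rn
    FROM yourTable
)
SELECT name, cnt
FROM (
    SELECT name, cnt, 1 AS pos, rn FROM cte WHERE rn <= 5
    UNION ALL
    SELECT 'others', SUM(cnt), 2, NULL FROM cte WHERE rn > 5
) t
ORDER BY pos, rn
"""


def first_five_plus_others():
    """Return the first five rows followed by one aggregated 'others' row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (name TEXT, cnt INTEGER)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in first_five_plus_others():
        print(row)
```

On the sample data the last row is ('others', 24), matching the expected output in the question.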

How to merge overlapped groups in Snowflake

I have a many-many relationship table, and I want to find the overlapped groups and merge them into one.
In the example below, user 2 is in groups 7 and 8, so groups 7 and 8 should be merged into one that contains the records 1, 2, 4. The merged group id can be either 7 or 8, it doesn't matter.
user_id group
1 7
2 7
2 8
4 8
5 9
6 9
I wish to see output like this:
user_id group
1 7
2 7
4 7
5 9
6 9
Answering my own question here: below is the SQL I built that fits my needs. It is inspired by @pankaj's answer.
with data(user_id,group_id) as (
select * from values
(1,7),(2,7),(2,8),(4,9),(5,9),(5,8),
(6,9),(70,8),(21,51),(22,51),(23,52),
(24,51),(24,52),(25,26)
), group_members as (
select
group_id, array_agg(user_id) users
from data
group by group_id
), overlapped_group as (
select
c1.group_id g1,
c2.group_id g2,
-- c1.users,
-- c2.users,
least(g1, coalesce(g2, g1)) as min_group,
min(min_group) over (partition by g2) as merge_to
from group_members c1
left join
group_members c2 on arrays_overlap(c1.users, c2.users)
and g1 <> g2
), merge_mapping as (
select distinct
g1 as group_id,
iff(g2 is null, g1, min(merge_to) over (partition by g1)) as merge_to
from overlapped_group
)
select
user_id,
m.merge_to as group_id
from data
left join merge_mapping m using(group_id);
This is similar to a question asked earlier, in which the grouping needs to be resolved up to the top level of a hierarchy.
The query below aggregates user_id by group_id into an array and then compares those arrays with each other.
When two arrays overlap, both groups get the same group id.
Once the overlapping arrays have been assigned a parent group id based on the minimum group value, we need to get to the top of the hierarchy.
There can also be multiple hierarchies in the data set, so we mark the starting point of each hierarchy with NULL.
Lastly, we use a hierarchical query to get the final grouping.
with data(user_id,group_id) as (
select * from values
(1,7),(2,7),(2,8),(4,9),(5,9),(5,8),
(6,9),(70,8),(21,51),(22,51),(23,52),
(24,51),(24,52),(25,26)
),cte_1 as
(select group_id,array_agg(user_id) arr
from data
group by group_id
), cte_2 as
(select c1.group_id g1, c2.group_id g2 ,
c1.arr arr1, c2.arr arr2,
case when arrays_overlap(arr1, arr2) then g1 end flag,
min(flag) over (partition by g2) grp,
case when g2 <> grp then grp end final_grp
from cte_1 c1, cte_1 c2
), cte_3 as
(select distinct g2, connect_by_root g2 as parent from cte_2
start with final_grp is null
connect by final_grp = prior g2
order by g2
), cte_4 as
(select c3.parent, c1.arr
from cte_1 c1 left join cte_3 c3
where c1.group_id = c3.g2
) select distinct value, parent as final_group
from cte_4,
lateral flatten(input=>arr)
order by value;
VALUE FINAL_GROUP
1 7
2 7
4 7
5 7
6 7
21 51
22 51
23 51
24 51
25 26
70 7
Adding another query, that is simpler.
with data(user_id,group_id) as (
select * from values
(1,7),(2,7),(2,8),(4,9),(5,9),(5,8),
(6,9),(70,8),(21,51),(22,51),(22,52),
(22,53),(23,52),(25,26)
), cte_1 as
(select a.group_id grp1, b.group_id grp2
from data a, data b
where a.user_id = b.user_id
and a.group_id < b.group_id
), cte_2 as
(select grp2, connect_by_root grp1 as parent
from cte_1
start with grp1 not in (select grp2 from cte_1)
connect by grp1 = prior grp2
) select a.user_id,
coalesce(b.parent, a.group_id) final_grp
from data a left join cte_2 b
on a.group_id = b.grp2;
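As a way to cross-check results from the queries above on small data, the same merge can be sketched outside the database. The code below uses a union-find (disjoint set) structure in Python rather than any of the Snowflake techniques shown here, so it is a different method, named plainly. The function name merge_overlapping_groups and the choice of the smallest group id as the merged id are assumptions made for this illustration.

```python
def merge_overlapping_groups(pairs):
    """Map each user_id to the smallest group id of its merged group.

    pairs is an iterable of (user_id, group_id). Two groups are merged
    when they share at least one user, directly or through a chain.
    """
    parent = {}

    def find(group):
        # Follow parent links to the root, shortening the path on the way.
        parent.setdefault(group, group)
        while parent[group] != group:
            parent[group] = parent[parent[group]]
            group = parent[group]
        return group

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as root so the merged id is the minimum.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_group_of_user = {}
    for user, group in pairs:
        find(group)  # register the group even if it never gets merged
        if user in first_group_of_user:
            union(first_group_of_user[user], group)
        else:
            first_group_of_user[user] = group

    return {user: find(group) for user, group in first_group_of_user.items()}


if __name__ == "__main__":
    question_data = [(1, 7), (2, 7), (2, 8), (4, 8), (5, 9), (6, 9)]
    print(merge_overlapping_groups(question_data))
```

Because every merge keeps the smaller root, each merged group ends up labeled with its minimum group id, which is the same convention used for the FINAL_GROUP output above.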
One way:
select user_id, STRTOK(listagg(group, ', ') within group (ORDER BY user_id ),',',1)
from <table>
GROUP BY user_id ORDER BY user_id;

Selecting nth top row based on number of occurrences of value in 3 tables

I have three tables, let's say A, B and C. Each of them has a column that's named differently, let's say D1, D2 and D3. Those columns hold values between 1 and 26. How do I count the occurrences of those values across all three tables and sort them by that count?
Example:
TableA.D1
1
2
1
1
3
TableB.D2
2
1
1
1
2
3
TableC.D3
2
1
3
So the output for 3rd most common value would look like this:
3 -- number 3 appeared only 3 times
Likewise, output for 2nd most common value would be:
2 -- number 2 appeared 4 times
And output for 1st most common value:
1 -- number 1 appeared 7 times
You probably want :
select top (3) d1
from ((select d1 from tablea ta) union all
(select d2 from tableb tb) union all
(select d3 from tablec tc)
) t
group by d1
order by count(*) desc;
SELECT DQ3.X, DQ3.CNT
FROM (
SELECT DQ2.*, dense_rank() OVER (ORDER BY DQ2.CNT DESC) AS RN
FROM (SELECT DS.X, COUNT(DS.X) CNT FROM
(SELECT D1 AS X FROM TableA UNION ALL SELECT D2 AS X FROM TableB UNION ALL SELECT D3 AS X FROM TableC) AS DS
GROUP BY DS.X
) DQ2
) DQ3 WHERE DQ3.RN = 3 --the third in the order of commonness - note that 'ties' can be handled differently
One of the things about SQL scripts: they get difficult to read very easily. I'm a big fan of making things as readable as absolutely possible. So I'd recommend something like:
declare @topThree TABLE(entry int, cnt int)
-- keep the three most common values across all three tables
insert into @topThree
select TOP 3 entry, count(*) as cnt
from
(
select d1 as entry from tablea UNION ALL
select d2 as entry from tableb UNION ALL
select d3 as entry from tablec
) as allTablesCombinedSubquery
group by entry
order by count(*) desc
-- the least common of those three is the 3rd most common overall
select TOP 1 entry
from @topThree
order by cnt asc
... it's extremely readable, and doesn't use any concepts that are tough to grok.
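To check the counting logic against the sample values, here is a small sketch that loads the three example columns into an in-memory SQLite database through Python's sqlite3 module and picks the nth most common value with DENSE_RANK(). It assumes SQLite 3.25 or later for window functions; the helper name nth_most_common and the aliases x, cnt and rnk are made up for this example, and SQL Server syntax differs in places (for instance TOP).

```python
import sqlite3

SETUP = """
CREATE TABLE TableA (D1 INTEGER);
CREATE TABLE TableB (D2 INTEGER);
CREATE TABLE TableC (D3 INTEGER);
INSERT INTO TableA VALUES (1),(2),(1),(1),(3);
INSERT INTO TableB VALUES (2),(1),(1),(1),(2),(3);
INSERT INTO TableC VALUES (2),(1),(3);
"""

# Stack the three columns, count each value, rank the counts, keep rank n.
QUERY = """
SELECT x, cnt
FROM (
    SELECT x, cnt, DENSE_RANK() OVER (ORDER BY cnt DESC) AS rnk
    FROM (
        SELECT x, COUNT(*) AS cnt
        FROM (
            SELECT D1 AS x FROM TableA
            UNION ALL SELECT D2 FROM TableB
            UNION ALL SELECT D3 FROM TableC
        ) AS all_values
        GROUP BY x
    ) AS counted
) AS ranked
WHERE rnk = ?
ORDER BY x
"""


def nth_most_common(n):
    """Return [(value, count), ...] for the nth most common value(s)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    result = conn.execute(QUERY, (n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, nth_most_common(n))
```

DENSE_RANK() returns every value tied at the requested rank, so the result is a list; whether ties should instead be broken arbitrarily is a design choice the question leaves open.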

hive - how to select top N elements for each match

Please consider a Hive table, TableA, as shown below.
The basic SQL syntax below works fine when we want to get all the rows that match the condition in the WHERE clause. I want to limit the returned rows to a number, say N, for each value matched by the WHERE clause.
Let me explain with an example:
(1)
Consider this table:
TableA
c1 c2
1 a
1 b
1 c
2 d
2 e
2 f
(2) Consider this query:
SELECT c1, c2
FROM TableA
WHERE c1 in (1,2)
(3) As you can imagine, it would produce this result:
Actual Results:
c1 c2
1 a
1 b
1 c
2 d
2 e
2 f
(4)
Desired Result:
c1 c2
1 a
1 b
2 d
2 e
Question: How do I modify the query in (2) to get the desired output shown in (4)?
You can use row_number function to do this.
select c1,c2
from (SELECT c1, c2, row_number() over(partition by c1 order by c2) as rnum
FROM TableA
--add a where clause as needed
) t
where rnum <= 2
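With only 2 values for c1 (each ORDER BY ... LIMIT goes in its own subquery so it applies to that branch rather than to the whole union):
SELECT * FROM (SELECT c1, c2 FROM TableA WHERE c1 = 1 ORDER BY c2 LIMIT 2) a
UNION ALL
SELECT * FROM (SELECT c1, c2 FROM TableA WHERE c1 = 2 ORDER BY c2 LIMIT 2) b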
With more than 2 values, use rank() (note that ties in c2 can return more than N rows per c1):
select c1,c2 from
(
select c1,c2,rank() over (partition by c1 order by c2) as rank
from TableA
) t
where rank < 3;
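The two window-function answers differ only when the ordering column has ties. The sketch below makes that visible by running the same query with ROW_NUMBER and with RANK in SQLite via Python's sqlite3 module. The duplicated (1, 'b') row is hypothetical data added here to force a tie, the helper top_n is invented for the example, and SQLite stands in for Hive, so treat it as an illustration of the semantics only.

```python
import sqlite3

# TableA from the question plus one hypothetical duplicate (1, 'b') to create a tie.
ROWS = [(1, "a"), (1, "b"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (2, "f")]

ALLOWED = ("ROW_NUMBER", "RANK")


def top_n(rank_function, n=2):
    """Return rows whose per-c1 position, by the given function, is <= n."""
    if rank_function not in ALLOWED:
        raise ValueError("rank_function must be one of %r" % (ALLOWED,))
    # The function name is checked against ALLOWED before being placed in the SQL.
    query = """
        SELECT c1, c2
        FROM (
            SELECT c1, c2,
                   {fn}() OVER (PARTITION BY c1 ORDER BY c2) AS rnum
            FROM TableA
            WHERE c1 IN (1, 2)
        ) t
        WHERE rnum <= ?
        ORDER BY c1, c2
    """.format(fn=rank_function)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableA (c1 INTEGER, c2 TEXT)")
    conn.executemany("INSERT INTO TableA VALUES (?, ?)", ROWS)
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print("ROW_NUMBER:", top_n("ROW_NUMBER"))
    print("RANK:      ", top_n("RANK"))
```

ROW_NUMBER always returns at most N rows per c1, while RANK keeps both tied 'b' rows, so the choice depends on whether a hard limit or tie inclusion is wanted.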

how to group data based on its sequence and group by other columns

I have a table with 3 columns c1,c2,c3 in Oracle like below:
c1 c2 c3
1 34 2
2 34 2
3 34 2
4 24 2
5 24 2
6 34 2
7 34 2
8 34 1
I need to group c1 and get its min and max values based on its sequence, c2 and c3.
i.e., I need the result as below:
c1_min c1_max c2 c3
1 3 34 2
4 5 24 2
6 7 34 2
8 8 34 1
There are a number of ways to approach a gaps-and-islands problem. As an alternative to Sylvain's lag version - not better, just different - you can use a trick with row numbers calculated analytically based on your grouping fields. This adds a 'chain' pseudocolumn to the table values, which will be unique for each contiguous group of c2/c3 pairs:
select c1, c2, c3,
dense_rank() over (partition by c2, c3 order by c1)
- dense_rank() over (partition by null order by c1) as chain
from t42
order by c1, c2, c3;
(I can't take credit for this - I first saw it here). You can then use that as an inline view to calculate the min and max for each group:
select min(c1) as c1_min, max(c1) as c1_max, c2, c3
from (
select c1, c2, c3,
dense_rank() over (partition by c2, c3 order by c1)
- dense_rank() over (partition by null order by c1) as chain
from t42
)
group by c2, c3, chain
order by c1_min;
C1_MIN C1_MAX C2 C3
---------- ---------- ---------- ----------
1 3 34 2
4 5 24 2
6 7 34 2
8 8 34 1
SQL Fiddle showing the intermediate stage too.
You can use other analytic functions like row_number() instead of dense_rank(); they may give slightly different results for some data, but you get the same result with this sample.
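To see why the difference of two rankings identifies each island, the sketch below runs the same trick with ROW_NUMBER() on the sample rows in SQLite through Python's sqlite3 module. Within a contiguous run both counters advance by one per row, so their difference stays constant; when the run is interrupted, only the global counter keeps advancing and the difference changes. SQLite 3.25 or later is assumed, and the helper name islands_by_chain is invented for this example.

```python
import sqlite3

# Sample rows from the question: (c1, c2, c3).
ROWS = [(1, 34, 2), (2, 34, 2), (3, 34, 2), (4, 24, 2),
        (5, 24, 2), (6, 34, 2), (7, 34, 2), (8, 34, 1)]

# chain = position inside the (c2, c3) partition minus position overall.
# For this data the chain values per row are 0, 0, 0, -3, -3, -2, -2, -7.
QUERY = """
SELECT MIN(c1) AS c1_min, MAX(c1) AS c1_max, c2, c3
FROM (
    SELECT c1, c2, c3,
           ROW_NUMBER() OVER (PARTITION BY c2, c3 ORDER BY c1)
           - ROW_NUMBER() OVER (ORDER BY c1) AS chain
    FROM t42
) numbered
GROUP BY c2, c3, chain
ORDER BY c1_min
"""


def islands_by_chain():
    """Return (c1_min, c1_max, c2, c3) for each contiguous run of (c2, c3)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t42 (c1 INTEGER, c2 INTEGER, c3 INTEGER)")
    conn.executemany("INSERT INTO t42 VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in islands_by_chain():
        print(row)
```

Grouping by c2, c3 and chain together matters: the chain value alone is not guaranteed to be unique across different (c2, c3) pairs.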
If I understand it well, you want to group consecutive rows together. This is far from being trivial. Or at least, I can't find right now a simple way of doing it. For ease of understanding, I will break the query in several steps:
Step 1:
The first thing is to identify your "groups" boundaries. Using the LAG analytic function might help you here:
SELECT
CASE WHEN LAG("c2", 1) OVER(ORDER BY "c1") = "c2"
AND LAG("c3", 1) OVER(ORDER BY "c1") = "c3"
THEN 0
ELSE 1
END CLK,
T.* FROM T
ORDER BY "c1"
Step 2:
The second step is to number each of your groups. A simple running SUM over the rows ordered by "c1" will do the trick. That leads to:
SELECT SUM(CLK) OVER (ORDER BY "c1"
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) GRP,
V.*
FROM (
SELECT
CASE WHEN LAG("c2", 1) OVER(ORDER BY "c1") = "c2"
AND LAG("c3", 1) OVER(ORDER BY "c1") = "c3"
THEN 0
ELSE 1
END CLK,
T.* FROM T
) V
ORDER BY "c1";
Final step:
Finally, you can wrap that in a simple GROUP BY query to obtain the desired output:
SELECT MIN("c1"), MAX("c1"), "c2", "c3" FROM
(
SELECT SUM(CLK) OVER (ORDER BY "c1"
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) GRP,
V.*
FROM (
SELECT
CASE WHEN LAG("c2", 1) OVER(ORDER BY "c1") = "c2"
AND LAG("c3", 1) OVER(ORDER BY "c1") = "c3"
THEN 0
ELSE 1
END CLK,
T.* FROM T
) V
)
GROUP BY GRP, "c2", "c3"
ORDER BY GRP
See http://sqlfiddle.com/#!4/7d57c/10
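The three steps can also be traced in plain Python, which may help when checking the CLK and GRP values row by row. This is a sketch that mirrors the logic of the query, not the Oracle implementation itself; the function names number_groups and summarize are invented for the example.

```python
# Sample rows from the question: (c1, c2, c3).
ROWS = [(1, 34, 2), (2, 34, 2), (3, 34, 2), (4, 24, 2),
        (5, 24, 2), (6, 34, 2), (7, 34, 2), (8, 34, 1)]


def number_groups(rows):
    """Steps 1 and 2: flag each group boundary, then keep a running sum."""
    numbered = []
    previous = None  # plays the role of LAG(...), which is NULL on the first row
    grp = 0
    for c1, c2, c3 in sorted(rows):  # ORDER BY c1
        clk = 0 if previous == (c2, c3) else 1  # 1 marks the start of a new group
        grp += clk                              # running SUM(CLK)
        numbered.append((grp, clk, c1, c2, c3))
        previous = (c2, c3)
    return numbered


def summarize(rows):
    """Final step: MIN(c1) and MAX(c1) per group number."""
    summary = {}
    for grp, _clk, c1, c2, c3 in number_groups(rows):
        low, high, _, _ = summary.get(grp, (c1, c1, c2, c3))
        summary[grp] = (min(low, c1), max(high, c1), c2, c3)
    return [summary[grp] for grp in sorted(summary)]


if __name__ == "__main__":
    for row in number_groups(ROWS):
        print(row)
    for row in summarize(ROWS):
        print(row)
```

The first row always starts a group because there is no previous row to compare with, which corresponds to the ELSE 1 branch being taken when LAG returns NULL.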