How to rank groups of data? - sql

Given the following, and tasked with ranking the raw data by the SUM(volume) within each group:
group_id volume
1 2
1 3
2 5
3 1
3 3
How can I obtain the following?
group_id volume group_volume rank
1 2 5 1
1 3 5 1
2 5 5 2
3 1 4 3
3 3 4 3
I can get group_volume easily, but am struggling on how to break the ties in rank without grouping by + ranking in a separate subquery and joining in.
SELECT *
, SUM(volume) OVER (PARTITION BY group_id) AS grouped_volume
, ... AS rank
FROM groups

Use CTE and Dense_rank
WITH CTE1 AS (SELECT group_id, volume,
sum(volume) over(partition by group_id) group_volume
from table1)
SELECT A.*, dense_rank() over( order by group_id, group_volume) rank FROM CTE1 A;

Use two levels of window functions for this:
select g.*,
dense_rank() over (order by group_volume desc, group_id) as rank
from (select g.*,
sum(volume) over (partition by group_id) as group_volume
from groups g
) g;
There is no need for a JOIN.

Related

How to find repetitive values per groups using SQL?

I'm trying to find repetitions between rows based on the column. I've tried window functions with row_number() / rank() but they group all the values that are found (similar to GROUP BY) which I do not expect.
How can I find repetitions of the values?
I tried to do something like this:
SELECT *, rank() OVER(PARTITION BY customer ORDER BY id) FROM customers ORDER BY id
And got the following result:
id
customer
rank
1
customer_1
1
2
customer_2
1
3
customer_2
2
4
customer_1
2
5
customer_3
1
6
customer_1
3
What I want to do:
id
customer
rank
1
customer_1
1
2
customer_2
1
3
customer_2
2
4
customer_1
1
5
customer_3
1
6
customer_1
1
You are looking for counts within adjacent rows. This is a type of gaps-and-islands problem. You can define the adjacent rows with the difference of row_numbers() and then enumerate them:
SELECT c.*,
ROW_NUMBER() OVER (PARTITION BY customer, seqnum - seqnum_2 ORDER BY id) as ranking
FROM (SELECT c.*,
ROW_NUMBER() OVER (ORDER BY id) as seqnum,
ROW_NUMBER() OVER (PARTITION BY customer ORDER BY id) as seqnum_2
FROM customers c
) c
ORDER BY id
You can use a recursive query:
WITH RECURSIVE repeatitions(id, customer, repeat_count) AS (
SELECT id, customer, 1 as repeat_count
FROM customers
UNION ALL
SELECT c.id, c.customer, r.repeat_count + 1
FROM customers c, repeatitions r
WHERE c.id = r.id + 1 AND c.customer = r.customer
)
SELECT id, customer, repeat_count
FROM repeatitions
ORDER by id
I created a working fiddle to demonstrate it.

SQL : Return joint most frequent values from a column

I have the following table named customerOrders.
ID user order
1 1 2
2 1 3
3 1 1
4 2 1
5 1 5
6 2 4
7 3 1
8 6 2
9 2 2
10 2 3
I want to return to users with most orders. Currently, I have the following QUERY:
SELECT user, COUNT(user) AS UsersWithMostOrders
FROM customerOrders
GROUP BY user
ORDER BY UsersWithMostOrders DESC;
This returns me all the values grouped by total orders like.
user UsersWithMostOrders
1 4
2 4
3 1
6 1
I only want to return the users with most orders. In my case that would be user 1 and 2 since both of them have 4 orders. If I use TOP 1 or LIMIT, it will only return the first user. If I use TOP 2, it will only work in this scenario, it will return invalid data when top two users have different count of orders.
Required Result
user UsersWithMostOrders
1 4
2 4
You can use TOP 1 WITH TIES:
SELECT TOP 1 WITH TIES
[user], COUNT(*) AS UsersWithMostOrders
FROM customerOrders
GROUP BY [user]
ORDER BY UsersWithMostOrders DESC;
See the demo.
Results:
> user | UsersWithMostOrders
> ---: | ------------------:
> 1 | 4
> 2 | 4
Option 1
Should work with most versions of SQL.
select *
from (
select *,
rank() over(order by numOrders desc) as rrank
from (
select `user`, count(*) as numOrders
from customerOrders
group by `user`
) summed
) ranked
where rrank = 1
Play around with the code here
Option 2
If your version of SQL allows window functions (with), here is a much more readable solution which does the same thing
with summed as (
select `user`, count(*) as numOrders
from customerOrders
group by `user`
),
ranked as (
select *,
rank() over(order by numOrders desc) as rrank
from summed
)
select *
from ranked
where rrank = 1
Play around with the code here
You can use a CTE to attain this Req:
;WITH CTE AS(
SELECT [user], COUNT(user) AS UsersWithMostOrders
FROM #T
GROUP BY [user])
SELECT M.* from CTE M
INNER JOIN ( SELECT
MAX(UsersWithMostOrders) AS MaximumOrders FROM CTE) S ON
M.UsersWithMostOrders=S.MaximumOrders
Below Oracle Query can help:
WITH test_table AS
(
SELECT user, COUNT(order) AS total_order , DENSE_RANK() OVER (ORDER BY
total_order desc) AS rank_orders FROM customerOrders
GROUP BY user
)
select * from test_table where rank_orders = 1

how to find the number has more than two consecutive appearences?

The source table:
id num
-------------------
1 1
2 1
3 1
4 2
5 2
6 1
The output:(appear at least 2 times)
num times
--------------
1 3
2 2
Based on the addition logic defined in the comments it appears this is what you're after:
WITH YourTable AS(
SELECT V.id,
V.num
FROM (VALUES(1,1),
(2,1),
(3,1),
(4,2),
(5,2),
(6,1),
(7,1))V(id,num)), --Added extra row due to logic defined in comments
Grps AS(
SELECT YT.id,
YT.num,
ROW_NUMBER() OVER (ORDER BY id) -
ROW_NUMBER() OVER (PARTITION BY Num ORDER BY id) AS Grp
FROM YourTable YT),
Counts AS(
SELECT num,
COUNT(num) AS Times
FROM grps
GROUP BY grp,
num)
SELECT num,
MAX(times) AS times
FROM Counts
GROUP BY num;
This uses a CTE and ROW_NUMBER to define the groups, and then an additional CTE to get the COUNT per group. Finally you can then get the MAX COUNT per num.
I would adress this with a gaps-and-islands technique:
select num, max(cnt)
from (
select num, count(*) cnt
from (
select
id,
num,
row_number() over(order by id) rn1,
row_number() over(partition by num order by id) rn2
from mytable
) t
group by num, rn1 - rn2
) t
group by num
The most inner query computes row numbers over the whole table and within num groups; the difference between the row numbers gives you the group of adjacent records that each record belong to (you can run that subquery independently and follow how the difference evolves to understand more).
Then, the next level count the number of records in each group of adjacent records. The most outer query takes the maximum count of adjacent records in for each num.
Demo on DB Fiddle:
num | (No column name)
--: | ---------------:
1 | 3
2 | 2
this will work for you
select num,count(num) times from Tabl
group by num

How to calculate quartiles grouped by?

Let's say I have a table
VAL PERSON
1 1
2 1
3 1
4 1
2 2
4 2
6 2
3 3
6 3
9 3
12 3
15 3
And I'd like to calculate the quartiles for each person.
I understand I can easily calculate those for a single person as such:
SELECT
VAL,
NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
WHERE PERSON = 1;
Will get me the desired results:
VAL QUARTILE
1 1
2 2
3 3
4 4
Problem is, I'd like to do this for every person. I know something like this would do the job:
SELECT
PERSON,
VAL,
NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
WHERE PERSON = 1
UNION
SELECT
PERSON,
VAL,
NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
WHERE PERSON = 2
UNION
SELECT
PERSON,
VAL,
NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
WHERE PERSON = 3
UNION
SELECT
PERSON,
VAL,
NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
WHERE PERSON = 4
But what if there's a new person on the table? Then I'd have to change the SQL code. Any suggestions?
Why don't you try to use partition by.
SELECT
PERSON,
VAL,
NTILE(4) OVER(PARTITION BY PERSON ORDER BY VAL) AS QUARTILE;
FROM TABLE
Greetings
ntile() doesn't handle ties very well. You can easily see this with an example:
select v.x, ntile(2) over (order by x) as tile
from (values (1), (1), (1), (1)) v(x);
which returns:
x tile
1 1
1 1
1 2
1 2
Same value. Different tiles. This gets worse if you are keeping track of which tile a value is in. Different rows can have different tiles on different runs of the same query -- even when the data does not change.
Normally, you would want rows with the same value to have the same quartile, even when the tiles are not the same size. For this reason, I recommend an explicit calculation using rank() instead:
select t.*,
((seqnum - 1) * 4 / cnt) + 1 as quartile
from (select t.*,
rank() over (partition by person order by val) as seqnum,
count(*) over (partition by person) as cnt
from t
) t;
If you actually want values split among tiles, then use row_number() rather than rank().

SQL rank grouping variation

I'm trying to achieve the following "rank" result given the original dataset composed by the column ID and CODE.
id code rank
1 A 1
2 A 1
3 A 1
4 B 2
5 B 2
6 C 3
7 C 3
8 C 3
9 A 4
10 A 4
Using the RANK_DENSE instruction over the CODE column i get the following result (with the A code getting the same rank value also after "the break" between the rows)
id code rank
1 A 1
2 A 1
3 A 1
4 B 2
5 B 2
6 C 3
7 C 3
8 C 3
9 A 1
10 A 1
Is it possible to achieve the results as shown in the first (example) table, with the A code changing rank when there is a separation between the group formed by id: 1-2-3 and the one formed by id: 9-10 without using a cursor?
Thanks
You want to find sequences of values and give them a rank. You can do this with a difference of row numbers approach. The following assigns a different number to each grouping:
select o.*, dense_rank() over (order by grp, code)
from (select o.*,
(row_number() over (order by id) -
row_number() over (partition by code order by id)
) as grp
from original o
) o;
If you want the assignment in the same order as the original data, then you can order by the id, but that requires an additional window function:
select o.*, dense_rank() over (order by minid) as therank
from (select o.*, min(id) over (partition by grp, code) as minid
from (select o.*,
(row_number() over (order by id) -
row_number() over (partition by code order by id)
) as grp
from original o
) o
) o;
SUM by if current is the same as previous row. Works from SQL Server 2012.
WITH CTE AS (
SELECT id, code,
CASE Code WHEN LAG(CODE) OVER (ORDER BY id) THEN 0 ELSE 1 END AS Diff
FROM Table1)
SELECT id, code, SUM(Diff) OVER (ORDER BY id) FROM CTE
Please also see similar question at How to make row numbering with ordering, partitioning and grouping