How to calculate quartiles grouped by? - sql

Let's say I have a table
VAL PERSON
1 1
2 1
3 1
4 1
2 2
4 2
6 2
3 3
6 3
9 3
12 3
15 3
And I'd like to calculate the quartiles for each person.
I understand I can easily calculate those for a single person as such:
SELECT
    VAL,
    NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t
WHERE PERSON = 1;
This gets me the desired results:
VAL QUARTILE
1 1
2 2
3 3
4 4
Problem is, I'd like to do this for every person. I know something like this would do the job:
SELECT
    PERSON,
    VAL,
    NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t
WHERE PERSON = 1
UNION
SELECT
    PERSON,
    VAL,
    NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t
WHERE PERSON = 2
UNION
SELECT
    PERSON,
    VAL,
    NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t
WHERE PERSON = 3
UNION
SELECT
    PERSON,
    VAL,
    NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t
WHERE PERSON = 4
But what if there's a new person on the table? Then I'd have to change the SQL code. Any suggestions?

Why don't you try using PARTITION BY?
SELECT
    PERSON,
    VAL,
    NTILE(4) OVER(PARTITION BY PERSON ORDER BY VAL) AS QUARTILE
FROM t;
Greetings
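To try the PARTITION BY version without a database server, here is a minimal sketch that runs it against the question's sample rows in an in-memory SQLite database through Python's sqlite3 module. The table name t and the function name are assumptions for illustration, and it assumes the bundled SQLite is 3.25 or newer (the first release with window functions).

```python
import sqlite3

# Sample rows (val, person) copied from the question.
ROWS = [(1, 1), (2, 1), (3, 1), (4, 1),
        (2, 2), (4, 2), (6, 2),
        (3, 3), (6, 3), (9, 3), (12, 3), (15, 3)]


def quartiles_per_person():
    """Return (person, val, quartile) rows, with NTILE restarted per person."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (val INTEGER, person INTEGER)")  # 't' is a placeholder name
    conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
    rows = conn.execute("""
        SELECT person, val,
               NTILE(4) OVER (PARTITION BY person ORDER BY val) AS quartile
        FROM t
        ORDER BY person, val
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in quartiles_per_person():
        print(row)
```

Person 2 has only three rows, so only tiles 1 to 3 are used there; person 3 has five rows, so the first tile holds two of them.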

ntile() doesn't handle ties very well. You can easily see this with an example:
select v.x, ntile(2) over (order by x) as tile
from (values (1), (1), (1), (1)) v(x);
which returns:
x tile
1 1
1 1
1 2
1 2
Same value. Different tiles. This gets worse if you are keeping track of which tile a value is in. Different rows can have different tiles on different runs of the same query -- even when the data does not change.
Normally, you would want rows with the same value to have the same quartile, even when the tiles are not the same size. For this reason, I recommend an explicit calculation using rank() instead:
select t.*,
       ((seqnum - 1) * 4 / cnt) + 1 as quartile
from (select t.*,
             rank() over (partition by person order by val) as seqnum,
             count(*) over (partition by person) as cnt
      from t
     ) t;
If you actually want values split among tiles, then use row_number() rather than rank().
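The difference between the two approaches shows up as soon as a tie straddles a tile boundary. The sketch below compares NTILE(4) with the rank()-based calculation on a made-up list of values for a single person (the data, the table name t and the column aliases are assumptions, not from the question). It relies on SQLite's integer division for integer operands and assumes SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample with tied values for one person (not from the question).
VALS = [1, 1, 2, 2, 3, 3, 3, 4]


def compare_quartiles():
    """Return (val, ntile_q, rank_q) rows sorted by val, then ntile_q."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (val INTEGER, person INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, 1)", [(v,) for v in VALS])
    rows = conn.execute("""
        SELECT val,
               NTILE(4) OVER (PARTITION BY person ORDER BY val) AS ntile_q,
               (seqnum - 1) * 4 / cnt + 1 AS rank_q  -- integer division
        FROM (SELECT val, person,
                     RANK() OVER (PARTITION BY person ORDER BY val) AS seqnum,
                     COUNT(*) OVER (PARTITION BY person) AS cnt
              FROM t) AS ranked
        ORDER BY val, ntile_q
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in compare_quartiles():
        print(row)
```

With this data NTILE puts one of the three rows with value 3 into tile 4, while the rank()-based column keeps all three in quartile 3. On engines where / between integers returns a decimal, the expression would need an explicit floor or integer cast.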

Related

How to Rank By Partition with island and gap issue

Is it possible to rank items by partition without using a CTE?
Expected Table
item value ID
A 10 1
A 20 1
B 30 2
B 40 2
C 50 3
C 60 3
A 70 4
A 80 4
Giving each partition an ID would allow the aggregate functions to work the way I want:
item MIN MAX ID
A 10 20 1
B 30 40 2
C 50 60 3
A 70 80 4
SQL Version: Microsoft SQL Server 2017
Assuming that the value column provides the intended ordering of the records which we see in your question above, we can try using the difference in row numbers method here. Your problem is a type of gaps and islands problem.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY value) rn1,
ROW_NUMBER() OVER (PARTITION BY item ORDER BY value) rn2
FROM yourTable
)
SELECT item, MIN(value) AS [MIN], MAX(value) AS [MAX], MIN(ID) AS ID
FROM cte
GROUP BY item, rn1 - rn2
ORDER BY MIN(value);
If you don't want to use a CTE here, for whatever reason, you may simply inline the SQL code in the CTE into the bottom query, as a subquery:
SELECT item, MIN(value) AS [MIN], MAX(value) AS [MAX], MIN(ID) AS ID
FROM
(
SELECT *, ROW_NUMBER() OVER (ORDER BY value) rn1,
ROW_NUMBER() OVER (PARTITION BY item ORDER BY value) rn2
FROM yourTable
) t
GROUP BY item, rn1 - rn2
ORDER BY MIN(value);
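As a rough check of the difference-in-row-numbers method, the sketch below runs the subquery form in an in-memory SQLite database via Python's sqlite3 module. It assumes the source table holds only item and value, so the ID is derived with an outer ROW_NUMBER() instead of MIN(ID); the aliases min_value, max_value, numbered and islands are made up for the example, and SQLite 3.25 or newer is assumed.

```python
import sqlite3

# (item, value) rows taken from the question's table, without the ID column.
ROWS = [("A", 10), ("A", 20), ("B", 30), ("B", 40),
        ("C", 50), ("C", 60), ("A", 70), ("A", 80)]


def item_islands():
    """Return (item, min_value, max_value, id) for each run of equal items."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (item TEXT, value INTEGER)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?)", ROWS)
    rows = conn.execute("""
        SELECT item, min_value, max_value,
               ROW_NUMBER() OVER (ORDER BY min_value) AS id
        FROM (SELECT item,
                     MIN(value) AS min_value,
                     MAX(value) AS max_value
              FROM (SELECT item, value,
                           ROW_NUMBER() OVER (ORDER BY value) AS rn1,
                           ROW_NUMBER() OVER (PARTITION BY item ORDER BY value) AS rn2
                    FROM yourTable) AS numbered
              GROUP BY item, rn1 - rn2) AS islands
        ORDER BY min_value
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in item_islands():
        print(row)
```

Within one run of the same item, rn1 and rn2 both grow by one per row, so rn1 - rn2 stays constant; it only changes when another item interrupts the run, which is what makes it usable as a grouping key together with item.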
You can generate group IDs by comparing each row's item with the previous row's item, obtained with the LAG function, and taking a running SUM of the changes. A final GROUP BY then yields the minimum and maximum value of each item group.
SELECT
item,
MIN(value) AS "min",
MAX(value) AS "max",
group_id + 1 AS id
FROM (
SELECT
*,
SUM(CASE WHEN item = prev_item THEN 0 ELSE 1 END) OVER (ORDER BY value) AS group_id
FROM (
SELECT
*,
LAG(item, 1, item) OVER (ORDER BY value) AS prev_item
FROM t
) items
) groups
GROUP BY item, group_id
The query produces this output:
item min max id
A 10 20 1
B 30 40 2
C 50 60 3
A 70 80 4

Grouping of PARTITION BY / GROUP BY only until next section to obtain a list of sections

I have a table like this:
id section
1 6
2 6
3 7
4 7
5 6
and would like to obtain a grouped list that says
section section_nr first_id
6 1 1
7 2 3
6 3 5
Using ROW_NUMBER twice I am able to obtain something close:
SELECT section, ROW_NUMBER() OVER (ORDER BY id) AS section_nr, id as first_id
FROM (
SELECT id, section, ROW_NUMBER() OVER (PARTITION BY section ORDER BY id) AS nr_within
FROM X
)
WHERE nr_within = 1
section section_nr first_id
6 1 1
7 2 3
... but of course the second section 6 is missing, since PARTITION BY groups all section=6 together. Is it somehow possible to only group until the next section?
More generally (regarding GROUP BY instead of PARTITION BY), is there a simple solution to group (1,1,2,2,1) to (1,2,1) instead of (1,2)?
This is a typical gaps and islands problem that can be solved like this:
with u as
(select id, section,
case when section = lag(section) over(order by id) then 0 else 1 end as grp
from X),
v as
(select id,
section,
sum(grp) over(order by id) as section_nr
from u)
select section,
section_nr,
min(id) as first_id
from v
group by section, section_nr;
The idea is to flag each row where the section changes by comparing the current section with the section of the previous row (ordered by id): the flag is 1 when there is a change and 0 when there is not. The running sum of this flag is the section number, and first_id is then simply MIN(id) per group.
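The flag-and-running-sum steps can be sketched end to end as follows, using the question's five rows in an in-memory SQLite database (Python's sqlite3 module, SQLite 3.25 or newer assumed). The function name and the final ORDER BY are additions for the example; the CTEs mirror the ones in the answer.

```python
import sqlite3

# (id, section) rows from the question.
ROWS = [(1, 6), (2, 6), (3, 7), (4, 7), (5, 6)]


def consecutive_sections():
    """Return (section, section_nr, first_id) for each run of equal sections."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE X (id INTEGER, section INTEGER)")
    conn.executemany("INSERT INTO X VALUES (?, ?)", ROWS)
    rows = conn.execute("""
        WITH u AS (
            -- 1 where the section differs from the previous row (or there is none)
            SELECT id, section,
                   CASE WHEN section = LAG(section) OVER (ORDER BY id)
                        THEN 0 ELSE 1 END AS grp
            FROM X),
        v AS (
            -- the running sum of the change flags numbers the runs
            SELECT id, section,
                   SUM(grp) OVER (ORDER BY id) AS section_nr
            FROM u)
        SELECT section, section_nr, MIN(id) AS first_id
        FROM v
        GROUP BY section, section_nr
        ORDER BY section_nr
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in consecutive_sections():
        print(row)
```

On the first row LAG returns NULL, the comparison is not true, and the CASE falls through to 1, so numbering starts at 1 without any special handling.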
That's a classic.
P.S.
If id is indeed a gap-free series of integers, we can use id itself instead of rn (that is, group by id - rn_section).
select section
,row_number() over (order by min(id)) as section_nr
,min(id) as first_id
from (select id
,section
,row_number() over (order by id) as rn
,row_number() over (partition by section order by id) as rn_section
from X
)
group by section
,rn - rn_section
SECTION SECTION_NR FIRST_ID
6 1 1
7 2 3
6 3 5

How to rank groups of data?

Given the following, and tasked with ranking the raw data by the SUM(volume) within each group:
group_id volume
1 2
1 3
2 5
3 1
3 3
How can I obtain the following?
group_id volume group_volume rank
1 2 5 1
1 3 5 1
2 5 5 2
3 1 4 3
3 3 4 3
I can get group_volume easily, but am struggling on how to break the ties in rank without grouping by + ranking in a separate subquery and joining in.
SELECT *
, SUM(volume) OVER (PARTITION BY group_id) AS grouped_volume
, ... AS rank
FROM groups
Use a CTE and DENSE_RANK:
WITH CTE1 AS (SELECT group_id, volume,
                     SUM(volume) OVER (PARTITION BY group_id) AS group_volume
              FROM table1)
SELECT A.*, DENSE_RANK() OVER (ORDER BY group_volume DESC, group_id) AS rank FROM CTE1 A;
Use two levels of window functions for this:
select g.*,
dense_rank() over (order by group_volume desc, group_id) as rank
from (select g.*,
sum(volume) over (partition by group_id) as group_volume
from groups g
) g;
There is no need for a JOIN.
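Here is a small runnable sketch of the two-level approach on the question's data, using an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25 or newer assumed). The table is called grp_data here, a hypothetical name chosen to stay clear of the GROUPS keyword in newer SQLite versions, and the rank column is aliased rnk for the same reason.

```python
import sqlite3

# (group_id, volume) rows from the question.
ROWS = [(1, 2), (1, 3), (2, 5), (3, 1), (3, 3)]


def ranked_groups():
    """Return (group_id, volume, group_volume, rnk) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE grp_data (group_id INTEGER, volume INTEGER)")
    conn.executemany("INSERT INTO grp_data VALUES (?, ?)", ROWS)
    rows = conn.execute("""
        SELECT group_id, volume, group_volume,
               -- the tie on group_volume is broken by group_id
               DENSE_RANK() OVER (ORDER BY group_volume DESC, group_id) AS rnk
        FROM (SELECT group_id, volume,
                     SUM(volume) OVER (PARTITION BY group_id) AS group_volume
              FROM grp_data) AS g
        ORDER BY rnk, volume
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in ranked_groups():
        print(row)
```

Groups 1 and 2 both sum to 5; because group_id is the second sort key of the DENSE_RANK, they still receive distinct ranks 1 and 2, while rows inside one group share a rank.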

How to find the numbers that have more than two consecutive appearances?

The source table:
id num
-------------------
1 1
2 1
3 1
4 2
5 2
6 1
The output (numbers that appear at least 2 times):
num times
--------------
1 3
2 2
Based on the additional logic defined in the comments, it appears this is what you're after:
WITH YourTable AS(
SELECT V.id,
V.num
FROM (VALUES(1,1),
(2,1),
(3,1),
(4,2),
(5,2),
(6,1),
(7,1))V(id,num)), --Added extra row due to logic defined in comments
Grps AS(
SELECT YT.id,
YT.num,
ROW_NUMBER() OVER (ORDER BY id) -
ROW_NUMBER() OVER (PARTITION BY Num ORDER BY id) AS Grp
FROM YourTable YT),
Counts AS(
SELECT num,
COUNT(num) AS Times
FROM grps
GROUP BY grp,
num)
SELECT num,
MAX(times) AS times
FROM Counts
GROUP BY num;
This uses a CTE and ROW_NUMBER to define the groups, and then an additional CTE to get the COUNT per group. Finally you can then get the MAX COUNT per num.
I would address this with a gaps-and-islands technique:
select num, max(cnt)
from (
select num, count(*) cnt
from (
select
id,
num,
row_number() over(order by id) rn1,
row_number() over(partition by num order by id) rn2
from mytable
) t
group by num, rn1 - rn2
) t
group by num
The innermost query computes row numbers over the whole table and within each num group; the difference between the two row numbers identifies the group of adjacent records that each record belongs to (you can run that subquery on its own and follow how the difference evolves to understand more).
The next level counts the number of records in each group of adjacent records. The outermost query then takes the maximum count of adjacent records for each num.
Demo on DB Fiddle:
num | (No column name)
--: | ---------------:
1 | 3
2 | 2
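To follow how the row-number difference evolves, the sketch below runs the innermost query on its own against the question's six rows and then the full query, using an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25 or newer assumed). The function names, the diff column and the times alias are made up for the example.

```python
import sqlite3

# (id, num) rows from the question's source table.
ROWS = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 1)]

# Innermost query from the answer: global and per-num row numbers.
INNER_SQL = """
    SELECT id, num,
           ROW_NUMBER() OVER (ORDER BY id) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY num ORDER BY id) AS rn2
    FROM mytable
"""


def _connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, num INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", ROWS)
    return conn


def island_keys():
    """Return (id, num, rn1, rn2, diff) so the grouping key can be inspected."""
    conn = _connect()
    rows = conn.execute(
        f"SELECT id, num, rn1, rn2, rn1 - rn2 AS diff "
        f"FROM ({INNER_SQL}) AS t ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


def longest_runs():
    """Return (num, times): the longest run of adjacent rows for each num."""
    conn = _connect()
    rows = conn.execute(f"""
        SELECT num, MAX(cnt) AS times
        FROM (SELECT num, COUNT(*) AS cnt
              FROM ({INNER_SQL}) AS t
              GROUP BY num, rn1 - rn2) AS runs
        GROUP BY num
        ORDER BY num
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in island_keys():
        print(row)
    print(longest_runs())
```

The rows with id 1 to 3 share diff 0, the rows with id 4 and 5 share diff 3, and the row with id 6 gets diff 2, so the last occurrence of num 1 forms its own run of length 1 and does not inflate the count for num 1.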
Note that this counts every occurrence of each num, consecutive or not (for the sample data it returns 4 for num 1):
select num,count(num) times from Tabl
group by num

SQL rank grouping variation

I'm trying to achieve the following "rank" result given the original dataset composed by the column ID and CODE.
id code rank
1 A 1
2 A 1
3 A 1
4 B 2
5 B 2
6 C 3
7 C 3
8 C 3
9 A 4
10 A 4
Using the DENSE_RANK function over the CODE column, I get the following result (with the A code getting the same rank value even after "the break" between the rows)
id code rank
1 A 1
2 A 1
3 A 1
4 B 2
5 B 2
6 C 3
7 C 3
8 C 3
9 A 1
10 A 1
Is it possible to achieve the results as shown in the first (example) table, with the A code changing rank when there is a separation between the group formed by id: 1-2-3 and the one formed by id: 9-10 without using a cursor?
Thanks
You want to find sequences of values and give them a rank. You can do this with a difference of row numbers approach. The following assigns a different number to each grouping:
select o.*, dense_rank() over (order by grp, code)
from (select o.*,
(row_number() over (order by id) -
row_number() over (partition by code order by id)
) as grp
from original o
) o;
If you want the assignment in the same order as the original data, then you can order by the id, but that requires an additional window function:
select o.*, dense_rank() over (order by minid) as therank
from (select o.*, min(id) over (partition by grp, code) as minid
from (select o.*,
(row_number() over (order by id) -
row_number() over (partition by code order by id)
) as grp
from original o
) o
) o;
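A minimal sketch of the ordered variant, run on the question's ten rows in an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25 or newer assumed). The subquery aliases numbered and islands and the function name are made up for the example; the columns are listed explicitly instead of using o.*.

```python
import sqlite3

# (id, code) rows from the question.
ROWS = [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "B"),
        (6, "C"), (7, "C"), (8, "C"), (9, "A"), (10, "A")]


def rank_by_run():
    """Return (id, code, therank), numbering runs of equal codes in id order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE original (id INTEGER, code TEXT)")
    conn.executemany("INSERT INTO original VALUES (?, ?)", ROWS)
    rows = conn.execute("""
        SELECT id, code,
               DENSE_RANK() OVER (ORDER BY minid) AS therank
        FROM (SELECT id, code,
                     MIN(id) OVER (PARTITION BY grp, code) AS minid
              FROM (SELECT id, code,
                           ROW_NUMBER() OVER (ORDER BY id)
                           - ROW_NUMBER() OVER (PARTITION BY code ORDER BY id) AS grp
                    FROM original) AS numbered) AS islands
        ORDER BY id
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in rank_by_run():
        print(row)
```

In this data the C rows (id 6 to 8) and the second run of A rows (id 9 and 10) both end up with grp = 5, which is why code has to be part of the partition alongside grp.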
Take a running SUM of a flag that is 1 whenever the current row's code differs from the previous row's. This works from SQL Server 2012 onward.
WITH CTE AS (
SELECT id, code,
CASE Code WHEN LAG(CODE) OVER (ORDER BY id) THEN 0 ELSE 1 END AS Diff
FROM Table1)
SELECT id, code, SUM(Diff) OVER (ORDER BY id) FROM CTE
Please also see similar question at How to make row numbering with ordering, partitioning and grouping