SQL RANK but higher if equal?

SQL RANK but higher if equal? - sql

For data:
20
50
50
60
70
If I use RANK I get
1
2
2
4
5
if I use DENSE_RANK I get
1
2
2
3
4
I need for my application this:
1
3
3
4
5

I think you want:
rank() over(order by val) + count(*) over(partition by val) - 1
Actually this would be simpler phrased with just a window count:
count(*) over(order by val)
Demo on DB Fiddle:
select val, count(*) over(order by val) rn
from (values (20), (50), (50), (60), (70)) as t(val)
order by val
val | rn
--: | -:
20 | 1
50 | 3
50 | 3
60 | 4
70 | 5

Related

SQL Sort population by value and place in groups by value

I have to create a report. I’m having trouble figuring how to approach it. On top of that, I don’t have the proper vocabulary to express it, and thusly search for the solution. Please bear with me.
I have a population of accounts. The accounts must be ordered by value. The accounts at bottom 5% of the overall value are placed in a group (Group #5). The remaining 95% of the population are divided into four equal groups (Groups #1-4) by value (not by number of accounts).
The values of the accounts change over time so the results would change over time. I'm hoping to produce an output something like this...
ACC# |VALUE|GROUP|
------+-----+-----+
2615A | 24 | 1
0793A | 24 | 2
0652A | 12 | 3
6758A | 12 | 3
7764A | 6 | 4
8718A | 6 | 4
0155A | 6 | 4
6923A | 5 | 4
8079A | 3 | 5
2265A | 1 | 5
7421A | 1 | 5
I have the option of running it in SQL Server or Oracle(11g). Whichever gets me over the finish line.
Thanks in advance.

I would use row_number() and count() window functions:
select t.*,
(case when seqnum <= (cnt * 0.95 * 0.25) then 1
when seqnum <= (cnt * 0.95 * 0.50) then 2
when seqnum <= (cnt * 0.95 * 0.75) then 3
when seqnum <= (cnt * 0.95 * 1.00) then 4
else 5
end) as grp
from (select t.*,
row_number() over (order by value desc, acc) as seqnum,
count(*) over () as cnt
from t
) t;
Note: rows with the same value can be in different groups -- as in your example data. If you don't want this to be the case, then use rank() instead of row_number().
EDIT:
If you want equal value, just use cumulative sums and totals:
select t.*,
(case when running_value <= (total_value * 0.95 * 0.25) then 1
when running_value <= (total_value * 0.95 * 0.50) then 2
when running_value <= (total_value * 0.95 * 0.75) then 3
when running_value <= (total_value * 0.95 * 1.00) then 4
else 5
end) as grp
from (select t.*,
sum(value) over (order by value desc, acc) as running_value,
sum(value) over () as total_value
from t
) t;

Using a few SUM OVER's seems to get those results somehow.
CREATE TABLE test
(
ID INT IDENTITY(1,1) PRIMARY KEY,
ACC# VARCHAR(5),
[VALUE] INT
);
INSERT INTO test
(ACC#, [VALUE]) VALUES
('2615A', 24),
('0793A', 24),
('0652A', 12),
('6758A', 12),
('7764A', 6),
('8718A', 6),
('0155A', 6),
('6923A', 5),
('8079A', 3),
('2265A', 1),
('7421A', 1);
>
WITH CTE_DATA AS
(
SELECT *,
CASE
WHEN (1.0*SUM([VALUE]) OVER (ORDER BY [VALUE], ID DESC)
/ SUM([VALUE]) OVER ()) <= 0.05
THEN 5
END AS grp
FROM test
)
SELECT ID, ACC#, [VALUE],
COALESCE(grp
, CEILING(FLOOR(
100.0*SUM([VALUE]) OVER (PARTITION BY grp ORDER BY [VALUE] DESC, ID)
/ SUM([VALUE]) OVER (PARTITION BY grp)
)/25)
) AS [GROUP]
FROM CTE_DATA
ORDER BY ID;
ID | ACC# | VALUE | GROUP
-: | :---- | ----: | :----
1 | 2615A | 24 | 1
2 | 0793A | 24 | 2
3 | 0652A | 12 | 3
4 | 6758A | 12 | 3
5 | 7764A | 6 | 4
6 | 8718A | 6 | 4
7 | 0155A | 6 | 4
8 | 6923A | 5 | 4
9 | 8079A | 3 | 5
10 | 2265A | 1 | 5
11 | 7421A | 1 | 5
db<>fiddle here

get the nth-lowest value in a `group by` clause

Here's a tough one: I have data coming back in a temporary table foo in this form:
id n v
-- - -
1 3 1
1 3 10
1 3 100
1 3 201
1 3 300
2 1 13
2 1 21
2 1 300
4 2 1
4 2 7
4 2 19
4 2 21
4 2 300
8 1 11
Grouping by id, I need to get the row with the nth-lowest value for v based on the value in n. For example, for the group with an ID of 1, I need to get the row which has v equal to 100, since 100 is the third-lowest value for v.
Here's what the final results need to look like:
id n v
-- - -
1 3 100
2 1 13
4 2 7
8 1 11
Some notes about the data:
the number of rows for each ID may vary
n will always be the same for every row with a given ID
n for a given ID will never be greater than the number of rows with that ID
the data will already be sorted by id, then v
Bonus points if you can do it in generic SQL instead of oracle-specific stuff, but that's not a requirement (I suspect that rownum may factor prominently in any solutions). It has in my attempts, but I wind up confusing myself before I get a working solution.

I would use row_number function make row number the compare with n column value in CTE, do another CTE to make row number order by v desc.
get rn = 1 which is mean max value in the n number group.
CREATE TABLE foo(
id int,
n int,
v int
);
insert into foo values (1,3,1);
insert into foo values (1,3,10);
insert into foo values (1,3,100);
insert into foo values (1,3,201);
insert into foo values (1,3,300);
insert into foo values (2,1,13);
insert into foo values (2,1,21);
insert into foo values (2,1,300);
insert into foo values (4,2,1);
insert into foo values (4,2,7);
insert into foo values (4,2,19);
insert into foo values (4,2,21);
insert into foo values (4,2,300);
insert into foo values (8,1,11);
Query 1:
with cte as(
select id,n,v
from
(
select t.*, row_number() over(partition by id ,n order by n) as rn
from foo t
) t1
where rn <= n
), maxcte as (
select id,n,v, row_number() over(partition by id ,n order by v desc) rn
from cte
)
select id,n,v
from maxcte
where rn = 1
Results:
| ID | N | V |
|----|---|-----|
| 1 | 3 | 100 |
| 2 | 1 | 13 |
| 4 | 2 | 7 |
| 8 | 1 | 11 |

use window function
select * from
(
select t.*, row_number() over(partition by id ,n order by v) as rn
from foo t
) t1
where t1.rn=t1.n
as ops sample output just need 3rd highest value so i put where condition t1.rn=3 though accodring to description it would be t1.rn=t1.n
https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=65abf8d4101d2d1802c1a05ed82c9064

If your database is version 12.1 or higher then there is a much simpler solution:
SELECT DISTINCT ID, n, NTH_VALUE(v,n) OVER (PARTITION BY ID) AS v
FROM foo
ORDER BY ID;
| ID | N | V |
|----|---|-----|
| 1 | 3 | 100 |
| 2 | 1 | 13 |
| 4 | 2 | 7 |
| 8 | 1 | 11 |
Depending on your real data you may have to add an ORDER BY n clause and/or windowing_clause as RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, see NTH_VALUE

Count groups of NULL values - partition or window?

n | g
---------
1 | 1
2 | NULL
3 | 1
4 | 1
5 | 1
6 | 1
7 | NULL
8 | NULL
9 | NULL
10 | 1
11 | 1
12 | 1
13 | 1
14 | 1
15 | 1
16 | 1
17 | NULL
18 | 1
19 | 1
20 | 1
21 | NULL
22 | 1
23 | 1
24 | 1
25 | 1
26 | NULL
27 | NULL
28 | 1
29 | 1
30 | NULL
31 | 1
From the above column g I should get this result:
x|y
---
1|4
2|1
3|1
where
x stands for the count of contiguous NULLs and
y stands for the times a single group of NULLs occurs.
I.e., there is ...
4 groups of only 1 NULL,
1 group of 2 NULLs and
1 group of 3 NULLs

Compute a running count of not-null values with a window function to form groups, then 2 two nested counts ...
SELECT x, count(*) AS y
FROM (
SELECT grp, count(*) FILTER (WHERE g IS NULL) AS x
FROM (
SELECT g, count(g) OVER (ORDER BY n) AS grp
FROM tbl
) sub1
WHERE g IS NULL
GROUP BY grp
) sub2
GROUP BY 1
ORDER BY 1;
count() only counts not null values.
This includes the preceding row with a not null g in the following group (grp) of NULL values - which has to be removed from the count.
I replaced the HAVING clause I had for that in my initial query with WHERE g IS NULL, like #klin uses in his answer), that's simpler.
Related:
Find “n” consecutive free numbers from table
Select longest continuous sequence
If n is a gapless sequence of integer numbers, you can simplify further:
SELECT x, count(*) AS y
FROM (
SELECT grp, count(*) AS x
FROM (
SELECT n - row_number() OVER (ORDER BY n) AS grp
FROM tbl
WHERE g IS NULL
) sub1
GROUP BY 1
) sub2
GROUP BY 1
ORDER BY 1;
Eliminate not null values immediately and deduct the row number from n, thereby arriving at (meaningless) group numbers directly ...
While the only possible value in g is 1, sum() is a smart trick (like #klin provided). But that should be a boolean column then, wouldn't make sense as numeric type. So I assume that's just a simplification of the actual problem in the question.

select x, count(x) y
from (
select s, count(s) x
from (
select *, sum(g) over (order by i) as s
from example
) s
where g isnull
group by 1
) s
group by 1
order by 1;
Test it here.

If the difference between two sequences is bigger than 30, deduct bigger sequence

I'm having a hard time trying to make a query that gets a lot of numbers, a sequence of numbers, and if the difference between two of them is bigger than 30, then the sequence resets from this number. So, I have the following table, which has another column other than the number one, which should be maintained intact:
+----+--------+--------+
| Id | Number | Status |
+----+--------+--------+
| 1 | 1 | OK |
| 2 | 1 | Failed |
| 3 | 2 | Failed |
| 4 | 3 | OK |
| 5 | 4 | OK |
| 6 | 36 | Failed |
| 7 | 39 | OK |
| 8 | 47 | OK |
| 9 | 80 | Failed |
| 10 | 110 | Failed |
| 11 | 111 | OK |
| 12 | 150 | Failed |
| 13 | 165 | OK |
+----+--------+--------+
It should turn it into this one:
+----+--------+--------+
| Id | Number | Status |
+----+--------+--------+
| 1 | 1 | OK |
| 2 | 1 | Failed |
| 3 | 2 | Failed |
| 4 | 3 | OK |
| 5 | 4 | OK |
| 6 | 1 | Failed |
| 7 | 4 | OK |
| 8 | 12 | OK |
| 9 | 1 | Failed |
| 10 | 1 | Failed |
| 11 | 2 | OK |
| 12 | 1 | Failed |
| 13 | 16 | OK |
+----+--------+--------+
Thanks for your attention, I will be available to clear any doubt regarding my problem! :)
EDIT: Sample of this table here: http://sqlfiddle.com/#!6/ded5af

With this test case:
declare #data table (id int identity, Number int, Status varchar(20));
insert #data(number, status) values
( 1,'OK')
,( 1,'Failed')
,( 2,'Failed')
,( 3,'OK')
,( 4,'OK')
,( 4,'OK') -- to be deleted, ensures IDs are not sequential
,(36,'Failed') -- to be deleted, ensures IDs are not sequential
,(36,'Failed')
,(39,'OK')
,(47,'OK')
,(80,'Failed')
,(110,'Failed')
,(111,'OK')
,(150,'Failed')
,(165,'OK')
;
delete #data where id between 6 and 7;
This SQL:
with renumbered as (
select rn = row_number() over (order by id), data.*
from #data data
),
paired as (
select
this.*,
startNewGroup = case when this.number - prev.number >= 30
or prev.id is null then 1 else 0 end
from renumbered this
left join renumbered prev on prev.rn = this.rn -1
),
groups as (
select Id,Number, GroupNo = Number from paired where startNewGroup = 1
)
select
Id
,Number = 1 + Number - (
select top 1 GroupNo
from groups where groups.id <= paired.id
order by GroupNo desc)
,status
from paired
;
yields as desired:
Id Number status
----------- ----------- --------------------
1 1 OK
2 1 Failed
3 2 Failed
4 3 OK
5 4 OK
8 1 Failed
9 4 OK
10 12 OK
11 1 Failed
12 1 Failed
13 2 OK
14 1 Failed
15 16 OK
Update: using the new LAG() function allows somewhat simpler SQL without a self-join early on:
with renumbered as (
select
data.*
,gap = number - lag(number, 1) over (order by number)
from #data data
),
paired as (
select
*,
startNewGroup = case when gap >= 30 or gap is null then 1 else 0 end
from renumbered
),
groups as (
select Id,Number, GroupNo = Number from paired where startNewGroup = 1
)
select
Id
,Number = 1 + Number - ( select top 1 GroupNo
from groups
where groups.id <= paired.id
order by GroupNo desc
)
,status
from paired
;

I don't deserve answer but I think this is even shorter
with gapped as
( select id, number, gap = number - lag(number, 1) over (order by id)
from #data data
),
select Id, status
ReNumber = Number + 1 - isnull( (select top 1 gapped.Number
from gapped
where gapped.id <= data.id
and gap >= 30
order by gapped.id desc), 1)
from #data data;

This is simply Pieter Geerkens's answer slightly simplified. I removed some intermediate results and columns:
with renumbered as (
select data.*, gap = number - lag(number, 1) over (order by number)
from #data data
),
paired as (
select *
from renumbered
where gap >= 30 or gap is null
)
select Id, Number = 1 + Number - (select top 1 Number
from paired
where paired.id <= renumbered.id
order by Number desc)
, status
from renumbered;
It should have been a comment, but it's too long for that and wouldn't be understandable.

You might need to make another cte before this and use row_number instead of ID to join the recursive cte, if your ID's are not in sequential order
WITH cte AS
( SELECT
Id, [Number], [Status],
0 AS Diff,
[Number] AS [NewNumber]
FROM
Table1
WHERE Id = 1
UNION ALL
SELECT
t1.Id, t1.[Number], t1.[Status],
CASE WHEN t1.[Number] - cte.[Number] >= 30 THEN t1.Number - 1 ELSE Diff END,
CASE WHEN t1.[Number] - cte.[Number] >= 30 THEN 1 ELSE t1.[Number] - Diff END
FROM Table1 t1
JOIN cte ON cte.Id + 1 = t1.Id
)
SELECT Id, [NewNumber], [Status]
FROM cte
SQL Fiddle
Here is another SQL Fiddle with an example of what you would do if the ID is not sequential..
SQL Fiddle 2
In case sql fiddle stops working
--Order table to make sure there is a sequence to follow
WITH OrderedSequence AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY Id) RnId,
Id,
[Number],
[Status]
FROM
Sequence
),
RecursiveCte AS
( SELECT
Id, [Number], [Status],
0 AS Diff,
[Number] AS [NewNumber],
RnId
FROM
OrderedSequence
WHERE Id = 1
UNION ALL
SELECT
t1.Id, t1.[Number], t1.[Status],
CASE WHEN t1.[Number] - cte.[Number] >= 30 THEN t1.Number - 1 ELSE Diff END,
CASE WHEN t1.[Number] - cte.[Number] >= 30 THEN 1 ELSE t1.[Number] - Diff END,
t1.RnId
FROM OrderedSequence t1
JOIN RecursiveCte cte ON cte.RnId + 1 = t1.RnId
)
SELECT Id, [NewNumber], [Status]
FROM RecursiveCte

I tried to optimize the queries here, since it took 1h20m to process my data. I had it down to 30s after some further research.
WITH AuxTable AS
( SELECT
id,
number,
status,
relevantId = CASE WHEN
number = 1 OR
((number - LAG(number, 1) OVER (ORDER BY id)) > 29)
THEN id
ELSE NULL
END,
deduct = CASE WHEN
((number - LAG(number, 1) OVER (ORDER BY id)) > 29)
THEN number - 1
ELSE 0
END
FROM #data data
)
,AuxTable2 AS
(
SELECT
id,
number,
status,
AT.deduct,
MAX(AT.relevantId) OVER (ORDER BY AT.id ROWS UNBOUNDED PRECEDING ) AS lastRelevantId
FROM AuxTable AT
)
SELECT
id,
number,
status,
number - MAX(deduct) OVER(PARTITION BY lastRelevantId ORDER BY id ROWS UNBOUNDED PRECEDING ) AS ReNumber,
FROM AuxTable2
I think this runs faster, but it's not shorter.

Stripe the order of a PostgreSQL result set

Let's say I have the following table:
create temp table test (id serial, number integer);
insert into test (number)
values (5), (4), (3), (2), (1), (0);
If I sort by number descending, I get:
select * from test order by number desc;
id | number
---+--------
1 | 5
2 | 4
3 | 3
4 | 2
5 | 1
6 | 0
If I sort by number ascending, I get:
select * from test order by number asc;
6 | 0
5 | 1
4 | 2
3 | 3
2 | 4
1 | 5
How do I stripe the order so that it alternates between ascending and descending per row?
for example:
6 | 0 or 1 | 5
1 | 5 6 | 0
5 | 1 2 | 4
2 | 4 5 | 1
4 | 2 3 | 3
3 | 3 4 | 2

Update
WITH x AS (
SELECT *
, row_number() OVER (ORDER BY number) rn_up
, row_number() OVER (ORDER BY number DESC) rn_down
FROM test
)
SELECT id, number
FROM x
ORDER BY LEAST(rn_up, rn_down), number;
Or:
...
ORDER BY LEAST(rn_up, rn_down), number DESC;
to start with the bigger number.
I had two CTE at first, but one is enough - simpler and faster.

Or like this (similar to the already given answer but slightly shorter :)
WITH x AS (
SELECT *, row_number() OVER (ORDER BY number) rn, count(*) over () as c
FROM test
)
SELECT id, number
FROM x
ORDER BY ABS((c + 1.5) / 2 - rn) DESC;
If the reverse order is needed then it should be
ORDER BY ABS((c + 0.5) / 2 - rn) DESC;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL RANK but higher if equal? - sql

For data: 20 50 50 60 70 If I use RANK I get 1 2 2 4 5 if I use DENSE_RANK I get 1 2 2 3 4 I need for my application this: 1 3 3 4 5

Related

SQL Sort population by value and place in groups by value

get the nth-lowest value in a `group by` clause

Count groups of NULL values - partition or window?

If the difference between two sequences is bigger than 30, deduct bigger sequence

Stripe the order of a PostgreSQL result set

Categories

Resources