get the nth-lowest value in a `group by` clause - sql

Here's a tough one: I have data coming back in a temporary table foo in this form:
id n v
-- - -
1 3 1
1 3 10
1 3 100
1 3 201
1 3 300
2 1 13
2 1 21
2 1 300
4 2 1
4 2 7
4 2 19
4 2 21
4 2 300
8 1 11
Grouping by id, I need to get the row with the nth-lowest value for v based on the value in n. For example, for the group with an ID of 1, I need to get the row which has v equal to 100, since 100 is the third-lowest value for v.
Here's what the final results need to look like:
id n v
-- - -
1 3 100
2 1 13
4 2 7
8 1 11
Some notes about the data:
the number of rows for each ID may vary
n will always be the same for every row with a given ID
n for a given ID will never be greater than the number of rows with that ID
the data will already be sorted by id, then v
Bonus points if you can do it in generic SQL instead of oracle-specific stuff, but that's not a requirement (I suspect that rownum may factor prominently in any solutions). It has in my attempts, but I wind up confusing myself before I get a working solution.

I would use row_number function make row number the compare with n column value in CTE, do another CTE to make row number order by v desc.
get rn = 1 which is mean max value in the n number group.
CREATE TABLE foo(
id int,
n int,
v int
);
insert into foo values (1,3,1);
insert into foo values (1,3,10);
insert into foo values (1,3,100);
insert into foo values (1,3,201);
insert into foo values (1,3,300);
insert into foo values (2,1,13);
insert into foo values (2,1,21);
insert into foo values (2,1,300);
insert into foo values (4,2,1);
insert into foo values (4,2,7);
insert into foo values (4,2,19);
insert into foo values (4,2,21);
insert into foo values (4,2,300);
insert into foo values (8,1,11);
Query 1:
with cte as(
select id,n,v
from
(
select t.*, row_number() over(partition by id ,n order by n) as rn
from foo t
) t1
where rn <= n
), maxcte as (
select id,n,v, row_number() over(partition by id ,n order by v desc) rn
from cte
)
select id,n,v
from maxcte
where rn = 1
Results:
| ID | N | V |
|----|---|-----|
| 1 | 3 | 100 |
| 2 | 1 | 13 |
| 4 | 2 | 7 |
| 8 | 1 | 11 |

use window function
select * from
(
select t.*, row_number() over(partition by id ,n order by v) as rn
from foo t
) t1
where t1.rn=t1.n
as ops sample output just need 3rd highest value so i put where condition t1.rn=3 though accodring to description it would be t1.rn=t1.n
https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=65abf8d4101d2d1802c1a05ed82c9064

If your database is version 12.1 or higher then there is a much simpler solution:
SELECT DISTINCT ID, n, NTH_VALUE(v,n) OVER (PARTITION BY ID) AS v
FROM foo
ORDER BY ID;
| ID | N | V |
|----|---|-----|
| 1 | 3 | 100 |
| 2 | 1 | 13 |
| 4 | 2 | 7 |
| 8 | 1 | 11 |
Depending on your real data you may have to add an ORDER BY n clause and/or windowing_clause as RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, see NTH_VALUE

Related

Juggling the values of a column in oracle

In a table tab I have a column with the name of col1 and it has 5 rows with values 1 to 5.
col1
1
2
3
4
5
Now I want to write a select query which will juggle the values in col1,distribute it and put those values in new column.
Below output will help you understand my requirement.
col1 New_col
1 3
2 5
3 4
4 1
5 2
Note: If 1 is changed to 3, then no other value in col1 after juggling should result in 3. i have to do it for 500 rows, i am taking a small example for better understanding.
Please let me know if you require further clarification.
This is a step by step approach:
Try it at SQL Fiddle
Oracle 11g R2 Schema Setup:
create table t ( i int );
insert into t values (1);
insert into t values (2);
insert into t values (3);
insert into t values (4);
insert into t values (5);
Step by step query:
with
/*add a random column to shuffle*/
a as
( select i, dbms_random.value as o
from t),
/*get last element to pair it with the first*/
b as
( select i,
o,
last_Value(i) over (ORDER BY o asc
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS i2
from a)
/*pair each element with the next one, take the last one as default*/
select i, LAG(i, 1, i2 ) OVER (ORDER BY o ) AS i3
from b
Results:
| I | I3 |
|---|----|
| 2 | 5 |
| 1 | 2 |
| 3 | 1 |
| 4 | 3 |
| 5 | 4 |
What about this?
SELECT row_number() over (order by 1) col, col1 new_col
FROM tab
ORDER BY DBMS_RANDOM.VALUE
demo

Count groups of NULL values - partition or window?

n | g
---------
1 | 1
2 | NULL
3 | 1
4 | 1
5 | 1
6 | 1
7 | NULL
8 | NULL
9 | NULL
10 | 1
11 | 1
12 | 1
13 | 1
14 | 1
15 | 1
16 | 1
17 | NULL
18 | 1
19 | 1
20 | 1
21 | NULL
22 | 1
23 | 1
24 | 1
25 | 1
26 | NULL
27 | NULL
28 | 1
29 | 1
30 | NULL
31 | 1
From the above column g I should get this result:
x|y
---
1|4
2|1
3|1
where
x stands for the count of contiguous NULLs and
y stands for the times a single group of NULLs occurs.
I.e., there is ...
4 groups of only 1 NULL,
1 group of 2 NULLs and
1 group of 3 NULLs
Compute a running count of not-null values with a window function to form groups, then 2 two nested counts ...
SELECT x, count(*) AS y
FROM (
SELECT grp, count(*) FILTER (WHERE g IS NULL) AS x
FROM (
SELECT g, count(g) OVER (ORDER BY n) AS grp
FROM tbl
) sub1
WHERE g IS NULL
GROUP BY grp
) sub2
GROUP BY 1
ORDER BY 1;
count() only counts not null values.
This includes the preceding row with a not null g in the following group (grp) of NULL values - which has to be removed from the count.
I replaced the HAVING clause I had for that in my initial query with WHERE g IS NULL, like #klin uses in his answer), that's simpler.
Related:
Find ā€œnā€ consecutive free numbers from table
Select longest continuous sequence
If n is a gapless sequence of integer numbers, you can simplify further:
SELECT x, count(*) AS y
FROM (
SELECT grp, count(*) AS x
FROM (
SELECT n - row_number() OVER (ORDER BY n) AS grp
FROM tbl
WHERE g IS NULL
) sub1
GROUP BY 1
) sub2
GROUP BY 1
ORDER BY 1;
Eliminate not null values immediately and deduct the row number from n, thereby arriving at (meaningless) group numbers directly ...
While the only possible value in g is 1, sum() is a smart trick (like #klin provided). But that should be a boolean column then, wouldn't make sense as numeric type. So I assume that's just a simplification of the actual problem in the question.
select x, count(x) y
from (
select s, count(s) x
from (
select *, sum(g) over (order by i) as s
from example
) s
where g isnull
group by 1
) s
group by 1
order by 1;
Test it here.

Count rows in each 'partition' of a table

Disclaimer: I don't mean partition in the window function sense, nor table partitioning; I mean it in the more general sense, i.e. to divide up.
Here's a table:
id | y
----+------------
1 | 1
2 | 1
3 | 1
4 | 2
5 | 2
6 | null
7 | 2
8 | 2
9 | null
10 | null
I'd like to partition by checking equality on y, such that I end up with counts of the number of times each value of y appears contiguously, when sorted on id (i.e. in the order shown).
Here's the output I'm looking for:
y | count
-----+----------
1 | 3
2 | 2
null | 1
2 | 2
null | 2
So reading down the rows in that output we have:
The first partition of three 1's
The first partition of two 2's
The first partition of a null
The second partition of two 2's
The second partition of two nulls
Try:
SELECT y, count(*)
FROM (
SELECT y,
sum( xyz ) OVER (
ORDER BY id
rows between unbounded preceding
and current row
) qwe
FROM (
SELECT *,
case
when y is null and
lag(y) OVER ( ORDER BY id ) is null
then 0
when y = lag(y) OVER ( ORDER BY id )
then 0
else 1 end xyz
FROM table1
) alias
) alias
GROUP BY qwe, y
ORDER BY qwe;
demo: http://sqlfiddle.com/#!15/b1794/12

Stripe the order of a PostgreSQL result set

Let's say I have the following table:
create temp table test (id serial, number integer);
insert into test (number)
values (5), (4), (3), (2), (1), (0);
If I sort by number descending, I get:
select * from test order by number desc;
id | number
---+--------
1 | 5
2 | 4
3 | 3
4 | 2
5 | 1
6 | 0
If I sort by number ascending, I get:
select * from test order by number asc;
6 | 0
5 | 1
4 | 2
3 | 3
2 | 4
1 | 5
How do I stripe the order so that it alternates between ascending and descending per row?
for example:
6 | 0 or 1 | 5
1 | 5 6 | 0
5 | 1 2 | 4
2 | 4 5 | 1
4 | 2 3 | 3
3 | 3 4 | 2
Update
WITH x AS (
SELECT *
, row_number() OVER (ORDER BY number) rn_up
, row_number() OVER (ORDER BY number DESC) rn_down
FROM test
)
SELECT id, number
FROM x
ORDER BY LEAST(rn_up, rn_down), number;
Or:
...
ORDER BY LEAST(rn_up, rn_down), number DESC;
to start with the bigger number.
I had two CTE at first, but one is enough - simpler and faster.
Or like this (similar to the already given answer but slightly shorter :)
WITH x AS (
SELECT *, row_number() OVER (ORDER BY number) rn, count(*) over () as c
FROM test
)
SELECT id, number
FROM x
ORDER BY ABS((c + 1.5) / 2 - rn) DESC;
If the reverse order is needed then it should be
ORDER BY ABS((c + 0.5) / 2 - rn) DESC;

SQL - min() gets the lowest value, max() the highest, what if I want the 2nd (or 5th or nth) lowest value?

The problem I'm trying to solve is that I have a table like this:
a and b refer to point on a different table. distance is the distance between the points.
| id | a_id | b_id | distance | delete |
| 1 | 1 | 1 | 1 | 0 |
| 2 | 1 | 2 | 0.2345 | 0 |
| 3 | 1 | 3 | 100 | 0 |
| 4 | 2 | 1 | 1343.2 | 0 |
| 5 | 2 | 2 | 0.45 | 0 |
| 6 | 2 | 3 | 110 | 0 |
....
The important column I'm looking is a_id. If I wanted to keep the closet b for each a, I could do something like this:
update mytable set delete = 1 from (select a_id, min(distance) as dist from table group by a_id) as x where a_gid = a_gid and distance > dist;
delete from mytable where delete = 1;
Which would give me a result table like this:
| id | a_id | b_id | distance | delete |
| 1 | 1 | 1 | 1 | 0 |
| 5 | 2 | 2 | 0.45 | 0 |
....
i.e. I need one row for each value of a_id, and that row should have the lowest value of distance for each a_id.
However I want to keep the 10 closest points for each a_gid. I could do this with a plpgsql function but I'm curious if there is a more SQL-y way.
min() and max() return the smallest and largest, if there was an aggregate function like nth(), which'd return the nth largest/smallest value then I could do this in similar manner to the above.
I'm using PostgeSQL.
Try this:
SELECT *
FROM (
SELECT a_id, (
SELECT b_id
FROM mytable mib
WHERE mib.a_id = ma.a_id
ORDER BY
dist DESC
LIMIT 1 OFFSET s
) AS b_id
FROM (
SELECT DISTINCT a_id
FROM mytable mia
) ma, generate_series (1, 10) s
) ab
WHERE b_id IS NOT NULL
Checked on PostgreSQL 8.3
I love postgres, so it took it as a challenge the second I saw this question.
So, for the table:
Table "pg_temp_29.foo"
Column | Type | Modifiers
--------+---------+-----------
value | integer |
With the values:
SELECT value FROM foo ORDER BY value;
value
-------
0
1
2
3
4
5
6
7
8
9
14
20
32
(13 rows)
You can do a:
SELECT value FROM foo ORDER BY value DESC LIMIT 1 OFFSET X
Where X = 0 for the highest value, 1 for the second highest, 2... And so forth.
This can be further embedded in a subquery to retrieve the value needed. So, to use the dataset provided in the original question we can get the a_ids with the top ten lowest distances by doing:
SELECT a_id, distance FROM mytable
WHERE id IN
(SELECT id FROM mytable WHERE t1.a_id = t2.a_id
ORDER BY distance LIMIT 10);
ORDER BY a_id, distance;
a_id | distance
------+----------
1 | 0.2345
1 | 1
1 | 100
2 | 0.45
2 | 110
2 | 1342.2
Does PostgreSQL have the analytic function rank()? If so try:
select a_id, b_id, distance
from
( select a_id, b_id, distance, rank() over (partition by a_id order by distance) rnk
from mytable
) where rnk <= 10;
This SQL should find you the Nth lowest salary should work in SQL Server, MySQL, DB2, Oracle, Teradata, and almost any other RDBMS: (note: low performance because of subquery)
SELECT * /*This is the outer query part */
FROM mytable tbl1
WHERE (N-1) = ( /* Subquery starts here */
SELECT COUNT(DISTINCT(tbl2.distance))
FROM mytable tbl2
WHERE tbl2.distance < tbl1.distance)
The most important thing to understand in the query above is that the subquery is evaluated each and every time a row is processed by the outer query. In other words, the inner query can not be processed independently of the outer query since the inner query uses the tbl1 value as well.
In order to find the Nth lowest value, we just find the value that has exactly N-1 values lower than itself.