Closest distance of a column - sql

I need to find the two closest values for each row, based on all the values in the column.
I tried a cross join with the lead function to find the distance, but I am not sure how to write it. Please suggest an approach.
select a.id,lead(a.value,b.value) as distance from cluster a , cluster b
Input table:
ID Values
1 12.1
2 11
3 14
4 10
5 9
6 15
7 16
8 8
Expected output:
ID Values Closest_Value
1 12.1 11,10
2 11 9,10
3 14 15,16
4 10 9,11
5 9 8,10
6 15 14,16
7 16 14,15
8 8 9,10

One method uses a cross join and aggregation:
select id, value,
       listagg(other_value, ',') within group (order by diff) as near_values
from (select c.id, c.value, c2.value as other_value,
             abs(c2.value - c.value) as diff,
             row_number() over (partition by c.id order by abs(c2.value - c.value)) as seqnum
      from cluster c join
           cluster c2
           on c.id <> c2.id
     ) c
where seqnum <= 2
group by id, value;
The above is not particularly efficient for larger amounts of data. An alternative is to use lead() and lag() to get the values, unpivot, and aggregate:
with vals as (
      select c.id, c.value,
             (case when n.n = 1 then prev_value_2
                   when n.n = 2 then prev_value
                   when n.n = 3 then next_value
                   when n.n = 4 then next_value_2
              end) as other_value
      from (select c.*,
                   lag(value, 2) over (order by value) as prev_value_2,
                   lag(value) over (order by value) as prev_value,
                   lead(value) over (order by value) as next_value,
                   lead(value, 2) over (order by value) as next_value_2
            from clusters c
           ) c cross join
           (select rownum as n
            from clusters
            where rownum <= 4
           ) n -- just a list of 4 numbers
     )
select v.id, v.value,
       listagg(other_value, ',') within group (order by diff) as near_values
from (select v.*,
             abs(other_value - value) as diff,
             row_number() over (partition by id order by abs(other_value - value)) as seqnum
      from vals v
     ) v
where seqnum <= 2
group by id, value;
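To check the lead()/lag() idea locally, here is a minimal sketch that runs it against the question's sample data. It assumes SQLite (3.25 or newer, through Python's sqlite3 module) as a stand-in database. The table name `clusters`, the REAL column type, and the CTE names are assumptions. The unpivot is done with UNION ALL instead of the cross join on a numbers table, and the two values are collected in Python rather than with listagg(), which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clusters (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO clusters VALUES (?, ?)",
    [(1, 12.1), (2, 11), (3, 14), (4, 10), (5, 9), (6, 15), (7, 16), (8, 8)],
)

query = """
WITH neighbours AS (
    -- the two values on either side of each row, in value order
    SELECT id, value,
           LAG(value, 2)  OVER (ORDER BY value) AS p2,
           LAG(value)     OVER (ORDER BY value) AS p1,
           LEAD(value)    OVER (ORDER BY value) AS n1,
           LEAD(value, 2) OVER (ORDER BY value) AS n2
    FROM clusters
),
vals AS (
    -- unpivot the four candidate columns into rows
    SELECT id, value, p2 AS other_value FROM neighbours
    UNION ALL SELECT id, value, p1 FROM neighbours
    UNION ALL SELECT id, value, n1 FROM neighbours
    UNION ALL SELECT id, value, n2 FROM neighbours
),
ranked AS (
    -- rank candidates by distance; ties go to the smaller value
    SELECT id, other_value,
           ROW_NUMBER() OVER (PARTITION BY id
                              ORDER BY ABS(other_value - value), other_value) AS seqnum
    FROM vals
    WHERE other_value IS NOT NULL
)
SELECT id, other_value
FROM ranked
WHERE seqnum <= 2
ORDER BY id, seqnum
"""

nearest = {}
for row_id, other_value in conn.execute(query):
    nearest.setdefault(row_id, []).append(other_value)

for row_id in sorted(nearest):
    print(row_id, nearest[row_id])
```

For ID 1 (12.1) this returns 11 and 14, because 14 is 1.9 away while 10 is 2.1 away.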

Related

How to remove all consecutive numbers in select statement?

If I had a SQL Server query that returns numbers in order like this
1
2
3
5
6
7
9
10
11
how can I remove numbers so that no two adjacent rows in the result differ by exactly 1? The above should be returned as
3
5
7
9
Is this possible to do?
We can use LEAD and LAG here:
WITH cte AS (
SELECT id, LAG(id) OVER (ORDER BY id) lag_id, LEAD(id) OVER (ORDER BY id) lead_id
FROM yourTable
)
SELECT id
FROM cte
WHERE lag_id <> id - 1 OR lead_id <> id + 1
ORDER BY id;
You can use the LEAD and LAG window functions to calculate which rows are consecutive by 1.
SELECT Val
FROM (
SELECT *,
LEAD(Val) OVER(ORDER BY Val) - Val gap1,
Val - LAG(Val) OVER(ORDER BY Val) gap2
FROM T
) t1
WHERE gap1 > 1 OR gap2 > 1
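A quick way to confirm the gap filter on the sample numbers is sketched below. It assumes SQLite (3.25 or newer, through Python's sqlite3 module) in place of SQL Server; the table `T` and column `Val` follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Val INTEGER)")
conn.executemany(
    "INSERT INTO T VALUES (?)",
    [(v,) for v in (1, 2, 3, 5, 6, 7, 9, 10, 11)],
)

query = """
SELECT Val
FROM (
    SELECT Val,
           LEAD(Val) OVER (ORDER BY Val) - Val AS gap1,  -- distance to the next row
           Val - LAG(Val) OVER (ORDER BY Val) AS gap2    -- distance to the previous row
    FROM T
) AS t1
WHERE gap1 > 1 OR gap2 > 1
ORDER BY Val
"""

kept = [row[0] for row in conn.execute(query)]
print(kept)
```

The first and last rows drop out because their missing neighbour yields NULL, and NULL > 1 is not true.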

Assign column value based on the percentage of rows

In DB2, is there a way to assign a column value based on the first x%, then y%, and the remaining z% of rows?
I've tried using the row_number() function, but with no luck.
Example below.
Assume that count(id) in the example below is already sorted in descending order.
Input:
ID count(id)
5 10
3 8
1 5
4 3
2 1
Output:
The first 30% of the rows of the input above should be assigned code H, the last 30% code L, and the remaining rows code M. If 30% of the rows evaluates to a decimal, round it to 0 decimal places.
ID code
5 H
3 H
1 M
4 L
2 L
You can use window functions:
select t.id,
(case ntile(3) over (order by count(id) desc)
when 1 then 'H'
when 2 then 'M'
when 3 then 'L'
end) as grp
from t
group by t.id;
This puts them into equal sized groups.
For a 30-40-30% split with your conditions, you have to be more careful and compare each row number against the total number of ids:
select t.id,
       (case when seqnum <= round(0.3 * num_ids, 0) then 'H'
             when seqnum > num_ids - round(0.3 * num_ids, 0) then 'L'
             else 'M'
        end) as grp
from (select t.id,
             count(*) as cnt,
             count(*) over () as num_ids,
             row_number() over (order by count(*) desc) as seqnum
      from t
      group by t.id
     ) t;
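The 30-40-30 rule from the question can be checked with a small sketch like the one below. It assumes SQLite (3.25 or newer, through Python's sqlite3 module) rather than DB2. The integer expression `(3 * num_ids + 5) / 10` is an assumption standing in for "30% rounded to 0 decimal places", chosen to avoid floating-point rounding, and the aliases `per_id` and `ranked` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")

# Rebuild the question's counts: id 5 occurs 10 times, id 3 eight times, and so on.
counts = {5: 10, 3: 8, 1: 5, 4: 3, 2: 1}
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(i,) for i, n in counts.items() for _ in range(n)],
)

query = """
SELECT id,
       CASE WHEN seqnum <= (3 * num_ids + 5) / 10 THEN 'H'            -- first 30%, rounded
            WHEN seqnum > num_ids - (3 * num_ids + 5) / 10 THEN 'L'   -- last 30%, rounded
            ELSE 'M'
       END AS grp
FROM (SELECT id,
             COUNT(*) OVER () AS num_ids,                  -- total number of ids
             ROW_NUMBER() OVER (ORDER BY cnt DESC) AS seqnum
      FROM (SELECT id, COUNT(*) AS cnt FROM t GROUP BY id) AS per_id
     ) AS ranked
ORDER BY seqnum
"""

codes = list(conn.execute(query))
print(codes)
```

With five ids, 30% is 1.5, which rounds to 2, so two rows get H, two get L, and the middle row gets M.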
Try this:
with t(ID, count_id) as (values
(5, 10)
, (3, 8)
, (1, 5)
, (4, 3)
, (2, 1)
)
select t.*
, case
when pst <=30 then 'H'
when pst <=70 then 'M'
else 'L'
end as code
from
(
select t.*
, rownumber() over (order by count_id desc) as rn
, 100*rownumber() over (order by count_id desc)/nullif(count(1) over(), 0) as pst
from t
) t;
The result is:
ID COUNT_ID RN PST CODE
-- -------- -- --- ----
5 10 1 20 H
3 8 2 40 M
1 5 3 60 M
4 3 4 80 L
2 1 5 100 L

sql numbering the partition of Numbers

I have a set of numbers like this
ID
===
1
2
3
1
2
1
1
2
3
4
5
...
I want to select a new column that increases each time the next 1 is fetched, like this
ID number
=== ========
1 1
2 1
3 1
1 2
2 2
1 3
1 4
2 4
3 4
4 4
5 4
Any suggestions?
Assuming that you have a column o that specifies the ordering, you can use a self-join like this:
select d1.o, d1.id, count(*)
from data d1
join data d2 on d1.o >= d2.o and d2.id = 1
group by d1.o, d1.id
You can solve this with CTEs and window functions, as follows:
DECLARE @t TABLE (ID INT);
INSERT INTO @t VALUES (1),(2),(3),(1),(2),(1),(1),(2),(3),(4),(5);
WITH cte AS(
SELECT ID, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) rn
FROM @t
),
cte1 AS(
SELECT ID, rn, ROW_NUMBER() OVER (ORDER BY rn) rn2
FROM cte
WHERE ID = 1
)
SELECT c.ID, MAX(rn2) OVER (ORDER BY c.rn) rn
FROM cte c
LEFT JOIN cte1 c1 ON c1.rn = c.rn
ORDER BY c.rn
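Another way to express the same numbering, not taken from either answer, is a running count of the rows where ID = 1. The sketch below assumes SQLite (3.25 or newer, through Python's sqlite3 module), a table named `data`, and an explicit ordering column `o` as in the self-join answer. The alias `grp` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (o INTEGER PRIMARY KEY, id INTEGER)")

ids = [1, 2, 3, 1, 2, 1, 1, 2, 3, 4, 5]
# o records the original row order, which SQL tables do not keep by themselves
conn.executemany("INSERT INTO data VALUES (?, ?)", list(enumerate(ids, start=1)))

query = """
SELECT id,
       SUM(CASE WHEN id = 1 THEN 1 ELSE 0 END) OVER (ORDER BY o) AS grp
FROM data
ORDER BY o
"""

numbered = list(conn.execute(query))
for row in numbered:
    print(row)
```

Each 1 bumps the running sum, so every row carries the number of 1s seen so far, which is the requested group number.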

Select TOP 2 values for each group

I'm having a problem getting only the TOP 2 values for each group (the groups are in a column).
Example :
ID Group Value
1 A 30
2 A 150
3 A 40
4 A 70
5 B 0
6 B 100
7 B 90
I expect my output to be
ID Group Value
1 A 150
2 A 70
3 B 100
4 B 90
Simply put, for each group I want just the 2 rows with the highest Value.
Most databases support the ANSI standard row_number() function. You would use it as:
-- "group" is a reserved word, so the column name has to be quoted
select "group", value
from (select t.*,
             row_number() over (partition by "group" order by value desc) as seqnum
      from t
     ) t
where seqnum <= 2;
To set the id you can use row_number() in the outer query:
select row_number() over (order by "group", value desc) as id,
       "group", value
from (select t.*,
             row_number() over (partition by "group" order by value desc) as seqnum
      from t
     ) t
where seqnum <= 2;
However, changing the id seems suspicious.
You can use a CTE with the ranking function ROW_NUMBER().
Here is a query that returns your result:
;WITH cte AS
( SELECT [Group], value,
         ROW_NUMBER() OVER (PARTITION BY [Group] ORDER BY value DESC) AS rn
  FROM test
)
SELECT [Group], value FROM cte
WHERE rn <= 2
ORDER BY [Group], value DESC
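The top-2-per-group pattern can be tried on the sample rows with a sketch like this one. It assumes SQLite (3.25 or newer, through Python's sqlite3 module) and a table named `t`. The column is created as "group" in double quotes because the name is a reserved word, and the alias `ranked` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (id INTEGER, "group" TEXT, value INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "A", 30), (2, "A", 150), (3, "A", 40), (4, "A", 70),
     (5, "B", 0), (6, "B", 100), (7, "B", 90)],
)

query = """
SELECT ROW_NUMBER() OVER (ORDER BY "group", value DESC) AS id,  -- renumber the kept rows
       "group", value
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY "group" ORDER BY value DESC) AS seqnum
      FROM t
     ) AS ranked
WHERE seqnum <= 2
ORDER BY "group", value DESC
"""

top2 = list(conn.execute(query))
for row in top2:
    print(row)
```

If two rows in a group tie on value, row_number() picks one arbitrarily; rank() or dense_rank() would keep both.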

SQL Random N rows for each distinct value in column

I have the following table:
Name Field
A 1
B 1
C 1
D 1
E 1
F 1
G 1
H 2
I 2
J 2
K 3
L 3
M 3
N 3
O 3
P 3
Q 3
R 3
S 3
T 3
I need a SQL query that returns up to 5 random rows for each distinct value in column Field.
For example, results expected:
Name Field
A 1
B 1
D 1
E 1
G 1
J 2
I 2
H 2
M 3
Q 3
T 3
S 3
P 3
Is there an easy way to do this? Or should I split the table into several tables, pick random rows from each, and then union them?
You can do this with a CTE, using ROW_NUMBER() and partitioning on Field:
;With Cte As
(
Select Name, Field,
Row_Number() Over (Partition By Field Order By NewId()) RN
From YourTable
)
Select Name, Field
From Cte
Where RN <= 5
You can readily do this with row_number():
select name, field
from (select t.*,
row_number() over (partition by field order by newid()) as seqnum
from t
) t
where seqnum <= 5;
Here is an enhancement to Gordon Linoff's code that helped me; it is useful if you need criteria in your query:
select *
from (select t.*,
row_number() over (partition by region order by newid()) as seqnum
from MyTable t
WHERE t.program = 'ACME'
) t
where seqnum <= 1500;
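Since the output is random, only the row counts per group can be verified. The sketch below does that, assuming SQLite (3.25 or newer, through Python's sqlite3 module) with RANDOM() in place of SQL Server's NEWID(); the table name `t` and alias `ranked` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, field INTEGER)")

# A-G belong to field 1, H-J to field 2, K-T to field 3, as in the question
names = "ABCDEFGHIJKLMNOPQRST"
fields = [1] * 7 + [2] * 3 + [3] * 10
conn.executemany("INSERT INTO t VALUES (?, ?)", list(zip(names, fields)))

query = """
SELECT name, field
FROM (SELECT name, field,
             ROW_NUMBER() OVER (PARTITION BY field ORDER BY RANDOM()) AS seqnum
      FROM t
     ) AS ranked
WHERE seqnum <= 5
"""

per_field = {}
for name, field in conn.execute(query):
    per_field[field] = per_field.get(field, 0) + 1

print(per_field)
```

Field 2 has only three rows, so it returns three; the other two fields return five each.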