Find group of N similar numbers in group of N+M numbers

I'm trying to find similar values in an array: not just one, but a group of them, such that the sum of their element-wise differences is as small as possible.
EXAMPLE:
0, 2, 4, 6, 8, 9, 11, 15, 16, 19
Pick 5 numbers.
RESULT:
4, 6, 8, 9, 11
or
2, 4, 6, 8, 9
In both groups, the sum of the element-wise differences is 7 (2+2+1+2 and 2+2+2+1, respectively).
The problem is that I need to select such a group of 1500 numbers from an array of 2927 numbers. My idea is to take the group at indexes 0-1499, sum its differences, then move on to i+1 and repeat until I reach the group at indexes 1427-2926, and finally pick the smallest sum and the group it belongs to. I'm not sure whether this algorithm is efficient.
Note that the numbers are sorted (it doesn't matter whether ASC or DESC), and I'm trying to do this in PostgreSQL.
Thanks in advance.

SQL Fiddle
PostgreSQL 9.3 Schema Setup:
A small dataset of random data:
CREATE TABLE test (
id INT,
population INT
);
INSERT INTO TEST VALUES ( 1, 12 );
INSERT INTO TEST VALUES ( 2, 11 );
INSERT INTO TEST VALUES ( 3, 14 );
INSERT INTO TEST VALUES ( 4, 6 );
INSERT INTO TEST VALUES ( 5, 7 );
INSERT INTO TEST VALUES ( 6, 7 );
INSERT INTO TEST VALUES ( 7, 1 );
INSERT INTO TEST VALUES ( 8, 15 );
INSERT INTO TEST VALUES ( 9, 14 );
INSERT INTO TEST VALUES ( 10, 14 );
INSERT INTO TEST VALUES ( 11, 15 );
INSERT INTO TEST VALUES ( 12, 12 );
INSERT INTO TEST VALUES ( 13, 11 );
INSERT INTO TEST VALUES ( 14, 3 );
INSERT INTO TEST VALUES ( 15, 8 );
INSERT INTO TEST VALUES ( 16, 1 );
INSERT INTO TEST VALUES ( 17, 1 );
INSERT INTO TEST VALUES ( 18, 2 );
INSERT INTO TEST VALUES ( 19, 3 );
INSERT INTO TEST VALUES ( 20, 5 );
Query 1:
WITH ordered_sums AS (
  SELECT ID,
         POPULATION,
         ROW_NUMBER() OVER ( ORDER BY POPULATION ) AS RN,
         POPULATION - LAG(POPULATION,4) OVER ( ORDER BY POPULATION ) AS DIFFERENCE
  FROM   test
), minimum_rn AS (
  SELECT DISTINCT FIRST_VALUE( RN ) OVER wnd AS optimal_rn
  FROM   ordered_sums
  WINDOW wnd AS ( ORDER BY DIFFERENCE )
)
SELECT ID,
       POPULATION
FROM   ordered_sums o
       INNER JOIN
       minimum_rn m
       ON ( o.RN BETWEEN m.OPTIMAL_RN - 4 AND m.OPTIMAL_RN )
Results:
| id | population |
|----|------------|
| 10 | 14 |
| 9 | 14 |
| 3 | 14 |
| 11 | 15 |
| 8 | 15 |
The query above selects 5 rows. To select N rows instead, change both occurrences of 4 (the offset in the LAG function and the 4 in the BETWEEN on the last line) to N-1.

Assume the list is a[1], a[2], ..., a[N+M].
Calculate the minimal value of a[i+N-1] - a[i] for i = 1 to M+1.
The value(s) of i at which the minimum is reached are the first indices of N consecutive numbers whose sum of element-wise differences is minimal.
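One key observation behind this algorithm is that the "sum of the element-wise differences" of a sorted sequence is simply the difference between its last and first elements, because the intermediate terms cancel. E.g. for 4 6 8 9 11 it is (6-4)+(8-6)+(9-8)+(11-9) = 11-4 = 7. Moreover, any group of N numbers whose smallest element is a[i] has a largest element of at least a[i+N-1], so only groups of consecutive numbers need to be checked.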
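A minimal sketch of this algorithm in Python, as an in-memory illustration rather than a PostgreSQL solution. It assumes the values are already sorted and loaded into a list; the function name best_windows is hypothetical. Taking the absolute value lets the same code handle ascending or descending order, which the question says can be either.

```python
from typing import List, Optional, Sequence, Tuple


def best_windows(values: Sequence[int], n: int) -> Tuple[int, List[int]]:
    """Return (smallest spread, 0-based start indices of every window that
    reaches it) over all runs of n consecutive elements of a sorted sequence.

    For a sorted run the sum of neighbouring differences telescopes to
    |last - first|, so each window is scored with a single subtraction.
    """
    if not 1 <= n <= len(values):
        raise ValueError("n must be between 1 and len(values)")
    best: Optional[int] = None
    starts: List[int] = []
    # Window starts run from 0 to len(values) - n, i.e. M + 1 windows.
    for i in range(len(values) - n + 1):
        # abs() makes the score correct for ascending or descending input.
        spread = abs(values[i + n - 1] - values[i])
        if best is None or spread < best:
            best, starts = spread, [i]
        elif spread == best:
            starts.append(i)
    assert best is not None  # at least one window exists because n <= len(values)
    return best, starts


data = [0, 2, 4, 6, 8, 9, 11, 15, 16, 19]
spread, starts = best_windows(data, 5)
print(spread, [data[i:i + 5] for i in starts])
# 7 [[2, 4, 6, 8, 9], [4, 6, 8, 9, 11]]
```

Each window costs one subtraction instead of N-1, so picking 1500 numbers out of 2927 means scanning 1428 windows once, rather than re-summing 1499 differences for every window.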

This solution (SQL Server syntax) should work: use ROW_NUMBER() to get the order, self-join each row to the row 1499 positions after it, then order by the difference of sizes in the pair.
DECLARE @cities TABLE (
    city VARCHAR(512),
    size INT,
    rownum INT
);

INSERT INTO @cities
SELECT *,
       ROW_NUMBER() OVER (ORDER BY size) AS rownum
FROM rawdata;

SELECT *,
       d.size - c.size AS difference
FROM @cities c
INNER JOIN @cities d ON c.rownum + 1499 = d.rownum
WHERE c.rownum <= 2927 - 1499
ORDER BY d.size - c.size;

Related

How to create rows based on the range of all values between min and max in Snowflake (SQL)?

Assume I have the following data:
ID | T_Min | T_Max
1 | 3 | 5
2 | 1 | 4
I would like to create the following table using SQL (Snowflake):
ID | T
1 | 3
1 | 4
1 | 5
2 | 1
2 | 2
2 | 3
2 | 4
Does someone know how to do this? Thank you very much in advance!

Sample data:
CREATE OR REPLACE TABLE T1 (
ID INT,
T_Min INT,
T_Max INT);
INSERT INTO T1(ID, T_Min, T_Max)
SELECT * FROM VALUES (1, 3, 5), (2, 1, 4) t(ID, T_Min, T_Max);
Solution:
WITH N AS (
SELECT ROW_NUMBER() OVER(ORDER BY SEQ4()) AS T FROM TABLE(GENERATOR(ROWCOUNT => 1000)) -- T takes the values 1..ROWCOUNT, so ROWCOUNT must be at least the largest T_Max (and T_Min must be >= 1)
)
SELECT T1.ID, N.T
FROM T1
JOIN N ON N.T BETWEEN T1.T_Min AND T1.T_Max
ORDER BY T1.ID, N.T;
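To see why the generated row count has to cover the largest T_Max (not just the largest T_Max - T_Min difference), here is a small Python simulation of the same join against a numbers table. It is an illustration only; expand_ranges is a hypothetical name and the sample rows are hard-coded.

```python
from typing import Iterable, List, Tuple


def expand_ranges(rows: Iterable[Tuple[int, int, int]],
                  rowcount: int) -> List[Tuple[int, int]]:
    """Mimic joining (id, t_min, t_max) rows to a numbers table 1..rowcount
    on t BETWEEN t_min AND t_max."""
    numbers = range(1, rowcount + 1)  # like ROW_NUMBER() over GENERATOR(ROWCOUNT => ...)
    result = [(row_id, t)
              for row_id, t_min, t_max in rows
              for t in numbers
              if t_min <= t <= t_max]  # the BETWEEN join condition
    return sorted(result)  # ORDER BY ID, T


t1 = [(1, 3, 5), (2, 1, 4)]
print(expand_ranges(t1, 1000))
# [(1, 3), (1, 4), (1, 5), (2, 1), (2, 2), (2, 3), (2, 4)]
print(expand_ranges(t1, 4))
# [(1, 3), (1, 4), (2, 1), (2, 2), (2, 3), (2, 4)]  (T = 5 is lost)
```

With only 4 generated numbers (enough for the largest difference, 4 - 1 = 3), the row for ID 1, T = 5 silently disappears.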

Get data as row per row

How can I get a result like the one below, i.e. one row for every day column of table_1 that contains ORACLE (ID = 10)?
ID DAY ID2
---------------
1 1 10
2 2 10
3 3 10
4 4 10
Structure:
Create table table_1 (
id number primary key,
day_1 number,
day_2 number,
day_3 number,
day_4 number,
day_5 number
)
Insert into table_1 (id,day_1,day_2,day_3,day_4,day_5) values (1,10,null,null,null,null);
Insert into table_1 (id,day_1,day_2,day_3,day_4,day_5) values (2,20,10,20,null,null);
Insert into table_1 (id,day_1,day_2,day_3,day_4,day_5) values (3,null,null,10,null,null);
Insert into table_1 (id,day_1,day_2,day_3,day_4,day_5) values (4,null,null,null,10,null);
Insert into table_1 (id,day_1,day_2,day_3,day_4,day_5) values (5,30,null,null,null,null);
--Note
10 - ORACLE
20 - MSSQL
30 - MYSQL

Use UNPIVOT:
SELECT *
FROM table_1
UNPIVOT (
id2 FOR day IN (
day_1 AS 1,
day_2 AS 2,
day_3 AS 3,
day_4 AS 4,
day_5 AS 5
)
)
WHERE id2 = 10;
Which, for your sample data, outputs:
ID DAY ID2
---------------
1 1 10
2 2 10
3 3 10
4 4 10
db<>fiddle here
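What UNPIVOT does here can be illustrated with a short Python model: each day_N column becomes its own (ID, DAY, ID2) row, NULL cells are dropped (Oracle's default EXCLUDE NULLS behaviour), and the filter keeps ID2 = 10. It is a sketch only; the dict layout and the unpivot helper are assumptions made for the illustration.

```python
from typing import Dict, List, Optional, Tuple

# Sample rows from table_1: id -> [day_1, ..., day_5]; None stands for NULL.
TABLE_1: Dict[int, List[Optional[int]]] = {
    1: [10, None, None, None, None],
    2: [20, 10, 20, None, None],
    3: [None, None, 10, None, None],
    4: [None, None, None, 10, None],
    5: [30, None, None, None, None],
}


def unpivot(table: Dict[int, List[Optional[int]]]) -> List[Tuple[int, int, int]]:
    """Turn each day_N column into its own (id, day, id2) row.
    Like Oracle's default UNPIVOT (EXCLUDE NULLS), NULL cells are skipped."""
    return [(row_id, day, value)
            for row_id, days in table.items()
            for day, value in enumerate(days, start=1)
            if value is not None]


oracle_rows = [row for row in unpivot(TABLE_1) if row[2] == 10]  # WHERE id2 = 10
print(oracle_rows)
# [(1, 1, 10), (2, 2, 10), (3, 3, 10), (4, 4, 10)]
```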

SQL Server - How to query the set of maximum numbers from a list of numbers from top to bottom

The best way to explain this is through an example. Let's say I have this simple 2-column table:
Id | Score
1 | 10
2 | 5
3 | 20
4 | 15
5 | 20
6 | 25
7 | 30
8 | 30
9 | 10
10 | 40
The query should return the ID of each item where the max score changes. Starting from the top, 10 is the max score because item 1 has 10; then item 3 has a score of 20, which is a new max score, and this continues until the bottom of the table. So eventually the query should return:
1, 3, 6, 7, 10
I tried using a cursor to loop through the table, but I was wondering if there is a much simpler way of doing this.
Thanks

Solution (SQL Server 2012+):
SELECT v.MaxScore, MIN(v.Id) AS FirstId
FROM (
SELECT *, MAX(t.Score) OVER(ORDER BY t.Id ASC) AS MaxScore
FROM #Table AS t
) v
GROUP BY v.MaxScore
Demo
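The idea behind this query, a running maximum (MAX(Score) OVER (ORDER BY Id)) followed by the smallest Id per distinct running maximum, can be modelled in Python with itertools.accumulate. This is an illustrative sketch with the question's sample data hard-coded; ids_where_max_changes is a hypothetical name.

```python
from itertools import accumulate
from typing import Dict, List, Tuple

# Sample data from the question: (Id, Score)
SCORES: List[Tuple[int, int]] = [
    (1, 10), (2, 5), (3, 20), (4, 15), (5, 20),
    (6, 25), (7, 30), (8, 30), (9, 10), (10, 40),
]


def ids_where_max_changes(rows: List[Tuple[int, int]]) -> List[int]:
    ordered = sorted(rows)  # ORDER BY Id
    # Running maximum of Score, like MAX(Score) OVER (ORDER BY Id)
    running = accumulate((score for _, score in ordered), max)
    first_id: Dict[int, int] = {}
    for (row_id, _), max_score in zip(ordered, running):
        # Keep the first (smallest) Id seen for each running maximum,
        # like MIN(Id) ... GROUP BY MaxScore
        first_id.setdefault(max_score, row_id)
    return sorted(first_id.values())


print(ids_where_max_changes(SCORES))  # [1, 3, 6, 7, 10]
```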

One more version: it works for SQL Server 2008 and later, and you can remove the APPLY to make it work for 2005 as well:
;with cte
(Id , Score)
as
(
select 1 , 10 union all
select 2 , 5 union all
select 3 , 20 union all
select 4 , 15 union all
select 5 , 20 union all
select 6 , 25 union all
select 7 , 30 union all
select 8 , 30 union all
select 9 , 10 union all
select 10 , 40
)
select min(id)
from
cte c2
cross apply
(select case when score -(select max(score) from cte c1 where c1.id<=c2.id )=0
then 1 else 0 end) b(val)
where val=1
group by Score
Output:
1
3
6
7
10

I think you can just do a MIN on the id with a GROUP BY Score. Like this:
SELECT MIN(Id) FROM table GROUP BY Score

Using the LAG function, which returns the previous value of Score:
DECLARE @Table TABLE (Id int, Score int);
INSERT INTO @Table
VALUES
(1 , 10),
(2 , 10),
(3 , 20),
(4 , 20),
(5 , 20),
(6 , 25),
(7 , 30),
(8 , 30),
(9 , 30),
(10 , 40);
SELECT *
FROM
(
    SELECT
        *,
        LAG(t.Score, 1, NULL) OVER (ORDER BY t.Id) AS PrevScore
    FROM @Table AS t
) AS p
WHERE p.Score <> p.PrevScore OR p.PrevScore IS NULL;

Try this:
DECLARE @scores varchar(max);
SELECT @scores = ISNULL(@scores + ',', '') + CONVERT(varchar, MIN(id))
FROM #temp
GROUP BY score;
SELECT @scores;

SQL speed issue

I am using PostgreSQL and want to use a query like this:
SELECT device, value, id
FROM myTable s
WHERE (SELECT COUNT(*) FROM myTable f WHERE f.id = s.id AND f.value >= s.value ) <= 2
This works, but it takes minutes to execute on a large table. Is there a faster way that runs in seconds? What I'm trying to do is take only two rows per id, with the values sorted in ascending order.
id | device | value
1 123 40
1 456 30
1 789 45
2 12 10
2 11 9
The above is my table (I know the ids are not unique; it's not my design, but it has a purpose). Within each id, say id = 1, I want to select the id, device and value of the 2 rows with the smallest values, so my result would be 1, 456, 30 and 1, 123, 40, and so on for the other ids.
Also, does anyone know: if you insert sorted data into a database, is it guaranteed to be read back out in the same order?

Try the query below:
SELECT s.device,s.id,s.value
FROM myTable s
INNER JOIN myTable f ON s.id = f. id AND f.value >= s.value
GROUP BY s.device,s.id,s.value
HAVING COUNT(s.id) <= 2

This can be done using window functions:
select id, device, value
from (
select id, device, value,
row_number() over (partition by id order by value) as rn
from the_table
) t
where rn <= 2
order by id, device, value;
Example:
postgres> create table the_table (id integer, device integer, value integer);
CREATE TABLE
postgres> insert into the_table values
...> (1, 123, 40),
...> (1, 456, 30),
...> (1, 789, 45),
...> (2, 12 , 10),
...> (2, 11 , 9);
INSERT 0 5
postgres> select id, device, value
...> from (
...> select id, device, value,
...> row_number() over (partition by id order by value) as rn
...> from the_table
...> ) t
...> where rn <= 2;
id | device | value
----+--------+-------
1 | 123 | 40
1 | 456 | 30
2 | 11 | 9
2 | 12 | 10
(4 rows)
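What row_number() over (partition by id order by value) combined with rn <= 2 computes can be mirrored in plain Python. This is only an illustration with the sample rows hard-coded, and smallest_per_id is a hypothetical helper. One difference: Python's sort is stable, so ties in value keep their input order, whereas row_number() numbers tied rows in an unspecified order unless the ORDER BY includes a tie-breaker.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Sample rows from the question: (id, device, value)
ROWS: List[Tuple[int, int, int]] = [
    (1, 123, 40), (1, 456, 30), (1, 789, 45),
    (2, 12, 10), (2, 11, 9),
]


def smallest_per_id(rows: List[Tuple[int, int, int]], k: int = 2) -> List[Tuple[int, int, int]]:
    groups: Dict[int, List[Tuple[int, int, int]]] = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)  # PARTITION BY id
    result: List[Tuple[int, int, int]] = []
    for row_id in sorted(groups):
        ordered = sorted(groups[row_id], key=lambda r: r[2])  # ORDER BY value
        result.extend(ordered[:k])  # WHERE rn <= k
    return result


print(smallest_per_id(ROWS))
# [(1, 456, 30), (1, 123, 40), (2, 11, 9), (2, 12, 10)]
```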

How to select multi record depending on some column's condition?

Say there is a SQL Server table that contains 2 columns: ID, Value
The sample data looks like this:
ID value
------------------
1 30
1 30
2 50
2 50
3 50
When I run this query:
select ID, NEWID(), value
from table1
order by ID
The result looks like this:
1 30 E152AD19-9920-4567-87FF-C4822FD9E485
1 30 54F28C58-ABA9-4DFB-9A80-CE9C4C390CBB
2 50 ........
2 50 ........
3 50 4E5A9E26-FEEC-4CC7-9AC5-96747053B6B2
But what I want is for each ID to appear as many times as the result of (sum of value / 30). For example, for ID 2 the sum of value is 50+50=100, and 100/30=3 (integer division), so ID 2 should appear three times in the query result.
The final result I want looks like this:
1 E152AD19-9920-4567-87FF-C4822FD9E485
1 54F28C58-ABA9-4DFB-9A80-CE9C4C390CBB
2 4E5A9E26-FEEC-4CC7-9AC5-96747053B6B2
2 ....
2 ....
3 D861563E-E01A-4198-9E92-7BEB4678E5D1
Please note that ID 2 is displayed three times. Thanks for your help.

How about something like this:
CREATE TABLE Table1
([ID] int, [value] int)
;
INSERT INTO Table1
([ID], [value])
VALUES
(1, 30),
(1, 30),
(2, 50),
(2, 50),
(3, 50)
;
;WITH SummedVals AS (
SELECT ID,
SUM(value) / 30 Cnt
FROM Table1
GROUP BY ID
)
, Vals AS (
SELECT ID,
Cnt - 1 Cnt
FROM SummedVals
UNION ALL
SELECT ID,
Cnt - 1 Cnt
FROM Vals
WHERE Cnt > 0
)
SELECT ID,
NEWID()
FROM Vals
ORDER BY 1
SQL Fiddle DEMO
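The recursive CTE essentially emits SUM(value) / 30 copies of each ID. A rough Python model of that logic follows; the sample rows are hard-coded, repeat_by_sum is a hypothetical name, and uuid.uuid4() stands in for NEWID(). One difference worth knowing: the anchor member of the CTE always emits one row per ID, so an ID whose sum is below 30 still appears once (with Cnt = -1), whereas this sketch emits no rows for it.

```python
import uuid
from collections import Counter
from typing import List, Tuple

# Sample rows from the question: (ID, value)
TABLE1: List[Tuple[int, int]] = [(1, 30), (1, 30), (2, 50), (2, 50), (3, 50)]


def repeat_by_sum(rows: List[Tuple[int, int]], divisor: int = 30) -> List[Tuple[int, str]]:
    totals: Counter = Counter()
    for row_id, value in rows:
        totals[row_id] += value  # SUM(value) ... GROUP BY ID
    out: List[Tuple[int, str]] = []
    for row_id in sorted(totals):
        copies = totals[row_id] // divisor  # integer division, like INT / INT in T-SQL
        # One fresh identifier per copy, standing in for NEWID()
        out.extend((row_id, str(uuid.uuid4())) for _ in range(copies))
    return out


print([row_id for row_id, _ in repeat_by_sum(TABLE1)])  # [1, 1, 2, 2, 2, 3]
```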
