How to select similar numbers using SQL? - sql

Here are some example numbers:
987
1001
1004
1009
1010
1016
1020
1050
For example, I would like select the top 4 numbers that close to the given number 1009 (so, the results would be 1001, 1004, 1010 and 1016), how should I write the SQL expression?

Get the distance from the given number by subtracting and using the abs function:
select top 4 Number
from NumberTable
where number <> 1009
order by abs(Number - 1009)
Edit:
As you now mention that you have a very large table, you would need a way to elliminate most of the results first. You could pick the four closest in both direction and then get the correct ones from that:
select top 4 Number
from (
select Number
from (
select top 4 Number
from NumberTable
where number < 1009
order by number desc
)
union all
select Number
from (
select top 4 Number
from NumberTable
where number > 1009
order by number
)
)
order by abs(Number - 1009)
If the numbers are evenly distributed so that you are sure that you can find the numbers in a range like for example +-100 numbers, you can simply get that range first:
select top 4 Number
from (
select Number
from NumberTable
where number between 1009-100 and 1009+100
)
where number <> 1009
order by abs(Number - 1009)

SELECT TOP 4 number
FROM your_table
WHERE number <> #numberToMatch
ORDER BY ABS(number - #numberToMatch)

Taking bits from all other answers on this page!
Assuming an index on the number column this should perform well (at least in SQL Server)
DECLARE #Target int
SET #Target = 1009;
SELECT TOP 4 number
FROM
(
SELECT TOP 4 number from YourTable
WHERE number < #Target
ORDER BY number desc
UNION ALL
SELECT TOP 4 number from YourTable
WHERE number > #Target
ORDER BY number asc
) d
order by abs(number - #Target)

Try this:
DECLARE #Target int
SET #Target = 1009
SELECT TOP 2 number from TABLE
WHERE number < #Target
ORDER BY number desc
UNION
SELECT TOP 2 number from TABLE
WHERE number > #Target
ORDER BY number asc

Related

Progressive Select Query in Oracle Database

I want to write a select query that selects distinct rows of data progressively.
Explaining with an example,
Say i have 5000 accounts selected for repayment of loan, these accounts are ordered in descending order( Account 1st has highest outstanding while account 5000nd will have the lowest).
I want to select 1000 unique accounts 5 times such that the total outstanding amount of repayment in all 5 cases are similar.
i have tried out a few methods by trying to select rownums based on odd/even or other such way, but it's only good for upto 2 distributions. I was expecting more like a A.P. as in maths that selects data progressively.
A naïve method of splitting sets into (for example) 5 bins, numbered 0 to 4, is give each row a unique sequential numeric index and then, in order of size, assign the first 10 rows to bins 0,1,2,3,4,4,3,2,1,0 and then repeat for additional sets of 10 rows:
WITH indexed_values (value, rn) AS (
SELECT value,
ROW_NUMBER() OVER (ORDER BY value DESC) - 1
FROM table_name
),
assign_bins (value, rn, bin) AS (
SELECT value,
rn,
CASE WHEN MOD(rn, 2 * 5) >= 5
THEN 5 - MOD(rn, 5) - 1
ELSE MOD(rn, 5)
END
FROM indexed_values
)
SELECT bin,
COUNT(*) AS num_values,
SUM(value) AS bin_size
FROM assign_bins
GROUP BY bin
Which, for some random data:
CREATE TABLE table_name ( value ) AS
SELECT FLOOR(DBMS_RANDOM.VALUE(1, 1000001)) FROM DUAL CONNECT BY LEVEL <= 1000;
May output:
BIN
NUM_VALUES
BIN_SIZE
0
200
100012502
1
200
100004633
2
200
99980342
3
200
99976774
4
200
100005756
It will not get the bins to have equal values but it is relatively simple and will get a close approximation if your values are approximately evenly distributed.
If you want to select values from a certain bin then:
WITH indexed_values (value, rn) AS (
SELECT value,
ROW_NUMBER() OVER (ORDER BY value DESC) - 1
FROM table_name
),
assign_bins (value, rn, bin) AS (
SELECT value,
rn,
CASE WHEN MOD(rn, 2 * 5) >= 5
THEN 5 - MOD(rn, 5) - 1
ELSE MOD(rn, 5)
END
FROM indexed_values
)
SELECT value
FROM assign_bins
WHERE bin = 0
fiddle

Get rows in SQL by summing up a until certain value is exceeded and stop retrieving

I have to return rows from the database when the value exceeds a certain point.
I should get enough rows to sum up to a value that is greater than my quantity and stop retrieving rows.
Is this possible and does it makes sense?
Can this be transferred into LINQ for EF core?
I am currently stuck with query that will return all the rows...
SELECT [i].[InventoryArticleId], [i].[ArticleId], [i].[ArticleQuantity], [i].[InventoryId]
FROM [InventoryArticle] AS [i]
INNER JOIN [Article] AS [a] ON [i].[ArticleId] = [a].[ArticleId]
WHERE (([i].[ArticleId] = 1) AND ([a].[ArticlePrice] <= 1500))
AND ((
SELECT COALESCE(SUM([i0].[ArticleQuantity]), 0)
FROM [InventoryArticle] AS [i0]
INNER JOIN [Article] AS [a0] ON [i0].[ArticleId] = [a0].[ArticleId]
WHERE ([i0].[ArticleId] = 1) AND ([a0].[ArticlePrice] < 1500)) > 10)
Expected result is one row. If number would be greater than 34, more rows should be added.
You can use a windowed SUM to calculate a running sum ArticleQuantity. It is likely to be far more efficient than self-joining.
The trick is that you need all rows where the running sum up to the previous row is less than the requirement.
You could utilize a ROWS clause of ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING. But then you need to deal with possible NULLs on the first row.
In any event, even a regular running sum should always use ROWS UNBOUNDED PRECEDING, because the default is RANGE UNBOUNDED PRECEDING, which is subtly different and can cause incorrect results, as well as being slower.
DECLARE #requirement int = 10;
SELECT
i.InventoryArticleId,
i.ArticleId,
i.ArticleQuantity,
i.InventoryId
FROM (
SELECT
i.*,
RunningSum = SUM(i.ArticleQuantity) OVER (PARTITION BY i.ArticleId ORDER BY i.InventoryArticleId ROWS UNBOUNDED PRECEDING)
FROM InventoryArticle i
INNER JOIN Article a ON i.ArticleId = a.ArticleId
WHERE i.ArticleId = 1
AND a.ArticlePrice <= 1500
) i
WHERE i.RunningSum - i.ArticleQuantity < #requirement;
You may want to choose a better ordering clause.
EF Core cannot use window functions, unless you specifically define a SqlExpression for it.
My approach would be to:
Filter for the eligible records.
Calculate the running total.
Identify the first record where the running total satisfies your criteria.
Perform a final select of all eligible records up to that point.
Something like the following somewhat stripped down example:
-- Some useful generated data
DECLARE #Inventory TABLE (InventoryArticleId INT, ArticleId INT, ArticleQuantity INT)
INSERT #Inventory(InventoryArticleId, ArticleId, ArticleQuantity)
SELECT TOP 1000
InventoryArticleId = N.n,
ArticleId = N.n % 5,
ArticleQuantity = 5 * N.n
FROM (
-- Generate a range of integers
SELECT n = ones.n + 10*tens.n + 100*hundreds.n + 1000*thousands.n
FROM (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) ones(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) tens(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) hundreds(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) thousands(n)
ORDER BY 1
) N
ORDER BY N.n
SELECT * FROM #Inventory
DECLARE #ArticleId INT = 2
DECLARE #QuantityNeeded INT = 500
;
WITH isum as (
SELECT i.*, runningTotalQuantity = SUM(i.ArticleQuantity) OVER(ORDER BY i.InventoryArticleId)
FROM #Inventory i
WHERE i.ArticleId = #ArticleId
)
SELECT isum.*
FROM (
SELECT TOP 1 InventoryArticleId
FROM isum
WHERE runningTotalQuantity >= #QuantityNeeded
ORDER BY InventoryArticleId
) selector
JOIN isum ON isum.InventoryArticleId <= selector.InventoryArticleId
ORDER BY isum.InventoryArticleId
Results:
InventoryArticleId
ArticleId
ArticleQuantity
runningTotalQuantity
2
2
10
10
7
2
35
45
12
2
60
105
17
2
85
190
22
2
110
300
27
2
135
435
32
2
160
595
All of the ORDER BY clauses in the running total calculation, selector, and final select must be consistent and unambiguous (no dups). If a more complex order or preference is needed, it may be necessary to assign a rank value the eligible records before calculating the running total.

Alternative: Sql - SELECT rows until the sum of a row is a certain value

My question is very similar to my previous one posted here:
Sql - SELECT rows until the sum of a row is a certain value
To sum it up, I need to return the rows, until a certain sum is reached, but the difference this time, is that, I need to find the best fit for this sum, I mean, It doesn't have to be sequential. For example:
Let's say I have 5 unpaid receipts from customer 1:
Receipt_id: 1 | Amount: 110€
Receipt_id: 2 | Amount: 110€
Receipt_id: 3 | Amount: 130€
Receipt_id: 4 | Amount: 110€
Receipt_id: 5 | Amount: 190€
So, customer 1 ought to pay me 220€.
Now I need to select the receipts, until this 220€ sum is met and it might be in a straight order, like (receipt 1 + receipt 2) or not in a specific order, like (receipt 1 + receipt 4), any of these situations would be suitable.
I am using SQL Server 2016.
Any additional questions, feel free to ask.
Thanks in advance for all your help.
This query should solve it.
It is a quite dangerous query (containing a recursive CTE), so please be careful!
You can find some documentation here: https://www.essentialsql.com/recursive-ctes-explained/
WITH the_data as (
SELECT *
FROM (
VALUES (1, 1, 110),(1, 2,110),(1, 3,130),(1, 4,110),(1, 5,190),
(2, 1, 10),(2, 2,20),(2, 3,200),(2, 4,190)
) t (user_id, receipt_id, amount)
), permutation /* recursive used here */ as (
SELECT
user_id,
amount as sum_amount,
CAST(receipt_id as varchar(max)) as visited_receipt_id,
receipt_id as max_receipt_id,
1 as i
FROM the_data
WHERE amount > 0 -- remove empty amount
UNION ALL
SELECT
the_data.user_id,
sum_amount + amount as sum_amount,
CAST(concat(visited_receipt_id, ',', CAST(receipt_id as varchar))as varchar(max)) as visited_receipt_id,
receipt_id as max_receipt_id ,
i + 1
FROM the_data
JOIN permutation
ON the_data.user_id = permutation.user_id
WHERE i < 1000 -- max 1000 loops, means any permutation with less than 1000 different receipts
and receipt_id > max_receipt_id -- in order that sum in komutatif , we can check the sum in any unique order ( here we take the order of the reciept_id in fact we do not produce any duplicates )
-- AND sum_amount + amount <= 220 -- ignore everything that is bigger than the expected value (optional)
)
SELECT *
FROM permutation
WHERE sum_amount = 220
in order to select only one combination per user_id, replace the last three lines of the previous query by
SELECT *
FROM (
SELECT *, row_number() OVER (partition by user_id order by random() ) as r
FROM permutation
WHERE sum_amount = 220
) as t
WHERE r = 1
IF your target is to sum only 2 receipts in order to reach your value, this could be a solution:
DECLARE #TARGET INT = 220 --SET YOUR TARGET
, #DIFF INT
, #FIRSTVAL INT
SET #FIRSTVAL = (
SELECT TOP 1 AMOUNT
FROM myRECEIPTS
ORDER BY RECEIPT_ID ASC
)
SELECT TOP 1 *
FROM myRECEIPTS
WHERE AMOUNT = #TARGET - #FIRSTVAL
ORDER BY RECEIPT_ID ASC
this code will do it:
declare #sum1 int
declare #numrows int
set #numrows= 1
set #sum1 =0
while (#sum1 < 10)
begin
select top (#numrows) #sum1=sum(sum1) from receipts
set #numrows +=1
end
select top(#numrows) * from receipts

SQL random number that doesn't repeat within a group

Suppose I have a table:
HH SLOT RN
--------------
1 1 null
1 2 null
1 3 null
--------------
2 1 null
2 2 null
2 3 null
I want to set RN to be a random number between 1 and 10. It's ok for the number to repeat across the entire table, but it's bad to repeat the number within any given HH. E.g.,:
HH SLOT RN_GOOD RN_BAD
--------------------------
1 1 9 3
1 2 4 8
1 3 7 3 <--!!!
--------------------------
2 1 2 1
2 2 4 6
2 3 9 4
This is on Netezza if it makes any difference. This one's being a real headscratcher for me. Thanks in advance!
To get a random number between 1 and the number of rows in the hh, you can use:
select hh, slot, row_number() over (partition by hh order by random()) as rn
from t;
The larger range of values is a bit more challenging. The following calculates a table (called randoms) with numbers and a random position in the same range. It then uses slot to index into the position and pull the random number from the randoms table:
with nums as (
select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all
select 6 union all select 7 union all select 8 union all select 9
),
randoms as (
select n, row_number() over (order by random()) as pos
from nums
)
select t.hh, t.slot, hnum.n
from (select hh, randoms.n, randoms.pos
from (select distinct hh
from t
) t cross join
randoms
) hnum join
t
on t.hh = hnum.hh and
t.slot = hnum.pos;
Here is a SQLFiddle that demonstrates this in Postgres, which I assume is close enough to Netezza to have matching syntax.
I am not an expert on SQL, but probably do something like this:
Initialize a counter CNT=1
Create a table such that you sample 1 row randomly from each group and a count of null RN, say C_NULL_RN.
With probability C_NULL_RN/(10-CNT+1) for each row, assign CNT as RN
Increment CNT and go to step 2
Well, I couldn't get a slick solution, so I did a hack:
Created a new integer field called rand_inst.
Assign a random number to each empty slot.
Update rand_inst to be the instance number of that random number within this household. E.g., if I get two 3's, then the second 3 will have rand_inst set to 2.
Update the table to assign a different random number anywhere that rand_inst>1.
Repeat assignment and update until we converge on a solution.
Here's what it looks like. Too lazy to anonymise it, so the names are a little different from my original post:
/* Iterative hack to fill 6 slots with a random number between 1 and 13.
A random number *must not* repeat within a household_id.
*/
update c3_lalfinal a
set a.rand_inst = b.rnum
from (
select household_id
,slot_nbr
,row_number() over (partition by household_id,rnd order by null) as rnum
from c3_lalfinal
) b
where a.household_id = b.household_id
and a.slot_nbr = b.slot_nbr
;
update c3_lalfinal
set rnd = CAST(0.5 + random() * (13-1+1) as INT)
where rand_inst>1
;
/* Repeat until this query returns 0: */
select count(*) from (
select household_id from c3_lalfinal group by 1 having count(distinct(rnd)) <> 6
) x
;

Generating Random Number In Each Row In Oracle Query

I want to select all rows of a table followed by a random number between 1 to 9:
select t.*, (select dbms_random.value(1,9) num from dual) as RandomNumber
from myTable t
But the random number is the same from row to row, only different from each run of the query. How do I make the number different from row to row in the same execution?
Something like?
select t.*, round(dbms_random.value() * 8) + 1 from foo t;
Edit:
David has pointed out this gives uneven distribution for 1 and 9.
As he points out, the following gives a better distribution:
select t.*, floor(dbms_random.value(1, 10)) from foo t;
At first I thought that this would work:
select DBMS_Random.Value(1,9) output
from ...
However, this does not generate an even distribution of output values:
select output,
count(*)
from (
select round(dbms_random.value(1,9)) output
from dual
connect by level <= 1000000)
group by output
order by 1
1 62423
2 125302
3 125038
4 125207
5 124892
6 124235
7 124832
8 125514
9 62557
The reasons are pretty obvious I think.
I'd suggest using something like:
floor(dbms_random.value(1,10))
Hence:
select output,
count(*)
from (
select floor(dbms_random.value(1,10)) output
from dual
connect by level <= 1000000)
group by output
order by 1
1 111038
2 110912
3 111155
4 111125
5 111084
6 111328
7 110873
8 111532
9 110953
you don’t need a select … from dual, just write:
SELECT t.*, dbms_random.value(1,9) RandomNumber
FROM myTable t
If you just use round then the two end numbers (1 and 9) will occur less frequently, to get an even distribution of integers between 1 and 9 then:
SELECT MOD(Round(DBMS_RANDOM.Value(1, 99)), 9) + 1 FROM DUAL