I have a bucket that has different 3-3-4, I need to find the right bucket for the number
1-3 4-6 7-10 11-13 14-16 17-20 21 - 23 24 - 26 27 - 30
What could be the efficient formula to find the right bucket, like number 5 lies in buckets 4-6, 18 lies in the bucket 17-20.
Need to write sql query for that
I would maintain a separate bona fide table containing the range values. Then, join to it to get the output you want.
Table: ranges
start | end
1 | 3
4 | 6
7 | 10
11 | 13
14 | 16
17 | 20
21 | 23
24 | 26
27 | 30
WITH buckets AS (
SELECT 5 AS val UNION ALL
SELECT 18
)
SELECT b.val, CAST(r.start AS VARCHAR(10)) + '-' + CAST(r.end AS VARCHAR(10))
FROM buckets b
INNER JOIN ranges r
ON b.val BETWEEN r.start AND r.end
ORDER BY b.val;
For non-overlapping ranges, you only need the starting value of each range (or, equivalently, the ending). Then INDEX that column. Then this is very efficient:
SELECT ...
FROM ...
WHERE n >= ?
ORDER BY n
LIMIT 1
Because of the INDEX, this runs in O(1). (Or O(logN) if include the cost of the BTree lookup.)
More details (couched in IP-addresses): http://mysql.rjweb.org/doc.php/ipranges
Related
I am new to Presto SQL syntax and and wondering if a function exists that will bin rows into n bins in a certain range.
For example, I have a a table with 1m different integers that range from 1 - 100. What can I do to create 20 bins between 1 and 100 (a bin for 1-5, 6-10, 11-15 ... etc. ) without using 20 separate CASE WHEN statements ? Are there any standard SQL functions that do will perform the binning function?
Any advice would be appreciated!
You can use the standard SQL function width_bucket. For example:
WITH data(value) AS (
SELECT rand(100)+1 FROM UNNEST(sequence(1,10000))
)
SELECT value, width_bucket(value, 1, 101, 20) bucket
FROM data
produces:
value | bucket
-------+--------
100 | 20
98 | 20
38 | 8
42 | 9
67 | 14
74 | 15
6 | 2
...
You can just use integer division:
select (intcol - 1) / 5 as bin
Presto does integer division, so you shouldn't have to worry about the remainder.
I want to sort a string column which can include both numbers and alphabets.
SQL Script:
select distinct a.UoA, b.rating , b.tot from omt_source a left join
wlm_progress_Scored b
on a.UoA = b.UoA
where a.UoA in (select UoA from UserAccess_dev
where trim(App_User) = lower(:APP_USER))
order by
regexp_substr(UoA, '^\D*') ,
to_number(regexp_substr(UoA, '\d+'))--);
Output I'm currently getting:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
23
26B
26A
27
28
30
31
32
33
34B
34A
But, I want 26 and 34 to be in this order
26A
26B
34A
34B
Any suggestion will be much helpful
Thanks
If your first order by clause ensures that the primary sort order is based on the numerical component of the UoA field, then your second order clause could actually be just the UoA field itself. I.e.
order by
regexp_substr(UoA, '^\D*'), UoA;
I'm using Postgres version 9.6.9 and attempting to use width_bucket() to generate a histogram with buckets consisting of equal widths. However, the query I'm using is not returning buckets of equal widths.
As you can see in the example below, the values in the bucket have varying widths. e.g. bucket 1 has a min of 7 and a max of 18 - a width of 11. bucket 3 has a min of 52 and a max of 55 - a width of 3.
How can I adjust my query to ensure that each bucket has the same width?
Here's what the data looks like:
value
-------
7
7
15
17
18
22
23
25
29
42
52
52
55
60
74
85
90
90
92
95
(20 rows)
Here's the query and resulting histogram:
WITH min_max AS (
SELECT
min(value) AS min_val,
max(value) AS max_val
FROM table
)
SELECT
min(value),
max(value),
count(*),
width_bucket(value, min_val, max_val, 5) AS bucket
FROM table, min_max
GROUP BY bucket
ORDER BY bucket;
min | max | count | bucket
-----+-----+-------+--------
7 | 23 | 7 | 1
25 | 42 | 3 | 2
52 | 55 | 3 | 3
60 | 74 | 2 | 4
85 | 92 | 4 | 5
95 | 95 | 1 | 6
( 6 rows )
From https://prestodb.io/docs/current/functions/window.html
Have a look at ntile():
ntile(n) → bigint
Divides the rows for each window partition into n buckets ranging from 1 to at most n. Bucket values will differ by at most 1. If the number of rows in the partition does not divide evenly into the number of buckets, then the remainder values are distributed one per bucket, starting with the first bucket.
For example, with 6 rows and 4 buckets, the bucket values would be as follows: 1 1 2 2 3 4
Or say to rank each runner's 100m race times to find their personal best out of their 10 races:
SELECT
NTILE(10) over (PARTITION BY runners ORDER BY racetimes)
FROM
table
Your buckets are the same size. You just don't have data that accurately represents the end-points.
For instance, would 24 be in the first or second bucket? This is more notable for the ranges without any data, such as 75-83.
From https://www.oreilly.com/library/view/sql-in-a/9780596155322/re91.html
WIDTH_BUCKET( expression, min, max, buckets)
The buckets argument specifies the number of buckets to create over the range defined by min through max. min is inclusive, whereas max is not.
Maximum is not included. so set
WIDTH_BUCKET( expression, min, max + 1, buckets)
I would like to create a script that takes the rows of a table which have a specific mathematical difference in their ASCII sum and to add the rows to a separate table, or even to flag a different field when they have that difference.
For instance, I am looking to find when the ASCII sum of word A and the ASCII sum of word B, both stored in rows of a table, have a difference of 63 or 31.
I could probably use a loop to select these rows, but SQL is not my greatest virtue.
ItemID | asciiSum |ProperDiff
-------|----------|----------
1 | 100 |
2 | 37 |
3 | 69 |
4 | 23 |
5 | 6 |
6 | 38 |
After running the code, the field ProperDiff will be updated to contain 'yes' for ItemID 1,2,3,5,6, since the AsciiSum for 1 and 2 (100-37) = 63 etc.
This will not be fast, but I think it does what you want:
update t
set ProperDiff = 'yes'
where exists (select 1
from t t2
where abs(t2.AsciiSum - t.AsciiSum) in (63, 31)
);
It should work okay on small tables.
Need your help with a SQL query in Oracle db. I have data that I want to partition into groups when event = "Start". E.g. Row 1-6 is a group, row 7-9 is a group. I want to ignore rows with event = "Ignore". Finally I want to calculate max(Value)-min(Value) for these groups. I dont have any way to group the data.
Can this be achieved? Is it possible to use partition by Event = start. Same data is below:
Row Event Value Required Result is max-min of value
1 Start 10
2 A 11
3 B 12
4 C 13
5 D 14
6 E 15 5
--------------------------------------------
7 Start 16
8 A 18
9 B 20 4
--------------------------------------------
10 Start 27
11 A 30
12 B 33
13 C 34 7
--------------------------------------------
14 Ignore 35
--------------------------------------------
15 Ignore 36
--------------------------------------------
16 Start 33
17 A 34
18 B 35
19 C 36
20 D 37
21 E 38 5
--------------------------------------------
Yes, you can do this in SQL.
The following query first finds the group that a row is in, by finding the largest start before the row id. This version uses a correlated subquery for this calculation.
It then does the grouping on the id and does the calculation.
select groupid, max(value) - min(value)
from (select t.*,
(select max(row) from t t2 where t2.row < t.row and t2.event = start
) as groupid
from t
) t
where event <> 'IGNORE'