I need help writing an Oracle SQL query to achieve the following.
Let's say I have a query that returns about 110,000 sorted unique number values. They are not necessarily 1 to 110,000; they could be any unique, non-consecutive numbers. I would like to split them into chunks of 25,000 each, with the last chunk holding the remainder (10,000 in this example), and get the min and max of each chunk.
Thanks in advance.
John T.
For this example, I expect to get 5 chunks, plus the min and max values of each chunk.
Let's ASSUME these numbers are from 1 to 110,000:
Chunk  Min      Max
1      1        25,000
2      25,001   50,000
3      50,001   75,000
4      75,001   100,000
5      100,001  110,000
For example
with tbl as (
  /* sample data: ~110,000 random values (the real query would supply the unique numbers) */
  select round(dbms_random.value() * 1000000) n
  from dual
  connect by level <= 110000
)
select chunk_no, count(*) cnt, min(n), max(n)
from (
  -- row_number() gives each value its 1-based position in sorted order;
  -- dividing by 25,000 puts every 25,000 consecutive positions into one chunk (0, 1, 2, ...)
  select n, floor((row_number() over(order by n) - 1) / 25000) chunk_no
  from tbl
)
group by chunk_no
order by chunk_no
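As a quick sanity check (a sketch only, assuming the numbers really are the consecutive values 1 to 110,000 from the expected-output example), the same chunking reproduces that table:
with tbl as (
  select level n from dual connect by level <= 110000
)
select chunk_no + 1 as chunk, min(n) min_n, max(n) max_n, count(*) cnt
from (
  select n, floor((row_number() over(order by n) - 1) / 25000) chunk_no
  from tbl
)
group by chunk_no
order by chunk_no
-- chunk 1: 1 .. 25,000, chunk 2: 25,001 .. 50,000, ..., chunk 5: 100,001 .. 110,000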
I want to write a select query that selects distinct rows of data progressively.
To explain with an example:
Say I have 5,000 accounts selected for repayment of a loan, ordered in descending order (account 1 has the highest outstanding amount while account 5,000 has the lowest).
I want to select 1,000 unique accounts 5 times, such that the total outstanding amount of repayment in all 5 cases is similar.
I have tried a few methods, such as selecting rownums based on odd/even, but that only works for up to 2 distributions. I was expecting something more like an arithmetic progression (A.P.) that selects data progressively.
A naïve method of splitting a set into (for example) 5 bins, numbered 0 to 4, is to give each row a unique sequential numeric index and then, in order of size, assign the first 10 rows to bins 0,1,2,3,4,4,3,2,1,0 and repeat for each additional set of 10 rows:
WITH indexed_values (value, rn) AS (
SELECT value,
ROW_NUMBER() OVER (ORDER BY value DESC) - 1
FROM table_name
),
assign_bins (value, rn, bin) AS (
SELECT value,
rn,
CASE WHEN MOD(rn, 2 * 5) >= 5
THEN 5 - MOD(rn, 5) - 1
ELSE MOD(rn, 5)
END
FROM indexed_values
)
SELECT bin,
COUNT(*) AS num_values,
SUM(value) AS bin_size
FROM assign_bins
GROUP BY bin
Which, for some random data:
CREATE TABLE table_name ( value ) AS
SELECT FLOOR(DBMS_RANDOM.VALUE(1, 1000001)) FROM DUAL CONNECT BY LEVEL <= 1000;
May output:
BIN  NUM_VALUES  BIN_SIZE
0    200         100012502
1    200         100004633
2    200         99980342
3    200         99976774
4    200         100005756
It will not make the bin totals exactly equal, but it is relatively simple and gives a close approximation if your values are approximately evenly distributed.
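To see the zig-zag assignment on its own, here is a minimal sketch applying the same CASE expression to the first 10 index values (rn 0 to 9):
SELECT rn,
       CASE WHEN MOD(rn, 2 * 5) >= 5
            THEN 5 - MOD(rn, 5) - 1
            ELSE MOD(rn, 5)
       END AS bin
FROM (SELECT LEVEL - 1 AS rn FROM DUAL CONNECT BY LEVEL <= 10)
-- rn 0..9 are assigned to bins 0,1,2,3,4,4,3,2,1,0 as described above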
If you want to select values from a certain bin then:
WITH indexed_values (value, rn) AS (
SELECT value,
ROW_NUMBER() OVER (ORDER BY value DESC) - 1
FROM table_name
),
assign_bins (value, rn, bin) AS (
SELECT value,
rn,
CASE WHEN MOD(rn, 2 * 5) >= 5
THEN 5 - MOD(rn, 5) - 1
ELSE MOD(rn, 5)
END
FROM indexed_values
)
SELECT value
FROM assign_bins
WHERE bin = 0
I have a number, for example 1000 (1).
I have a query that returns different numbers in no particular order (2), for example: 100, 300, 1000, 400, 500, 600.
I want to write a query (not a loop) that sums the numbers in (2) until the sum falls in the range (1000 - 300, 1000 + 300), i.e. (700, 1300).
For example: 100+300+400 could be an answer, or 400+500, or ...
P.S.: the first sequence of numbers whose sum falls in that range is an answer.
Not sure if I understood your question fully, but you may be able to achieve this using the windowing clause of analytic functions.
I created a sample table number_list with the values you'd provided. Assuming (2) to be the output of the query below ..
SQL> select * from number_list;
VALUE
----------
100
300
1000
400
500
600
6 rows selected.
.. you now need the first list of numbers whose sum falls within a certain range, i.e. between (1000 - 300) and (1000 + 300) ..
SQL> with sorted_list as
2 (
3 select rownum rnum, value from
4 ( select value from number_list order by value ) -- sort values ascending
5 )
6 select value from sorted_list where rnum <= (
7 select min(rnum) from ( -- determine first value from sorted list to fall in the specified range
8 select rownum rnum, value,
9 sum(value) over ( order by null
10 rows between
11 unbounded preceding -- indicate that the window starts at the first row
12 and current row -- indicate that the window ends at the current row
13 ) sum
14 from sorted_list
15 ) where sum between (1000-300) and (1000+300)
16 );
VALUE
----------
100
300
400
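For reference, an equivalent sketch (assuming the same number_list table) that orders the running sum by rnum explicitly instead of relying on ORDER BY NULL over the pre-sorted inline view:
with sorted_list as
(
  select rownum rnum, value from
  ( select value from number_list order by value )
)
select value from sorted_list where rnum <= (
  select min(rnum) from (
    select rnum,
           sum(value) over ( order by rnum ) running_sum
    from sorted_list
  ) where running_sum between (1000-300) and (1000+300)
);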
The task is the following: select 20 rows from the dual table with randomly generated distinct numbers from 23 to 45.
I performed the following:
select distinct floor(dbms_random.value(23,45)) output
from dual
connect by rownum <= 20;
But it selects a random number of rows, fewer than 20. For example:
OUTPUT
44
35
25
27
40
32
26
36
43
34
31
33
37
13 rows selected.
Please help: how can I select exactly 20 numbers, not fewer? Lots of thanks in advance!
Use a row generator to generate all the numbers; order them randomly using DBMS_RANDOM.VALUE and then get the first 20 rows:
SELECT OUTPUT
FROM (
SELECT 22 + LEVEL AS OUTPUT
FROM DUAL
CONNECT BY 22 + LEVEL <= 45
ORDER BY DBMS_RANDOM.VALUE
)
WHERE ROWNUM <= 20
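As an aside, on Oracle 12c or later the outer ROWNUM filter could be replaced with the ANSI row-limiting clause (a variant sketch; the ROWNUM form above works on older versions too):
SELECT 22 + LEVEL AS OUTPUT
FROM DUAL
CONNECT BY 22 + LEVEL <= 45
ORDER BY DBMS_RANDOM.VALUE
FETCH FIRST 20 ROWS ONLY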
Why your code does not work:
The code you are using may randomly generate 20 distinct numbers, but it is highly likely that it will not: it generates 20 rows of random integers between 23 and 45, and then the DISTINCT clause removes the duplicates. Since duplicates are likely, the final number of rows usually ends up below 20.
Mathematically, the first row it generates is always unique; then there is a 22-in-23 chance the second row is unique, a 21-in-23 chance the 3rd row is unique (given the previous rows are unique), and so on down to a 4-in-23 chance that the 20th row is unique. Multiplying all those probabilities together:
WITH probabilities ( number_of_rows, probability ) AS (
SELECT 1, 1 FROM DUAL
UNION ALL
SELECT number_of_rows + 1, probability * ( 23 - number_of_rows ) / 23
FROM probabilities
WHERE number_of_rows < 20
)
SELECT * FROM probabilities;
This gives a probability of about 0.0000025 that all 20 rows generated by your method will be distinct - possible, but improbable.
If I have a table like this:
pkey age
---- ---
1 8
2 5
3 12
4 12
5 22
I can "group by" to get a count of each age.
select age,count(*) n from tbl group by age;
age n
--- -
5 1
8 1
12 2
22 1
What query can I use to group by age ranges?
age n
----- -
1-10 2
11-20 2
20+ 1
I'm on 10gR2, but I'd be interested in any 11g-specific approaches as well.
SELECT CASE
WHEN age <= 10 THEN '1-10'
WHEN age <= 20 THEN '11-20'
ELSE '21+'
END AS age,
COUNT(*) AS n
FROM age
GROUP BY CASE
WHEN age <= 10 THEN '1-10'
WHEN age <= 20 THEN '11-20'
ELSE '21+'
END
Try:
select to_char(floor(age/10) * 10) || '-'
       || to_char(floor(age/10) * 10 + 9) as age,
       count(*) as n
from tbl
group by floor(age/10);
What you are looking for, is basically the data for a histogram.
You would have the age (or age-range) on the x-axis and the count n (or frequency) on the y-axis.
In the simplest form, one could simply count the number of each distinct age value like you already described:
SELECT age, count(*)
FROM tbl
GROUP BY age
When there are too many different values for the x-axis however, one may want to create groups (or clusters or buckets). In your case, you group by a constant range of 10.
We can avoid writing a WHEN ... THEN line for each range - there could be hundreds if it were not about age. Instead, the approach by #MatthewFlaschen is preferable for the reasons mentioned by #NitinMidha.
Now let's build the SQL...
First, we need to split the ages into range-groups of 10 like so:
0-9
10-19
20-29
etc.
This can be achieved by dividing the age column by 10 and then calculating the result's FLOOR:
FLOOR(age/10)
"FLOOR returns the largest integer equal to or less than n"
http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions067.htm#SQLRF00643
Then we take the original SQL and replace age with that expression:
SELECT FLOOR(age/10), count(*)
FROM tbl
GROUP BY FLOOR(age/10)
This is OK, but we cannot see the ranges yet; we only see the calculated floor values, which are 0, 1, 2 ... n.
To get the actual lower bound, we need to multiply it with 10 again so we get 0, 10, 20 ... n:
FLOOR(age/10) * 10
We also need the upper bound of each range which is lower bound + 10 - 1 or
FLOOR(age/10) * 10 + 10 - 1
Finally, we concatenate both into a string like this:
TO_CHAR(FLOOR(age/10) * 10) || '-' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1)
This creates '0-9', '10-19', '20-29' etc.
Now our SQL looks like this:
SELECT
TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1),
COUNT(*)
FROM tbl
GROUP BY FLOOR(age/10)
Finally, apply an order and nice column aliases:
SELECT
TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1) AS range,
COUNT(*) AS frequency
FROM tbl
GROUP BY FLOOR(age/10)
ORDER BY FLOOR(age/10)
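As a quick check against the sample data from the question (ages 8, 5, 12, 12, 22), this should yield 0 - 9 with a frequency of 2, 10 - 19 with 2, and 20 - 29 with 1. A self-contained sketch (sample_ages is just an illustrative inline sample):
WITH sample_ages AS (
  SELECT 8 AS age FROM DUAL UNION ALL
  SELECT 5 FROM DUAL UNION ALL
  SELECT 12 FROM DUAL UNION ALL
  SELECT 12 FROM DUAL UNION ALL
  SELECT 22 FROM DUAL
)
SELECT
  TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1) AS range,
  COUNT(*) AS frequency
FROM sample_ages
GROUP BY FLOOR(age/10)
ORDER BY FLOOR(age/10)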
However, in more complex scenarios, these ranges might not be grouped into constant chunks of size 10, but need dynamical clustering.
Oracle has more advanced histogram functions included, see http://docs.oracle.com/cd/E16655_01/server.121/e15858/tgsql_histo.htm#TGSQL366
Credits to #MatthewFlaschen for his approach; I only explained the details.
Here is a solution which creates a "range" table in a sub-query and then uses this to partition the data from the main table:
SELECT DISTINCT descr
     , COUNT(*) OVER (PARTITION BY descr) n
FROM age_table
INNER JOIN (
        select '1-10' descr, 1 rng_start, 10 rng_stop from dual
        union
        select '11-20', 11, 20 from dual
        union
        select '20+', 21, null from dual
      ) ON age BETWEEN nvl(rng_start, age) AND nvl(rng_stop, age)
ORDER BY descr;
I had to group data by how many transactions appeared in an hour. I did this by extracting the hour from the timestamp:
select extract(hour from transaction_time) as hour
,count(*)
from table
where transaction_date='01-jan-2000'
group by
extract(hour from transaction_time)
order by
extract(hour from transaction_time) asc
;
Giving output:
HOUR COUNT(*)
---- --------
1 9199
2 9167
3 9997
4 7218
As you can see this gives a nice easy way of grouping the number of records per hour.
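A related variant, assuming transaction_time holds the full date and time: if you need to group across several days and keep the day and hour together, TRUNC with the 'HH24' format truncates the value to the hour:
select trunc(transaction_time, 'HH24') as txn_hour
      ,count(*)
from table
group by trunc(transaction_time, 'HH24')
order by trunc(transaction_time, 'HH24') asc
;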
Add an age_range table and an age_range_id field to your table, and group by that instead.
// excuse the DDL but you should get the idea
create table age_range(
age_range_id tinyint unsigned not null primary key,
name varchar(255) not null);
insert into age_range values
(1, '18-24'),(2, '25-34'),(3, '35-44'),(4, '45-54'),(5, '55-64');
// again excuse the DML but you should get the idea
select
count(*) as counter, p.age_range_id, ar.name
from
person p
inner join age_range ar on p.age_range_id = ar.age_range_id
group by
p.age_range_id, ar.name order by counter desc;
You can refine this idea if you like - add from_age and to_age columns to the age_range table, etc. - but I'll leave that to you.
hope this helps :)
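For completeness, a rough Oracle-flavored sketch of that lookup table (same names as above, since the original DDL/DML is MySQL-style):
create table age_range(
  age_range_id number(3) not null primary key,
  name varchar2(255) not null);

insert into age_range values (1, '18-24');
insert into age_range values (2, '25-34');
insert into age_range values (3, '35-44');
insert into age_range values (4, '45-54');
insert into age_range values (5, '55-64');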
If using Oracle 9i+, you might be able to use the NTILE analytic function:
WITH tiles AS (
SELECT t.age,
NTILE(3) OVER (ORDER BY t.age) AS tile
FROM TABLE t)
SELECT MIN(t.age) AS min_age,
MAX(t.age) AS max_age,
COUNT(t.tile) As n
FROM tiles t
GROUP BY t.tile
The caveat to NTILE is that you can only specify the number of partitions, not the break points themselves, so you need to specify an appropriate number. I.e., with 100 rows, NTILE(4) will allot 25 rows to each of the four buckets/partitions. You cannot nest analytic functions, so you'd have to layer them using subqueries/subquery factoring to get the desired granularity. Otherwise, use:
SELECT CASE
         WHEN t.age BETWEEN 1 AND 10 THEN '1-10'
         WHEN t.age BETWEEN 11 AND 20 THEN '11-20'
         ELSE '21+'
       END AS age,
       COUNT(*) AS n
FROM TABLE t
GROUP BY CASE
           WHEN t.age BETWEEN 1 AND 10 THEN '1-10'
           WHEN t.age BETWEEN 11 AND 20 THEN '11-20'
           ELSE '21+'
         END
I had to get a count of samples by day. Inspired by #Clarkey, I used TO_CHAR to extract the date of the sample from the timestamp in ISO-8601 date format and used that in the GROUP BY and ORDER BY clauses. (Further inspired, I also post it here in case it is useful to others.)
SELECT
TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD') AS TS_DAY,
COUNT(*)
FROM
TABLE X
GROUP BY
TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD')
ORDER BY
TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD') ASC
/
Can you try the solution below:
SELECT count(1), '1-10' FROM age WHERE age BETWEEN 1 AND 10
UNION ALL
SELECT count(1), '11-20' FROM age WHERE age BETWEEN 11 AND 20
UNION ALL
SELECT count(1), '21+' FROM age WHERE age > 20
My approach:
select range, count(1)
from (
  select case
           when age < 5 then '0-4'
           when age < 10 then '5-9'
           when age < 15 then '10-14'
           when age < 20 then '15-20'
           when age < 30 then '21-30'
           when age < 40 then '31-40'
           when age < 50 then '41-50'
           else '51+'
         end as range
  from (
    select round(extract(day from feedback_update_time - feedback_time), 1) as age
    from txn_history
  )
)
group by range
I have flexibility in defining the ranges,
I do not repeat the ranges in the select and group by clauses,
but someone please tell me, how do I order them by magnitude?
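One way to get that ordering (a sketch reusing the same query): also carry the numeric age out of the inner query and order the groups by their smallest member with ORDER BY MIN(age):
select range, count(1)
from (
  select age,
         case
           when age < 5 then '0-4'
           when age < 10 then '5-9'
           when age < 15 then '10-14'
           when age < 20 then '15-20'
           when age < 30 then '21-30'
           when age < 40 then '31-40'
           when age < 50 then '41-50'
           else '51+'
         end as range
  from (
    select round(extract(day from feedback_update_time - feedback_time), 1) as age
    from txn_history
  )
)
group by range
order by min(age)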