Oracle: how to "group by" over a range? - sql

If I have a table like this:
pkey age
---- ---
1 8
2 5
3 12
4 12
5 22
I can "group by" to get a count of each age.
select age,count(*) n from tbl group by age;
age n
--- -
5 1
8 1
12 2
22 1
What query can I use to group by age ranges?
age n
----- -
1-10 2
11-20 2
20+ 1
I'm on 10gR2, but I'd be interested in any 11g-specific approaches as well.

SELECT CASE
WHEN age <= 10 THEN '1-10'
WHEN age <= 20 THEN '11-20'
ELSE '21+'
END AS age,
COUNT(*) AS n
FROM age
GROUP BY CASE
WHEN age <= 10 THEN '1-10'
WHEN age <= 20 THEN '11-20'
ELSE '21+'
END

Try:
select to_char(floor(age/10) * 10) || '-'
|| to_char(ceil(age/10) * 10 - 1)) as age,
count(*) as n from tbl group by floor(age/10);

What you are looking for, is basically the data for a histogram.
You would have the age (or age-range) on the x-axis and the count n (or frequency) on the y-axis.
In the simplest form, one could simply count the number of each distinct age value like you already described:
SELECT age, count(*)
FROM tbl
GROUP BY age
When there are too many different values for the x-axis however, one may want to create groups (or clusters or buckets). In your case, you group by a constant range of 10.
We can avoid writing a WHEN ... THEN line for each range - there could be hundreds if it were not about age. Instead, the approach by #MatthewFlaschen is preferable for the reasons mentioned by #NitinMidha.
Now let's build the SQL...
First, we need to split the ages into range-groups of 10 like so:
0-9
10-19
20 - 29
etc.
This can be achieved by dividing the age column by 10 and then calculating the result's FLOOR:
FLOOR(age/10)
"FLOOR returns the largest integer equal to or less than n"
http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions067.htm#SQLRF00643
Then we take the original SQL and replace age with that expression:
SELECT FLOOR(age/10), count(*)
FROM tbl
GROUP BY FLOOR(age/10)
This is OK, but we cannot see the range, yet. Instead we only see the calculated floor values which are 0, 1, 2 ... n.
To get the actual lower bound, we need to multiply it with 10 again so we get 0, 10, 20 ... n:
FLOOR(age/10) * 10
We also need the upper bound of each range which is lower bound + 10 - 1 or
FLOOR(age/10) * 10 + 10 - 1
Finally, we concatenate both into a string like this:
TO_CHAR(FLOOR(age/10) * 10) || '-' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1)
This creates '0-9', '10-19', '20-29' etc.
Now our SQL looks like this:
SELECT
TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1),
COUNT(*)
FROM tbl
GROUP BY FLOOR(age/10)
Finally, apply an order and nice column aliases:
SELECT
TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1) AS range,
COUNT(*) AS frequency
FROM tbl
GROUP BY FLOOR(age/10)
ORDER BY FLOOR(age/10)
However, in more complex scenarios, these ranges might not be grouped into constant chunks of size 10, but need dynamical clustering.
Oracle has more advanced histogram functions included, see http://docs.oracle.com/cd/E16655_01/server.121/e15858/tgsql_histo.htm#TGSQL366
Credits to #MatthewFlaschen for his approach; I only explained the details.

Here is a solution which creates a "range" table in a sub-query and then uses this to partition the data from the main table:
SELECT DISTINCT descr
, COUNT(*) OVER (PARTITION BY descr) n
FROM age_table INNER JOIN (
select '1-10' descr, 1 rng_start, 10 rng_stop from dual
union (
select '11-20', 11, 20 from dual
) union (
select '20+', 21, null from dual
)) ON age BETWEEN nvl(rng_start, age) AND nvl(rng_stop, age)
ORDER BY descr;

I had to group data by how many transactions appeared in an hour. I did this by extracting the hour from the timestamp:
select extract(hour from transaction_time) as hour
,count(*)
from table
where transaction_date='01-jan-2000'
group by
extract(hour from transaction_time)
order by
extract(hour from transaction_time) asc
;
Giving output:
HOUR COUNT(*)
---- --------
1 9199
2 9167
3 9997
4 7218
As you can see this gives a nice easy way of grouping the number of records per hour.

add an age_range table and an age_range_id field to your table and group by that instead.
// excuse the DDL but you should get the idea
create table age_range(
age_range_id tinyint unsigned not null primary key,
name varchar(255) not null);
insert into age_range values
(1, '18-24'),(2, '25-34'),(3, '35-44'),(4, '45-54'),(5, '55-64');
// again excuse the DML but you should get the idea
select
count(*) as counter, p.age_range_id, ar.name
from
person p
inner join age_range ar on p.age_range_id = ar.age_range_id
group by
p.age_range_id, ar.name order by counter desc;
You can refine this idea if you like - add from_age to_age columns in the age_range table etc - but i'll leave that to you.
hope this helps :)

If using Oracle 9i+, you might be able to use the NTILE analytic function:
WITH tiles AS (
SELECT t.age,
NTILE(3) OVER (ORDER BY t.age) AS tile
FROM TABLE t)
SELECT MIN(t.age) AS min_age,
MAX(t.age) AS max_age,
COUNT(t.tile) As n
FROM tiles t
GROUP BY t.tile
The caveat to NTILE is that you can only specify the number of partitions, not the break points themselves. So you need to specify a number that is appropriate. IE: With 100 rows, NTILE(4) will allot 25 rows to each of the four buckets/partitions. You can not nest analytic functions, so you'd have to layer them using subqueries/subquery factoring to get desired granularity. Otherwise, use:
SELECT CASE t.age
WHEN BETWEEN 1 AND 10 THEN '1-10'
WHEN BETWEEN 11 AND 20 THEN '11-20'
ELSE '21+'
END AS age,
COUNT(*) AS n
FROM TABLE t
GROUP BY CASE t.age
WHEN BETWEEN 1 AND 10 THEN '1-10'
WHEN BETWEEN 11 AND 20 THEN '11-20'
ELSE '21+'
END

I had to get a count of samples by day. Inspired by #Clarkey I used TO_CHAR to extract the date of sample from the timestamp to an ISO-8601 date format and used that in the GROUP BY and ORDER BY clauses. (Further inspired, I also post it here in case it is useful to others.)
SELECT
TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD') AS TS_DAY,
COUNT(*)
FROM
TABLE X
GROUP BY
TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD')
ORDER BY
TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD') ASC
/

Can you try the below solution:
SELECT count (1), '1-10' where age between 1 and 10
union all
SELECT count (1), '11-20' where age between 11 and 20
union all
select count (1), '21+' where age >20
from age

My approach:
select range, count(1) from (
select case
when age < 5 then '0-4'
when age < 10 then '5-9'
when age < 15 then '10-14'
when age < 20 then '15-20'
when age < 30 then '21-30'
when age < 40 then '31-40'
when age < 50 then '41-50'
else '51+'
end
as range from
(select round(extract(day from feedback_update_time - feedback_time), 1) as age
from txn_history
) ) group by range
I have flexibility in defining the ranges
I do not repeat the ranges in select and group clauses
but some one please tell me, how to order them by magnitude!

Related

Select X rows of each criteria

I can get a count of records based on some criteria such as length of the data in specific columns.
But it seems I can get first X records (say 20 records) and they could all be the same length.
How do I get 20 records of each length?
SELECT LABEL_ID, DEST, WEIGHT_OZ
FROM MYTABLE
WHERE
LENGTH(LABEL_ID) IN (10,13,24)
AND ROWNUM <= 20;
This returns 20 records of labels of length 10 (since there are more than 20 records of that length). How do I get 20 of length 10, 20 of length 13, 20 of length 24, etc.?
Thanks.
Assisted by a post here
WITH rws AS (
SELECT o.LABEL_ID, o.DEST, o.WEIGHT_OZ,
ROW_NUMBER () OVER (
PARTITION BY LENGTH(LABEL_ID)
ORDER BY SOME_DATE_COLUMN DESC
) rn
FROM MYTABLE o
WHERE LENGTH(LABEL_ID) IN (10,13,24)
)
SELECT LABEL_ID, DEST, WEIGHT_OZ
FROM rws
WHERE rn <= 20
ORDER BY LENGTH(LABEL_ID), SOME_DATE_COLUMN DESC;

how to select SQL , age 20 first , and then all the rest from table PERSONS?

My table contains 113 people.
48 of them are 20 years old. Now I am just selecting all people like
select * from persons
this will get me all persons, but 20 yr old are not the first 48 people.
I need the 20 yr old to be first 48 in 113 results.
something like
20 year ols ( 48 of them ), after that ..... all the rest in the table
How can I query this using PostgreSQL.
EDIT : there are age less than 20 too. after getting the first 48 , 20 yr olds, I dont care rest of the order I am getting the 48 to 113 people.
Just use order by :
select *
from persons
order by age
You can use asc or desc but because default is asc you do not need to put it in your example.
select *
from persons
order by age desc
After the comment from OP here is the new code(I do not know why but my firs assumption was that the value 20 is the lowest possible value... bad assumption):
select *
from persons
order by case when age = 20 then 1 else 2 end
OR
select *
from persons
order by (age = 20) desc
Here is a demo
If 20 is not your minimum age, you can use the CASE statement inside the ORDER BY clause, like this:
SELECT
*
FROM
persons
ORDER BY
CASE WHEN age = 20 THEN 0
ELSE 1
END ASC

SQL Create a column with conditions from a sub query

I have a table with ID and HEIGHT, LENGTH and WIDTH. I need to find the mas measure of every row and then create a column of surcharge of $5 if the biggest measure is between 22 and 30 and 8 if it is >30. The first parte is working fine
select id, max(measure) as max_measure
from (
select id, height as measure from table1
union
select id, length as measure from table1
union
select id, width as measure from table1
) m
group by id
But i cant make the second part, it should be a sub query using the results I got from the first part and looking something roughly like this
select surcharge where
m.max_measure >= 22 and m.max_measure <30
m.max_measure>= 30
select id,
max(measure) as max_measure,
case when max(measure) >= 30 then 8
when max(measure) >= 22 then 5
else 0 end as surcharge

How do I aggregate numbers from a string column in SQL

I am dealing with a poorly designed database column which has values like this
ID cid Score
1 1 3 out of 3
2 1 1 out of 5
3 2 3 out of 6
4 3 7 out of 10
I want the aggregate sum and percentage of Score column grouped on cid like this
cid sum percentage
1 4 out of 8 50
2 3 out of 6 50
3 7 out of 10 70
How do I do this?
You can try this way :
select
t.cid
, cast(sum(s.a) as varchar(5)) +
' out of ' +
cast(sum(s.b) as varchar(5)) as sum
, ((cast(sum(s.a) as decimal))/sum(s.b))*100 as percentage
from MyTable t
inner join
(select
id
, cast(substring(score,0,2) as Int) a
, cast(substring(score,charindex('out of', score)+7,len(score)) as int) b
from MyTable
) s on s.id = t.id
group by t.cid
[SQLFiddle Demo]
Redesign the table, but on-the-fly as a CTE. Here's a solution that's not as short as you could make it, but that takes advantage of the handy SQL Server function PARSENAME. You may need to tweak the percentage calculation if you want to truncate rather than round, or if you want it to be a decimal value, not an int.
In this or most any solution, you have to count on the column values for Score to be in the very specific format you show. If you have the slightest doubt, you should run some other checks so you don't miss or misinterpret anything.
with
P(ID, cid, Score2Parse) as (
select
ID,
cid,
replace(Score,space(1),'.')
from scores
),
S(ID,cid,pts,tot) as (
select
ID,
cid,
cast(parsename(Score2Parse,4) as int),
cast(parsename(Score2Parse,1) as int)
from P
)
select
cid, cast(round(100e0*sum(pts)/sum(tot),0) as int) as percentage
from S
group by cid;

Different value counts on same column

I am new to Oracle. I have an Oracle table with three columns: serialno, item_category and item_status. In the third column the rows have values of serviceable, under_repair or condemned.
I want to run the query using count to show how many are serviceable, how many are under repair, how many are condemned against each item category.
I would like to run something like:
select item_category
, count(......) "total"
, count (.....) "serviceable"
, count(.....)"under_repair"
, count(....) "condemned"
from my_table
group by item_category ......
I am unable to run the inner query inside the count.
Here's what I'd like the result set to look like:
item_category total serviceable under repair condemned
============= ===== ============ ============ ===========
chair 18 10 5 3
table 12 6 3 3
You can either use CASE or DECODE statement inside the COUNT function.
SELECT item_category,
COUNT (*) total,
COUNT (DECODE (item_status, 'serviceable', 1)) AS serviceable,
COUNT (DECODE (item_status, 'under_repair', 1)) AS under_repair,
COUNT (DECODE (item_status, 'condemned', 1)) AS condemned
FROM mytable
GROUP BY item_category;
Output:
ITEM_CATEGORY TOTAL SERVICEABLE UNDER_REPAIR CONDEMNED
----------------------------------------------------------------
chair 5 1 2 2
table 5 3 1 1
This is a very basic "group by" query. If you search for that you will find plenty of documentation on how it is used.
For your specific case, you want:
select item_category, item_status, count(*)
from <your table>
group by item_category, item_status;
You'll get something like this:
item_category item_status count(*)
======================================
Chair under_repair 7
Chair condemned 16
Table under_repair 3
Change the column ordering as needed for your purpose
I have a tendency of writing this stuff up so when I forget how to do it, I have an easy to find example.
The PIVOT clause was new in 11g. Since that was 5+ years ago, I'm hoping you are using it.
Sample Data
create table t
(
serialno number(2,0),
item_category varchar2(30),
item_status varchar2(20)
);
insert into t ( serialno, item_category, item_status )
select
rownum serialno,
( case
when rownum <= 12 then 'table'
else 'chair'
end ) item_category,
( case
--table status
when rownum <= 12
and rownum <= 6
then 'servicable'
when rownum <= 12
and rownum between 7 and 9
then 'under_repair'
when rownum <= 12
and rownum > 9
then 'condemned'
--chair status
when rownum > 12
and rownum < 13 + 10
then 'servicable'
when rownum > 12
and rownum between 23 and 27
then 'under_repair'
when rownum > 12
and rownum > 27
then 'condemned'
end ) item_status
from
dual connect by level <= 30;
commit;
and the PIVOT query:
select *
from
(
select
item_status stat,
item_category,
item_status
from t
)
pivot
(
count( item_status )
for stat in ( 'servicable' as "servicable", 'under_repair' as "under_repair", 'condemned' as "condemned" )
);
ITEM_CATEGORY servicable under_repair condemned
------------- ---------- ------------ ----------
chair 10 5 3
table 6 3 3
I still prefer #Ramblin' Man's way of doing it (except using CASE in place of DECODE) though.
Edit
Just realized I left out the TOTAL column. I'm not sure there's a way to get that column using the PIVOT clause, perhaps someone else knows how. May also be the reason I don't use it that often.