In SQL, how can you "group by" in ranges? - sql

Suppose I have a table with a numeric column (let's call it "score").
I'd like to generate a table of counts, that shows how many times scores appeared in each range.
For example:
score range | number of occurrences
-------------------------------------
0-9 | 11
10-19 | 14
20-29 | 3
... | ...
In this example there were 11 rows with scores in the range of 0 to 9, 14 rows with scores in the range of 10 to 19, and 3 rows with scores in the range 20-29.
Is there an easy way to set this up? What do you recommend?

Neither of the highest-voted answers is correct on SQL Server 2000. Perhaps they were using a different version.
Here are the correct versions of both of them on SQL Server 2000.
select t.range as [score range], count(*) as [number of occurrences]
from (
select case
when score between 0 and 9 then ' 0- 9'
when score between 10 and 19 then '10-19'
else '20-99' end as range
from scores) t
group by t.range
or
select t.range as [score range], count(*) as [number of occurrences]
from (
select user_id,
case when score >= 0 and score< 10 then '0-9'
when score >= 10 and score< 20 then '10-19'
else '20-99' end as range
from scores) t
group by t.range

An alternative approach would involve storing the ranges in a table, instead of embedding them in the query. You would end up with a table, call it Ranges, that looks like this:
LowerLimit UpperLimit Range
0 9 '0-9'
10 19 '10-19'
20 29 '20-29'
30 39 '30-39'
And a query that looks like this:
Select
Range as [Score Range],
Count(*) as [Number of Occurrences]
from
Ranges r inner join Scores s on s.Score between r.LowerLimit and r.UpperLimit
group by Range
This does mean setting up a table, but it would be easy to maintain when the desired ranges change. No code changes necessary!
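To see the lookup-table join in action, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The sample scores are made up, and the label column is named Label rather than Range (my choice, to stay clear of keyword clashes); neither comes from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Label is a hypothetical column name used instead of Range
    CREATE TABLE Ranges (LowerLimit INTEGER, UpperLimit INTEGER, Label TEXT);
    INSERT INTO Ranges VALUES (0, 9, '0-9'), (10, 19, '10-19'), (20, 29, '20-29');
    CREATE TABLE Scores (Score INTEGER);
    -- made-up sample data
    INSERT INTO Scores VALUES (3), (7), (12), (15), (18), (25);
""")

# Each score joins to the single range row whose limits enclose it.
rows = conn.execute("""
    SELECT r.Label, COUNT(*)
    FROM Ranges r
    INNER JOIN Scores s ON s.Score BETWEEN r.LowerLimit AND r.UpperLimit
    GROUP BY r.Label, r.LowerLimit
    ORDER BY r.LowerLimit
""").fetchall()

for label, n in rows:
    print(label, n)
```

Changing the buckets then only means editing rows in Ranges. Note that an inner join silently drops ranges with no scores; a left join from Ranges would keep them with a count of zero.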

I see answers here that won't work in SQL Server's syntax. I would use:
select t.range as [score range], count(*) as [number of occurrences]
from (
select case
when score between 0 and 9 then ' 0-9 '
when score between 10 and 19 then '10-19'
when score between 20 and 29 then '20-29'
...
else '90-99' end as range
from scores) t
group by t.range

In postgres (where || is the string concatenation operator):
select (score/10)*10 || '-' || (score/10)*10+9 as scorerange, count(*)
from scores
group by score/10
order by 1
gives:
scorerange | count
------------+-------
0-9 | 11
10-19 | 14
20-29 | 3
30-39 | 2
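The arithmetic behind this is easy to check outside the database. Below is a small Python sketch of the same integer-division bucketing; the function name bucket_label and the sample scores are my own, and it assumes non-negative integer scores (Python's // floors, while SQL integer division may truncate toward zero for negatives).

```python
from collections import Counter


def bucket_label(score, width=10):
    """Return the 'lower-upper' label of the bucket containing score."""
    lower = (score // width) * width  # integer division drops the remainder
    return f"{lower}-{lower + width - 1}"


scores = [3, 7, 12, 15, 18, 25]  # made-up sample data

# Group by the bucket's lower bound, like GROUP BY score/10 in the query.
counts = Counter((s // 10) * 10 for s in scores)
table = [(bucket_label(lower), n) for lower, n in sorted(counts.items())]

for label, n in table:
    print(label, n)
```

Sorting on the numeric lower bound rather than on the label keeps the buckets in numeric order.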
And here's how to do it in T-SQL:
DECLARE @traunch INT = 1000;
SELECT
CONCAT
(
FORMAT((score / @traunch) * @traunch, '###,000,000')
, ' - ' ,
FORMAT((score / @traunch) * @traunch + @traunch - 1, '###,000,000')
) as [Range]
, FORMAT(MIN(score), 'N0') as [Min]
, FORMAT(AVG(score), 'N0') as [Avg]
, FORMAT(MAX(score), 'N0') as [Max]
, FORMAT(COUNT(score), 'N0') as [Count]
, FORMAT(SUM(score), 'N0') as [Sum]
FROM scores
GROUP BY score / @traunch
ORDER BY score / @traunch

James Curran's answer was the most concise in my opinion, but the output wasn't correct. For SQL Server the simplest statement is as follows:
SELECT
[score range] = CAST((Score/10)*10 AS VARCHAR) + ' - ' + CAST((Score/10)*10+9 AS VARCHAR),
[number of occurrences] = COUNT(*)
FROM #Scores
GROUP BY Score/10
ORDER BY Score/10
This assumes a #Scores temporary table that I used to test it; I just populated it with 100 rows of random numbers between 0 and 99.

select cast((score/10)*10 as varchar) + '-' + cast((score/10)*10+9 as varchar), -- multiply back by 10 to get the bucket bounds
count(*)
from scores
group by score/10

create table scores (
user_id int,
score int
)
select t.range as [score range], count(*) as [number of occurrences]
from (
select user_id,
case when score >= 0 and score < 10 then '0-9'
when score >= 10 and score < 20 then '10-19'
...
else '90-99' end as range
from scores) t
group by t.range

This lets you avoid specifying the ranges, and should work regardless of the SQL server. Math FTW!
SELECT CONCAT(range,'-',range+9), COUNT(range)
FROM (
SELECT
score - (score % 10) as range
FROM scores
) t
GROUP BY range

I would do this a little differently so that it scales without having to define every case:
select t.range as [score range], count(*) as [number of occurrences]
from (
select FLOOR(score/10) as range
from scores) t
group by t.range
Not tested, but you get the idea...

declare @RangeWidth int
set @RangeWidth = 10
select
Floor(Score/@RangeWidth)*@RangeWidth as LowerBound,
Floor(Score/@RangeWidth)*@RangeWidth+@RangeWidth-1 as UpperBound,
Count(*)
From
ScoreTable
group by
Floor(Score/@RangeWidth)

select t.blah as [score range], count(*) as [number of occurrences]
from (
select case
when score between 0 and 9 then ' 0-9 '
when score between 10 and 19 then '10-19'
when score between 20 and 29 then '20-29'
...
else '90-99' end as blah
from scores) t
group by t.blah
Make sure you use a word other than 'range' (here, 'blah') if you are in MySQL, or you will get an error, since RANGE is a reserved word there.

Because the column being sorted on (Range) is a string, string/word sorting is used instead of numeric sorting.
As long as the strings have zeros to pad out the number lengths the sorting should still be semantically correct:
SELECT t.range AS ScoreRange,
COUNT(*) AS NumberOfOccurrences
FROM (SELECT CASE
WHEN score BETWEEN 0 AND 9 THEN '00-09'
WHEN score BETWEEN 10 AND 19 THEN '10-19'
ELSE '20-99'
END AS Range
FROM Scores) t
GROUP BY t.Range
If the range is mixed, simply pad an extra zero:
SELECT t.range AS ScoreRange,
COUNT(*) AS NumberOfOccurrences
FROM (SELECT CASE
WHEN score BETWEEN 0 AND 9 THEN '000-009'
WHEN score BETWEEN 10 AND 19 THEN '010-019'
WHEN score BETWEEN 20 AND 99 THEN '020-099'
ELSE '100-999'
END AS Range
FROM Scores) t
GROUP BY t.Range
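The effect of the padding is easy to demonstrate with a plain string sort. This Python sketch (the label lists are just the ones from the queries above, written with and without padding) shows how unpadded labels fall out of numeric order while padded ones do not:

```python
# Labels as they would come out without and with zero padding.
unpadded = ["0-9", "10-19", "20-99", "100-999"]
padded = ["000-009", "010-019", "020-099", "100-999"]

# String comparison goes character by character, so "100-999" sorts
# before "20-99" because "1" < "2".
sorted_unpadded = sorted(unpadded)

# With equal-width, zero-padded numbers, string order matches numeric order.
sorted_padded = sorted(padded)

print(sorted_unpadded)
print(sorted_padded)
```

An alternative that avoids padding altogether is to sort on a numeric column such as the lower bound of each range, and use the label only for display.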

Try
SELECT (str(range) + "-" + str(range + 9) ) AS [Score range], COUNT(score) AS [number of occurrences]
FROM (SELECT score, int(score / 10 ) * 10 AS range FROM scoredata )
GROUP BY range;

select t.range as score, count(*) as Count
from (
select UserId,
case when isnull(score ,0) >= 0 and isnull(score ,0)< 5 then '0-5'
when isnull(score ,0) >= 5 and isnull(score ,0)< 10 then '5-10'
when isnull(score ,0) >= 10 and isnull(score ,0)< 15 then '10-15'
when isnull(score ,0) >= 15 and isnull(score ,0)< 20 then '15-20'
else ' 20+' end as range
,case when isnull(score ,0) >= 0 and isnull(score ,0)< 5 then 1
when isnull(score ,0) >= 5 and isnull(score ,0)< 10 then 2
when isnull(score ,0) >= 10 and isnull(score ,0)< 15 then 3
when isnull(score ,0) >= 15 and isnull(score ,0)< 20 then 4
else 5 end as pd
from score table
) t
group by t.range,pd order by pd

I'm here because I have a similar question, but I find the short answers wrong, and the one with the repeated "case when" is too much work; seeing anything repetitive in my code hurts my eyes. So here is the solution:
SELECT --MIN(score), MAX(score),
[score range] = CAST(ROUND(score-5,-1)AS VARCHAR) + ' - ' + CAST((ROUND(score-5,-1)+10)AS VARCHAR),
[number of occurrences] = COUNT(*)
FROM order
GROUP BY CAST(ROUND(score-5,-1)AS VARCHAR) + ' - ' + CAST((ROUND(score-5,-1)+10)AS VARCHAR)
ORDER BY MIN(score)

For PrestoSQL/Trino applying answer from Ken https://stackoverflow.com/a/232463/429476
select t.range, count(*) as "Number of Occurance", ROUND(AVG(fare_amount),2) as "Avg",
ROUND(MAX(fare_amount),2) as "Max" ,ROUND(MIN(fare_amount),2) as "Min"
from (
select
case
when trip_distance between 0 and 9 then ' 0-9 '
when trip_distance between 10 and 19 then '10-19'
when trip_distance between 20 and 29 then '20-29'
when trip_distance between 30 and 39 then '30-39'
else '> 39'
end as range ,fare_amount
from nyc_in_parquet.tlc_yellow_trip_2022) t
where fare_amount > 1 and fare_amount < 401092
group by t.range;
range | Number of Occurance | Avg | Max | Min
-------+---------------------+--------+-------+------
0-9 | 2260865 | 10.28 | 720.0 | 1.11
30-39 | 1107 | 104.28 | 280.0 | 5.0
10-19 | 126136 | 43.8 | 413.5 | 2.0
> 39 | 42556 | 39.11 | 668.0 | 1.99
20-29 | 19133 | 58.62 | 250.0 | 2.5

Perhaps you're asking about keeping such things going...
Of course, these queries invoke a full table scan. If the table containing the scores to be tallied is large, you might want a better-performing solution: you can create a secondary table and maintain it with rules, such as an on-insert rule. You might look into it.
Not all RDBMS engines have rules, though!

Related

Count average with multiple conditions

I'm trying to create a query which allows to categorize the average percentage for specific data per month.
Here's how my dataset presents itself:
Date | Name | Group | Percent
-------------------------------------
2022-01-21 | name1 | gr1 | 5.2
2022-01-22 | name1 | gr1 | 6.1
2022-01-26 | name1 | gr1 | 4.9
2022-02-01 | name1 | gr1 | 3.2
2022-02-03 | name1 | gr1 | 8.1
2022-01-22 | name2 | gr1 | 36.1
2022-01-25 | name2 | gr1 | 32.1
2022-02-10 | name2 | gr1 | 35.8
... | ... | ... | ...
And here's what I want to obtain with my query (based on what I showed of the table):
Month | <=25% | 25<_<=50% | 50<_<=75% | 75<_<=100%
-------------------------------------
01 | 1 | 1 | 0 | 0
02 | 1 | 1 | 0 | 0
... | ... | ... | ... | ...
The result needs to:
Be ordered by month
Have the average use for each name counted and categorized
So far I know how to get the average of the Percent value per Name:
SELECT Name,
AVG(Percent)
from `table`
where Group = 'gr1'
group by Name
and how to count iterations of Percent in the categories created for the query:
SELECT EXTRACT(MONTH FROM Date) as Month,
COUNT(CASE WHEN Percent <= 25 AND Group = 'gr1' THEN Name END) `_25`,
COUNT(CASE WHEN Percent > 25 AND Percent <= 50 AND Group = 'gr1' THEN Name END) `_50`,
COUNT(CASE WHEN Percent > 50 AND Percent <= 75 AND Group = 'gr1' THEN Name END) `_75`,
COUNT(CASE WHEN Percent > 75 AND Percent <= 100 AND Group = 'gr1' THEN Name END) `_100`,
FROM `table`
GROUP BY Month
ORDER BY Month
but this counts every occurrence of each name, whereas I want to categorize the average of those values per name.
I've been struggling to figure out how to combine the two queries or to create a new one that answers my need.
I'm working with the BigQuery service from Google Cloud.
This query produces the needed result, based on your example. It combines your two queries using a subquery: the subquery calculates the AVG grouped by Name, Month and Group, and the outer query does the COUNT and the "categorization".
SELECT
Month,
COUNT(CASE
WHEN avg <= 25 THEN Name
END) AS _25,
COUNT(CASE
WHEN avg > 25
AND avg <= 50 THEN Name
END) AS _50,
COUNT(CASE
WHEN avg > 50
AND avg <= 75 THEN Name
END) AS _75,
COUNT(CASE
WHEN avg > 75
AND avg <= 100 THEN Name
END) AS _100
FROM
(
SELECT
EXTRACT(MONTH from Date) AS Month,
Name,
AVG(Percent) AS avg
FROM
table1
GROUP BY Month, Name, Group
HAVING Group = 'gr1'
) AS namegr
GROUP BY Month
This is the result:
Month | _25 | _50 | _75 | _100
-------------------------------------
1 | 1 | 1 | 0 | 0
2 | 1 | 1 | 0 | 0
See also Fiddle (BUT on MySql) - http://sqlfiddle.com/#!9/16c5882/9
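For anyone who wants to try the two-level aggregation locally, here is a rough sketch using Python's sqlite3 module instead of BigQuery or MySQL. The table name usage and the column names day, grp and avg_pct are my own placeholders (Date and Group are awkward as identifiers), and strftime stands in for EXTRACT(MONTH ...); the rows are the ones shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; the data is the sample from the question.
conn.execute("CREATE TABLE usage (day TEXT, name TEXT, grp TEXT, percent REAL)")
conn.executemany(
    "INSERT INTO usage VALUES (?, ?, ?, ?)",
    [
        ("2022-01-21", "name1", "gr1", 5.2),
        ("2022-01-22", "name1", "gr1", 6.1),
        ("2022-01-26", "name1", "gr1", 4.9),
        ("2022-02-01", "name1", "gr1", 3.2),
        ("2022-02-03", "name1", "gr1", 8.1),
        ("2022-01-22", "name2", "gr1", 36.1),
        ("2022-01-25", "name2", "gr1", 32.1),
        ("2022-02-10", "name2", "gr1", 35.8),
    ],
)

# Inner query: one average per (month, name).
# Outer query: count how many names fall into each band per month.
rows = conn.execute("""
    SELECT month,
           COUNT(CASE WHEN avg_pct <= 25 THEN name END),
           COUNT(CASE WHEN avg_pct > 25 AND avg_pct <= 50 THEN name END),
           COUNT(CASE WHEN avg_pct > 50 AND avg_pct <= 75 THEN name END),
           COUNT(CASE WHEN avg_pct > 75 AND avg_pct <= 100 THEN name END)
    FROM (
        SELECT CAST(strftime('%m', day) AS INTEGER) AS month,
               name,
               AVG(percent) AS avg_pct
        FROM usage
        WHERE grp = 'gr1'
        GROUP BY month, name
    )
    GROUP BY month
    ORDER BY month
""").fetchall()

for row in rows:
    print(row)
```

Here the group filter sits in a WHERE clause rather than HAVING, which filters rows before they are averaged; for this data the outcome is the same.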
You can use this query to Group By Month and each Name
SELECT CONCAT(EXTRACT(MONTH FROM Date), ', ', Name) AS DateAndName,
CASE
WHEN AVG(Percent) <= 25 THEN '1'
ELSE '0'
END AS '<=25%',
CASE
WHEN AVG(Percent) > 25 AND AVG(Percent) <= 50 THEN '1'
ELSE '0'
END AS '25<_<=50%',
CASE
WHEN AVG(Percent) > 50 AND AVG(Percent) <= 75 THEN '1'
ELSE '0'
END AS '50<_<=75%',
CASE
WHEN AVG(Percent) > 75 AND AVG(Percent) <= 100 THEN '1'
ELSE '0'
END AS '75<_<=100%'
from DataTable /*change to your table name*/
group by EXTRACT(MONTH FROM Date), Name
order by DateAndName
It gives the following result:
DateAndName | <=25% | 25<_<=50% | 50<_<=75% | 75<_<=100%
-------------------------------------
1, name1 | 1 | 0 | 0 | 0
1, name2 | 0 | 1 | 0 | 0
2, name1 | 1 | 0 | 0 | 0
2, name2 | 0 | 1 | 0 | 0

Subtract in Union

I have this data, where I want to generate the last row "on the fly" from the first two:
Group | 1yr | 2yrs | 3yrs | date | code
-------------------------------------
Port | 19 | -15 | 88 | 1/1/2020 | arp
Bench | 10 | -13 | 66 | 1/1/2020 | arb
Diff | 9 | 2 | 22 | |
I am trying to subtract the Port & Bench returns and have the difference on the new row. How can I do this?
Here's my code so far:
Select
date
Group,
Code,
1 yr returnp,
2 yrs returnp,
3yrs return
From timetable
union
Select
date,
Group,
Code,
1 yr returnb,
2 yrs returnb,
3yrs returnb
From timetable
Seems to me that a UNION ALL in concert with a conditional aggregation should do the trick
Note the sum() is wrapped in an abs() to match desired results
Select *
From YourTable
Union All
Select [Group] = 'Diff'
,[1yr] = abs(sum([1yr] * case when [Group]='Bench' then -1 else 1 end))
,[2yrs] = abs(sum([2yrs] * case when [Group]='Bench' then -1 else 1 end))
,[3yrs] = abs(sum([3yrs] * case when [Group]='Bench' then -1 else 1 end))
,[date] = null
,[code] = null
from YourTable
Results
Group 1yr 2yrs 3yrs date code
Port 19 -15 88 2020-01-01 arp
Bench 10 -13 66 2020-01-01 arb
Diff 9 2 22 NULL NULL
If you know there are always exactly 2 rows, something like this would work:
SELECT * FROM timetable
UNION ALL
SELECT
'Diff',
MAX([1yr]) - MIN([1yr]),
MAX([2yrs]) - MIN([2yrs]),
MAX([3yrs]) - MIN([3yrs]),
null,
null
FROM timetable

Function to Calculate Formula Without Using a For Loop in PL/SQL

Still kinda new to PL/SQL, but basically I'm trying to create a function that will calculate a person's score depending on how much money they've paid in the last 8 years. It uses the sums for each year for calculations. The formula is (Year-1/Year-2) * (Year-2/Year-3) * (Year-3/Year-4) and so on. What makes it extra tricky is that I need to skip years where they gave 0.
For example:
Here's the code I have so far:
CREATE OR REPLACE FUNCTION formula
(idnum IN NUMBER)
RETURN NUMBER IS
-- Declare Variables
currentyear NUMBER := EXTRACT (YEAR FROM SYSDATE);
previousyear NUMBER := currentyear - 1;
yeareight NUMBER := currentyear - 8;
previoussum NUMBER := 0;
currentsum NUMBER := 0;
placeholder NUMBER := 0;
score NUMBER := 1;
BEGIN
-- Set Score to 0 if no history of payments in the last 8 years
SELECT NVL(SUM(amount), 0)
INTO currentsum
FROM moneytable g
WHERE g.id_number = idnum
AND g.fiscal_year BETWEEN yeareight AND previousyear;
IF currentsum = 0 THEN score := 0;
ELSE
-- Loop to calculate Score
-- Score formula is (Year-1/Year -2) * (Year-2/Year-3) and so on for the last 8 years
-- Zeroes ignored for above calculations
-- Score defaults to 1 if only one year has any gifts
FOR counter IN 1..8
LOOP
currentyear := currentyear - 1;
placeholder := 0;
SELECT NVL(SUM(amount), 0)
INTO currentsum
FROM moneytable g
WHERE g.id_number = idnum
AND g.fiscal_year = currentyear;
IF currentsum = 0 THEN CONTINUE;
ELSE placeholder := previoussum / currentsum; END IF;
previoussum := currentsum;
IF currentsum > 0 AND placeholder > 0 THEN score := score * placeholder; END IF;
END LOOP;
END IF;
RETURN score;
END;
It works and gives the correct score, but it runs super slow if I try running it for more than a few people at a time. Is there a more efficient, optimized way to create this function?
First UNPIVOT to get rid of the empty years
select * from tab
UNPIVOT ( value
FOR year IN
(YEAR1, YEAR2, YEAR3, YEAR4, YEAR5, YEAR6, YEAR7, YEAR8)
)
where value != 0
order by 1,2;
NAME YEAR VALUE
---- ----- ----------
Jane YEAR1 10
Jane YEAR3 20
Jane YEAR4 50
Jane YEAR7 30
Jane YEAR8 20
Rob YEAR2 10
Rob YEAR4 20
...
Then calculate the coefficient using the LEAD analytic function. Use the row's own VALUE as the default, so that the last coefficient is set to one and is effectively ignored.
with formula as (select * from tab
UNPIVOT ( value
FOR year IN
(YEAR1, YEAR2, YEAR3, YEAR4, YEAR5, YEAR6, YEAR7, YEAR8)
)
where value != 0)
select NAME, YEAR, VALUE,
VALUE / lead(value,1,VALUE) over (partition by NAME order by YEAR) as koeff
from formula
order by 1,2;
NAME YEAR VALUE KOEFF
---- ----- ---------- ----------
Jane YEAR1 10 ,5
Jane YEAR3 20 ,4
Jane YEAR4 50 1,66666667
Jane YEAR7 30 1,5
Jane YEAR8 20 1
Rob YEAR2 10 ,5
Rob YEAR4 20 ,2
...
In the last step, calculate the aggregated product of the coefficients using the EXP(SUM(LN(...))) trick:
with formula as (select * from tab
UNPIVOT ( value
FOR year IN
(YEAR1, YEAR2, YEAR3, YEAR4, YEAR5, YEAR6, YEAR7, YEAR8)
)
where value != 0),
formula2 as (
select NAME, YEAR, VALUE,
VALUE / lead(value,1,VALUE) over (partition by NAME order by YEAR) as koeff
from formula)
select name,
round(EXP(SUM(LN(koeff))),6) score
from formula2
group by name
order by 1 ;
NAME SCORE
---- ----------
Jane ,5
Rob ,1
Tom ,2
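The reason EXP(SUM(LN(x))) works as a product aggregate is the identity exp(ln a + ln b) = a * b. Here is a small Python sketch of the same three steps (drop zeros, divide each value by the next, multiply via logarithms); the function name score is my own, it assumes all non-zero values are positive (LN needs that), and the inputs are the Test Data rows from this answer.

```python
import math


def score(yearly_values):
    """Product of value/next-value ratios over the non-zero years."""
    nonzero = [v for v in yearly_values if v != 0]
    # Pair each value with its successor; the last value is paired with
    # itself, giving a coefficient of 1 (the LEAD default in the query).
    coeffs = [a / b for a, b in zip(nonzero, nonzero[1:] + nonzero[-1:])]
    # Sum of logs, then exponentiate: same as multiplying the coefficients.
    return math.exp(sum(math.log(k) for k in coeffs))


jane = [10, 0, 20, 50, 0, 0, 30, 20]
rob = [0, 10, 0, 20, 0, 0, 0, 100]
tom = [0, 0, 0, 10, 20, 30, 40, 50]

for name, values in (("Jane", jane), ("Rob", rob), ("Tom", tom)):
    print(name, round(score(values), 6))
```

Because the intermediate terms cancel, the product is simply the first non-zero value divided by the last one. With no non-zero years this sketch returns 1.0, so that case would need separate handling.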
Test Data
create table tab as
select 'Tom' name, 0 year1, 0 year2, 0 year3, 10 year4, 20 year5, 30 year6, 40 year7, 50 year8
from dual union all
select 'Jane' name, 10 year1, 0 year2, 20 year3, 50 year4, 0 year5, 0 year6, 30 year7, 20 year8
from dual union all
select 'Rob' name, 0 year1, 10 year2, 0 year3, 20 year4, 0 year5, 0 year6, 0 year7, 100 year8
from dual;
It is perhaps better to run a single query for the last 8 years, collect the result into an array and loop over it, instead of executing 8 queries:
DECLARE
TYPE arrayofnumbers IS TABLE OF NUMBER(11);
sums arrayofnumbers;
BEGIN
-- idnum and currentyear are assumed to be in scope, as in the original function
SELECT NVL(SUM(amount), 0)
BULK COLLECT INTO sums
FROM moneytable g
WHERE g.id_number = idnum AND g.fiscal_year between currentyear and currentyear+7
GROUP BY g.fiscal_year
ORDER BY g.fiscal_year;
FOR i IN 1 .. sums.count
LOOP
-- other code
dbms_output.put_line(sums(i));
END LOOP;
END;
You want the first non-zero value divided by the last non-zero value. That would be:
select (case when year1 <> 0 then year1
when year2 <> 0 then year2
when year3 <> 0 then year3
when year4 <> 0 then year4
when year5 <> 0 then year5
when year6 <> 0 then year6
when year7 <> 0 then year7
when year8 <> 0 then year8
end) /
(case when year8 <> 0 then year8
when year7 <> 0 then year7
when year6 <> 0 then year6
when year5 <> 0 then year5
when year4 <> 0 then year4
when year3 <> 0 then year3
when year2 <> 0 then year2
when year1 <> 0 then year1
end)
Query table only once and group result by years. Number them descending and loop through eight rows. It can be done in SQL or you can go through grouped rows in loop in your function. SQL example:
with y(rn, amt) as (
select row_number() over (order by fiscal_year desc), sum(amount)
from moneytable g
where id_number = 1
and fiscal_year between extract (year from sysdate) - 8
and extract(year from sysdate) - 1
group by fiscal_year),
c(rn, amt, prev, ret) as (
select rn, amt, amt, 1 from y where rn = 1
union all
select y.rn, y.amt, c.amt, (c.amt/y.amt)*ret from c join y on y.rn = c.rn + 1)
select ret from c where rn = (select max(rn) from c)

Further group by WEEK NUMBER in sub-query

I am trying to subtract values from 2 columns and group them by week number.
The event code column has values 3 and 4. I am trying to sum the duration for event code 4 and subtract the duration for event code 3. These values need to be derived for the last 12 weeks.
Here is what I have so far. I am stuck on further grouping by week number:
SELECT DISTINCT CUSTOMER_ID,
((SELECT SUM(DURATION_IN_SECONDS)/60 FROM TABLE1 ee WHERE ee.CUSTOMER_ID = e.CUSTOMER_ID AND EVENT_CODE IN (4))-
(SELECT SUM(DURATION_IN_SECONDS)/60 FROM TABLE1 ee WHERE ee.CUSTOMER_ID = e.CUSTOMER_ID AND EVENT_CODE IN (3))) AS UNPRODUCTIVE_MINUTES
FROM TABLE1 e
WHERE TIMEDATE >= TO_DATE('01-OCT-19','DD-MON-YY')
AND TIMEDATE <= TO_DATE('31-DEC-19','DD-MON-YY')
GROUP BY CUSTOMER_ID
The above query produces results like this:
CUSTOMER_ID UNPRODUCTIVE_MINUTES
A100 1601
But my result has to be like this:
CUSTOMER_ID WEEKNUMBER UNPRODUCTIVE_MINUTES
A100 12 171
A100 11 108
A100 10 112
A100 9 110
A100 8 98
A100 7 67
A100 6 117
A100 5 100
A100 4 111
A100 3 77
A100 2 73
A100 1 87
I am not sure how you want to calculate the week number, but I guess weeknumber is floor((timedate - start date) / 7) + 1, so I created the query accordingly.
Select customer_id,
Sum(case when EVENT_CODE = 4 then DURATION_IN_SECONDS else (-1* DURATION_IN_SECONDS) end)/60 as dur,
Floor((Trunc(timedate) - TO_DATE('01-OCT-19','DD-MON-YY')) / 7) + 1 as weeknumber
From table1 e
Where TIMEDATE >= TO_DATE('01-OCT-19','DD-MON-YY')
AND TIMEDATE <= TO_DATE('31-DEC-19','DD-MON-YY')
AND EVENT_CODE in (3, 4)
GROUP BY CUSTOMER_ID, Floor((Trunc(timedate) - TO_DATE('01-OCT-19','DD-MON-YY')) / 7) + 1
Here, I have not used event_code 3, since the DURATION_IN_SECONDS for event codes 3 and 4 together, minus the DURATION_IN_SECONDS for event code 3, is eventually the same as the DURATION_IN_SECONDS for event code 4 alone.
Cheers!!
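The week-number arithmetic assumed in this answer (whole days since the start date, divided by 7, plus 1) can be checked quickly in Python. The function name week_number is my own, and the start date is the 1 October 2019 used in the query:

```python
from datetime import date


def week_number(day, start):
    """1-based count of 7-day blocks since start (start itself is week 1)."""
    return (day - start).days // 7 + 1


start = date(2019, 10, 1)

for day in (date(2019, 10, 1), date(2019, 10, 7),
            date(2019, 10, 8), date(2019, 12, 31)):
    print(day, week_number(day, start))
```

Note that counting this way, 1 October to 31 December spans 14 such weeks (the last one holding a single day), which is worth keeping in mind if exactly 12 weeks are expected. These weeks also start on the weekday of the start date, not on calendar week boundaries.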
The TO_CHAR(TIMEDATE, 'ww') function can be used directly to determine the week number.
There is no need to use correlated subqueries; conditional aggregation should be used instead.
Reformat your DATE literals as DATE 'yyyy-mm-dd', according to the ISO standard.
The BETWEEN operator is enough for inclusive date ranges, instead of inequalities.
All of this is applied in the query below:
SELECT CUSTOMER_ID,
TO_CHAR(TIMEDATE, 'ww') AS WEEK,
NVL(SUM(CASE
WHEN EVENT_CODE = 4 THEN
DURATION_IN_SECONDS / 60
END),
0) - NVL(SUM(CASE
WHEN EVENT_CODE = 3 THEN
DURATION_IN_SECONDS / 60
END),
0) AS UNPRODUCTIVE_MINUTES
FROM TABLE1 e
WHERE TIMEDATE BETWEEN DATE '2019-10-01' AND DATE '2019-12-31'
GROUP BY CUSTOMER_ID, TO_CHAR(TIMEDATE, 'ww')
ORDER BY CUSTOMER_ID, WEEK

Oracle select sum by time window

Let's assume that we have an Oracle table with the following format and data:
TIMESTAMP MESSAGENO ORGMESSAGE
------------------------- ---------------------- -------------------------------------
27.04.13 1 START PERIOD
27.04.13 3 10
27.04.13 4 5
28.04.13 5 6
28.04.13 3 20
29.04.13 4 25
29.04.13 5 26
30.04.13 2 END PERIOD
30.04.13 1 START PERIOD
01.05.13 3 10
02.05.13 4 15
02.05.13 5 16
03.05.13 3 30
03.05.13 4 35
04.05.13 5 36
05.05.13 2 END PERIOD
I want to select the sum of ORGMESSAGE for each period (the window between START PERIOD and END PERIOD), grouped by MESSAGENO.
Example output would be:
PERIOD START PERIOD END MESSAGENO SUM
------------ ------------- -------- ----
27.04.13 30.04.13 3 25
27.04.13 30.04.13 4 30
27.04.13 30.04.13 5 32
30.04.13 05.05.13 3 45
30.04.13 05.05.13 4 50
30.04.13 05.05.13 5 52
I am guessing that an Oracle analytic function would be suitable, but I really don't know how or where to start.
Thanks in advance for any help.
If we assume that the period starts and ends match, then a simple way to find the matching messages is to count the preceding number of starts. This is a cumulative sum and it is easy in Oracle. The rest is just aggregation:
select min(timestamp) as periodstart, max(timestamp) as periodend, messageno, sum(to_number(orgmessage)) as total
from (select om.*,
sum(case when messageno = 1 then 1 else 0 end) over (order by timestamp) as grp
from orgmessages om
) om
where messageno not in (1, 2)
group by grp, messageno;
Note that this method (as with the others) really wants the timestamp to be unique on each record. In the data presented, these solutions will work. But if you have multiple starts and ends on the same day, none of them will work assuming that timestamp only has the date.
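To make the "count the preceding starts" idea concrete, here is a rough Python sketch of the same logic: a running counter plays the role of the cumulative SUM() OVER (ORDER BY ...), and a dictionary does the grouping. The rows are a small made-up sample (not the question's data), with an explicit sequence number standing in for a unique timestamp.

```python
from collections import defaultdict

# (sequence, messageno, orgmessage): made-up sample rows, already in order.
rows = [
    (1, 1, "START PERIOD"),
    (2, 3, "10"),
    (3, 4, "5"),
    (4, 3, "20"),
    (5, 2, "END PERIOD"),
    (6, 1, "START PERIOD"),
    (7, 3, "7"),
    (8, 2, "END PERIOD"),
]

totals = defaultdict(int)
period = 0  # number of START rows seen so far, i.e. the cumulative sum
for _, messageno, orgmessage in sorted(rows):
    if messageno == 1:
        period += 1  # a new period begins
    if messageno not in (1, 2):
        # Data rows are summed per (period, messageno).
        totals[(period, messageno)] += int(orgmessage)

for key in sorted(totals):
    print(key, totals[key])
```

The sketch only works because the rows can be put in an unambiguous order, which is the same uniqueness caveat as above.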
First find all period ends per period start. Then join with your table to group and sum.
select
dates.start_date,
dates.end_date,
messageno,
sum(to_number(orgmessage)) as period_sum
from mytable
join
(
select start_dates.timestmp as start_date, min(end_dates.timestmp) as end_date
from (select * from mytable where orgmessage = 'START PERIOD') start_dates
join (select * from mytable where orgmessage = 'END PERIOD') end_dates
on start_dates.timestmp < end_dates.timestmp
group by start_dates.timestmp
) dates on mytable.timestmp between dates.start_date and dates.end_date
where mytable.orgmessage not like '%PERIOD%'
group by dates.start_date, dates.end_date, messageno
order by dates.start_date, dates.end_date, messageno;
SQL fiddle: http://www.sqlfiddle.com/#!4/365de/15.
Please try this one; replace rrr with your table name:
select periodstart, periodend, messageno, sum(to_number(orgmessage)) s
from (select TIMESTAMP periodstart,
(select min (TIMESTAMP) from rrr r2 where orgmessage = 'END PERIOD' and r2.TIMESTAMP > r.TIMESTAMP) periodend
from rrr r
where orgmessage = 'START PERIOD'
) borders, rrr r
where r.TIMESTAMP between borders.periodstart and borders.periodend
and r.orgmessage not in ('END PERIOD', 'START PERIOD')
group by periodstart, periodend, messageno
order by periodstart, periodend, messageno