Oracle MIN as analytic function - odd behavior with ORDER BY? - sql

This particular case was distilled from an example where the programmer assumed that for two shipments into a tank car, line #1 would be loaded first. I corrected this to allow for the loading to be performed in any order - however, I discovered that MIN() OVER (PARTITION BY) allows an ORDER BY in Oracle (this is not allowed in SQL Server), and additionally, it alters the behavior of the function, causing the ORDER BY to apparently be added to the PARTITION BY.
WITH data AS (
SELECT 1 AS SHIPMENT_ID, 1 AS LINE_NUMBER, 2 AS TARE, 3 AS GROSS FROM DUAL
UNION ALL
SELECT 1 AS SHIPMENT_ID, 2 AS LINE_NUMBER, 1 AS TARE, 2 AS GROSS FROM DUAL
)
SELECT MIN(tare) OVER (PARTITION BY shipment_id) first_tare
,MAX(gross) OVER (PARTITION BY shipment_id) last_gross
,FIRST_VALUE(tare) OVER (PARTITION BY shipment_id ORDER BY LINE_NUMBER) first_tare_incorrect
,FIRST_VALUE(gross) OVER (PARTITION BY shipment_id ORDER BY LINE_NUMBER DESC) last_gross_incorrect
,MIN(tare) OVER (PARTITION BY shipment_id ORDER BY LINE_NUMBER) first_tare_incorrect_still
,MAX(gross) OVER (PARTITION BY shipment_id ORDER BY LINE_NUMBER DESC) last_gross_incorrect_still
,MIN(tare) OVER (PARTITION BY shipment_id, LINE_NUMBER) first_tare_incorrect_still2
,MAX(gross) OVER (PARTITION BY shipment_id, LINE_NUMBER) last_gross_incorrect_still2
FROM data
A SQL Server example (with non-applicable code commented out):
WITH data AS (
SELECT 1 AS SHIPMENT_ID, 1 AS LINE_NUMBER, 2 AS TARE, 3 AS GROSS -- FROM DUAL
UNION ALL
SELECT 1 AS SHIPMENT_ID, 2 AS LINE_NUMBER, 1 AS TARE, 2 AS GROSS -- FROM DUAL
)
SELECT MIN(tare) OVER (PARTITION BY shipment_id) first_tare
,MAX(gross) OVER (PARTITION BY shipment_id) last_gross
-- ,FIRST_VALUE(tare) OVER (PARTITION BY shipment_id ORDER BY LINE_NUMBER) first_tare_incorrect
-- ,FIRST_VALUE(gross) OVER (PARTITION BY shipment_id ORDER BY LINE_NUMBER DESC) last_gross_incorrect
-- ,MIN(tare) OVER (PARTITION BY shipment_id ORDER BY LINE_NUMBER) first_tare_incorrect_still
-- ,MAX(gross) OVER (PARTITION BY shipment_id ORDER BY LINE_NUMBER DESC) last_gross_incorrect_still
,MIN(tare) OVER (PARTITION BY shipment_id, LINE_NUMBER) first_tare_incorrect_still2
,MAX(gross) OVER (PARTITION BY shipment_id, LINE_NUMBER) last_gross_incorrect_still2
FROM data
So question: What is Oracle doing and why and is it right?

If you add an ORDER BY to the MIN analytic function, you turn it into a "min so far" function rather than an overall minimum. For the final row for whatever you're partitioning by, the results will be the same. But the prior rows may have a different "min so far" than the overall minimum.
Using the EMP table as an example, you can see that the minimum salary so far for the department eventually converges on the overall minimum for the department. And you can see that the "min so far" value for any given department decreases as lower values are encountered.
SQL> ed
Wrote file afiedt.buf
1 select ename,
2 deptno,
3 sal,
4 min(sal) over (partition by deptno order by ename) min_so_far,
5 min(sal) over (partition by deptno) min_overall
6 from emp
7* order by deptno, ename
SQL> /
ENAME DEPTNO SAL MIN_SO_FAR MIN_OVERALL
---------- ---------- ---------- ---------- -----------
CLARK 10 2450 2450 1300
KING 10 5000 2450 1300
MILLER 10 1300 1300 1300
ADAMS 20 1110 1110 800
FORD 20 3000 1110 800
JONES 20 2975 1110 800
SCOTT 20 3000 1110 800
smith 20 800 800 800
ALLEN 30 1600 1600 950
BLAKE 30 2850 1600 950
MARTIN 30 1250 1250 950
SM0 30 950 950 950
TURNER 30 1500 950 950
WARD 30 1250 950 950
BAR
PAV
16 rows selected.
Of course, it would make more sense to use this form of the analytic function when you're trying to do something like compute a personal best that you can use as a comparison in future periods. If you're tracking an individual's decreasing golf scores, mile times, or weight, displaying personal bests can be a form of motivation.
SQL> ed
Wrote file afiedt.buf
1 with golf_scores as
2 ( select 1 golfer_id, 80 score, sysdate dt from dual union all
3 select 1, 82, sysdate+1 dt from dual union all
4 select 1, 72, sysdate+2 dt from dual union all
5 select 1, 75, sysdate+3 dt from dual union all
6 select 1, 71, sysdate+4 dt from dual union all
7 select 2, 74, sysdate from dual )
8 select golfer_id,
9 score,
10 dt,
11 (case when score=personal_best
12 then 'New personal best'
13 else null
14 end) msg
15 from (
16 select golfer_id,
17 score,
18 dt,
19 min(score) over (partition by golfer_id
20 order by dt) personal_best
21 from golf_scores
22* )
SQL> /
GOLFER_ID SCORE DT MSG
---------- ---------- --------- -----------------
1 80 12-SEP-11 New personal best
1 82 13-SEP-11
1 72 14-SEP-11 New personal best
1 75 15-SEP-11
1 71 16-SEP-11 New personal best
2 74 12-SEP-11 New personal best
6 rows selected.

Related

Oracle sum most recent records until a defined value then Ignore the rest

Im looking to sum a column until a defined value then ignore the rest of the records.
ID
WHEN
VALUE
AVG_COL
101
2016
6
84.5
101
2015
3
76
101
2014
3
87
101
2013
15
85.8
101
2012
6
92
101
2011
3
81
101
2010
3
82.3
I need a single result set of
ID
VALUE
AVG_COL
101
30
82.3
I have tried the following
SELECT
ID,
WHEN,
VALUE,
AVG_COL,
SUM(VALUE) OVER (PARTITION BY ID ORDER BY WHEN) AS VALUE, --must equal 30
AVG(AVG_COL) OVER (PARTITION BY ID) AVG
FROM
TABLE_ONE
WHERE
VALUE = 30;
Any help would be greatly appreciated!
Hi try some thing like this, where modified the WHERE clause
-- Untested
SELECT
ID,
WHEN,
VALUE,
AVG_COL,
SUM(VALUE) OVER (PARTITION BY ID ORDER BY WHEN) AS VALUE, --must equal 30
AVG(AVG_COL) OVER (PARTITION BY ID) AVG
FROM
TABLE_ONE
WHERE
ID IN (SELECT ID FROM ( (SELECT ID, sum(VALUE) sum_val FROM
TABLE_ONE GROUP BY ID) WHERE SUM_VAL = 30);
try this
select id,
SUM(VALUE) OVER (PARTITION BY ID ORDER BY WHEN RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS VALUE,
AVG(AVG_COL) OVER (PARTITION BY ID) AVG
from table_one
where VALUE <= 30
order by when desc
fetch first 1 rows only;
You can compute the running sum with window functions, then filter and limit in an outer query.
Assuming a table like (id, dt, val, . . .):
select *
from (
select t.*,
sum(val) over(partition by id order by dt) sum_val
from mytable t
) t
where sum_val >= 30
order by row_number() over(partition by id order by dt desc)
fetch first row with ties
Notes:
this handles multiple ids at once
for each id, this brings the first row where the running sum of the value reaches at least 30, if any (rows are processed by descending date)
What you need is running sum of values and a decision on what order by you want to apply for that running sum.
SAMPLE DATA
WITH
tbl AS
(
Select 101 "ID", 2016 "YR", 6 "VAL", 84.5 "AVG_COL" From Dual Union All
Select 101 "ID", 2015 "YR", 3 "VAL", 76 "AVG_COL" From Dual Union All
Select 101 "ID", 2014 "YR", 3 "VAL", 87 "AVG_COL" From Dual Union All
Select 101 "ID", 2013 "YR", 15 "VAL", 85.8 "AVG_COL" From Dual Union All
Select 101 "ID", 2012 "YR", 6 "VAL", 92 "AVG_COL" From Dual Union All
Select 101 "ID", 2011 "YR", 3 "VAL", 81 "AVG_COL" From Dual Union All
Select 101 "ID", 2010 "YR", 3 "VAL", 82.3 "AVG_COL" From Dual
),
Create CTE (I named it grid) to prepare your data - in this case I used descending order by years:
grid AS
(
Select
ID, YR,
VAL,
Sum(VAL) OVER(Partition By ID Order By YR DESC Rows Between Unbounded Preceding And Current Row) "RUNNING_SUM",
CASE WHEN Sum(VAL) OVER(Partition By ID Order By YR DESC Rows Between Unbounded Preceding And Current Row) >= 30
THEN Sum(1) OVER(Partition By ID Order By YR DESC Rows Between Unbounded Preceding And Current Row)
END "IS_OVER_RN",
AVG_COL,
Round(AVG(AVG_COL) OVER(Partition By ID Order By YR DESC Rows Between Unbounded Preceding And Current Row), 2) "RUNNING_AVG"
From
tbl
)
This cte is resulting as:
ID YR VAL RUNNING_SUM IS_OVER_RN AVG_COL RUNNING_AVG
---------- ---------- ---------- ----------- ---------- ---------- -----------
101 2016 6 6 84.5 84.5
101 2015 3 9 76 80.25
101 2014 3 12 87 82.5
101 2013 15 27 85.8 83.33
101 2012 6 33 5 92 85.06
101 2011 3 36 6 81 84.38
101 2010 3 39 7 82.3 84.09
Now you can get your result with the code below
-- Main SQL
SELECT
ID, RUNNING_SUM, RUNNING_AVG
FROM
grid g
WHERE IS_OVER_RN = (Select Min(IS_OVER_RN) From grid Where ID = g.ID)
Result:
-- with YEARS in DESCENDING order
ID RUNNING_SUM RUNNING_AVG
---------- ----------- -----------
101 33 85.06
If you make orderings within analytic functions in grid cte ASCENDING then the same main SQL would result with:
-- with YEARS in ASCENDING order
ID RUNNING_SUM RUNNING_AVG
---------- ----------- -----------
101 30 82.5

Analytic functions and means of window clause

I'm using Oracle and SQL Developer. I have downloaded HR schema and need to do some queries with it. Now I'm working with table Employees. As an user I need to see employees with the highest gap between their salary and the average salary of all later hired colleagues in corresponding department. It seems quite interesting and really complicated. I have read some documentation and tried, for example LEAD(), that provides access to more than one row of a table at the same time:
SELECT
employee_id,
first_name
|| ' '
|| last_name,
department_id,
salary,
hire_date,
LEAD(hire_date)
OVER(PARTITION BY department_id
ORDER BY
hire_date DESC
) AS Prev_hiredate
FROM
employees
ORDER BY
department_id,
hire_date;
That shows for every person in department hiredate of later hired person. Also I have tried to use window clause to understand its concepts:
SELECT
employee_id,
first_name
|| ' '
|| last_name,
department_id,
hire_date,
salary,
AVG(salary)
OVER(PARTITION BY department_id
ORDER BY
hire_date ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
) AS avg_sal
FROM
employees
ORDER BY
department_id,
hire_date;
The result of this query will be:
However, it is not exactly what I need. I need to reduce the result just by adding column with gap (salary-avr_sal), where the gap will be highest and receive one employee per department. How should the result look like: for example, we have 60 department. We have 5 employees there ordering by hire_date. First has salary 4800, second – 9000, third – 4800, fourth – 4200, fifth – 6000. If we do calculations: 4800 - ((9000+4800+4200+6000)/4)=-1200, 9000-((4800+4200+6000)/3)=4000, 4800 -((4200+6000)/2)=-300, 4200 - 6000=-1800 and the last person in department will have the highest gap: 6000 - 0 = 6000. Let's take a look on 20 department. We have two people there: first has salary 13000, second – 6000. Calculations: 13000 - 6000 = 7000, 6000 - 0 = 6000. The highest gap will be for first person. So for department 20 the result should be person with salary 13000, for department 60 the result should be person with salary 6000 and so on.
How should look my query to get the appropriate result (what I need is marked bold up, also I want to see column with highest gap, can be different solutions with analytic functions, but should be necessarily included window clause)?
You can get the average salary of employees that were hired prior to the current employee by just adapting the rows clause of your avg:
AVG(salary) OVER(
PARTITION BY department_id
ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS avg_salary
The 1 PRECEDING clause tells the database not to include the current row in the window.
If you are looking for the employees with the greatest gap to that average, we can just order by the resultset:
SELECT e.*,
AVG(salary) OVER(
PARTITION BY department_id
ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS avg_salary
FROM employees e
ORDER BY ABS(salary - avg_salary) DESC;
Finally, if you want the top "outlier salary" per department, then we need at least one more level. The shortest way to express this probably is to use ROW_NUMBER() to rank employees in each department by their salary gap to the average, and then to fetch all top rows per group using WITH TIES:
SELECT *
FROM (
SELECT e.*,
AVG(salary) OVER(
PARTITION BY department_id
ORDER BY hire_date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS avg_salary
FROM employees e
) e
ORDER BY ROW_NUMBER() OVER(
PARTITION BY department_id
ORDER BY ABS(salary - avg_salary) DESC
)
FETCH FIRST ROW WITH TIES
Maybe this is what you are looking for.
Sample data:
WITH
emp (ID, EMP_NAME, HIRE_DATE, SALARY, DEPT) AS
(
Select 601, 'HILLER', To_Date('23-JAN-82', 'dd-MON-yy'), 4800, 60 From Dual Union All
Select 602, 'MILLER', To_Date('23-FEB-82', 'dd-MON-yy'), 9000, 60 From Dual Union All
Select 603, 'SMITH', To_Date('23-MAR-82', 'dd-MON-yy'), 4800, 60 From Dual Union All
Select 604, 'FORD', To_Date('23-APR-82', 'dd-MON-yy'), 4200, 60 From Dual Union All
Select 605, 'KING', To_Date('23-MAY-82', 'dd-MON-yy'), 6000, 60 From Dual Union All
Select 201, 'SCOT', To_Date('23-MAR-82', 'dd-MON-yy'), 13000, 20 From Dual Union All
Select 202, 'JONES', To_Date('23-AUG-82', 'dd-MON-yy'), 6000, 20 From Dual
),
Create CTE named grid with several analytic functions and windowing clauses. They are not all needed but the resulting dataset below shows the logic with all components included.
grid AS
(
Select
g.*, Max(GAP) OVER(PARTITION BY DEPT) "DEPT_MAX_GAP"
From
(
Select
ROWNUM "RN",
Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN Unbounded Preceding And Current Row) "RN_DEPT",
ID, EMP_NAME, HIRE_DATE, DEPT, SALARY,
--
Nvl(Sum(SALARY) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following), 0) "SUM_SAL_LATER",
Nvl(Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following), 0) "COUNT_EMP_LATER",
--
Nvl(Sum(SALARY) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following) /
Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following), 0) "AVG_LATER",
--
SALARY -
Nvl((
Sum(SALARY) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following) /
Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following)
), 0) "GAP"
from
emp
Order By
DEPT, HIRE_DATE, ID
) g
Order By
RN
)
CTE grid resultiing dataset:
RN
RN_DEPT
ID
EMP_NAME
HIRE_DATE
DEPT
SALARY
SUM_SAL_LATER
COUNT_EMP_LATER
AVG_LATER
GAP
DEPT_MAX_GAP
1
1
601
HILLER
23-JAN-82
60
4800
24000
4
6000
-1200
6000
2
2
602
MILLER
23-FEB-82
60
9000
15000
3
5000
4000
6000
3
3
603
SMITH
23-MAR-82
60
4800
10200
2
5100
-300
6000
4
4
604
FORD
23-APR-82
60
4200
6000
1
6000
-1800
6000
5
5
605
KING
23-MAY-82
60
6000
0
0
0
6000
6000
6
1
201
SCOT
23-MAR-82
20
13000
6000
1
6000
7000
7000
7
2
202
JONES
23-AUG-82
20
6000
0
0
0
6000
7000
Main SQL
SELECT
g.ID, g.EMP_NAME, g.HIRE_DATE, g.DEPT, g.SALARY, g.GAP
FROM
grid g
WHERE
g.GAP = g.DEPT_MAX_GAP
Order By
RN
Resulting as:
ID
EMP_NAME
HIRE_DATE
DEPT
SALARY
GAP
605
KING
23-MAY-82
60
6000
6000
201
SCOT
23-MAR-82
20
13000
7000
Without CTE and with all unnecessery columns excluded it looks like this:
SELECT ID, EMP_NAME, HIRE_DATE, DEPT, SALARY, GAP
FROM
(
( Select g.*, Max(GAP) OVER(PARTITION BY DEPT) "DEPT_MAX_GAP"
From( Select
ID, EMP_NAME, HIRE_DATE, DEPT, SALARY,
SALARY -
Nvl(( Sum(SALARY) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following) /
Sum(1) OVER(Partition By DEPT Order By DEPT, HIRE_DATE, ID ROWS BETWEEN 1 Following And Unbounded Following)
), 0) "GAP"
From emp
Order By DEPT, HIRE_DATE, ID
) g
)
)
WHERE GAP = DEPT_MAX_GAP
Order By DEPT, HIRE_DATE, ID
It seems like this is all you need.
Regards...

How to find regions where total of their sale exceeded 60%

I have a table interest_summary table with two columns:
int_rate number,
total_balance number
example
10.25 50
10.50 100
10.75 240
11.00 20
My query should return in 2 columns or a string like 10.50 to 10.75 because adding their total exceed 60% of total amount added together
Could you suggest a logic in Oracle?
select
min(int_rate),
max(int_rate)
from
(
select
int_rate,
nvl(sum(total_balance) over(
order by total_balance desc
rows between unbounded preceding and 1 preceding
),0) as part_sum
from interest_summary
)
where
part_sum < (select 0.6*sum(total_balance) from interest_summary)
fiddle
I'm assuming that you're selecting the rows based on the following algorithm:
Sort your rows by total_balance (descending)
Select the highest total_balance row remaining
If its total_balance added to the running total of the total balance is under 60%, add it to the pool and get the next row (step 2)
If not add the row to the pool and return.
The sorted running total looks like this (I'll number the rows so that it's easier to understand what happens):
SQL> WITH data AS (
2 SELECT 1 id, 10.25 interest_rate, 50 total_balance FROM DUAL
3 UNION ALL SELECT 2 id, 10.50 interest_rate, 100 total_balance FROM DUAL
4 UNION ALL SELECT 3 id, 10.75 interest_rate, 240 total_balance FROM DUAL
5 UNION ALL SELECT 4 id, 11.00 interest_rate, 20 total_balance FROM DUAL
6 )
7 SELECT id, interest_rate,
8 SUM(total_balance) OVER (ORDER BY total_balance DESC) running_total,
9 SUM(total_balance) OVER (ORDER BY total_balance DESC)
10 /
11 SUM(total_balance) OVER () * 100 pct_running_total
12 FROM data
13 ORDER BY 3;
ID INTEREST_RATE RUNNING_TOTAL PCT_RUNNING_TOTAL
---------- ------------- ------------- -----------------
3 10,75 240 58,5365853658537
2 10,5 340 82,9268292682927
1 10,25 390 95,1219512195122
4 11 410 100
So in this example we must return rows 3 and 2 because row 2 is the first row where its percent running total is above 60%:
SQL> WITH data AS (
2 SELECT 1 id, 10.25 interest_rate, 50 total_balance FROM DUAL
3 UNION ALL SELECT 2 id, 10.50 interest_rate, 100 total_balance FROM DUAL
4 UNION ALL SELECT 3 id, 10.75 interest_rate, 240 total_balance FROM DUAL
5 UNION ALL SELECT 4 id, 11.00 interest_rate, 20 total_balance FROM DUAL
6 )
7 SELECT ID, interest_rate
8 FROM (SELECT ID, interest_rate,
9 SUM(over_limit)
10 OVER(ORDER BY total_balance DESC) over_limit_no
11 FROM (SELECT id,
12 interest_rate,
13 total_balance,
14 CASE
15 WHEN SUM(total_balance)
16 OVER(ORDER BY total_balance DESC)
17 / SUM(total_balance) OVER() * 100 < 60 THEN
18 0
19 ELSE
20 1
21 END over_limit
22 FROM data
23 ORDER BY 3))
24 WHERE over_limit_no <= 1;
ID INTEREST_RATE
---------- -------------
3 10,75
2 10,5

MySQL AVG() in sub-query

What one query can produce table_c?
I have three columns: day, person, and revenue_per_person. Right now I have to use two queries since I lose 'person' when producing table_b.
table_a uses all three columns:
SELECT day, person, revenue_per_person
FROM purchase_table
GROUP BY day, person
table_b uses only two columns due to AVG() and GROUP BY:
SELECT day, AVG(revenue) as avg_revenue
FROM purchase_table
GROUP BY day
table_c created from table_a and table_b:
SELECT
CASE
WHEN revenue_per_person > avg_revenue THEN 'big spender'
ELSE 'small spender'
END as spending_bucket
FROM ????
Maybe this could help, try this one
SELECT a.day,
CASE
WHEN a.revenue_per_person > b.avg_revenue THEN 'big spender'
ELSE 'small spender'
END as spending_bucket
FROM
(
SELECT day, person, AVG(revenue) revenue_per_person
FROM purchase_table
GROUP BY day, person
) a INNER JOIN
(
SELECT day, AVG(revenue) as avg_revenue
FROM purchase_table
GROUP BY day
) b ON a.day = b.day
You might want to use analytic functions.
An Oracle example showing if a person's salary is greater than average salary in his department.
08:56:54 HR#vm_xe> ed
Wrote file s:\toolkit\service\buffer.sql
1 select
2 department_id
3 ,employee_id
4 ,salary
5 ,avg_salary
6 ,case when salary > avg_salary then 1 else 0 end case_is_greater
7 from (
8 select
9 department_id
10 ,employee_id
11 ,salary
12 ,round(avg(salary) over(partition by department_id),2) avg_salary
13 from employees
14 )
15* where department_id = 30
08:58:56 HR#vm_xe> /
DEPARTMENT_ID EMPLOYEE_ID SALARY AVG_SALARY CASE_IS_GREATER
------------- ----------- ---------- ---------- ---------------
30 114 11000 4150 1
30 115 3100 4150 0
30 116 2900 4150 0
30 117 2800 4150 0
30 118 2600 4150 0
30 119 2500 4150 0
6 rows selected.
Elapsed: 00:00:00.01
If you are using a database that supports windows functions, you can do this as:
SELECT (CASE WHEN revenue_per_person > avg_revenue THEN 'big spender'
ELSE 'small spender'
END) as spending_bucket
FROM (select pt.*,
avg(revenue) over (partition by day, person) as revenue_per_person,
avg(revenue) over (partition by day) as avg_revenue,
row_number() over (partition by day, person order by day) as seqnum
from purchase_table pt
) t
where seqnum = 1
The purpose of seqnum is to just get one row per person/day combination.

(Oracle) how to group rows for pagination

I have a table that outputs similar to this (although in thousands):
EMPNO ENAME TRANDATE AMT
---------- ---------- --------- -------
100 Alison 21-MAR-96 45000
100 Alison 12-DEC-78 23000
100 Alison 24-OCT-82 11000
101 Linda 15-JAN-84 16000
101 Linda 30-JUL-87 17000
102 Celia 31-DEC-90 78000
102 Celia 17-SEP-96 21000
103 James 21-MAR-96 45000
103 James 12-DEC-78 23000
103 James 24-OCT-82 11000
104 Robert 15-JAN-84 16000
104 Robert 30-JUL-87 17000
My desired output would be similar to this:
EMPNO ENAME TRANDATE AMT PAGE
---------- ---------- --------- ------- ----
100 Alison 21-MAR-96 45000 1
100 Alison 12-DEC-78 23000 1
100 Alison 24-OCT-82 11000 1
101 Linda 15-JAN-84 16000 2
101 Linda 30-JUL-87 17000 2
102 Celia 31-DEC-90 78000 2
102 Celia 17-SEP-96 21000 2
103 James 21-MAR-96 45000 3
104 Robert 12-DEC-78 23000 4
104 Robert 24-OCT-82 11000 4
104 Robert 15-JAN-84 16000 4
104 Robert 30-JUL-87 17000 4
Basically, it should insert a new field to identify the page it belongs to. The page break is based on the rows. And, as if "kept together" in EMPNO, it adds 1 to PAGE when the rows cannot add the next EMPNO batch. It's for the Excel limit since Excel does not allow more than 65000 rows (or so) in a single Sheet. In the sample's case, it's only 4 rows. The limit number is static.
ThinkJet is right that that some of the other answers don't cater for the 'keep together' requirement. However I think this can be done without resorting to a user-defined aggregate.
Sample data
create table test (empno number, ename varchar2(20), trandate date, amt number);
insert into test values (100, 'Alison' , to_date('21-MAR-1996') , 45000);
insert into test values (100, 'Alison' , to_date('12-DEC-1978') , 23000);
insert into test values (100, 'Alison' , to_date('24-OCT-1982') , 11000);
insert into test values (101, 'Linda' , to_date('15-JAN-1984') , 16000);
insert into test values (101, 'Linda' , to_date('30-JUL-1987') , 17000);
insert into test values (102, 'Celia' , to_date('31-DEC-1990') , 78000);
insert into test values (102, 'Celia' , to_date('17-SEP-1996') , 21000);
insert into test values (103, 'James' , to_date('21-MAR-1996') , 45000);
insert into test values (103, 'James' , to_date('12-DEC-1978') , 23000);
insert into test values (103, 'James' , to_date('24-OCT-1982') , 11000);
insert into test values (104, 'Robert' , to_date('15-JAN-1984') , 16000);
insert into test values (104, 'Robert' , to_date('30-JUL-1987') , 17000);
Now, determine the end row of each empno segment (using RANK to find the start and COUNT..PARTITION BY to find the number in the segment).
Then use ceil/4 from APC's solution to group them into their 'pages'. Again, as pointed out by ThinkJet, there is a problem in the specification as it doesn't cater for the situation when there are more records in the empno 'keep together' segment than can fit in a page.
select empno, ename,
ceil((rank() over (order by empno) +
count(1) over (partition by empno))/6) as chunk
from test
order by 1;
As pointed out by ThinkJet, this solution isn't bullet proof.
drop table test purge;
create table test (empno number, ename varchar2(20), trandate date, amt number);
declare
cursor csr_name is
select rownum emp_id,
decode(rownum,1,'Alan',2,'Brian',3,'Clare',4,'David',5,'Edgar',
6,'Fred',7,'Greg',8,'Harry',9,'Imran',10,'John',
11,'Kevin',12,'Lewis',13,'Morris',14,'Nigel',15,'Oliver',
16,'Peter',17,'Quentin',18,'Richard',19,'Simon',20,'Terry',
21,'Uther',22,'Victor',23,'Wally',24,'Xander',
25,'Yasmin',26,'Zac') emp_name
from dual connect by level <= 26;
begin
for c_name in csr_name loop
for i in 1..11 loop
insert into test values
(c_name.emp_id, c_name.emp_name, (date '2010-01-01') + i,
to_char(sysdate,'SS') * 1000);
end loop;
end loop;
end;
/
select chunk, count(*)
from
(select empno, ename,
ceil((rank() over (order by empno) +
count(1) over (partition by empno))/25) as chunk
from test)
group by chunk
order by chunk
;
So with chunk size of 25 and group size of 11, we get the jumps where it fits 33 people in the chunk despite the 25 limit. Large chunk sizes and small groups should make this infrequent, but you'd want to allow some leeway. So maybe set the chunks to 65,000 rather than going all the way to 65,536.
It's too tricky or even impossible to do such thing in plain SQL.
But with some limitations problem can be resolved with help of user-defined aggregate functions .
First, create object with ODCIAggregate interface implementation:
create or replace type page_num_agg_type as object
(
-- Purpose : Pagination with "leave together" option
-- Attributes
-- Current page number
cur_page_number number,
-- Cumulative number of rows per page incremented by blocks
cur_page_row_count number,
-- Row-by-row counter for detect page overflow while placing single block
page_row_counter number,
-- Member functions and procedures
static function ODCIAggregateInitialize(
sctx in out page_num_agg_type
)
return number,
member function ODCIAggregateIterate(
self in out page_num_agg_type,
value in number
)
return number,
member function ODCIAggregateTerminate(
self in page_num_agg_type,
returnValue out number,
flags in number
)
return number,
member function ODCIAggregateMerge(
self in out page_num_agg_type,
ctx2 in page_num_agg_type
)
return number
);
Create type body:
create or replace type body PAGE_NUM_AGG_TYPE is
-- Member procedures and functions
static function ODCIAggregateInitialize(
sctx in out page_num_agg_type
)
return number
is
begin
sctx := page_num_agg_type(1, 0, 0);
return ODCIConst.Success;
end;
member function ODCIAggregateIterate(
self in out page_num_agg_type,
value in number
)
return number
is
-- !!! WARNING: HARDCODED !!!
RowsPerPage number := 4;
begin
self.page_row_counter := self.page_row_counter + 1;
-- Main operations: determine number of page
if(value > 0) then
-- First row of new block
if(self.cur_page_row_count + value > RowsPerPage) then
-- If we reach next page with new block of records - switch to next page.
self.cur_page_number := self.cur_page_number + 1;
self.cur_page_row_count := value;
self.page_row_counter := 1;
else
-- Just increment rows and continue to place on current page
self.cur_page_row_count := self.cur_page_row_count + value;
end if;
else
-- Row from previous block
if(self.page_row_counter > RowsPerPage) then
-- Single block of rows exceeds page size - wrap to next page.
self.cur_page_number := self.cur_page_number + 1;
self.cur_page_row_count := self.cur_page_row_count - RowsPerPage;
self.page_row_counter := 1;
end if;
end if;
return ODCIConst.Success;
end;
member function ODCIAggregateTerminate(
self in page_num_agg_type,
returnValue out number,
flags in number
)
return number
is
begin
-- Returns current page number as result
returnValue := self.cur_page_number;
return ODCIConst.Success;
end;
member function ODCIAggregateMerge(
self in out page_num_agg_type,
ctx2 in page_num_agg_type
)
return number
is
begin
-- Can't act in parallel - error on merging attempts
raise_application_error(-20202,'PAGE_NUM_AGG_TYPE can''t act in parallel mode');
return ODCIConst.Success;
end;
end;
Create agrreation function to use with type:
create function page_num_agg (
input number
) return number aggregate using page_num_agg_type;
Next prepare data and use new function to calculate page numbers:
with data_list as (
-- Your example data as source
select 100 as EmpNo, 'Alison' as EmpName, to_date('21-MAR-96','dd-mon-yy') as TranDate, 45000 as AMT from dual union all
select 100 as EmpNo, 'Alison' as EmpName, to_date('12-DEC-78','dd-mon-yy') as TranDate, 23000 as AMT from dual union all
select 100 as EmpNo, 'Alison' as EmpName, to_date('24-OCT-82','dd-mon-yy') as TranDate, 11000 as AMT from dual union all
select 101 as EmpNo, 'Linda' as EmpName, to_date('15-JAN-84','dd-mon-yy') as TranDate, 16000 as AMT from dual union all
select 101 as EmpNo, 'Linda' as EmpName, to_date('30-JUL-87','dd-mon-yy') as TranDate, 17000 as AMT from dual union all
select 102 as EmpNo, 'Celia' as EmpName, to_date('31-DEC-90','dd-mon-yy') as TranDate, 78000 as AMT from dual union all
select 102 as EmpNo, 'Celia' as EmpName, to_date('17-SEP-96','dd-mon-yy') as TranDate, 21000 as AMT from dual union all
select 103 as EmpNo, 'James' as EmpName, to_date('21-MAR-96','dd-mon-yy') as TranDate, 45000 as AMT from dual union all
select 103 as EmpNo, 'James' as EmpName, to_date('12-DEC-78','dd-mon-yy') as TranDate, 23000 as AMT from dual union all
select 103 as EmpNo, 'James' as EmpName, to_date('24-OCT-82','dd-mon-yy') as TranDate, 11000 as AMT from dual union all
select 104 as EmpNo, 'Robert' as EmpName, to_date('15-JAN-84','dd-mon-yy') as TranDate, 16000 as AMT from dual union all
select 104 as EmpNo, 'Robert' as EmpName, to_date('30-JUL-87','dd-mon-yy') as TranDate, 17000 as AMT from dual union all
select 105 as EmpNo, 'Monica' as EmpName, to_date('30-JUL-88','dd-mon-yy') as TranDate, 31000 as AMT from dual union all
select 105 as EmpNo, 'Monica' as EmpName, to_date('01-JUL-87','dd-mon-yy') as TranDate, 19000 as AMT from dual union all
select 105 as EmpNo, 'Monica' as EmpName, to_date('31-JAN-97','dd-mon-yy') as TranDate, 11000 as AMT from dual union all
select 105 as EmpNo, 'Monica' as EmpName, to_date('17-DEC-93','dd-mon-yy') as TranDate, 33000 as AMT from dual union all
select 105 as EmpNo, 'Monica' as EmpName, to_date('11-DEC-91','dd-mon-yy') as TranDate, 65000 as AMT from dual union all
select 105 as EmpNo, 'Monica' as EmpName, to_date('22-OCT-89','dd-mon-yy') as TranDate, 19000 as AMT from dual
),
ordered_data as (
select
-- Source table fields
src_data.EmpNo, src_data.EmpName, src_data.TranDate, src_data.AMT,
-- Calculate row count per one employee
count(src_data.EmpNo) over(partition by src_data.EmpNo)as emp_row_count,
-- Calculate rank of row inside employee data sorted in output order
rank() over(partition by src_data.EmpNo order by src_data.EmpName, src_data.TranDate) as emp_rnk
from
data_list src_data
)
-- Final step: calculate page number for rows
select
-- Source table data
ordered_data.EmpNo, ordered_data.EmpName, ordered_data.TranDate, ordered_data.AMT,
-- Aggregate all data with our new function
page_num_agg(
-- pass count of rows to aggregate function only for first employee's row
decode(ordered_data.emp_rnk, 1, ordered_data.emp_row_count, 0)
)
over (order by ordered_data.EmpName, ordered_data.TranDate) as page_number
from
ordered_data
order by
ordered_data.EmpName, ordered_data.TranDate
And, finally ...
Disadvantages of this solution:
Hardcoded page row count.
Requires some specific data preparation in query to use aggregate function properly.
Advantages of this solution:
Just works :)
Updated: improved to handle oversized blocks, example modified.
The following SQL statement splits the twenty records in my EMP table into five pages of four rows each:
SQL> select empno
2 , ename
3 , deptno
4 , ceil((row_number() over (order by deptno, empno)/4)) as pageno
5 from emp
6 /
EMPNO ENAME DEPTNO PAGENO
---------- ---------- ---------- ----------
7782 BOEHMER 10 1
7839 SCHNEIDER 10 1
7934 KISHORE 10 1
7369 CLARKE 20 1
7566 ROBERTSON 20 2
7788 RIGBY 20 2
7876 KULASH 20 2
7902 GASPAROTTO 20 2
7499 VAN WIJK 30 3
7521 PADFIELD 30 3
7654 BILLINGTON 30 3
7698 SPENCER 30 3
7844 CAVE 30 4
7900 HALL 30 4
8083 KESTELYN 30 4
8084 LIRA 30 4
8060 VERREYNNE 50 5
8061 FEUERSTEIN 50 5
8085 TRICHLER 50 5
8100 PODER 50 5
20 rows selected.
SQL>
How about this one: (100 is the row limit per page)
select
a.*, floor(count(1) over (order by empno, empname)/100)+1 as page
from source_table a
order by page, empno, empname;