I am new to Oracle SQL programming (programming in general actually) and recently discovered analytical functions. I am convinced I can use them to automate my daily, weekly, and monthly reporting.
I need help tracking week-on-week and month-on-month sales and purchases, given a particular date as the start of the year and/or week.
I have a table that maintains daily sales and purchases transactions by sales agents. How do I dynamically track revenue growth (since the year began) for week-on-week as well as month-on-month purchases per territory per region from our transactions table?
Our reporting week runs from Wednesday through Tuesday. I have managed to get some output (albeit not entirely accurate) for month-on-month, but week-on-week is challenging me. How do I maintain a dynamic counter of days in Oracle SQL that refreshes and starts another week each time it adds up to 7 days across the entire daily transactions table?
The interesting challenge I have in my head that I can't seem to put into code is what happens when I have a partial week's worth of data?! I want to be able to compare, say, 3 days worth of the current week's transactions with 3 days worth of the previous week's transactions. The same can be said for a partial month's worth of data.
This is the month-on-month analysis code I have managed so far.
WITH
monthly_revenue as (
SELECT
to_char(txn_date, 'YYYY-MM') as month_key,
sum(stock_purchased) as revenue
FROM transactions_table
-- group by the same expression that is selected (the table has txn_date, not date_key)
GROUP BY to_char(txn_date, 'YYYY-MM')
),
prev_month_revenue as (
SELECT
month_key, revenue,
lag(revenue) over (order by month_key) as prev_month_revenue
FROM monthly_revenue
)
SELECT month_key, revenue, prev_month_revenue,
round(100.0*(revenue-prev_month_revenue)/prev_month_revenue,1) as revenue_growth
FROM prev_month_revenue
ORDER BY month_key;
The structure of my table is as below:
txn_date DATE,
agent_id NUMBER(12),
supervisor_id NUMBER(12),
stock_purchased NUMBER(15),
stock_sold NUMBER(15),
no_of_txns NUMBER(15),
account_balance NUMBER(15)
I would like to have my output in the format below;
Week-Start | Week-End | Week_Purchases | Previous_Week_Purchases | % Growth
If I can get over the initial hurdle of tracking week-on-week purchases and sales, I can easily attach location information.
The trunc function truncates a date to the specified unit. By default this is the day. But you can also use it to get the start of the week/month/quarter/year that the date falls in.
The format iw returns the start of the ISO week, which is a Monday.
So how does that help you with weeks running Weds-Tues?
Subtract two from your date before passing it to trunc and voila!
with rws as (
select date'2018-07-24'+level dt from dual
connect by level <= 14
)
select * from rws;
DT
25-JUL-2018
26-JUL-2018
27-JUL-2018
28-JUL-2018
29-JUL-2018
30-JUL-2018
31-JUL-2018
01-AUG-2018
02-AUG-2018
03-AUG-2018
04-AUG-2018
05-AUG-2018
06-AUG-2018
07-AUG-2018
with rws as (
select date'2018-07-24'+level dt from dual
connect by level <= 14
)
select trunc ( dt-2, 'iw' ),
to_char ( min ( dt ), 'DY' ) week_start_day,
to_char ( max ( dt ), 'DY' ) week_end_day
from rws
group by trunc ( dt-2, 'iw' )
order by trunc ( dt-2, 'iw' );
TRUNC(DT-2,'IW') WEEK_START_DAY WEEK_END_DAY
23-JUL-2018 WED TUE
30-JUL-2018 WED TUE
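As a rough illustration of the same bucketing outside the database, here is a minimal Python sketch. The function name `reporting_week_start` and its `first_weekday` parameter are made up for this example. It returns the Wednesday itself, which corresponds to `trunc(dt - 2, 'iw') + 2` in the query above.

```python
from datetime import date, timedelta


def reporting_week_start(d: date, first_weekday: int = 2) -> date:
    """Return the first day of the reporting week that contains d.

    first_weekday follows date.weekday() numbering (Monday = 0),
    so the default of 2 means weeks run Wednesday through Tuesday.
    """
    offset = (d.weekday() - first_weekday) % 7
    return d - timedelta(days=offset)


# Same 14 days as the rws query: 25-JUL-2018 through 07-AUG-2018.
days = [date(2018, 7, 24) + timedelta(days=n) for n in range(1, 15)]
weeks = sorted({reporting_week_start(d) for d in days})
print(weeks)
```

Grouping daily rows by this value gives one bucket per Wednesday-to-Tuesday week, and a partial current week simply ends up as a smaller bucket.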
Regarding:
I want to be able to compare, say, 3 days worth of the current week's transactions with 3 days worth of the previous week's transactions
I'm not sure what you're asking here. But you can use the windowing clause of analytic functions to get values that fall within a specific offset from the current row. For example, the following calculates two running totals. The first covers the current day and the three days before it. The second covers the corresponding four-day span one week earlier:
with rws as (
select date'2018-07-24'+level dt ,
round ( dbms_random.value( 1, 100 ) ) val
from dual
connect by level <= 10
)
select dt, val,
sum ( val ) over (
order by dt range between 3 preceding and current row
) past_three,
sum ( val ) over (
order by dt range between 10 preceding and 7 preceding
) three_prev_week
from rws
order by dt;
DT VAL PAST_THREE THREE_PREV_WEEK
25-JUL-2018 5 5 <null>
26-JUL-2018 89 94 <null>
27-JUL-2018 34 128 <null>
28-JUL-2018 88 216 <null>
29-JUL-2018 48 259 <null>
30-JUL-2018 25 195 <null>
31-JUL-2018 19 180 <null>
01-AUG-2018 71 163 5
02-AUG-2018 12 127 94
03-AUG-2018 39 141 128
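To make the two frames concrete, here is a small Python sketch that recomputes both columns from the values in the output above. The helper name `range_sum` is invented for this example, and returning `None` for an empty frame is my stand-in for the `<null>` shown in the results.

```python
from datetime import date, timedelta


def range_sum(rows, current, lo_days, hi_days):
    """Sum values dated within [current - lo_days, current - hi_days].

    Both ends are inclusive, mirroring a RANGE frame ordered by date.
    Returns None when no row falls inside the frame.
    """
    start = current - timedelta(days=lo_days)
    end = current - timedelta(days=hi_days)
    vals = [v for d, v in rows if start <= d <= end]
    return sum(vals) if vals else None


# VAL column from the output above, one value per day from 25-JUL-2018.
vals = [5, 89, 34, 88, 48, 25, 19, 71, 12, 39]
rows = [(date(2018, 7, 25) + timedelta(days=i), v) for i, v in enumerate(vals)]

past_three = [range_sum(rows, d, 3, 0) for d, _ in rows]
three_prev_week = [range_sum(rows, d, 10, 7) for d, _ in rows]
print(past_three)
print(three_prev_week)
```

Because the frame is defined by date distance rather than row count, a missing day narrows the sum instead of pulling in an older row.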
Say the scenario is this:
I have a database of student infractions. When a student is late to class, or misses a homework assignment they get an infraction.
student_id  infraction_type    day
-----------------------------------
1           tardy              0
2           missed_assignment  0
1           tardy              29
2           missed_assignment  15
1           tardy              99
2           missed_assignment  29
The school has a three-strike system: at each infraction, disciplinary action is taken. Call the actions D0, D1, D2.
Infractions expire after 30 days.
I want to be able to perform a query to calculate the total counts of disciplinary actions taken in a given time period.
So the number of disciplinary actions taken in the last 100 days (at day 99) would be
disciplinary_action  count
--------------------------
D0                   3
D1                   2
D2                   1
A table generated showing the disciplinary actions taken would look like:
student_id  infraction_type    day  disciplinary_action_gen
------------------------------------------------------------
1           tardy              0    D0
2           missed_assignment  0    D0
1           tardy              29   D1
2           missed_assignment  15   D1
1           tardy              99   D0
2           missed_assignment  29   D2
What SQL query could I use to do such a cumulative sum?
You can solve your problem by checking in the following order:
if <30 days have passed from the last two infractions, assign D2
if <30 days have passed from last infraction, assign D1
otherwise assign D0 (it is the first infraction, or the earlier ones have expired)
This will work assuming your DBMS supports the tools used for this solution, namely:
the CASE expression, to conditionally assign infraction values
the LAG window function, to retrieve the previous "day" values
SELECT *,
CASE WHEN day - LAG(day,2) OVER(PARTITION BY student_id
ORDER BY day ) < 30 THEN 'D2'
WHEN day - LAG(day,1) OVER(PARTITION BY student_id
ORDER BY day ) < 30 THEN 'D1'
ELSE 'D0'
END AS disciplinary_action_gen
FROM tab
Check a MySQL demo here.
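The CASE/LAG logic above can be traced by hand with a short Python sketch. The function name `assign_actions` and the tuple layout are assumptions made for this example; the rows are the sample data from the question.

```python
from collections import Counter, defaultdict


def assign_actions(infractions, window=30):
    """Label each (student_id, infraction_type, day) row with D0, D1 or D2.

    Mirrors the CASE expression: compare the current day with the
    student's second-previous day first, then with the previous day.
    """
    history = defaultdict(list)  # student_id -> earlier days, ascending
    labelled = []
    for student, kind, day in sorted(infractions, key=lambda r: (r[0], r[2])):
        prev = history[student]
        if len(prev) >= 2 and day - prev[-2] < window:
            action = "D2"
        elif prev and day - prev[-1] < window:
            action = "D1"
        else:
            action = "D0"
        labelled.append((student, kind, day, action))
        prev.append(day)
    return labelled


infractions = [
    (1, "tardy", 0), (2, "missed_assignment", 0),
    (1, "tardy", 29), (2, "missed_assignment", 15),
    (1, "tardy", 99), (2, "missed_assignment", 29),
]
labelled = assign_actions(infractions)
counts = Counter(action for *_, action in labelled)
print(sorted(counts.items()))
```

On the sample rows this reproduces the expected tally of three D0, two D1 and one D2.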
A similar approach using COUNT() as a window function and a frame definition -
SELECT
*,
CONCAT(
'D',
LEAST(
3,
COUNT(*) OVER (
PARTITION BY student_id
ORDER BY day ASC
RANGE BETWEEN 30 PRECEDING AND CURRENT ROW
)
) - 1
) AS disciplinary_action_gen
FROM infractions;
The frame definition (RANGE BETWEEN 30 PRECEDING AND CURRENT ROW) tells the server that we want to include all rows with a day value between (current row's value of day - 30) and (the current row's value of day). So, if the current row has a day value of 99, the count will be for all rows in the partition with a day value between 69 and 99.
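A minimal Python sketch of the frame count, assuming a list of one student's infraction days; the name `frame_action` is made up for this example. It also shows that the frame includes an infraction exactly 30 days old, so whether that matches the expiry rule depends on how "expire after 30 days" is read.

```python
def frame_action(days, current, preceding=30, cap=3):
    """Label the infraction on `current` given all of one student's days.

    Counts days in [current - preceding, current], both ends inclusive,
    like RANGE BETWEEN 30 PRECEDING AND CURRENT ROW, then caps at `cap`.
    """
    n = sum(1 for d in days if current - preceding <= d <= current)
    return f"D{min(cap, n) - 1}"


student_2 = [0, 15, 29]
print([frame_action(student_2, d) for d in student_2])

# Boundary case: an infraction exactly 30 days earlier is still in the frame.
print(frame_action([0, 30], 30))
```

If infractions should no longer count on day 30, using 29 as the offset would be one way to adjust it.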
To get the disciplinary counts, we can simply wrap this in a normal GROUP BY -
SELECT disciplinary_action, COUNT(*) AS count
FROM (
SELECT
CONCAT(
'D',
LEAST(
3,
COUNT(*) OVER (
PARTITION BY student_id
ORDER BY day ASC
RANGE BETWEEN 30 PRECEDING AND CURRENT ROW
)
) - 1
) AS disciplinary_action
FROM infractions
) t
GROUP BY disciplinary_action;
If your infractions are stored with a date, as opposed to the days in your example, this can be easily updated to use a date interval in the frame definition. And, if looking at counts of disciplinary actions in the last 100 days we need to include the previous 30 days, as these could impact the action (D0, D1 or D2) on the first day we are interested in.
SELECT disciplinary_action, COUNT(*) AS count
FROM (
SELECT
`date`,
CONCAT(
'D',
LEAST(
3,
COUNT(*) OVER (
PARTITION BY student_id
ORDER BY `date` ASC
RANGE BETWEEN INTERVAL 30 DAY PRECEDING AND CURRENT ROW
)
) - 1
) AS disciplinary_action
FROM infractions
WHERE `date` >= CURRENT_DATE - INTERVAL 130 DAY
) t
WHERE `date` >= CURRENT_DATE - INTERVAL 100 DAY
GROUP BY disciplinary_action;
Here's a db<>fiddle
I have a dataset that's just a list of orders made by customers each day.
order_date  month  week  customer
---------------------------------
2022-10-06  10     40    Paul
2022-10-06  10     40    Edward
2022-10-01  10     39    Erick
2022-09-26  9      39    Divine
2022-09-23  9      38    Alice
2022-09-21  9      38    Evelyn
My goal is to calculate the total number of unique customers within a two-week period. I can count the number of customers within a month or week period but not two weeks. Also, the two weeks are in a rolling order such that weeks 40 and 39 (as in the sample above) is one window period while weeks 39 and 38 is the next frame.
So far, this is how I am getting the monthly and weekly numbers. Assume that the customer names are distinct per day.
select order_date,
month,
week,
COUNT(DISTINCT customer) over (partition by month) month_active_outlets,
COUNT(DISTINCT customer) OVER (partition by week) week_active_outlets
from table
Again, I am unable to calculate the unique customer names within a two-week period.
I think the easiest approach would be to create your own grouper in a subquery and then use that to get your count. Currently, COUNT(DISTINCT ...) combined with ORDER BY in the window is not supported, so that approach wouldn't work.
A possible query could be:
WITH
week_before AS (
SELECT
EXTRACT(WEEK from order_date) as week, --to be sure this is the same week format
month,
CONCAT(week,'-', EXTRACT(WEEK FROM DATE_SUB(order_date, INTERVAL 7 DAY))) AS two_weeks,
customer
FROM
`test`.`Basic`)
SELECT
two_weeks,
COUNT(DISTINCT customer) AS unique_customer
FROM
week_before
GROUP BY
two_weeks
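For comparison, here is a small Python sketch of what the rolling result could look like on the sample rows. It assumes "rolling" means each week is paired with the week immediately before it, which is my reading of the question. The function name is invented, and week numbers that wrap around a year boundary are not handled.

```python
orders = [
    ("2022-10-06", 40, "Paul"),
    ("2022-10-06", 40, "Edward"),
    ("2022-10-01", 39, "Erick"),
    ("2022-09-26", 39, "Divine"),
    ("2022-09-23", 38, "Alice"),
    ("2022-09-21", 38, "Evelyn"),
]


def rolling_two_week_customers(rows):
    """Map each week to the number of distinct customers in it and the prior week."""
    by_week = {}
    for _order_date, week, customer in rows:
        by_week.setdefault(week, set()).add(customer)
    return {
        week: len(by_week[week] | by_week.get(week - 1, set()))
        for week in sorted(by_week)
    }


result = rolling_two_week_customers(orders)
print(result)
```

Each row contributes to two windows here (its own week and the following one), which is what makes the windows overlap.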
The window function is the right tool. To obtain the 2 week date, we first extract the week number of the year:
mod(extract(week from order_date),2)
If the week number is odd (modulo 2) we add a week. Then we trunc to the start of (the even) week.
date_trunc(date_add(order_date,interval mod(extract(week from order_date),2) week),week )
with tbl as
(Select date("2022-10-06") as order_date, "Paul" as customer
union all select date("2022-10-06"),"Edward"
union all select date("2022-10-01"),"Erick"
union all select date("2022-09-26"),"Divine"
union all select date("2022-09-23"),"Alice"
union all select date("2022-09-21"),"Evelyn"
)
select *,
date_trunc(order_date,month) as month,
date_trunc(order_date,week) as week,
COUNT(DISTINCT customer) OVER week2 as customer_2weeks,
string_agg(cast(order_date as string)) over week2 as list_2weeks,
from tbl
window week2 as (partition by date_trunc(date_add(order_date,interval mod(extract(week from order_date),2) week),week ))
The first days of a year are counted to the last week of the previous year:
select order_date,
extract(isoweek from order_date),
date_trunc(date_add(order_date,interval mod(extract(week from order_date),2) week),week)
from
unnest(generate_date_array(date("2021-12-01"),date("2023-01-14"))) order_date
order by 1
We have a dataset of contracts, with columns indicating counterparty, value, start date, and end date.
We are looking for a summary of total contract value, per counterparty, per calendar year.
Before we could apply a GROUP BY to the data, we would need a calculated column for each calendar year, with the contract value assigned pro rata.
Example: start date 30/06/2015, end date 31/12/2017, contract value €500.000
the contract is about 2,5 years, so a value of €200.000 [€500.000 / 2,5] is allocated (pro rata) per year.
in the year 2015, the value is for half a year, so is assigned about €100.000
in the year 2016, the value is for a full year, so is assigned about €200.000
in the year 2017, idem, so the value assigned is about €200.000
(The values in the example are not exact but simplified for illustrative purposes, as the time between 30/06/2015 and 31/12/2015 for instance is not exactly half a year, but this goes beyond the purpose of the issue)
There is no fixed contract length; some span years, others are daily or hourly (the dates are of data type 'timestamp').
How can we do this efficiently, without having to write a select clause for every single calendar year? The start and end dates namely span several decades.
The Oracle version is 19c.
You can use:
SELECT t.counterparty,
t.value AS total_value,
c.year_start,
ROUND(
t.value * (year_end - year_start) / (end_date - start_date),
2
)AS year_value
FROM table_name t
CROSS APPLY (
SELECT GREATEST(
ADD_MONTHS(TRUNC(start_date, 'YY'), 12 * (LEVEL - 1)),
start_date
) AS year_start,
LEAST(
ADD_MONTHS(TRUNC(start_date, 'YY'), 12 * LEVEL),
end_date
) AS year_end
FROM DUAL
CONNECT BY ADD_MONTHS(TRUNC(start_date, 'YY'), 12 * (LEVEL - 1)) < end_date
) c
Which, for the sample data:
CREATE TABLE table_name (counterparty, value, start_date, end_date) AS
SELECT 'A', 500000, DATE '2015-06-30', DATE '2017-12-31' FROM DUAL;
Outputs:
COUNTERPARTY  TOTAL_VALUE  YEAR_START           YEAR_VALUE
----------------------------------------------------------
A             500000       2015-06-30 00:00:00  101092.9
A             500000       2016-01-01 00:00:00  200000
A             500000       2017-01-01 00:00:00  198907.1
Note: amounts are calculated using the ratio of the number of days (and fractional days) in each year compared to the number of days in the entire period. There is more than half-a-year between 2015-06-30 00:00:00 and the start of the next year so it is assigned slightly more than 100,000 value. Similarly, the final year stops at 2017-12-31 00:00:00 which is slightly less than a full year so gets assigned slightly less than 200,000 value.
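The day-ratio arithmetic in the note can be checked with a short Python sketch. The function name `prorate_by_year` is an assumption for this example; it splits a value across calendar years in proportion to elapsed time, which is the same ratio the query uses.

```python
from datetime import datetime


def prorate_by_year(start, end, value):
    """Split value across calendar years in proportion to elapsed time.

    Returns a list of (segment_start, allocated_value) pairs.
    A zero-length or reversed period yields an empty list.
    """
    total = (end - start).total_seconds()
    if total <= 0:
        return []
    parts = []
    year = start.year
    while datetime(year, 1, 1) < end:
        seg_start = max(start, datetime(year, 1, 1))
        seg_end = min(end, datetime(year + 1, 1, 1))
        share = (seg_end - seg_start).total_seconds() / total
        parts.append((seg_start, round(value * share, 2)))
        year += 1
    return parts


parts = prorate_by_year(datetime(2015, 6, 30), datetime(2017, 12, 31), 500000)
for seg_start, amount in parts:
    print(seg_start, amount)
```

For the sample contract the period is 915 days long, split as 185, 366 and 364 days, which gives the three amounts in the output above.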
fiddle
I have a SQL query that pulls in three columns as below
employee_id start_date end_date hours
123 09-01-2019 09-02-2019 8
123 09-28-2019 10-01-2019 32
I want to rewrite the query so that, instead of going granular, I just get the sum(hours) an employee has at a year-month level, like below:
employee_id Year_Month hours
123 201909 32
123 201910 8
The employee has 4 days in September, so 4*8=32, and one day in October, so 8 hours for the month of October. My issue is when there are start and end dates that cross between adjacent months. I'm not sure how to write a query to get my desired output, and I'd really appreciate any help on this.
It might be simpler to use a recursive query to generate series of days in each month, then aggregate by month and count:
with
data as (< your existing query here >),
cte (employee_id, dt, max_dt) as (
select employee_id, start_date, end_date from data
union all
select employee_id, dt + 1, max_dt from cte where dt + 1 < max_dt
)
select employee_id, to_char(dt, 'yyyymm') year_months, count(*) * 8 hours
from cte
group by employee_id, to_char(dt, 'yyyymm')
This assumes 8 hours per day, as explained in your question.
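Here is a minimal Python sketch of the same expand-then-aggregate idea. The function name `hours_by_month` is made up, and the rows are hypothetical: like the recursive query, the sketch treats the end date as exclusive, so the second range ends on 2019-10-02 here in order to cover 1 October. Whether the end date should count as a worked day is an assumption to check against your data.

```python
from collections import Counter
from datetime import date, timedelta


def hours_by_month(ranges, hours_per_day=8):
    """Sum hours per (employee_id, 'YYYYMM'), one fixed block of hours per day.

    ranges: iterable of (employee_id, start_date, end_date), end exclusive.
    """
    totals = Counter()
    for employee_id, start, end in ranges:
        day = start
        while day < end:
            totals[(employee_id, day.strftime("%Y%m"))] += hours_per_day
            day += timedelta(days=1)
    return dict(totals)


# Hypothetical rows, with end dates read as exclusive.
ranges = [
    (123, date(2019, 9, 1), date(2019, 9, 2)),
    (123, date(2019, 9, 28), date(2019, 10, 2)),
]
totals = hours_by_month(ranges)
print(totals)
```

Expanding to one row per day first means a range that crosses a month boundary needs no special handling.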
I have a table of 'semesters' of variable lengths with variable breaks in between them with a constraint such that a 'start_date' is always greater than the previous 'end_date':
id start_date end_date
-----------------------------
1 2012-10-01 2012-12-20
2 2013-01-05 2013-03-28
3 2013-04-05 2013-06-29
4 2013-07-10 2013-09-20
And a table of students as follows, where a start date may occur at any time within a given semester:
id start_date n_weeks
-------------------------
1 2012-11-15 25
2 2013-02-12 8
3 2013-03-02 12
I am attempting to compute an 'end_date' by joining the 'students' on 'semesters' which takes into account the variable-length breaks in-between semesters.
I can draw in the previous semester's end date (ie from the previous row's end_date) and by subtraction find the number of days in-between semesters using the following:
SELECT start_date
, end_date
, lag(end_date) OVER (ORDER BY start_date) AS prev_end_date
, start_date - lag(end_date) OVER (ORDER BY start_date) AS days_break
FROM terms
ORDER BY start_date;
Clearly, if there were to be only two terms, it would simply be a matter of adding the 'break' in days (perhaps, cast to 'weeks') -- and thereby extend the 'end_date' by that same period of time.
But should 'n_weeks' for a given student span more than one term, how could such a query be structured ?
Been banging my head against a wall for the last couple of days and I'd be immensely grateful for any help anyone would be able to offer....
Many thanks.
Rather than just looking at the lengths of semesters or the gaps between them, you could generate a list of all the dates that are within a semester using generate_series(), like this:
SELECT
row_number() OVER (ORDER BY day) as day_number,
day
FROM
(
SELECT
generate_series(start_date, end_date, '1 day') as day
FROM
semesters
) as day_series
ORDER BY
day
(SQLFiddle demo)
This assigns each day that is during a semester an arbitrary but sequential "day number", skipping out all the gaps between semesters.
You can then use this as a sub-query/CTE JOINed to your table of students: first find the "day number" of their start date, then add 7 * n_weeks to find the "day number" of their end date, and finally join back to find the actual date for that "day number".
This assumes that there is no special handling needed for partial weeks - i.e. if n_weeks is 4, the student must be enrolled for 28 days which are within the duration of a semester. The approach could be adapted to measure weeks (pass 1 week as the last argument to generate_series()), with the additional step of finding which week the student's start_date falls into.
Here's a complete query (SQLFiddle demo here):
WITH semester_days AS
(
SELECT
semester_id,
row_number() OVER (ORDER BY day_date) as day_number,
day_date::date
FROM
(
SELECT
id as semester_id,
generate_series(start_date, end_date, '1 day') as day_date
FROM
semesters
) as day_series
ORDER BY
day_date
)
SELECT
S.id as student_id,
S.start_date,
SD_start.semester_id as start_semester_id,
S.n_weeks,
SD_end.day_date as end_date,
SD_end.semester_id as end_semester_id
FROM
students as S
JOIN
semester_days as SD_start
On SD_start.day_date = S.start_date
JOIN
semester_days as SD_end
On SD_end.day_number = SD_start.day_number + (7 * S.n_weeks)
ORDER BY
S.start_date
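The day-numbering idea can also be traced outside the database. Below is a minimal Python sketch using the sample semesters and students from the question. The function names are invented for this example, and returning `None` when the course would run past the last semester is my own choice.

```python
from datetime import date, timedelta

semesters = [
    (1, date(2012, 10, 1), date(2012, 12, 20)),
    (2, date(2013, 1, 5), date(2013, 3, 28)),
    (3, date(2013, 4, 5), date(2013, 6, 29)),
    (4, date(2013, 7, 10), date(2013, 9, 20)),
]


def semester_days(semesters):
    """List every in-semester date in order; the list index is the day number."""
    days = []
    for _id, start, end in sorted(semesters, key=lambda s: s[1]):
        day = start
        while day <= end:
            days.append(day)
            day += timedelta(days=1)
    return days


def student_end_date(start, n_weeks, days):
    """Advance 7 * n_weeks in-semester days from start, skipping the breaks."""
    target = days.index(start) + 7 * n_weeks  # ValueError if start is in a break
    return days[target] if target < len(days) else None


days = semester_days(semesters)
for student_id, start, n_weeks in [
    (1, date(2012, 11, 15), 25),
    (2, date(2013, 2, 12), 8),
    (3, date(2013, 3, 2), 12),
]:
    print(student_id, student_end_date(start, n_weeks, days))
```

Student 2, for example, has 44 in-semester days left after 2013-02-12 in semester 2, so the remaining 12 of the 56 days are counted from 2013-04-05 onward.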