PL/SQL: How to sum a set of values if they fall within a specific time frame?

I have a query (below) that shows the number of terminations since 1/1/17 in one column and the associated date of the terminations in the only other column. If there were no terminations on a specific date, then there is no record for that date.
I want to create rolling 12-month time buckets and sum the number of terminations in those time buckets.
For example, the most recent time bucket would have an ending date of 11:59pm on 6/30/22. The start of that time bucket would start midnight on 7/1/21. I want to sum the number of terminations in that time bucket.
I need to create 12-month time buckets and the associated number of terminations for the last 60 months, resulting in 60 time buckets.
Here is my current query:
select
count(distinct employee_number) Number_of_terminations
, to_char(term_date, 'MM/DD/YYYY') term_date
from
(
select paa.person_id
,max(paa.effective_end_date)+1 term_date
,pap.employee_number
from
apps.per_all_assignments_f paa
, apps.per_assignment_status_types past
,(select distinct paa.person_id
from
apps.per_all_assignments_f paa
, apps.per_assignment_status_types past
where paa.assignment_status_type_id = past.assignment_status_type_id
and sysdate between paa.effective_start_date and paa.effective_end_date
and past.user_status in ('Active Assignment','Transitional - Active','Transitional - Inactive','Sabbatical','Sabbatical 50%')) active_person
, apps.per_all_people_f pap
, apps.hr_organization_units org
,(select case when orgp.name = 'Random University' then orgc.attribute1 else orgp.attribute1 end unit_number
,case when orgp.name = 'Random State University' then orgc.name else orgp.name end unit_name
,orgc.attribute1 dept_number
,orgc.name dept_name
from apps.per_org_structure_elements_v2 pose
,apps.per_org_structure_versions posv
,apps.hr_all_organization_units orgp
,apps.hr_all_organization_units orgc
where pose.org_structure_version_id = posv.org_structure_version_id
and pose.organization_id_parent = orgp.organization_id
and pose.organization_id_child = orgc.organization_id
and trunc(sysdate) between posv.date_from and nvl(posv.date_to,'31-dec-4712')
and pose.org_structure_hierarchy = 'Units'
order by case when orgp.name = 'Colorado State University' then orgc.attribute1 else orgp.attribute1 end
,orgc.attribute1) u
, apps.per_jobs pj
, apps.per_job_definitions pjd
where paa.assignment_status_type_id = past.assignment_status_type_id
and paa.person_id = active_person.person_id(+)
and active_person.person_id is null
and past.user_status in ('Active Assignment','Transitional - Active','Transitional - Inactive','Sabbatical','Sabbatical 50%')
and pap.person_id = paa.person_id
and paa.organization_id = org.organization_id
and org.attribute1 = u.dept_number(+)
and paa.job_id = pj.job_id
and pj.job_definition_id = pjd.job_definition_id
and pap.employee_number is not null
and (
paa.effective_end_date like '%17' or
paa.effective_end_date like '%18' or
paa.effective_end_date like '%19' or
paa.effective_end_date like '%20' or
paa.effective_end_date like '%21' or
paa.effective_end_date like '%22'
)
group by paa.person_id
, pap.employee_number
) terms
--group by substr(term_date, 4, 6)
group by to_char(term_date, 'MM/DD/YYYY')
Here are the first rows of the results: [image not reproduced]
In Excel, the first sum would be calculated like this: [Excel example image not reproduced]

I don't have your data and I don't want to spend time generating some test data to match that monster query but here is a simplified example explaining how to do this:
Create a calendar table: 1 record per bucket (monthly) with start and end date.
CREATE TABLE last_60_months (start_dt, end_dt)
AS
(SELECT TRUNC(ADD_MONTHS(SYSDATE,-LEVEL+1), 'MON'), TRUNC(ADD_MONTHS(SYSDATE,-LEVEL+13), 'MON') - 1 FROM DUAL
CONNECT BY LEVEL < 61
);
Create a test table with 10000 employees and a termination date within the test buckets boundaries:
CREATE table test_emps (employee_number NUMBER, term_date DATE);
DECLARE
l_dt DATE;
l_min_dt DATE;
l_max_dt DATE;
BEGIN
SELECT MIN(start_dt), MAX(start_dt) INTO l_min_dt, l_max_dt FROM last_60_months;
FOR r IN 1 .. 10000 LOOP
SELECT TO_DATE(
TRUNC(
DBMS_RANDOM.VALUE(TO_CHAR(l_min_dt,'J')
,TO_CHAR(l_max_dt,'J')
)
),'J'
)
INTO l_dt
FROM DUAL;
INSERT INTO test_emps (employee_number, term_date) VALUES (r, l_dt );
END LOOP;
COMMIT;
END;
/
Put it all together:
SELECT COUNT(e.employee_number) as "Number_of_terminations", d.start_dt, d.end_dt
FROM test_emps e JOIN last_60_months d ON e.term_date BETWEEN d.start_dt AND d.end_dt
GROUP BY start_dt, end_dt
ORDER BY start_dt;
It should be trivial to use this technique for your own data.
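To make the bucket boundaries concrete, here is a minimal Python sketch of the same idea, outside the database. It assumes each bucket is a half-open 12-month window [start, end) that slides back one month at a time, with the most recent bucket covering 7/1/21 through 6/30/22 as in the question. The helper names (add_months, rolling_buckets, count_per_bucket) and the sample dates are made up for illustration.

```python
from datetime import date

def add_months(d: date, n: int) -> date:
    """Shift a first-of-month date by n months (n may be negative)."""
    idx = d.year * 12 + (d.month - 1) + n
    return date(idx // 12, idx % 12 + 1, 1)

def rolling_buckets(last_month_start: date, count: int = 60, width: int = 12):
    """Return (start, end_exclusive) pairs; each bucket ends one month earlier than the previous one."""
    buckets = []
    for i in range(count):
        end_exclusive = add_months(last_month_start, 1 - i)
        start = add_months(end_exclusive, -width)
        buckets.append((start, end_exclusive))
    return buckets

def count_per_bucket(term_dates, buckets):
    """Count the termination dates that fall inside each half-open bucket."""
    return [(s, e, sum(1 for t in term_dates if s <= t < e)) for s, e in buckets]

# Hypothetical termination dates placed right at the bucket edges.
terms = [date(2021, 7, 1), date(2022, 6, 30), date(2021, 6, 30), date(2022, 7, 1)]
buckets = rolling_buckets(date(2022, 6, 1), count=3)
result = count_per_bucket(terms, buckets)
for start, end_exclusive, n in result:
    print(start, end_exclusive, n)
```

Using an exclusive end date avoids the "11:59pm" problem: a termination stamped at any time on 6/30/22 is still before 7/1/22.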

Related

SQL Rowwise comparison between groups

Question
The following is a snippet of my data:
Create Table Emps(person VARCHAR(50), started DATE, stopped DATE);
Insert Into Emps Values
('p1','2015-10-10','2016-10-10'),
('p1','2016-10-11','2017-10-11'),
('p1','2017-10-12','2018-10-13'),
('p2','2019-11-13','2019-11-13'),
('p2','2019-11-14','2020-10-14'),
('p3','2020-07-15','2021-08-15'),
('p3','2021-08-16','2022-08-16');
db<>fiddle.
I want to use T-SQL to get a count of how many persons fulfil the following criteria at least once - multiples should also count as one:
For a person:
One of the dates in 'started' (say s1) is larger than at least one of the dates in 'stopped' (say e1)
s1 and e1 are in the same year, to be set manually - e.g. '2021-01-01' until '2022-01-01'
Example expected response
If I put the date range '2016-01-01' until '2017-01-01' somewhere in a WHERE / HAVING clause, the output should be 1 as only p1 has both a start date and an end date that fall in 2016 where the start date is larger than the end date:
s1 = '2016-10-11', and e1 = '2016-10-10'.
Why can't I do this myself
The reason I'm stuck is that I don't know how to do this rowwise comparison between groups. The question requires comparing values across columns (start with end) across rows, within a person ID.
Use conditional aggregation to get the maximum start date and the minimum stop date in the given range.
select person
from emps
group by person
having max(case when started >= '2016-01-01' and started < '2017-01-01'
then started end) >
min(case when stopped >= '2016-01-01' and stopped < '2017-01-01'
then stopped end);
Demo: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=45adb153fcac9ce72708f1283cac7833
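If you want to try the conditional aggregation without a SQL Server instance, the sketch below runs the same HAVING logic against an in-memory SQLite database from Python and wraps it in a COUNT to get the number of persons. This is an illustration under assumptions: SQLite stands in for T-SQL, the dates are stored as ISO text (which compares correctly as strings), and count_matching is a made-up helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emps (person TEXT, started TEXT, stopped TEXT)")
conn.executemany(
    "INSERT INTO emps VALUES (?, ?, ?)",
    [
        ("p1", "2015-10-10", "2016-10-10"),
        ("p1", "2016-10-11", "2017-10-11"),
        ("p1", "2017-10-12", "2018-10-13"),
        ("p2", "2019-11-13", "2019-11-13"),
        ("p2", "2019-11-14", "2020-10-14"),
        ("p3", "2020-07-15", "2021-08-15"),
        ("p3", "2021-08-16", "2022-08-16"),
    ],
)

def count_matching(conn, lo, hi):
    """Count persons whose latest start in [lo, hi) is after their earliest stop in [lo, hi)."""
    sql = """
        SELECT COUNT(*) FROM (
            SELECT person
            FROM emps
            GROUP BY person
            HAVING MAX(CASE WHEN started >= :lo AND started < :hi THEN started END)
                 > MIN(CASE WHEN stopped >= :lo AND stopped < :hi THEN stopped END)
        )
    """
    return conn.execute(sql, {"lo": lo, "hi": hi}).fetchone()[0]

print(count_matching(conn, "2016-01-01", "2017-01-01"))
```

A person with no start or no stop in the range produces a NULL on one side of the comparison, so the HAVING condition is not true and the person is left out.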
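I would use a correlated EXISTS subquery (a semi-join against the same table); it should be pretty much the most performant, all things being equal. Count distinct persons so that multiple matches for one person count once, compare the start date in the outer row with the stop date in the inner row, and use a half-open range so that '2017-01-01' itself is excluded.
select Count(distinct e.person)
from emps e
where exists (
select * from emps e2
where e2.person = e.person
and e.started > e2.stopped
and e.started >= '20160101' and e.started < '20170101'
and e2.stopped >= '20160101' and e2.stopped < '20170101'
);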
You said you plan to set the dates manually, so this version sets the date range in two CTEs: one restricts the start dates to the range and the other restricts the stop dates. It then takes the max start and the min stop per person and compares them in the WHERE clause of the final query.
with min_max_start as (
select person,
min(started) as min_start, --not used
max(started) as max_start
from emps
where started >= '2016-01-01' and started < '2017-01-01'
group by person
),
min_max_end as (
select person,
min(stopped) as min_stop,
max(stopped) as max_stop --not used
from emps
where stopped >= '2016-01-01' and stopped < '2017-01-01'
group by person
)
select count(distinct e.person)
from emps e
join min_max_start mms
on e.person = mms.person
join min_max_end mme
on e.person = mme.person
where mms.max_start > mme.min_stop;
Output: 1
Try the following:
With CTE as
(
Select D.person, D.started, T.stopped,
case
when Year(D.started) = Year(T.stopped) and D.started > T.stopped
then 1
else 0
end as chk
From
(Select person, started From Emps Where started >= '2016-01-01') D
Join
(Select person, stopped From Emps Where stopped < '2017-01-01') T
On D.person = T.person
)
Select Count(Distinct person) as CNT
From CTE
Where chk = 1;
To get the employee list who met the criteria use the following on the CTE instead of the above Select Count... query:
Select person, started, stopped
From CTE
Where chk = 1;
See a demo from db<>fiddle.

SQLite query to find datetime difference between multiple rows

Here are my two tables' structures in SQLite
CREATE TABLE user
(
id integer PRIMARY KEY,
name TEXT
);
CREATE TABLE attendanceTable
(
id Integer,
mydate datetime,
startJob boolean
);
If startJob is 1, the employee is starting the job; if startJob is 0, the employee is stopping the job.
attendanceTable is sorted by the mydate column.
I want the output to be the hours worked by each employee.
The input to the query can be two different dates, e.g. 2021-08-20 and 2021-08-22,
from which I want to know how much each person has worked.
Output should be:
[id, name, userWorkedTime]
[1, Alice, 09:00]
[2, Bob, 07:00]
12:00 to 16:00 + 22:00 to 03:00 = 9 hours
13:00 to 17:00 + 12:00 to 15:00 = 7 hours
Input of query 2021-08-20 and 2021-08-21 - output should be:
[id, name, userWorkedTime]
[1, Alice, 09:00]
[2, Bob, 04:00]
12:00 to 16:00 + 22:00 to 03:00 = 9 hours
13:00 to 17:00 = 4 hours
It is possible that Alice starts her job at 11 PM and ends it at 3 AM the next day (so the working time would be 4 hours).
I believe that the following will accomplish the results you desire:-
WITH
/* The date selection parameters - change as necessary */
cte_selection(selection_start,selection_end) AS (SELECT '2020-08-20','2020-08-22'),
/* Extract data per shift - aka combine start and end
note that extract is 1 day before and 1 day after actual selection criteria
as previous/subsequent days may be relevant
*/
cte_part1(userid,name,periodstart,periodend,duration) AS
(
SELECT
user.id,
name,
strftime('%s',mydate),
strftime('%s',
(
SELECT mydate
FROM attendancetable
WHERE id = at.id
AND NOT startjob
AND mydate > at.mydate
ORDER BY mydate ASC
LIMIT 1
)
) AS endjob,
(strftime('%s',
(
SELECT mydate
FROM attendancetable
WHERE id = at.id
AND NOT startjob
AND mydate > at.mydate
ORDER BY mydate ASC
LIMIT 1
)
) - strftime('%s',at.mydate)) AS duration
FROM attendancetable AS at
JOIN user ON at.id = user.id
WHERE startjob
AND mydate
BETWEEN date
(
(SELECT selection_start FROM cte_selection)
,'-1 day'
)
AND date
(
(SELECT selection_end FROM cte_selection)
,'+1 day'
)
),
/* split times if period crosses a day*/
cte_part2(userid,name,periodstart,startdate,periodend,enddate,duration,startday_duration,nextday_duration) AS
(
SELECT
userid,
name,
periodstart,
date(periodstart,'unixepoch') AS startdate,
periodend,
date(periodend,'unixepoch') AS enddate,
duration,
CASE
WHEN date(periodstart,'unixepoch') = date(periodend,'unixepoch') THEN duration
ELSE strftime('%s',date(periodstart,'unixepoch')||'24:00:00') - periodstart
END AS startday_duration,
CASE
WHEN date(periodstart,'unixepoch') = date(periodend,'unixepoch') THEN 0
ELSE periodend - strftime('%s',date(periodend,'unixepoch')||'00:00:00')
END AS nextday_duration
FROM cte_part1
),
/* generate new rows for following days */
cte_part3(userid,name,periodstart,startdate,periodend,enddate,duration,startday_duration,nextday_duration) AS
(
SELECT
userid,
name,
strftime('%s',date(periodend,'unixepoch')||'00:00:00'),
date(periodend,'unixepoch'),
periodend,
enddate,
nextday_duration,
nextday_duration,
0
FROM cte_part2
WHERE nextday_duration
),
/* combine both sets */
cte_part4 AS (SELECT * FROM cte_part2 UNION ALL SELECT * FROM cte_part3)
/* Group the final data */
SELECT *,time(sum(startday_duration),'unixepoch') AS time_worked
FROM cte_part4
WHERE startdate BETWEEN (SELECT selection_start FROM cte_selection) AND (SELECT selection_end FROM cte_selection) GROUP BY userid
;
Note: every column in the result except time_worked holds an arbitrary value taken from one row of each group. userid and name are still correct, because they are the same for every row in a group.
You can easily change the final query to include or exclude columns.
The full testing SQL being :-
DROP TABLE IF EXISTS user;
CREATE TABLE IF NOT EXISTS user (id integer PRIMARY KEY,name TEXT);
DROP TABLE IF EXISTS attendanceTable ;
CREATE TABLE attendanceTable(id Integer,mydate datetime,startJob boolean);
INSERT INTO user VALUES (1,'Alice'),(2,'Bob');
INSERT INTO attendanceTable VALUES
(1,'2020-08-20 12:00:00',1),
(2,'2020-08-20 13:00:00',1),
(1,'2020-08-20 16:00:00',0),
(2,'2020-08-20 17:00:00',0),
(1,'2020-08-20 22:00:00',1),
(1,'2020-08-21 03:00:00',0),
(2,'2020-08-22 12:00:00',1),
(2,'2020-08-22 15:00:00',0)
;
WITH
/* The date selection parameters - change as necessary */
cte_selection(selection_start,selection_end) AS (SELECT '2020-08-20','2020-08-22'),
/* Extract data per shift - aka combine start and end
note that extract is 1 day before and 1 day after actual selection criteria
as previous/subsequent days may be relevant
*/
cte_part1(userid,name,periodstart,periodend,duration) AS
(
SELECT
user.id,
name,
strftime('%s',mydate),
strftime('%s',
(
SELECT mydate
FROM attendancetable
WHERE id = at.id
AND NOT startjob
AND mydate > at.mydate
ORDER BY mydate ASC
LIMIT 1
)
) AS endjob,
(strftime('%s',
(
SELECT mydate
FROM attendancetable
WHERE id = at.id
AND NOT startjob
AND mydate > at.mydate
ORDER BY mydate ASC
LIMIT 1
)
) - strftime('%s',at.mydate)) AS duration
FROM attendancetable AS at
JOIN user ON at.id = user.id
WHERE startjob
AND mydate
BETWEEN date
(
(SELECT selection_start FROM cte_selection)
,'-1 day'
)
AND date
(
(SELECT selection_end FROM cte_selection)
,'+1 day'
)
),
/* split times if period crosses a day*/
cte_part2(userid,name,periodstart,startdate,periodend,enddate,duration,startday_duration,nextday_duration) AS
(
SELECT
userid,
name,
periodstart,
date(periodstart,'unixepoch') AS startdate,
periodend,
date(periodend,'unixepoch') AS enddate,
duration,
CASE
WHEN date(periodstart,'unixepoch') = date(periodend,'unixepoch') THEN duration
ELSE strftime('%s',date(periodstart,'unixepoch')||'24:00:00') - periodstart
END AS startday_duration,
CASE
WHEN date(periodstart,'unixepoch') = date(periodend,'unixepoch') THEN 0
ELSE periodend - strftime('%s',date(periodend,'unixepoch')||'00:00:00')
END AS nextday_duration
FROM cte_part1
),
/* generate new rows for following days */
cte_part3(userid,name,periodstart,startdate,periodend,enddate,duration,startday_duration,nextday_duration) AS
(
SELECT
userid,
name,
strftime('%s',date(periodend,'unixepoch')||'00:00:00'),
date(periodend,'unixepoch'),
periodend,
enddate,
nextday_duration,
nextday_duration,
0
FROM cte_part2
WHERE nextday_duration
),
/* combine both sets */
cte_part4 AS (SELECT * FROM cte_part2 UNION ALL SELECT * FROM cte_part3)
/* Group the final data */
SELECT *,time(sum(startday_duration),'unixepoch') AS time_worked
FROM cte_part4
WHERE startdate BETWEEN (SELECT selection_start FROM cte_selection) AND (SELECT selection_end FROM cte_selection) GROUP BY userid
;
DROP TABLE IF EXISTS user;
DROP TABLE IF EXISTS attendanceTable ;
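For comparison, here is a rough Python sketch of the same calculation done outside SQLite. Instead of splitting each shift at midnight and then filtering by day, it pairs each start with the next stop for the same user and clips the shift to the selected window, which gives the same totals for the test data above. The function names (worked_seconds, as_hhmm) are made up, and the sketch assumes timestamps are 'YYYY-MM-DD HH:MM:SS' text and that every start is eventually followed by a stop.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def worked_seconds(events, first_day, last_day):
    """Sum seconds worked per user between first_day 00:00 and the end of last_day."""
    window_start = datetime.strptime(first_day, "%Y-%m-%d")
    window_end = datetime.strptime(last_day, "%Y-%m-%d") + timedelta(days=1)
    open_shift = {}              # user id -> start time of the shift in progress
    totals = defaultdict(float)  # user id -> seconds worked inside the window
    for user_id, stamp, is_start in sorted(events, key=lambda e: e[1]):
        moment = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if is_start:
            open_shift[user_id] = moment
        elif user_id in open_shift:
            # Clip the shift to the window; a shift may cross midnight.
            begin = max(open_shift.pop(user_id), window_start)
            end = min(moment, window_end)
            if end > begin:
                totals[user_id] += (end - begin).total_seconds()
    return dict(totals)

def as_hhmm(seconds):
    """Format a number of seconds as HH:MM."""
    minutes = int(seconds) // 60
    return f"{minutes // 60:02d}:{minutes % 60:02d}"

events = [
    (1, "2020-08-20 12:00:00", 1), (2, "2020-08-20 13:00:00", 1),
    (1, "2020-08-20 16:00:00", 0), (2, "2020-08-20 17:00:00", 0),
    (1, "2020-08-20 22:00:00", 1), (1, "2020-08-21 03:00:00", 0),
    (2, "2020-08-22 12:00:00", 1), (2, "2020-08-22 15:00:00", 0),
]
totals = worked_seconds(events, "2020-08-20", "2020-08-22")
for user_id in sorted(totals):
    print(user_id, as_hhmm(totals[user_id]))
```

Unlike time(..., 'unixepoch') in the SQL version, the HH:MM formatting here does not wrap around once a total passes 24 hours.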

SQL Case Statements with Multiple Max Conditions

I am currently working with two conditions that I would like to combine into one, but ran into some trouble. I have a dataset that includes quantity and date. I have created a date flag in the form of a case statement that flags whether a date is the last day of the week, giving it a "Y" or "N". The end result that I need is the last DATE of the week.
My end result/goal is Column D
Here is my current source code:
select
pos.quantity_on_hand,
d.cal_date,
case
when date_key in( Select max(date_key) from edw.D_dates group by fiscal_year_nbr, fiscal_week_nbr)
then 'Y'
else 'N'
end Week_end_flag
from
edw.f_pos_daily pos,
edw.d_dates d
where
pos.pos_date_key = d.date_key
I then create another custom column in PowerBI Desktop that looks like this:
This is what I used for my column calculation:
Last Inventory Date = RETURN(CALCULATE(MAXX(Inventory, Inventory[Cal_date]), filter ('D_Dates', 'D_Dates'[Week_end_flag]="Y")).
I tried to combine them into one, with something like this, but have failed:
case
when date_key in( Select max(date_key) from edw.D_dates group by fiscal_year_nbr, fiscal_week_nbr)
then MAX (cal_date) from edw.D_Dates where cal_date< current_date AS 'yyyy-mm-dd'
else 'N'
end Week_End_flag
Use a SELECT inside the THEN clause, wrapped in parentheses so that it is a scalar subquery.
Change this line:
then MAX (cal_date) from edw.D_Dates where cal_date< current_date AS 'yyyy-mm-dd'
to:
then (SELECT cast(MAX(cal_date) as varchar(10)) from edw.D_Dates where cal_date < current_date)
The trailing AS 'yyyy-mm-dd' is dropped because an alias cannot follow a WHERE condition. The cast keeps the THEN branch a string like the 'N' in the ELSE branch; the exact date-to-string function and output format depend on your database.
Complete code:
select
pos.quantity_on_hand,
d.cal_date,
case
when date_key in( Select max(date_key) from edw.D_dates group by fiscal_year_nbr, fiscal_week_nbr)
then (SELECT cast(MAX(cal_date) as varchar(10)) from edw.D_Dates where cal_date < current_date)
else 'N'
end Week_end_flag
from
edw.f_pos_daily pos,
edw.d_dates d
where
pos.pos_date_key = d.date_key
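If the goal is simply "for every date, the last available date of its week", the logic can be sketched in a few lines of Python, which may help when checking what the SQL should return. This is only an illustration under assumptions: ISO calendar weeks stand in for the fiscal_year_nbr / fiscal_week_nbr columns, and last_date_per_week is a made-up helper name.

```python
from datetime import date, timedelta

def last_date_per_week(dates):
    """Map each date to the latest date that shares its (year, week) key."""
    week_end = {}
    for d in dates:
        key = d.isocalendar()[:2]  # (ISO year, ISO week number)
        if key not in week_end or d > week_end[key]:
            week_end[key] = d
    return {d: week_end[d.isocalendar()[:2]] for d in dates}

# Ten consecutive days starting on Monday 2022-06-06 (hypothetical calendar rows).
calendar = [date(2022, 6, 6) + timedelta(days=i) for i in range(10)]
mapping = last_date_per_week(calendar)
for d in calendar:
    print(d, mapping[d])
```

In SQL this corresponds to taking the maximum cal_date within each fiscal year and week and attaching it to every row of that week, rather than returning a single overall maximum.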

How to solve a nested aggregate function in SQL?

I'm trying to use a nested aggregate function. I know that SQL does not support it, but I really need to do something like the below query. Basically, I want to count the number of users for each day. But I want to only count the users that haven't completed an order within a 15 days window (relative to a specific day) and that have completed any order within a 30 days window (relative to a specific day). I already know that it is not possible to solve this problem using a regular subquery (it does not allow to change subquery values for each date). The "id" and the "state" attributes are related to the orders. Also, I'm using Fivetran with Snowflake.
SELECT
db.created_at::date as Date,
count(case when
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-15,Date) and dateadd(day,-1,Date)) then db.id end)
= 0) and
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-30,Date) and dateadd(day,-16,Date)) then db.id end)
> 0) then db.user end)
FROM
data_base as db
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
In other words, I want to transform the below query in a way that the "current_date" changes for each date.
WITH completed_15_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-15,current_date) and dateadd(day,-1,current_date)
group by User
),
completed_16_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-30,current_date) and dateadd(day,-16,current_date)
group by User
)
SELECT
date(db.created_at) as Date,
count(distinct case when comp_15.Completed = 0 and comp_16.Completed > 0 then comp_15.user end) as "Total Users Churn",
count(distinct case when comp_15.Completed > 0 then comp_15.user end) as "Total Users Active",
week(Date) as Week
FROM
data_base as db
left join completed_15_days_before as comp_15 on comp_15.user = db.user
left join completed_16_days_before as comp_16 on comp_16.user = db.user
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
Does anyone have a clue on how to solve this puzzle? Thank you very much!
The following should give you roughly what you want. It is difficult to test without sample data, but it should be a good enough starting point for you to amend until it gives exactly what you need.
I've commented the code to explain what each section is doing.
-- set parameter for the first date you want to generate the resultset for
set start_date = TO_DATE('2020-01-01','YYYY-MM-DD');
-- calculate the number of days between the start_date and the current date
set num_days = (Select datediff(day, $start_date , current_date()+1));
--generate a list of all the dates from the start date to the current date
-- i.e. every date that needs to appear in the resultset
WITH date_list as (
select
dateadd(
day,
'-' || row_number() over (order by null),
dateadd(day, '+1', current_date())
) as date_item
from table (generator(rowcount => ($num_days)))
)
--Create a list of all the orders that are in scope
-- i.e. 30 days before the start_date up to the current date
-- amend WHERE clause to in/exclude records as appropriate
,order_list as (
SELECT created_at, rt_id
from data_base
where created_at between dateadd(day,-30,$start_date) and current_date()
and state = 'finished'
)
SELECT dl.date_item
,COUNT (DISTINCT ol30.RT_ID) AS USER_COUNT
,COUNT (ol30.RT_ID) as ORDER_COUNT
FROM date_list dl
-- get all orders between -30 and -16 days of each date in date_list
left outer join order_list ol30 on ol30.created_at between dateadd(day,-30,dl.date_item) and dateadd(day,-16,dl.date_item)
-- exclude records that have the same RT_ID as in the ol30 dataset but have a date between 0 and -15 of the date in date_list
WHERE NOT EXISTS (SELECT ol15.RT_ID
FROM order_list ol15
WHERE ol30.RT_ID = ol15.RT_ID
AND ol15.created_at between dateadd(day,-15,dl.date_item) and dl.date_item)
GROUP BY dl.date_item
ORDER BY dl.date_item;
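To check the numbers coming out of the query, it can help to have a slow but obvious reference implementation. The Python sketch below follows the windows from the question: for each day, a user counts as churned if they have a finished order 30 to 16 days before that day and none 15 to 1 days before it. The function name churned_users_per_day and the sample orders are made up, and the input is assumed to contain finished orders only.

```python
from datetime import date, timedelta

def churned_users_per_day(orders, first_day, last_day):
    """orders: iterable of (user, order_date) pairs for finished orders only."""
    by_user = {}
    for user, day in orders:
        by_user.setdefault(user, set()).add(day)

    result = {}
    day = first_day
    while day <= last_day:
        churned = 0
        for days in by_user.values():
            # Finished order in the 15 days before `day` (day itself excluded).
            recent = any(day - timedelta(days=15) <= d <= day - timedelta(days=1) for d in days)
            # Finished order 30 to 16 days before `day`.
            older = any(day - timedelta(days=30) <= d <= day - timedelta(days=16) for d in days)
            if older and not recent:
                churned += 1
        result[day] = churned
        day += timedelta(days=1)
    return result

# Hypothetical finished orders for two users.
orders = [("a", date(2020, 1, 1)), ("b", date(2020, 1, 1)), ("b", date(2020, 1, 20))]
churn = churned_users_per_day(orders, date(2020, 1, 10), date(2020, 2, 5))
print(churn[date(2020, 1, 17)], churn[date(2020, 1, 25)])
```

The point is that the two windows are recomputed relative to every date in the list, which is what the date_list join does in the SQL.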

List all months with a total regardless of null

I have a very small SQL table that lists courses attended and the date of attendance. I can use the code below to count the attendees for each month
select to_char(DATE_ATTENDED,'YYYY/MM'),
COUNT (*)
FROM TRAINING_COURSE_ATTENDED
WHERE COURSE_ATTENDED = 'Fire Safety'
GROUP BY to_char(DATE_ATTENDED,'YYYY/MM')
ORDER BY to_char(DATE_ATTENDED,'YYYY/MM')
This returns a list as expected for each month that has attendees. However I would like to list it as
January 2
February 0
March 5
How do I show the count results along with the nulls? My table is very basic
1234 01-JAN-15 Fire Safety
108 01-JAN-15 Fire Safety
1443 02-DEC-15 Healthcare
1388 03-FEB-15 Emergency
1355 06-MAR-15 Fire Safety
1322 09-SEP-15 Fire Safety
1234 11-DEC-15 Fire Safety
I just need to display each month and the total attendees for Fire Safety only. Not used SQL developer for a while so any help appreciated.
You would need a calendar table to select the period you want to display. Put the course filter in the join condition rather than in the WHERE clause, and count a column from the outer-joined table, so that months without attendees still appear with 0. Simplified code would look like this:
select to_char(c.Date_dt,'YYYY/MM')
, COUNT (tca.COURSE_ATTENDED)
FROM calendar c
left join TRAINING_COURSE_ATTENDED tca
on tca.DATE_ATTENDED = c.Date_dt
and tca.COURSE_ATTENDED = 'Fire Safety'
WHERE c.Date_dt between [period_start_dt] and [period_end_dt]
GROUP BY to_char(c.Date_dt,'YYYY/MM')
ORDER BY to_char(c.Date_dt,'YYYY/MM')
You can generate the required year-months on the fly with a count of 0 and union them with the real counts, as in the query below.
Select yrmth,sum(counter) from
(
select to_char(date_attended,'YYYYMM') yrmth,
COUNT (1) counter
From TRAINING_COURSE_ATTENDED Where COURSE_ATTENDED = 'Fire Safety'
Group By to_char(date_attended,'YYYYMM')
Union All
Select To_Char(2015||Lpad(Rownum,2,0)),0 from Dual Connect By Rownum <= 12
)
group by yrmth
order by 1
If you want to show multiple years, just change the second query to
Select To_Char(Year||Lpad(Month,2,0)) , 0
From
(select Rownum Month from Dual Connect By Rownum <= 12),
(select 2015+Rownum-1 Year from Dual Connect By Rownum <= 3)
Try this :
SELECT Trunc(date_attended, 'MM') Month,
Sum(CASE
WHEN course_attended = 'Fire Safety' THEN 1
ELSE 0
END) Fire_Safety
FROM training_course_attended
GROUP BY Trunc(date_attended, 'MM')
ORDER BY Trunc(date_attended, 'MM')
Another way to generate a calendar table inline:
with calendar (month_start, month_end) as
( select add_months(date '2014-12-01', rownum)
, add_months(date '2014-12-01', rownum +1) - interval '1' second
from dual
connect by rownum <= 12 )
select to_char(c.month_start,'YYYY/MM') as course_month
, count(tca.course_attended) as attended
from calendar c
left join training_course_attended tca
on tca.date_attended between c.month_start and c.month_end
and tca.course_attended = 'Fire Safety'
group by to_char(c.month_start,'YYYY/MM')
order by 1;
(You could also have only the month start in the calendar table, and join on trunc(tca.date_attended,'MONTH') = c.month_start, though if you had indexes or partitioning on tca.date_attended that might be less efficient.)
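If you want to experiment with the calendar-plus-left-join pattern without an Oracle instance, here is a small Python sketch that runs it against an in-memory SQLite database. It is an illustration under assumptions: SQLite's recursive CTE replaces connect by, strftime replaces to_char, dates are stored as ISO text, and the emp_id column name is made up because the question does not name its columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE training_course_attended "
    "(emp_id INTEGER, date_attended TEXT, course_attended TEXT)"
)
conn.executemany(
    "INSERT INTO training_course_attended VALUES (?, ?, ?)",
    [
        (1234, "2015-01-01", "Fire Safety"),
        (108, "2015-01-01", "Fire Safety"),
        (1443, "2015-12-02", "Healthcare"),
        (1388, "2015-02-03", "Emergency"),
        (1355, "2015-03-06", "Fire Safety"),
        (1322, "2015-09-09", "Fire Safety"),
        (1234, "2015-12-11", "Fire Safety"),
    ],
)

sql = """
    WITH RECURSIVE calendar(month_start) AS (
        SELECT date('2015-01-01')
        UNION ALL
        SELECT date(month_start, '+1 month') FROM calendar
        WHERE month_start < '2015-12-01'
    )
    SELECT strftime('%Y/%m', c.month_start) AS course_month,
           COUNT(tca.course_attended)       AS attended
    FROM calendar c
    LEFT JOIN training_course_attended tca
           ON strftime('%Y-%m', tca.date_attended) = strftime('%Y-%m', c.month_start)
          AND tca.course_attended = 'Fire Safety'  -- filter in ON keeps empty months
    GROUP BY c.month_start
    ORDER BY c.month_start
"""
monthly_counts = conn.execute(sql).fetchall()
for course_month, attended in monthly_counts:
    print(course_month, attended)
```

Counting tca.course_attended instead of using COUNT(*) is what turns the unmatched calendar rows into 0 rather than 1.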