SQL Query to return data for the first of every month - sql

I want to find the number of staff in a department at the start of each month, for the last 12 months.
I can get the desired output using 12 separate queries and UNION ALL similar to below:
SELECT
o.DEP_ID
,COUNT(o.STAFF_ID) STAFF_COUNT
,TRUNC(SYSDATE,'MON') EFFECTIVE_DATE
FROM
OCCUPANCIES o
WHERE
o.START_DATE <= TRUNC(SYSDATE,'MON')
AND o.END_DATE >= TRUNC(SYSDATE,'MON')
GROUP BY
o.DEP_ID
,TRUNC(SYSDATE,'MON')
UNION ALL
SELECT
o.DEP_ID
,COUNT(o.STAFF_ID) STAFF_COUNT
,ADD_MONTHS(TRUNC(SYSDATE,'MON'),-1) EFFECTIVE_DATE
FROM
OCCUPANCIES o
WHERE
o.START_DATE <= ADD_MONTHS(TRUNC(SYSDATE,'MON'),-1)
AND o.END_DATE >= ADD_MONTHS(TRUNC(SYSDATE,'MON'),-1)
GROUP BY
o.DEP_ID
,ADD_MONTHS(TRUNC(SYSDATE,'MON'),-1)
This gives me output similar to the following:
Unfortunately my real query is very long, and editing it is becoming unwieldy to say the least because I am making the same changes in 12 places each time.
Is there a way of doing this in a single SELECT statement?
EDIT: I have uploaded an example to SQLFiddle

You can generate the list of effective dates in an inline view and join it to your table:
SELECT
o.DEP_ID
,COUNT(o.STAFF_ID) STAFF_COUNT
,dt.EFFECTIVE_DATE
FROM
OCCUPANCIES o,
(SELECT ADD_MONTHS(TRUNC(SYSDATE,'MON'), 1-LEVEL) EFFECTIVE_DATE
FROM dual
CONNECT BY LEVEL <=12) dt
WHERE
dt.EFFECTIVE_DATE BETWEEN o.START_DATE AND o.END_DATE
GROUP BY
o.DEP_ID
,dt.EFFECTIVE_DATE
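To see what the inline view dt contributes, it can be run on its own. This is a minimal sketch, assuming an Oracle database (it relies on dual and CONNECT BY LEVEL); the ORDER BY is added only for readability and is not part of the answer above.

```sql
-- One row per month: the first day of the current month
-- and of each of the 11 months before it.
SELECT ADD_MONTHS(TRUNC(SYSDATE, 'MON'), 1 - LEVEL) AS effective_date
FROM dual
CONNECT BY LEVEL <= 12
ORDER BY effective_date DESC;
```

Each of these 12 dates is then matched against every occupancy whose START_DATE to END_DATE range contains it, which is what replaces the 12 UNION ALL branches.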

Related

SQL - Query Year by Year

I have a table of employees' info, including their employment start and end date. I want to extract a list of employees who have been with the company for the full year, year by year, for the past ten years.
So for example, if I want to get a list of employees who've been with the company throughout 2010, I'll do a query like this:
SELECT employee_name FROM employees
WHERE employment_start_date < DATE '2010-01-01'
AND employment_end_date > DATE '2010-12-31'
Now, I could repeat this process manually 10 times for each year from 2010 to 2020 (and manually append the relevant year as an additional column), but surely there's an easier way to do this with a single SQL query?
More background info:
I'm actually trying to translate my Cypher query directly into an SQL query (because different companies use different database systems). Using Cypher, I'd do this:
WITH [2010,2011,2012,...,2019,2020] AS years
UNWIND years as y
MATCH (e:employees)
WHERE e.employment_start_date.year < y
AND e.employment_end_date.year > y
RETURN y, e.employee_name
So I'm trying to find an SQL equivalent for this
Sample table data:
|employee_name|employment_start_date|employment_end_date|
|:---:|:---:|:---:|
|John|2009-06-01|2015-03-02|
|Mary|2010-04-02|2014-03-07|
|Joseph|2011-03-02|2011-07-03|
|Stephen|2003-06-14|2011-03-07|
|Dew|2010-06-02|2012-02-06|
Desired Results:
|Year|employee_name|
|:---:|:---:|
|2010|John|
|2010|Stephen|
|2011|John|
|2011|Mary|
|2011|Dew|
You can use:
WITH years ( year ) AS (
SELECT DATE '2010-01-01' FROM DUAL
UNION ALL
SELECT ADD_MONTHS( year, 12 )
FROM years
WHERE year < DATE '2020-01-01'
)
SELECT y.year, e.employee_name
FROM employees e
INNER JOIN years y
ON ( e.employment_start_date <= y.year
AND e.employment_end_date >= ADD_MONTHS( y.year, 12 ) )
An alternative to MT0's suggestion:
WITH years (year) AS(
SELECT EXTRACT (YEAR FROM DATE '2010-01-01') + ROWNUM -1 AS "YEAR"
FROM dual
CONNECT BY ROWNUM <=10
)
SELECT y.year, e.employee_name
FROM employees e
INNER JOIN years y
ON (
EXTRACT(YEAR FROM employment_start_date) < y.year
AND EXTRACT(YEAR FROM employment_end_date) > y.year
)

How to fill value as zero when No data exists for particular week in oracle

I have a table with following structure.
Note_title varchar2(100)
Note_created_on date
Now, in a report, I want to show all notes created week by week, so I implemented the following solution for it.
SELECT to_char(Note_created_on - 7/24,'ww')||'/'||to_char(Note_created_on - 7/24,'yyyy') as Week ,
nvl(COUNT(Note_title),'0') as AMOUNT
FROM Notes
GROUP BY to_char(Note_created_on - 7/24,'ww') ,
to_char(Note_created_on -7/24,'yyyy')
ORDER BY to_char(Note_created_on - 7/24,'ww') DESC
I am getting correct output from it, but if, say, weeks 42 and 45 do not have any created notes, those weeks are simply missing from the result.
Sample Output:
WEEK AMOUNT
46/2018 3
44/2018 22
43/2018 45
41/2018 1
40/2018 2
39/2018 27
38/2018 23
So how can I get zero values for weeks 42 and 45 instead of leaving them out?
First you need to generate all the weeks for each year that has notes, then left join the Notes table onto those weeks and group by the generated weeks. E.g.:
with weeks
as ( select level as lvl /*Assume 52 weeks in a calendar year..*/
from dual
connect by level <=52
)
,weeks_year
as (select distinct
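to_char(b.lvl,'FM00')||'/'||to_char(a.Note_created_on - 7/24,'yyyy') as week_year_val /*Zero-padded week number plus the year of each note, so it matches the 'ww' and 'yyyy' formats used in the join below*/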
from Notes a
join weeks b
on 1=1
)
SELECT a.week_year_val as Week
,COUNT(Note_title) as AMOUNT
FROM weeks_year a
LEFT JOIN Notes b
ON a.week_year_val=to_char(b.Note_created_on - 7/24,'ww')||'/'||to_char(b.Note_created_on - 7/24,'yyyy')
GROUP BY a.week_year_val
ORDER BY a.week_year_val DESC
If you want to do this for the current year only, you can use the following SQL statement, which uses a RIGHT JOIN:
SELECT d.week as Week,
nvl(COUNT(Note_title), '0') as AMOUNT
FROM Notes
RIGHT JOIN
(SELECT lpad(level,2,'0')|| '/' ||to_char(sysdate,'yyyy') as week,
'0' as amount FROM dual CONNECT BY level <= 53) d
ON
( d.week =
to_char(Note_created_on - 7 / 24, 'ww') ||'/'||to_char(Note_created_on - 7 / 24, 'yyyy') )
GROUP BY d.week
ORDER BY d.week DESC;
P.S. There's a common belief that a year is composed of 52 weeks. That is true but truncated :) since 52 * 7 = 364, so I used 53.
For example, select to_char( date'2016-12-31' - 7 / 24, 'ww') from dual yields 53.
As mentioned by jarlh:
Create a list of weeks:
SELECT TO_CHAR(LEVEL, 'FM00')||'/2018' wk
FROM dual
CONNECT BY LEVEL <= 53
This query generates 53 rows, and LEVEL is just a number: 1, 2, ... up to 53. We format it to become 01/2018, 02/2018, ... 53/2018.
If you plan to use this query in other years, you'd be better off making the year dynamic:
SELECT TO_CHAR(LEVEL, 'FM00')||TO_CHAR(sysdate-7/24,'/YYYY') wk
FROM dual
CONNECT BY LEVEL <= 53
(Credits to Barbaros for pointing out that the last day of any year is reported by Oracle as being in week 53, or said another way 7*52 = 364)
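The week-53 point is easy to check directly. This is a small sketch, assuming Oracle's WW format (week 1 starts on 1 January and each week is seven days long); the dates are arbitrary examples.

```sql
-- 2018 is not a leap year: 30 December is day 364 (52 * 7),
-- 31 December is day 365 and therefore spills into week 53.
SELECT TO_CHAR(DATE '2018-12-30', 'WW') AS week_of_dec_30, -- '52'
       TO_CHAR(DATE '2018-12-31', 'WW') AS week_of_dec_31  -- '53'
FROM dual;
```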
We left join the notes data onto that list of weeks. I wasn't really clear on why you subtracted 7 hours from the date (time zone?), but I left it. I removed the complexity of the count, as you seem to only want the count of records in a particular week. I also removed the double to_char, because you can do it all in a single operation: there is no need for TO_CHAR(date, 'WW')||'/'||TO_CHAR(date,'YYYY'); you just to_char with WW/YYYY as the format. Our query now looks like:
SELECT lst.wk as week, COALESCE(amt, 0) as amount FROM
(
SELECT TO_CHAR(LEVEL, 'FM00')||TO_CHAR(sysdate-7/24,'/YYYY') wk
FROM dual
CONNECT BY LEVEL <= 53
) lst
LEFT OUTER JOIN
(
SELECT
to_char(Note_created_on - 7/24,'ww/yyyy') as wk,
COUNT(*) as amt
FROM Notes
GROUP BY to_char(Note_created_on - 7/24,'ww/yyyy')
) dat
ON lst.wk = dat.wk
ORDER BY lst.wk
For weeks where there are no notes, the left join records a null against that week, so we coalesce it to make it 0.
You can, of course, write the query in other ways (many ways); here's one for comparison:
SELECT lst.wk as week, COUNT(dat.wk) as amount FROM
(
SELECT TO_CHAR(LEVEL, 'FM00')||TO_CHAR(sysdate-7/24,'/YYYY') wk
FROM dual
CONNECT BY LEVEL <= 53
) lst
LEFT OUTER JOIN
(
SELECT
to_char(Note_created_on - 7/24,'ww/yyyy') as wk
FROM Notes
) dat
ON lst.wk = dat.wk
GROUP BY lst.wk
ORDER BY lst.wk
In this form we do the GROUP BY/COUNT after the join. By counting dat.wk, which for some lst.wk might be NULL, we can omit the COALESCE, because COUNT skips NULL values and so returns 0 for those weeks.
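The behaviour of COUNT with NULLs can be verified in isolation. This is a minimal sketch for Oracle, using a made-up two-row inline view in which one value is NULL (the column name v is purely illustrative).

```sql
-- COUNT(*) counts rows; COUNT(expr) counts only rows where expr is not NULL.
SELECT COUNT(*) AS all_rows,      -- 2
       COUNT(v) AS non_null_rows  -- 1
FROM (
  SELECT 1 AS v FROM dual
  UNION ALL
  SELECT NULL FROM dual
);
```

A week with no matching notes comes out of the left join as a single row whose dat.wk is NULL, so COUNT(dat.wk) reports 0 for it, whereas COUNT(*) would report 1.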

Same output in two different lateral joins

I'm working on a bit of PostgreSQL to grab the first 10 and last 10 invoices of every month between certain dates. I am having unexpected output in the lateral joins. Firstly the limit is not working, and each of the array_agg aggregates is returning hundreds of rows instead of limiting to 10. Secondly, the aggregates appear to be the same, even though one is ordered ASC and the other DESC.
How can I retrieve only the first 10 and last 10 invoices of each month group?
SELECT first.invoice_month,
array_agg(first.id) first_ten,
array_agg(last.id) last_ten
FROM public.invoice i
JOIN LATERAL (
SELECT id, to_char(invoice_date, 'Mon-yy') AS invoice_month
FROM public.invoice
WHERE id = i.id
ORDER BY invoice_date, id ASC
LIMIT 10
) first ON i.id = first.id
JOIN LATERAL (
SELECT id, to_char(invoice_date, 'Mon-yy') AS invoice_month
FROM public.invoice
WHERE id = i.id
ORDER BY invoice_date, id DESC
LIMIT 10
) last on i.id = last.id
WHERE i.invoice_date BETWEEN date '2017-10-01' AND date '2018-09-30'
GROUP BY first.invoice_month, last.invoice_month;
This can be done with a recursive query that generates the range of months for which we need to find the first and last 10 invoices.
WITH RECURSIVE all_months AS (
SELECT date_trunc('month','2018-01-01'::TIMESTAMP) as c_date, date_trunc('month', '2018-05-11'::TIMESTAMP) as end_date, to_char('2018-01-01'::timestamp, 'YYYY-MM') as current_month
UNION
SELECT c_date + interval '1 month' as c_date,
end_date,
to_char(c_date + INTERVAL '1 month', 'YYYY-MM') as current_month
FROM all_months
WHERE c_date + INTERVAL '1 month' <= end_date
),
invocies_with_month as (
SELECT *, to_char(invoice_date::TIMESTAMP, 'YYYY-MM') invoice_month FROM invoice
)
SELECT current_month, array_agg(first_10.id), 'FIRST 10' as type FROM all_months
JOIN LATERAL (
SELECT * FROM invocies_with_month
WHERE all_months.current_month = invoice_month AND invoice_date >= '2018-01-01' AND invoice_date <= '2018-05-11'
ORDER BY invoice_date ASC limit 10
) first_10 ON TRUE
GROUP BY current_month
UNION
SELECT current_month, array_agg(last_10.id), 'LAST 10' as type FROM all_months
JOIN LATERAL (
SELECT * FROM invocies_with_month
WHERE all_months.current_month = invoice_month AND invoice_date >= '2018-01-01' AND invoice_date <= '2018-05-11'
ORDER BY invoice_date DESC limit 10
) last_10 ON TRUE
GROUP BY current_month;
In the code above, '2018-01-01' and '2018-05-11' are the dates between which we want to find the invoices. Based on those dates, we generate the months (2018-01, 2018-02, 2018-03, 2018-04, 2018-05) that we need to find the invoices for.
We store this data in all_months.
After we get the months, we do a lateral join in order to join the invoices for every month. We need 2 lateral joins in order to get the first and last 10 invoices.
Finally, the result is represented as:
current_month - the month
array_agg - ids of all selected invoices for that month
type - type of the selected invoices ('first 10' or 'last 10').
So in the current implementation, you will have 2 rows for each month (if there is at least 1 invoice for that month). You can easily join that in one row if you need to.
LIMIT is working fine. It's your query that's broken. JOIN is just 100% the wrong tool here; it doesn't even do anything close to what you need. By joining up to 10 rows with up to another 10 rows, you get up to 100 rows back. There's also no reason to self join just to combine filters.
Consider instead window queries. In particular, we have the dense_rank function, which can number every row in the result set according to groups:
SELECT
invoice_month,
time_of_month,
ARRAY_AGG(id) invoice_ids
FROM (
SELECT
id,
invoice_month,
-- Categorize as end or beginning of month
CASE
WHEN month_rank <= 10 THEN 'beginning'
WHEN month_reverse_rank <= 10 THEN 'end'
ELSE 'bug' -- Should never happen. Just a fall back in case of a bug.
END AS time_of_month
FROM (
SELECT
id,
invoice_month,
dense_rank() OVER (PARTITION BY invoice_month ORDER BY invoice_date) month_rank,
dense_rank() OVER (PARTITION BY invoice_month ORDER BY invoice_date DESC) month_reverse_rank
FROM (
SELECT
id,
invoice_date,
to_char(invoice_date, 'Mon-yy') AS invoice_month
FROM public.invoice
WHERE invoice_date BETWEEN date '2017-10-01' AND date '2018-09-30'
) AS fiscal_year_invoices
) ranked_invoices
-- Get first and last 10
WHERE month_rank <= 10 OR month_reverse_rank <= 10
) first_and_last_by_month
GROUP BY
invoice_month,
time_of_month
Don't be intimidated by the length. This query is actually very straightforward; it just needed a few subqueries.
This is what it does logically:
Fetch the rows for the fiscal year in question
Assign a "rank" to each row within its month, counting both from the beginning and from the end
Filter out everything that doesn't rank in the top 10 for its month (counting from either direction)
Add an indicator as to whether it was at the beginning or end of the month. (Note that if there are fewer than 20 rows in a month, the rows that qualify for both are categorized as "beginning".)
Aggregate the IDs together
This is the tool set designed for the job you're trying to do. If really needed, you can adjust this approach slightly to get them into the same row, but you have to aggregate before joining the results together and then join on the month; you can't join and then aggregate.
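One way to put both lists on the same row per month is sketched below. It is not the answer's own method: it swaps dense_rank for row_number (so ties on invoice_date cannot push a list past 10 entries) and uses PostgreSQL's aggregate FILTER clause (available from 9.4) instead of aggregating twice and joining. Table and column names are those from the question; breaking ties by id and the aliases rn_first and rn_last are assumptions made for illustration.

```sql
SELECT invoice_month,
       -- ids of the 10 earliest invoices in the month, oldest first
       array_agg(id ORDER BY invoice_date, id)
         FILTER (WHERE rn_first <= 10) AS first_ten,
       -- ids of the 10 latest invoices in the month, newest first
       array_agg(id ORDER BY invoice_date DESC, id DESC)
         FILTER (WHERE rn_last <= 10) AS last_ten
FROM (
  SELECT id,
         invoice_date,
         date_trunc('month', invoice_date) AS invoice_month,
         row_number() OVER (PARTITION BY date_trunc('month', invoice_date)
                            ORDER BY invoice_date, id) AS rn_first,
         row_number() OVER (PARTITION BY date_trunc('month', invoice_date)
                            ORDER BY invoice_date DESC, id DESC) AS rn_last
  FROM public.invoice
  WHERE invoice_date BETWEEN date '2017-10-01' AND date '2018-09-30'
) ranked
WHERE rn_first <= 10 OR rn_last <= 10
GROUP BY invoice_month
ORDER BY invoice_month;
```

If a month has fewer than 20 invoices, some ids appear in both arrays.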

What is a better alternative to a "helper" table in an Oracle database?

Let's say I have an 'employees' table with employee start and end dates, like so:
employees
employee_id start_date end_date
53 '19901117' '99991231'
54 '19910208' '20010512'
55 '19910415' '20120130'
. . .
. . .
. . .
And let's say I want to get the monthly count of employees who were employed at the end of the month. So the resulting data set I'm after would look like:
month count of employees
'20150131' 120
'20150228' 118
'20150331' 122
. .
. .
. .
The best way I currently know how to do this is to create a "helper" table to join onto, such as:
helper_tbl
month
'20150131'
'20150228'
'20150331'
.
.
.
And then do a query like so:
SELECT t0b.month,
count(t0a.employee_id)
FROM employees t0a
JOIN helper_tbl t0b
ON t0b.month BETWEEN t0a.start_date AND t0a.end_date
GROUP BY t0b.month
However, this is a somewhat annoying solution to me, because it means I have to create these little helper tables all the time and they clutter up my schema. I feel like other people must run into the same need for "helper" tables, but I'm guessing people have figured out a better way to go about this that isn't so manual. Or do you all really just keep creating "helper" tables like I do to get around these situations?
I understand this question is a bit open-ended for Stack Overflow, so let me offer a more closed-ended version of the question, which is: "Given just the 'employees' table, what would YOU do to get the resulting data set that I showed above?"
You can use a CTE to generate all the month values, either from a fixed starting point or based on the earliest date in your table:
with months (month) as (
select add_months(first_month, level - 1)
from (
select trunc(min(start_date), 'MM') as first_month from employees
)
connect by level <= ceil(months_between(sysdate, first_month))
)
select * from months;
With data that has an earliest start date of 1990-11-17, as in your example, that generates 333 rows:
MONTH
-------------------
1990-11-01 00:00:00
1990-12-01 00:00:00
1991-01-01 00:00:00
1991-02-01 00:00:00
1991-03-01 00:00:00
...
2018-06-01 00:00:00
2018-07-01 00:00:00
You can then use that in a query that joins to your table, something like:
with months (month) as (
select add_months(first_month, level - 1)
from (
select trunc(min(start_date), 'MM') as first_month from employees
)
connect by level <= ceil(months_between(sysdate, first_month))
)
select m.month, count(e.employee_id) as employees -- count(*) would report 1 for a month with no matching employees
from months m
left join employees e
on e.start_date <= add_months(m.month, 1)
and (e.end_date is null or e.end_date >= add_months(m.month, 1))
group by m.month
order by m.month;
Presumably you want to include people who are still employed, so you need to allow for the end date being null (unless you're using a magic end-date value for people who are still employed...)
With dates stored as string it's a bit more complicated but you can generate the month information in a similar way:
with months (month, start_date, end_date) as (
select add_months(first_month, level - 1),
to_char(add_months(first_month, level - 1), 'YYYYMMDD'),
to_char(last_day(add_months(first_month, level - 1)), 'YYYYMMDD')
from (
select trunc(min(to_date(start_date, 'YYYYMMDD')), 'MM') as first_month from employees
)
connect by level <= ceil(months_between(sysdate, first_month))
)
select m.month, m.start_date, m.end_date, count(e.employee_id) as employees
from months m
left join employees e
on e.start_date <= m.end_date
and (e.end_date is null or e.end_date > m.end_date)
group by m.month, m.start_date, m.end_date
order by m.month;
Very lightly tested with a small amount of made-up data and both seem to work.
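The fixed-starting-point variant mentioned at the start of this answer is not shown in it. This is a minimal sketch, assuming Oracle and an arbitrary example start of January 2015, with LAST_DAY applied because the question asks for month-end dates.

```sql
-- Twelve month-end dates for 2015: 2015-01-31, 2015-02-28, ..., 2015-12-31.
SELECT LAST_DAY(ADD_MONTHS(DATE '2015-01-01', LEVEL - 1)) AS month_end
FROM dual
CONNECT BY LEVEL <= 12;
```

Used as a CTE, this takes the place of helper_tbl without creating anything in the schema.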
If you want to get the employees who were employed at the end of the month, you can use the LAST_DAY function in the WHERE clause of your query. You can also use that function in the GROUP BY clause. So your query would look like this:
SELECT LAST_DAY(start_date), COUNT(1)
FROM employees
WHERE start_date = LAST_DAY(start_date)
GROUP BY LAST_DAY(start_date)
Or, if you just want to count employees per month, use the query below:
SELECT LAST_DAY(start_date), COUNT(1)
FROM employees
GROUP BY LAST_DAY(start_date)

How to count records for each day in a range (including days without records)

I'm trying to refine this question a little since I didn't really ask correctly last time. I am essentially doing this query:
Select count(orders)
From Orders_Table
Where Order_Open_Date<=To_Date('##/##/####','MM/DD/YYYY')
and Order_Close_Date>=To_Date('##/##/####','MM/DD/YYYY')
Where ##/##/#### is the same day in both places. In essence, this query is designed to find the number of 'open' orders on any given day. The only problem is that I want to do this for each day of a year or more. I think that if I knew how to define ##/##/#### as a variable and then grouped the count by that variable, I could get this to work, but I'm not sure how to do that, or there may be another way as well. I am currently using Oracle SQL in SQL Developer. Thanks for any input.
You could use a "row generator" technique like this (edited for Hogan's comments):
Select RG.Day,
count(orders)
From Orders_Table,
(SELECT trunc(SYSDATE) - ROWNUM as Day
FROM (SELECT 1 dummy FROM dual)
CONNECT BY LEVEL <= 365
) RG
Where RG.Day <=To_Date('##/##/####','MM/DD/YYYY')
and RG.Day >=To_Date('##/##/####','MM/DD/YYYY')
and Order_Open_Date(+) <= RG.Day
and Order_Close_Date(+) >= RG.Day - 1
Group by RG.Day
Order by RG.Day
This should list each day of the previous year with the corresponding number of open orders.
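If the range is known up front, the generator can produce exactly the days between two bounds instead of counting back from SYSDATE. This is a sketch for Oracle; the 2012 bounds are placeholder values standing in for the ##/##/#### dates.

```sql
-- One row per calendar day from 2012-01-01 to 2012-12-31 inclusive
-- (366 rows, since 2012 is a leap year).
SELECT DATE '2012-01-01' + LEVEL - 1 AS day
FROM dual
CONNECT BY LEVEL <= DATE '2012-12-31' - DATE '2012-01-01' + 1;
```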
Let's say you had a table datelist with a column adate:
aDate
1/1/2012
1/2/2012
1/3/2012
Now you join that to your table
Select *
From Orders_Table
join datelist on Order_Open_Date<=adate and Order_Close_Date>=adate
This gives you a list of all the orders you care about. Now you group by the date and count:
Select aDate, count(*)
From Orders_Table
join datelist on Order_Open_Date<=adate and Order_Close_Date>=adate
group by adate
If you want to pass in parameters, just generate the dates with a recursive CTE:
with datelist (adate) as
(
select cast(:startdate as date) as adate from dual -- :startdate and :lastdate are assumed to be DATE bind variables
UNION ALL
select adate + 1
from datelist
where (adate + 1) <= :lastdate
)
Select aDate, count(*)
From Orders_Table
join datelist on Order_Open_Date<=adate and Order_Close_Date>=adate
group by adate
NOTE: I don't have an Oracle DB to test on so I might have some syntax wrong for this platform, but you get the idea.
NOTE2: If you want all dates listed with 0 for those that have nothing use this as your select statement:
Select aDate, count(Order_Open_Date)
From datelist
left join Orders_Table on Order_Open_Date<=adate and Order_Close_Date>=adate
group by adate
If you want only one day you can query using TRUNC like this
select count(orders)
From orders_table
where trunc(order_open_date) = to_date('14/05/2012','dd/mm/yyyy')