Find employee tenure for a company - sql

I have written the following query to get employees' tenure by year,
i.e. grouped into "less than 1 year", "1-2 years", "2-3 years" and "greater than 3 years".
To get this, I compare against each employee's staffing end_date.
But I am not getting the correct result when comparing with the staffing end_date.
I have pasted the complete code below, but the counts I am getting are not correct:
some employees who worked for more than 2 years are falling under the <1 year column.
DECLARE @Project_Id Varchar(10)='ITS-004275';
With Cte_Dates(Period,End_date,Start_date,Project_Id)
As
(
SELECT '<1 Year' AS Period, GETDATE() AS End_Date,DATEADD(YY,-1,GETDATE()) AS Start_date,@Project_Id AS Project_Id
UNION
SELECT '1-2 Years', DATEADD(YY,-1,GETDATE()),DATEADD(YY,-2,GETDATE()),@Project_Id
UNION
SELECT '2-3 Years', DATEADD(YY,-2,GETDATE()),DATEADD(YY,-3,GETDATE()),@Project_Id
UNION
SELECT '>3 Years', DATEADD(YY,-3,GETDATE()),'',@Project_Id
),
--select * from Cte_Dates
--ORDER BY Start_date DESC
Cte_Staffing(PROJECT_ID,EMP_ID,END_DATE) AS
(
SELECT FK_Project_ID,EMP_ID,MAX(End_Date)AS END_DATE FROM DP_Project_Staffing
WHERE FK_Project_ID=@Project_Id
GROUP BY FK_Project_ID,Emp_ID
)
SELECT D.PROJECT_ID,D.Start_date,D.End_date,COUNT(S.EMP_ID) AS Count,D.Period
FROM Cte_Staffing S
RIGHT JOIN Cte_Dates D
ON D.Project_Id=S.PROJECT_ID
AND S.END_DATE<D.End_date AND S.END_DATE>D.Start_date
GROUP BY D.PROJECT_ID,D.Start_date,D.End_date,D.Period

I think this will solve the problem.
As you can see, you should write it like this:
DATEADD(year, -1, GETDATE())
You should also assign GETDATE() to a variable once and reuse it.

I find your query logic a little messy. Why don't you just compute the total period for every employee and use a CASE expression? I can help you with the code if you give me the DP_Project_Staffing table structure. Do you have a begin_date field in it?

You are taking MAX(End_Date) in the Cte_Staffing CTE. In that case, when an employee has several entries, only the most recent one applies. You want to use MIN instead.
Like this:
Cte_Staffing(PROJECT_ID,EMP_ID,END_DATE) AS
(
SELECT FK_Project_ID, EMP_ID, MIN(End_Date)AS END_DATE
FROM DP_Project_Staffing
...
Re-reading your question, you probably don't want the staffing end_date for the tenure calculation; you'd want to use the start_date (or whatever the column is called in DP_Project_Staffing).
I would also change the WHERE/JOIN clause to be inclusive on one of the sides, so you have either
AND S.END_DATE <= D.End_date AND S.END_DATE > D.Start_date
or
AND S.END_DATE < D.End_date AND S.END_DATE >= D.Start_date
Since you are comparing at millisecond precision, it won't make any difference in this case. However, if you change the granularity to date only, which would make more sense, you would lose all records where the employee started exactly 1 year, 2 years, etc. ago.
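To illustrate the boundary problem, here is a small sketch using Python's built-in sqlite3 module (SQLite dialect, not T-SQL). The staffing rows, the bucket bounds and the fixed reference date `REF` (standing in for GETDATE()) are all made up, and the buckets use start_date as suggested above. Employee A started exactly one year before `REF`; with date-only values and exclusive comparisons on both sides, A matches no bucket at all.

```python
import sqlite3

# Hypothetical staffing data; dates are ISO strings, so comparisons are date-only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staffing (emp_id TEXT, start_date TEXT)")
conn.executemany(
    "INSERT INTO staffing VALUES (?, ?)",
    [
        ("A", "2023-03-01"),  # started exactly one year before REF
        ("B", "2023-09-15"),  # less than one year before REF
        ("C", "2022-06-30"),  # between one and two years before REF
    ],
)

REF = "2024-03-01"  # hypothetical fixed reference date standing in for GETDATE()

# (label, lower-bound modifier, upper-bound modifier) relative to REF.
BUCKETS = [
    ("<1 Year", "-1 years", "+0 days"),
    ("1-2 Years", "-2 years", "-1 years"),
]


def bucket_counts(condition):
    """Count employees per bucket; `condition` uses the :lo and :hi bounds."""
    counts = {}
    for label, lo_mod, hi_mod in BUCKETS:
        lo = conn.execute("SELECT date(?, ?)", (REF, lo_mod)).fetchone()[0]
        hi = conn.execute("SELECT date(?, ?)", (REF, hi_mod)).fetchone()[0]
        sql = f"SELECT COUNT(*) FROM staffing WHERE {condition}"
        counts[label] = conn.execute(sql, {"lo": lo, "hi": hi}).fetchone()[0]
    return counts


# Exclusive on both sides, like the original JOIN condition: A matches no bucket.
strict = bucket_counts("start_date > :lo AND start_date < :hi")
# Inclusive on the upper side only: every employee lands in exactly one bucket.
half_open = bucket_counts("start_date > :lo AND start_date <= :hi")

print(strict)     # {'<1 Year': 1, '1-2 Years': 1}
print(half_open)  # {'<1 Year': 1, '1-2 Years': 2}
```

Making the lower side inclusive instead (`>= :lo AND < :hi`) works equally well; what matters is that adjacent buckets share each boundary on exactly one side.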

SELECT FK_Project_ID,E.Emp_ID,MIN(Start_Date) AS Emp_Start_Date ,MAX(End_Date) AS Emp_End_Date,
E.Competency,E.First_Name+' '+E.Last_Name+' ('+E.Emp_Id+')' as Name,'Period'=
CASE
WHEN DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))<=12 THEN '<1 Year'
WHEN DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))>12 AND DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))<=24 THEN '1-2 Years'
WHEN DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))>24 AND DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))<=36 THEN '2-3 Years'
WHEN DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))>36 THEN '>3 Years'
ELSE 'NA'
END
FROM DP_Project_Staffing PS
LEFT OUTER JOIN DP_Ext_Emp_Master E
ON E.Emp_Id=PS.Emp_ID
WHERE FK_Project_ID=@PROJ_ID
GROUP BY FK_Project_ID,E.Emp_ID,E.Competency,First_Name,Last_Name
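Below is a runnable sketch of the same per-employee approach (MIN of start, MAX of end, then a CASE expression), written for SQLite through Python's sqlite3 module rather than SQL Server. SQLite has no DATEDIFF, so the month difference is computed from the year and month parts, which assumes you want T-SQL's DATEDIFF(MONTH, ...) behaviour of counting month boundaries crossed. The table layout and rows are hypothetical. Because CASE branches are tested in order, the lower bounds from the original conditions can be dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staffing (project_id TEXT, emp_id TEXT, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO staffing VALUES (?, ?, ?, ?)",
    [
        ("P1", "E1", "2020-01-10", "2020-12-31"),  # E1 has two staffing rows
        ("P1", "E1", "2021-01-01", "2022-06-30"),
        ("P1", "E2", "2023-02-01", "2023-08-31"),
        ("P1", "E3", "2019-05-01", "2023-04-30"),
        ("P1", "E4", "2024-01-01", None),          # no end date yet
    ],
)

sql = """
WITH tenure AS (
    -- one row per employee: first start and last end across all staffing rows
    SELECT emp_id, MIN(start_date) AS first_start, MAX(end_date) AS last_end
    FROM staffing
    WHERE project_id = :project_id
    GROUP BY emp_id
),
spans AS (
    -- month boundaries crossed, like T-SQL DATEDIFF(MONTH, first_start, last_end)
    SELECT emp_id,
           (CAST(strftime('%Y', last_end) AS INTEGER)
              - CAST(strftime('%Y', first_start) AS INTEGER)) * 12
         + CAST(strftime('%m', last_end) AS INTEGER)
         - CAST(strftime('%m', first_start) AS INTEGER) AS months
    FROM tenure
)
SELECT emp_id, months,
       CASE
           WHEN months <= 12 THEN '<1 Year'
           WHEN months <= 24 THEN '1-2 Years'
           WHEN months <= 36 THEN '2-3 Years'
           WHEN months > 36 THEN '>3 Years'
           ELSE 'NA'   -- months is NULL, e.g. no end date
       END AS period
FROM spans
ORDER BY emp_id
"""
rows = conn.execute(sql, {"project_id": "P1"}).fetchall()
for row in rows:
    print(row)
```

E1 is counted from its first start (2020-01) to its last end (2022-06), i.e. 29 months, so it lands in '2-3 Years' even though neither staffing row alone is that long.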

Related

PostgreSQL - how to loop over a parameter to make union of different queries of the same table

I need to count the number of events still occurring in a given year, provided these events started before the given year and ended after it. So, the query I'm using now is just a pile of unparametrized queries and I'd like to write it out with a loop on a parameter:
select
count("starting_date" ) as "number_of_events" ,'2020' "year"
from public.table_of_events
where "starting_date" < '2020-01-01' and ("closing_date" > '2020-12-31')
union all
select
count("starting_date" ) as "number_of_events" ,'2019' "year"
from public.table_of_events
where "starting_date" < '2019-01-01' and ("closing_date" > '2019-12-31')
union all
select
count("starting_date" ) as "number_of_events" ,'2018' "year"
from public.table_of_events
where "starting_date" < '2018-01-01' and ("closing_date" > '2018-12-31')
...
...
...
and so on for N years
So, I have ONE table that must be filtered according to one parameter that is part of both the select statement and the where clause.
The general form of the "atomic" query is then:
select
count("starting_date" ) as "number_of_events" , **PARAMETER** "year"
from public.table_of_events
where "starting_date" < **PARAMETER** and ("closing_date" > **PARAMETER** )
union all
Can anyone help me put this in a more formal loop?
Thanks a lot, I hope I was clear enough.
You seem to want events that span entire years. A simple way is to generate the years, then use a join and aggregate. Put the generated years on the left side of the left join so that years without any events still show up with a count of 0:
select gs.yyyy, count(e.starting_date) as number_of_events
from generate_series('2015-01-01'::date,
'2021-01-01'::date,
interval '1 year'
) as gs(yyyy) left join
public.table_of_events e
on e.starting_date < gs.yyyy and
e.closing_date >= gs.yyyy + interval '1 year'
group by gs.yyyy
order by gs.yyyy;
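The same "generate the years, then join and aggregate" idea can be tried without a PostgreSQL server. This sketch uses SQLite through Python's sqlite3 module; since SQLite is not guaranteed to ship a generate_series for dates, a recursive CTE produces the list of years instead. The events are made up. The generated years sit on the left of the LEFT JOIN, so a year with no qualifying events reports 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_of_events (starting_date TEXT, closing_date TEXT)")
conn.executemany(
    "INSERT INTO table_of_events VALUES (?, ?)",
    [
        ("2017-06-01", "2021-03-01"),  # spans 2018, 2019 and 2020 entirely
        ("2018-12-31", "2020-01-01"),  # spans 2019 entirely
        ("2019-02-01", "2019-11-30"),  # spans no full year
    ],
)

sql = """
WITH RECURSIVE years(yyyy) AS (
    -- January 1st of each year from :first_year to :last_year
    SELECT date(:first_year || '-01-01')
    UNION ALL
    SELECT date(yyyy, '+1 years') FROM years
    WHERE yyyy < date(:last_year || '-01-01')
)
SELECT strftime('%Y', y.yyyy) AS year,
       COUNT(e.starting_date) AS number_of_events
FROM years AS y
LEFT JOIN table_of_events AS e
       ON e.starting_date < y.yyyy                       -- started before the year
      AND e.closing_date >= date(y.yyyy, '+1 years')     -- still open after Dec 31
GROUP BY y.yyyy
ORDER BY y.yyyy
"""
rows = conn.execute(sql, {"first_year": "2018", "last_year": "2021"}).fetchall()
print(rows)  # [('2018', 1), ('2019', 2), ('2020', 1), ('2021', 0)]
```

`closing_date >= <next January 1st>` is the same test as the question's `closing_date > 'YYYY-12-31'` for date-only values.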

PL-SQL query to calculate customers per period from start and stop dates

I have a PL-SQL table with a structure as shown in the example below:
I have customers (customer_number) with insurance cover start and stop dates (cover_start_date and cover_stop_date). I also have dates of accidents for those customers (accident_date). These customers may have more than one row in the table if they have had more than one accident. They may also have no accidents. And they may also have a blank entry for the cover stop date if their cover is ongoing. Sorry I did not design the data format, but I am stuck with it.
I am looking to calculate the number of accidents (num_accidents) and number of customers (num_customers) in a given time period (period_start), and from that the number of accidents-per-customer (which will be easy once I've got those two pieces of information).
Any ideas on how to design a PL-SQL function to do this in a simple way? Ideally with the time periods not being fixed to monthly (for example, weekly or fortnightly too)? Ideally I will end up with a table like the one shown below:
Many thanks for any pointers...
You seem to need a list of dates. You can generate one in the query and then use correlated subqueries to calculate the columns you want:
select d.*,
(select count(distinct customer_number)
from t
where t.cover_start_date <= d.dte and
(t.cover_stop_date > d.dte + interval '1' month or t.cover_stop_date is null)
) as num_customers,
(select count(*)
from t
where t.accident_date >= d.dte and
t.accident_date < d.dte + interval '1' month
) as accidents,
(select count(distinct customer_number)
from t
where t.accident_date >= d.dte and
t.accident_date < d.dte + interval '1' month
) as num_customers_with_accident
from (select date '2020-01-01' as dte from dual union all
select date '2020-02-01' as dte from dual union all
. . .
) d;
If you want to do arithmetic on the columns, you can use this as a subquery or CTE.
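The sketch below applies the same correlated-subquery idea in SQLite (via Python's sqlite3 module, so it is not Oracle syntax) and addresses the "not fixed to monthly" wish: the periods come from a recursive CTE whose step is an SQLite date modifier such as '+1 months' or '+7 days' (use '+14 days' for fortnights). The column names follow the question; the table name `t` follows the answer above, and the rows are made up. The outer query shows the "use it as a CTE" step for accidents per customer; the num_customers_with_accident column is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (customer_number INTEGER, cover_start_date TEXT,"
    " cover_stop_date TEXT, accident_date TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2019-06-01", None, "2020-01-10"),  # ongoing cover, two accidents
        (1, "2019-06-01", None, "2020-02-03"),
        (2, "2019-12-15", "2020-02-20", "2020-01-25"),
        (3, "2020-01-05", None, None),          # no accidents
    ],
)

sql = """
WITH RECURSIVE periods(period_start, period_end) AS (
    -- one row per period; :step is an SQLite date modifier such as '+1 months'
    SELECT :first, date(:first, :step)
    UNION ALL
    SELECT period_end, date(period_end, :step) FROM periods
    WHERE period_end < :until
),
stats AS (
    SELECT p.period_start,
           (SELECT COUNT(DISTINCT customer_number) FROM t
             WHERE t.cover_start_date <= p.period_start
               AND (t.cover_stop_date > p.period_end OR t.cover_stop_date IS NULL)
           ) AS num_customers,
           (SELECT COUNT(*) FROM t
             WHERE t.accident_date >= p.period_start
               AND t.accident_date < p.period_end
           ) AS num_accidents
    FROM periods AS p
)
SELECT period_start, num_customers, num_accidents,
       1.0 * num_accidents / NULLIF(num_customers, 0) AS accidents_per_customer
FROM stats
ORDER BY period_start
"""

monthly = conn.execute(
    sql, {"first": "2020-01-01", "until": "2020-03-01", "step": "+1 months"}
).fetchall()
weekly = conn.execute(
    sql, {"first": "2020-01-06", "until": "2020-01-20", "step": "+7 days"}
).fetchall()
print(monthly)  # [('2020-01-01', 2, 2, 1.0), ('2020-02-01', 2, 1, 0.5)]
print(weekly)
```

As in the answer, a customer counts toward a period only if their cover spans the whole period; adjust that condition if partial cover should count.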

Best way to sum() values between two dates (where the second date is stored as a value in the next row) - postgres

I am a little bit stuck with an issue I'm having with a psql query. I know how I'd approach this using a loop etc, but have hit a wall in SQL as I'm no expert.
Imagine that there are 3 terms per school year. Each child may receive an allowance for lunches each month. I would like to SUM() the allowance for each month, from the start of the term up until the next term.
Where I am stuck is that the date of the next term is a value in the next row of data in the terms table.
I have something like this:
SELECT
terms."startDate",
COALESCE((
SELECT
SUM("lunch")
FROM
"allowance"
WHERE
TO_CHAR(terms."startDate", 'YYYY-MM') >= TO_CHAR("date", 'YYYY-MM')
AND
TO_CHAR(terms."startDate", 'YYYY-MM') < TO_CHAR(??? HELP ???, 'YYYY-MM')
),0) AS "lunchMoney"
FROM "schoolTerms" AS terms
...
Where I have put TO_CHAR(??? HELP ???, 'YYYY-MM'), I would like to reference the start date of the child's next term. I have looked into using the LEAD() window function but couldn't figure it out.
Any help would be greatly appreciated.
I think you want a join and aggregation:
select t.startdate, t.enddate, sum(a.lunch) as lunch_money
from schoolterms t
inner join allowance a on a.date >= t.startdate and a.date < t.enddate
group by t.startdate, t.enddate
This puts each allowance into the term it belongs to, and then aggregates by term. You might want a left join if there may be terms without any allowance.
Your current query gives no clue about what a "child" is. Presumably, that should be a column in allowance, which you might want to put in the select and group by clauses.
If you want to compute the end date as the "next" start date, then use lead():
select t.startdate, t.enddate, sum(a.lunch) as lunch_money
from (
select startdate,
lead(startdate) over(order by startdate) enddate
from schoolterms
) t
inner join allowance a
on a.date >= t.startdate
and (a.date < t.enddate or t.enddate is null)
group by t.startdate, t.enddate
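The lead() version can be checked with a small SQLite sketch through Python's sqlite3 module (SQLite has supported window functions since 3.25). The table and column names mirror the answer, the terms and allowances are made up, and a LEFT JOIN with COALESCE is used so a term without allowances would still appear with 0. The last term has no following start date, so its enddate is NULL and the `t.enddate IS NULL` branch keeps its allowances.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE schoolterms (startdate TEXT);
    CREATE TABLE allowance ("date" TEXT, lunch REAL);
    INSERT INTO schoolterms VALUES ('2023-09-01'), ('2024-01-08'), ('2024-04-15');
    INSERT INTO allowance VALUES
        ('2023-09-30', 10), ('2023-12-31', 15),  -- first term
        ('2024-01-31', 20),                      -- second term
        ('2024-05-31', 5), ('2024-06-30', 5);    -- last term, no next start date
    """
)

rows = conn.execute(
    """
    SELECT t.startdate, t.enddate, COALESCE(SUM(a.lunch), 0) AS lunch_money
    FROM (
        -- the next term's start date becomes this term's (exclusive) end date
        SELECT startdate, LEAD(startdate) OVER (ORDER BY startdate) AS enddate
        FROM schoolterms
    ) AS t
    LEFT JOIN allowance AS a
           ON a."date" >= t.startdate
          AND (a."date" < t.enddate OR t.enddate IS NULL)
    GROUP BY t.startdate, t.enddate
    ORDER BY t.startdate
    """
).fetchall()
for row in rows:
    print(row)
```

For per-child totals, a child column in allowance would be added to the select and group by, as the answer notes.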

What is a better alternative to a "helper" table in an Oracle database?

Let's say I have an 'employees' table with employee start and end dates, like so:
employees
employee_id start_date end_date
53 '19901117' '99991231'
54 '19910208' '20010512'
55 '19910415' '20120130'
. . .
. . .
. . .
And let's say I want to get the monthly count of employees who were employed at the end of the month. So the resulting data set I'm after would look like:
month count of employees
'20150131' 120
'20150228' 118
'20150331' 122
. .
. .
. .
The best way I currently know how to do this is to create a "helper" table to join onto, such as:
helper_tbl
month
'20150131'
'20150228'
'20150331'
.
.
.
And then do a query like so:
SELECT t0b.month,
count(t0a.employee_id)
FROM employees t0a
JOIN helper_tbl t0b
ON t0b.month BETWEEN t0a.start_date AND t0a.end_date
GROUP BY t0b.month
However, this is a somewhat annoying solution to me, because it means I have to create these little helper tables all the time, and they clutter up my schema. I feel like other people must run into the same need for "helper" tables, but I'm guessing people have figured out a better way to go about this that isn't so manual. Or do you all really just keep creating "helper" tables like I do to get around these situations?
I understand this question is a bit open-ended for Stack Overflow, so let me offer a more closed-ended version of the question, which is: "Given just the 'employees' table, what would YOU do to get the resulting data set that I showed above?"
You can use a CTE to generate all the month values, either from a fixed starting point or based on the earliest date in your table:
with months (month) as (
select add_months(first_month, level - 1)
from (
select trunc(min(start_date), 'MM') as first_month from employees
)
connect by level <= ceil(months_between(sysdate, first_month))
)
select * from months;
With data that has an earliest start date of 1990-11-17, as in your example, that generates 333 rows when run in July 2018:
MONTH
-------------------
1990-11-01 00:00:00
1990-12-01 00:00:00
1991-01-01 00:00:00
1991-02-01 00:00:00
1991-03-01 00:00:00
...
2018-06-01 00:00:00
2018-07-01 00:00:00
You can then use that in a query that joins to your table, something like:
with months (month) as (
select add_months(first_month, level - 1)
from (
select trunc(min(start_date), 'MM') as first_month from employees
)
connect by level <= ceil(months_between(sysdate, first_month))
)
select m.month, count(e.employee_id) as employees
from months m
left join employees e
on e.start_date <= add_months(m.month, 1)
and (e.end_date is null or e.end_date >= add_months(m.month, 1))
group by m.month
order by m.month;
Presumably you want to include people who are still employed, so you need to allow for the end date being null (unless you're using a magic end-date value for people who are still employed...). Counting e.employee_id rather than * makes a month with no matching employees show 0 instead of 1.
With dates stored as strings it's a bit more complicated, but you can generate the month information in a similar way:
with months (month, start_date, end_date) as (
select add_months(first_month, level - 1),
to_char(add_months(first_month, level - 1), 'YYYYMMDD'),
to_char(last_day(add_months(first_month, level - 1)), 'YYYYMMDD')
from (
select trunc(min(to_date(start_date, 'YYYYMMDD')), 'MM') as first_month from employees
)
connect by level <= ceil(months_between(sysdate, first_month))
)
select m.month, m.start_date, m.end_date, count(e.employee_id) as employees
from months m
left join employees e
on e.start_date <= m.end_date
and (e.end_date is null or e.end_date > m.end_date)
group by m.month, m.start_date, m.end_date
order by m.month;
Very lightly tested with a small amount of made-up data and both seem to work.
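As a further check of the string-date variant, here is a sketch in SQLite via Python's sqlite3 module, so a recursive CTE replaces Oracle's `connect by level` and SQLite date modifiers replace add_months/last_day. It loads the three sample rows from the question (YYYYMMDD strings, 99991231 as the "still employed" value) and uses a hypothetical fixed `AS_OF` date instead of sysdate. Because YYYYMMDD strings sort the same way as the dates they represent, plain string comparisons against each month-end work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (53, "19901117", "99991231"),  # sample rows from the question
        (54, "19910208", "20010512"),
        (55, "19910415", "20120130"),
    ],
)

AS_OF = "2018-07-15"  # hypothetical fixed "today" instead of sysdate

sql = """
WITH RECURSIVE months(month) AS (
    -- first day of the month of the earliest start date, as YYYY-MM-DD
    SELECT substr(MIN(start_date), 1, 4) || '-' || substr(MIN(start_date), 5, 2) || '-01'
    FROM employees
    UNION ALL
    SELECT date(month, '+1 months') FROM months
    WHERE date(month, '+1 months') <= :as_of
),
month_ends AS (
    -- last day of each month, rendered as YYYYMMDD to match the string columns
    SELECT month, replace(date(month, '+1 months', '-1 days'), '-', '') AS end_yyyymmdd
    FROM months
)
SELECT m.month, m.end_yyyymmdd, COUNT(e.employee_id) AS employees
FROM month_ends AS m
LEFT JOIN employees AS e
       ON e.start_date <= m.end_yyyymmdd
      AND (e.end_date IS NULL OR e.end_date > m.end_yyyymmdd)
GROUP BY m.month, m.end_yyyymmdd
ORDER BY m.month
"""
rows = conn.execute(sql, {"as_of": AS_OF}).fetchall()
print(len(rows))  # 333 months, 1990-11 through 2018-07
print(rows[:6])
```

The month count matches the 333 rows reported for the Oracle version, and no helper table is created.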
If you want to get the employees who were employed at the end of the month, you can use the LAST_DAY function in the WHERE clause of your query. You can also use that function in the GROUP BY clause. So your query would look like this:
SELECT LAST_DAY(start_date), COUNT(1)
FROM employees
WHERE start_date = LAST_DAY(start_date)
GROUP BY LAST_DAY(start_date)
Or, if you just want to count employees by the month they started, use this query:
SELECT LAST_DAY(start_date), COUNT(1)
FROM employees
GROUP BY LAST_DAY(start_date)

How to calculate the longest period in days that a company has gone without headcount change?

Given an employees table with the columns EmpID,FirstName,LastName,StartDate, and EndDate.
I want to use a query on Oracle to calculate the longest period in days that a company has gone without headcount change.
Here is my query:
select MAX(endDate-startDate)
from
(select endDate
from employees
where endDate is not null)
union all
(select startDate
from employees)
But I got an error:
ORA-00904:"STARTDATE":invalid identifier
How can I fix this error?
Is my query the correct answer to this question?
Thanks
You aren't returning the startDate in the sub-query. Add startDate to the inner query.
select MAX(endDate-startDate) from
(select startDate, endDate from employees where endDate is not null)
union all
(select startDate from employees)
EDIT:
You can also try this:
select MAX(endDate-startDate) from employees where endDate is not null
However, I don't think your query is what you're looking for, as it only returns the tenure of the longest-serving employee who no longer works at the company.
In a simplistic view, you would want to put together all the start-dates (when the headcount increases) and all the end-dates (when it decreases), combine them all, arrange them in increasing order, measure the differences between consecutive dates, and take the max.
"Put together" is a UNION ALL, and measure differences between "consecutive" dates can be done with the analytic function lag().
One complication: one employee may start on exactly the same date another is terminated, so the headcount doesn't change. More generally, on any given date there may be both starts and ends, and you need to exclude the dates when the number of starts equals the number of ends. So the first part of the solution is more complicated: you need to group by date and compare the start and end counts.
Something like this may work (not tested!):
with d ( dt, flag ) as (
select startdate, 's' from employees union all
select enddate , 'e' from employees where enddate is not null
),
prep ( gap ) as (
select dt - lag(dt) over (order by dt)
from d
group by dt
having count(case flag when 's' then 1 end) !=
count(case flag when 'e' then 1 end)
)
select max(gap) as max_interval
from prep
;
Edit - Gordon has a good point in his solution: perhaps the longest period without a change in headcount is the current period (ending "now"). For this reason, one needs to add SYSDATE to the UNION ALL, like he did. It can be added with either flag (for example 's' to be specific).
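The following sketch runs that logic in SQLite via Python's sqlite3 module (julianday() differences stand in for Oracle date subtraction, and a fixed `TODAY` stands in for SYSDATE, added with the 's' flag). The employees are made up so that employee 3 starts on the same day employee 2 leaves. The query is run twice, with and without the HAVING filter, to show why dates with cancelling starts and ends must be dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (empid INTEGER, startdate TEXT, enddate TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", None),
        (2, "2020-03-01", "2020-06-01"),
        (3, "2020-06-01", None),          # starts the day #2 leaves: no net change
        (4, "2020-10-01", "2021-01-01"),
    ],
)

TODAY = "2021-03-01"  # hypothetical fixed stand-in for SYSDATE

BASE = """
WITH d(dt, flag) AS (
    SELECT startdate, 's' FROM employees
    UNION ALL
    SELECT enddate, 'e' FROM employees WHERE enddate IS NOT NULL
    UNION ALL
    SELECT :today, 's'                 -- close the current, still-open period
),
changes(dt) AS (
    SELECT dt FROM d GROUP BY dt {having}
),
gaps(gap_days) AS (
    -- days since the previous change date (NULL for the first one)
    SELECT julianday(dt) - julianday(LAG(dt) OVER (ORDER BY dt)) FROM changes
)
SELECT MAX(gap_days) FROM gaps
"""
NET_CHANGE = (
    "HAVING COUNT(CASE flag WHEN 's' THEN 1 END)"
    " != COUNT(CASE flag WHEN 'e' THEN 1 END)"
)

max_gap = conn.execute(BASE.format(having=NET_CHANGE), {"today": TODAY}).fetchone()[0]
naive_gap = conn.execute(BASE.format(having=""), {"today": TODAY}).fetchone()[0]
print(max_gap)    # 214.0 (2020-03-01 to 2020-10-01)
print(naive_gap)  # 122.0, because 2020-06-01 wrongly splits that period
```

The naive variant corresponds to treating every start or end date as a change, which underestimates the gap whenever starts and ends cancel out on the same day.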
I think the answer to your question is something like this:
select max(span)
from (select (lead(dte) over (order by dte) - dte) as span
from (select startDate as dte from employees union all
select endDate as dte from employees union all
select trunc(sysdate) from dual
) d
) d;
A head-count change (presumably) occurs when an employee starts or stops. Hence, you want the largest interval between two such adjacent dates.