Retrieving active employees by month in Postgres - sql

In my employee database, I have a column hire_date which has the hiring date of an employee, and deactivate_date which has the date on which the employee was dismissed (or null if employee is still active).
To find the number of active employees at the beginning of any month, I can run the following query. For example, to see my active employees on 1st January 2019 -
SELECT count(*)
FROM employees
WHERE hire_date <= '2019-01-01' AND
(deactivate_date IS NULL OR deactivate_date > '2019-01-01')
Now, what I would like to know is the number of active employees on the 1st of every month of 2018. I can obviously run this query 12 times, but would like to know if there is a more efficient solution possible. It seems like the CROSSTAB and generate_series functions of pg will be useful, but I haven't been able to form the proper query.

Use generate_series():
SELECT gs.dte, count(e.hire_date)
FROM generate_series('2018-01-01'::date, '2018-12-01'::date, interval '1 month') gs(dte) LEFT JOIN
employees e
ON e.hire_date <= gs.dte AND
(e.deactivate_date IS NULL OR e.deactivate_date > gs.dte)
GROUP BY gs.dte
ORDER BY gs.dte;
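
To see why the LEFT JOIN with count(e.hire_date) yields one row per month, here is a minimal sketch that runs the same join in an in-memory SQLite database from Python. It is an illustration under assumptions, not the Postgres query itself: stock SQLite has no generate_series(), so a recursive CTE stands in for it, dates are stored as ISO text, and the three sample employees are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, hire_date TEXT, deactivate_date TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "2017-06-15", None),          # active throughout 2018
        (2, "2018-03-10", "2018-08-20"),  # active on the 1st of April to August
        (3, "2018-11-01", None),          # active from 1 November onward
    ],
)

query = """
WITH RECURSIVE months(dte) AS (
    -- stand-in for generate_series('2018-01-01', '2018-12-01', '1 month')
    SELECT '2018-01-01'
    UNION ALL
    SELECT date(dte, '+1 month') FROM months WHERE dte < '2018-12-01'
)
SELECT m.dte, count(e.hire_date)
FROM months m
LEFT JOIN employees e
  ON e.hire_date <= m.dte
 AND (e.deactivate_date IS NULL OR e.deactivate_date > m.dte)
GROUP BY m.dte
ORDER BY m.dte
"""
active_by_month = dict(conn.execute(query).fetchall())

for month, active in active_by_month.items():
    print(month, active)
```

Because the month list is on the left side of the LEFT JOIN, a month with no active employees would still appear, with a count of 0.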

Related

Data value on a given date

This time I have a table in a PostgreSQL database that contains the employee name, the date the employee started working, and the date the employee left the company. If the employee is still with the company, this field is null.
Knowing this, I would like to know how many people were working on a given date. For example:
I would like to know how many people worked at the company in January 2021.
I don't know where to start. In some attempts I got the number of hires and layoffs per month, but I need to show this as an accumulated value per month, in another column.
I hope I made myself understood. I'll leave the last SQL I got here.
select reference, sum(hires) from
(
select
date_trunc('month', date_hires) as reference,
count(*) as hires
from
ponto_mais_relatorio_colaboradores
group by
date_hires
union all
select
date_trunc('month', date_layoff) as reference,
count(*)*-1 as layoffs
from
ponto_mais_relatorio_colaboradores
group by
date_layoff
) as reference
join calendar_aux on calendar_aux.ano_mes = reference
group by reference
order by reference
Break the requirement down. The question is: how many people are employed on any given date? That includes everyone hired on or before that date who has no layoff date, plus everyone hired on or before it whose layoff date is later than the date you are interested in. For example, if you are interested in January, you still want to count an employee with a layoff date in February. With that in place, convert it into SQL: the condition above is a plain date comparison in the join. The other issue is that January is not a date but a range of dates, so you need each date in it. You can use generate_series to create each day in January, then join the generated dates with the matching rows from your table. Resulting query:
with jan_dates( jdate ) as
( select generate_series( date '2021-01-01'
, date '2021-01-31'
, interval '1' day
)::date
)
select jdate "Date", count(*) "Employees"
from jan_dates j
join employees e
on ( e.date_hires <= j.jdate
and ( e.date_layoff is null
or e.date_layoff > j.jdate
)
)
group by j.jdate
order by j.jdate;
Note: Not tested.
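
The question's original idea, accumulating hires minus layoffs per month, can also be made to work. The following is a rough sketch under assumptions: it uses an in-memory SQLite table named employees with made-up rows, stores dates as ISO text, and does the running total in Python with itertools.accumulate rather than in SQL.

```python
import sqlite3
from itertools import accumulate

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, date_hires TEXT, date_layoff TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "2020-11-03", None),
        ("Bruno", "2020-12-15", "2021-02-10"),
        ("Carla", "2021-01-20", None),
    ],
)

# +1 for every hire and -1 for every layoff, summed per calendar month (YYYY-MM).
rows = conn.execute("""
    SELECT month, SUM(delta) FROM (
        SELECT substr(date_hires, 1, 7) AS month, 1 AS delta FROM employees
        UNION ALL
        SELECT substr(date_layoff, 1, 7), -1 FROM employees
        WHERE date_layoff IS NOT NULL
    )
    GROUP BY month
    ORDER BY month
""").fetchall()

# Running total of the monthly net change = headcount at the end of each month.
months = [month for month, _ in rows]
headcount = dict(zip(months, accumulate(delta for _, delta in rows)))
print(headcount)
```

Note that this gives the headcount at the end of each month in which something changed. Months without any hire or layoff do not appear unless they are joined in from a calendar table or a generated series.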

SQL Server one-to-many relational IF query

I have an employees table that has many employee_records. The employee_records table has a column named event_type, an enum that can be hire-date, promotion, termination, title-change, or rehire. I am attempting to calculate an employee's total time employed and make some calculations based on how long they have been employed.
How can I add a column that gives me the total days that they have been employed?
Essentially, I need to see if they have a record with the event_type = termination and if they do, then I need to see if they have a rehire date, and if they do, then I need to use their rehire date as the first day of their employment and calculate their time of employment that way.
As for a result, I simply need a days_employed column that reflects the actual amount of days they have been employed.
Here is what I have so far.
SELECT
employees.id,
first_name,
last_name,
email,
event_type,
CASE
WHEN DATEDIFF(DAY, employee_records.created_at, SYSDATETIME()) < 365 * 5
THEN 1
WHEN DATEDIFF(DAY, employee_records.created_at, SYSDATETIME()) < 365 * 10
THEN 2
ELSE 3
END AS benfits_type,
DATEDIFF(DAY, employee_records.created_at, SYSDATETIME()) AS days_employed,
employee_records.created_at AS hire_date
FROM
employees
JOIN
employee_records ON employees.id = employee_records.employee_id
ORDER BY
employees.id ASC;
Here is an example of how you could do this. I'll post the query first and then walk through my explanation. If I understood you correctly, you were not looking for total days the employee was hired, but rather the total days of the employee's most recent employment at the company (Max hire date to max termination date or today).
;WITH hired
AS (SELECT employee_records.employee_id id,
Max(employee_records.created_at) created_at
FROM employee_records
WHERE event_type = 'hire-date'
GROUP BY employee_records.employee_id),
latest
AS (SELECT employees.id AS id,
Isnull(Cast(Max(employee_records.created_at) AS DATE), Getdate()) AS created_at
FROM employees
LEFT JOIN employee_records
ON employees.id = employee_records.employee_id
AND employee_records.event_type = 'termination'
GROUP BY employees.id)
SELECT *,
Datediff(day, hired, latestday) DaysEmployeed
FROM (SELECT hired.id,
hired.created_at AS Hired,
CASE
WHEN hired.created_at > latest.created_at THEN Cast(Getdate() AS DATE)
ELSE latest.created_at
END AS LatestDay
FROM hired
INNER JOIN latest
ON hired.id = latest.id) JoinedCTEs
First of all I know you mentioned event_type is an integer, but for easy explanation I used a varchar.
Two CTEs to start.
First there is "hired" which will get you the latest hire date. So if an employee has multiple hire dates, it grabs the latest date.
Second there is "latest" which is the latest date an employee has a termination date, but also uses today's date as a placeholder date if an employee has never been terminated.
The final query joins the two CTEs and does a datediff by day to determine how many days an employee has been at the company. If the termination date is earlier than the hire date (An employee who was hired, terminated, rehired and is still with the company), it will take today's date as the latest date to count.
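
The decision rule described above can be sketched outside SQL as well. This is an illustration only: the function name days_employed and the sample events are hypothetical, and unlike the query above, which filters on 'hire-date' only, the sketch also treats 'rehire' as a start-of-employment event (an assumption based on the event types listed in the question).

```python
from datetime import date

START_EVENTS = {"hire-date", "rehire"}  # assumption: a rehire restarts the clock


def days_employed(events, today):
    """events: iterable of (event_type, date) pairs for one employee."""
    starts = [d for kind, d in events if kind in START_EVENTS]
    ends = [d for kind, d in events if kind == "termination"]
    if not starts:
        return None  # never hired, nothing to measure
    start = max(starts)
    # A termination before the latest start belongs to an earlier stint,
    # so the employee is still active and "today" closes the interval.
    end = max(ends) if ends and max(ends) >= start else today
    return (end - start).days


today = date(2024, 1, 1)
still_here = [("hire-date", date(2023, 1, 1))]
left = [("hire-date", date(2020, 1, 1)), ("termination", date(2020, 1, 31))]
rehired = left + [("rehire", date(2023, 12, 22))]

print(days_employed(still_here, today))  # 365
print(days_employed(left, today))        # 30
print(days_employed(rehired, today))     # 10
```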

I want to list down those months in 1990 when more than 2 employees were born

The following query executes successfully but does not give the expected result:
SELECT TO_CHAR(DATE_OF_BIRTH,'fm MONTH'), COUNT(DATE_OF_BIRTH) "NOS"
FROM EMP
GROUP BY DATE_OF_BIRTH
HAVING DATE_OF_BIRTH <= TO_DATE('31-12-1990','DD-MM-YYYY')
AND DATE_OF_BIRTH >= TO_DATE('01-01-1990','DD-MM-YYYY')
AND COUNT(DATE_OF_BIRTH) >= 2
You need to move your desired date range out of the HAVING clause and into a WHERE clause.
Also, if you want to group everyone together by the month they were born, rather than their individual birthdays, you'll need to GROUP BY the month, rather than the DATE_OF_BIRTH.
SELECT TO_CHAR(DATE_OF_BIRTH,'fm MONTH')"Month", COUNT(TO_CHAR(DATE_OF_BIRTH,'fm MONTH'))"NoS"
FROM EMP
WHERE DATE_OF_BIRTH <= TO_DATE('31-12-1990','DD-MM-YYYY') AND DATE_OF_BIRTH >= TO_DATE('01-01-1990','DD-MM-YYYY')
GROUP BY TO_CHAR(DATE_OF_BIRTH,'fm MONTH')
HAVING COUNT(TO_CHAR(DATE_OF_BIRTH,'fm MONTH')) >= 2
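
The split between WHERE (filters rows before grouping) and HAVING (filters groups after aggregation) can be checked on a tiny data set. The sketch below is an assumption-laden stand-in for the Oracle query: it uses SQLite from Python, strftime('%m', ...) instead of TO_CHAR, ISO text dates, and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, date_of_birth TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [
        ("A", "1990-03-05"), ("B", "1990-03-20"), ("C", "1990-03-28"),
        ("D", "1990-07-11"),  # only one birth in July 1990, dropped by HAVING
        ("E", "1991-03-02"),  # outside 1990, dropped by WHERE
    ],
)

rows = conn.execute("""
    SELECT strftime('%m', date_of_birth) AS month, COUNT(*) AS nos
    FROM emp
    WHERE date_of_birth BETWEEN '1990-01-01' AND '1990-12-31'  -- row filter
    GROUP BY strftime('%m', date_of_birth)                     -- one group per month
    HAVING COUNT(*) >= 2                                       -- group filter
    ORDER BY month
""").fetchall()

print(rows)  # [('03', 3)]
```

Grouping by the full date_of_birth instead would put every distinct birthday in its own group, so the count would rarely reach 2.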

Find employee tenure for a company

I have written the following query to get employee tenure by year,
i.e. grouped by "less than 1 year", "1-2 years", "2-3 years" and "greater than 3 years".
To get this, I compare against the employee's staffed end_date.
But I am not able to get the correct result when comparing with the staffed end_date.
I have pasted the complete code below, but the count I am getting is not correct.
Some employees who worked for more than 2 years fall under the <1 year column.
DECLARE @Project_Id Varchar(10)='ITS-004275';
With Cte_Dates(Period,End_date,Start_date,Project_Id)
As
(
SELECT '<1 Year' AS Period, GETDATE() AS End_Date,DATEADD(YY,-1,GETDATE()) AS Start_date,@Project_Id AS Project_Id
UNION
SELECT '1-2 Years', DATEADD(YY,-1,GETDATE()),DATEADD(YY,-2,GETDATE()),@Project_Id
UNION
SELECT '2-3 Years', DATEADD(YY,-2,GETDATE()),DATEADD(YY,-3,GETDATE()),@Project_Id
UNION
SELECT '>3 Years', DATEADD(YY,-3,GETDATE()),'',@Project_Id
),
--select * from Cte_Dates
--ORDER BY Start_date DESC
Cte_Staffing(PROJECT_ID,EMP_ID,END_DATE) AS
(
SELECT FK_Project_ID,EMP_ID,MAX(End_Date)AS END_DATE FROM DP_Project_Staffing
WHERE FK_Project_ID=@Project_Id
GROUP BY FK_Project_ID,Emp_ID
)
SELECT D.PROJECT_ID,D.Start_date,D.End_date,COUNT(S.EMP_ID) AS Count,D.Period
FROM Cte_Staffing S
RIGHT JOIN Cte_Dates D
ON D.Project_Id=S.PROJECT_ID
AND S.END_DATE<D.End_date AND S.END_DATE>D.Start_date
GROUP BY D.PROJECT_ID,D.Start_date,D.End_date,D.Period
I think this will solve the problem. Spell out the date part when calling DATEADD, like this:
DATEADD(year, -1, GETDATE())
You should also store the result of GETDATE() in a variable and reuse it.
I find your query logic a little bit messy. Why don't you just compute the total period for every employee and use CASE clause? I can help you with code if you'll give me DP_Project_Staffing table structure. Do you have begin_date field in it?
You are taking the MAX(End_date) of the CTE staffing table. In that case, when an employee has several entries, only the most recent will apply. You want to use MIN instead.
Like this:
Cte_Staffing(PROJECT_ID,EMP_ID,END_DATE) AS
(
SELECT FK_Project_ID, EMP_ID, MIN(End_Date)AS END_DATE
FROM DP_Project_Staffing
...
Re-reading your question, you probably don't want the staffing end_date for tenure calculation; you'd want to use the start_date. (Or whatever the column is called in DP_Project_Staffing)
I would also change the WHERE/JOIN clause to be inclusive on one of the sides, so you have either
AND S.END_DATE <= D.End_date AND S.END_DATE > D.Start_date
or
AND S.END_DATE < D.End_date AND S.END_DATE >= D.Start_date
Since you are using milliseconds in the date comparison, it won't make any difference in this case. However, should you change the granularity to be only the date, which would make more sense, you would lose all records where the employee started exactly 1 year, 2 years, etc. ago.
SELECT FK_Project_ID,E.Emp_ID,MIN(Start_Date) AS Emp_Start_Date ,MAX(End_Date) AS Emp_End_Date,
E.Competency,E.First_Name+' '+E.Last_Name+' ('+E.Emp_Id+')' as Name,'Period'=
CASE
WHEN DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))<=12 THEN '<1 Year'
WHEN DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))>12 AND DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))<=24 THEN '1-2 Years'
WHEN DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))>24 AND DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))<=36 THEN '2-3 Years'
WHEN DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))>36 THEN '>3 Years'
ELSE 'NA'
END
FROM DP_Project_Staffing PS
LEFT OUTER JOIN DP_Ext_Emp_Master E
ON E.Emp_Id=PS.Emp_ID
WHERE FK_Project_ID=@PROJ_ID
GROUP BY FK_Project_ID,E.Emp_ID,E.Competency,First_Name,Last_Name
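
The CASE bucketing in the query above can be sketched as a small function to check the boundaries. The names month_boundaries and tenure_period are hypothetical. The month count mirrors how T-SQL DATEDIFF(MONTH, ...) counts month boundaries crossed rather than full elapsed months, which is worth keeping in mind when reading the results.

```python
from datetime import date


def month_boundaries(start, end):
    # Same idea as T-SQL DATEDIFF(MONTH, start, end): month boundaries crossed.
    return (end.year - start.year) * 12 + (end.month - start.month)


def tenure_period(start, end):
    months = month_boundaries(start, end)
    if months <= 12:
        return "<1 Year"
    if months <= 24:
        return "1-2 Years"
    if months <= 36:
        return "2-3 Years"
    return ">3 Years"


print(tenure_period(date(2020, 1, 15), date(2020, 6, 1)))  # <1 Year
print(tenure_period(date(2020, 1, 1), date(2022, 3, 1)))   # 2-3 Years
print(tenure_period(date(2019, 1, 1), date(2023, 1, 1)))   # >3 Years
```

One consequence of counting boundaries: 31 January to 1 February counts as 1 month even though only one day has passed.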

employee database with varying salary over time

I have the following tables:
PROJECTS - project_id, name
EMPLOYEES - employee_id, name
SALARY - employee_id, date, per_hour
HOURS - log_id, project_id, employee_id, date, num_hours
I need to query how much a project is costing. Problem is that Salary can vary. For example, a person can get a raise.
The SALARY table logs the per_hour charge for an employee. With every change in cost being recorded with its date.
How can I query this information so that each log from the HOURS table is always matched to the right entry from the SALARY table? The right match is: given the date of the hours log, take the row from the SALARY table with the highest date that is still before the log's date.
For example, if the work was performed on Feb 14th, get the row for this employee from the SALARY table with the highest date that is still before the 14th.
Thank you,
What you need is an end date on SALARY. When a new record is inserted into SALARY for an employee, the previous record with the highest date (or better yet, a current flag set to 'Y' as recommended by cletus) should have its end date column set to the same date as the start date for the new record.
This should work with your current schema but be aware that it may be slow.
SELECT
SUM(h.num_hours * s.per_hour) AS cost
FROM PROJECTS p
INNER JOIN HOURS h
ON p.project_id = h.project_id
INNER JOIN (
SELECT
s1.employee_id,
s1.date AS start_date,
MIN(s2.date) AS end_date
FROM SALARY s1
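-- LEFT JOIN keeps the latest salary row, which has no later row and so a NULL end_date
LEFT JOIN SALARY s2
ON s1.employee_id = s2.employee_id
AND s1.date < s2.date
GROUP BY
s1.employee_id,
s1.date) s
ON h.employee_id = s.employee_id
AND h.date >= s.start_date
AND (h.date < s.end_date OR s.end_date IS NULL)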
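
As a cross-check of the matching rule, here is a different formulation, sketched with SQLite from Python: a correlated subquery picks, for each hours row, the latest salary row dated on or before the work date. This is not the query above but an alternative under assumptions (ISO text dates, invented sample rows, and "on or before" rather than strictly before).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salary (employee_id INTEGER, date TEXT, per_hour REAL);
CREATE TABLE hours (log_id INTEGER, project_id INTEGER, employee_id INTEGER,
                    date TEXT, num_hours REAL);
INSERT INTO salary VALUES (1, '2009-01-01', 10.0), (1, '2009-02-01', 20.0);
INSERT INTO hours VALUES (1, 7, 1, '2009-01-15', 2.0),   -- billed at 10.0
                         (2, 7, 1, '2009-02-14', 3.0);   -- billed at 20.0
""")

cost = conn.execute("""
    SELECT SUM(h.num_hours * s.per_hour)
    FROM hours h
    JOIN salary s
      ON s.employee_id = h.employee_id
     AND s.date = (SELECT MAX(s2.date)          -- latest rate in force
                   FROM salary s2
                   WHERE s2.employee_id = h.employee_id
                     AND s2.date <= h.date)
    WHERE h.project_id = ?
""", (7,)).fetchone()[0]

print(cost)  # 80.0
```

The second hours row falls after the most recent salary change, which is exactly the case an inner self-join on "next salary date" would lose.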
In the 'Hours' table actually log the value of the salary that you use (don't link it based on ID). This will give you more flexibility in the future.
I have found the easiest way to handle queries spanning dates like this is to store a StartDate and an EndDate, where the EndDate is NULL for the current salary. I use a trigger to make sure there is only ever one NULL value for EndDate, and that there are no overlapping date ranges or gaps between the ranges. StartDate is made not nullable, since NULL is never a valid value for it.
Then your join is pretty simple:
select h.num_hours, s.per_hour
from hours h
inner join salary s on h.employee_id = s.employee_id
and h.date >= s.StartDate and (h.date <= s.EndDate or s.EndDate is null)