How to calculate the longest period in days that a company has gone without headcount change? - sql

Given an employees table with the columns EmpID,FirstName,LastName,StartDate, and EndDate.
I want to use a query on Oracle to calculate the longest period in days that a company has gone without headcount change.
Here is my query:
select MAX(endDate-startDate)
from
(select endDate
from employees
where endDate is not null)
union all
(select startDate
from employees)
But I got an error:
ORA-00904:"STARTDATE":invalid identifier
How can I fix this error?
Is my query the correct answer to this question?
Thanks

You aren't returning the startDate in the sub-query. Add startDate to the inner query.
select MAX(endDate-startDate) from
(select startDate, endDate from employees where endDate is not null)
union all
(select startDate from employees)
EDIT:
You can also try this:
select MAX(endDate-startDate) from employees where endDate is not null
However, I don't think your query is what you're looking for as it only lists the longest term employee that no longer works at the company.

In a simplistic view, you would want to put together all the start-dates (when the headcount increases) and all the end-dates (when it decreases), combine them all, arrange them in increasing order, measure the differences between consecutive dates, and take the max.
"Put together" is a UNION ALL, and measure differences between "consecutive" dates can be done with the analytic function lag().
One complication: one employee may start exactly on the same date another is terminated, so the headcount doesn't change. More generally, on any given date there may be starts and ends, and you need to exclude the dates when there are an equal number of starts and ends. So the first part of the solution is more complicated: you need to group by date and compare the start and end counts.
Something like this may work (not tested!):
with d ( dt, flag ) as (
select start_date, 's' from employees union all
select end_date , 'e' from employees
),
prep ( int ) as
select dt - lag(dt) over (order by dt)
from d
group by dt
having count(case flag when 's' then 1 end) !=
count(case flag when 'e' then 1 end)
)
select max(int) as max_interval
from prep
;
Edit - Gordon has a good point in his solution: perhaps the longest period without a change in headcount is the current period (ending "now"). For this reason, one needs to add SYSDATE to the UNION ALL, like he did. It can be added with either flag (for example 's' to be specific).

I think the answer to your question is something like this:
select max(span)
from (select (lead(dte) over (order by dte) - dte) as span
from (select startDate as dte from employees union all
select endDate as dte from employees union all
select trunc(sysdate) from dual
) d
) d;
A head-count change (presumably) occurs when an employee starts or stops. Hence, you want the largest interval between two such adjacent dates.

Related

Data value on a given date

This time I have a table on a PostgreSQL database that contains the employee name, the date that he started working and the date that he leaves the company, in the cases of the employee still remains in the company, this field has null value.
Knowing this, I would like to know how many people was working on a predetermined date, ex:
I would like to know how many people works on the company in January 2021.
I don't know where to start, in some attempts I got the number of hires and layoffs per month, but I need to show this accumulated value per month, in another column.
I hope I made myself understood, I'll leave the last SQL I got here.
select reference, sum(hires) from
(
select
date_trunc('month', date_hires) as reference,
count(*) as hires
from
ponto_mais_relatorio_colaboradores
group by
date_hires
union all
select
date_trunc('month', date_layoff) as reference,
count(*)*-1 as layoffs
from
ponto_mais_relatorio_colaboradores
group by
date_layoff
) as reference
join calendar_aux on calendar_aux.ano_mes = reference
group by reference
order by reference
Break the requirement down. The question: how many are employed on any given date? That would include all hired before that date and do not have a layoff date plus all hired before with a layoff date later then the date your interested period. I.e you are interested in Jan so you still want to count an employee with a layoff date in Feb. With that in place convert into SQL. The preceding is available from select comparing dates. other issue is that Jan is not a date, it is a range of dates, so you need each date. You can use generate series to create each day in Jan. Then Join the generated dates with and selection from your table. Resulting query:
with jan_dates( jdate ) as
( select generate_series( date '2021-01-01'
, date '2021-01-31'
, interval '1' day
)::date
)
select jdate "Date", count(*) "Employees"
from jan_dates j
join employees e
on ( e.date_hires <= j.jdate
and ( e.date_layoff is null
or e.date_layoff > j.jdate
)
)
group by j.jdate
order by j.jdate;
Note: Not tested.

SQL In Oracle - How to search through occurrences in an interval?

I've gotten myself stuck working in Oracle with SQL for the first time. In my library example, I need to make a query on my tables for a library member who has borrowed more than 5 books in some week during the past year. Here's my attempt:
SELECT
PN.F_NAME,
PN.L_NAME,
M.ENROLL_DATE,
COUNT(*) AS BORROWED_COUNT,
(SELECT
(BD.DATE_BORROWED + INTERVAL '7' DAY)
FROM DUAL, BORROW_DETAILS BD
GROUP BY BD.DATE_BORROWED + INTERVAL '7' DAY
HAVING COUNT(*) > 5
) AS VALID_INTERVALS
FROM PERSON_NAME PN, BORROW_DETAILS BD, HAS H, MEMBER M
WHERE
PN.PID = M.PID AND
M.PID = BD.PID AND
BD.BORROWID = H.BORROWID
GROUP BY PN.F_NAME, PN.L_NAME, M.ENROLL_DATE, DATEDIFF(DAY, BD.DATE_RETURNED, VALID_INTERVALS)
ORDER BY BORROWED_COUNT DESC;
As I'm sure you can tell, Im really struggling with the Dates in oracle. For some reason DATEDIFF wont work at all for me, and I cant find any way to evaluate the VALID_INTERVAL which should be another date...
Also apologies for the all caps.
DATEDIFF is not a valid function in Oracle; if you want the difference then subtract one date from another and you'll get a number representing the number of days (or fraction thereof) between the values.
If you want to count it for a week starting from Midnight Monday then you can TRUNCate the date to the start of the ISO week (which will be Midnight of the Monday of that week) and then group and count:
SELECT MAX( PN.F_NAME ) AS F_NAME,
MAX( PN.L_NAME ) AS L_NAME,
MAX( M.ENROLL_DATE ) AS ENROLL_DATE,
TRUNC( BD.DATE_BORROWED, 'IW' ) AS monday_of_iso_week,
COUNT(*) AS BORROWED_COUNT
FROM PERSON_NAME PN
INNER JOIN MEMBER M
ON ( PN.PID = M.PID )
INNER JOIN BORROW_DETAILS BD
ON ( M.PID = BD.PID )
GROUP BY
PN.PID,
TRUNC( BD.DATE_BORROWED, 'IW' )
HAVING COUNT(*) > 5
ORDER BY BORROWED_COUNT DESC;
db<>fiddle
You haven't given your table structures or any sample data so its difficult to test; but you don't appear to need to include the HAS table and I'm assuming there is a 1:1 relationship between person and member.
You also don't want to GROUP BY names as there could be two people with the same first and last name (who happened to enrol on the same date) and should use something that uniquely identifies the person (which I assume is PID).

Mark the first two rows with a value in a column

I have an Oracle view which contains employee's attendance records including late attendance days.
Columns:
Employee Number
Date
In-Time
Out-time
Late_Arrival (if late currently getting marked as 1)
I want to mark each employees first two late arrivals of a month as "G" from a query.
Please help me with this.
Sample data as below link tps://drive.google.com/open?id=0B6Xw1eXeLyG7akZuTEdDUGNIUDg
This sounds like something for row_number() and some date arithmetic:
select t.*,
(case when late_arrival = 1 and
row_number() over (partition by trunc(date, 'MON'), late_arrival
order by date, intime
) <= 2
then 'G'
end) as flag
from t;

Sql query to find date period between multiple rows

I have a table with three columns (City_Code | Start_Date | End_Date).
Suppose i have the following data:
New_York|01/01/1985|01/01/1987
Paris|02/01/1987|01/01/1990
San Francisco|02/01/1990|04/01/1990
Paris|05/01/1990|08/01/1990
New_York|09/01/1990|11/01/1990
New_York|12/01/1990|19/01/1990
New_York|20/01/1990|28/01/1990
I would like to get the date period for which someone lived in the last city of his residence. In this example that is New_York(09/01/1990-28/01/1990) using only sql. I can get this period by manipulating the data with java , but is it possible to do it with plain sql?
Thanks in advance
You can grab the first and last date of residence by city using this:
SELECT TOP 1 City_Code, MIN(Start_Date), Max(End_Date)
FROM Table
GROUP BY City_Code
ORDER BY Max(End_Date) desc
but, the problem is that the start date will be the first date of residence in the city in question.
For 10g you don't have the option of SELECT TOP n so you must be a little creative.
WITH last_period
AS
(SELECT city, moved_in, moved_out, NVL(moved_in-LEAD(moved_out, 1) OVER (ORDER BY city), 0) AS lead
FROM periods
WHERE city = (SELECT city FROM periods WHERE moved_out = (SELECT MAX(moved_out) FROM periods)))
SELECT city, MIN(moved_in) AS moved_in, MAX(moved_out) AS moved_out
FROM last_period
WHERE lead >= 0
GROUP BY city;
This works for the example dataset that you have given. It could stand some optimisation for a large dataset but gives you a working example, tested on Oracle 10g.
If it's MySQL, you can easily use
TIME_TO_SEC(TIMEDIFF(end_date, start_date)) AS `diff_in_secs`
Having time difference in seconds you go any further.
On SQL Server, couldn't you use:
SELECT TOP 1 City_Code, Start_Date + "-" + End_Date
FROM MyTable
ORDER BY enddate DESC
That would get the date period and city with the latest end date.
This is assuming you are trying to just find the city where the person most recently lived, formatted with a dash.
Given that this is Oracle, you can simply subtract the end date and start date to get the number of days in between.
Select City_Code, (End_Date - Start_Date) Days
From MyTable
Where Start_Date = (
Select Max( T1.Start_ Date )
From MyTable As T1
)
If you are using SQL Server you can use the DateDiff() function
DATEDIFF ( datepart , startdate , enddate )
http://msdn.microsoft.com/en-us/library/ms189794.aspx
EDIT
I don't know Oracle but I did find this article
SELECT
MAX(t.duration)
FROM (
SELECT
(End_Date - Start_Date) duration
From
Table
) as t
I hope this will work.
If you want to calculate only the last period length for the last city of residence, then it's probably something like this:
SELECT TOP 1
City_Code,
End_Date - Start_Date AS Days
FROM atable
ORDER BY Start_Date DESC
But if you mean to include all the periods the person has ever lived in a city that happens to be their last city of residence, then it's a bit more complicated, but not too much:
SELECT TOP 1
City_Code,
SUM(End_Date - Start_Date) AS Days
FROM atable
GROUP BY City_Code
ORDER BY MAX(Start_Date) DESC
But the above solution most probably returns the last city information only after it calculates the data for all cities. Do we need that? Not necessarily, so maybe we should use another approach. Maybe like this:
SELECT
City_Code,
SUM(End_Date - Start_Date) AS Days
FROM atable
WHERE City_Code = (SELECT TOP 1 City_Code FROM atable ORDER BY Start_Date DESC)
GROUP BY City_Code
i'm short on time - but this feels like you could use the window function LAG to compare to the previous row and retain the appropriate begin date from that row when the city changes, and dont change it when the city is the same - this should correctly preserve the range.

Last day of the month with a twist in SQLPLUS

I would appreciate a little expert help please.
in an SQL SELECT statement I am trying to get the last day with data per month for the last year.
Example, I am easily able to get the last day of each month and join that to my data table, but the problem is, if the last day of the month does not have data, then there is no returned data. What I need is for the SELECT to return the last day with data for the month.
This is probably easy to do, but to be honest, my brain fart is starting to hurt.
I've attached the select below that works for returning the data for only the last day of the month for the last 12 months.
Thanks in advance for your help!
SELECT fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,fd.column_name
FROM super_table fd,
(SELECT TRUNC(daterange,'MM')-1 first_of_month
FROM (
select TRUNC(sysdate-365,'MM') + level as DateRange
from dual
connect by level<=365)
GROUP BY TRUNC(daterange,'MM')) fom
WHERE fd.cust_id = :CUST_ID
AND fd.coll_date > SYSDATE-400
AND TRUNC(fd.coll_date) = fom.first_of_month
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)
You probably need to group your data so that each month's data is in the group, and then within the group select the maximum date present. The sub-query might be:
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table AS fd
GROUP BY YEAR(coll_date) * 100 + MONTH(coll_date);
This presumes that the functions YEAR() and MONTH() exist to extract the year and month from a date as an integer value. Clearly, this doesn't constrain the range of dates - you can do that, too. If you don't have the functions in Oracle, then you do some sort of manipulation to get the equivalent result.
Using information from Rhose (thanks):
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table AS fd
GROUP BY TO_CHAR(coll_date, 'YYYYMM');
This achieves the same net result, putting all dates from the same calendar month into a group and then determining the maximum value present within that group.
Here's another approach, if ANSI row_number() is supported:
with RevDayRanked(itemDate,rn) as (
select
cast(coll_date as date),
row_number() over (
partition by datediff(month,coll_date,'2000-01-01') -- rewrite datediff as needed for your platform
order by coll_date desc
)
from super_table
)
select itemDate
from RevDayRanked
where rn = 1;
Rows numbered 1 will be nondeterministically chosen among rows on the last active date of the month, so you don't need distinct. If you want information out of the table for all rows on these dates, use rank() over days instead of row_number() over coll_date values, so a value of 1 appears for any row on the last active date of the month, and select the additional columns you need:
with RevDayRanked(cust_id, server_name, coll_date, rk) as (
select
cust_id, server_name, coll_date,
rank() over (
partition by datediff(month,coll_date,'2000-01-01')
order by cast(coll_date as date) desc
)
from super_table
)
select cust_id, server_name, coll_date
from RevDayRanked
where rk = 1;
If row_number() and rank() aren't supported, another approach is this (for the second query above). Select all rows from your table for which there's no row in the table from a later day in the same month.
select
cust_id, server_name, coll_date
from super_table as ST1
where not exists (
select *
from super_table as ST2
where datediff(month,ST1.coll_date,ST2.coll_date) = 0
and cast(ST2.coll_date as date) > cast(ST1.coll_date as date)
)
If you have to do this kind of thing a lot, see if you can create an index over computed columns that hold cast(coll_date as date) and a month indicator like datediff(month,'2001-01-01',coll_date). That'll make more of the predicates SARGs.
Putting the above pieces together, would something like this work for you?
SELECT fd.cust_id,
fd.server_name,
fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,
fd.column_name
FROM super_table fd,
WHERE fd.cust_id = :CUST_ID
AND TRUNC(fd.coll_date) IN (
SELECT MAX(TRUNC(coll_date))
FROM super_table
WHERE coll_date > SYSDATE - 400
AND cust_id = :CUST_ID
GROUP BY TO_CHAR(coll_date,'YYYYMM')
)
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)