Merge multiple records into 1 - sql

I have a table which looks like this:
I'm trying to write a query which will return this:
I'm trying to merge records based on the effect_date, but only if the end_dates are within the effect_date and end_date range.

select employee
, effect_date
, max(end_date) end_date
, max(clinical_fte)
, max(admin_fte)
, max(mgmt_fte)
, max(other_fte)
from table
group by employee
, effect_date
As stated before by Chris Farmer, the second requirement that end date has to be between effect_date and end_date is silly, because it will always be true.
I've chosen max for all records you want to merge, because you haven't stated how you want to merge them. Feel free to adjust to your needs ;)
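To see what the GROUP BY / MAX merge does, here is a minimal sketch run against an in-memory SQLite database. The table name fte and the three rows are made up for illustration (the original sample data is not shown above), and dates are stored as ISO strings so that MAX compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fte (
        employee TEXT, effect_date TEXT, end_date TEXT,
        clinical_fte REAL, admin_fte REAL, mgmt_fte REAL, other_fte REAL
    )
""")
# Hypothetical rows: two records share an effect_date, one stands alone.
conn.executemany(
    "INSERT INTO fte VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("A", "2014-01-01", "2014-06-30", 0.5, 0.0, 0.0, 0.0),
        ("A", "2014-01-01", "2014-06-30", 0.0, 0.3, 0.0, 0.0),
        ("A", "2014-07-01", "2014-12-31", 0.0, 0.0, 0.2, 0.0),
    ],
)

# Same shape as the query above: one output row per (employee, effect_date).
merged = conn.execute("""
    SELECT employee, effect_date, MAX(end_date),
           MAX(clinical_fte), MAX(admin_fte), MAX(mgmt_fte), MAX(other_fte)
    FROM fte
    GROUP BY employee, effect_date
    ORDER BY employee, effect_date
""").fetchall()

for row in merged:
    print(row)
```

The two rows that share 2014-01-01 collapse into one row carrying 0.5 clinical and 0.3 admin. If the FTE values should be added rather than overlaid, SUM would replace MAX.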

Try this SQL Query:
select *
from TABLE
where end_date between effect_date and end_date

Related

Sum values based on start and end dates in another table

I am trying to do a calculation in BigQuery using SQL and I have no idea how to go about it.
I have two tables:
Tablename: Periods
id INTEGER
start_date DATE
end_date DATE
And another table
Tablename: Data
date DATE
id INTEGER
value FLOAT
What I want is a query that sums value for each id over the time range (start_date to end_date) given in the Periods table. The Data table can contain values for an id that fall outside its time range in Periods, so the query must limit the summation to dates from start_date to end_date.
Hope someone can point me in the right direction on this.
Consider using a correlated subquery:
SELECT
id,
(SELECT SUM(value)
FROM Data
WHERE Data.id = Periods.id
AND Data.date >= Periods.start_date
AND Data.date <= Periods.end_date
) AS sums
FROM Periods
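A quick way to check the correlated subquery is to run it on a few rows. The sketch below uses SQLite rather than BigQuery, stores dates as ISO strings, and the sample rows are invented; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Periods (id INTEGER, start_date TEXT, end_date TEXT);
    CREATE TABLE Data (date TEXT, id INTEGER, value REAL);

    -- Hypothetical sample rows
    INSERT INTO Periods VALUES
        (1, '2020-01-01', '2020-01-31'),
        (2, '2020-02-01', '2020-02-29');
    INSERT INTO Data VALUES
        ('2019-12-31', 1, 100.0),  -- before period 1: ignored
        ('2020-01-10', 1, 1.5),
        ('2020-01-31', 1, 2.5),    -- on the end date: included
        ('2020-03-01', 2, 7.0);    -- after period 2: ignored
""")

sums = conn.execute("""
    SELECT id,
           (SELECT SUM(value)
            FROM Data
            WHERE Data.id = Periods.id
              AND Data.date >= Periods.start_date
              AND Data.date <= Periods.end_date) AS sums
    FROM Periods
    ORDER BY id
""").fetchall()

print(sums)
```

Note that an id with no rows inside its period comes back with NULL (None in Python) rather than 0; wrapping the subquery in COALESCE(..., 0) would change that if a zero is preferred.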

How to calculate the longest period in days that a company has gone without headcount change?

Given an employees table with the columns EmpID,FirstName,LastName,StartDate, and EndDate.
I want to use a query on Oracle to calculate the longest period in days that a company has gone without headcount change.
Here is my query:
select MAX(endDate-startDate)
from
(select endDate
from employees
where endDate is not null)
union all
(select startDate
from employees)
But I got an error:
ORA-00904:"STARTDATE":invalid identifier
How can I fix this error?
Is my query the correct answer to this question?
Thanks
You aren't returning the startDate in the sub-query. Add startDate to the inner query.
select MAX(endDate-startDate) from
(select startDate, endDate from employees where endDate is not null)
union all
(select startDate from employees)
EDIT:
You can also try this:
select MAX(endDate-startDate) from employees where endDate is not null
However, I don't think your query is what you're looking for as it only lists the longest term employee that no longer works at the company.
In a simplistic view, you would want to put together all the start-dates (when the headcount increases) and all the end-dates (when it decreases), combine them all, arrange them in increasing order, measure the differences between consecutive dates, and take the max.
"Put together" is a UNION ALL, and measure differences between "consecutive" dates can be done with the analytic function lag().
One complication: one employee may start exactly on the same date another is terminated, so the headcount doesn't change. More generally, on any given date there may be starts and ends, and you need to exclude the dates when there are an equal number of starts and ends. So the first part of the solution is more complicated: you need to group by date and compare the start and end counts.
Something like this may work (not tested!):
with d ( dt, flag ) as (
select startDate, 's' from employees union all
select endDate , 'e' from employees
),
prep ( int ) as (
select dt - lag(dt) over (order by dt)
from d
group by dt
having count(case flag when 's' then 1 end) !=
count(case flag when 'e' then 1 end)
)
select max(int) as max_interval
from prep
;
Edit - Gordon has a good point in his solution: perhaps the longest period without a change in headcount is the current period (ending "now"). For this reason, one needs to add SYSDATE to the UNION ALL, like he did. It can be added with either flag (for example 's' to be specific).
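The same logic can be sketched outside the database to make the steps explicit: count the net change per date, drop the dates where starts and ends cancel out, append "today", and take the largest gap between consecutive dates. The function name longest_quiet_period and the sample employees are hypothetical, and the sketch assumes today is not earlier than any start or end date.

```python
from collections import Counter
from datetime import date


def longest_quiet_period(employees, today):
    """Longest run of days with no headcount change.

    employees: iterable of (start_date, end_date) pairs; end_date is None
    for people who are still employed.
    """
    net = Counter()
    for start, end in employees:
        net[start] += 1          # headcount goes up on a start date
        if end is not None:
            net[end] -= 1        # and down on an end date

    # Keep only dates where starts and ends do not cancel each other out.
    change_dates = sorted(d for d, delta in net.items() if delta != 0)
    change_dates.append(today)   # the current period ends "now"

    gaps = [(b - a).days for a, b in zip(change_dates, change_dates[1:])]
    return max(gaps, default=0)


# Hypothetical data: the second employee starts the day the first one leaves,
# so 2020-03-01 is not a headcount change.
employees = [
    (date(2020, 1, 1), date(2020, 3, 1)),
    (date(2020, 3, 1), None),
    (date(2020, 6, 1), None),
]
print(longest_quiet_period(employees, date(2020, 7, 1)))  # 152
```

Here the longest stretch runs from 2020-01-01 to 2020-06-01, because the change on 2020-03-01 nets to zero.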
I think the answer to your question is something like this:
select max(span)
from (select (lead(dte) over (order by dte) - dte) as span
from (select startDate as dte from employees union all
select endDate as dte from employees union all
select trunc(sysdate) from dual
) d
) d;
A head-count change (presumably) occurs when an employee starts or stops. Hence, you want the largest interval between two such adjacent dates.

How to JOIN two columns of same table with different filters?

I want to join two columns of same table in following way...
QUERY 1 :
SELECT MRCY,DT FROM BILLING WHERE BILL_DATE='01-SEP-14' ORDER BY MRCY;
QUERY 2 :
SELECT DT "OLD DT" FROM BILLING WHERE BILL_DATE='01-AUG-14' ORDER BY MRCY;
How to join output of these query as single result set in Oracle database?
I mean columns should apear as follows:
MRCY DT OLD DT
You may want the following:
SELECT MRCY,
MAX(CASE WHEN BILL_DATE = DATE '2014-09-01' THEN DT END) as DT,
MAX(CASE WHEN BILL_DATE = DATE '2014-08-01' THEN DT END) as OLD_DT
FROM BILLING
WHERE BILL_DATE IN (DATE '2014-09-01', DATE '2014-08-01')
GROUP BY MRCY
ORDER BY MRCY;
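The conditional aggregation above can be tried on a small dataset. This sketch uses SQLite instead of Oracle, so BILL_DATE is stored as an ISO string and compared with plain string literals; the rows and the numeric type of DT are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BILLING (MRCY TEXT, BILL_DATE TEXT, DT REAL);

    -- Hypothetical sample rows
    INSERT INTO BILLING VALUES
        ('M1', '2014-09-01', 10.0),
        ('M1', '2014-08-01', 8.0),
        ('M2', '2014-09-01', 5.0),
        ('M2', '2014-07-01', 99.0);  -- other month: filtered out
""")

# Each CASE picks DT for one bill date; MAX collapses the group to one row.
rows = conn.execute("""
    SELECT MRCY,
           MAX(CASE WHEN BILL_DATE = '2014-09-01' THEN DT END) AS DT,
           MAX(CASE WHEN BILL_DATE = '2014-08-01' THEN DT END) AS OLD_DT
    FROM BILLING
    WHERE BILL_DATE IN ('2014-09-01', '2014-08-01')
    GROUP BY MRCY
    ORDER BY MRCY
""").fetchall()

print(rows)
```

M2 has no August bill, so its OLD_DT comes back as NULL instead of the row disappearing, which is the main difference from an inner self-join on MRCY.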
Just use UNION or UNION ALL. Union would remove the duplicates.
But, why would you do that? You can have both filters in the same query.
SELECT MRCY , DT
FROM BILLING
WHERE BILL_DATE IN (to_date('01-SEP-2014', 'DD-MON-YYYY'),
to_date('01-AUG-2014', 'DD-MON-YYYY'))
ORDER BY MRCY;
Internally, it will be rewritten as an OR logic.
Also, note that you are comparing the date column with a string literal instead of a date. That is the wrong way to deal with dates, because Oracle has to do an implicit data conversion. Always use TO_DATE with an explicit format mask (or an ANSI date literal).
And a year should always be written as YYYY, not just YY. Remember the Y2K bug?
SELECT MRCY, :new_dt as DT, :old_dt "OLD_DT"
FROM BILLING WHERE BILL_DATE IN (:new_dt, :old_dt)
ORDER BY MRCY;

Find employee tenure for a company

I have written the following query to get the employees tenure yearwise.
Ie. grouped by "less than 1 year", "1-2 years", "2-3 years" and "greater than 3 years".
To get this, I compare with employee staffed end_date.
But I am not able to get the correct result when comparing with staffed end_date.
I have pasted the complete code below, but the count I am getting is not correct.
Some employee who worked for more than 2 years is falling under <1 year column.
DECLARE @Project_Id Varchar(10)='ITS-004275';
With Cte_Dates(Period,End_date,Start_date,Project_Id)
As
(
SELECT '<1 Year' AS Period, GETDATE() AS End_Date,DATEADD(YY,-1,GETDATE()) AS Start_date,@Project_Id AS Project_Id
UNION
SELECT '1-2 Years', DATEADD(YY,-1,GETDATE()),DATEADD(YY,-2,GETDATE()),@Project_Id
UNION
SELECT '2-3 Years', DATEADD(YY,-2,GETDATE()),DATEADD(YY,-3,GETDATE()),@Project_Id
UNION
SELECT '>3 Years', DATEADD(YY,-3,GETDATE()),'',@Project_Id
),
--select * from Cte_Dates
--ORDER BY Start_date DESC
Cte_Staffing(PROJECT_ID,EMP_ID,END_DATE) AS
(
SELECT FK_Project_ID,EMP_ID,MAX(End_Date)AS END_DATE FROM DP_Project_Staffing
WHERE FK_Project_ID=@Project_Id
GROUP BY FK_Project_ID,Emp_ID
)
SELECT D.PROJECT_ID,D.Start_date,D.End_date,COUNT(S.EMP_ID) AS Count,D.Period
FROM Cte_Staffing S
RIGHT JOIN Cte_Dates D
ON D.Project_Id=S.PROJECT_ID
AND S.END_DATE<D.End_date AND S.END_DATE>D.Start_date
GROUP BY D.PROJECT_ID,D.Start_date,D.End_date,D.Period
I think this will solve the problem.
As you can see, you should use it like this:
DATEADD(year, -1, GETDATE())
You should also store the result of GETDATE() in a variable.
I find your query logic a little bit messy. Why don't you just compute the total period for every employee and use CASE clause? I can help you with code if you'll give me DP_Project_Staffing table structure. Do you have begin_date field in it?
You are taking the MAX(End_date) of the CTE staffing table. In that case, when an employee has several entries, only the most recent will apply. You want to use MIN instead.
Like this:
Cte_Staffing(PROJECT_ID,EMP_ID,END_DATE) AS
(
SELECT FK_Project_ID, EMP_ID, MIN(End_Date)AS END_DATE
FROM DP_Project_Staffing
...
Re-reading your question, you probably don't want the staffing end_date for tenure calculation; you'd want to use the start_date. (Or whatever the column is called in DP_Project_Staffing)
I would also change the WHERE/JOIN clause to be inclusive on one of the sides, so you have either
AND S.END_DATE <= D.End_date AND S.END_DATE > D.Start_date
or
AND S.END_DATE < D.End_date AND S.END_DATE >= D.Start_date
Since you are using milliseconds in the date comparison, it won't make any difference in this case. However, should you change the granularity to be only the date, which would make more sense, you would lose all records where the employee started exactly 1 year, 2 years, etc. ago.
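To illustrate why one side of each range should be inclusive, here is a small sketch that buckets tenure by start date at day granularity. The helper names add_years and tenure_bucket are hypothetical, the bucket labels are taken from the question, and the choice to put someone who started exactly N years ago into the higher bucket is an assumption.

```python
from datetime import date


def add_years(d, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def tenure_bucket(start, today):
    """Half-open ranges: each boundary date belongs to exactly one bucket."""
    if start > add_years(today, -1):
        return "<1 Year"
    if start > add_years(today, -2):
        return "1-2 Years"
    if start > add_years(today, -3):
        return "2-3 Years"
    return ">3 Years"


today = date(2015, 6, 15)
print(tenure_bucket(date(2014, 6, 16), today))  # <1 Year
print(tenure_bucket(date(2014, 6, 15), today))  # 1-2 Years (exactly one year)
print(tenure_bucket(date(2012, 6, 15), today))  # >3 Years (exactly three years)
```

With strict comparisons on both sides, the 2014-06-15 and 2012-06-15 employees would match no bucket at all.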
SELECT FK_Project_ID,E.Emp_ID,MIN(Start_Date) AS Emp_Start_Date ,MAX(End_Date) AS Emp_End_Date,
E.Competency,E.First_Name+' '+E.Last_Name+' ('+E.Emp_Id+')' as Name,'Period'=
CASE
WHEN DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))<=12 THEN '<1 Year'
WHEN DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))>12 AND DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))<=24 THEN '1-2 Years'
WHEN DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))>24 AND DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))<=36 THEN '2-3 Years'
WHEN DATEDIFF(MONTH,MIN(Start_Date),MAX(End_Date))>36 THEN '>3 Years'
ELSE 'NA'
END
FROM DP_Project_Staffing PS
LEFT OUTER JOIN DP_Ext_Emp_Master E
ON E.Emp_Id=PS.Emp_ID
WHERE FK_Project_ID=@PROJ_ID
GROUP BY FK_Project_ID,E.Emp_ID,E.Competency,First_Name,Last_Name

Sql query to find date period between multiple rows

I have a table with three columns (City_Code | Start_Date | End_Date).
Suppose I have the following data:
New_York|01/01/1985|01/01/1987
Paris|02/01/1987|01/01/1990
San Francisco|02/01/1990|04/01/1990
Paris|05/01/1990|08/01/1990
New_York|09/01/1990|11/01/1990
New_York|12/01/1990|19/01/1990
New_York|20/01/1990|28/01/1990
I would like to get the date period during which someone lived in their last city of residence, using only SQL. In this example that is New_York (09/01/1990-28/01/1990). I can get this period by manipulating the data with Java, but is it possible to do it with plain SQL?
Thanks in advance
You can grab the first and last date of residence by city using this:
SELECT TOP 1 City_Code, MIN(Start_Date), Max(End_Date)
FROM Table
GROUP BY City_Code
ORDER BY Max(End_Date) desc
but, the problem is that the start date will be the first date of residence in the city in question.
For 10g you don't have the option of SELECT TOP n so you must be a little creative.
WITH last_period
AS
(SELECT city, moved_in, moved_out, NVL(moved_in-LEAD(moved_out, 1) OVER (ORDER BY city), 0) AS lead
FROM periods
WHERE city = (SELECT city FROM periods WHERE moved_out = (SELECT MAX(moved_out) FROM periods)))
SELECT city, MIN(moved_in) AS moved_in, MAX(moved_out) AS moved_out
FROM last_period
WHERE lead >= 0
GROUP BY city;
This works for the example dataset that you have given. It could stand some optimisation for a large dataset but gives you a working example, tested on Oracle 10g.
If it's MySQL, you can easily use
TIME_TO_SEC(TIMEDIFF(end_date, start_date)) AS `diff_in_secs`
Once you have the time difference in seconds, you can take it from there.
On SQL Server, couldn't you use:
SELECT TOP 1 City_Code, CONVERT(varchar(10), Start_Date, 103) + '-' + CONVERT(varchar(10), End_Date, 103)
FROM MyTable
ORDER BY End_Date DESC
That would get the date period and city with the latest end date.
This is assuming you are trying to just find the city where the person most recently lived, formatted with a dash.
Given that this is Oracle, you can simply subtract the end date and start date to get the number of days in between.
Select City_Code, (End_Date - Start_Date) Days
From MyTable
Where Start_Date = (
Select Max( T1.Start_Date )
From MyTable T1
)
If you are using SQL Server you can use the DateDiff() function
DATEDIFF ( datepart , startdate , enddate )
http://msdn.microsoft.com/en-us/library/ms189794.aspx
EDIT
I don't know Oracle but I did find this article
SELECT
MAX(t.duration)
FROM (
SELECT
(End_Date - Start_Date) duration
From
Table
) t
I hope this will work.
If you want to calculate only the last period length for the last city of residence, then it's probably something like this:
SELECT TOP 1
City_Code,
End_Date - Start_Date AS Days
FROM atable
ORDER BY Start_Date DESC
But if you mean to include all the periods the person has ever lived in a city that happens to be their last city of residence, then it's a bit more complicated, but not too much:
SELECT TOP 1
City_Code,
SUM(End_Date - Start_Date) AS Days
FROM atable
GROUP BY City_Code
ORDER BY MAX(Start_Date) DESC
But the above solution most probably returns the last city information only after it calculates the data for all cities. Do we need that? Not necessarily, so maybe we should use another approach. Maybe like this:
SELECT
City_Code,
SUM(End_Date - Start_Date) AS Days
FROM atable
WHERE City_Code = (SELECT TOP 1 City_Code FROM atable ORDER BY Start_Date DESC)
GROUP BY City_Code
I'm short on time, but this feels like a case for the window function LAG: compare each row with the previous one, take a new begin date when the city changes, and keep the existing one when the city stays the same. This should correctly preserve the range.
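A minimal sketch of that LAG idea, run on the sample rows from the question: flag each row where the city differs from the previous row, turn the running total of those flags into a group number, and aggregate the last group. It uses SQLite (3.25 or later for window functions) rather than Oracle, the table name residence is made up, and the dd/mm/yyyy dates are rewritten as ISO strings so they sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE residence (City_Code TEXT, Start_Date TEXT, End_Date TEXT)"
)
# Sample rows from the question, with dd/mm/yyyy converted to ISO format.
conn.executemany(
    "INSERT INTO residence VALUES (?, ?, ?)",
    [
        ("New_York", "1985-01-01", "1987-01-01"),
        ("Paris", "1987-01-02", "1990-01-01"),
        ("San Francisco", "1990-01-02", "1990-01-04"),
        ("Paris", "1990-01-05", "1990-01-08"),
        ("New_York", "1990-01-09", "1990-01-11"),
        ("New_York", "1990-01-12", "1990-01-19"),
        ("New_York", "1990-01-20", "1990-01-28"),
    ],
)

last_stay = conn.execute("""
    WITH flagged AS (
        -- is_new = 1 whenever the city differs from the previous row
        SELECT City_Code, Start_Date, End_Date,
               CASE WHEN LAG(City_Code) OVER (ORDER BY Start_Date) = City_Code
                    THEN 0 ELSE 1 END AS is_new
        FROM residence
    ),
    stays AS (
        -- running total of the flags numbers each uninterrupted stay
        SELECT City_Code, Start_Date, End_Date,
               SUM(is_new) OVER (ORDER BY Start_Date) AS stay_no
        FROM flagged
    )
    SELECT City_Code, MIN(Start_Date), MAX(End_Date)
    FROM stays
    GROUP BY stay_no, City_Code
    ORDER BY MAX(End_Date) DESC
    LIMIT 1
""").fetchone()

print(last_stay)  # ('New_York', '1990-01-09', '1990-01-28')
```

Unlike grouping by City_Code alone, the earlier New_York stay from 1985 ends up in its own group and does not pull the start date back. Oracle supports LAG and SUM ... OVER as well, but has no LIMIT in older versions, so the final row selection would need a different form there.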