SQL query to find date period between multiple rows - sql

I have a table with three columns (City_Code | Start_Date | End_Date).
Suppose I have the following data:
New_York|01/01/1985|01/01/1987
Paris|02/01/1987|01/01/1990
San Francisco|02/01/1990|04/01/1990
Paris|05/01/1990|08/01/1990
New_York|09/01/1990|11/01/1990
New_York|12/01/1990|19/01/1990
New_York|20/01/1990|28/01/1990
I would like to get the date period for which someone lived in the last city of their residence. In this example that is New_York (09/01/1990-28/01/1990). I can get this period by manipulating the data in Java, but is it possible to do it with plain SQL?
Thanks in advance

You can grab the first and last date of residence by city using this:
SELECT TOP 1 City_Code, MIN(Start_Date), Max(End_Date)
FROM Table
GROUP BY City_Code
ORDER BY Max(End_Date) desc
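The problem is that the start date will be the first date the person ever lived in that city, not the start of the most recent stay. In the example this returns New_York from 01/01/1985 rather than from 09/01/1990.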

Oracle 10g has no SELECT TOP n, so you have to be a little creative.
WITH last_period
AS
(SELECT city, moved_in, moved_out, NVL(moved_in-LEAD(moved_out, 1) OVER (ORDER BY city), 0) AS lead
FROM periods
WHERE city = (SELECT city FROM periods WHERE moved_out = (SELECT MAX(moved_out) FROM periods)))
SELECT city, MIN(moved_in) AS moved_in, MAX(moved_out) AS moved_out
FROM last_period
WHERE lead >= 0
GROUP BY city;
This works for the example dataset that you have given. It could stand some optimisation for a large dataset but gives you a working example, tested on Oracle 10g.

If it's MySQL, you can easily use
TIME_TO_SEC(TIMEDIFF(end_date, start_date)) AS `diff_in_secs`
Once you have the time difference in seconds, you can take it from there.

On SQL Server, couldn't you use:
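SELECT TOP 1 City_Code,
-- style 103 formats as dd/mm/yyyy; the conversion is needed if the columns are date types
CONVERT(varchar(10), Start_Date, 103) + '-' + CONVERT(varchar(10), End_Date, 103) AS Period
FROM MyTable
ORDER BY End_Date DESC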
That would get the date period and city with the latest end date.
This is assuming you are trying to just find the city where the person most recently lived, formatted with a dash.

Given that this is Oracle, you can simply subtract the end date and start date to get the number of days in between.
Select City_Code, (End_Date - Start_Date) Days
From MyTable
Where Start_Date = (
Select Max( T1.Start_Date )
From MyTable T1
)

If you are using SQL Server you can use the DateDiff() function
DATEDIFF ( datepart , startdate , enddate )
http://msdn.microsoft.com/en-us/library/ms189794.aspx
EDIT
I don't know Oracle but I did find this article

SELECT
MAX(t.duration)
FROM (
SELECT
(End_Date - Start_Date) duration
From
Table
) as t
I hope this will work.

If you want to calculate only the last period length for the last city of residence, then it's probably something like this:
SELECT TOP 1
City_Code,
End_Date - Start_Date AS Days
FROM atable
ORDER BY Start_Date DESC
But if you mean to include all the periods the person has ever lived in a city that happens to be their last city of residence, then it's a bit more complicated, but not too much:
SELECT TOP 1
City_Code,
SUM(End_Date - Start_Date) AS Days
FROM atable
GROUP BY City_Code
ORDER BY MAX(Start_Date) DESC
But the above solution most probably returns the last city information only after it calculates the data for all cities. Do we need that? Not necessarily, so maybe we should use another approach. Maybe like this:
SELECT
City_Code,
SUM(End_Date - Start_Date) AS Days
FROM atable
WHERE City_Code = (SELECT TOP 1 City_Code FROM atable ORDER BY Start_Date DESC)
GROUP BY City_Code

I'm short on time, but this feels like a case for the window function LAG: compare each row to the previous row, take a new begin date when the city changes, and keep the existing begin date when the city stays the same. This should correctly preserve the range.
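The LAG idea can be sketched as follows. This is only an illustration, not the answerer's code: it runs the query in SQLite through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later), and the table name `residence`, the helper `last_stay`, and the ISO date strings are assumptions made for the example. The query flags each row where the city differs from the previous row, turns the flags into a running stay id, and then aggregates the most recent stay.

```python
import sqlite3

# Sample rows from the question. Dates are rewritten as ISO strings (an assumption
# made here so that plain text comparison sorts them chronologically).
ROWS = [
    ("New_York", "1985-01-01", "1987-01-01"),
    ("Paris", "1987-01-02", "1990-01-01"),
    ("San Francisco", "1990-01-02", "1990-01-04"),
    ("Paris", "1990-01-05", "1990-01-08"),
    ("New_York", "1990-01-09", "1990-01-11"),
    ("New_York", "1990-01-12", "1990-01-19"),
    ("New_York", "1990-01-20", "1990-01-28"),
]

LAST_STAY_SQL = """
WITH marked AS (
    SELECT city_code, start_date, end_date,
           -- 1 when the city differs from the previous row (or there is no previous row)
           CASE WHEN LAG(city_code) OVER (ORDER BY start_date) = city_code
                THEN 0 ELSE 1 END AS is_new_stay
    FROM residence
),
stays AS (
    SELECT city_code, start_date, end_date,
           -- the running total of the markers gives each continuous stay its own id
           SUM(is_new_stay) OVER (ORDER BY start_date) AS stay_id
    FROM marked
)
SELECT city_code, MIN(start_date), MAX(end_date)
FROM stays
GROUP BY stay_id, city_code
ORDER BY MAX(end_date) DESC
LIMIT 1
"""


def last_stay(rows):
    """Return (city, first start date, last end date) of the most recent continuous stay."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE residence (city_code TEXT, start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO residence VALUES (?, ?, ?)", rows)
    result = conn.execute(LAST_STAY_SQL).fetchone()
    conn.close()
    return result


print(last_stay(ROWS))  # ('New_York', '1990-01-09', '1990-01-28')
```

On Oracle the same shape should carry over, with the final LIMIT 1 replaced by whatever row-limiting construct the version supports.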

Related

Issue with the repeated records in SQL

My dataset looks like below:
I am trying to get Min start date & Max end date of an employee whenever there is a team change.
The problem is that I am not getting the dates for a repeated team.
Any help would be appreciated.
Teradata has a nice SQL extension for normalizing overlapping date ranges. This assumes that you want to get extra rows when a month is missing, i.e. there's a gap:
SELECT
emp_id
,team
-- split the Period into separate columns again
,Begin(pd)
,last_day(add_months(End(pd),-1)) -- end of previous month
FROM
(
SELECT NORMALIZE -- normalize overlapping periods
emp_id
,team
-- NORMALIZE only works with periods, so create a Period based on current date plus one month
,PERIOD(month_end_date
,last_day(add_months(month_end_date, 1))
) AS pd
FROM vt
) AS dt;
If I understand correctly, this is a gaps-and-islands problem that can be solved using the difference of row number.
You can use:
select emp_id, team, min(month_end_date), max(month_end_date)
from (select t.*,
row_number() over (partition by emp_id order by month_end_date) as seqnum,
row_number() over (partition by emp_id, team order by month_end_date) as seqnum_2
from t
) t
group by emp_id, team, (seqnum - seqnum_2);
Note: This puts the dates on a single row, which seems more useful than your expected results.
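To see why the difference of the two row numbers identifies each run, here is a small runnable sketch. It uses SQLite through Python's sqlite3 module (3.25 or later for window functions) rather than Teradata, and the sample rows and the helper name `team_periods` are invented for illustration. Within a run of the same team both counters advance together, so their difference stays constant. When the employee returns to an earlier team, the per-team counter has fallen behind, so the difference changes.

```python
import sqlite3

# Invented sample: employee 1 is in team A, moves to B, then returns to A.
ROWS = [
    (1, "A", "2020-01-31"),
    (1, "A", "2020-02-29"),
    (1, "B", "2020-03-31"),
    (1, "A", "2020-04-30"),
    (1, "A", "2020-05-31"),
]

ISLANDS_SQL = """
SELECT emp_id, team, MIN(month_end_date), MAX(month_end_date)
FROM (SELECT t.*,
             -- position among all of the employee's rows
             ROW_NUMBER() OVER (PARTITION BY emp_id
                                ORDER BY month_end_date) AS seqnum,
             -- position among the employee's rows for this team only
             ROW_NUMBER() OVER (PARTITION BY emp_id, team
                                ORDER BY month_end_date) AS seqnum_2
      FROM t) AS numbered
GROUP BY emp_id, team, (seqnum - seqnum_2)
ORDER BY emp_id, MIN(month_end_date)
"""


def team_periods(rows):
    """Return one (emp_id, team, first month, last month) row per consecutive run."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (emp_id INTEGER, team TEXT, month_end_date TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(ISLANDS_SQL).fetchall()
    conn.close()
    return result


for row in team_periods(ROWS):
    print(row)
```

With this sample the differences are 0, 0, 2, 1, 1, so team A comes out as two separate periods instead of being merged into one.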

Teradata SQL help. Need help getting the start date and end date (yellow) of the most recent employment status. Thank you

I need help getting the start date and end date (highlighted in yellow) of the most recent employment status. Thank you.
There are several ways to get your expected result. Based on your data, you might apply Teradata's NORMALIZE option, a SQL extension that combines overlapping periods:
SELECT NAME, job_title, status, next(Begin(pd)) AS start_date, End(pd) AS end_date
FROM
( -- returns one group for consecutive overlapping rows
SELECT NORMALIZE
name,
job_title,
status,
-- need to subtract 1 to create a valid period
PERIOD(prior(start_date), end_date) AS pd
FROM tab
) AS dt
QUALIFY
-- return the latest row only
Row_Number()
Over(PARTITION BY name
ORDER BY start_date DESC) = 1
Caution: This returns a new group whenever name, job_title or status changes.
Give this a try:
SELECT
t.name,
MAX(t.job_title) AS job_title,
status,
MIN(t.start_date) AS start_date,
MAX(t.end_date) AS end_date
FROM mytable t
GROUP BY t.name, t.status
QUALIFY RANK() OVER(PARTITION BY t.name ORDER BY MAX(t.start_date) DESC) = 1
Teradata will do the aggregates first and then apply the window function. So, this will first get the MIN/MAX dates within each person's status and then assign a RANK to each of these rows based on the most recent start_date.
I don't have a system to test, but give it a try and let me know.

Mark the first two rows with a value in a column

I have an Oracle view which contains employee's attendance records including late attendance days.
Columns:
Employee Number
Date
In-Time
Out-time
Late_Arrival (if late currently getting marked as 1)
I want to mark each employee's first two late arrivals of a month as "G" from a query.
Please help me with this.
Sample data is available at this link: https://drive.google.com/open?id=0B6Xw1eXeLyG7akZuTEdDUGNIUDg
This sounds like something for row_number() and some date arithmetic:
select t.*,
(case when late_arrival = 1 and
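-- employee_number is an assumed column name; partitioning by it restarts the count for each employee
row_number() over (partition by employee_number, trunc(date, 'MON'), late_arrival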
order by date, intime
) <= 2
then 'G'
end) as flag
from t;
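A runnable sketch of the same idea, under some assumptions: it uses SQLite via Python's sqlite3 module (3.25 or later) instead of Oracle, so strftime('%Y-%m', ...) stands in for trunc(..., 'MON'), and the table `attendance`, its column names, and the helper `flag_late_arrivals` are hypothetical. The partition includes the employee number so that the count of late arrivals restarts for each employee and each month.

```python
import sqlite3

# Invented sample rows: (emp_no, work_date, in_time, late_arrival)
ROWS = [
    (101, "2024-01-02", "09:20", 1),
    (101, "2024-01-03", "08:55", 0),
    (101, "2024-01-05", "09:10", 1),
    (101, "2024-01-09", "09:30", 1),
    (101, "2024-02-01", "09:15", 1),
    (102, "2024-01-04", "09:05", 1),
]

FLAG_SQL = """
SELECT emp_no, work_date,
       CASE WHEN late_arrival = 1
             AND ROW_NUMBER() OVER (
                     -- one counter per employee, per month, per late/on-time status
                     PARTITION BY emp_no, strftime('%Y-%m', work_date), late_arrival
                     ORDER BY work_date, in_time) <= 2
            THEN 'G' END AS flag
FROM attendance
ORDER BY emp_no, work_date
"""


def flag_late_arrivals(rows):
    """Return (emp_no, work_date, flag); flag is 'G' for the first two late days of a month."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attendance "
        "(emp_no INTEGER, work_date TEXT, in_time TEXT, late_arrival INTEGER)"
    )
    conn.executemany("INSERT INTO attendance VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(FLAG_SQL).fetchall()
    conn.close()
    return result


for row in flag_late_arrivals(ROWS):
    print(row)
```

In this sample, employee 101's third late arrival in January stays unflagged, while the February one is flagged again because the month changed.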

How to calculate the longest period in days that a company has gone without headcount change?

I have an employees table with the columns EmpID, FirstName, LastName, StartDate, and EndDate.
I want to use a query on Oracle to calculate the longest period in days that a company has gone without headcount change.
Here is my query:
select MAX(endDate-startDate)
from
(select endDate
from employees
where endDate is not null)
union all
(select startDate
from employees)
But I got an error:
ORA-00904: "STARTDATE": invalid identifier
How can I fix this error?
Is my query the correct answer to this question?
Thanks
You aren't returning the startDate in the sub-query. Add startDate to the inner query.
select MAX(endDate-startDate) from
(select startDate, endDate from employees where endDate is not null)
union all
(select startDate from employees)
EDIT:
You can also try this:
select MAX(endDate-startDate) from employees where endDate is not null
However, I don't think your query is what you're looking for as it only lists the longest term employee that no longer works at the company.
In a simplistic view, you would want to put together all the start-dates (when the headcount increases) and all the end-dates (when it decreases), combine them all, arrange them in increasing order, measure the differences between consecutive dates, and take the max.
"Put together" is a UNION ALL, and measure differences between "consecutive" dates can be done with the analytic function lag().
One complication: one employee may start exactly on the same date another is terminated, so the headcount doesn't change. More generally, on any given date there may be starts and ends, and you need to exclude the dates when there are an equal number of starts and ends. So the first part of the solution is more complicated: you need to group by date and compare the start and end counts.
Something like this may work (not tested!):
with d ( dt, flag ) as (
select start_date, 's' from employees union all
select end_date , 'e' from employees
),
prep ( int ) as (
select dt - lag(dt) over (order by dt)
from d
group by dt
having count(case flag when 's' then 1 end) !=
count(case flag when 'e' then 1 end)
)
select max(int) as max_interval
from prep
;
Edit - Gordon has a good point in his solution: perhaps the longest period without a change in headcount is the current period (ending "now"). For this reason, one needs to add SYSDATE to the UNION ALL, like he did. It can be added with either flag (for example 's' to be specific).
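The approach described in this answer can be tried out with the sketch below. It is an illustration under assumptions, not a tested Oracle solution: it runs in SQLite through Python's sqlite3 module (3.25 or later), uses julianday() for the date subtraction, sums +1/-1 deltas per date instead of comparing two counts (the same test expressed differently), and takes "today" as a parameter in place of SYSDATE. The sample rows and the helper `longest_quiet_period` are invented.

```python
import sqlite3

# Invented sample: (emp_id, start_date, end_date). Employee 2 starts on the very day
# employee 1 leaves, so the headcount does not change on 2020-03-01.
EMPLOYEES = [
    (1, "2020-01-01", "2020-03-01"),
    (2, "2020-03-01", None),
    (3, "2020-01-11", "2020-06-09"),
]

QUIET_SQL = """
WITH events (dt, delta) AS (
    SELECT start_date, 1 FROM employees
    UNION ALL
    SELECT end_date, -1 FROM employees WHERE end_date IS NOT NULL
),
change_dates (dt) AS (
    -- keep only the dates on which starts and ends do not cancel out
    SELECT dt FROM events GROUP BY dt HAVING SUM(delta) <> 0
    UNION
    -- the current period (ending today) is also a candidate
    SELECT :today
),
gaps (days) AS (
    SELECT julianday(dt) - julianday(LAG(dt) OVER (ORDER BY dt))
    FROM change_dates
)
SELECT MAX(days) FROM gaps
"""


def longest_quiet_period(employees, today):
    """Return the longest run of days without a net headcount change, up to `today`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (emp_id INTEGER, start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", employees)
    result = conn.execute(QUIET_SQL, {"today": today}).fetchone()[0]
    conn.close()
    return result


print(longest_quiet_period(EMPLOYEES, "2020-06-19"))  # 150.0
```

The 150 days run from 2020-01-11 to 2020-06-09. If 2020-03-01 were not filtered out, that stretch would be split in two and the answer would drop to 100 days.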
I think the answer to your question is something like this:
select max(span)
from (select (lead(dte) over (order by dte) - dte) as span
from (select startDate as dte from employees union all
select endDate as dte from employees union all
select trunc(sysdate) from dual
) d
) d;
A head-count change (presumably) occurs when an employee starts or stops. Hence, you want the largest interval between two such adjacent dates.

Last day of the month with a twist in SQLPLUS

I would appreciate a little expert help please.
In an SQL SELECT statement, I am trying to get the last day with data per month for the last year.
For example, I can easily get the last day of each month and join that to my data table, but if the last day of the month has no data, nothing is returned for that month. What I need is for the SELECT to return the last day with data for each month.
This is probably easy to do, but to be honest, my brain fart is starting to hurt.
I've attached the select below that works for returning the data for only the last day of the month for the last 12 months.
Thanks in advance for your help!
SELECT fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,fd.column_name
FROM super_table fd,
(SELECT TRUNC(daterange,'MM')-1 first_of_month
FROM (
select TRUNC(sysdate-365,'MM') + level as DateRange
from dual
connect by level<=365)
GROUP BY TRUNC(daterange,'MM')) fom
WHERE fd.cust_id = :CUST_ID
AND fd.coll_date > SYSDATE-400
AND TRUNC(fd.coll_date) = fom.first_of_month
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,
TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)
You probably need to group your data so that each month's data is in the group, and then within the group select the maximum date present. The sub-query might be:
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table AS fd
GROUP BY YEAR(coll_date) * 100 + MONTH(coll_date);
This presumes that the functions YEAR() and MONTH() exist to extract the year and month from a date as an integer value. Clearly, this doesn't constrain the range of dates - you can do that, too. If you don't have the functions in Oracle, then you do some sort of manipulation to get the equivalent result.
Using information from Rhose (thanks):
SELECT MAX(coll_date) AS last_day_of_month
FROM Super_Table fd
GROUP BY TO_CHAR(coll_date, 'YYYYMM');
This achieves the same net result, putting all dates from the same calendar month into a group and then determining the maximum value present within that group.
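As a quick check of the grouping idea, here is a small sketch run against SQLite from Python's sqlite3 module. SQLite has no TO_CHAR, so strftime('%Y%m', ...) plays the same role here. The sample rows and the helper `last_days_with_data` are invented for illustration.

```python
import sqlite3

# Invented sample: no data exists on the calendar month-end for January or March.
ROWS = [
    (1, "2024-01-15"),
    (1, "2024-01-30"),
    (1, "2024-02-10"),
    (1, "2024-02-29"),
    (1, "2024-03-28"),
]

LAST_DAY_SQL = """
SELECT MAX(coll_date) AS last_day_with_data
FROM super_table
-- one group per calendar month, e.g. '202401'
GROUP BY strftime('%Y%m', coll_date)
ORDER BY last_day_with_data
"""


def last_days_with_data(rows):
    """Return the latest collection date present in each calendar month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE super_table (cust_id INTEGER, coll_date TEXT)")
    conn.executemany("INSERT INTO super_table VALUES (?, ?)", rows)
    result = [row[0] for row in conn.execute(LAST_DAY_SQL)]
    conn.close()
    return result


print(last_days_with_data(ROWS))  # ['2024-01-30', '2024-02-29', '2024-03-28']
```

January yields the 30th and March the 28th, the last days that actually have rows, rather than the calendar month-ends.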
Here's another approach, if ANSI row_number() is supported:
with RevDayRanked(itemDate,rn) as (
select
cast(coll_date as date),
row_number() over (
partition by datediff(month,coll_date,'2000-01-01') -- rewrite datediff as needed for your platform
order by coll_date desc
)
from super_table
)
select itemDate
from RevDayRanked
where rn = 1;
Rows numbered 1 will be nondeterministically chosen among rows on the last active date of the month, so you don't need distinct. If you want information out of the table for all rows on these dates, use rank() over days instead of row_number() over coll_date values, so a value of 1 appears for any row on the last active date of the month, and select the additional columns you need:
with RevDayRanked(cust_id, server_name, coll_date, rk) as (
select
cust_id, server_name, coll_date,
rank() over (
partition by datediff(month,coll_date,'2000-01-01')
order by cast(coll_date as date) desc
)
from super_table
)
select cust_id, server_name, coll_date
from RevDayRanked
where rk = 1;
If row_number() and rank() aren't supported, another approach is this (for the second query above). Select all rows from your table for which there's no row in the table from a later day in the same month.
select
cust_id, server_name, coll_date
from super_table as ST1
where not exists (
select *
from super_table as ST2
where datediff(month,ST1.coll_date,ST2.coll_date) = 0
and cast(ST2.coll_date as date) > cast(ST1.coll_date as date)
)
If you have to do this kind of thing a lot, see if you can create an index over computed columns that hold cast(coll_date as date) and a month indicator like datediff(month,'2001-01-01',coll_date). That'll make more of the predicates SARGs.
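The NOT EXISTS variant can be exercised with the sketch below, which needs no window functions. It is written for SQLite via Python's sqlite3 module, so strftime('%Y%m', ...) replaces the datediff(month, ...) test and date(...) replaces the cast to date. The sample rows and the helper `rows_on_last_active_day` are assumptions for the example.

```python
import sqlite3

# Invented sample: two servers report on 2024-01-30, the last active day of January.
ROWS = [
    (1, "srv-a", "2024-01-12 09:00:00"),
    (1, "srv-a", "2024-01-30 08:00:00"),
    (1, "srv-b", "2024-01-30 17:30:00"),
    (1, "srv-b", "2024-02-14 10:00:00"),
    (1, "srv-a", "2024-02-27 10:00:00"),
]

NOT_EXISTS_SQL = """
SELECT cust_id, server_name, coll_date
FROM super_table AS st1
WHERE NOT EXISTS (
    SELECT 1
    FROM super_table AS st2
    -- same calendar month ...
    WHERE strftime('%Y%m', st2.coll_date) = strftime('%Y%m', st1.coll_date)
    -- ... but on a strictly later day
      AND date(st2.coll_date) > date(st1.coll_date)
)
ORDER BY coll_date, server_name
"""


def rows_on_last_active_day(rows):
    """Return every row that falls on the last day with data in its month."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE super_table (cust_id INTEGER, server_name TEXT, coll_date TEXT)"
    )
    conn.executemany("INSERT INTO super_table VALUES (?, ?, ?)", rows)
    result = conn.execute(NOT_EXISTS_SQL).fetchall()
    conn.close()
    return result


for row in rows_on_last_active_day(ROWS):
    print(row)
```

Both January 30 rows are returned because the comparison is made on whole days, so rows from the same day never exclude one another.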
Putting the above pieces together, would something like this work for you?
SELECT fd.cust_id,
fd.server_name,
fd.instance_name,
TRUNC(fd.coll_date) AS coll_date,
fd.column_name
FROM super_table fd
WHERE fd.cust_id = :CUST_ID
AND TRUNC(fd.coll_date) IN (
SELECT MAX(TRUNC(coll_date))
FROM super_table
WHERE coll_date > SYSDATE - 400
AND cust_id = :CUST_ID
GROUP BY TO_CHAR(coll_date,'YYYYMM')
)
GROUP BY fd.cust_id,fd.server_name,fd.instance_name,TRUNC(fd.coll_date),fd.column_name
ORDER BY fd.server_name,fd.instance_name,TRUNC(fd.coll_date)