Mark the first two rows with a value in a column - sql

I have an Oracle view which contains employee's attendance records including late attendance days.
Columns:
Employee Number
Date
In-Time
Out-time
Late_Arrival (if late currently getting marked as 1)
I want to mark each employee's first two late arrivals of a month as "G" from a query.
Please help me with this.
Sample data as below link tps://drive.google.com/open?id=0B6Xw1eXeLyG7akZuTEdDUGNIUDg

This sounds like something for row_number() and some date arithmetic:
select t.*,
(case when late_arrival = 1 and
-- partition per employee and month so each employee gets their own first two
row_number() over (partition by employee_number, trunc(date, 'MON'), late_arrival
order by date, intime
) <= 2
then 'G'
end) as flag
from t;
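A minimal sketch of the same idea, runnable with Python's built-in sqlite3 module (SQLite 3.25 or later is assumed for window functions). The table name attendance and the columns emp_no, work_date and in_time are hypothetical stand-ins for the view in the question, and substr(work_date, 1, 7) plays the role of Oracle's trunc(date, 'MON') because the dates are stored as ISO text here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the attendance view in the question.
conn.execute(
    "CREATE TABLE attendance ("
    "emp_no INTEGER, work_date TEXT, in_time TEXT, late_arrival INTEGER)"
)
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?, ?, ?)",
    [
        (1, "2017-01-02", "09:20", 1),
        (1, "2017-01-03", "08:55", 0),
        (1, "2017-01-04", "09:10", 1),
        (1, "2017-01-05", "09:30", 1),  # third late arrival in January
        (1, "2017-02-01", "09:15", 1),  # new month, the count restarts
        (2, "2017-01-02", "08:50", 0),
        (2, "2017-01-03", "09:05", 1),
    ],
)

query = """
SELECT emp_no, work_date,
       CASE WHEN late_arrival = 1
             AND ROW_NUMBER() OVER (
                     PARTITION BY emp_no, substr(work_date, 1, 7), late_arrival
                     ORDER BY work_date, in_time) <= 2
            THEN 'G' END AS flag
FROM attendance
ORDER BY emp_no, work_date
"""
# Map (employee, date) to the flag; rows that are not flagged carry None.
flags = {(emp, day): flag for emp, day, flag in conn.execute(query)}
for key, flag in flags.items():
    print(key, flag)
```

Including late_arrival in the partition keeps on-time rows from using up row numbers, so only late rows compete for the first two positions.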

Related

Issue with the repeated records in SQL

My dataset looks like below:
I am trying to get Min start date & Max end date of an employee whenever there is a team change.
The problem is that the dates come out wrong when an employee returns to a team they were on before.
Any help would be appreciated..
Teradata has a nice SQL extension for normalizing overlapping date ranges. This assumes that you want to get extra rows when a month is missing, i.e. there's a gap:
SELECT
emp_id
,team
-- split the Period into separate columns again
,Begin(pd)
,last_day(add_months(End(pd),-1)) -- end of previous month
FROM
(
SELECT NORMALIZE -- normalize overlapping periods
emp_id
,team
-- NORMALIZE only works with periods, so create a Period based on current date plus one month
,PERIOD(month_end_date
,last_day(add_months(month_end_date, 1))
) AS pd
FROM vt
) AS dt;
If I understand correctly, this is a gaps-and-islands problem that can be solved using the difference of row number.
You can use:
select emp_id, team, min(month_end_date), max(month_end_date)
from (select t.*,
row_number() over (partition by emp_id order by month_end_date) as seqnum,
row_number() over (partition by emp_id, team order by month_end_date) as seqnum_2
from t
) t
group by emp_id, team, (seqnum - seqnum_2);
Note: This puts the dates on a single row, which seems more useful than your expected results.
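To see why the difference of the two row numbers identifies each stretch on a team, here is a small sketch using Python's sqlite3 module (SQLite 3.25 or later assumed). The sample rows are invented for illustration; only the column names emp_id, team and month_end_date come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (emp_id INTEGER, team TEXT, month_end_date TEXT)")
# Invented sample: the employee leaves team A for B, then returns to A.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "A", "2018-01-31"),
        (1, "A", "2018-02-28"),
        (1, "B", "2018-03-31"),
        (1, "A", "2018-04-30"),
        (1, "A", "2018-05-31"),
    ],
)

query = """
SELECT emp_id, team, MIN(month_end_date), MAX(month_end_date)
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY emp_id
                                ORDER BY month_end_date) AS seqnum,
             ROW_NUMBER() OVER (PARTITION BY emp_id, team
                                ORDER BY month_end_date) AS seqnum_2
      FROM t) AS x
GROUP BY emp_id, team, seqnum - seqnum_2
ORDER BY MIN(month_end_date)
"""
islands = conn.execute(query).fetchall()
for row in islands:
    print(row)
```

Within one unbroken stretch both counters advance together, so seqnum - seqnum_2 stays constant (0 for the first A stretch, 1 for the second in this sample), which is what separates the repeated team into two rows.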

Finding id's available in previous weeks but not in current week

How can I find ids that were present in previous weeks but are not present in the current week, on a rolling basis? For example:
Week1 has id 1,2,3,4,5
Week2 has id 3,4,5,7,8
Week3 has id 1,3,5,10,11
So I found that ids 1 and 2 are missing in week 2, and ids 2, 4, 7 and 8 are missing in week 3 compared with the previous two weeks. But how can I do this on a rolling window for a large amount of data spanning 20+ years?
Please find the sample dataset and expected output. I am expecting the output to be partitioned based on the week_end Date
Dataset
ID|WEEK_START|WEEK_END|APPEARING_DATE
7152|2015-12-27|2016-01-02|2015-12-27
8350|2015-12-27|2016-01-02|2015-12-27
7152|2015-12-27|2016-01-02|2015-12-29
4697|2015-12-27|2016-01-02|2015-12-30
7187|2015-12-27|2016-01-02|2015-01-01
8005|2015-12-27|2016-01-02|2015-12-27
8005|2015-12-27|2016-01-02|2015-12-29
6254|2016-01-03|2016-01-09|2016-01-03
7962|2016-01-03|2016-01-09|2016-01-04
3339|2016-01-03|2016-01-09|2016-01-06
7834|2016-01-03|2016-01-09|2016-01-03
7962|2016-01-03|2016-01-09|2016-01-05
7152|2016-01-03|2016-01-09|2016-01-07
8350|2016-01-03|2016-01-09|2016-01-09
2403|2016-01-10|2016-01-16|2016-01-10
0157|2016-01-10|2016-01-16|2016-01-11
2228|2016-01-10|2016-01-16|2016-01-14
4697|2016-01-10|2016-01-16|2016-01-14
Expected Output
Partition1: WEEK_END=2016-01-02
ID|MAX(LAST_APPEARING_DATE)
7152|2015-12-29
8350|2015-12-27
4697|2015-12-30
7187|2015-01-01
8005|2015-12-29
Partition2: WEEK_END=2016-01-09
ID|MAX(LAST_APPEARING_DATE)
7152|2016-01-07
8350|2016-01-09
4697|2015-12-30
7187|2015-01-01
8005|2015-12-29
6254|2016-01-03
7962|2016-01-05
3339|2016-01-06
7834|2016-01-03
Partition3: WEEK_END=2016-01-16
ID|MAX(LAST_APPEARING_DATE)
7152|2016-01-07
8350|2016-01-09
4697|2016-01-14
7187|2015-01-01
8005|2015-12-29
6254|2016-01-03
7962|2016-01-05
3339|2016-01-06
7834|2016-01-03
2403|2016-01-10
0157|2016-01-11
2228|2016-01-14
Please use the query below:
select ID, MAX(APPEARING_DATE) from table_name
group by ID, WEEK_END;
Or, including WEEK_END:
select ID, WEEK_END, MAX(APPEARING_DATE) from table_name
group by ID, WEEK_END;
You can use aggregation:
select t.id, max(week_end)
from t
group by id
having max(week_end) < '2016-01-02';
Adjust the date in the having clause for the week end that you want.
Actually, your question is a bit unclear. I'm not sure if a later week end would keep the row or not. If you want "as of" data, then include a where clause:
select t.id, max(week_end)
from t
where week_end < '2016-01-02'
group by id
having max(week_end) < '2016-01-02';
If you want this for a range of dates, then you can use a derived table:
select we.the_week_end, t.id, max(week_end)
from (select '2016-01-02' as the_week_end union all
select '2016-01-09' as the_week_end
) we cross join
t
where t.week_end < we.the_week_end
group by id, we.the_week_end
having max(t.week_end) < we.the_week_end;
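One possible reading of the expected output in the question is "for each week end, the last appearing date of every id seen up to and including that week". Under that assumption, a rough sketch with Python's sqlite3 module follows. The table name appearances is hypothetical, the rows are a subset of the question's sample data, and ids are stored as text so that values such as 0157 keep their leading zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; columns follow the dataset in the question.
conn.execute(
    "CREATE TABLE appearances (id TEXT, week_end TEXT, appearing_date TEXT)"
)
conn.executemany(
    "INSERT INTO appearances VALUES (?, ?, ?)",
    [
        ("7152", "2016-01-02", "2015-12-27"),
        ("7152", "2016-01-02", "2015-12-29"),
        ("4697", "2016-01-02", "2015-12-30"),
        ("7152", "2016-01-09", "2016-01-07"),
        ("6254", "2016-01-09", "2016-01-03"),
        ("4697", "2016-01-16", "2016-01-14"),
    ],
)

query = """
SELECT we.week_end, a.id, MAX(a.appearing_date) AS last_appearing_date
FROM (SELECT DISTINCT week_end FROM appearances) AS we
JOIN appearances AS a
  ON a.week_end <= we.week_end   -- everything seen up to that week
GROUP BY we.week_end, a.id
"""
as_of = {(week, id_): last for week, id_, last in conn.execute(query)}
for key in sorted(as_of):
    print(key, as_of[key])
```

The join multiplies every row by the number of later weeks, so on 20+ years of data this shape would probably need to be restricted to a date range or computed incrementally.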

Teradata SQL help. Need help getting the start date and end date (yellow) of the most recent employment status. Thank you

There are several ways to get your expected result. Based on your data, you might apply Teradata's NORMALIZE option, a SQL extension that combines overlapping periods:
SELECT NAME, job_title, status, next(Begin(pd)) AS start_date, End(pd) AS end_date
FROM
( -- returns one group for consecutive overlapping rows
SELECT NORMALIZE
name,
job_title,
status,
-- need to subtract 1 to create a valid period
PERIOD(prior(start_date), end_date) AS pd
FROM tab
) AS dt
QUALIFY
-- return the latest row only
Row_Number()
Over(PARTITION BY name
ORDER BY start_date DESC) = 1
Caution: This returns a new group whenever name, job_title or status changes.
Give this a try:
SELECT
t.name,
MAX(t.job_title) AS job_title,
t.status,
MIN(t.start_date) AS start_date,
MAX(t.end_date) AS end_date
FROM mytable t
GROUP BY t.name, t.status
-- rank on the aggregated start date; the bare t.start_date column is not in the GROUP BY
QUALIFY RANK() OVER(PARTITION BY t.name ORDER BY MIN(t.start_date) DESC) = 1
Teradata will do the aggregates first and then apply the window function. So, this will first get the MIN/MAX dates within each person's status and then assign a RANK to each of these rows based on the most recent start_date.
I don't have a system to test, but give it a try and let me know.

How to calculate the longest period in days that a company has gone without headcount change?

Given an employees table with the columns EmpID,FirstName,LastName,StartDate, and EndDate.
I want to use a query on Oracle to calculate the longest period in days that a company has gone without headcount change.
Here is my query:
select MAX(endDate-startDate)
from
(select endDate
from employees
where endDate is not null)
union all
(select startDate
from employees)
But I got an error:
ORA-00904:"STARTDATE":invalid identifier
How can I fix this error?
Is my query the correct answer to this question?
Thanks
You aren't returning the startDate in the sub-query. Add startDate to the inner query.
select MAX(endDate-startDate) from
(select startDate, endDate from employees where endDate is not null)
union all
(select startDate from employees)
EDIT:
You can also try this:
select MAX(endDate-startDate) from employees where endDate is not null
However, I don't think your query is what you're looking for as it only lists the longest term employee that no longer works at the company.
In a simplistic view, you would want to put together all the start-dates (when the headcount increases) and all the end-dates (when it decreases), combine them all, arrange them in increasing order, measure the differences between consecutive dates, and take the max.
"Put together" is a UNION ALL, and measure differences between "consecutive" dates can be done with the analytic function lag().
One complication: one employee may start exactly on the same date another is terminated, so the headcount doesn't change. More generally, on any given date there may be starts and ends, and you need to exclude the dates when there are an equal number of starts and ends. So the first part of the solution is more complicated: you need to group by date and compare the start and end counts.
Something like this may work (not tested!):
with d ( dt, flag ) as (
select start_date, 's' from employees union all
select end_date , 'e' from employees
),
prep ( int ) as (
select dt - lag(dt) over (order by dt)
from d
group by dt
having count(case flag when 's' then 1 end) !=
count(case flag when 'e' then 1 end)
)
select max(int) as max_interval
from prep
;
Edit - Gordon has a good point in his solution: perhaps the longest period without a change in headcount is the current period (ending "now"). For this reason, one needs to add SYSDATE to the UNION ALL, like he did. It can be added with either flag (for example 's' to be specific).
I think the answer to your question is something like this:
select max(span)
from (select (lead(dte) over (order by dte) - dte) as span
from (select startDate as dte from employees union all
select endDate as dte from employees union all
select trunc(sysdate) from dual
) d
) d;
A head-count change (presumably) occurs when an employee starts or stops. Hence, you want the largest interval between two such adjacent dates.
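The steps discussed in these answers (combine start and end dates, drop dates where starts and ends cancel out, then measure gaps between consecutive remaining dates) can be sketched with Python's sqlite3 module. This is an illustration under assumptions rather than the Oracle query itself: the sample rows are invented, julianday() stands in for Oracle date subtraction, SUM(delta) <> 0 replaces the comparison of start and end counts, and the "until today" row is left out so the result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "2020-02-01"),
        (2, "2020-02-01", None),  # starts the day employee 1 leaves: net change 0
        (3, "2020-03-01", None),
    ],
)

query = """
WITH d (dt, delta) AS (
    SELECT start_date, 1 FROM employees
    UNION ALL
    SELECT end_date, -1 FROM employees WHERE end_date IS NOT NULL
),
changes AS (
    -- keep only dates on which the headcount actually moved
    SELECT dt FROM d GROUP BY dt HAVING SUM(delta) <> 0
)
SELECT MAX(span)
FROM (SELECT julianday(dt) - julianday(LAG(dt) OVER (ORDER BY dt)) AS span
      FROM changes)
"""
longest = conn.execute(query).fetchone()[0]
print(longest)
```

In this sample the headcount is unchanged on 2020-02-01, so the longest stretch runs from 2020-01-01 to 2020-03-01. Without the HAVING filter the same data would report two shorter gaps instead.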

Sql query to find date period between multiple rows

I have a table with three columns (City_Code | Start_Date | End_Date).
Suppose I have the following data:
New_York|01/01/1985|01/01/1987
Paris|02/01/1987|01/01/1990
San Francisco|02/01/1990|04/01/1990
Paris|05/01/1990|08/01/1990
New_York|09/01/1990|11/01/1990
New_York|12/01/1990|19/01/1990
New_York|20/01/1990|28/01/1990
I would like to get the date period for which someone lived in the last city of their residence, using only SQL. In this example that is New_York (09/01/1990-28/01/1990). I can get this period by manipulating the data with Java, but is it possible to do it with plain SQL?
Thanks in advance
You can grab the first and last date of residence by city using this:
SELECT TOP 1 City_Code, MIN(Start_Date), Max(End_Date)
FROM Table
GROUP BY City_Code
ORDER BY Max(End_Date) desc
The problem, though, is that the start date will be the first date of residence in the city in question, not the start of the most recent stay.
For 10g you don't have the option of SELECT TOP n so you must be a little creative.
WITH last_period
AS
(SELECT city, moved_in, moved_out, NVL(moved_in-LEAD(moved_out, 1) OVER (ORDER BY city), 0) AS lead
FROM periods
WHERE city = (SELECT city FROM periods WHERE moved_out = (SELECT MAX(moved_out) FROM periods)))
SELECT city, MIN(moved_in) AS moved_in, MAX(moved_out) AS moved_out
FROM last_period
WHERE lead >= 0
GROUP BY city;
This works for the example dataset that you have given. It could stand some optimisation for a large dataset but gives you a working example, tested on Oracle 10g.
If it's MySQL, you can easily use
TIME_TO_SEC(TIMEDIFF(end_date, start_date)) AS `diff_in_secs`
Once you have the time difference in seconds, you can take it further.
On SQL Server, couldn't you use:
SELECT TOP 1 City_Code,
CONVERT(varchar(10), Start_Date, 103) + '-' + CONVERT(varchar(10), End_Date, 103) AS Period
FROM MyTable
ORDER BY End_Date DESC
That would get the date period and city with the latest end date.
This is assuming you are trying to just find the city where the person most recently lived, formatted with a dash.
Given that this is Oracle, you can simply subtract the end date and start date to get the number of days in between.
Select City_Code, (End_Date - Start_Date) Days
From MyTable
Where Start_Date = (
Select Max( T1.Start_Date )
From MyTable T1
)
If you are using SQL Server you can use the DateDiff() function
DATEDIFF ( datepart , startdate , enddate )
http://msdn.microsoft.com/en-us/library/ms189794.aspx
EDIT
I don't know Oracle but I did find this article
SELECT
MAX(t.duration)
FROM (
SELECT
(End_Date - Start_Date) duration
From
Table
) t
I hope this will work.
If you want to calculate only the last period length for the last city of residence, then it's probably something like this:
SELECT TOP 1
City_Code,
End_Date - Start_Date AS Days
FROM atable
ORDER BY Start_Date DESC
But if you mean to include all the periods the person has ever lived in a city that happens to be their last city of residence, then it's a bit more complicated, but not too much:
SELECT TOP 1
City_Code,
SUM(End_Date - Start_Date) AS Days
FROM atable
GROUP BY City_Code
ORDER BY MAX(Start_Date) DESC
But the above solution most probably returns the last city information only after it calculates the data for all cities. Do we need that? Not necessarily, so maybe we should use another approach. Maybe like this:
SELECT
City_Code,
SUM(End_Date - Start_Date) AS Days
FROM atable
WHERE City_Code = (SELECT TOP 1 City_Code FROM atable ORDER BY Start_Date DESC)
GROUP BY City_Code
I'm short on time, but this feels like a case for the window function LAG: compare each row to the previous one, take a new begin date when the city changes, and keep the existing one when the city stays the same. This should correctly preserve the range.
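A rough sketch of that LAG idea, using Python's sqlite3 module (SQLite 3.25 or later assumed). The table name residence is hypothetical, and the question's dd/mm/yyyy dates are rewritten as ISO text so that string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; rows are the sample data from the question.
conn.execute(
    "CREATE TABLE residence (city_code TEXT, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO residence VALUES (?, ?, ?)",
    [
        ("New_York", "1985-01-01", "1987-01-01"),
        ("Paris", "1987-01-02", "1990-01-01"),
        ("San Francisco", "1990-01-02", "1990-01-04"),
        ("Paris", "1990-01-05", "1990-01-08"),
        ("New_York", "1990-01-09", "1990-01-11"),
        ("New_York", "1990-01-12", "1990-01-19"),
        ("New_York", "1990-01-20", "1990-01-28"),
    ],
)

query = """
WITH marked AS (
    SELECT city_code, start_date, end_date,
           LAG(city_code) OVER (ORDER BY start_date) AS prev_city
    FROM residence
),
last_move AS (
    -- start date of the most recent row whose city differs from the row before it
    SELECT MAX(start_date) AS moved_in
    FROM marked
    WHERE prev_city IS NULL OR prev_city <> city_code
)
SELECT m.city_code, MIN(m.start_date), MAX(m.end_date)
FROM marked AS m
CROSS JOIN last_move AS lm
WHERE m.start_date >= lm.moved_in
GROUP BY m.city_code
"""
last_stay = conn.execute(query).fetchone()
print(last_stay)
```

Only rows from the last city change onward survive the filter, so the earlier New_York stay from 1985 does not stretch the reported period.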