SQL First Time In Last Time Out (Transact-SQL..preferrably) - sql

I have the following table which contains all the time in and time out of people:
CREATE TABLE test (
timecardid INT
, trandate DATE
, employeeid INT
, trantime TIME
, Trantype VARCHAR(1)
, Projcode VARCHAR(3)
)
The task is to get all the earliest trantime with trantype A (perhaps using MIN) and the latest trantime with trantype Z (Using Max), all of which in that trandate (ie. trantype A for july 17 is 8:00 AM and trantype Z for july 17 is 7:00PM).
the problem is, the output should be in the same format as the table where it's coming from, meaning that I have to leave this data and filter out the rest (that aren't the earliest and latest in/out for that date, per employee)
My current solution is to use two different select commands to get all earliest, then get all the latest. then combine them both.
I was wondering though, is there a much simpler, single string solution?
Thank you very much.
EDIT (I apologize, here is the sample. Server is SQL Server 2008):
Timecardid | Trandate | employeeid | trantime | trantype | Projcode
1 2013-04-01 1 8:00:00 A SAMPLE1
2 2013-04-01 1 9:00:00 A SAMPLE1
3 2013-04-01 2 7:00:00 A SAMPLE1
4 2013-04-01 2 6:59:59 A SAMPLE1
5 2013-04-01 1 17:00:00 Z SAMPLE1
6 2013-04-01 1 17:19:00 Z SAMPLE1
7 2013-04-01 2 17:00:00 Z SAMPLE1
8 2013-04-02 1 8:00:00 A SAMPLE1
9 2013-04-02 1 9:00:00 A SAMPLE1
10 2013-04-02 2 7:00:58 A SAMPLE1
11 2013-04-02 2 18:00:00 Z SAMPLE1
12 2013-04-02 2 18:00:01 Z SAMPLE1
13 2013-04-02 1 20:00:00 Z SAMPLE1
Expected Results (the earliest in and the latest out per day, per employee, in a select command):
Timecardid | Trandate | employeeid | trantime | trantype | Projcode
1 2013-04-01 1 8:00:00 A SAMPLE1
4 2013-04-01 2 6:59:59 A SAMPLE1
6 2013-04-01 1 17:19:00 Z SAMPLE1
7 2013-04-01 2 17:00:00 Z SAMPLE1
8 2013-04-02 1 8:00:00 A SAMPLE1
10 2013-04-02 2 7:00:58 A SAMPLE1
12 2013-04-02 2 18:00:01 Z SAMPLE1
13 2013-04-02 1 20:00:00 Z SAMPLE1
Thank you very much

Perhaps this is what you're looking for:
select
t.*
from
test t
where
trantime in (
(select min(trantime) from test t1 where t1.trandate = t.trandate and trantype = 'A'),
(select max(trantime) from test t2 where t2.trandate = t.trandate and trantype = 'Z')
)
Changing my answer to account for the "per employee" requirement:
;WITH EarliestIn AS
(
SELECT trandate, employeeid, min(trantime) AS EarliestTimeIn
FROM test
WHERE trantype = 'A'
GROUP BY trandate, employeeid
),
LatestOut AS
(
SELECT trandate, employeeid, max(trantime) AS LatestTimeOut
FROM test
WHERE trantype = 'Z'
GROUP BY trandate, employeeid
)
SELECT *
FROM test t
WHERE
EXISTS (SELECT * FROM EarliestIn WHERE t.trandate = EarliestIn.trandate AND t.employeeid = EarliestIn.employeeid AND t.trantime = EarliestIn.EarliestTimeIn)
OR EXISTS (SELECT * FROM LatestOut WHERE t.trandate = LatestOut.trandate AND t.employeeid = LatestOut.employeeid AND t.trantime = LatestOut.LatestTimeOut)

Assuming timecardid column is PK or unique, and if I understand it correctly, I would do something like
DECLARE #date DATE
SET #date = '2013-07-01'
SELECT
T0.*
FROM
(SELECT DISTINCT employeeid FROM test) E
CROSS APPLY (
SELECT TOP 1
T.timecardid
FROM
test T
WHERE
T.trandate = #date
AND T.Trantype = 'A'
AND T.employeeid = E.employeeid
ORDER BY T.trantime
UNION ALL
SELECT TOP 1
T.timecardid
FROM
test T
WHERE
T.trandate = #date
AND T.Trantype = 'Z'
AND T.employeeid = E.employeeid
ORDER BY T.trantime DESC
) V
JOIN test T0 ON T0.timecardid = V.timecardid
Appropriate indexes should be set for the table, if you aware of performance.

If you're using SQL server 2012, you can use LAG/LEAD to find the max and min rows in a fairly concise way;
WITH cte AS (
SELECT *,
LAG(timecardid) OVER (PARTITION BY trandate,employeeid,trantype ORDER BY trantime) lagid,
LEAD(timecardid) OVER (PARTITION BY trandate,employeeid,trantype ORDER BY trantime) leadid
FROM test
)
SELECT timecardid,trandate,employeeid,trantime,trantype,projcode
FROM cte
WHERE trantype='A' AND lagid IS NULL
OR trantype='Z' AND leadid IS NULL;
An SQLfiddle to test with.

I would use ROW_NUMBER to sort out the rows you want to select:
;with Ordered as (
select *,
ROW_NUMBER() OVER (PARTITION BY Trandate,employeeid,trantype
ORDER BY trantime ASC) as rnEarly,
ROW_NUMBER() OVER (PARTITION BY Trandate,employeeid,trantype
ORDER BY trantime DESC) as rnLate
from
Test
)
select * from Ordered
where
(rnEarly = 1 and trantype='A') or
(rnLate = 1 and trantype='Z')
order by TimecardId
(SQLFiddle)
It produces the results you've requested, and I think it's quite readable. The reason that trantype is included in the PARTITION BY clauses is so that A and Z values receive separate numbering.

Related

Get max date for each from either of 2 columns

I have a table like below
AID BID CDate
-----------------------------------------------------
1 2 2018-11-01 00:00:00.000
8 1 2018-11-08 00:00:00.000
1 3 2018-11-09 00:00:00.000
7 1 2018-11-15 00:00:00.000
6 1 2018-12-24 00:00:00.000
2 5 2018-11-02 00:00:00.000
2 7 2018-12-15 00:00:00.000
And I am trying to get a result set as follows
ID MaxDate
-------------------
1 2018-12-24 00:00:00.000
2 2018-12-15 00:00:00.000
Each value in the id columns(AID,BID) should return the max of CDate .
ex: in the case of 1, its max CDate is 2018-12-24 00:00:00.000 (here 1 appears under BID)
in the case of 2 , max date is 2018-12-15 00:00:00.000 . (here 2 is under AID)
I tried the following.
1.
select
g.AID,g.BID,
max(g.CDate) as 'LastDate'
from dbo.TT g
inner join
(select AID,BID,max(CDate) as maxdate
from dbo.TT
group by AID,BID)a
on (a.AID=g.AID or a.BID=g.BID)
and a.maxdate=g.CDate
group by g.AID,g.BID
and 2.
SELECT
AID,
CDate
FROM (
SELECT
*,
max_date = MAX(CDate) OVER (PARTITION BY [AID])
FROM dbo.TT
) AS s
WHERE CDate= max_date
Please suggest a 3rd solution.
You can assemble the data in a table expression first, and the compute the max for each value is simple. For example:
select
id, max(cdate)
from (
select aid as id, cdate from t
union all
select bid, cdate from t
) x
group by id
You seem to only care about values that are in both columns. If this interpretation is correct, then:
select id, max(cdate)
from ((select aid as id, cdate, 1 as is_a, 0 as is_b
from t
) union all
(select bid as id, cdate, 1 as is_a, 0 as is_b
from t
)
) ab
group by id
having max(is_a) = 1 and max(is_b) = 1;

Subtract subsequent row from previous row based on User

I have the following data and I want to subtract current row from previous row based on the UserID. I tried the code below is not given me what I want
DECLARE #DATETBLE TABLE (UserID INT, Dates DATE)
INSERT INTO #DATETBLE VALUES
(1,'2018-01-01'), (1,'2018-01-02'), (1,'2018-01-03'),(1,'2018-01-13'),
(2,'2018-01-15'),(2,'2018-01-16'),(2,'2018-01-17'), (5,'2018-02-04'),
(5,'2018-02-05'),(5,'2018-02-06'),(5,'2018-02-11'), (5,'2018-02-17')
;with cte as (
select UserID,Dates, row_number() over (order by UserID) as seqnum
from #DATETBLE t
)
select t.UserID,t.Dates, datediff(day,tprev.Dates,t.Dates)as diff
from cte t left outer join
cte tprev
on t.seqnum = tprev.seqnum + 1;
Current Output
UserID Dates diff
1 2018-01-01 NULL
1 2018-01-02 1
1 2018-01-03 1
1 2018-01-13 10
2 2018-01-15 2
2 2018-01-16 1
2 2018-01-17 1
5 2018-02-04 18
5 2018-02-05 1
5 2018-02-06 1
5 2018-02-11 5
5 2018-02-17 6
My Expected Output
UserID Dates diff
1 2018-01-01 NULL
1 2018-01-02 1
1 2018-01-03 1
1 2018-01-13 10
2 2018-01-15 NULL
2 2018-01-16 1
2 2018-01-17 1
5 2018-02-04 NULL
5 2018-02-05 1
5 2018-02-06 1
5 2018-02-11 5
5 2018-02-17 6
Your tag (sql-server-2008) suggests me to use APPLY :
select t.userid, t.dates, datediff(day, t1.dates, t.dates) as diff
from #DATETBLE t outer apply
( select top (1) t1.*
from #DATETBLE t1
where t1.userid = t.userid and
t1.dates < t.dates
order by t1.dates desc
) t1;
If you have SQL Server version 2012 or higher, you could use LAG() with a partition by UserID:
SELECT UserID
, DATEDIFF(dd,COALESCE(LAG_DATES, Dates), Dates) as diff
FROM
(
SELECT UserID
, Dates
, LAG(Dates) OVER (PARTITION BY UserID ORDER BY Dates) as LAG_DATES
FROM #DATETBLE
) exp
This will give you a 0 value instead of a NULL value for the first date in the sequence though.
Since you tagged the post with SQL Server 2008, however, you may need to use a method that doesn't rely on this windowed function.

SQL - Find if column dates include at least partially a date range

I need to create a report and I am struggling with the SQL script.
The table I want to query is a company_status_history table which has entries like the following (the ones that I can't figure out)
Table company_status_history
Columns:
| id | company_id | status_id | effective_date |
Data:
| 1 | 10 | 1 | 2016-12-30 00:00:00.000 |
| 2 | 10 | 5 | 2017-02-04 00:00:00.000 |
| 3 | 11 | 5 | 2017-06-05 00:00:00.000 |
| 4 | 11 | 1 | 2018-04-30 00:00:00.000 |
I want to answer to the question "Get all companies that have been at least for some point in status 1 inside the time period 01/01/2017 - 31/12/2017"
Above are the cases that I don't know how to handle since I need to add some logic of type :
"If this row is status 1 and it's date is before the date range check the next row if it has a date inside the date range."
"If this row is status 1 and it's date is after the date range check the row before if it has a date inside the date range."
I think this can be handled as a gaps and islands problem. Consider the following input data: (same as sample data of OP plus two additional rows)
id company_id status_id effective_date
-------------------------------------------
1 10 1 2016-12-15
2 10 1 2016-12-30
3 10 5 2017-02-04
4 10 4 2017-02-08
5 11 5 2017-06-05
6 11 1 2018-04-30
You can use the following query:
SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
FROM company_status_history AS t
OUTER APPLY
(
SELECT COUNT(*) AS cnt
FROM company_status_history AS c
WHERE c.status_id = 1
AND c.company_id = t.company_id
AND c.effective_date < t.effective_date
) AS x
ORDER BY company_id, effective_date
to get:
id company_id status_id effective_date grp
-----------------------------------------------
1 10 1 2016-12-15 0
2 10 1 2016-12-30 1
3 10 5 2017-02-04 2
4 10 4 2017-02-08 2
5 11 5 2017-06-05 0
6 11 1 2018-04-30 0
Now you can identify status = 1 islands using:
;WITH CTE AS
(
SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
FROM company_status_history AS t
OUTER APPLY
(
SELECT COUNT(*) AS cnt
FROM company_status_history AS c
WHERE c.status_id = 1
AND c.company_id = t.company_id
AND c.effective_date < t.effective_date
) AS x
)
SELECT id, company_id, status_id, effective_date,
ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) -
cnt AS grp
FROM CTE
Output:
id company_id status_id effective_date grp
-----------------------------------------------
1 10 1 2016-12-15 1
2 10 1 2016-12-30 1
3 10 5 2017-02-04 1
4 10 4 2017-02-08 2
5 11 5 2017-06-05 1
6 11 1 2018-04-30 2
Calculated field grp will help us identify those islands:
;WITH CTE AS
(
SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
FROM company_status_history AS t
OUTER APPLY
(
SELECT COUNT(*) AS cnt
FROM company_status_history AS c
WHERE c.status_id = 1
AND c.company_id = t.company_id
AND c.effective_date < t.effective_date
) AS x
), CTE2 AS
(
SELECT id, company_id, status_id, effective_date,
ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) -
cnt AS grp
FROM CTE
)
SELECT company_id,
MIN(effective_date) AS start_date,
CASE
WHEN COUNT(*) > 1 THEN DATEADD(DAY, -1, MAX(effective_date))
ELSE MIN(effective_date)
END AS end_date
FROM CTE2
GROUP BY company_id, grp
HAVING COUNT(CASE WHEN status_id = 1 THEN 1 END) > 0
Output:
company_id start_date end_date
-----------------------------------
10 2016-12-15 2017-02-03
11 2018-04-30 2018-04-30
All you want know is those records from above that overlap with the specified interval.
Demo here with somewhat more complicated use case.
Maybe this is what you are looking for? For these kind of questions, you need to join two instance of your table, in this case I am just joining with next record by Id, which probably is not totally correct. To do it better, you can create a new Id using a windowed function like row_number, ordering the table by your requirement criteria
If this row is status 1 and it's date is before the date range check
the next row if it has a date inside the date range
declare #range_st date = '2017-01-01'
declare #range_en date = '2017-12-31'
select
case
when csh1.status_id=1 and csh1.effective_date<#range_st
then
case
when csh2.effective_date between #range_st and #range_en then true
else false
end
else NULL
end
from company_status_history csh1
left join company_status_history csh2
on csh1.id=csh2.id+1
Implementing second criteria:
"If this row is status 1 and it's date is after the date range check
the row before if it has a date inside the date range."
declare #range_st date = '2017-01-01'
declare #range_en date = '2017-12-31'
select
case
when csh1.status_id=1 and csh1.effective_date<#range_st
then
case
when csh2.effective_date between #range_st and #range_en then true
else false
end
when csh1.status_id=1 and csh1.effective_date>#range_en
then
case
when csh3.effective_date between #range_st and #range_en then true
else false
end
else null -- ¿?
end
from company_status_history csh1
left join company_status_history csh2
on csh1.id=csh2.id+1
left join company_status_history csh3
on csh1.id=csh3.id-1
I would suggest the use of a cte and the window functions ROW_NUMBER. With this you can find the desired records. An example:
DECLARE #t TABLE(
id INT
,company_id INT
,status_id INT
,effective_date DATETIME
)
INSERT INTO #t VALUES
(1, 10, 1, '2016-12-30 00:00:00.000')
,(2, 10, 5, '2017-02-04 00:00:00.000')
,(3, 11, 5, '2017-06-05 00:00:00.000')
,(4, 11, 1, '2018-04-30 00:00:00.000')
DECLARE #StartDate DATETIME = '2017-01-01';
DECLARE #EndDate DATETIME = '2017-12-31';
WITH cte AS(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) AS rn
FROM #t
),
cteLeadLag AS(
SELECT c.*, ISNULL(c2.effective_date, c.effective_date) LagEffective, ISNULL(c3.effective_date, c.effective_date)LeadEffective
FROM cte c
LEFT JOIN cte c2 ON c2.company_id = c.company_id AND c2.rn = c.rn-1
LEFT JOIN cte c3 ON c3.company_id = c.company_id AND c3.rn = c.rn+1
)
SELECT 'Included' AS RangeStatus, *
FROM cteLeadLag
WHERE status_id = 1
AND effective_date BETWEEN #StartDate AND #EndDate
UNION ALL
SELECT 'Following' AS RangeStatus, *
FROM cteLeadLag
WHERE status_id = 1
AND effective_date > #EndDate
AND LagEffective BETWEEN #StartDate AND #EndDate
UNION ALL
SELECT 'Trailing' AS RangeStatus, *
FROM cteLeadLag
WHERE status_id = 1
AND effective_date < #EndDate
AND LeadEffective BETWEEN #StartDate AND #EndDate
I first select all records with their leading and lagging Dates and then I perform your checks on the inclusion in the desired timespan.
Try with this, self-explanatory. Responds to this part of your question:
I want to answer to the question "Get all companies that have been at
least for some point in status 1 inside the time period 01/01/2017 -
31/12/2017"
Case that you want to find those id's that have been in any moment in status 1 and have records in the period requested:
SELECT *
FROM company_status_history
WHERE id IN
( SELECT Id
FROM company_status_history
WHERE status_id=1 )
AND effective_date BETWEEN '2017-01-01' AND '2017-12-31'
Case that you want to find id's in status 1 and inside the period:
SELECT *
FROM company_status_history
WHERE status_id=1
AND effective_date BETWEEN '2017-01-01' AND '2017-12-31'

MsSql Compare specific datetimes in sequence based on ID

I have a table where we store our data from a call and it looks like this:
CallID Arrive_Seq DateTime ActivitytypeID
1 1 2018-01-01 05:00:00 1
1 2 2018-01-01 05:00:01 2
1 3 2018-01-01 06:00:00 21
1 4 2018-01-01 06:00:01 28
1 5 2018-01-01 06:00:02 13
1 6 2018-01-01 06:00:03 22
1 7 2018-01-01 06:00:05 29
1 8 2018-01-01 06:05:00 21
1 9 2018-01-01 06:05:01 28
1 10 2018-01-01 06:05:02 13
1 11 2018-01-01 06:05:03 22
1 12 2018-01-01 06:07:45 29
Now I want to select the datediff between ActivitytypeID 21 and 29 in the arrive_sew order. In this example they occur twice (on arrive_seq 3,8 and 7,12). This order is not specific and ActivitytypeID can occur both more and less times in the sequence but they are always connected with eachother. Think of it as ActivitytypeID 21 = 'call started' AND ActivitytypeID = 29 'Call ended'.
In the example the answer whould be:
SELECT DATEDIFF (SECOND, '2018-01-01 06:00:00', '2018-01-01 06:00:05') = 5 -- Compares datetime of arrive_seq 3 and 7
AND
SELECT DATEDIFF (SECOND, '2018-01-01 06:00:05', '2018-01-01 06:07:45') = 460 -- Compares datetime of arrive_seq 21 and 29
Total duration = 465
I have tried with this code but it doesn't work all the time due to row# can change based on arrive_seq and ActivitytypeID
;WITH CallbackDuration AS (
SELECT ROW_NUMBER() OVER(ORDER BY a.time_stamp ASC) AS RowNumber, DATEDIFF(second, a.time_stamp, b.time_stamp) AS 'Duration'
FROM Table a
JOIN Table b on a.call_id = b.call_id
WHERE a.call_id = 1 AND a.activity_type = 21 AND b.activity_type = 29
GROUP BY a.time_stamp, b.time_stamp,a.call_id)
SELECT SUM(Duration) AS 'Duration' FROM CallbackDuration WHERE RowNumber in (1,5,9)
I think this is what you want:
select
call_start,
call_end,
datediff (second, call_start, call_end) as duration
from
(
select
call_timestamp as call_end,
lag(call_timestamp) over (partition by call_id order by call_timestamp) as call_start,
activity_type as call_end_activity,
lag (activity_type) over (partition by call_id order by call_timestamp) as call_start_activity
from
call_log
where
activity_type in (21, 29)
) x
where
call_start_activity = 21;
Result:
call_start call_end duration
----------------------- ----------------------- -----------
2018-01-01 06:00:00.000 2018-01-01 06:00:05.000 5
2018-01-01 06:05:00.000 2018-01-01 06:07:45.000 165
(2 rows affected)
Note that the time of the second call is based on your sample data with start time 2018-01-01 06:05:00
This query seems to return your expected result
declare #x int = 21
declare #y int = 29
;with cte(CallID, Arrive_Seq, DateTime, ActivitytypeID) as (
select
a, b, cast(c as datetime), d
from (values
(1,1,'2018-01-01 05:00:00',1)
,(1,2,'2018-01-01 05:00:01',2)
,(1,3,'2018-01-01 06:00:00',21)
,(1,4,'2018-01-01 06:00:01',28)
,(1,5,'2018-01-01 06:00:02',13)
,(1,6,'2018-01-01 06:00:03',22)
,(1,7,'2018-01-01 06:00:05',29)
,(1,8,'2018-01-01 06:05:00',21)
,(1,9,'2018-01-01 06:05:01',28)
,(1,10,'2018-01-01 06:05:02',13)
,(1,11,'2018-01-01 06:05:03',22)
,(1,12,'2018-01-01 06:07:45',29)
) t(a,b,c,d)
)
select
sum(ss)
from (
select
*, ss = datediff(ss, DateTime, lead(datetime) over (order by Arrive_Seq))
, rn = row_number() over (order by Arrive_Seq)
from
cte
where
ActivitytypeID in (#x, #y)
) t
where
rn % 2 = 1

How to get single result when a column has the same value but the second column have different value

I have this table
Id VendorId ClaimRequestDate
1 5 2017-12-14 00:00:00.000
2 5 2018-02-02 00:00:00.000
7 5 2018-02-07 11:08:25.257
I want my result to show only the latest date for each VendorId starting from date later than 2 Feb 2018
what I've done now
SELECT DISTINCT
[Project1].[Id] AS [Id],
[Project1].[VendorId] AS [VendorId],
[Project1].[ClaimRequestDate] AS [ClaimRequestDate]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[VendorId] AS [VendorId],
[Extent1].[ClaimRequestDate] AS [ClaimRequestDate]
FROM [dbo].[Claim] AS [Extent1]
WHERE [Extent1].[ClaimRequestDate] >= '2018-02-02 00:00:00.000'
) AS [Project1]
ORDER BY [Project1].[ClaimRequestDate] DESC
But my result is
Id VendorId ClaimRequestDate
7 5 2018-02-07 11:08:25.257
2 5 2018-02-02 00:00:00.000
Can someone help me with this
There are tree problem in your query. one is
WHERE [Extent1].[ClaimRequestDate] >= '2018-02-02 00:00:00.000'
Row is >= should be ">" second one is your query gain all rows from date later than 2018-02-02 If a vendorId has more than a value Query would return you can try this
SELECT * FROM Claim c
where ClaimRequestDate IN (select MAX(ClaimRequestDate) from claim c1
where c.vendorId =c1.vendorId and c1.Claimrequestdate >'2018.02.02')
Third is this query when your vendorId has more than same max(Claimrequestdate) would return all of them
Id VendorId ClaimRequestDate
1 5 2017-12-14 00:00:00.000
2 5 2018-02-02 00:00:00.000
7 5 2018-02-07 11:08:25.257
8 5 2018-02-07 11:08:25.257
returns
Id VendorId ClaimRequestDate
7 5 2018-02-07 11:08:25.257
8 5 2018-02-07 11:08:25.257
For these reason I suggest this query for use
SELECT * FROM Claim c
where CAST(ClaimRequestDate AS VARCHAR)+ CAST(ID AS VARCHAR) IN (select
MAX(CAST(ClaimRequestDate AS VARCHAR)+ CAST(ID AS VARCHAR)) from claim c1
where c.vendorId =c1.vendorId and c1.Claimrequestdate >'2018.02.02'
)
Try the following SQL:
select aa.* from [Claim] as aa inner join
(
select [VendorId], max([Id]) as maxId from [Claim]
where [ClaimRequestDate] >= '2018-02-03 00:00:00'
group by [VendorId]
) as bb on aa.[Id] = bb.[maxId]