Get the aggregated result of a GROUP BY for each value in the WHERE clause in T-SQL

I have a table in SQL Server with the following format
MType (Integer), MDate (Datetime), Status (SmallInt)
1, 10-05-2018, 1
1, 15-05-2018, 1
2, 25-3-2018, 0
3, 12-01-2018, 1
....
I want to get the MIN MDate for specific MTypes for future dates. In case there isn't one, then the MType should be returned but with NULL value.
Here is what I have done until now:
SELECT m.MType,
MIN(m.MDate)
FROM MyTypes m
WHERE m.MType IN ( 1, 2, 3, 4)
AND m.MDate > GETDATE()
AND m.Status = 1
GROUP BY m.MType
Obviously, the above will return only the following:
1, 10-05-2018
Since there aren't any other rows with a future date and Status equal to 1.
However, the results I want are:
1, 10-05-2018
2, NULL
3, NULL
4, NULL //this is missing in general from the table. No MType with value 4
The table is big, so performance is something to take into account. Any ideas how to proceed?

One way is to join the table to itself and filter the date in the ON clause.
SELECT a.Mtype, MIN(b.MDate)
FROM MyTypes a
LEFT JOIN MyTypes b
ON a.MType = b.MType
AND b.MDate > GETDATE()
AND b.Status = 1
WHERE a.MType IN ( 1, 2, 3)
GROUP BY a.MType

I don't know the logic behind the requirement, but this looks like a case for a look-up table of the wanted MType values:
SELECT a.MType, l.MDate
FROM
(
values (1),(2),(3),(4)
)a (MType)
LEFT JOIN (
SELECT m.MType,
MIN(m.MDate) MDate
FROM MyTypes m
WHERE m.MDate > GETDATE()
AND m.Status = 1
GROUP BY m.MType
)l on l.MType = a.MType
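Putting the two ideas above together (a list of the wanted MType values, plus the date and status filters in the ON clause), a minimal self-contained sketch could look like the following. The table variable @MyTypes, the fixed @now value standing in for GETDATE(), and the alias MinFutureDate are assumptions made for the demo, and the sample dates from the question are read as dd-mm-yyyy.

```sql
-- Assumption: @now replaces GETDATE() so the result is reproducible.
DECLARE @now datetime = '20180401';

-- Assumption: a table variable stands in for the real MyTypes table.
DECLARE @MyTypes TABLE (MType int, MDate datetime, [Status] smallint);
INSERT INTO @MyTypes (MType, MDate, [Status]) VALUES
    (1, '20180510', 1),
    (1, '20180515', 1),
    (2, '20180325', 0),
    (3, '20180112', 1);

-- Drive the query from the list of wanted types, so every type yields a row.
SELECT w.MType, MIN(m.MDate) AS MinFutureDate
FROM (VALUES (1), (2), (3), (4)) AS w (MType)
LEFT JOIN @MyTypes AS m
       ON m.MType = w.MType
      AND m.MDate > @now      -- filters stay in ON, not WHERE,
      AND m.[Status] = 1      -- so unmatched types survive the join
GROUP BY w.MType
ORDER BY w.MType;
-- With the sample rows this gives 2018-05-10 for type 1 and NULL for 2, 3 and 4.
```

On a big table, an index on (MType, Status, MDate) would probably let each type be resolved with a seek; that is a suggestion, not something measured here.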

Use a window function and a union to a numbers table:
declare @t table (MType int, MDate datetime, [Status] smallint)
Insert into @t values (1, convert(date, '10-05-2018', 103), 1)
,(1, convert(date, '15-05-2018', 103), 1)
,(2, convert(date, '25-03-2018', 103), 0)
,(3, convert(date, '12-01-2018', 103), 1)
Select DISTINCT Mtype
, min(iif(MDate > getdate() and [Status] = 1, MDate, NULL)) over (Partition By Mtype) as MDate
from ( SELECT TOP 10000 row_number() over(order by t1.number) as MType
, '1900-01-01' as MDate, 0 as [Status]
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
union
Select Mtype, MDate, [Status] from @t
) x
where MType in (1,2,3,4)
order by x.MType

Related

Group records only if they have intersecting periods

I have a table like this:
declare @data table
(
id int not null,
groupid int not null,
startDate datetime not null,
endDate datetime not null
)
insert into @data values
(1, 1, '20150101', '20150131'),
(2, 1, '20150114', '20150131'),
(3, 1, '20150201', '20150228');
and my current select statement is:
select groupid, 'some data', min(id), count(*)
from @data
group by groupid
But now I need to group records only if they have intersecting periods.
Desired result:
1, 'some data', 1, 2
1, 'some data', 3, 1
Does anyone know how to do this?
One method is to identify the beginning of each group: a row that doesn't overlap with any other row that started earlier. Then, take a cumulative count of these beginnings as a group identifier.
with overlaps as (
-- rows that start a new group: no other row of the same groupid covers their start date
select id
from @data d
where not exists (select 1
from @data d2
where d.groupid = d2.groupid and
d2.id <> d.id and -- without this, every row would match itself
d.startDate >= d2.startDate and
d.startDate < d2.endDate
)
),
groups as (
select d.*,
count(o.id) over (partition by groupid
order by d.startDate) as grpnum
from @data d left join
overlaps o
on d.id = o.id
)
select groupid, min(id), count(*),
min(startDate) as startDate, max(endDate) as endDate
from groups
group by grpnum, groupid;
Notes: This is using cumulative counts, which are available in SQL Server 2012+. You can do something similar with a correlated subquery or apply in earlier versions.
Also, this query assumes that the start dates are unique. If they are not, the query can be tweaked, but the logic becomes a bit more complicated.
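If start dates are not unique, one possible tweak is to compare each start date with the largest end date seen so far within the group, instead of using NOT EXISTS. The following is only a sketch of that idea, not the query above: the names prevMaxEnd and grpnum, the tie-break on id, and the extra fourth sample row (a duplicate start date) are assumptions added for illustration. It needs SQL Server 2012+ for the window frames.

```sql
DECLARE @data TABLE
(
    id int NOT NULL,
    groupid int NOT NULL,
    startDate datetime NOT NULL,
    endDate datetime NOT NULL
);
INSERT INTO @data VALUES
    (1, 1, '20150101', '20150131'),
    (2, 1, '20150114', '20150131'),
    (3, 1, '20150201', '20150228'),
    (4, 1, '20150201', '20150215');  -- hypothetical row with a duplicate start date

WITH marked AS (
    -- largest end date among all earlier rows of the same groupid
    SELECT d.*,
           MAX(endDate) OVER (PARTITION BY groupid
                              ORDER BY startDate, id
                              ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prevMaxEnd
    FROM @data d
),
numbered AS (
    -- a row starts a new island when nothing earlier reaches past its start date
    SELECT m.*,
           SUM(CASE WHEN prevMaxEnd IS NULL OR startDate >= prevMaxEnd THEN 1 ELSE 0 END)
               OVER (PARTITION BY groupid
                     ORDER BY startDate, id
                     ROWS UNBOUNDED PRECEDING) AS grpnum
    FROM marked m
)
SELECT groupid, 'some data' AS label, MIN(id) AS firstId, COUNT(*) AS cnt
FROM numbered
GROUP BY groupid, grpnum
ORDER BY groupid, MIN(id);
-- With these four rows: (1, 'some data', 1, 2) and (1, 'some data', 3, 2).
```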

SQL Last activity of given type

So I have a Visitor table, and a Visitor_activity table. Say:
Visitor
Visitor_ID Int
Visitor_name varchar(20)
Visitor_Activity
ID Int
Visitor_ID Int
Activity_Type char(3) -- values IN or OUT
Activity_Time datetime
Visitors might sign in and out multiple times in a day.
I'd like a nice query to tell me all visitors who are in: i.e. the last activity for today (on activity_time) was an "IN" not an "OUT". Any advice much appreciated.
It's T-SQL by the way, but I think it's more of an in-principle question.
One way to solve this is to use a correlated not exists predicate:
select Activity_Time, Visitor_ID
from Visitor_Activity t1
where Activity_Type = 'IN'
and not exists (
select 1
from Visitor_Activity
where Activity_Type = 'OUT'
and Visitor_ID = t1.Visitor_ID
and Activity_Time > t1.Activity_Time
and cast(Activity_Time as date) = cast(t1.Activity_Time as date)
)
This basically says: get every Visitor_ID that has a type = IN record for which there doesn't exist any type = OUT record with a later time (on the same date).
SELECT
v.*
FROM
Visitor v
JOIN Visitor_Activity va ON va.Visitor_ID = v.Visitor_ID
WHERE
va.Activity_Type = 'IN'
AND NOT EXISTS ( SELECT
*
FROM
Visitor_Activity va_out
WHERE
va_out.Visitor_ID = va.Visitor_ID
AND va_out.Activity_Type = 'OUT'
AND va_out.Activity_Time > va.Activity_Time )
with visitorsInOut as (
select Visitor_id,
max(case when Activity_Type = 'in' then Activity_Time else null end) inTime,
max(case when Activity_Type = 'out' then Activity_Time else null end) outTime
from Visitor_Activity
where datediff(dd, Activity_Time, getdate()) = 0
group by Visitor_id)
select Visitor_id
from visitorsInOut
where inTime > outTime or outTime is null
This uses a CTE to find the activity record with the greatest Activity_Time where the Activity_Type = 'IN' and assigns it RowNum 1. Then you can INNER JOIN the CTE to the Visitor table, filtering by the CTE results where RowNum = 1.
; WITH VisAct AS(
SELECT act.Visitor_ID
, ROW_NUMBER() OVER(PARTITION BY Visitor_ID ORDER BY Activity_Time DESC) AS RowNum
FROM Visitor_Activity act
WHERE act.Activity_Type = 'IN'
AND act.Activity_Time >= CAST(GETDATE() AS DATE)
)
SELECT vis.Visitor_ID, vis.Visitor_name
FROM Visitor vis
INNER JOIN VisAct act
ON act.Visitor_ID = vis.Visitor_ID
WHERE act.RowNum = 1
You can pull the most recent action for each visitor, and then only return those where the last action for today was to check in.
SELECT v.Visitor_ID, v.Visitor_Name, va.Activity_Type, va.Activity_Time
FROM Visitor AS v
INNER JOIN (SELECT Visitor_ID, Activity_Type, Activity_Time, RANK() OVER (PARTITION BY Visitor_ID ORDER BY Activity_Time DESC) AS LastAction
FROM Visitor_Activity
-- checks for today, can be omitted if you still want
-- to see someone checked in from yesterday
WHERE DATEDIFF(d, 0, Activity_Time) = DATEDIFF(d, 0, getdate())
) AS va ON va.Visitor_ID = v.Visitor_ID
WHERE LastAction = 1
AND Activity_Type = 'IN'
With CROSS APPLY:
DECLARE @d DATE = '20150320'
DECLARE @v TABLE
(
visitor_id INT ,
visitor_name NVARCHAR(MAX)
)
DECLARE @a TABLE
(
visitor_id INT ,
type CHAR(3) ,
time DATETIME
)
INSERT INTO @v
VALUES ( 1, 'A' ),
( 2, 'B' ),
( 3, 'C' )
INSERT INTO @a
VALUES ( 1, 'in', '2015-03-20 19:32:27.513' ),
( 1, 'out', '2015-03-20 19:32:27.514' ),
( 1, 'in', '2015-03-20 19:32:27.515' ),
( 2, 'in', '2015-03-20 19:32:27.516' ),
( 2, 'out', '2015-03-20 19:32:27.517' ),
( 3, 'in', '2015-03-20 19:32:27.518' ),
( 3, 'out', '2015-03-20 19:32:27.519' ),
( 3, 'in', '2015-03-20 19:32:27.523' )
SELECT *
FROM @v v
CROSS APPLY ( SELECT *
FROM ( SELECT TOP 1
type
FROM @a a
WHERE a.visitor_id = v.visitor_id
AND a.time >= @d
AND a.time < DATEADD(dd, 1, @d)
ORDER BY time DESC
) i
WHERE type = 'in'
) c
Output:
visitor_id visitor_name type
1 A in
3 C in
The principle:
First, you select all visitors.
Then you apply each visitor's last activity for the day:
SELECT TOP 1
type
FROM @a a
WHERE a.visitor_id = v.visitor_id
AND a.time >= @d
AND a.time < DATEADD(dd, 1, @d)
ORDER BY time DESC
Finally, you filter that one-row result on type = 'in'. If the last activity was 'in', the applied query returns one row and the visitor is kept. If the last activity was 'out', the applied query returns an empty set, and by design CROSS APPLY eliminates that visitor.
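The same principle (take each visitor's last activity for the day, then keep the visitor only if that activity is an IN) can also be sketched with ROW_NUMBER() against the tables from the question. The table variables, the sample rows, the @day variable, and the tie-break on ID are assumptions for the demo. Activity times are deliberately spaced well apart, because datetime values are rounded to roughly 3 ms and two rows only a millisecond apart can end up tied.

```sql
DECLARE @day date = '20150320';  -- assumption: stands in for CAST(GETDATE() AS date)

DECLARE @Visitor TABLE (Visitor_ID int, Visitor_name varchar(20));
DECLARE @Visitor_Activity TABLE
    (ID int, Visitor_ID int, Activity_Type char(3), Activity_Time datetime);

INSERT INTO @Visitor VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO @Visitor_Activity VALUES
    (1, 1, 'IN',  '2015-03-20T08:00:00'),
    (2, 1, 'OUT', '2015-03-20T12:00:00'),
    (3, 1, 'IN',  '2015-03-20T13:00:00'),   -- visitor 1 is back in
    (4, 2, 'IN',  '2015-03-20T08:05:00'),
    (5, 2, 'OUT', '2015-03-20T17:00:00'),   -- visitor 2 has left
    (6, 3, 'IN',  '2015-03-19T09:00:00');   -- visitor 3 was only in yesterday

WITH last_today AS (
    -- rank ALL of today's activity first; filter on the type afterwards
    SELECT Visitor_ID, Activity_Type,
           ROW_NUMBER() OVER (PARTITION BY Visitor_ID
                              ORDER BY Activity_Time DESC, ID DESC) AS rn
    FROM @Visitor_Activity
    WHERE Activity_Time >= @day
      AND Activity_Time < DATEADD(day, 1, @day)
)
SELECT v.Visitor_ID, v.Visitor_name
FROM @Visitor v
JOIN last_today lt ON lt.Visitor_ID = v.Visitor_ID
WHERE lt.rn = 1
  AND lt.Activity_Type = 'IN';
-- With these rows only visitor 1 (A) is returned.
```

Filtering on Activity_Type inside the CTE instead would rank only the IN rows, and would then report anyone who signed in at any point today, even if they have since signed out.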

For every row with data I need a row for each category

I have timesheet data that I need to create a report for by date range. I need to have a row for each person for each day, and each time type. If there's no entry for that time type on a given day, i want null data. I've tried a left join, but it doesn't seem to be working. A cross join will give erroneous data.
The tables I have are a Person table (personID, Name), a TimeLog table (TimeLogID, StartDate, EndDate, TimeLogTypeID), and a TimeLogType table (TimeLogTypeID, PersonID, Description, DeletedInd)
All I can get in the result set is the rows with data, and not the empty rows for each TimeLogType
Here's what I have so far:
DECLARE
@startDate DATE,
@endDate DATE
SET @startDate = '2014-05-01'
SET @endDate = '2014-05-30'
SELECT
CONVERT(DATE, TimeLog.StartDateTime, 101) AS TimeLogDay,
SUM(dbo.fnCalculateHoursAsDecimal(TimeLog.StartDateTime, TimeLog.EndDateTime)) AS Hours,
TimeLog.PersonID,
TimeLog.TimeLogTypeID
INTO #HourTable
FROM
TimeLog
WHERE
TimeLog.StartDateTime BETWEEN @startDate AND @endDate
GROUP BY
CONVERT(DATE, TimeLog.StartDateTime, 101),
TimeLog.TimeLogTypeID,
TimeLog.PersonID
SELECT
TimeLogType.Description,
#HourTable.*
FROM
TimeLogType LEFT JOIN
#HourTable ON TimeLogType.TimeLogTypeID = #HourTable.TimeLogTypeID
WHERE
ISNULL(TimeLogType.DeletedInd, 0) = 0
ORDER BY
PersonID, TimeLogDay, TimeLogType.TimeLogTypeID
The data goes something like this:
TimeLogType:
1, Billable
2, Non-Billable
Person:
1, Billy
2, Tom
TimeLog:
1, 1, 2014-05-01 08:00:00, 2014-05-01 09:00:00, 1, 0
2, 1, 2014-05-01 09:00:00, 2014-05-01 10:00:00, 1, 0
3, 2, 2014-05-01 08:00:00, 2014-05-01 08:30:00, 2, 0
4, 2, 2014-05-01 08:30:00, 2014-05-01 09:00:00, 1, 0
5, 1, 2014-05-02 08:00:00, 2014-05-02 09:00:00, 2, 0
Expected Output: (order by person, date, timelog type)
Day, Person, Bill Type, Total Hours
2014-05-01, Billy, Billable, 2.0
2014-05-01, Billy, Non-Billable, NULL
2014-05-02, Billy, Billable, 1.0
2014-05-02, Billy, Non-Billable, NULL
etc...
2014-05-01, Tom, Billable, 0.5
2014-05-01, Tom, Non-Billable, 0.5
etc...
You need to generate all the combinations first and then use left join to bring in the information you want. I think the query is like this:
with dates as (
select dateadd(day, v.number, r.mind) as thedate
from (select min(StartDate) as mind, max(EndDate) as endd
from TimeLog -- the date columns live in TimeLog, not TimeLogType
) r join
master..spt_values v
on v.type = 'P' and -- type 'P' rows hold each number once, starting at 0
dateadd(day, v.number, r.mind) <= r.endd
)
select p.PersonId, tlt.TimeLogTypeId, d.thedate
from Person p cross join
(select t.* from TimeLogType t where ISNULL(t.DeletedInd, 0) = 0
) tlt cross join
dates d left join
TimeLog tl
on tl.PersonID = p.PersonId and tl.TimeLogTypeId = tlt.TimeLogTypeId and
d.thedate >= tl.StartDate and d.thedate <= tl.EndDate
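The "generate all combinations first, then LEFT JOIN" idea can be tried on the sample data from the question with a self-contained sketch like the one below. The table variables, the column layout of @TimeLog (inferred from the sample rows), and the use of DATEDIFF in minutes in place of dbo.fnCalculateHoursAsDecimal are all assumptions, since the real schema and function are not shown.

```sql
DECLARE @startDate date = '20140501', @endDate date = '20140502';

DECLARE @Person TABLE (PersonID int, Name varchar(20));
DECLARE @TimeLogType TABLE (TimeLogTypeID int, Description varchar(20));
DECLARE @TimeLog TABLE (TimeLogID int, PersonID int,
                        StartDateTime datetime, EndDateTime datetime,
                        TimeLogTypeID int);

INSERT INTO @Person VALUES (1, 'Billy'), (2, 'Tom');
INSERT INTO @TimeLogType VALUES (1, 'Billable'), (2, 'Non-Billable');
INSERT INTO @TimeLog VALUES
    (1, 1, '2014-05-01T08:00:00', '2014-05-01T09:00:00', 1),
    (2, 1, '2014-05-01T09:00:00', '2014-05-01T10:00:00', 1),
    (3, 2, '2014-05-01T08:00:00', '2014-05-01T08:30:00', 2),
    (4, 2, '2014-05-01T08:30:00', '2014-05-01T09:00:00', 1),
    (5, 1, '2014-05-02T08:00:00', '2014-05-02T09:00:00', 2);

WITH dates AS (
    -- one row per day in the requested range
    SELECT @startDate AS TimeLogDay
    UNION ALL
    SELECT DATEADD(day, 1, TimeLogDay) FROM dates WHERE TimeLogDay < @endDate
)
SELECT d.TimeLogDay, p.Name, t.Description,
       -- stand-in for dbo.fnCalculateHoursAsDecimal; NULL when nothing was logged
       SUM(DATEDIFF(minute, tl.StartDateTime, tl.EndDateTime)) / 60.0 AS Hours
FROM dates d
CROSS JOIN @Person p
CROSS JOIN @TimeLogType t
LEFT JOIN @TimeLog tl
       ON tl.PersonID = p.PersonID
      AND tl.TimeLogTypeID = t.TimeLogTypeID
      AND tl.StartDateTime >= d.TimeLogDay
      AND tl.StartDateTime < DATEADD(day, 1, d.TimeLogDay)
GROUP BY d.TimeLogDay, p.Name, t.Description
ORDER BY p.Name, d.TimeLogDay, t.Description;
```

This returns eight rows (2 days x 2 people x 2 types), with NULL in Hours for every combination that has no entries. The half-open range on StartDateTime avoids wrapping the column in CONVERT, which may help an index on that column to be used.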
After reading Gordon's answer here's what I've come up with. I created it in steps so I could see what was going on. I created the dates w/o the master..spt_values table. I also created a temp table of people so I could select just the ones that had a TimeLogRecord, and then re-use it to pull in details for the final select. Let me know if there's any way to make this run faster.
DECLARE
@startDate DATE,
@endDate DATE
SET @startDate = '2014-01-01'
SET @endDate = '2014-01-31'
-- create day rows --
;WITH dates(TimeLogDay) AS
(
SELECT @startDate AS TimeLogDay
UNION ALL
SELECT DATEADD(d, 1, TimeLogDay)
FROM dates
WHERE TimeLogDay < @endDate
)
-- create a type row for each day --
SELECT
dates.TimeLogDay,
tlt.TimeLogTypeID
INTO #TypeDate
FROM
dates CROSS JOIN
(SELECT
TimeLogType.TimeLogTypeID
FROM
TimeLogType
WHERE
ISNULL(TimeLogType.DeletedInd, 0) = 0
) AS TLT
-- create a temp person table for reference later ---
SELECT * INTO #person FROM Person WHERE Person.personID IN
(SELECT Timelog.PersonID FROM TimeLog WHERE TimeLog.StartDateTime BETWEEN @startDate AND @endDate)
-- sum up the log times and tie in the date/type rows --
SELECT
#TypeDate.TimeLogDay,
#TypeDate.TimeLogTypeID,
#person.PersonID,
SUM(dbo.fnCalculateHoursAsDecimal(TimeLog.StartDateTime, TimeLog.EndDateTime)) AS Hours
INTO #Hours
FROM
#person CROSS JOIN
#TypeDate LEFT JOIN
TimeLog ON
TimeLog.PersonID = #person.PersonID AND
TimeLog.TimeLogTypeID = #TypeDate.TimeLogTypeID AND
#TypeDate.TimeLogDay = CONVERT(DATE, TimeLog.StartDateTime, 101)
GROUP BY
#TypeDate.TimeLogDay,
#TypeDate.TimeLogTypeID,
#person.PersonID
-- now tie in the details to complete --
SELECT
#Hours.TimeLogDay,
TimeLogType.Description,
Person.LastName,
Person.FirstName,
#Hours.Hours
FROM
#Hours LEFT JOIN
Person ON #Hours.PersonID = Person.PersonID LEFT JOIN
TimeLogType ON #Hours.TimeLogTypeID = TimeLogType.TimeLogTypeID
ORDER BY
Person.FirstName,
Person.LastName,
#Hours.TimeLogDay,
TimeLogType.SortOrder

Only select records that do not start within the time frame of another record

I'm trying to achieve the following goal using MS SQL Server 2005 but do not know how to do it.
The goal is to select only records that do not start within the same time period as an anchor record.
Rows that have same ID are a group and evaluated as part of that group.
Start with the earliest date (A) based on StartDate, compare to the next row (B) that has the same ID.
If B starts within A, mark B as invalid. Continue to compare A against all remaining records that have the same ID. Mark any starting within A as invalid.
Flag the next record that does not overlap with A as Valid. Now repeat the same process as above (i.e. check to see if any subsequent records start within the time frame of the new valid record).
Repeat this process until all records have been analyzed.
Example: Create the following table.
if object_id ('tempdb..#Dates') is not null drop table #Dates
create table #Dates (ID int, StartDate datetime, EndDate datetime)
Insert into #Dates
Select 1, '7/23/2003' , '8/22/2003' union all
select 1, '8/21/2003' , '11/19/2003' union all
select 1, '11/18/2003' , '12/18/2003' union all
select 1, '12/17/2003' , '1/16/2004' union all
select 1, '1/15/2004' , '2/14/2004' union all
select 1, '2/11/2004' , '2/26/2004' union all
select 1, '9/14/2004' , '10/14/2004' union all
select 1, '10/5/2004' , '10/20/2004' union all
select 1, '11/20/2004' , '12/20/2004' union all
select 1, '12/19/2004' , '1/18/2005' union all
select 1, '1/12/2005' , '1/27/2005' union all
select 1, '2/27/2005' , '3/11/2005'
Expected output after applying the overlap logic rules:
ID StartDate EndDate Valid
-- --------- --------- -----
1 7/23/2003 8/22/2003 1
1 8/21/2003 11/19/2003 0
1 11/18/2003 12/18/2003 1
1 12/17/2003 1/16/2004 0
1 1/15/2004 2/14/2004 1
1 2/11/2004 2/26/2004 0
1 9/14/2004 10/14/2004 1
1 10/5/2004 10/20/2004 0
1 11/20/2004 12/20/2004 1
1 12/19/2004 1/18/2005 0
1 1/12/2005 1/27/2005 1
1 2/27/2005 3/11/2005 1
I figured out how to answer my own question. Used recursive SQL after ordering the records using row_number.
if object_id ('tempdb..#Dates') is not null drop table #Dates
create table #Dates (ID int, StartDate datetime, EndDate datetime)
Insert into #Dates
Select 1, '7/23/2003' , '8/22/2003' union all
select 1, '8/21/2003' , '11/19/2003' union all
select 1, '11/18/2003' , '12/18/2003' union all
select 1, '12/19/2004' , '1/18/2005' union all
select 1, '1/12/2005' , '1/27/2005' union all
select 1, '2/27/2005' , '3/11/2005' union all
select 1, '12/17/2003' , '1/16/2004' union all
select 1, '1/15/2004' , '2/14/2004' union all
select 1, '2/11/2004' , '2/26/2004' union all
select 1, '9/14/2004' , '10/14/2004' union all
select 1, '10/5/2004' , '10/20/2004' union all
select 1, '11/20/2004' , '12/20/2004'
--Phase 1: Apply ordering to dates
if object_id ('tempdb..#OrderedRecords') is not null drop table #OrderedRecords
select *, N = row_number () over (partition by ID order by StartDate asc, EndDate desc)
into #OrderedRecords
from #Dates
--Phase 2: Apply Overlap Rules (Subsume records that overlap)
;with Subsume (ID, N, StartDate, EndDate, IntermediateStartDate, IntermediateEndDate, Valid) as
(
select ID, N, StartDate, EndDate, IntermediateStartDate = StartDate, IntermediateEndDate = EndDate,
Valid = 1
from #OrderedRecords
where N = 1
UNION ALL
select c.ID, c.N, y.StartDate, y.EndDate,
IntermediateStartDate = case when c.StartDate between y.IntermediateStartDate and y.IntermediateEndDate then y.IntermediateStartDate else c.StartDate end,
IntermediateEndDate = case when c.StartDate between y.IntermediateStartDate and y.IntermediateEndDate then y.IntermediateEndDate else c.EndDate end,
Valid = case when (c.StartDate between y.IntermediateStartDate and y.IntermediateEndDate) then 0 else 1 end
from #OrderedRecords c
join Subsume y
on y.ID = c.ID
and y.N = c.n - 1
and y.IntermediateStartDate >= c.EndDate
UNION ALL
select c.ID, c.N, c.StartDate, c.EndDate,
IntermediateStartDate = case when c.StartDate between y.IntermediateStartDate and y.IntermediateEndDate then y.IntermediateStartDate else c.StartDate end,
IntermediateEndDate = case when c.StartDate between y.IntermediateStartDate and y.IntermediateEndDate then y.IntermediateEndDate else c.EndDate end,
Valid = case when (c.StartDate between y.IntermediateStartDate and y.IntermediateEndDate) then 0 else 1 end
from #OrderedRecords c
join Subsume y
on y.ID = c.ID
and y.N = c.n - 1
and y.IntermediateStartDate < c.EndDate
)
Select ID, StartDate, EndDate, Valid
from Subsume
OPTION (MAXRECURSION 0)
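For comparison, the stated rules (keep the current anchor until a record starts outside it, then make that record the new anchor) can be sketched with a single recursive member. This is a simplified illustration, not the query above: the table variable, the CTE names Ordered and Walk, the AnchorStart and AnchorEnd columns, and the choice of only the first five sample rows are assumptions. The syntax is kept to what SQL Server 2005 supports.

```sql
DECLARE @Dates TABLE (ID int, StartDate datetime, EndDate datetime);
INSERT INTO @Dates
SELECT 1, '20030723', '20030822' UNION ALL
SELECT 1, '20030821', '20031119' UNION ALL
SELECT 1, '20031118', '20031218' UNION ALL
SELECT 1, '20031217', '20040116' UNION ALL
SELECT 1, '20040115', '20040214';

WITH Ordered AS (
    SELECT ID, StartDate, EndDate,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY StartDate, EndDate DESC) AS N
    FROM @Dates
),
Walk AS (
    -- the earliest record of each ID is the first anchor and is always valid
    SELECT ID, N, StartDate, EndDate,
           StartDate AS AnchorStart, EndDate AS AnchorEnd, 1 AS Valid
    FROM Ordered
    WHERE N = 1
    UNION ALL
    -- step to the next record: keep the anchor if the record starts inside it,
    -- otherwise the record is valid and becomes the new anchor
    SELECT c.ID, c.N, c.StartDate, c.EndDate,
           CASE WHEN c.StartDate BETWEEN w.AnchorStart AND w.AnchorEnd
                THEN w.AnchorStart ELSE c.StartDate END,
           CASE WHEN c.StartDate BETWEEN w.AnchorStart AND w.AnchorEnd
                THEN w.AnchorEnd ELSE c.EndDate END,
           CASE WHEN c.StartDate BETWEEN w.AnchorStart AND w.AnchorEnd
                THEN 0 ELSE 1 END
    FROM Ordered c
    JOIN Walk w
      ON w.ID = c.ID
     AND c.N = w.N + 1
)
SELECT ID, StartDate, EndDate, Valid
FROM Walk
ORDER BY ID, N
OPTION (MAXRECURSION 0);
-- For these five rows, Valid comes out as 1, 0, 1, 0, 1.
```

On larger inputs it is probably worth materializing the numbered rows in a temp table first, as the answer above does, because the Ordered CTE may otherwise be re-evaluated at every recursion step.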

Trouble with contradicting where clause

I am trying to display what each user has spent their time doing for the week (either internal or external work), but the time is all in the same column of the table. Is it possible to split it into two columns and still show each user only once, rather than once for each time entry (which could be multiple times throughout the week)?
The SQL below gives me each user's tracked time for the week, but with internal and external on different rows.
SELECT SUM(FilteredMag_Time.mag_hoursspent) AS Time,
FilteredSystemUser.fullname,
FilteredMag_project.mag_typename
FROM FilteredSystemUser
INNER JOIN FilteredMag_Task
INNER JOIN FilteredMag_project ON FilteredMag_Task.mag_projectid = FilteredMag_project.mag_projectid
INNER JOIN FilteredMag_Time ON FilteredMag_Task.mag_taskid = FilteredMag_Time.mag_taskid
ON FilteredSystemUser.systemuserid = FilteredMag_Time.createdby
WHERE (FilteredMag_Time.mag_starttime BETWEEN DATEADD(dd, - (DATEPART(dw, GETDATE()) - 1), GETDATE())
AND DATEADD(dd, - (DATEPART(dw, GETDATE()) - 7), GETDATE()))
GROUP BY FilteredSystemUser.fullname, FilteredMag_project.mag_typename
ORDER BY FilteredSystemUser.fullname
Here is an example of the current output.
Time fullname mag_typename
------------------ --------------------- -------------------------
1.2500000000 David Sutton External
8.2500000000 Gayan Perera External
9.0000000000 Paul Nieuwelaar Internal
14.8700000000 Roshan Mehta External
6.0000000000 Roshan Mehta Internal
2.7800000000 Simon Phillips External
4.6600000000 Simon Phillips Internal
You can make use of SQL Server PIVOT.
Something like
DECLARE @Table TABLE(
userID INT,
typeID VARCHAR(20),
TimeSpent FLOAT
)
INSERT INTO @Table SELECT 1, 'INTERNAL', 1
INSERT INTO @Table SELECT 2, 'INTERNAL', 1
INSERT INTO @Table SELECT 1, 'INTERNAL', 1
INSERT INTO @Table SELECT 1, 'INTERNAL', 1
INSERT INTO @Table SELECT 2, 'EXTERNAL', 3
INSERT INTO @Table SELECT 1, 'EXTERNAL', 3
SELECT *
FROM
(
SELECT userID, typeID, TimeSpent
FROM @Table
) s
PIVOT (SUM(TimeSpent) FOR typeID IN ([INTERNAL],[EXTERNAL])) pvt
Output:
userID INTERNAL EXTERNAL
----------- ---------------------- ----------------------
1 3 3
2 1 3
Assuming FilteredMag_project.mag_typename is 'INTERNAL' or 'EXTERNAL', try the following:
SELECT SUM(CASE FilteredMag_project.mag_typename
WHEN 'INTERNAL' THEN FilteredMag_Time.mag_hoursspent
ELSE 0 END) AS InternalTime,
SUM(CASE FilteredMag_project.mag_typename
WHEN 'EXTERNAL' THEN FilteredMag_Time.mag_hoursspent
ELSE 0 END) AS ExternalTime,
FilteredSystemUser.fullname
FROM FilteredSystemUser
INNER JOIN FilteredMag_Task
INNER JOIN FilteredMag_project ON FilteredMag_Task.mag_projectid = FilteredMag_project.mag_projectid
INNER JOIN FilteredMag_Time ON FilteredMag_Task.mag_taskid = FilteredMag_Time.mag_taskid
ON FilteredSystemUser.systemuserid = FilteredMag_Time.createdby
WHERE (FilteredMag_Time.mag_starttime BETWEEN DATEADD(dd, - (DATEPART(dw, GETDATE()) - 1), GETDATE())
AND DATEADD(dd, - (DATEPART(dw, GETDATE()) - 7), GETDATE()))
GROUP BY FilteredSystemUser.fullname
ORDER BY FilteredSystemUser.fullname
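To see how the conditional-aggregation form behaves next to PIVOT, here is a small sketch on the same kind of sample table used in the PIVOT answer. The extra row for user 3 and the column aliases are assumptions added to show one difference: with ELSE 0, a user with no internal time gets 0, whereas PIVOT would return NULL in that column.

```sql
DECLARE @Table TABLE (userID int, typeID varchar(20), TimeSpent float);
INSERT INTO @Table VALUES
    (1, 'INTERNAL', 1),
    (2, 'INTERNAL', 1),
    (1, 'INTERNAL', 1),
    (1, 'INTERNAL', 1),
    (2, 'EXTERNAL', 3),
    (1, 'EXTERNAL', 3),
    (3, 'EXTERNAL', 2);   -- hypothetical user with external time only

SELECT userID,
       SUM(CASE WHEN typeID = 'INTERNAL' THEN TimeSpent ELSE 0 END) AS InternalTime,
       SUM(CASE WHEN typeID = 'EXTERNAL' THEN TimeSpent ELSE 0 END) AS ExternalTime
FROM @Table
GROUP BY userID
ORDER BY userID;
-- userID 1: 3 and 3; userID 2: 1 and 3; userID 3: 0 and 2.
```

Dropping the ELSE 0 branch makes the missing combination come back as NULL, matching what PIVOT produces.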