SQL Server 2014 - Return Team name based on most recent date (somewhat dynamically) - sql

My title is misleading because I don't know how to sum it up better than that :)
I have a table that keeps a history of changes made to users and what teams they belong to. It starts with their initial team and date, then adds an entry via a trigger when we change their teams in the UserList table.
Our business, like many, loves month-to-month data. I don't want to have entries for every single month if they don't change teams. I'll get to why that's a problem.
Here is an example of the data in the TeamHistory Table
UserID|CurrentTeam|ChangeDate
User1-|Team1------|01-01-2016
User1-|Team2------|03-01-2016
When I run a view or query that rolls the data up by person and media type (I can have 4 entries for a single person in a single month - voice, fax, email and voicemail) I then need to add the team that they were working on for that month.
Using the example above, if I ran the data for all of last year, I would expect Jan-May to display Team1, then June to Dec Team2. The problem is that if I join the date field in my view/query with this table using an = sign, I only get data for 1-1 and 6-1, because those are the only values in the table to match against. If I use < or <= instead, I start getting duplicates because the condition isn't specific enough.
If we need an example query, I can try to work something up that's not one of these massive views.
So let's assume this is my data:
Userid| Month |Media|Calls
User1-|-01/01/2016|Voice|200
User1-|-01/01/2016|Email|100
User1-|-02/01/2016|Voice|250
User1-|-02/01/2016|Email|120
User1-|-03/01/2016|Voice|250
User1-|-03/01/2016|Email|120
And the TeamHistory table has 2 entries: the team they started on for 1/1/2016, and the team they switched to on 3/1/2016. How do I join the two data sets, using the date and userid as my variables, to pull in the corresponding Team? Especially when I won't have an actual entry for 2/1/2016?
I'd want my final dataset to look like this:
Userid|Team | Month |Media|Calls
User1-|Team1|-01/01/2016|Voice|200
User1-|Team1|-01/01/2016|Email|100
User1-|Team1|-02/01/2016|Voice|250
User1-|Team1|-02/01/2016|Email|120
User1-|Team2|-03/01/2016|Voice|250
User1-|Team2|-03/01/2016|Email|120

Since you're using SQL Server (2012 and newer) you can use the LEAD() function to identify an end date for a given range:
;with cte aS (SELECT 'User1' as UserID, 'Team1' AS CurrentTeam, CAST('2016-01-01' AS DATE) as ChangeDate
UNION SELECT 'User1' as UserID, 'Team2' AS CurrentTeam, CAST('2016-06-01' AS DATE) as ChangeDate
UNION SELECT 'User1' as UserID, 'Team1' AS CurrentTeam, CAST('2016-08-15' AS DATE) as ChangeDate
UNION SELECT 'User2' as UserID, 'Team1' AS CurrentTeam, CAST('2016-02-01' AS DATE) as ChangeDate
UNION SELECT 'User2' as UserID, 'Team2' AS CurrentTeam, CAST('2016-07-01' AS DATE) as ChangeDate
)
SELECT *,COALESCE(LEAD(ChangeDate,1) OVER(PARTITION BY UserID ORDER BY ChangeDate),CAST(GETDATE() AS DATE)) as End_Dt
FROM cte
Returns:
UserID CurrentTeam ChangeDate End_Dt
User1 Team1 2016-01-01 2016-06-01
User1 Team2 2016-06-01 2016-08-15
User1 Team1 2016-08-15 2017-01-05
User2 Team1 2016-02-01 2016-07-01
User2 Team2 2016-07-01 2017-01-05
You could then join those ranges to a calendar table to get the individual months as well as calculate which team they spent more days in for a given month.
The LEAD() function returns the next row's value for a given field. PARTITION BY resets "the next row" for each group (here you want the value per UserID), and ORDER BY specifies what the next row is (here, moving from one ChangeDate to the next).
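The join back to the monthly rollup could be sketched as follows. This is a minimal example rather than a drop-in for the real views: the table variables `@TeamHistory` and `@Monthly` are stand-ins built from the sample rows in the question, and the open-ended range is left as NULL instead of being capped at today's date.

```sql
-- Stand-in data copied from the question; the table variable names are assumptions.
DECLARE @TeamHistory TABLE (UserID VARCHAR(20), CurrentTeam VARCHAR(20), ChangeDate DATE);
INSERT INTO @TeamHistory VALUES
 ('User1', 'Team1', '2016-01-01'),
 ('User1', 'Team2', '2016-03-01');

DECLARE @Monthly TABLE (UserID VARCHAR(20), [Month] DATE, Media VARCHAR(20), Calls INT);
INSERT INTO @Monthly VALUES
 ('User1', '2016-01-01', 'Voice', 200),
 ('User1', '2016-01-01', 'Email', 100),
 ('User1', '2016-02-01', 'Voice', 250),
 ('User1', '2016-02-01', 'Email', 120),
 ('User1', '2016-03-01', 'Voice', 250),
 ('User1', '2016-03-01', 'Email', 120);

WITH Ranges AS (
    SELECT UserID, CurrentTeam, ChangeDate,
           -- Next change for the same user; NULL means the range is still open.
           LEAD(ChangeDate) OVER (PARTITION BY UserID ORDER BY ChangeDate) AS End_Dt
    FROM @TeamHistory
)
SELECT m.UserID, r.CurrentTeam AS Team, m.[Month], m.Media, m.Calls
FROM @Monthly AS m
LEFT JOIN Ranges AS r
       ON r.UserID = m.UserID
      AND m.[Month] >= r.ChangeDate                    -- range start is inclusive
      AND (m.[Month] < r.End_Dt OR r.End_Dt IS NULL)   -- range end is exclusive
ORDER BY m.[Month], m.Media DESC;
```

Because each month can fall into only one half-open range per user, the January and February rows pick up Team1 and the March rows pick up Team2, without the duplicates that a plain `<=` join produces.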

You might try this:
--A simple person table
CREATE TABLE #pers (Person VARCHAR(100));
INSERT INTO #pers VALUES('Bob'),('Tim');
--a table reflecting your work-data
--attention: Tim changes to Team Red in July and back to Blue later the same month
CREATE TABLE #Team (Person VARCHAR(100),Team VARCHAR(100),ChangeDate DATE);
INSERT INTO #Team VALUES
('Bob','Red' ,{d'2016-04-01'})
,('Tim','Blue',{d'2016-04-13'})
,('Tim','Red' ,{d'2016-07-22'})
,('Bob','Blue',{d'2016-06-15'})
,('Tim','Blue',{d'2016-07-28'})
,('Bob','Red' ,{d'2016-10-15'})
,('Tim','Red' ,{d'2016-12-28'})
;
--A CTE to mock-up a numbers/tally/date-table
WITH FirstOfMonthDays(d) AS
(
SELECT {d'2016-01-01'}
UNION ALL SELECT {d'2016-02-01'}
UNION ALL SELECT {d'2016-03-01'}
UNION ALL SELECT {d'2016-04-01'}
UNION ALL SELECT {d'2016-05-01'}
UNION ALL SELECT {d'2016-06-01'}
UNION ALL SELECT {d'2016-07-01'}
UNION ALL SELECT {d'2016-08-01'}
UNION ALL SELECT {d'2016-09-01'}
UNION ALL SELECT {d'2016-10-01'}
UNION ALL SELECT {d'2016-11-01'}
UNION ALL SELECT {d'2016-12-01'}
)
--I use CONVERT(VARCHAR(6),ChangeDate,112) to get a string of YYYYMM
,Numbered AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY Person, CONVERT(VARCHAR(6),ChangeDate,112) ORDER BY ChangeDate DESC) AS Nr
,t.*
FROM #Team AS t
)
--Pick out the one with Nr=1, these are the last changes per month
,LastChangeInMonth AS
(
SELECT *
FROM Numbered
WHERE Nr=1
)
--The actual query
SELECT fom.d
,p.Person
,(
SELECT TOP 1 t.Team
FROM LastChangeInMonth AS t
WHERE t.Person=p.Person
AND CONVERT(VARCHAR(6),t.ChangeDate,112)<=CONVERT(VARCHAR(6),fom.d,112)
ORDER BY t.ChangeDate DESC
) AS fittingTeam
FROM FirstOfMonthDays AS fom
CROSS JOIN #pers AS p
ORDER BY p.Person,fom.d
Since you are using SQL Server 2014 (please tag your questions correctly!), this would be a bit easier with LEAD()/LAG(), but the idea is the same.
The result
2016-01-01 Bob NULL
2016-02-01 Bob NULL
2016-03-01 Bob NULL
2016-04-01 Bob Red
2016-05-01 Bob Red
2016-06-01 Bob Blue
2016-07-01 Bob Blue
2016-08-01 Bob Blue
2016-09-01 Bob Blue
2016-10-01 Bob Red
2016-11-01 Bob Red
2016-12-01 Bob Red
2016-01-01 Tim NULL
2016-02-01 Tim NULL
2016-03-01 Tim NULL
2016-04-01 Tim Blue
2016-05-01 Tim Blue
2016-06-01 Tim Blue
2016-07-01 Tim Blue
2016-08-01 Tim Blue
2016-09-01 Tim Blue
2016-10-01 Tim Blue
2016-11-01 Tim Blue
2016-12-01 Tim Red

Related

How can I transform SQL to show start and stop dates and show every day?

Person | State | StartDate
joe    | blue  | 2/4/2020
bob    | red   | 12/1/2019
bob    | black | 12/3/2009
joe    | blue  | 2/4/2018
joe    | red   | 12/1/2015
mary   | black | 12/3/2009
I have a table set up as shown above. I want to transform this to the following
Person | State | StartDate | EndDate
joe    | blue  | 2/4/2020  |
bob    | red   | 12/1/2019 |
bob    | black | 12/3/2009 | 11/30/2019
joe    | blue  | 2/4/2018  | 2/3/2020
joe    | red   | 12/1/2015 | 2/3/2018
mary   | black | 12/3/2009 |
After this, I want to have one line for every calendar day that a Person is in a given state. If there is no end date, the days in a given state should stop at the current date.
How can I do this with SQL only?
Perhaps the window function lead() in concert with dateadd() would be a good option
Example
Select *
,EndDate = dateadd(day,-1,lead(StartDate,1) over (partition by Person Order by StartDate))
From YourTable A
Returns
Person State StartDate EndDate
bob black 2009-12-03 2019-11-30
bob red 2019-12-01 NULL
joe red 2015-12-01 2018-02-03
joe blue 2018-02-04 2020-02-03
joe blue 2020-02-04 NULL
mary black 2009-12-03 NULL
EDIT - To Expand Into Daily
We take the query above and add IsNull(...,convert(date,getdate())) to trap the end dates. Then we create an ad-hoc calendar table and perform a simple join.
Select A.*
,B.D
From (
Select *
,EndDate = IsNull(dateadd(day,-1,lead(StartDate,1) over (partition by Person Order by StartDate)),convert(date,getdate()))
From YourTable A
) A
Join (
Select Top (datediff(day,'1999-12-31',getdate()))
D=dateadd(day,Row_Number() Over (Order By (Select NULL)),convert(date,'1999-12-31'))
From master..spt_values n1, master..spt_values n2
) B on D between StartDate and EndDate
Order By Person,D
Returns 10,210 rows
I like to use recursive CTEs for expanding data. It is pretty simple in your case:
with cte as (
select person, state, startdate,
lead(dateadd(day, -1, startdate),
1,
convert(date, getdate())
) over (partition by person order by startdate) as enddate
from t
union all
select person, state, dateadd(day, 1, startdate), enddate
from cte
where startdate < enddate
)
select person, state, startdate
from cte
option (maxrecursion 0);
Here is a db<>fiddle.

Selecting the most recent date

I have data structured like this:
ID | Enrolment_Date | Appointment1_Date | Appointment2_Date | .... | Appointment150_Date |
112 01/01/2015 01/02/2015 01/03/2018 01/08/2018
113 01/06/2018 01/07/2018 NULL NULL
114 01/04/2018 01/05/2018 01/06/2018 NULL
I need a new variable which counts the number of months between the enrolment_date and the most recent appointment. The challenge is that all individuals have a different number of appointments.
Update: I agree with the comments that this is poor table design and it needs to be reformatted. Could proposed solutions please include suggested code on how to transform the table?
Since the OP is currently stuck with this bad design, I will point out a temporary solution. As others have suggested, you really must change the structure here. For now, this will suffice:
SELECT '['+ NAME + '],' FROM sys.columns WHERE OBJECT_ID = OBJECT_ID ('TableA') -- find all columns, last one probably max appointment date
SELECT ID,
Enrolment_Date,
CASE WHEN Appointment150_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment150_Date)
WHEN Appointment149_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment149_Date)
WHEN Appointment148_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment148_Date)
WHEN Appointment147_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment147_Date)
WHEN Appointment146_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment146_Date)
WHEN Appointment145_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment145_Date)
WHEN Appointment144_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment144_Date) -- and so on
END AS NumberOfMonths
FROM TableA
This is a very ugly temporary solution and should be considered as such.
You will need to restructure your data; the given structure is poor database design. Create two separate tables, one called users and one called appointments. The users table contains the user id, enrollment date and any other user-specific information. Each row in the appointments table contains the user's unique id and a specific appointment date. Structuring your tables like this will make it easier to write a query to get days/months since last appointment.
For example:
Users Table:
ID, Enrollment_Date
1, 2018-01-01
2, 2018-03-02
3, 2018-05-02
Appointments Table:
ID, Appointment_Date
1, 2018-01-02
1, 2018-02-02
1, 2018-02-10
2, 2018-05-01
You would then be able to write a query that joins the two tables and calculates the difference between the enrollment date and the most recent (maximum) appointment date.
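A query along those lines might look like the sketch below. It assumes the two-table layout from the example above, reproduced here as the table variables `@Users` and `@Appointments` so it runs on its own, and it uses the latest appointment because the question asks for the most recent one.

```sql
-- Table variables mirror the example Users/Appointments tables; names are illustrative.
DECLARE @Users TABLE (ID INT, Enrollment_Date DATE);
INSERT INTO @Users VALUES (1, '2018-01-01'), (2, '2018-03-02'), (3, '2018-05-02');

DECLARE @Appointments TABLE (ID INT, Appointment_Date DATE);
INSERT INTO @Appointments VALUES
 (1, '2018-01-02'), (1, '2018-02-02'), (1, '2018-02-10'), (2, '2018-05-01');

SELECT u.ID,
       u.Enrollment_Date,
       MAX(a.Appointment_Date) AS Last_Appointment,
       -- DATEDIFF(MONTH, ...) counts month boundaries crossed, not full months elapsed.
       DATEDIFF(MONTH, u.Enrollment_Date, MAX(a.Appointment_Date)) AS Months_To_Last
FROM @Users AS u
LEFT JOIN @Appointments AS a
       ON a.ID = u.ID               -- LEFT JOIN keeps users with no appointments
GROUP BY u.ID, u.Enrollment_Date
ORDER BY u.ID;
```

With this sample data, user 1 gets 1 month, user 2 gets 2 months, and user 3 (no appointments) gets NULL in both computed columns.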
It is better if you can create two tables.
Enrolment Table (dbo.Enrolments)
ID | EnrolmentDate
1 | 2018-08-30
2 | 2018-08-31
Appointments Table (dbo.Appointments)
ID | EnrolmentID | AppointmentDate
1 | 1 | 2018-09-02
2 | 1 | 2018-09-03
3 | 2 | 2018-09-01
4 | 2 | 2018-09-03
Then you can try something like this.
If you want the count of months from the enrolment date to the final appointment date, use the query below.
SELECT E.ID, E.EnrolmentDate, A.NoOfMonths
FROM dbo.Enrolments E
OUTER APPLY
(
SELECT DATEDIFF(mm, E.EnrolmentDate, MAX(A.AppointmentDate)) AS NoOfMonths
FROM dbo.Appointments A
WHERE A.EnrolmentId = E.ID
) A
If you want the count of months from the enrolment date to the nearest (earliest) appointment date, use the query below.
SELECT E.ID, E.EnrolmentDate, A.NoOfMonths
FROM dbo.Enrolments E
OUTER APPLY
(
SELECT DATEDIFF(mm, E.EnrolmentDate, MIN(A.AppointmentDate)) AS NoOfMonths
FROM dbo.Appointments A
WHERE A.EnrolmentId = E.ID
) A
Try this on sqlfiddle
You have a lousy data structure, as others have noted. You really want a table with one row per appointment. After all, what happens after the 150th appointment?
select t.id, t.Enrolment_Date,
datediff(month, t.Enrolment_Date, m.max_Appointment_Date) as months_diff
from t cross apply
(select max(Appointment_Date) as max_Appointment_Date
from (values (Appointment1_Date),
(Appointment2_Date),
. . .
(Appointment150_Date)
) v(Appointment_Date)
) m;
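The same `CROSS APPLY (VALUES ...)` pattern can also be used to move the data into a one-row-per-appointment table, which is the transformation the question's update asks about. The sketch below is an assumption-laden illustration: `@Wide` stands in for the real table with only three appointment columns, `@Appointments` and `Appointment_No` are hypothetical names, and the sample dates are read as dd/mm/yyyy.

```sql
-- @Wide is a cut-down stand-in for the real 150-column table.
DECLARE @Wide TABLE (ID INT, Enrolment_Date DATE,
                     Appointment1_Date DATE, Appointment2_Date DATE, Appointment3_Date DATE);
INSERT INTO @Wide VALUES
 (112, '2015-01-01', '2015-02-01', '2018-03-01', '2018-08-01'),
 (113, '2018-06-01', '2018-07-01', NULL,         NULL),
 (114, '2018-04-01', '2018-05-01', '2018-06-01', NULL);

-- Hypothetical normalized target: one row per appointment.
DECLARE @Appointments TABLE (ID INT, Appointment_No INT, Appointment_Date DATE);

INSERT INTO @Appointments (ID, Appointment_No, Appointment_Date)
SELECT w.ID, v.Appointment_No, v.Appointment_Date
FROM @Wide AS w
CROSS APPLY (VALUES (1, w.Appointment1_Date),
                    (2, w.Appointment2_Date),
                    (3, w.Appointment3_Date)   -- extend the list up to Appointment150_Date
            ) AS v(Appointment_No, Appointment_Date)
WHERE v.Appointment_Date IS NOT NULL;          -- skip unused appointment slots

SELECT ID, Appointment_No, Appointment_Date
FROM @Appointments
ORDER BY ID, Appointment_No;
```

Once the rows are stored this way, the most recent appointment is a plain `MAX(Appointment_Date)` grouped by ID, and a 151st appointment is just another row.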

Advanced Sql query solution required

player team start_date end_date points
John Jacob SportsBallers 2015-01-01 2015-03-31 100
John Jacob SportsKings 2015-04-01 2015-12-01 115
Joe Smith PointScorers 2014-01-01 2016-12-31 125
Bill Johnson SportsKings 2015-01-01 2015-06-31 175
Bill Johnson AllStarTeam 2015-07-01 2016-12-31 200
The above table has many more rows. I was asked the below questions in an interview.
1.)For each player, which team did they play for on 2015-01-01?
I could not answer this one.
2.)For each player, how can we get the team for whom they scored the most points?
select team from Players
where points in (select max(points) from players group by player).
Please, solutions for both.
1
select *
from PlayerTeams
where start_date <= '2015-01-01' and end_date >= '2015-01-01'
2
Select player, team, points
from(
Select *, row_number() over (partition by player order by points desc) as rn
From PlayerTeams) as ranked
where rn = 1
For #1:
Select Player
,Team
From table
Where '2015-01-01' between start_date and end_date
For #2:
select t.Player
,t.Team
from table t
inner join (select Player
,Max(points) as points
from table
group by Player) m
on t.Player = m.Player
and t.points = m.points

Add a counter by date, user per day to a query

I have a table of data which stores scans into a building, and this contains well over a million rows of data. I am attempting to add a temporary status column within this query, which counts the scans on a daily basis. For the purpose of this question, let's use this as the main data table:
CREATE TABLE DataTable (DataTableID INT IDENTITY(1,1) NOT NULL,
[User] VARCHAR(50),
EventTime DATETIME)
From this I have narrowed it down to show only the scans for today:
SELECT * FROM DataTable
WHERE CONVERT(DATE,EventTime) = CONVERT(DATE, SYSDATETIME())
It is at this point that I want to add a status column to the query above. The Status column:
WHEN ODD - will mean that the person is in the building
WHEN EVEN - will mean that the person is not in the building
(This is simply an integer field which starts on 1, and will increment by 1 per scan on that day, PER USER). How would I go about doing this?
I do want to make this a view afterwards, so it's worth mentioning in case this affects the query syntax.
It's also worth mentioning that I can't add a status column to the main table, as this would prevent the door access program from working; otherwise I would add something there to control that.
EXAMPLE DATA:
DataTableID User EventTime Status
1 Joe 30/08/2016 09:00:00 1
2 Alan 30/08/2016 08:45:00 1
3 John 30/08/2016 09:02:00 1
4 Steven 30/08/2016 07:30:00 1
5 Joe 30/08/2016 11:00:00 2
6 Mike 30/08/2016 17:30:00 1
7 Joe 30/08/2016 12:00:00 3
You want a simple windowing function for this. Take a look at the query below and let me know if you have any questions. This is ordered by EventTime rather than DataTableID for the windowing, it's then ordered by DataTableID in the final query. This is going to make sure you don't have any issues if your data isn't in the correct order in the table.
Temp table for testing;
CREATE TABLE #DataTable
(DataTableID INT IDENTITY(1,1) NOT NULL,
[User] VARCHAR(50),
EventTime DATETIME)
Fill it with sample data;
INSERT INTO #DataTable
VALUES
('Joe', '2016-08-30 09:00:00')
,('Alan', '2016-08-30 08:45:00')
,('John', '2016-08-30 09:02:00')
,('Steven', '2016-08-30 07:30:00')
,('Joe', '2016-08-30 11:00:00')
,('Mike', '2016-08-30 17:30:00')
,('Joe', '2016-08-30 12:00:00')
Query
SELECT
DataTableID
,[User]
,EventTime
,ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY EventTime) Status
FROM #DataTable
WHERE CONVERT(DATE,EventTime) = CONVERT(DATE, SYSDATETIME())
ORDER BY DataTableID
Output
DataTableID User EventTime Status
1 Joe 2016-08-30 09:00:00.000 1
2 Alan 2016-08-30 08:45:00.000 1
3 John 2016-08-30 09:02:00.000 1
4 Steven 2016-08-30 07:30:00.000 1
5 Joe 2016-08-30 11:00:00.000 2
6 Mike 2016-08-30 17:30:00.000 1
7 Joe 2016-08-30 12:00:00.000 3
Something like:
select *, row_number() over(partition by [user], cast(eventtime as date) order by eventtime) as status
from datatable
should do the trick.
However, for performance reasons I'd suggest creating a computed column as cast(eventtime as date), plus a compound index on that column, the user column, and the original eventtime column.
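The computed column and index could be set up roughly as below. Treat it as a sketch: the column name `EventDate` and the index name are made up for illustration, the `GO` lines are batch separators as understood by SSMS and sqlcmd, and the whole idea only applies if changing the table is acceptable, which the question suggests may not be the case for the live table.

```sql
-- Demo copy of the question's table; [User] is bracketed because USER is a reserved word.
CREATE TABLE dbo.DataTable (
    DataTableID INT IDENTITY(1,1) NOT NULL,
    [User]      VARCHAR(50),
    EventTime   DATETIME
);
GO

-- Hypothetical computed column holding just the date part of EventTime.
ALTER TABLE dbo.DataTable
    ADD EventDate AS CAST(EventTime AS DATE) PERSISTED;
GO

-- Compound index: filter by day, then partition by user, then order by scan time.
CREATE INDEX IX_DataTable_EventDate_User_EventTime
    ON dbo.DataTable (EventDate, [User], EventTime);
GO

-- The daily counter can then filter on the indexed column directly.
SELECT DataTableID, [User], EventTime,
       ROW_NUMBER() OVER (PARTITION BY [User] ORDER BY EventTime) AS Status
FROM dbo.DataTable
WHERE EventDate = CAST(SYSDATETIME() AS DATE)
ORDER BY DataTableID;
```

The key order follows the query shape: equality on the day first, then the partitioning column, then the ordering column, so the row numbering can usually be served from the index without an extra sort.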

How to find most recent date given a set of values that fulfill condition

I've been trying to build an sql query that finds from (table) the most recent date for selected id's that fulfill the condition where 'type' is in hierarchy 'vegetables'. My goal is to be able to get the whole row once max(date) and hierarchy conditions are met for each id.
Example values
ID DATE PREFERENCE AGE
123 1/3/2013 carrot 14
123 1/3/2013 apple 12
123 1/2/2013 carrot 14
124 1/5/2013 carrot 13
124 1/3/2013 apple 13
124 1/2/2013 carrot 14
125 1/4/2013 carrot 13
125 1/3/2013 apple 14
125 1/2/2013 carrot 13
I tried the following
SELECT *
FROM table
WHERE date in
(SELECT max(date) FROM (table) WHERE id in (123,124,125))
and preference in
(SELECT preference FROM (hierarchy_table)
WHERE hierarchy = vegetables))
and id in (123,24,125)
but it doesn't give me the most recent date for each id that meets the hierarchy conditions. (ex. in this scenario I would only get id 124)
Thank you in advance!
SELECT max(date) FROM (table) WHERE id in (123,124,125)
is giving you the max date from all dates, you need to group them.
Try replacing with:
SELECT max(date) FROM (table) GROUP BY id
This way you will get the max date for each id
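To get the whole row for each id, the maximum has to be tied back to its own id and computed only over rows that pass the hierarchy filter. One way to sketch that is with ROW_NUMBER(). The example is written in T-SQL, while the dialect of this thread is not stated. `@Prefs`, `@Hierarchy` and `PrefDate` are illustrative names, the 'fruit' hierarchy for apple is an assumption, and the sample dates are read as m/d/yyyy.

```sql
-- Stand-in tables built from the question's example values.
DECLARE @Prefs TABLE (ID INT, PrefDate DATE, Preference VARCHAR(20), Age INT);
INSERT INTO @Prefs VALUES
 (123, '2013-01-03', 'carrot', 14), (123, '2013-01-03', 'apple', 12), (123, '2013-01-02', 'carrot', 14),
 (124, '2013-01-05', 'carrot', 13), (124, '2013-01-03', 'apple', 13), (124, '2013-01-02', 'carrot', 14),
 (125, '2013-01-04', 'carrot', 13), (125, '2013-01-03', 'apple', 14), (125, '2013-01-02', 'carrot', 13);

DECLARE @Hierarchy TABLE (Preference VARCHAR(20), Hierarchy VARCHAR(20));
INSERT INTO @Hierarchy VALUES ('carrot', 'vegetables'), ('apple', 'fruit');

WITH Ranked AS (
    SELECT p.ID, p.PrefDate, p.Preference, p.Age,
           -- Number the qualifying rows per ID, newest first.
           ROW_NUMBER() OVER (PARTITION BY p.ID ORDER BY p.PrefDate DESC) AS rn
    FROM @Prefs AS p
    WHERE p.ID IN (123, 124, 125)
      AND p.Preference IN (SELECT h.Preference
                           FROM @Hierarchy AS h
                           WHERE h.Hierarchy = 'vegetables')
)
SELECT ID, PrefDate, Preference, Age
FROM Ranked
WHERE rn = 1          -- keep only the most recent qualifying row per ID
ORDER BY ID;
```

For the sample values this returns the carrot rows dated 2013-01-03, 2013-01-05 and 2013-01-04 for ids 123, 124 and 125. If two qualifying rows share the latest date for an id, ROW_NUMBER() keeps an arbitrary one of them; RANK() would keep both.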
I figured this out. Please see the query below as an example:
SELECT * FROM (table) t
WHERE t.date in
(SELECT max(date) FROM table sub_t where t.ID = sub_t.ID and date <= (currentdate))
and preference in
(SELECT preference FROM (hierarchy_table) WHERE hierarchy ='vegetables')
and ID in ('124')
Change:
max(date)
To:
-- if your date data is in mm/dd/yyyy
max( str_to_date( date, '%m/%d/%Y' ) )
OR
-- if your date data is in dd/mm/yyyy
max( str_to_date( date, '%d/%m/%Y' ) )