Add a counter by date, user per day to a query - sql

I have a table that stores scans into a building; it contains well over a million rows. I am trying to add a temporary status column to a query that counts the scans per user per day. For the purpose of this question, let's use this as the main data table:
CREATE TABLE DataTable (DataTableID INT IDENTITY(1,1) NOT NULL,
[User] VARCHAR(50),
EventTime DATETIME)
From this I have narrowed it down to show only the scans for today:
SELECT * FROM DataTable
WHERE CONVERT(DATE,EventTime) = CONVERT(DATE, SYSDATETIME())
At this point I want to add a Status column to the query above. The Status column:
WHEN ODD - means that the person is in the building
WHEN EVEN - means that the person is not in the building
(This is simply an integer that starts at 1 and increments by 1 per scan on that day, PER USER.) How would I go about doing this?
I want to turn this into a view afterwards, so it's worth mentioning in case that affects the query syntax.
It's also worth mentioning that I can't add a status column to the main table, as this would stop the door access program from working; otherwise I would add something there to control it.
EXAMPLE DATA:
DataTableID User EventTime Status
1 Joe 30/08/2016 09:00:00 1
2 Alan 30/08/2016 08:45:00 1
3 John 30/08/2016 09:02:00 1
4 Steven 30/08/2016 07:30:00 1
5 Joe 30/08/2016 11:00:00 2
6 Mike 30/08/2016 17:30:00 1
7 Joe 30/08/2016 12:00:00 3

You want a simple window function for this. Take a look at the query below and let me know if you have any questions. The window is ordered by EventTime rather than DataTableID, and the final result is then ordered by DataTableID. This ensures the count is correct even if the rows were not inserted in chronological order.
Temp table for testing;
CREATE TABLE #DataTable
(DataTableID INT IDENTITY(1,1) NOT NULL,
[User] VARCHAR(50),
EventTime DATETIME)
Fill it with sample data;
INSERT INTO #DataTable
VALUES
('Joe', '2016-08-30 09:00:00')
,('Alan', '2016-08-30 08:45:00')
,('John', '2016-08-30 09:02:00')
,('Steven', '2016-08-30 07:30:00')
,('Joe', '2016-08-30 11:00:00')
,('Mike', '2016-08-30 17:30:00')
,('Joe', '2016-08-30 12:00:00')
Query
SELECT
DataTableID
,[User]
,EventTime
,ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY EventTime) Status
FROM #DataTable
WHERE CONVERT(DATE,EventTime) = CONVERT(DATE, SYSDATETIME())
ORDER BY DataTableID
Output
DataTableID User EventTime Status
1 Joe 2016-08-30 09:00:00.000 1
2 Alan 2016-08-30 08:45:00.000 1
3 John 2016-08-30 09:02:00.000 1
4 Steven 2016-08-30 07:30:00.000 1
5 Joe 2016-08-30 11:00:00.000 2
6 Mike 2016-08-30 17:30:00.000 1
7 Joe 2016-08-30 12:00:00.000 3
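Since the question mentions turning this into a view, here is a minimal sketch of how that might look. The view name dbo.vw_TodayScans and the extra InBuilding column are assumptions for illustration and are not part of the query above. It reads from the real table, because a view cannot reference a temp table, and it adds the date to the PARTITION BY so that the count would still restart each day if the WHERE clause were later removed.

```sql
-- Hypothetical view name; dbo.DataTable is the table from the question.
-- A view cannot carry its own ORDER BY (without TOP), so sort when selecting from it.
CREATE VIEW dbo.vw_TodayScans
AS
SELECT
    DataTableID,
    [User],
    EventTime,
    -- Scan number for this user on this day: 1, 2, 3, ...
    ROW_NUMBER() OVER (PARTITION BY [User], CONVERT(DATE, EventTime)
                       ORDER BY EventTime) AS Status,
    -- Assumed convenience flag: an odd scan count means "in the building"
    CASE WHEN ROW_NUMBER() OVER (PARTITION BY [User], CONVERT(DATE, EventTime)
                                 ORDER BY EventTime) % 2 = 1
         THEN 1 ELSE 0 END AS InBuilding
FROM dbo.DataTable
WHERE CONVERT(DATE, EventTime) = CONVERT(DATE, SYSDATETIME());
```

It could then be queried with SELECT * FROM dbo.vw_TodayScans ORDER BY DataTableID.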

Something like:
select *, row_number() over(partition by [User], cast(eventtime as date) order by eventtime) as status
from datatable
should do the trick.
However, I'd suggest creating a computed column defined as cast(eventtime as date), plus a compound index on that column, the user column and the original eventtime column, for performance reasons.
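As a rough sketch of that suggestion, and only applicable if the table can be altered at all (the question suggests it may not be), the computed column and index might look like this. The column name EventDate and the index name are made up for illustration.

```sql
-- Assumed names: EventDate and the index name are illustrative only.
-- Casting DATETIME to DATE is deterministic, so the computed column can be indexed.
ALTER TABLE dbo.DataTable
    ADD EventDate AS CAST(EventTime AS DATE);

-- Key order follows the PARTITION BY (date, user) and then the ORDER BY (eventtime).
CREATE INDEX IX_DataTable_EventDate_User_EventTime
    ON dbo.DataTable (EventDate, [User], EventTime);
```

The window function could then partition by EventDate directly instead of repeating the cast.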

Related

Selecting the most recent date

I have data structured like this:
ID | Enrolment_Date | Appointment1_Date | Appointment2_Date | .... | Appointment150_Date |
112 01/01/2015 01/02/2015 01/03/2018 01/08/2018
113 01/06/2018 01/07/2018 NULL NULL
114 01/04/2018 01/05/2018 01/06/2018 NULL
I need a new variable which counts the number of months between the enrolment_date and the most recent appointment. The challenge is that all individuals have a different number of appointments.
Update: I agree with the comments that this is poor table design and it needs to be reformatted. Could proposed solutions please include suggested code on how to transform the table?
Since the OP is currently stuck with this bad design, I will point out a temporary solution. As others have suggested, you really must change the structure here. For now, this will suffice:
SELECT '['+ NAME + '],' FROM sys.columns WHERE OBJECT_ID = OBJECT_ID ('TableA') -- find all columns, last one probably max appointment date
SELECT ID,
Enrolment_Date,
CASE WHEN Appointment150_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment150_Date)
WHEN Appointment149_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment149_Date)
WHEN Appointment148_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment148_Date)
WHEN Appointment147_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment147_Date)
WHEN Appointment146_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment146_Date)
WHEN Appointment145_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment145_Date)
WHEN Appointment144_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment144_Date) -- and so on
END AS NumberOfMonths
FROM TableA
This is a very ugly temporary solution and should be considered as such.
You will need to restructure your data, the given structure is poor database design. Create two separate tables - one called users and one called appointments. The users table contains the user id, enrollment date and any other specific user information. Each row in the appointments table contains the user's unique id and a specific appointment date. Structuring your tables like this will make it easier to write a query to get days/months since last appointment.
For example:
Users Table:
ID, Enrollment_Date
1, 2018-01-01
2, 2018-03-02
3, 2018-05-02
Appointments Table:
ID, Appointment_Date
1, 2018-01-02
1, 2018-02-02
1, 2018-02-10
2, 2018-05-01
You would then be able to write a query that joins the two tables together and calculates the difference between the enrollment date and the max value (the most recent) of the appointment date.
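A minimal sketch of such a query, assuming the two tables are literally named Users and Appointments with the columns listed above. It uses MAX because the question asks for the most recent appointment.

```sql
-- Assumes Users(ID, Enrollment_Date) and Appointments(ID, Appointment_Date),
-- where Appointments.ID holds the user's ID as in the example rows.
SELECT u.ID,
       u.Enrollment_Date,
       -- DATEDIFF(MONTH, ...) counts month boundaries crossed, not full months elapsed
       DATEDIFF(MONTH, u.Enrollment_Date, MAX(a.Appointment_Date)) AS MonthsToLastAppointment
FROM Users AS u
LEFT JOIN Appointments AS a
       ON a.ID = u.ID
GROUP BY u.ID, u.Enrollment_Date;
```

The LEFT JOIN keeps users without any appointment (user 3 in the example), who get NULL for the month count.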
It is better if you can create two tables.
Enrolment Table (dbo.Enrolments)
ID | EnrolmentDate
1 | 2018-08-30
2 | 2018-08-31
Appointments Table (dbo.Appointments)
ID | EnrolmentID | AppointmentDate
1 | 1 | 2018-09-02
2 | 1 | 2018-09-03
3 | 2 | 2018-09-01
4 | 2 | 2018-09-03
Then you can try something like this.
If you want the number of months from the enrolment date to the final appointment date, use the query below.
SELECT E.ID, E.EnrolmentDate, A.NoOfMonths
FROM dbo.Enrolments E
OUTER APPLY
(
SELECT DATEDIFF(mm, E.EnrolmentDate, MAX(A.AppointmentDate)) AS NoOfMonths
FROM dbo.Appointments A
WHERE A.EnrolmentId = E.ID
) A
If you want the number of months from the enrolment date to the nearest appointment date, use the query below.
SELECT E.ID, E.EnrolmentDate, A.NoOfMonths
FROM dbo.Enrolments E
OUTER APPLY
(
SELECT DATEDIFF(mm, E.EnrolmentDate, MIN(A.AppointmentDate)) AS NoOfMonths
FROM dbo.Appointments A
WHERE A.EnrolmentId = E.ID
) A
You have a lousy data structure, as others have noted. You really want a table with one row per appointment. After all, what happens after the 150th appointment?
select t.id, t.Enrolment_Date,
datediff(month, t.Enrolment_Date, m.max_Appointment_Date) as months_diff
from t cross apply
(select max(Appointment_Date) as max_Appointment_Date
from (values (Appointment1_Date),
(Appointment2_Date),
. . .
(Appointment150_Date)
) v(Appointment_Date)
) m;
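The question also asks how to transform the table. The same VALUES construct can unpivot the wide columns into one row per appointment. This is only a sketch: the target table dbo.Appointments and its column names are assumptions, and the list of source columns has to be completed by hand (or generated from sys.columns).

```sql
-- Hypothetical target table: one row per appointment.
CREATE TABLE dbo.Appointments (
    AppointmentID   INT IDENTITY(1,1) PRIMARY KEY,
    EnrolmentID     INT  NOT NULL,
    AppointmentDate DATE NOT NULL
);

-- Unpivot the wide table (called t above) and skip the empty slots.
INSERT INTO dbo.Appointments (EnrolmentID, AppointmentDate)
SELECT t.id, v.Appointment_Date
FROM t
CROSS APPLY (VALUES (t.Appointment1_Date),
                    (t.Appointment2_Date),
                    (t.Appointment3_Date)  -- continue through Appointment150_Date
            ) AS v(Appointment_Date)
WHERE v.Appointment_Date IS NOT NULL;
```

After that, the most recent appointment per person is a plain MAX(AppointmentDate) grouped by EnrolmentID.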

SQL Server 2014 - Return Team name based on most recent date (somewhat dynamically)

My title is misleading because I don't know how to sum it up better than that :)
I have a table that keeps a history of changes made to users and what teams they belong to. It starts with their initial team and date, then adds an entry via a trigger when we change their teams in the UserList table.
Our business, like many, loves month to month data. I don't want to have entries for every single month if they don't change teams. I'll get to why that's a problem.
Here is an example of the data in the TeamHistory Table
UserID|CurrentTeam|ChangeDate
User1-|Team1------|01-01-2016
User1-|Team2------|03-01-2016
When I run a view or query that rolls the data up by person and media type (I can have 4 entries for a single person in a single month - voice, fax, email and voicemail) I then need to add the team that they were working on for that month.
Using the above example, if I ran the data for all of last year, I would expect Jan-May to display Team1. Then from June to Dec, Team 2. The problem is that if I join the date field in my view/query with this table and use an = sign, I only get data for 1-1 and 6-1, clearly because I only have those values in the table to match against. If I tell it to do < or <=, I start encountering duplicates as it's just not specific enough.
If we need an example query, I can try to work something up that's not one of these massive views.
So let's assume this is my data:
Userid| Month |Media|Calls
User1-|-01/01/2016|Voice|200
User1-|-01/01/2016|Email|100
User1-|-02/01/2016|Voice|250
User1-|-02/01/2016|Email|120
User1-|-03/01/2016|Voice|250
User1-|-03/01/2016|Email|120
And the TeamHistory table has 2 entries: the team they started on for 1/1/2016 and the team they switched to on 3/1/2016. How do I join the two data sets, using the date and userid as my variables, to pull in the corresponding Team? Especially when I won't have an actual entry for 2/1/2016?
I'd want my final dataset to look like this:
Userid|Team | Month |Media|Calls
User1-|Team1|-01/01/2016|Voice|200
User1-|Team1|-01/01/2016|Email|100
User1-|Team1|-02/01/2016|Voice|250
User1-|Team1|-02/01/2016|Email|120
User1-|Team2|-03/01/2016|Voice|250
User1-|Team2|-03/01/2016|Email|120
Since you're using SQL Server (2012 and newer) you can use the LEAD() function to identify an end date for a given range:
;with cte aS (SELECT 'User1' as UserID, 'Team1' AS CurrentTeam, CAST('2016-01-01' AS DATE) as ChangeDate
UNION SELECT 'User1' as UserID, 'Team2' AS CurrentTeam, CAST('2016-06-01' AS DATE) as ChangeDate
UNION SELECT 'User1' as UserID, 'Team1' AS CurrentTeam, CAST('2016-08-15' AS DATE) as ChangeDate
UNION SELECT 'User2' as UserID, 'Team1' AS CurrentTeam, CAST('2016-02-01' AS DATE) as ChangeDate
UNION SELECT 'User2' as UserID, 'Team2' AS CurrentTeam, CAST('2016-07-01' AS DATE) as ChangeDate
)
SELECT *,COALESCE(LEAD(ChangeDate,1) OVER(PARTITION BY UserID ORDER BY ChangeDate),CAST(GETDATE() AS DATE)) as End_Dt
FROM cte
Returns:
UserID CurrentTeam ChangeDate End_Dt
User1 Team1 2016-01-01 2016-06-01
User1 Team2 2016-06-01 2016-08-15
User1 Team1 2016-08-15 2017-01-05
User2 Team1 2016-02-01 2016-07-01
User2 Team2 2016-07-01 2017-01-05
You could then join those ranges to a calendar table to get the individual months as well as calculate which team they spent more days in for a given month.
The LEAD() function returns the next row's value for a given field. PARTITION BY restarts the "next row" lookup for each group (in this case you want the value per UserID), and ORDER BY specifies what the next row should be (in this case from one ChangeDate to the next).
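To connect this back to the monthly data in the question, the ranges can be joined directly with a half-open interval (start inclusive, end exclusive), which avoids the duplicates that a plain <= join produces. This is a sketch under assumptions: MonthlyCalls is a made-up name for the rolled-up data set, and the open-ended last range is left as NULL rather than filled with today's date.

```sql
-- Assumes TeamHistory(UserID, CurrentTeam, ChangeDate) as in the question and a
-- hypothetical MonthlyCalls(UserID, [Month], Media, Calls) holding the monthly roll-up.
WITH TeamRanges AS (
    SELECT UserID,
           CurrentTeam,
           ChangeDate AS Start_Dt,
           LEAD(ChangeDate) OVER (PARTITION BY UserID ORDER BY ChangeDate) AS End_Dt
    FROM TeamHistory
)
SELECT m.UserID,
       r.CurrentTeam AS Team,
       m.[Month],
       m.Media,
       m.Calls
FROM MonthlyCalls AS m
LEFT JOIN TeamRanges AS r
       ON r.UserID = m.UserID
      AND m.[Month] >= r.Start_Dt                      -- on or after the change
      AND (r.End_Dt IS NULL OR m.[Month] < r.End_Dt);  -- strictly before the next change
```

With the two TeamHistory rows from the question (1/1/2016 and 3/1/2016), January and February fall in the Team1 range and March in the Team2 range, so each monthly row matches exactly one team.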
You might try this:
--A simple person table
DECLARE @pers TABLE(Person VARCHAR(100));
INSERT INTO @pers VALUES('Bob'),('Tim');
--a table reflecting your work-data
--note: Tim changes to team Red in July and, still in July, back to Blue
DECLARE @Team TABLE(Person VARCHAR(100),Team VARCHAR(100),ChangeDate DATE);
INSERT INTO @Team VALUES
('Bob','Red' ,{d'2016-04-01'})
,('Tim','Blue',{d'2016-04-13'})
,('Tim','Red' ,{d'2016-07-22'})
,('Bob','Blue',{d'2016-06-15'})
,('Tim','Blue',{d'2016-07-28'})
,('Bob','Red' ,{d'2016-10-15'})
,('Tim','Red' ,{d'2016-12-28'})
;
--A CTE to mock-up a numbers/tally/date-table
WITH FirstOfMonthDays(d) AS
(
SELECT {d'2016-01-01'}
UNION ALL SELECT {d'2016-02-01'}
UNION ALL SELECT {d'2016-03-01'}
UNION ALL SELECT {d'2016-04-01'}
UNION ALL SELECT {d'2016-05-01'}
UNION ALL SELECT {d'2016-06-01'}
UNION ALL SELECT {d'2016-07-01'}
UNION ALL SELECT {d'2016-08-01'}
UNION ALL SELECT {d'2016-09-01'}
UNION ALL SELECT {d'2016-10-01'}
UNION ALL SELECT {d'2016-11-01'}
UNION ALL SELECT {d'2016-12-01'}
)
--I use CONVERT(VARCHAR(6),ChangeDate,112) to get a string of YYYYMM
,Numbered AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY Person, CONVERT(VARCHAR(6),ChangeDate,112) ORDER BY ChangeDate DESC) AS Nr
,t.*
FROM @Team AS t
)
--Pick out the one with Nr=1, these are the last changes per month
,LastChangeInMonth AS
(
SELECT *
FROM Numbered
WHERE Nr=1
)
--The actual query
SELECT fom.d
,p.Person
,(
SELECT TOP 1 t.Team
FROM LastChangeInMonth AS t
WHERE t.Person=p.Person
AND CONVERT(VARCHAR(6),t.ChangeDate,112)<=CONVERT(VARCHAR(6),fom.d,112)
ORDER BY t.ChangeDate DESC
) AS fittingTeam
FROM FirstOfMonthDays AS fom
CROSS JOIN @pers AS p
ORDER BY p.Person,fom.d
Since you are using SQL Server 2014 (please tag your questions correctly!) this would be a bit easier with LEAD()/LAG(), but the idea is the same...
The result
2016-01-01 Bob NULL
2016-02-01 Bob NULL
2016-03-01 Bob NULL
2016-04-01 Bob Red
2016-05-01 Bob Red
2016-06-01 Bob Blue
2016-07-01 Bob Blue
2016-08-01 Bob Blue
2016-09-01 Bob Blue
2016-10-01 Bob Red
2016-11-01 Bob Red
2016-12-01 Bob Red
2016-01-01 Tim NULL
2016-02-01 Tim NULL
2016-03-01 Tim NULL
2016-04-01 Tim Blue
2016-05-01 Tim Blue
2016-06-01 Tim Blue
2016-07-01 Tim Blue
2016-08-01 Tim Blue
2016-09-01 Tim Blue
2016-10-01 Tim Blue
2016-11-01 Tim Blue
2016-12-01 Tim Red

Counting number of Days in the where clause

I have a SQL Server table that looks like this:
ID | Club Name | Booking Date | Submission Date
---+-------------+-------------------------+-------------------------
1 | Basketball | 2015-10-21 00:00:00.000 | 9/18/2015 3:23:42 PM
2 | Tennis | 2015-10-14 00:00:00.000 | 9/28/2015 1:50:25 PM
3 | Basketball | 2015-10-06 00:00:00.000 | 9/29/2015 11:08:20 AM
1 | Other | 2015-10-21 00:00:00.000 | 9/29/2015 11:08:39 AM
I want to know how many times each club made a submission less than 15 days before the booking date.
The solution I came up with was adding a new column, running the DATEDIFF function and storing the value in the new column, then grouping by club name and adding a condition for > 15 on the new column.
The question I have is: can this be done on the fly without having to create the new column? How much would that affect performance if it's done on the fly?
Yes, this can be done inline, in a query. In a database, you almost never want to store a calculated column, which is what that datediff column would be. Instead, you can do the math in the WHERE clause.
SELECT
*
FROM
myTable
WHERE
SubmissionDate > DATEADD(day, -15, BookingDate)
I wrote that pretty quickly, so double-check that the date math goes in the direction you need (this keeps submissions made less than 15 days before the booking date); playing with the above query should set you on the right path. Just keep in mind that, if this table gets very big, you're going to be doing a TON of date calculations and that can have a performance impact.
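To get the count per club that the question asks for, the same kind of filter can be combined with GROUP BY. A minimal sketch, assuming the columns are named ClubName, BookingDate and SubmissionDate and are stored as date/time types rather than strings:

```sql
-- Counts, per club, the submissions made less than 15 days before the booking date.
-- Column and table names are assumed; adjust them to the real schema.
SELECT ClubName,
       COUNT(*) AS LateSubmissions
FROM myTable
WHERE SubmissionDate > DATEADD(day, -15, BookingDate)
GROUP BY ClubName;
```

Clubs with no such submissions do not appear in the result at all. If they should show up with a zero, a conditional SUM(CASE ...) over all rows is the usual alternative.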
Something like this?
Declare @Table table (Id int,Club_Name varchar(50),Booking_Date datetime,Submission_Date datetime)
Insert @Table values
(1,'Basketball','2015-10-21 00:00:00.000','9/18/2015 3:23:42 PM'),
(2,'Tennis ','2015-10-14 00:00:00.000','9/28/2015 1:50:25 PM'),
(3,'Basketball','2015-10-06 00:00:00.000','9/29/2015 11:08:20 AM'),
(1,'Other ','2015-10-21 00:00:00.000','9/29/2015 11:08:39 AM')
Select Club_Name
,Submissions= count(*)
,Early = sum(case when datediff(DD,Submission_Date,Booking_Date)<15 then 1 else 0 end)
From @Table
Group By Club_Name
Returns
Club_Name Submissions Early
Basketball 2 1
Other 1 0
Tennis 1 0
Try this.
SELECT ID,
ClubName,
SUM(Value) AS Ttle
FROM
(
SELECT ID,
ClubName,
COUNT(*) AS Value
FROM TableName
-- filter rows before grouping: submitted less than 15 days before the booking
WHERE DATEDIFF(D, SubmissionDate, BookingDate) < 15
GROUP BY ID,
ClubName
) Data
GROUP BY ID,
ClubName
ORDER BY Ttle DESC

Renumbering rows in SQL Server

I'm fairly new to SQL Server and I have the following question: is it possible to renumber the rows in a column?
For example:
id date name
1 2016-01-02 John
2 2016-01-02 Jack
3 2016-01-02 John
4 2016-01-02 John
5 2016-01-03 Jack
6 2016-01-03 Jack
7 2016-01-04 John
8 2016-01-03 Jack
9 2016-01-02 John
10 2016-01-04 Jack
I would like all the "Johns" to start with id 1 and go on (2, 3, 4 etc.), and all the "Jacks" to get the following numbers once "John" is done (5, 6, 7 etc.). Thanks!
I hope this helps..
declare @t table (id int ,[date] date,name varchar(20))
insert into @t
( id, date, name )
values (1,'2016-01-02','John')
,(2,'2016-01-02','Jack')
,(3,'2016-01-02','John')
,(4,'2016-01-02','John')
,(5,'2016-01-03','Jack')
,(6,'2016-01-03','Jack')
,(7,'2016-01-04','John')
,(8,'2016-01-03','Jack')
,(9,'2016-01-02','John')
,(10,'2016-01-04','Jack')
select
row_number() over(order by name desc,[date]) as ID, -- name desc puts John before Jack
date ,
name
from
@t
order by ID
The id should just be an internal identifier you use for joins etc - I wouldn't change it. But you could query such a numbering using a window function:
SELECT ROW_NUMBER() OVER (ORDER BY CASE name WHEN 'John' THEN 1 ELSE 2 END) AS rn,
date,
name
FROM mytable
Instead of renumbering the id column, you can use the ROW_NUMBER window function to number the rows in a query. For example, this numbers the rows within each name (restarting at 1 for every name), ordered by date:
SELECT ROW_NUMBER() OVER(PARTITION BY name ORDER BY date) as rowid,date,name
FROM tablename
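If the stored id values really do have to change, one possible approach is to update through a CTE that carries the new numbering. This is a sketch under assumptions: the table is called mytable, id is a plain INT column (not IDENTITY), and nothing else references the old id values.

```sql
-- Assumes id is a plain INT column (not IDENTITY) and is not referenced by foreign keys.
WITH numbered AS (
    SELECT id,
           -- Johns first, then everyone else; date and old id keep the order stable
           ROW_NUMBER() OVER (ORDER BY CASE name WHEN 'John' THEN 1 ELSE 2 END,
                                       [date], id) AS rn
    FROM mytable
)
UPDATE numbered
SET id = rn;
```

Running the SELECT inside the CTE on its own first shows the new numbering before anything is overwritten.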

Total number of days for a task before going on to the next one, grouped by person

I am trying to figure out how to show how many days have been worked on a certain task by using the dates in between each “task login” for each person. I think this can be done with one query? I'm open to suggestions and/or ideas.
The Table:
--------+-----------+----------
Person | TaskLogin | Date
--------+-----------+----------
Jane | A | 2013-01-01
Jane | B | 2013-01-03
Jane | A | 2013-01-06
Jane | B | 2013-01-10
Bob | A | 2013-01-01
Bob | A | 2013-01-06
---------------------------------------------------------------------
Row 1: Jane starts task A starting 2013-01-01 and works on it until starting Task B on 2013-01-03 = 2 days worked on Task A
Row 2: Jane starts on task B starting 2013-01-03 and works on it until starting task A on 2013-01-06 = 3 days worked on Task B
Row 3: Jane starts on task A starting 2013-01-06 and works on it until starting task B on 2013-01-10 = 4 days worked on Task A
Row 4: Skip because that is the highest date for Jane (Jane may or may not finish task B 2013-01-10 but we will not count it)
Row 5: Bob starts task A starting on 2013-01-01 and works on it until continuing to work on task A by logging it again on 2013-01-06 = 5 days worked on task A
Row 6: Skip because that is the highest date for Bob
A = 11 days because 2 + 4 + 5
B = 3 days because of Row 2
The output:
------+---------------------
Tasks | Time between Tasks
------+---------------------
A | 11 days
B | 3 days
EDIT:
The solutions of Nicarus and Gordon Linoff (the first, pre-2012 solution specifically, with my edits in the comments) work. Note that (select distinct * from table t) t can be substituted for table in Gordon Linoff's solution to handle the case of someone logging in twice on the same day.
What you are looking for is the lead() function. This is only available in SQL Server 2012 and later. Before that, the easiest way is a correlated subquery:
select TaskLogin, sum(datediff(day, date, nextdate)) as days
from (select t.*,
(select top 1 t2.date
from table t2
where t2.person = t.person and t2.date > t.date
order by t2.date
) as nextdate
from table t
) t
where nextdate is not null
group by TaskLogin;
In SQL Server 2012, it would be:
select TaskLogin, sum(datediff(day, date, nextdate)) as days
from (select t.*, lead(date) over (partition by person order by date) as nextdate
from table t
) t
where nextdate is not null
group by TaskLogin;
Maybe not the most elegant way, but it certainly works:
-- Setup table/insert values --
IF OBJECT_ID('TempDB.dbo.#TaskAccounting') IS NOT NULL BEGIN
DROP TABLE #TaskAccounting
END
CREATE TABLE #TaskAccounting
(
Person VARCHAR(4) NOT NULL,
TaskLogin CHAR(1) NOT NULL,
TaskDate DATETIME NOT NULL
)
INSERT INTO #TaskAccounting
VALUES ('Jane','A','2013-01-01')
INSERT INTO #TaskAccounting
VALUES ('Jane','B','2013-01-03')
INSERT INTO #TaskAccounting
VALUES ('Jane','A','2013-01-06')
INSERT INTO #TaskAccounting
VALUES ('Jane','B','2013-01-10')
INSERT INTO #TaskAccounting
VALUES ('Bob','A','2013-01-01')
INSERT INTO #TaskAccounting
VALUES ('Bob','A','2013-01-06');
-- Use a CTE to add sequence and join on it --
WITH Tasks AS (
SELECT
Person,
TaskLogin,
TaskDate,
ROW_NUMBER() OVER(PARTITION BY Person ORDER BY TaskDate) AS Sequence
FROM
#TaskAccounting
)
SELECT
a.TaskLogin AS Tasks,
CAST(SUM(DATEDIFF(DD,a.TaskDate,b.TaskDate)) AS VARCHAR) + ' days' AS TimeBetweenTasks
FROM
Tasks a
JOIN
Tasks b
ON (a.Person = b.Person)
AND (a.Sequence = b.Sequence - 1)
GROUP BY
a.TaskLogin