I have a table that holds information on races that have taken place: the participants, where each one finished, and their finishing time. I would like to add a time-difference column that shows how far behind the winner each participant finished.
Race ID  Finish place  Time      Name
1        1             00:00:10  Matt
1        2             00:00:11  Mick
1        3             00:00:17  Shaun
2        1             00:00:13  Claire
2        2             00:00:15  Helen
What I would like to see:
Race ID  Finish place  Time      Time Dif  Name
1        1             00:00:10            Matt
1        2             00:00:11  00:00:01  Mick
1        3             00:00:17  00:00:07  Shaun
2        1             00:00:13            Claire
2        2             00:00:15  00:00:02  Helen
I have seen similar questions asked but I was unable to relate it to my problem.
My initial idea was to have a number of derived tables, each filtered by finish place, but there could be more than 10 racers, so things would start to get messy. I'm using SQL Server Management Studio 2012.
You can use min() as a window function:
select t.*,
(case when time <> min_time then time - min_time
end) as diff
from (select t.*, min(t.time) over (partition by t.race_id) as min_time
from t
) t
I would be more inclined to express this as seconds:
(case when time <> min_time then datediff(second, min_time, time)
end) as diff
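To see the partitioned minimum at work on the sample rows, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for SQL Server, a hypothetical table name races, and strftime('%s', ...) in place of DATEDIFF; none of these come from the question.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE races (race_id INTEGER, finish_place INTEGER, "
    "finish_time TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO races VALUES (?, ?, ?, ?)",
    [
        (1, 1, "00:00:10", "Matt"),
        (1, 2, "00:00:11", "Mick"),
        (1, 3, "00:00:17", "Shaun"),
        (2, 1, "00:00:13", "Claire"),
        (2, 2, "00:00:15", "Helen"),
    ],
)

# strftime('%s', ...) converts HH:MM:SS to epoch seconds; the subtraction
# cancels the arbitrary base date. NULLIF blanks the winner's own row.
gap_query = """
SELECT race_id, finish_place, name,
       NULLIF(secs - MIN(secs) OVER (PARTITION BY race_id), 0) AS gap_seconds
FROM (SELECT *, CAST(strftime('%s', finish_time) AS INTEGER) AS secs
      FROM races)
ORDER BY race_id, finish_place
"""
gaps = conn.execute(gap_query).fetchall()
for row in gaps:
    print(row)
```

The winner of each race gets NULL (None in Python) and everyone else gets the gap in seconds, which matches the Time Dif column requested in the question.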
Using http://www.convertcsv.com/csv-to-sql.htm to build example data:
DROP TABLE IF EXISTS mytable
CREATE TABLE mytable(
Race_ID INTEGER
,Finish_place INTEGER
,Time VARCHAR(30)
,Name VARCHAR(30)
);
INSERT INTO mytable(Race_ID,Finish_place,Time,Name) VALUES (1, 1,'00:00:10','Matt');
INSERT INTO mytable(Race_ID,Finish_place,Time,Name) VALUES (1, 2,'00:00:11','Mick');
INSERT INTO mytable(Race_ID,Finish_place,Time,Name) VALUES (1, 3,'00:00:17','Shaun');
INSERT INTO mytable(Race_ID,Finish_place,Time,Name) VALUES (2, 1,'00:00:13','Claire');
INSERT INTO mytable(Race_ID,Finish_place,Time,Name) VALUES (2, 2,'00:00:15','Helen');
A CTE containing only the first-place finishers would be easier to understand.
WITH CTE_FIRST
AS (
SELECT
M.Race_ID
,M.Finish_place
,M.Time
,M.Name
FROM mytable M
WHERE M.Finish_place = 1
)
SELECT
M.Race_ID
,M.Finish_place
,M.Time
,CASE
WHEN m.Finish_place = 1
THEN NULL
ELSE CONVERT(VARCHAR, DATEADD(ss, DATEDIFF(SECOND, c.Time, M.Time), 0), 108)
END AS [Time Dif]
,M.Name
FROM mytable M
INNER JOIN CTE_FIRST c
ON M.Race_ID = c.Race_ID
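You can use window functions. MIN([time]) OVER (PARTITION BY race_id ORDER BY finish_place) gives the first row's time value within the same race. Strictly speaking, with ORDER BY it is a running minimum, which equals the winner's time as long as times increase with finish place. DATEDIFF(SECOND, MIN([time]) OVER (PARTITION BY race_id ORDER BY finish_place), [time]) gives the difference in seconds.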
I have a relatively simple query and now have to make what appears to be simple change to that query. It's not going very well though.
The query:
select filekey, eventdate, hourstype, hours from hourshist
yields the following results:
filekey eventdate hourstype hours
1 6/1/2018 1 9
1 6/1/2018 2 3
1 6/2/2018 1 8
That was fine until a change was requested: if hourstype 1 and 2 occur on the same day, hourstype remains 1 and the hours of both rows are summed. Any other hourstype (3, 4, 5, etc.) is not summed and shows as its own row. Days that have either hourstype 1 or 2, but not both, should not change at all. It should yield the following results:
filekey eventdate hourstype hours
1 6/1/2018 1 12
1 6/2/2018 1 8
Thanks for your help.
To avoid over-complicating your aggregation by using case statements I would use two queries and a union to keep it fairly simple. The following produces your desired results in my environment.
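-- MIN(hourstype) collapses a 1+2 day to type 1 but leaves a lone type 2 day as 2
SELECT filekey, eventdate, MIN(hourstype) AS hourstype, SUM(hours) AS hours FROM hourshist
WHERE hourstype IN (1,2)
GROUP BY filekey, eventdate
-- UNION ALL keeps identical rows of other types instead of de-duplicating them
UNION ALL
SELECT filekey, eventdate, hourstype, hours FROM hourshist
WHERE hourstype NOT IN (1,2);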
I used the following to create the test environment you described:
CREATE TABLE hourshist
(
filekey INT,
eventdate DATE,
hourstype INT,
hours INT
);
INSERT INTO hourshist
VALUES (1,'6/1/2018',1,9)
,(1,'6/1/2018',2,3)
,(1,'6/2/2018',1,8);
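The two-queries-plus-union idea can be checked against the edge cases named in the question. The sketch below is only an illustration: it runs in an in-memory SQLite database rather than SQL Server, uses MIN(hourstype) so that a day with only type 2 keeps its type (my assumption about how to meet the "should not change at all" requirement), and adds two hypothetical rows for 2018-06-03 that are not in the original data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hourshist (filekey INTEGER, eventdate TEXT, "
    "hourstype INTEGER, hours INTEGER)"
)
conn.executemany(
    "INSERT INTO hourshist VALUES (?, ?, ?, ?)",
    [
        (1, "2018-06-01", 1, 9),
        (1, "2018-06-01", 2, 3),
        (1, "2018-06-02", 1, 8),
        (1, "2018-06-03", 2, 4),  # hypothetical: type 2 alone on a day
        (1, "2018-06-03", 3, 5),  # hypothetical: a type that is never merged
    ],
)

merge_query = """
SELECT filekey, eventdate, MIN(hourstype) AS hourstype, SUM(hours) AS hours
FROM hourshist
WHERE hourstype IN (1, 2)
GROUP BY filekey, eventdate
UNION ALL
SELECT filekey, eventdate, hourstype, hours
FROM hourshist
WHERE hourstype NOT IN (1, 2)
ORDER BY eventdate, hourstype
"""
merged = conn.execute(merge_query).fetchall()
for row in merged:
    print(row)
```

Types 1 and 2 on 2018-06-01 collapse into a single type 1 row with 12 hours, while the lone type 2 row and the type 3 row pass through unchanged.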
I feel like the task is straightforward, but I am having a hard time getting it to do what I want.
Here is a table in my database:
ID |Empl_Acc_ID |CheckIn |CheckOut |WeekDay
----------------------------------------------------------------------------
1 | 1 | 2017-09-24 08:03:02.143 | 2017-09-24 12:00:00.180 | Sun
2 | 1 | 2017-09-24 13:02:23.457 | 2017-09-24 17:01:02.640 | Sun
3 | 2 | 2017-09-24 08:05:23.457 | 2017-09-24 13:01:02.640 | Mon
4 | 2 | 2017-09-24 14:05:23.457 | 2017-09-24 17:00:02.640 | Mon
5 | 3 | 2017-09-24 07:05:23.457 | 2017-09-24 11:30:02.640 | Tue
6 | 3 | 2017-09-24 12:31:23.457 | 2017-09-24 16:01:02.640 | Tue
and so on....
I want to group Empl_Acc_ID by the same date and sum up the total hours each employee worked that day. Each employee could have either one or more records per day depending on how many breaks he/she took that day.
For example if Empl_Acc_ID (2) worked 3 different days with one break, the table will contain 6 records for that person but in my query I want to see 3 records with the total hours they worked each day.
Here is how I constructed the query:
select distinct w.Empl_Acc_ID, ws.fullWorkDayHours
from Work_Schedule as w
INNER JOIN (
SELECT Empl_Acc_ID, fullWorkDayHours = Sum(DATEDIFF(hour, w.CheckIn, w.CheckOut))
from Work_Schedule w
GROUP BY Empl_Acc_ID
) ws on w.Empl_Acc_ID = ws.Empl_Acc_ID
This query does not quite get me what I need. It only returns the sum of hours per employee across all the days they worked. Also, this query has only 2 columns, but I want to see more. When I tried adding more columns, the records were no longer distinct by Empl_Acc_ID.
What is wrong with the query?
Thank you
You do not need to self-join the table in this case; just group by the datetime column cast to a date.
create table Work_Schedule (
ID TINYINT,
Empl_Acc_ID TINYINT,
CheckIn DATETIME,
CheckOut DATETIME,
WeekDay CHAR(3)
);
INSERT INTO Work_Schedule VALUES (1, 1,'2017-09-24 08:03:02.143','2017-09-24 12:00:00.180','Sun');
INSERT INTO Work_Schedule VALUES (2, 1,'2017-09-24 13:02:23.457','2017-09-24 17:01:02.640','Sun');
INSERT INTO Work_Schedule VALUES (3, 2,'2017-09-24 08:05:23.457','2017-09-24 13:01:02.640','Mon');
INSERT INTO Work_Schedule VALUES (4, 2,'2017-09-24 14:05:23.457','2017-09-24 17:00:02.640','Mon');
INSERT INTO Work_Schedule VALUES (5, 3,'2017-09-24 07:05:23.457','2017-09-24 11:30:02.640','Tue');
INSERT INTO Work_Schedule VALUES (6, 3,'2017-09-24 12:31:23.457','2017-09-24 16:01:02.640','Tue');
SELECT w.Empl_Acc_ID,
CAST(CheckIn AS DATE) [date],
SUM(DATEDIFF(hour, w.CheckIn, w.CheckOut)) fullWorkDayHours
FROM Work_Schedule w
GROUP BY w.Empl_Acc_ID, CAST(CheckIn AS DATE)
DROP TABLE Work_Schedule;
Empl_Acc_ID date fullWorkDayHours
1 2017-09-24 8
2 2017-09-24 8
3 2017-09-24 8
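One thing worth knowing about the totals above: DATEDIFF(hour, ...) counts the hour boundaries crossed, not elapsed time, so each shift is effectively rounded. The following Python sketch mimics that behaviour for employee 1's two shifts and compares it with the exact duration; datediff_hour is a hypothetical helper written for this illustration, not a library function.

```python
from datetime import datetime


def datediff_hour(start, end):
    """Mimic DATEDIFF(hour, start, end): count hour boundaries crossed."""
    def floor_hour(d):
        return d.replace(minute=0, second=0, microsecond=0)
    return int((floor_hour(end) - floor_hour(start)).total_seconds() // 3600)


fmt = "%Y-%m-%d %H:%M:%S.%f"
# Employee 1's two shifts from the sample data.
shifts = [
    ("2017-09-24 08:03:02.143", "2017-09-24 12:00:00.180"),
    ("2017-09-24 13:02:23.457", "2017-09-24 17:01:02.640"),
]
parsed = [(datetime.strptime(a, fmt), datetime.strptime(b, fmt)) for a, b in shifts]

# Boundary-based total, as the SQL query computes it.
boundary_hours = sum(datediff_hour(a, b) for a, b in parsed)
# Exact elapsed time, converted from seconds to hours.
exact_hours = sum((b - a).total_seconds() for a, b in parsed) / 3600

print(boundary_hours, round(exact_hours, 2))
```

If fractional hours matter, one option in SQL is to sum DATEDIFF(minute, CheckIn, CheckOut) and divide by 60.0 instead.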
Try this. You just have to group by date and employee account.
select Employee.Empl_Acc_ID, FirstName, LastName, Username,
convert(varchar(10), checkin, 101) as checkin,
convert(varchar(10), checkout, 101) as checkout,
sum(datediff(hour, checkin, checkout)) as hours
from Employee
inner join Employee_Account on Employee.Empl_Acc_ID = Employee_Account.Empl_Acc_ID
inner join Work_Schedule on Employee_Account.Empl_Acc_ID = Work_Schedule.Empl_Acc_ID
group by convert(varchar(10), checkin, 101), convert(varchar(10), checkout, 101),
Employee.Empl_Acc_ID, FirstName, LastName, Username
order by Employee.Empl_Acc_ID
You do not group by date, that's the issue:
SELECT DISTINCT w.Empl_Acc_ID, ws.fullWorkDayHours, ws.CheckInDate
FROM Work_Schedule as w
INNER JOIN (
SELECT Empl_Acc_ID, CAST(w.CheckIn AS DATE) AS [CheckInDate],
fullWorkDayHours = Sum(DATEDIFF(hour, w.CheckIn, w.CheckOut))
from Work_Schedule w
GROUP BY Empl_Acc_ID, CAST(w.CheckIn AS DATE)
) ws on w.Empl_Acc_ID = ws.Empl_Acc_ID
There is no need for a self-join; it works fine without one:
Select distinct Empl_Acc_ID, Sum(DATEDIFF(hour,CheckIN,CheckOut)) As FullDayWorkHours
from EMP2
where DATEPART(day,CheckIn)=DATEPART(day,CheckOut)
Group By Empl_Acc_ID
I've been playing with window functions in SQL Server 2012 and can't get this to work. I'm hoping to avoid a cursor and going row by row. My problem is that I need to add a group number to each record. The tricky part is that the group number has to increment each time a column value changes, even if it changes back to a value that appeared earlier in the sequence of records.
Here's an example of the data and my desired outcome:
if object_id('tempdb..#data') is not null
drop table #data
create table #data
(
id int identity(1,1)
,mytime datetime
,distance int
,direction varchar(20)
)
insert into #data (mytime, distance, direction)
values
('2016-01-01 08:00',10,'North')
,('2016-01-01 08:30',18,'North')
,('2016-01-01 09:00',15,'North')
,('2016-01-01 09:30',12,'South')
,('2016-01-01 10:00',16,'South')
,('2016-01-01 10:30',45,'North')
,('2016-01-01 11:00',23,'North')
,('2016-01-01 11:30',14,'South')
,('2016-01-01 12:00',40,'South')
Desired outcome:
mytime Distance Direction GroupNumber
--------------------------------------------------------
2016-01-01 8:00 10 North 1
2016-01-01 8:30 18 North 1
2016-01-01 9:00 15 North 1
2016-01-01 9:30 12 South 2
2016-01-01 10:00 16 South 2
2016-01-01 10:30 45 North 3
2016-01-01 11:00 23 North 3
2016-01-01 11:30 14 South 4
2016-01-01 12:00 40 South 4
Is this possible using window functions?
One way would be
WITH T
AS (SELECT *,
CASE
WHEN LAG(direction)
OVER (ORDER BY ID) = direction THEN 0
ELSE 1
END AS Flag
FROM #data)
SELECT mytime,
Distance,
Direction,
SUM(Flag) OVER (ORDER BY id) AS GroupNumber
FROM T
The above assumes Direction doesn't contain any NULLs. It would need a minor adjustment if NULLs are possible, and you would also need to define whether two consecutive NULLs should be treated as equal. Assuming they should, the variant below would work:
WITH T
AS (SELECT *,
prev = LAG(direction) OVER (ORDER BY ID),
rn = ROW_NUMBER() OVER (ORDER BY ID)
FROM #data)
SELECT mytime,
Distance,
Direction,
SUM(CASE WHEN rn > 1 AND EXISTS(SELECT prev
INTERSECT
SELECT Direction) THEN 0 ELSE 1 END) OVER (ORDER BY id) AS GroupNumber
FROM T
ORDER BY ID
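The flag-then-running-sum pattern can be tried outside SQL Server as well. This is a minimal sketch using an in-memory SQLite database (an assumption for the sake of a runnable example) with only the id and direction columns from the sample data; the table name data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY assigns 1, 2, 3, ... in insert order, like the identity column.
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, direction TEXT)")
directions = ["North", "North", "North", "South", "South",
              "North", "North", "South", "South"]
conn.executemany("INSERT INTO data (direction) VALUES (?)",
                 [(d,) for d in directions])

# Step 1: flag = 1 whenever the direction differs from the previous row
# (the first row has no previous row, so the comparison is NULL and falls to ELSE).
# Step 2: a running SUM of the flags yields the group number.
query = """
WITH t AS (
    SELECT id, direction,
           CASE WHEN LAG(direction) OVER (ORDER BY id) = direction
                THEN 0 ELSE 1 END AS flag
    FROM data
)
SELECT direction, SUM(flag) OVER (ORDER BY id) AS group_number
FROM t
ORDER BY id
"""
group_numbers = [g for _, g in conn.execute(query)]
print(group_numbers)
```

The second run of North rows gets group 3 rather than rejoining group 1, which is the behaviour the question asks for.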
I have a SQL Server table that looks like this:
ID | Club Name | Booking Date | Submission Date
---+-------------+-------------------------+-------------------------
1 | Basketball | 2015-10-21 00:00:00.000 | 9/18/2015 3:23:42 PM
2 | Tennis | 2015-10-14 00:00:00.000 | 9/28/2015 1:50:25 PM
3 | Basketball | 2015-10-06 00:00:00.000 | 9/29/2015 11:08:20 AM
1 | Other | 2015-10-21 00:00:00.000 | 9/29/2015 11:08:39 AM
I want to know how many times each club did a submission less than 15 days from the booking date..
The solution I came up with was adding a new column, running the DATEDIFF function, and storing the value in the new column. Then I would just group by club name and add a parameter for > 15 on the new column.
The question I have is: can this be done on the fly without having to create the new column? How much would that affect performance if it's done on the fly?
Yes, this can be done inline, in a query. In a database, you almost never want to store a calculated column, which is what that datediff column would be. Instead, you can do the math in the WHERE clause.
SELECT
*
FROM
myTable
WHERE
SubmissionDate > DATEADD(day, -15, BookingDate)
I wrote that pretty quickly, so the date math might be going in the wrong direction (checking in the future instead of in the past), but playing with the above query should set you on the right path. Just keep in mind that, if this table gets very big, you're going to be doing a lot of date arithmetic, and that can have a performance impact.
Something like this?
Declare @Table table (Id int,Club_Name varchar(50),Booking_Date datetime,Sumbission_Date datetime)
Insert @Table values
(1,'Basketball','2015-10-21 00:00:00.000','9/18/2015 3:23:42 PM'),
(2,'Tennis ','2015-10-14 00:00:00.000','9/28/2015 1:50:25 PM'),
(3,'Basketball','2015-10-06 00:00:00.000','9/29/2015 11:08:20 AM'),
(1,'Other ','2015-10-21 00:00:00.000','9/29/2015 11:08:39 AM')
Select Club_Name
,Submissions= count(*)
,Early = sum(case when datediff(DD,Sumbission_Date,Booking_Date)<15 then 1 else 0 end)
From @Table
Group By Club_Name
Returns
Club_Name Submissions Early
Basketball 2 1
Other 1 0
Tennis 1 0
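The conditional-aggregation approach can be reproduced with a small runnable sketch. It assumes an in-memory SQLite database, a hypothetical table name bookings, ISO-formatted date strings, and julianday(date(...)) as a rough substitute for DATEDIFF(DD, ...); these are stand-ins, not the original SQL Server setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings (id INTEGER, club_name TEXT, "
    "booking_date TEXT, submission_date TEXT)"
)
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?, ?)",
    [
        (1, "Basketball", "2015-10-21", "2015-09-18 15:23:42"),
        (2, "Tennis", "2015-10-14", "2015-09-28 13:50:25"),
        (3, "Basketball", "2015-10-06", "2015-09-29 11:08:20"),
        (1, "Other", "2015-10-21", "2015-09-29 11:08:39"),
    ],
)

# date(...) strips the time part so the difference is in whole calendar days.
# The CASE inside SUM counts only rows submitted under 15 days before booking.
query = """
SELECT club_name,
       COUNT(*) AS submissions,
       SUM(CASE WHEN julianday(date(booking_date))
                   - julianday(date(submission_date)) < 15
                THEN 1 ELSE 0 END) AS early
FROM bookings
GROUP BY club_name
ORDER BY club_name
"""
summary = conn.execute(query).fetchall()
for row in summary:
    print(row)
```

No extra column is stored; the day difference is computed per row while the query runs.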
Try this.
SELECT ID,
ClubName,
COUNT(*) AS Ttle
FROM TableName
-- filter individual rows in WHERE; HAVING is for conditions on aggregates
WHERE DATEDIFF(D, SubmissionDate, BookingDate) < 15
GROUP BY ID,
ClubName
ORDER BY Ttle DESC
Thanks to Mike for the suggestion to add the create/insert statements.
create table test (
pid integer not null,
date date not null,
primary key (pid, date)
);
insert into test values
(1,'2014-10-1')
, (1,'2014-10-2')
, (1,'2014-10-3')
, (1,'2014-10-5')
, (1,'2014-10-7')
, (2,'2014-10-1')
, (2,'2014-10-2')
, (2,'2014-10-3')
, (2,'2014-10-5')
, (2,'2014-10-7');
I want to add a new column that is 'days in current streak', so the result would look like:
pid | date | in_streak
-------|-----------|----------
1 | 2014-10-1 | 1
1 | 2014-10-2 | 2
1 | 2014-10-3 | 3
1 | 2014-10-5 | 1
1 | 2014-10-7 | 1
2 | 2014-10-1 | 1
2 | 2014-10-2 | 2
2 | 2014-10-3 | 3
2 | 2014-10-5 | 1
2 | 2014-10-7 | 1
I've been trying to use the answers from
PostgreSQL: find number of consecutive days up until now
Return rows of the latest 'streak' of data
but I can't work out how to use the dense_rank() trick with other window functions to get the right result.
Building on this table (not using the SQL keyword "date" as column name.):
CREATE TABLE tbl(
pid int
, the_date date
, PRIMARY KEY (pid, the_date)
);
Query:
SELECT pid, the_date
, row_number() OVER (PARTITION BY pid, grp ORDER BY the_date) AS in_streak
FROM (
SELECT *
, the_date - '2000-01-01'::date
- row_number() OVER (PARTITION BY pid ORDER BY the_date) AS grp
FROM tbl
) sub
ORDER BY pid, the_date;
Subtracting a date from another date yields an integer. Since you are looking for consecutive days, each next row's value is greater by one. If we subtract row_number() from that, the whole streak ends up in the same group (grp) per pid. Then it's simple to number the rows within each group.
grp is calculated with two subtractions, which should be fastest. An equally fast alternative could be:
the_date - row_number() OVER (PARTITION BY pid ORDER BY the_date) * interval '1d' AS grp
One multiplication, one subtraction. String concatenation and casting is more expensive. Test with EXPLAIN ANALYZE.
Don't forget to partition by pid additionally in both steps, or you'll inadvertently mix groups that should be separated.
This uses a subquery, since that is typically faster than a CTE. There is nothing here that a plain subquery couldn't do.
And since you mentioned it: dense_rank() is obviously not necessary here. Basic row_number() does the job.
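The date-minus-row_number() trick can be verified on the sample rows with a short runnable sketch. It uses an in-memory SQLite database instead of PostgreSQL (an assumption made so the example is self-contained), with CAST(julianday(...) AS INTEGER) playing the role of the date-to-integer subtraction and zero-padded ISO date strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (pid INTEGER, the_date TEXT, PRIMARY KEY (pid, the_date))"
)
dates = ["2014-10-01", "2014-10-02", "2014-10-03", "2014-10-05", "2014-10-07"]
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(pid, d) for pid in (1, 2) for d in dates],
)

# Inner query: day number minus row_number() is constant within a streak,
# so it serves as a group key (grp) per pid.
# Outer query: number the rows inside each (pid, grp) group.
query = """
SELECT pid, the_date,
       row_number() OVER (PARTITION BY pid, grp ORDER BY the_date) AS in_streak
FROM (
    SELECT pid, the_date,
           CAST(julianday(the_date) AS INTEGER)
           - row_number() OVER (PARTITION BY pid ORDER BY the_date) AS grp
    FROM tbl
) sub
ORDER BY pid, the_date
"""
streaks = conn.execute(query).fetchall()
for row in streaks:
    print(row)
```

Each pid restarts at 1 after a gap in the dates, and the two pids never share a group because pid is part of both PARTITION BY clauses.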
You'll get more attention if you include CREATE TABLE statements and INSERT statements in your question.
create table test (
pid integer not null,
date date not null,
primary key (pid, date)
);
insert into test values
(1,'2014-10-1'), (1,'2014-10-2'), (1,'2014-10-3'), (1,'2014-10-5'),
(1,'2014-10-7'), (2,'2014-10-1'), (2,'2014-10-2'), (2,'2014-10-3'),
(2,'2014-10-5'), (2,'2014-10-7');
The principle is simple. A streak of distinct, consecutive dates minus row_number() is a constant. You can group by the constant, and take the dense_rank() over that result.
with grouped_dates as (
select pid, date,
(date - (row_number() over (partition by pid order by date) || ' days')::interval)::date as grouping_date
from test
)
select * , dense_rank() over (partition by pid, grouping_date order by date) as in_streak
from grouped_dates
order by pid, date
pid date grouping_date in_streak
--
1 2014-10-01 2014-09-30 1
1 2014-10-02 2014-09-30 2
1 2014-10-03 2014-09-30 3
1 2014-10-05 2014-10-01 1
1 2014-10-07 2014-10-02 1
2 2014-10-01 2014-09-30 1
2 2014-10-02 2014-09-30 2
2 2014-10-03 2014-09-30 3
2 2014-10-05 2014-10-01 1
2 2014-10-07 2014-10-02 1