SQL How to group by but with special conditions - sql

I have a SQL Server 2008 table where I have a list of employees with timestamps.
I have a script that groups by employee the dates.
What I need is to group by employee but I have to exclude the timestamps that are in the same day and the difference between them are less than 8 hours.
Here is a table that explains better:
I created a SQL Fiddle with the table and sample data.
http://sqlfiddle.com/#!3/3b956/1
Any clue?

What you really want is lag(), which is in SQL Server 2012+. With lag(), you would do:
select t.*
from (select t.*, lag(date) over (partition by EmployeeId order by date) as prev_date
from t
) t
where not (cast(prev_date as date) = cast(date as date) and
date <= dateadd(hour, 8, prev_date)
) or
prev_date is null;
In SQL Server 2008, you can do something similar with outer apply:
select t.*
from t outer apply
(select top 1 prev.*
from t prev
where prev.Employee_id = t.EmployeeId and
prev.date < t.date and
cast(prev.date as date) = cast(t.date as date)
order by prev.date desc
) prev
where prev.date is null or
t.date > dateadd(hour, 8, prev.date);
You may need an order by to maintain the same ordering.

This should also work by excluding rows for which there exist previuos row with diffrence less than 8 hours:
select p1.employeeid, count(*) as [count]
from punch p1
where not exists(select * from punch p2
where p2.employeeid = p1.employeeid and p2.id < p1.id and
dateadd(hour, 8, p2.date) > p1.date)
group by p1.employeeid

Related

How to extrapolate dates in SQL Server to calculate the daily counts?

This is how the data looks like. It's a long table
I need to calculate the number of people employed by day
How to write SQL Server logic to get this result? I treid to create a DATES table and then join, but this caused an error because the table is too big. Do I need a recursive logic?
For future questions, don't post images of data. Instead, use a service like dbfiddle. I'll anyhow add a sketch for an answer, with a better-prepared question you could have gotten a complete answer. Anyhow here it goes:
-- extrema is the least and the greatest date in staff table
with extrema(mn, mx) as (
select least(min(hired),min(retired)) as mn
, greatest(max(hired),max(retired)) as mx
from staff
), calendar (dt) as (
-- we construct a calendar with every date between extreme values
select mn from extrema
union all
select dateadd(day, 1, d)
from calendar
where dt < (select mx from extrema)
)
-- finally we can count the number of employed people for each such date
select dt, count(1)
from calendar c
join staff s
on c.dt between s.hired and s.retired
group by dt;
If you find yourself doing this kind of calculation often, it is a good idea to create a calendar table. You can add other attributes to it such as if it is a day of in the middle of the week etc.
With a constraint as:
CHECK(hired <= retired)
the first part can be simplified to:
with extrema(mn, mx) as (
select min(hired) as mn
, max(retired) as mx
from staff
),
Assuming Current Employees have a NULL retirement date
Declare #Date1 date = '2015-01-01'
Declare #Date2 date = getdate()
Select A.Date
,HeadCount = count(B.name)
From ( Select Top (DateDiff(DAY,#Date1,#Date2)+1)
Date=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),#Date1)
From master..spt_values n1,master..spt_values n2
) A
Left Join YourTable B on A.Date >= B.Hired and A.Date <= coalesce(B.Retired,getdate())
Group BY A.Date
You need a calendar table for this. You start with the calendar, and LEFT JOIN everything else, using BETWEEN logic.
You can use a real table. Or you can generate it on the fly, like this:
WITH
L0 AS ( SELECT c = 1
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
L1 AS ( SELECT c = 1 FROM L0 A, L0 B, L0 C, L0 D ),
Nums AS ( SELECT rownum = ROW_NUMBER() OVER(ORDER BY (SELECT 1))
FROM L1 ),
Dates AS (
SELECT TOP (DATEDIFF(day, '20141231', GETDATE()))
Date = DATEADD(day, rownum, '20141231')
FROM Nums
)
SELECT
d.Date,
NumEmployed = COUNT(*)
FROM Dates d
JOIN YourTable t ON d.Date BETWEEN t.Hired AND t.Retired
GROUP BY
d.Date;
If your dates have a time component then you need to use >= AND < logic
Try limiting the scope of your date table. In this example I have a table of dates named TallyStickDT.
SELECT dt, COUNT(name)
FROM (
SELECT dt
FROM tallystickdt
WHERE dt >= (SELECT MIN(hired) FROM #employees)
AND dt <= GETDATE()
) A
LEFT OUTER JOIN #employees E ON A.dt >= E.Hired AND A.dt <= e.retired
GROUP BY dt
ORDER BY dt

Find the max date to last one year transaction for each group

I have to query in sql server where I have to find for each id it's volume such that we have last 1 year date for each id with it's volume.
for example below is my data ,
for each id I need to query the last 1 year transaction from when we have the entry for that id as you can see from the snippet for id 1 we have the latest date as 7/31/2020 so I need the last 1 year entry from that date for that id, The highlighted one is exclude because that date is more than 1 year from the latest date for that id
Similarly for Id 3 we have all the date range in one year from the latest date for that particular id
I tried using the below query and I can get the latest date for each id but I am not sure how to extract all the dates for each id from the latest date to one year, I would appreciate if some one could help me.
I am using Microsoft sql server would need the query which executes in sql server, Table name is emp and have millions of id
Select *
From emp as t
inner join (
Select tm.id, max(tm.date_tran) as MaxDate
From emp tm
Group by tm.id
) tm on t.id = tm.id and t.date_tran = tm.MaxDate
To exclude transactions where the date difference between the tran_date and the maximum tran_date for each id is greater than 1 year, something like this:
;with max_cte(id, max_date) as (
Select id, max(date_tran)
From emp tm
Group by id )
Select *
From emp e
join max_cte mc on e.id=mc.id
and datediff(d, e.date_tran, mc.max_date)<=365;
Update: per comments, added volume. Thnx GMB :)
;with max_cte(id, date_tran, volume, max_date) as (
Select *, dateadd(year, -1, max(date_tran) over(partition by id)) max_date
From #emp tm)
Select id, sum(volume) sum_volume
From max_cte mc
where mc.date_tran>max_date
group by id;
You can do this with window functions:
select id, sum(volume) total_volume
from (
select t.*, max(date_tran) over(partition by id) max_date_tran
from mytable t
) t
where date_tran > dateadd(year, -1, max_date_tran)
group by id
Alternatively, you can use a correlated subquery for filtering:
select id, sum(volume) total_volume
from mytable t
where t.date_tran > (
select dateadd(year, -1, max(t1.date_tran))
from mytable t1
where t1.id = t.id
)
The second query would take advantage of an index on (id, date_tran).
this should do the trick for you:
SELECT
*
FROM
emp
JOIN
(
SELECT
MAX(date_tran) max_date_tran
, Id
FROM
emp
GROUP BY
id
) emp2
ON emp2.Id = emp.Id
AND DATEADD(YEAR, -1, emp2.max_date_tran) <= emp.date_tran;
Your code is good. Just add the date difference function to get the particular time in between the transaction, like the following:
Select *
From emp as t
inner join ( Select id as id, max(date_tran) as maxdate
From emp tm
Group by id
) tm on t.id = tm.id and datediff(d, e.date_tran, mc.maxdate)<=365;

Order by decimals for counts of same value

EDIT: I have edited this question to make the query simpler:
ReportTracking:
Userid, ReportId, Duration, CreatedDate
Query:
SELECT t.UserId, COUNT(DISTINCT(t.ReportId)) AS ReportsRead
FROM ReportTracking t
WHERE t.Duration >= 30
AND t.CreatedDate > DATEADD(Day, -30, GETDATE())
GROUP BY t.UserId
Sample Result:
UserId ReportsRead
1 22
2 13
3 2
4 2
5 2
What I need to do is assign a number value to Reports Read. Essentially because there are 3 users who read swimming and they tie in terms of ranking (they each have 2 read only) I need to order them by who read the report last. I need to assign them all a decimal number value based on order of reading. So the person who read the report last would get .1, the person who read it first would get .3.
I'm not quite sure how to achieve this, the key part is that they do have have a decimal number value that ranks them and this decimal should be few decimal points long as the records are rather long. My idea was to use DateCreated and convert it a number value which I can substract from a max. But since there are multiple dates (one for each report), I'm not sure how to grab the latest one and only use that date with my report count.
I'm not sure why you need to assign decimals...
Just order by ReportsRead desc, max(createdDate) (this should be most recent read for a user in the select).
Also distinct isn't a function it's a statement. No need for the ()
SELECT t.UserId
, COUNT(DISTINCT t.ReportId) AS ReportsRead
max(t.createDate) Asc) RN
FROM ReportTracking t
WHERE t.Duration >= 30
AND t.CreatedDate > DATEADD(Day, -30, GETDATE())
GROUP BY t.UserId
ORDER BY ReportsRead DESC, max(createdDate)
if you need the numbers and plan on displaying them
WITH CTE AS (
SELECT t.UserId
, COUNT(DISTINCT t.ReportId) AS ReportsRead
, row_number() over (partition by count(Distinct t.reportID) order by max(t.createDate) Asc) RN
FROM ReportTracking t
WHERE t.Duration >= 30
AND t.CreatedDate > DATEADD(Day, -30, GETDATE())
GROUP BY t.UserId)
SELECT *
FROM CTE
ORDER BY ReportsRead DESC, RN
You can rank your rows within ReportsRead partition to obtain a ranking by ordering on the max(createddate). documentation: SQL Server Rank function
here is an example: http://sqlfiddle.com/#!18/1eefc/11
You may simplify the query by using CTE to reuse column aliases but the concept is:
SELECT t.UserId
, COUNT(DISTINCT( t.ReportId )) AS ReportsRead
, CAST(RANK()
OVER(
partition BY COUNT(DISTINCT( t.ReportId ))
ORDER BY MAX(t.createdDate) DESC) AS DECIMAL) / 10 ranking
FROM ReportTracking t
WHERE t.Duration >= 30
AND t.CreatedDate > DATEADD(Day, -30, GETDATE())
GROUP BY t.UserId
ORDER BY ReportsRead DESC
, ranking;

How to get value by a range of dates?

I have a table like so
And With this code I get the 5 latest values for each domainId
;WITH grp AS
(
SELECT DomainId, [Date],Passed, DatabasePerformance,ServerPerformance,
rn = ROW_NUMBER() OVER
(PARTITION BY DomainId ORDER BY [Date] DESC)
FROM dbo.DomainDetailDataHistory H
)
SELECT g.DomainId, g.[Date],g.Passed, g.ServerPerformance, g.DatabasePerformance
FROM grp g
INNER JOIN #Latest T ON T.DomainId = g.DomainId
WHERE rn < 7 AND t.date != g.[Date]
ORDER BY DomainId, [Date] DESC
What I Want
Well I would like to know how many tickets were sold for each of these 5 latest rows but with the following condition:
Each of these rows come with their own date which differs.
for each date I want to check how many were sold the last 15minutes AND how many were sold the last 30mns.
Example:
I get these 5 rows for each domainId
I want to extend the above with two columns, "soldTicketsLast15" and "soldTicketsLast30"
The date column contains all the dates I need and for each of these dates I want to go back 15 min and go back 30min to and get how many tickets were sold
Example:
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] >= DATEADD(minute, -15, '2016-04-12 12:10:28.2270000')
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] >= DATEADD(minute, -30, '2016-04-12 12:10:28.2270000')
How can i accomplish this?
I'd use OUTER APPLY or CROSS APPLY.
;WITH grp AS
(
SELECT
DomainId, [Date], Passed, DatabasePerformance, ServerPerformance,
rn = ROW_NUMBER() OVER (PARTITION BY DomainId ORDER BY [Date] DESC)
FROM dbo.DomainDetailDataHistory H
)
SELECT
g.DomainId, g.[Date],g.Passed, g.ServerPerformance, g.DatabasePerformance
,A15.SoldTicketsLast15
,A30.SoldTicketsLast30
FROM
grp g
INNER JOIN #Latest T ON T.DomainId = g.DomainId
OUTER APPLY
(
SELECT MAX(H.SoldTickets) - MIN(H.SoldTickets) AS SoldTicketsLast15
FROM DomainDetailDataHistory AS H
WHERE
H.DomainId = g.DomainId AND
H.[Date] >= DATEADD(minute, -15, g.[Date])
) AS A15
OUTER APPLY
(
SELECT MAX(H.SoldTickets) - MIN(H.SoldTickets) AS SoldTicketsLast30
FROM DomainDetailDataHistory AS H
WHERE
H.DomainId = g.DomainId AND
H.[Date] >= DATEADD(minute, -30, g.[Date])
) AS A30
WHERE
rn < 7
AND T.[date] != g.[Date]
ORDER BY DomainId, [Date] DESC;
To make the correlated APPLY queries efficient there should be an appropriate index, like the following:
CREATE NONCLUSTERED INDEX [IX_DomainId_Date] ON [dbo].[DomainDetailDataHistory]
(
[DomainId] ASC,
[Date] ASC
)
INCLUDE ([SoldTickets])
This index may also help to make the main part of your query (grp) efficient.
If I understood your question correctly, you want to get the tickets sold from one of your dates (in the Date column) going back 15 minutes and 30 minutes. Assuming that you are using your DATEADD function correctly, the following should work:
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] BETWEEN [DATE] AND DATEADD(minute, -15, '2016-04-12 12:10:28.2270000') GROUP BY [SoldTickets]
The between operator allows you to retrieve results between two date parameters. In the SQL above, we also need a group by since you are using a GROUPING function (MAX). The group by would depend on what you want to group by but I think in your case it would be SoldTickets.
The SQL above will give you the ones between the date and 15 minutes back. You could do something similar with the 30 minutes back.

SQL adding missing dates to query

I'm trying to add missing dates to a SQL query but it does not work.
Please can you tell me what I'm doing wrong.
I only have read only rights to database.
SQL query:
With cteDateGen AS
(
SELECT 0 as Offset, CAST(DATEADD(dd, 0, '2015-11-01') AS DATE) AS WorkDate
UNION ALL
SELECT Offset + 1, CAST(DATEADD(dd, Offset, '2015-11-05') AS DATE)
FROM cteDateGen
WHERE Offset < 100
), -- generate date from to --
cte AS (
SELECT COUNT(*) OVER() AS 'total' ,ROW_NUMBER()OVER (ORDER BY c.dt DESC) as row
, c.*
FROM clockL c
RIGHT JOIN cteDateGen d ON CAST(c.dt AS DATE) = d.WorkDate
WHERE
c.dt between '2015-11-01' AND '2015-11-05' and
--d.WorkDate BETWEEN '2015-11-01' AND '2015-11-05'
and c.id =10
) -- select user log and add missing dates --
SELECT *
FROM cte
--WHERE row BETWEEN 0 AND 15
--option (maxrecursion 0)
I think your problem is simply the dates in the CTE. You can also simplify it a bit:
With cteDateGen AS (
SELECT 0 as Offset, CAST('2015-11-01' AS DATE) AS WorkDate
UNION ALL
SELECT Offset + 1, DATEADD(day, 1, WorkDate) AS DATE)
-----------------------------------^
FROM cteDateGen
WHERE Offset < 100
), -- generate date from to --
cte AS
(SELECT COUNT(*) OVER () AS total,
ROW_NUMBER() OVER (ORDER BY c.dt DESC) as row,
c.*
FROM cteDateGen d LEFT JOIN
clockL c
ON CAST(c.dt AS DATE) = d.WorkDate AND c.id = 10
-----------------------------------------------^
WHERE d.WorkDate between '2015-11-01' AND '2015-11-05'
) -- select user log and add missing dates --
SELECT *
FROM cte
Notes:
Your query used a constant for the second date in the CTE. The constant was different from the first constant. Hence, it was missing some days.
I think that LEFT JOIN is much easier to follow than RIGHT JOIN. LEFT JOIN is basically "keep all rows in the first table".
The WHERE clause was undoing the outer join in any case. The c.id logic needs to move to the ON clause.
The date arithmetic in the first CTE was unnecessarily complex.