I have the following table Jobs:
| Id | StartDateTime       | EndDateTime         |
+----+---------------------+---------------------+
| 1  | 2020-10-20 23:00:00 | 2020-10-21 05:00:00 |
| 2  | 2020-10-21 10:00:00 | 2020-10-21 11:00:00 |
Note job id 1 spans October 20 and 21.
I am using the following query
SELECT DAY(StartDateTime), COUNT(id)
FROM Job
GROUP BY DAY(StartDateTime)
to get the output below. The problem I am facing is that day 21 does not include job id 1. Since the job spans two days, I want it counted in both day 20 and day 21.
Day | TotalJobs
----+----------
20 | 1
21 | 1
I am struggling to get the following expected output:
Day | TotalJobs
----+----------
20 | 1
21 | 2
One method is to generate the days that you want and then count overlaps:
with days as (
    select convert(date, min(j.startdatetime)) as startd,
           convert(date, max(j.enddatetime)) as endd
    from jobs j
    union all
    select dateadd(day, 1, startd), endd
    from days
    where startd < endd
)
select days.startd, count(j.id)
from days left join
     jobs j
     on j.startdatetime < dateadd(day, 1, startd) and
        j.enddatetime >= startd
group by days.startd;
Here is a db<>fiddle.
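Note that a recursive CTE in SQL Server stops after 100 recursions by default, so if the span between the earliest start date and the latest end date can exceed 100 days, append the MAXRECURSION hint to the final query:
group by days.startd
option (maxrecursion 0);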
You can first group the jobs whose start and end fall on the same day, then union in the jobs whose start and end fall on different days, counting them once for the start day and once for the end day:
SELECT a.date, SUM(counts) FROM (
    SELECT DAY(StartDateTime) as date, COUNT(id) counts
    FROM Table1
    WHERE DAY(StartDateTime) = DAY(EndDateTime)
    GROUP BY DAY(StartDateTime)
    UNION ALL
    SELECT DAY(EndDateTime), COUNT(id)
    FROM Table1
    WHERE DAY(StartDateTime) != DAY(EndDateTime)
    GROUP BY DAY(EndDateTime)
    UNION ALL
    SELECT DAY(StartDateTime), COUNT(id)
    FROM Table1
    WHERE DAY(StartDateTime) != DAY(EndDateTime)
    GROUP BY DAY(StartDateTime)
) a
GROUP BY a.date
Here is the SQL Fiddle link.
Also, replace Table1 with Jobs when running this against your own database.
I have the following data set:
I want to create a new column that sums the last 7 days of sales. So the query result should look like the following:
Pls help
Thanks!
In standard SQL, you would use a window function -- assuming you have data for each day:
select t.*,
sum(sales) over (partition by itemid order by date rows between 6 preceding and current row) as sales_7
from t;
Use the sum() aggregate function with group by. Note that this returns one total per group for the most recent 7 days, rather than a rolling column per row:
select country, itemid, year, monthnumber, week, sum(sales) as sales_last_7days
from your_table
where date >= DATEADD(day, -7, getdate()) and date < getdate()
group by country, itemid, year, monthnumber, week
With a window function:
select (list other columns here), sum(sum(sales)) over
(partition by week
order by day
rows between 6 preceding and current row)
from table
group by date, week;
Note that adding week doesn't change the grouping, because a date belongs to only one week, but it is needed in the window.
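As a more concrete sketch of the same idea (the table and column names sales_daily, itemid, saledate and sales are assumptions, and it partitions by itemid rather than week so the rolling sum restarts per item):
select itemid,
       saledate,
       sum(sum(sales)) over (
           partition by itemid
           order by saledate
           rows between 6 preceding and current row
       ) as sales_last_7days
from sales_daily
group by itemid, saledate;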
It seems you are working with SQL Server. If so, you can use outer apply:
select t.*, a.[last7day]
from table t outer apply
     (select sum(t2.sales) as [last7day]
      from table t2
      where t2.itemid = t.itemid and
            t2.date between dateadd(day, -6, t.date) and t.date
     ) a;
If you don't have exactly one day for each row, for example if you have a list of transactions...
The below example completely confused me the first time I saw it, so I've tried to comment as much as I can to explain what's happening.
Suppose we have a table tbl with date column dt and amount column amt, and for each date in tbl we want to return a rolling sum of the amount from the current day and the past 6 days.
select distinct -- see note after code on what this distinct is doing.
dt
, ( -- Has to be in brackets to denote we're returning 1 value per row.
-- for each row of T1:
select sum(b.amt) -- the sum of amounts in T2. The where clause will restrict which rows in T2 will be summed.
from tbl T2
where T2.dt between T1.dt - 6 and T1.dt -- for each row in T1, give me all rows in T2 where the date is between 6 days before this T1 row's date and T1 row's date, giving us our rolling sum
-- WARNING: CHECK YOUR VERSION OF SQL FOR HOW TO SUBTRACT DAYS FROM A DATE, I'VE MADE IT (T1.dt - 6) FOR SIMPLICITY
-- we don't need a group by, because we're returning one value for each row in T1
)
from tbl T1
We have a main version of tbl, aliased T1. We then have a secondary table, aliased T2. For each row in T1, we're going to ask for a set of rows in T2 that we're going to sum before giving it to our main query.
To understand what's happening, run the code without the distinct. You'll notice that we have the same number of rows as in tbl, because the T2 statement is happening for every row in T1.
Notes:
If there are any days for which no rows exist in your table, you will not get a result for that day. To be certain this doesn't happen, join your table to a table containing a distinct list of consecutive dates and use that as your date column (a minimal sketch of such a calendar follows these notes).
If you have nulls in your amount column the calculation will still work, but if the rolling window contains only nulls you will get null instead of 0 as your result. If that troubles you, convert the nulls to zeros before (or after) you run the query.
The beginning of the period will have a 'ramp up'. But this would be the same whatever method you use to do a rolling sum. If it bothers you, don't return the first 6 days.
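As promised in the first note, here is a minimal sketch of such a consecutive-date calendar in SQL Server (the date range and the join back to tbl are only for illustration):
with calendar as (
    -- one row per day from 2019-10-01 through 2019-10-14
    select cast('2019-10-01' as date) as dt
    union all
    select dateadd(day, 1, dt)
    from calendar
    where dt < '2019-10-14'
)
select c.dt, t.amt
from calendar c
left join tbl t on t.dt = c.dt
option (maxrecursion 0);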
Finally a worked example if you're playing along at home using SQL Server:
with tbl as (
-- a list of transactions from 1.10.2019 to 14.10.2019
select cast('2019-10-01' as date) dt, 1 amt
union select cast('2019-10-02' as date), 4
union select cast('2019-10-01' as date), 10
union select cast('2019-10-03' as date), 3
union select cast('2019-10-04' as date), 20
union select cast('2019-10-04' as date), 2
union select cast('2019-10-04' as date), 12
union select cast('2019-10-04' as date), 17
union select cast('2019-10-05' as date), null -- a whole week of null values because we all had the week off... I hope this data wasn't important
union select cast('2019-10-06' as date), null
union select cast('2019-10-07' as date), null
union select cast('2019-10-08' as date), null
union select cast('2019-10-09' as date), null
union select cast('2019-10-10' as date), null
union select cast('2019-10-10' as date), null
union select cast('2019-10-10' as date), null
union select cast('2019-10-11' as date), null
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-12' as date), 1
union select cast('2019-10-13' as date), 2
union select cast('2019-10-14' as date), 1000
)
select distinct
a.dt
, (
select sum(b.amt)
from tbl b
where b.dt between dateadd(dd, -6, a.dt) and a.dt
) past_7_days_amt
from tbl a
Returns:
+------------+-----------------+
| dt | past_7_days_amt |
+------------+-----------------+
| 2019-10-01 | 11 |
| 2019-10-02 | 15 |
| 2019-10-03 | 18 |
| 2019-10-04 | 69 |
| 2019-10-05 | 69 |
| 2019-10-06 | 69 |
| 2019-10-07 | 69 |
| 2019-10-08 | 58 |
| 2019-10-09 | 54 |
| 2019-10-10 | 51 |
| 2019-10-11 | NULL |
| 2019-10-12 | 1 |
| 2019-10-13 | 3 |
| 2019-10-14 | 1003 |
+------------+-----------------+
I have a BigQuery query that basically takes a date as a parameter and calculates the number of active users our app had near that date.
Right now, if I want to make a graph over a year of active users, I have to run the query 12 times (once per month) and collate the results manually, which is error-prone and time consuming.
Is there a way to make a single bigquery query that runs the subquery 12 times and puts the results on 12 different rows?
For example, if my query is
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-01-01'
How can I get a table like
| Date | Count |
|------------|---------|
| 2017-01-01 | 50000 |
| 2017-02-01 | 40000 |
| 2017-03-01 | 30000 |
| 2017-04-01 | 20000 |
| 2017-05-01 | 10000 |
Supposing that you have a column called date and one called user_id and you want to calculate distinct users on a monthly basis, you can run a query such as:
#standardSQL
SELECT
DATE_TRUNC(date, MONTH) AS month,
COUNT(DISTINCT user_id) AS distinct_users
FROM YourTable
GROUP BY month
ORDER BY month ASC;
(Here you can replace YourTable with the subquery that you want to run). As a self-contained example:
#standardSQL
WITH YourTable AS (
SELECT DATE '2017-06-25' AS date, 10 AS user_id UNION ALL
SELECT DATE '2017-05-04', 11 UNION ALL
SELECT DATE '2017-06-20', 10 UNION ALL
SELECT DATE '2017-04-01', 11 UNION ALL
SELECT DATE '2017-06-02', 12 UNION ALL
SELECT DATE '2017-04-13', 10
)
SELECT
DATE_TRUNC(date, MONTH) AS month,
COUNT(DISTINCT user_id) AS distinct_users
FROM YourTable
GROUP BY month
ORDER BY month ASC;
Elliot taught me UNION ALL and it seemed to do the trick:
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-01-01'
UNION ALL
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-02-01'
UNION ALL
SELECT COUNT(*) FROM MyTable WHERE activityTime < date '2017-03-01'
Maybe there's a nicer way to parameterize the dates in the WHERE clause, but this did the trick for me.
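One possible way to parameterize those dates (a sketch, assuming activityTime compares against a DATE value the same way it does in the query above) is to cross join a generated list of month-start dates and count conditionally:
#standardSQL
SELECT d AS snapshot_date,
       COUNTIF(t.activityTime < d) AS user_count
FROM MyTable t
CROSS JOIN UNNEST(GENERATE_DATE_ARRAY('2017-01-01', '2017-05-01', INTERVAL 1 MONTH)) AS d
GROUP BY d
ORDER BY d;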
When I want to group a bunch of timestamps by day, using
CONVERT (datetime, CONVERT (varchar, dbo.MEASUREMENT_Battery.STAMP, 101))
it produces for me a "day" stamp that SQL Server still views as a date and can be sorted and used as such.
What I'm trying to figure out is if it's possible to do the same thing by hour. I tried
CAST(DATEPART(Month, STAMP) AS varchar) + '/' + CAST(DATEPART(Day, STAMP) AS varchar) + '/' + CAST(DATEPART(Year, STAMP) AS varchar) + ' ' + CAST(DATEPART(Hour, STAMP) AS varchar) + ':00:00.000'
and this "works" but SQL Server doesn't view this as a date anymore so I can't sort properly.
The end result I want is right though: ex: 9/9/2015 9:00:00.000
Do NOT convert to a string until you absolutely have to "present" the result; CONVERT() and FORMAT() return string representations of temporal information.
The following method returns a datetime value truncated to the hour without resorting to string manipulation (and is hence fast).
select
dateadd(hour, datediff(hour,0, dbo.MEASUREMENT_Battery.STAMP ), 0)
, count(*)
from dbo.MEASUREMENT_Battery
group by
dateadd(hour, datediff(hour,0, dbo.MEASUREMENT_Battery.STAMP ), 0)
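If you happen to be on SQL Server 2022 or later (an assumption about your environment), DATETRUNC expresses the same truncation directly:
select datetrunc(hour, STAMP), count(*)
from dbo.MEASUREMENT_Battery
group by datetrunc(hour, STAMP)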
SQL Fiddle
MS SQL Server 2014 Schema Setup:
CREATE TABLE MEASUREMENT_Battery
([STAMP] datetime)
;
INSERT INTO MEASUREMENT_Battery
([STAMP])
VALUES
('2015-11-12 07:40:15'),
('2015-11-12 08:40:15'),
('2015-11-12 09:40:15'),
('2015-11-12 10:40:15'),
('2015-11-12 11:40:15'),
('2015-11-12 12:40:15'),
('2015-11-12 13:40:15'),
('2015-11-12 14:40:15')
;
NOTE: the output below for column [Stamp] is the default display
Results:
| Stamp (truncated to hour)  | Count |
|----------------------------|-------|
| November, 12 2015 07:00:00 | 1     |
| November, 12 2015 08:00:00 | 1     |
| November, 12 2015 09:00:00 | 1     |
| November, 12 2015 10:00:00 | 1     |
| November, 12 2015 11:00:00 | 1     |
| November, 12 2015 12:00:00 | 1     |
| November, 12 2015 13:00:00 | 1     |
| November, 12 2015 14:00:00 | 1     |
If you absolutely insist on displaying the date/time value a particular way, then you may add the display format in the select clause (but it is not needed in the group by clause!):
select
FORMAT(dateadd(hour, datediff(hour,0, dbo.MEASUREMENT_Battery.STAMP ), 0) , 'MM/dd/yyyy HH')
, count(*)
from dbo.MEASUREMENT_Battery
group by
dateadd(hour, datediff(hour,0, dbo.MEASUREMENT_Battery.STAMP ), 0)
What happens is that when you use datetime style 101 (at the end of the second CONVERT), the date is converted to mm/dd/yyyy and the time is always set to 00:00:00.000, as stated here:
https://msdn.microsoft.com/en-us/library/ms187928.aspx
Now, from what I understand of your question, you would like to include the hour as well, which can be done like this:
SELECT FORMAT(STAMP , 'MM/dd/yyyy HH') + ':00:00.000'
Note:
':00:00.000' is optional and is just for a nicer output.
FORMAT() only works in SQL Server 2012 and later versions.
Testing with some test data, we will see that we get the expected result:
-- Drop temp table if it exists
IF OBJECT_ID('tempdb..#T') IS NOT NULL DROP TABLE #T
-- Create temp table
CREATE TABLE #T ( myDate DATETIME )
-- Insert dummy values
INSERT INTO #T VALUES ( '2015-12-25 14:00:00.000' ) -- 14
INSERT INTO #T VALUES ( '2015-12-25 14:00:00.000' ) -- 14
INSERT INTO #T VALUES ( '2015-12-25 15:00:00.000' )
INSERT INTO #T VALUES ( '2015-12-25 16:00:00.000' )
INSERT INTO #T VALUES ( '2015-12-25 17:00:00.000' ) -- 17
INSERT INTO #T VALUES ( '2015-12-25 17:00:00.000' ) -- 17
-- Select query
SELECT COUNT( myDate ), MAX( FORMAT( myDate , 'MM/dd/yyyy HH') + ':00:00.000' ) FROM #T
GROUP BY CAST( myDate AS DATE ), DATEPART( hour, myDate )
Output:
2 12/25/2015 14:00:00.000
1 12/25/2015 15:00:00.000
1 12/25/2015 16:00:00.000
2 12/25/2015 17:00:00.000
I'm using SQL Server Management Studio 2008 for query creation. Reporting Services 2008 for report creation.
I have been trying to work this out over a couple of weeks and I have hit a brick wall. I’m hoping someone will be able to come up with the solution as right now my brain has turned to mush.
I am currently developing an SQL query that will be feeding data through to a Reporting Services report. The aim of the report is to show the percentage of availability for first aid providers within locations around the county we are based in. The idea is that there should be only one first aider providing cover at a time at each of our 20 locations.
This has all been working fine apart from the first aiders at one location have been overlapping their cover at the start and end of each period of cover.
Example of cover overlap:
| Location | start_date | end_date |
+----------+---------------------+---------------------+
| Wick | 22/06/2015 09:00:00 | 22/06/2015 19:00:00 |
| Wick | 22/06/2015 18:30:00 | 23/06/2015 09:00:00 |
| Wick | 23/06/2015 09:00:00 | 23/06/2015 18:30:00 |
| Wick | 23/06/2015 18:00:00 | 24/06/2015 09:00:00 |
+----------+---------------------+---------------------+
In a perfect world the database that they set their cover in wouldn’t allow them to do this but it’s an externally developed database that doesn’t allow us to make changes like that to it. We also aren’t allowed to create functions, stored procedures, tally tables etc…
The query itself should return the number of minutes that each location has had first aid cover for, broken down by hour of the day. Any overlap in cover shouldn't add additional cover and should be merged: only one person's cover counts at a time, so if two people overlap, the overlapping period counts once.
Example Output:
+----------+---------------------+---------------------+----------+--------------+--------+-------+------+----------+
| Location | fromDt | toDt | TimeDiff | Availability | DayN | DayNo | Hour | DayCount |
+----------+---------------------+---------------------+----------+--------------+--------+-------+------+----------+
| WicK | 22/06/2015 18:00:00 | 22/06/2015 18:59:59 | 59 | 100 | Monday | 1 | 18 | 0 |
| WicK | 22/06/2015 18:30:00 | 22/06/2015 18:59:59 | 29 | 50 | Monday | 1 | 18 | 0 |
| WicK | 22/06/2015 19:00:00 | 22/06/2015 19:59:59 | 59 | 100 | Monday | 1 | 19 | 0 |
+----------+---------------------+---------------------+----------+--------------+--------+-------+------+----------+
Example Code:
DECLARE
    @StartTime datetime,
    @EndTime datetime,
    @GivenDate datetime;
SET @GivenDate = '2015-06-22';
SET @StartTime = @GivenDate + ' 00:00:00';
SET @EndTime = '2015-06-23' + ' 23:59:59';
Declare @Sample Table
(
    Location Varchar(50),
    StartDate Datetime,
    EndDate Datetime
)
Insert @Sample
Select
sta.location,
act.Start,
act.END
from emp,
con,
sta,
act
where
emp.ID = con.ID
and con.location = sta.location
and SUBSTRING(sta.ident,3,2) in ('51','22')
and convert(varchar(10),act.start,111) between @GivenDate and @EndTime
and act.ACT= 18
group by sta.location,
act.Start,
act.END
order by 2
;WITH Yak (location, fromDt, toDt, maxDt,hourdiff)
AS (
SELECT location,
StartDate,
/*check if the period of cover rolls onto the next hour */
convert(datetime,convert(varchar(21),
CONVERT(varchar(10),StartDate,111)+' '
+convert(varchar(2),datepart(hour,StartDate))+':59'+':59'))
,
EndDate
,dateadd(hour,1,dateadd(hour, datediff(hour, 0, StartDate), 0))-StartDate
FROM @Sample
UNION ALL
SELECT location,
dateadd(second,1,toDt),
dateadd(hour, 1, toDt),
maxDt,
hourdiff
FROM Yak
WHERE toDt < maxDt
) ,
TAB1 (location, FROMDATE,TODATE1,TODATE) AS
(SELECT
location,
@StartTime,
convert(datetime,convert(varchar(21),
CONVERT(varchar(10),@StartTime,120)+' '
+convert(varchar(2),datepart(hour,@StartTime))+':59'+':59.999')),
@EndTime
from @Sample
UNION ALL
SELECT
location,
(DATEADD(hour, 1,(convert(datetime,convert(varchar(21),
CONVERT(varchar(10),FROMDATE,120)+' '
+convert(varchar(2),datepart(hour,FROMDATE))+':00'+':00.000')))))ToDate,
(DATEADD(hour, 1,(convert(datetime,convert(varchar(21),
CONVERT(varchar(10),TODATE1,120)+' '
+convert(varchar(2),datepart(hour,TODATE1))+':59'+':59.999'))))) Todate1,
TODATE
FROM TAB1 WHERE TODATE1 < TODATE
),
/*CTE Tab2 adds zero values to all possible hours between start and end dates */
TAB2 AS
(SELECT location, FROMDATE,
CASE WHEN TODATE1 > TODATE THEN TODATE ELSE TODATE1 END AS TODATE
FROM TAB1)
SELECT location,
fromDt,
/* Display maxDt as the end time if the cover period goes into the next day */
CASE WHEN toDt > maxDt THEN maxDt ELSE toDt END AS toDt,
/* If the end date is on the next day find out the minutes between the start date and the end of the day or find out the minutes between the next day and the end date */
Case When ToDt > Maxdt then datediff(mi,fromDt,maxDt) else datediff(mi,FromDt,ToDt) end as TimeDiff,
Case When ToDt > Maxdt then round(datediff(S,fromDt,maxDt)/3600.0*100,0) else round(datediff(S,FromDt,ToDt)/3600.0*100.0,0) end as Availability,
/*Display the name of the day of the week*/
CASE WHEN toDt > maxDt THEN datename(dw,maxDt) ELSE datename(dw,fromDt) END AS DayN,
CASE WHEN toDt > maxDt THEN case when datepart(dw,maxDt)-1 = 0 then 7 else datepart(dw,maxDt)-1 end ELSE case when datepart(dw,fromDt)-1 = 0 then 7 else datepart(dw,fromDt)-1 END end AS DayNo
,DATEPART(hour, fromDt) as Hour,
'0' as DayCount
FROM Yak
where Case When ToDt > Maxdt then datediff(mi,fromDt,maxDt) else datediff(mi,FromDt,ToDt) end <> 0
group by location,fromDt,maxDt,toDt
Union all
SELECT
tab2.location,
convert(varchar(19),Tab2.FROMDATE,120),
convert(varchar(19),Tab2.TODATE,120),
'0',
'0',
datename(dw,FromDate) DayN,
case when datepart(dw,FromDate)-1 = 0 then 7 else datepart(dw,FromDate)-1 end AS DayNo,
DATEPART(hour, fromDate) as Hour,
COUNT(distinct datename(dw,fromDate))
FROM TAB2
Where datediff(MINUTE,convert(varchar(19),Tab2.FROMDATE,120),convert(varchar(19),Tab2.TODATE,120)) > 0
group by location, TODATE, FROMDATE
Order by 2
option (maxrecursion 0)
I have tried the following forum entries but they haven't worked in my case:
http://forums.teradata.com/forum/general/need-help-merging-consecutive-and-overlapping-date-spans
Checking for time range overlap, the watchman problem [SQL]
Calculate Actual Downtime ignoring overlap in dates/times
Sorry for being so lengthy but I thought I would try to give you as much detail as possible. Any help will be really appreciated. Thank you.
So the solution I came up with uses working tables, which you can easily change to CTEs (see the sketch at the end) so you can avoid creating objects or using a stored procedure.
I tried working with window functions to find overlapping records and get the min and max times, the issue is where you have overlap chaining e.g. 09:00 - 09:10, 09:05 - 09:15, 09:11 - 09:20, so all minutes from 09:00 to 09:20 are covered, but it's almost impossible to tell that 09:00 - 09:10 is related to 09:11 - 09:20 without recursing through the results until you get to the bottom of the chain. (Hopefully that makes sense).
So I exploded out all of the date ranges into every minute between the StartDate and EndDate, then you can use the ROW_NUMBER() window function to catch any duplicates, which in turn you can use to see how many different people covered the same minute.
CREATE TABLE dbo.dates
(
Location VARCHAR(64),
StartDate DATETIME,
EndDate DATETIME
);
INSERT INTO dbo.dates VALUES
('Wick','20150622 09:00:00','20150622 19:00:00'),
('Wick','20150622 18:30:00','20150624 09:00:00'),
('Wick','20150623 09:00:00','20150623 18:30:00'),
('Wick','20150623 18:00:00','20150624 09:00:00'),
('Wick','20150630 09:00:00','20150630 09:30:00'),
('Wick','20150630 09:00:00','20150630 09:45:00'),
('Wick','20150630 09:10:00','20150630 09:25:00'),
('Wick','20150630 09:35:00','20150630 09:55:00'),
('Wick','20150630 09:57:00','20150630 10:10:00');
SELECT ROW_NUMBER() OVER (PARTITION BY Location ORDER BY StartDate) [Id],
Location,
StartDate,
EndDate
INTO dbo.overlaps
FROM dbo.dates;
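-- Build a tally (numbers) table; each value N is used below as a minute
-- offset from StartDate.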
SELECT TOP 10000 N=IDENTITY(INT, 1, 1)
INTO dbo.Num
FROM master.dbo.syscolumns a CROSS JOIN master.dbo.syscolumns b;
SELECT 0 [N] INTO dbo.Numbers;
INSERT INTO dbo.Numbers SELECT * FROM dbo.Num;
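-- Explode each cover range into one row per covered minute, number duplicate
-- minutes with ROW_NUMBER, then aggregate per location, day and hour.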
SELECT [Location] = raw.Location,
[WorkedDate] = CAST([MinuteWorked] AS DATE),
[DayN] = DATENAME(WEEKDAY, [MinuteWorked]),
[DayNo] = DATEPART(WEEKDAY, [MinuteWorked]) -1,
[Hour] = DATEPART(HOUR, [MinuteWorked]),
[MinutesWorked] = SUM(IIF(raw.[Minutes] = 1, 1, 0)),
[MaxWorkers] = MAX(raw.[Minutes])
FROM
(
SELECT
o.Location,
DATEADD(MINUTE, n.N, StartDate) [MinuteWorked],
ROW_NUMBER() OVER (PARTITION BY o.Location, DATEADD(MINUTE, n.N, StartDate) ORDER BY DATEADD(MINUTE, n.N, StartDate)) [Minutes]
FROM dbo.overlaps o
INNER JOIN dbo.Numbers n ON n.N < DATEDIFF(MINUTE, StartDate, EndDate)
) raw
GROUP BY
raw.Location,
CAST([MinuteWorked] AS DATE),
DATENAME(WEEKDAY, [MinuteWorked]),
DATEPART(WEEKDAY, [MinuteWorked]) - 1,
DATEPART(HOUR, [MinuteWorked])
Here's a subset of the results:
| Location | WorkedDate | DayN      | DayNo | Hour | MinutesWorked | MaxWorkers |
|----------|------------|-----------|-------|------|---------------|------------|
| Wick     | 2015-06-24 | Wednesday | 3     | 8    | 60            | 2          |
| Wick     | 2015-06-30 | Tuesday   | 2     | 9    | 58            | 3          |
| Wick     | 2015-06-30 | Tuesday   | 2     | 10   | 10            | 1          |
Here's the fiddle
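Since the question says creating tables and stored procedures isn't allowed, here is a hedged sketch of the same minute-explosion approach using only CTEs; it assumes the cover data is queryable as dbo.dates, as in the setup above:
WITH Numbers AS (
    SELECT TOP (10000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS N
    FROM master.dbo.syscolumns a
    CROSS JOIN master.dbo.syscolumns b
),
MinuteRows AS (
    -- one row per location per covered minute (duplicates when people overlap)
    SELECT d.Location,
           DATEADD(MINUTE, n.N, d.StartDate) AS MinuteWorked
    FROM dbo.dates d
    JOIN Numbers n ON n.N < DATEDIFF(MINUTE, d.StartDate, d.EndDate)
),
PerMinute AS (
    SELECT Location, MinuteWorked, COUNT(*) AS Workers
    FROM MinuteRows
    GROUP BY Location, MinuteWorked
)
SELECT Location,
       CAST(MinuteWorked AS DATE)   AS WorkedDate,
       DATEPART(HOUR, MinuteWorked) AS [Hour],
       COUNT(*)                     AS MinutesWorked,
       MAX(Workers)                 AS MaxWorkers
FROM PerMinute
GROUP BY Location, CAST(MinuteWorked AS DATE), DATEPART(HOUR, MinuteWorked);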