SQL Server: Average counts by hour and day of week

Background
I have a table set up in a SQL Server environment that contains a log of various activity that I'm tracking. Each log item uses a unique code to categorize the activity taking place, and a datetime field tracks when that activity occurred.
Problem
Using either a single query or a stored procedure, I would like to get the average of the hourly activity counts, grouped by day of the week. Example:
Day | Hour | Average Count
-------------------------------
Monday | 8 | 5
Monday | 9 | 5
Monday | 10 | 9
...
Tuesday | 8 | 4
Tuesday | 9 | 3
...etc
Right now I've got a query set up that returns the counts per hour per day, but my problem is taking it a step further and getting the average by day of week. Here's my current query:
SELECT CAST([time] AS date) AS ForDate,
DATEPART(hour, [time]) AS OnHour,
COUNT(*) AS Totals
FROM [log] WHERE [code] = 'tib_imp.8'
GROUP BY CAST(time AS date),
DATEPART(hour,[time])
ORDER BY ForDate Asc, OnHour Asc
Any suggestions as to how I might accomplish this?
Thanks in advance!

Guessing here:
SELECT [Day], [Hour], [DayN], AVG(Totals) AS [Avg]
FROM
(
SELECT
[Day] = DATENAME(WEEKDAY, [time]),
[DayN] = DATEPART(WEEKDAY, [time]),
[Hour] = DATEPART(HOUR, [time]),
Totals = COUNT(*)
FROM dbo.[log]
WHERE [code] = 'tib_imp.8'
GROUP BY
DATENAME(WEEKDAY, [time]),
DATEPART(WEEKDAY, [time]),
DATEPART(HOUR, [time])
) AS q
GROUP BY [Day], [Hour], [DayN]
ORDER BY DayN, [Hour];
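Again, without sample data I may be throwing a handful of mud at the wall and hoping it sticks. The first query groups the inner query by weekday and hour only, so each inner group yields a single row and AVG simply returns that total. Adding the week number (DATEDIFF(WEEK, 0, [time])) to the inner GROUP BY gives one count per week for each weekday/hour, which the outer query then averages. So perhaps what you need is: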
SELECT [Day], [Hour], [DayN], AVG(Totals) AS [Avg]
FROM
(
SELECT
w = DATEDIFF(WEEK, 0, [time]),
[Day] = DATENAME(WEEKDAY, [time]),
[DayN] = DATEPART(WEEKDAY, [time]),
[Hour] = DATEPART(HOUR, [time]),
Totals = COUNT(*)
FROM dbo.[log]
WHERE [code] = 'tib_imp.8'
GROUP BY
DATEDIFF(WEEK, 0, [time]),
DATENAME(WEEKDAY, [time]),
DATEPART(WEEKDAY, [time]),
DATEPART(HOUR, [time])
) AS q
GROUP BY [Day], [Hour], [DayN]
ORDER BY DayN, [Hour];
This is also going to produce integer averages (COUNT(*) returns an int, so AVG truncates), so you may want to cast the Totals alias in the inner query to a DECIMAL type with a suitable precision and scale.
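A minimal sketch of that cast, assuming the question's [log] table with [code] and [time] columns (recreated here as a temporary table #log with invented rows). DECIMAL(10, 2) is an arbitrary choice of precision and scale; any decimal type stops AVG from truncating.

```sql
-- Hypothetical sample data standing in for the question's [log] table.
IF OBJECT_ID('tempdb..#log') IS NOT NULL DROP TABLE #log;
CREATE TABLE #log ([code] varchar(20) NOT NULL, [time] datetime NOT NULL);

INSERT INTO #log ([code], [time]) VALUES
    ('tib_imp.8', '2013-04-01T08:05:00'),  -- Monday, week 1: 2 rows in hour 8
    ('tib_imp.8', '2013-04-01T08:40:00'),
    ('tib_imp.8', '2013-04-08T08:15:00'),  -- Monday, week 2: 3 rows in hour 8
    ('tib_imp.8', '2013-04-08T08:20:00'),
    ('tib_imp.8', '2013-04-08T08:55:00'),
    ('other',     '2013-04-08T08:30:00');  -- different code, filtered out

SELECT [Day], [Hour], AVG(Totals) AS [Avg]
FROM
(
    SELECT
        w      = DATEDIFF(WEEK, 0, [time]),
        [Day]  = DATENAME(WEEKDAY, [time]),
        [DayN] = DATEPART(WEEKDAY, [time]),
        [Hour] = DATEPART(HOUR, [time]),
        -- Cast the int count so AVG keeps the fractional part.
        Totals = CAST(COUNT(*) AS DECIMAL(10, 2))
    FROM #log
    WHERE [code] = 'tib_imp.8'
    GROUP BY
        DATEDIFF(WEEK, 0, [time]),
        DATENAME(WEEKDAY, [time]),
        DATEPART(WEEKDAY, [time]),
        DATEPART(HOUR, [time])
) AS q
GROUP BY [Day], [Hour], [DayN]
ORDER BY [DayN], [Hour];
```

With these rows, the Monday, hour 8 row averages the weekly counts 2 and 3 to 2.5; without the cast, the same query reports 2.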

; WITH a AS (
SELECT CAST([time] AS date) AS ForDate
, DATEPART(hour, [time]) AS OnHour
, txtW=DATENAME(WEEKDAY,[time])
, intW=DATEPART(WEEKDAY,[time])
, Totals=COUNT(*)
FROM [log] WHERE [code] = 'tib_imp.8'
GROUP BY CAST(time AS date)
, DATENAME(WEEKDAY,[time])
, DATEPART(WEEKDAY,[time])
, DATEPART(hour,[time])
)
SELECT [Day]=txtW
, [Hour]=OnHour
, [Average Count]=AVG(Totals)
FROM a
GROUP BY txtW, intW, OnHour
ORDER BY intW, OnHour

Related

Average Counts by Hour by Day of Week

This question helped get me part of the way there:
SELECT
[Day],
[Hour],
[DayN],
AVG(Totals) AS [Avg]
FROM
(
SELECT
w = DATEDIFF(WEEK, 0, ForDateTime),
[Day] = DATENAME(WEEKDAY, ForDateTime),
[DayN] = DATEPART(WEEKDAY, ForDateTime),
[Hour] = DATEPART(HOUR, ForDateTime),
Totals = COUNT(*)
FROM
#Visit
GROUP BY
DATEDIFF(WEEK, 0, ForDateTime),
DATENAME(WEEKDAY, ForDateTime),
DATEPART(WEEKDAY, ForDateTime),
DATEPART(HOUR, ForDateTime)
) AS q
GROUP BY
[Day],
[Hour],
[DayN]
ORDER BY
DayN;
How could this be changed so that, rather than showing the average by hour (e.g. 9, 10, 11, 12, etc.), it shows it by half-hour-offset slots: 09:30-10:30, 10:30-11:30, 11:30-12:30, 12:30-13:30, and so on up to 23:30?
A simple approach is to offset ForDateTime by 30 minutes. Basically, you just need to replace every occurrence of ForDateTime with dateadd(minute, 30, ForDateTime) in the query.
In the result set, Hour 9 gives you the time slot from 8:30 to 9:30, and so on.
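A sketch of the same offset, assuming the #Visit temporary table with a ForDateTime column from the question (a few invented rows are inserted here). Instead of repeating dateadd(minute, 30, ForDateTime) in every expression, a CROSS APPLY computes the shifted value once; that and the Slot label column are readability additions, not part of the original answer. Shifted hour H covers the original slot from (H-1):30 to H:30.

```sql
-- Hypothetical stand-in for the question's #Visit table.
IF OBJECT_ID('tempdb..#Visit') IS NOT NULL DROP TABLE #Visit;
CREATE TABLE #Visit (ForDateTime datetime NOT NULL);

INSERT INTO #Visit (ForDateTime) VALUES
    ('2013-04-01T09:30:00'),  -- Monday, slot 09:30-10:30
    ('2013-04-01T10:29:00'),  -- Monday, slot 09:30-10:30
    ('2013-04-01T10:30:00');  -- Monday, slot 10:30-11:30

SELECT
    [Day],
    -- Label shifted hour H as "(H-1):30-H:30", wrapping 0 to "23:30-00:30".
    [Slot] = RIGHT('0' + CAST(([Hour] + 23) % 24 AS varchar(2)), 2) + ':30-'
           + RIGHT('0' + CAST([Hour] AS varchar(2)), 2) + ':30',
    AVG(Totals) AS [Avg]
FROM
(
    SELECT
        w      = DATEDIFF(WEEK, 0, s.Shifted),
        [Day]  = DATENAME(WEEKDAY, s.Shifted),
        [DayN] = DATEPART(WEEKDAY, s.Shifted),
        [Hour] = DATEPART(HOUR, s.Shifted),
        Totals = COUNT(*)
    FROM #Visit AS v
    -- Shift once: 09:30-10:29 becomes 10:00-10:59, i.e. hour 10.
    CROSS APPLY (SELECT Shifted = DATEADD(MINUTE, 30, v.ForDateTime)) AS s
    GROUP BY
        DATEDIFF(WEEK, 0, s.Shifted),
        DATENAME(WEEKDAY, s.Shifted),
        DATEPART(WEEKDAY, s.Shifted),
        DATEPART(HOUR, s.Shifted)
) AS q
GROUP BY [Day], [Hour], [DayN]
ORDER BY [DayN], [Hour];
```

One consequence of shifting forward is that visits from 23:30 onward are counted in the following day's 23:30-00:30 slot.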

Is there a better way to group a log on minutes?

I have the following query that selects from a log and groups down to the minute (excluding seconds and milliseconds):
SELECT DATEPART(YEAR, [Date]) AS YEAR, DATEPART(MONTH, [Date]) AS MONTH,
DATEPART(DAY, [Date]) AS DAY, DATEPART(HOUR, [Date]) AS HOUR,
DATEPART(MINUTE, [Date]) AS MIN, COUNT(*) AS COUNT
FROM [database].[dbo].[errorlog]
GROUP BY DATEPART(YEAR, [Date]), DATEPART(MONTH, [Date]), DATEPART(DAY, [Date]),
DATEPART(HOUR, [Date]), DATEPART(MINUTE, [Date])
ORDER BY DATEPART(YEAR, [Date]) DESC, DATEPART(MONTH, [Date]) DESC,
DATEPART(DAY, [Date]) DESC, DATEPART(HOUR, [Date]) DESC,
DATEPART(MINUTE, [Date]) DESC;
But as you can see, that's a lot of fuss just for getting a count, so I wonder if there is a better way to group it down to the minute with respect to year, month, day and hour?
This should work:
select CAST([Date] AS smalldatetime) as time_stamp, count(*) as count
FROM [database].[dbo].[errorlog]
group by CAST([Date] AS smalldatetime)
order by CAST([Date] AS smalldatetime) desc;
Update after comments on this answer:
select dateadd(second,-datepart(ss,[Date]),[Date]) as time_stamp, count(*) as count
FROM [database].[dbo].[errorlog]
group by dateadd(second,-datepart(ss,[Date]),[Date])
order by dateadd(second,-datepart(ss,[Date]),[Date]) desc ;
The first solution rounds the timestamp to the nearest minute (so a time with 30 seconds or more is rounded up to the next minute). I realised that this is not exactly what the OP wanted.
So, the second solution just subtracts the seconds part from the timestamp, leaving the timestamp with seconds as zero (assuming [Date] does not have fractional seconds).
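A quick way to see the difference between the two versions is to run them on a single made-up timestamp that has 45 seconds and a fractional part; a DATEADD/DATEDIFF floor to the minute is included for comparison.

```sql
DECLARE @d datetime = '2013-05-01T10:17:45.500';  -- hypothetical sample value

SELECT
    -- Rounds to the NEAREST minute: 45 seconds rounds up to 10:18:00.
    rounded        = CAST(@d AS smalldatetime),
    -- Removes whole seconds only: the .500 fraction survives (10:17:00.500).
    seconds_zeroed = DATEADD(SECOND, -DATEPART(SECOND, @d), @d),
    -- Counts whole minutes from a fixed anchor and adds them back: 10:17:00.000.
    floored        = DATEADD(MINUTE, DATEDIFF(MINUTE, '20010101', @d), '20010101');
```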
DATEADD(minute,DATEDIFF(minute,'20010101',[Date]),'20010101')
Should round all Date column values down to the nearest minute. So:
SELECT DATEADD(minute,DATEDIFF(minute,'20010101',[Date]),'20010101'),
COUNT(*) AS COUNT
FROM [database].[dbo].[errorlog]
GROUP BY DATEADD(minute,DATEDIFF(minute,'20010101',[Date]),'20010101')
ORDER BY DATEADD(minute,DATEDIFF(minute,'20010101',[Date]),'20010101') DESC;
(You could move this expression into a subquery if you want to further reduce the repetition)
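Following that suggestion, here is a sketch that computes the floored minute once in a derived table and groups on its alias. A temporary table #errorlog with a few invented rows stands in for [database].[dbo].[errorlog].

```sql
-- Hypothetical stand-in for [database].[dbo].[errorlog].
IF OBJECT_ID('tempdb..#errorlog') IS NOT NULL DROP TABLE #errorlog;
CREATE TABLE #errorlog ([Date] datetime NOT NULL);

INSERT INTO #errorlog ([Date]) VALUES
    ('2013-05-01T10:17:05'),
    ('2013-05-01T10:17:59'),
    ('2013-05-01T10:18:00');

SELECT [Minute], COUNT(*) AS [COUNT]
FROM
(
    -- Compute the floored minute once; the outer query groups on the alias.
    SELECT [Minute] = DATEADD(minute, DATEDIFF(minute, '20010101', [Date]), '20010101')
    FROM #errorlog
) AS m
GROUP BY [Minute]
ORDER BY [Minute] DESC;
```

With these rows the query returns 10:18 with a count of 1 and 10:17 with a count of 2.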
You could do something like this to get per-minute counts:
declare @now datetime
set @now = GETDATE()
select dateadd(minute, mm, @now) as date, c from (
select DATEDIFF(minute, @now, [Date]) as mm, COUNT(1) as c
from [database].[dbo].[errorlog]
group by DATEDIFF(minute, @now, [Date])
) t

Calculate conversion using SQL

Suppose that there is an activities table with two columns: a string action and a created_at timestamp, and that there are only two possible values in the action column: visit or signup.
I want to see % of signups per visit for each day. Like:
Date | Conversion
--------------------------
2013-01-01 | 30%
2013-01-02 | 27%
2013-01-03 | 15%
2013-01-04 | 22%
Is it possible with a single SQL query?
For SQL server, this should work:
SELECT coalesce(v.y, s.y) y, coalesce(v.m, s.m) m, coalesce(v.d, s.d) d,
100.0 * isnull(s.num, 0) / (isnull(s.num, 0) + isnull(v.num, 0)) conversion
FROM
(
SELECT datepart(YEAR, created_at) y, datepart(m, created_at) m, datepart(d, created_at) d,
count(action) num
FROM activities
WHERE action = 'visit'
GROUP BY datepart(YEAR, created_at), datepart(m, created_at), datepart(d, created_at)
) v
FULL OUTER JOIN
(
SELECT datepart(YEAR, created_at) y, datepart(m, created_at) m, datepart(d, created_at) d,
count(action) num
FROM activities
WHERE action = 'signup'
GROUP BY datepart(YEAR, created_at), datepart(m, created_at), datepart(d, created_at)
) s
ON v.y = s.y AND v.m = s.m AND v.d = s.d
I leave string concatenation, rounding and general robustness as an exercise for you.
If this is SQL Server 2005+, you could try the following approach:
SELECT
Date,
Conversion = signup * 100.0 / (signup + visit)
FROM (
SELECT
Date = DATEADD(DAY, DATEDIFF(DAY, 0, created_at), 0),
action
FROM activities
) AS a
PIVOT (
COUNT(action) FOR action IN (visit, signup)
) AS p
;
The Date expression strips the time part off the created_at timestamp. In SQL Server 2008 and later versions, it can be replaced with the much cleaner and more efficient operation of casting to the date type:
Date = CAST(created_at AS date)
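A third option, not taken from either answer, is conditional aggregation: count the signups with a CASE inside SUM, in a single pass with no join or PIVOT. This sketch assumes the question's activities(action, created_at) schema, recreated as a temporary table with invented rows, and SQL Server 2008 or later for the date type.

```sql
-- Hypothetical sample rows for the question's activities table.
IF OBJECT_ID('tempdb..#activities') IS NOT NULL DROP TABLE #activities;
CREATE TABLE #activities (action varchar(10) NOT NULL, created_at datetime NOT NULL);

INSERT INTO #activities (action, created_at) VALUES
    ('visit',  '2013-01-01T09:00:00'),
    ('visit',  '2013-01-01T10:00:00'),
    ('visit',  '2013-01-01T11:00:00'),
    ('signup', '2013-01-01T11:05:00'),
    ('visit',  '2013-01-02T09:00:00');

SELECT
    [Date] = CAST(created_at AS date),
    -- Same definition as the answers above: signups / (signups + visits).
    Conversion = 100.0 * SUM(CASE WHEN action = 'signup' THEN 1 ELSE 0 END)
                 / COUNT(*)
FROM #activities
WHERE action IN ('visit', 'signup')
GROUP BY CAST(created_at AS date)
ORDER BY [Date];
```

With these rows, 2013-01-01 gives 25 (1 signup out of 4 actions) and 2013-01-02 gives 0. If conversion should instead mean signups per visit, divide by the visit count (for example NULLIF(SUM(CASE WHEN action = 'visit' THEN 1 ELSE 0 END), 0)) rather than by COUNT(*).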

SQL Hurdle - SQL Server 2008

The following query returns the total amount of orders, per week, for the past 12 months (for a specific customer):
SELECT DATEPART(year, orderDate) AS [year],
DATEPART(month, orderDate) AS [month],
DATEPART(wk, orderDate) AS [week],
COUNT(1) AS orderCount
FROM dbo.Orders (NOLOCK)
WHERE customerNumber = @custnum
AND orderDate >= DATEADD(month, -12, GETDATE())
GROUP BY DATEPART(year, orderDate),
DATEPART(wk, orderDate),
DATEPART(month, orderDate)
ORDER BY DATEPART(year, orderDate),
DATEPART(wk, orderDate)
This returns results like:
year month week orderCount
2008 1 1 23
2008 3 12 5
...
As you can see, only weeks that have orders for this customer are returned in the result set. I need it to return a row for every week in the past 12 months. If no order exists in a week, returning 0 for orderCount would be fine, but I still need the year, week, and month. I could probably do it by creating a separate table storing the weeks of the year and then left outer joining against it, but I would prefer not to. Perhaps there's something in SQL that can accomplish this? Can I write a query that returns all the weeks in the past 12 months using only built-in SQL functions? I'm on SQL Server 2008.
Edit:
Using Scott's suggestion I posted the query solving this problem below.
You could join to a recursive CTE; something like the following should give you a start:
WITH MyCte AS
(SELECT MyWeek = 1
UNION ALL
SELECT MyWeek + 1
FROM MyCte
WHERE MyWeek < 53)
SELECT MyWeek,
DATEPART(year, DATEADD(wk, -MyWeek, GETDATE())),
DATEPART(month, DATEADD(wk, -MyWeek, GETDATE())),
DATEPART(wk, DATEADD(wk, -MyWeek, GETDATE()))
FROM MyCte
The table method you are already aware of is the best way to go. Not only does it give you a lot of control, but it is also the best performing.
You could write SQL code (a user-defined function) to do this, but it won't be as flexible. RDBMSs are made for handling sets.
Solution using CTE: (thanks to Scott's suggestion)
;WITH MyCte AS
(SELECT MyWeek = 1
UNION ALL
SELECT MyWeek + 1
FROM MyCte
WHERE MyWeek < 53)
SELECT myc.[year],
myc.[month],
myc.[week],
isnull(t.orderCount,0) AS orderCount,
isnull(t.orderTotal,0) AS orderTotal
FROM (SELECT MyWeek,
DATEPART(year, DATEADD(wk, -MyWeek, GETDATE())) AS [year],
DATEPART(month, DATEADD(wk, -MyWeek, GETDATE())) AS [month],
DATEPART(wk, DATEADD(wk, -MyWeek, GETDATE())) AS [week]
FROM MyCte) myc
LEFT OUTER JOIN
(SELECT DATEPART(year, orderDate) AS [year],
DATEPART(month, orderDate) AS [month],
DATEPART(wk, orderDate) AS [week],
COUNT(1) AS orderCount,
SUM(orderTotal) AS orderTotal
FROM dbo.Orders (NOLOCK)
WHERE customerNumber = @custnum
AND orderDate >= DATEADD(month, -12, GETDATE())
GROUP BY DATEPART(year, orderDate),
DATEPART(wk, orderDate),
DATEPART(month, orderDate)) t ON t.[year] = myc.[year] AND t.[week] = myc.[week]
ORDER BY myc.[year],
myc.[week]
Edit: I just noticed that one week is being duplicated (two records for the same week). Apparently a week can span two months, and because the inner query also groups by month, such a week produces two rows that both match the same year/week in the join.
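One way to avoid the duplicate, sketched with a made-up #Orders table and sample @custnum: key both sides of the join on the Sunday that starts each week instead of on (year, month, week), and derive year, month and week from that date afterwards. The anchor -1 (1899-12-31, a Sunday) and the expression DATEADD(WEEK, DATEDIFF(WEEK, -1, d), -1) are choices made for this sketch; DATEDIFF counts week boundaries on Sundays regardless of SET DATEFIRST. "Today" is pinned to 2013-04-10 so the output is repeatable; in real use it would be GETDATE().

```sql
-- Hypothetical stand-in for dbo.Orders; @custnum and @today are sample values.
IF OBJECT_ID('tempdb..#Orders') IS NOT NULL DROP TABLE #Orders;
CREATE TABLE #Orders (customerNumber int NOT NULL, orderDate datetime NOT NULL, orderTotal money NOT NULL);

DECLARE @custnum int = 42;
DECLARE @today datetime = '2013-04-10';  -- use GETDATE() in real code

INSERT INTO #Orders (customerNumber, orderDate, orderTotal) VALUES
    (42, '2013-03-31', 10),  -- Sunday in March:  same week as the next row,
    (42, '2013-04-02', 15),  -- Tuesday in April: split by a (year, month, week) key
    (42, '2013-04-08', 20),  -- the following week
    (7,  '2013-04-02', 99);  -- another customer, filtered out

WITH Weeks AS
(
    -- Sunday that starts the current week, then 52 more weeks going back.
    SELECT WeekStart = DATEADD(WEEK, DATEDIFF(WEEK, -1, @today), -1), n = 0
    UNION ALL
    SELECT DATEADD(WEEK, -1, WeekStart), n + 1
    FROM Weeks
    WHERE n < 52
),
Totals AS
(
    -- Key each order by the Sunday that starts its week.
    SELECT WeekStart  = DATEADD(WEEK, DATEDIFF(WEEK, -1, orderDate), -1),
           orderCount = COUNT(*),
           orderTotal = SUM(orderTotal)
    FROM #Orders
    WHERE customerNumber = @custnum
      AND orderDate >= DATEADD(MONTH, -12, @today)
    GROUP BY DATEADD(WEEK, DATEDIFF(WEEK, -1, orderDate), -1)
)
SELECT [year]     = DATEPART(YEAR, w.WeekStart),
       [month]    = DATEPART(MONTH, w.WeekStart),  -- month the week starts in
       [week]     = DATEPART(WEEK, w.WeekStart),
       orderCount = ISNULL(t.orderCount, 0),
       orderTotal = ISNULL(t.orderTotal, 0)
FROM Weeks AS w
LEFT OUTER JOIN Totals AS t ON t.WeekStart = w.WeekStart
ORDER BY w.WeekStart;
```

Each week now appears once; a week that spans two months is reported under the month in which it starts. The oldest week may be partially cut off by the 12-month filter, so its count can be incomplete.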
In the past I've done this using the table approach that you mention. The other way was to create a table-valued function that I could pass arguments to, specifying the start and end of the range I wanted, and it would build the results dynamically, saving the need to add a table with all the data.

Calculate mode (or frequency) distribution of a value over time in SQL Server

Given the following table, how does one calculate the hourly mode, or value with the highest frequency by hour?
CREATE TABLE [Values]
(
ValueID int NOT NULL,
Value int NOT NULL,
LogTime datetime NOT NULL
)
So far, I've come up with the following query.
SELECT count(*) AS Frequency,
DatePart(yy, LogTime) as [Year],
DatePart(mm, LogTime) as [Month],
DatePart(dd, LogTime) as [Day],
DatePart(hh, LogTime) as [Hour]
FROM [Values]
GROUP BY
Value,
DatePart(yy, LogTime),
DatePart(mm, LogTime),
DatePart(dd, LogTime),
DatePart(hh, LogTime)
However, this yields the frequency of each distinct value by hour. How do I add a constraint to only return the value with the maximum frequency by hour?
Thanks
The following query may look odd... but it works and it gives you what you want. This query will give you the value that had the highest frequency in a particular "hour" (slice of time).
I am NOT dividing into Year, Month, Day, etc... only hour (as you requested) even though you had those other fields in your example query.
I chose to do "MAX(Value)" below, because the case can come up where more than one "value" tied for first place with the highest frequency by hour. You can choose to do MIN, or MAX or some other 'tiebreaker' if you want.
WITH GroupedValues (Value, Frequency, Hour) AS
(SELECT
Value,
COUNT(*) AS Frequency,
DATEPART(hh, LogTime) AS Hour
FROM
dbo.MyValues
GROUP BY
Value,
DATEPART(hh, LogTime))
SELECT
MAX(Value) AS Value,
a.Hour
FROM
GroupedValues a INNER JOIN
(SELECT MAX(Frequency) AS MaxFrequency,
Hour FROM GroupedValues GROUP BY Hour) b
ON a.Frequency = b.MaxFrequency AND a.Hour = b.Hour
GROUP BY
a.Hour
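Nest the aggregates (note that this returns the highest frequency in each hour, i.e. how often the most common value occurs, rather than the value itself):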
SELECT
MAX(Frequency) AS [Mode],
[Year],[Month],[Day],[Hour]
FROM
(SELECT
COUNT(*) AS Frequency,
DatePart(yy, LogTime) as [Year],
DatePart(mm, LogTime) as [Month],
DatePart(dd, LogTime) as [Day],
DatePart(hh, LogTime) as [Hour]
FROM
[Values]
GROUP BY
Value,
DatePart(yy, LogTime),
DatePart(mm, LogTime),
DatePart(dd, LogTime),
DatePart(hh, LogTime)
) foo
GROUP By
[Year],[Month],[Day],[Hour]
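To get the modal value itself together with its frequency, one option is ROW_NUMBER (a window function available since SQL Server 2005; neither answer uses it). This sketch assumes the question's table, recreated as a temporary table #LogValues with invented rows, groups by date and hour like the question's query (CAST AS date requires SQL Server 2008), and breaks ties by taking the larger value, like the MAX(Value) tiebreaker in the first answer.

```sql
-- Hypothetical sample rows standing in for the question's [Values] table.
IF OBJECT_ID('tempdb..#LogValues') IS NOT NULL DROP TABLE #LogValues;
CREATE TABLE #LogValues (ValueID int NOT NULL, Value int NOT NULL, LogTime datetime NOT NULL);

INSERT INTO #LogValues (ValueID, Value, LogTime) VALUES
    (1, 5, '2013-04-01T08:01:00'),
    (2, 5, '2013-04-01T08:15:00'),
    (3, 7, '2013-04-01T08:30:00'),  -- hour 8: mode 5 (twice)
    (4, 7, '2013-04-01T09:05:00'),
    (5, 3, '2013-04-01T09:10:00'),
    (6, 3, '2013-04-01T09:20:00'),  -- hour 9: mode 3 (twice)
    (7, 1, '2013-04-01T10:00:00'),
    (8, 2, '2013-04-01T10:30:00');  -- hour 10: tie, larger value 2 wins

WITH Counted AS
(
    SELECT [Day]     = CAST(LogTime AS date),
           [Hour]    = DATEPART(HOUR, LogTime),
           Value,
           Frequency = COUNT(*)
    FROM #LogValues
    GROUP BY CAST(LogTime AS date), DATEPART(HOUR, LogTime), Value
),
Ranked AS
(
    -- Rank values within each hour: most frequent first, larger value on ties.
    SELECT *,
           rn = ROW_NUMBER() OVER (PARTITION BY [Day], [Hour]
                                   ORDER BY Frequency DESC, Value DESC)
    FROM Counted
)
SELECT [Day], [Hour], Value AS [Mode], Frequency
FROM Ranked
WHERE rn = 1
ORDER BY [Day], [Hour];
```

Using RANK instead of ROW_NUMBER (and dropping Value from the ORDER BY) would return every tied value rather than picking one.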