Average Insertion Rate - sql

I've got a table with a column indicating the date and time each row was inserted into the table. I'm trying to get statistics for the average and peak rates of insertion:
Peak insertions per minute
Peak insertions per second
Average insertions per minute
Average insertions per second
I can envisage a solution using a GROUP BY to put the data into "buckets" (one for each interval) and then average the count of items in each; however, it seems a very clunky solution.
Is there a more elegant T-SQL solution to this problem?
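For reference, the kind of "bucket" query I have in mind looks something like this, per-minute version only (dbo.MyTable and InsertedAt are placeholder names):
-- note: minutes with no inserts never appear, so they are excluded from the average
SELECT MAX(cnt) AS peak_per_minute,
       AVG(cnt * 1.0) AS avg_per_minute
FROM (SELECT COUNT(*) AS cnt
      FROM dbo.MyTable
      GROUP BY DATEADD(minute, DATEDIFF(minute, 0, InsertedAt), 0)) AS per_minute;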

Grouping sets are the way to go; they're intended for exactly this kind of requirement, grouping by multiple sets of grouping attributes (grouping sets) in one query, and should result in a better execution plan, i.e. better performance:
-- if you weren't grouping by minutes and seconds this would
-- probably look more 'elegant'
SELECT
GROUPING_ID(
YEAR(orderdate),
MONTH(orderdate),
DAY(orderdate),
DATEPART(hour, orderdate),
DATEPART(MINUTE, orderdate),
DATEPART(SECOND, orderdate)) AS grp_id,
COUNT(*) AS insertions, -- per-bucket count; take MAX/AVG of these counts in an outer query for peak/average (see below)
YEAR(orderdate) AS order_year,
MONTH(orderdate) AS order_month,
DAY(orderdate) AS order_day,
DATEPART(HOUR, orderdate) AS order_hour,
DATEPART(MINUTE, orderdate) AS order_minute,
DATEPART(SECOND, orderdate) AS order_second -- this will be null if the grouping set is minute
FROM Sales.Orders
GROUP BY
GROUPING SETS
(
(
-- grouping set 1: order second
YEAR(orderdate),
MONTH(orderdate),
DAY(orderdate),
DATEPART(hour, orderdate),
DATEPART(MINUTE, orderdate),
DATEPART(SECOND, orderdate)
),
(
-- grouping set 2: order minute
YEAR(orderdate),
MONTH(orderdate),
DAY(orderdate),
DATEPART(hour, orderdate),
DATEPART(MINUTE, orderdate)
)
);
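The peak and average figures themselves then come from one more level of aggregation over the per-bucket counts, for example along these lines (a compressed sketch of the same grouping-sets idea; here grp_id 0 marks the per-second buckets and 1 the per-minute buckets):
SELECT grp_id,
       MAX(insertions) AS peak_rate,
       AVG(insertions * 1.0) AS avg_rate
FROM (
    SELECT GROUPING_ID(DATEPART(MINUTE, orderdate), DATEPART(SECOND, orderdate)) AS grp_id,
           COUNT(*) AS insertions
    FROM Sales.Orders
    GROUP BY GROUPING SETS
    (
        (YEAR(orderdate), MONTH(orderdate), DAY(orderdate),
         DATEPART(HOUR, orderdate), DATEPART(MINUTE, orderdate), DATEPART(SECOND, orderdate)),
        (YEAR(orderdate), MONTH(orderdate), DAY(orderdate),
         DATEPART(HOUR, orderdate), DATEPART(MINUTE, orderdate))
    )
) AS buckets
GROUP BY grp_id;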

GROUP BY is the way to go.
I would just make a CTE for each time interval you want, and select the max for each one:
;WITH CTEMinute AS
(
SELECT YEAR(datefield) yr,
MONTH(datefield) mo,
DAY(datefield) d,
DATEPART(hour, datefield) hr,
DATEPART(minute, datefield) Mint,
COUNT(*) as 'Inserts'
FROM MyTable
GROUP BY YEAR(datefield),
MONTH(datefield),
DAY(datefield),
DATEPART(hour, datefield),
DATEPART(minute, datefield)
)
,CTESecond AS
(
SELECT YEAR(datefield) yr,
MONTH(datefield) mo,
DAY(datefield) d,
DATEPART(hour, datefield) hr,
DATEPART(minute, datefield) Mint,
DATEPART(second, datefield) sec,
COUNT(*) as 'Inserts'
FROM MyTable
GROUP BY YEAR(datefield),
MONTH(datefield),
DAY(datefield),
DATEPART(hour, datefield),
DATEPART(minute, datefield),
DATEPART(second, datefield)
)
Then you can just select from those CTEs to get max/min/avg values per time unit.
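For example, since the WITH block has to be followed immediately by a statement, the closing SELECT might look something like this (the * 1.0 avoids integer division in the averages):
SELECT (SELECT MAX(Inserts) FROM CTEMinute) AS PeakPerMinute,
       (SELECT AVG(Inserts * 1.0) FROM CTEMinute) AS AvgPerMinute,
       (SELECT MAX(Inserts) FROM CTESecond) AS PeakPerSecond,
       (SELECT AVG(Inserts * 1.0) FROM CTESecond) AS AvgPerSecond;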
If you want it to be more elegant you could just make one CTE at the finest granularity you are likely to want (milliseconds or whatever) and then SELECT/GROUP BY against that.
The issue with doing that is that CTEs don't perform all that well, since they are essentially inline views with no indexes of their own, so aggregating a CTE from another query can quickly bog down.

Expanding on J Cooper's answer, I think the ROLLUP feature might be what you're after.
SELECT
COUNT(*) AS insertions, -- per-bucket count at each rollup level
YEAR(orderdate) AS order_year,
MONTH(orderdate) AS order_month,
DAY(orderdate) AS order_day,
DATEPART(HOUR, orderdate) AS order_hour,
DATEPART(MINUTE, orderdate) AS order_minute,
DATEPART(SECOND, orderdate) AS order_second
FROM Sales.Orders
GROUP BY ROLLUP(
YEAR(orderdate),
MONTH(orderdate),
DAY(orderdate),
DATEPART(HOUR, orderdate),
DATEPART(MINUTE, orderdate),
DATEPART(SECOND, orderdate)
);
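One caveat: ROLLUP also returns subtotal rows for every coarser level (hour, day, month, year, plus a grand total). If you only want one level back, say the per-minute counts, you can filter on GROUPING_ID; a sketch:
SELECT COUNT(*) AS insertions,
       YEAR(orderdate) AS order_year,
       MONTH(orderdate) AS order_month,
       DAY(orderdate) AS order_day,
       DATEPART(HOUR, orderdate) AS order_hour,
       DATEPART(MINUTE, orderdate) AS order_minute
FROM Sales.Orders
GROUP BY ROLLUP(YEAR(orderdate), MONTH(orderdate), DAY(orderdate),
                DATEPART(HOUR, orderdate), DATEPART(MINUTE, orderdate), DATEPART(SECOND, orderdate))
HAVING GROUPING_ID(YEAR(orderdate), MONTH(orderdate), DAY(orderdate),
                   DATEPART(HOUR, orderdate), DATEPART(MINUTE, orderdate),
                   DATEPART(SECOND, orderdate)) = 1; -- 1 = only SECOND rolled up, i.e. the per-minute rows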

Related

Is there a better way to group a log on minutes?

I have the following that selects from a log and groups down to the minute (excluding seconds and milliseconds):
SELECT DATEPART(YEAR, [Date]) AS YEAR, DATEPART(MONTH, [Date]) AS MONTH,
DATEPART(DAY, [Date]) AS DAY, DATEPART(HOUR, [Date]) AS HOUR,
DATEPART(MINUTE, [Date]) AS MIN, COUNT(*) AS COUNT
FROM [database].[dbo].[errorlog]
GROUP BY DATEPART(YEAR, [Date]), DATEPART(MONTH, [Date]), DATEPART(DAY, [Date]),
DATEPART(HOUR, [Date]), DATEPART(MINUTE, [Date])
ORDER BY DATEPART(YEAR, [Date]) DESC, DATEPART(MONTH, [Date]) DESC,
DATEPART(DAY, [Date]) DESC, DATEPART(HOUR, [Date]) DESC,
DATEPART(MINUTE, [Date]) DESC;
But as you can see that's a lot of fuss just to get a count, so I wonder if there is a better way to group it so I get it grouped down to the minute with respect to year, month, day and hour?
This should work:
select CAST([Date] AS smalldatetime) as time_stamp, count(*) as count
FROM [database].[dbo].[errorlog]
group by CAST([Date] AS smalldatetime)
order by CAST([Date] AS smalldatetime) desc;
Update after comments on this answer:
select dateadd(second,-datepart(ss,[Date]),[Date]) as time_stamp, count(*) as count
FROM [database].[dbo].[errorlog]
group by dateadd(second,-datepart(ss,[Date]),[Date])
order by dateadd(second,-datepart(ss,[Date]),[Date]) desc ;
The first solution rounds the timestamp to the nearest minute (smalldatetime rounds the seconds up or down), which I realised is not exactly what the OP wanted.
So the second solution just subtracts the seconds part from the timestamp, leaving the timestamp with the seconds at zero (assuming [Date] does not have fractional seconds).
DATEADD(minute,DATEDIFF(minute,'20010101',[Date]),'20010101')
Should round all Date column values down to the nearest minute. So:
SELECT DATEADD(minute,DATEDIFF(minute,'20010101',[Date]),'20010101'),
COUNT(*) AS COUNT
FROM [database].[dbo].[errorlog]
GROUP BY DATEADD(minute,DATEDIFF(minute,'20010101',[Date]),'20010101')
ORDER BY DATEADD(minute,DATEDIFF(minute,'20010101',[Date]),'20010101') DESC;
(You could move this expression into a subquery if you want to further reduce the repetition)
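For instance, a sketch of that subquery version, which names the truncated value once:
SELECT minute_stamp, COUNT(*) AS [COUNT]
FROM (SELECT DATEADD(minute, DATEDIFF(minute, '20010101', [Date]), '20010101') AS minute_stamp
      FROM [database].[dbo].[errorlog]) AS d
GROUP BY minute_stamp
ORDER BY minute_stamp DESC;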
You could do something like this to get the per-minute counts back as full timestamps:
declare @now datetime
set @now = GETDATE()
-- note: the reconstructed timestamps inherit the seconds component of @now
select dateadd(minute, mm, @now) as [date], c from (
select DATEDIFF(minute, @now, [Date]) as mm, COUNT(1) as c
from [database].[dbo].[errorlog]
group by DATEDIFF(minute, @now, [Date])
) t

SQL - select most 'active' time from db

Very closely related to SQL - Select most 'active' timespan from db but different question.
"I have a table of transactions. In this table I store the transaction datetime in UTC. I have a few months of data, about 20,000 transactions a day."
How would I change
select datepart(hour, the_column) as [hour], count(*) as total
from t
group by datepart(hour, the_column)
order by total desc
so that I can select the specific year, month, day, hour, minute, and second that was the most 'active'.
To clarify, I'm not looking for which hour or minute of the day was most active. Rather, which moment in time was the most active.
SELECT
DATEPART(year, the_column) AS [year]
,DATEPART(dayofyear, the_column) AS [day]
,DATEPART(hh, the_column) AS [hour]
,DATEPART(mi, the_column) AS [minute]
,DATEPART(ss, the_column) AS [second]
,COUNT(*) AS [count]
FROM t
GROUP BY
DATEPART(year, the_column)
, DATEPART(dayofyear, the_column)
, DATEPART(hh, the_column)
, DATEPART(mi, the_column)
, DATEPART(ss, the_column)
ORDER BY [count] DESC
If minute resolution is enough:
select top 1 cast(the_column as smalldatetime) as moment, count(*) as total
from t
group by cast(the_column as smalldatetime)
order by total desc
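If you need second resolution instead, the DATEADD/DATEDIFF truncation trick from the earlier answers works here too (a sketch; note that DATEDIFF at second precision overflows an int once the data is more than about 68 years from the anchor date):
select top 1
       dateadd(second, datediff(second, '20010101', the_column), '20010101') as moment,
       count(*) as total
from t
group by dateadd(second, datediff(second, '20010101', the_column), '20010101')
order by total desc;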

SQL Server / T-SQL: Selecting a specific interval ( group by )

I want to write a select that aggregates over data (which has a DATETIME column as ID) with any interval theoretically possible (like 1 hr; 1 hr and 22 sec; 1 year and 3 min; etc.).
For example, aggregating the time range in the query below by an interval of 1 hour, 12 minutes and 14 seconds should return 3 rows. Here is what I have so far (minute-level only):
SELECT DATEPART(YEAR,id) as year,
DATEPART(MONTH,id) as month,
DATEPART(DAY,id) as day,
DATEPART(HOUR,id) as hour,
DATEPART(MINUTE,id) as minute,
AVG([Open]),
AVG([Close]),
AVG([Min]),
AVG([Max])
FROM QuoteHistory
where id between '2000-02-06 17:00:00.000' and '2000-02-06 20:36:42.000'
GROUP BY
DATEPART(YEAR,id),
DATEPART(MONTH,id),
DATEPART(DAY,id),
DATEPART(HOUR,id),
DATEPART(MINUTE,id)
ORDER BY 1,2,3,4,5;
I am kind of stuck here and can't get my head around this problem. For "simple" intervals like 30 minutes I could just bucket the minute part with integer division, e.g.
DATEPART(MINUTE, id) / 30
but when the interval "touches" more than one part of the date, I'm stuck.
Any help appreciated, thx!
Assuming some parameters here:
;WITH Date_Ranges AS (
SELECT
    @min_datetime AS start_datetime,
    DATEADD(SECOND, @seconds,
    DATEADD(MINUTE, @minutes,
    DATEADD(HOUR, @hours,
    DATEADD(DAY, @days,
    DATEADD(WEEK, @weeks,
    DATEADD(MONTH, @months,
    DATEADD(YEAR, @years, @min_datetime))))))) AS end_datetime
UNION ALL
SELECT
    DATEADD(SECOND, 1, end_datetime),
    DATEADD(SECOND, @seconds,
    DATEADD(MINUTE, @minutes,
    DATEADD(HOUR, @hours,
    DATEADD(DAY, @days,
    DATEADD(WEEK, @weeks,
    DATEADD(MONTH, @months,
    DATEADD(YEAR, @years, end_datetime)))))))
FROM
    Date_Ranges
WHERE
    DATEADD(SECOND, 1, end_datetime) < @max_datetime
)
SELECT
    DR.start_datetime,
    DR.end_datetime,
    AVG([Open]) AS avg_open,
    AVG([Close]) AS avg_close,
    AVG([Min]) AS avg_min,
    AVG([Max]) AS avg_max
FROM
    Date_Ranges DR
    LEFT OUTER JOIN QuoteHistory QH ON
        QH.id BETWEEN DR.start_datetime AND DR.end_datetime
GROUP BY
    DR.start_datetime,
    DR.end_datetime
ORDER BY
    DR.start_datetime,
    DR.end_datetime
OPTION (MAXRECURSION 0); -- lift the default limit of 100 recursion steps for long ranges
You might need to fiddle with how to handle the edge cases (that 1 second range between date ranges could be a problem depending on your data). This should hopefully point you in the right direction though.
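For completeness, the assumed parameters would be declared along these lines (illustrative values only, mirroring the 1 hour, 12 minute, 14 second example from the question; inline initialisation in DECLARE needs SQL Server 2008+):
DECLARE @years int = 0, @months int = 0, @weeks int = 0, @days int = 0,
        @hours int = 1, @minutes int = 12, @seconds int = 14;
DECLARE @min_datetime datetime = '2000-02-06 17:00:00.000',
        @max_datetime datetime = '2000-02-06 20:36:42.000';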

SQL Hurdle - SQL Server 2008

The following query returns the total amount of orders, per week, for the past 12 months (for a specific customer):
SELECT DATEPART(year, orderDate) AS [year],
DATEPART(month, orderDate) AS [month],
DATEPART(wk, orderDate) AS [week],
COUNT(1) AS orderCount
FROM dbo.Orders (NOLOCK)
WHERE customerNumber = @custnum
AND orderDate >= DATEADD(month, -12, GETDATE())
GROUP BY DATEPART(year, orderDate),
DATEPART(wk, orderDate),
DATEPART(month, orderDate)
ORDER BY DATEPART(year, orderDate),
DATEPART(wk, orderDate)
This returns results like:
year month week orderCount
2008 1 1 23
2008 3 12 5
...
As you can see, only weeks that have orders for this customer are returned in the result set. I need a row for every week in the past 12 months; if no order exists in a week then an orderCount of 0 is fine, but I still need the year, week, and month. I could probably do it by creating a separate table storing the weeks of the year and left outer joining against it, but I would prefer not to. Is there a way to return all the weeks in the past 12 months using built-in SQL functions? I'm on SQL Server 2008.
Edit:
Using Scott's suggestion I posted the query solving this problem below.
You could join to a recursive CTE - something like below should give you a start...
WITH MyCte AS
(SELECT MyWeek = 1
UNION ALL
SELECT MyWeek + 1
FROM MyCte
WHERE MyWeek < 53)
SELECT MyWeek,
DATEPART(year, DATEADD(wk, -MyWeek, GETDATE())),
DATEPART(month, DATEADD(wk, -MyWeek, GETDATE())),
DATEPART(wk, DATEADD(wk, -MyWeek, GETDATE()))
FROM MyCte
The table method you are already aware of is the best way to go. Not only does it give you a lot of control, but it also performs best.
You could write SQL code (a user-defined function) to do this, but it won't be as flexible. RDBMSs are made for handling sets.
Solution using CTE: (thanks to Scott's suggestion)
;WITH MyCte AS
(SELECT MyWeek = 1
UNION ALL
SELECT MyWeek + 1
FROM MyCte
WHERE MyWeek < 53)
SELECT myc.[year],
myc.[month],
myc.[week],
isnull(t.orderCount,0) AS orderCount,
isnull(t.orderTotal,0) AS orderTotal
FROM (SELECT MyWeek,
DATEPART(year, DATEADD(wk, -MyWeek, GETDATE())) AS [year],
DATEPART(month, DATEADD(wk, -MyWeek, GETDATE())) AS [month],
DATEPART(wk, DATEADD(wk, -MyWeek, GETDATE())) AS [week]
FROM MyCte) myc
LEFT OUTER JOIN
(SELECT DATEPART(year, orderDate) AS [year],
DATEPART(month, orderDate) AS [month],
DATEPART(wk, orderDate) AS [week],
COUNT(1) AS orderCount,
SUM(orderTotal) AS orderTotal
FROM dbo.Orders (NOLOCK)
WHERE customerNumber = @custnum
AND orderDate >= DATEADD(month, -12, GETDATE())
GROUP BY DATEPART(year, orderDate),
DATEPART(wk, orderDate),
DATEPART(month, orderDate)) t ON t.[year] = myc.[year] AND t.[week] = myc.[week]
ORDER BY myc.[year],
myc.[week]
Edit: just noticed one week is duplicated (two records for the same week)... a simple logical error on my part: the inner query also groups by month, and a week can span two months, so that week produces two rows which both join to the same year/week... who would have known.
In the past I've done this using the table approach you mention; the other way was to create a table function that I could pass start and end range arguments to, and it would build the results dynamically, saving the need to add a table with all the data.
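A rough sketch of that table-function idea, in case it helps (names are made up; the default MAXRECURSION limit of 100 comfortably covers a year of weeks):
CREATE FUNCTION dbo.WeeksBetween (@startDate datetime, @endDate datetime)
RETURNS TABLE
AS
RETURN
(
    WITH Weeks AS
    (
        SELECT @startDate AS weekStart
        UNION ALL
        SELECT DATEADD(wk, 1, weekStart)
        FROM Weeks
        WHERE DATEADD(wk, 1, weekStart) <= @endDate
    )
    SELECT DATEPART(year, weekStart) AS [year],
           DATEPART(month, weekStart) AS [month],
           DATEPART(wk, weekStart) AS [week]
    FROM Weeks
);
GO
-- usage: one row per week over the last 12 months, ready to LEFT JOIN the order counts onto
SELECT * FROM dbo.WeeksBetween(DATEADD(month, -12, GETDATE()), GETDATE());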

Calculate mode (or frequency) distribution of a value over time in SQL Server

Given the following table, how does one calculate the hourly mode, or value with the highest frequency by hour?
CREATE TABLE [Values]
(
ValueID int NOT NULL,
Value int NOT NULL,
LogTime datetime NOT NULL
)
So far, I've come up with the following query.
SELECT Value,
count(*) AS Frequency,
DatePart(yy, LogTime) as [Year],
DatePart(mm, LogTime) as [Month],
DatePart(dd, LogTime) as [Day],
DatePart(hh, LogTime) as [Hour]
FROM [Values]
GROUP BY
Value,
DatePart(yy, LogTime),
DatePart(mm, LogTime),
DatePart(dd, LogTime),
DatePart(hh, LogTime)
However, this yields the frequency of each distinct value by hour. How do I add a constraint to only return the value with the maximum frequency by hour?
Thanks
The following query may look odd... but it works and it gives you what you want. This query will give you the value that had the highest frequency in a particular "hour" (slice of time).
I am NOT dividing into Year, Month, Day, etc... only hour (as you requested) even though you had those other fields in your example query.
I chose to do "MAX(Value)" below, because the case can come up where more than one "value" tied for first place with the highest frequency by hour. You can choose to do MIN, or MAX or some other 'tiebreaker' if you want.
WITH GroupedValues (Value, Frequency, Hour) AS
(SELECT
Value,
COUNT(*) AS Frequency,
DATEPART(hh, LogTime) AS Hour
FROM
dbo.[Values]
GROUP BY
Value,
DATEPART(hh, LogTime))
SELECT
MAX(Value) AS Value,
a.Hour
FROM
GroupedValues a INNER JOIN
(SELECT MAX(Frequency) AS MaxFrequency,
Hour FROM GroupedValues GROUP BY Hour) b
ON a.Frequency = b.MaxFrequency AND a.Hour = b.Hour
GROUP BY
a.Hour
Nest the aggregates...
SELECT
MAX(Frequency) AS [Mode],
[Year],[Month],[Day],[Hour]
FROM
(SELECT
COUNT(*) AS Frequency,
DatePart(yy, LogTime) as [Year],
DatePart(mm, LogTime) as [Month],
DatePart(dd, LogTime) as [Day],
DatePart(hh, LogTime) as [Hour]
FROM
[Values]
GROUP BY
Value,
DatePart(yy, LogTime),
DatePart(mm, LogTime),
DatePart(dd, LogTime),
DatePart(hh, LogTime)
) foo
GROUP By
[Year],[Month],[Day],[Hour]
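If you're on SQL Server 2005 or later, a window function gives you the winning value as well as its frequency; a sketch using RANK (the table name matches the CREATE TABLE above):
;WITH Freq AS
(
    SELECT Value,
           DATEPART(yy, LogTime) AS [Year],
           DATEPART(mm, LogTime) AS [Month],
           DATEPART(dd, LogTime) AS [Day],
           DATEPART(hh, LogTime) AS [Hour],
           COUNT(*) AS Frequency,
           RANK() OVER (PARTITION BY DATEPART(yy, LogTime), DATEPART(mm, LogTime),
                                     DATEPART(dd, LogTime), DATEPART(hh, LogTime)
                        ORDER BY COUNT(*) DESC) AS rk
    FROM [Values]
    GROUP BY Value,
             DATEPART(yy, LogTime), DATEPART(mm, LogTime),
             DATEPART(dd, LogTime), DATEPART(hh, LogTime)
)
SELECT [Year], [Month], [Day], [Hour], Value, Frequency AS [Mode]
FROM Freq
WHERE rk = 1; -- ties return one row per tied value; take MIN(Value) or MAX(Value) per hour if you need exactly one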